Describe the bug
Decomposing a 2D tensor along both modes while specifying two ranks results in an error: internally, a tensor is created on the CPU as part of the process, and it cannot be concatenated with a tensor that lives on the GPU.
The decomposition works when the rank is specified as an integer, but not as a list:
rank=16 works
rank=[16, 16] crashes
Everything works on the CPU, but then the GPU performance advantage is lost.
Expected behavior
Tucker decomposition should not fail when the ranks are provided as a list of values.
Actual result
Traceback (most recent call last):
  File "/home/yuri/Source/tensorly_bug.py", line 6, in <module>
    result = tl.decomposition.partial_tucker(x, rank=[16, 32], modes=[0, 1])
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/decomposition/_tucker.py", line 166, in partial_tucker
    core, factors = initialize_tucker(
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/decomposition/_tucker.py", line 64, in initialize_tucker
    U, _, _ = svd_interface(
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/tenalg/svd.py", line 426, in svd_interface
    U, V = svd_flip(U, V, u_based_decision=u_based_flip_sign)
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/tenalg/svd.py", line 42, in svd_flip
    signs = tl.concatenate((signs, tl.ones(tl.shape(V)[0] - tl.shape(U)[1])))
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/backend/__init__.py", line 206, in wrapped_backend_method
    return getattr(
  File "/home/yuri/.local/lib/python3.10/site-packages/tensorly/backend/pytorch_backend.py", line 153, in concatenate
    return torch.cat(tensors, dim=axis)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
I think we just need to make sure that on line #42 in svd.py the tensor created by tl.ones(tl.shape(V)[0] - tl.shape(U)[1]) is placed on the same device as the signs tensor.
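In plain PyTorch terms (rather than the tensorly backend API), the idea is to build the padding with the device and dtype of signs before concatenating; pad_signs below is a hypothetical helper used only to illustrate the pattern, not a function in the library:

```python
import torch

def pad_signs(signs: torch.Tensor, pad_len: int) -> torch.Tensor:
    # Create the padding ones with the same device and dtype as `signs`,
    # so torch.cat never has to mix cuda and cpu tensors.
    pad = torch.ones(pad_len, dtype=signs.dtype, device=signs.device)
    return torch.cat((signs, pad))
```

With signs on cuda:0, both operands of torch.cat then live on the same device, which is exactly what the RuntimeError above complains about.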
Good catch, thanks @AtomicCactus! Could you open a small PR to fix the issue?
Perhaps we could add a test for this kind of issue in the future, at least by varying the dtype and device of the input tensor.