Error on subindice Embedding type for Torch Tensor moving from GPU to CPU. #1830

vincetrep · 2023-11-06T19:44:17Z

Initial Checks

I have read and followed the docs and still think this is a bug

Description

I have found an issue with the following sequence:

- `Object` (the root document)
  - `SubIndiceObject` (a subindex document) with an attribute of type `Optional[AnyEmbedding]`

I run an operation on the GPU and store the resulting TorchEmbedding in that attribute.

Then I convert the value back into an NdArrayEmbedding on the CPU and store it in the same attribute.

When indexing the data into Weaviate, DocArray still treats that attribute as a TorchTensor on the GPU and tries to convert it to an NdArrayEmbedding, which raises an error.
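To check what is actually stored in the subindex attribute right before `doc_index.index(...)` is called, a small diagnostic along these lines may help. `describe_embedding` is a hypothetical helper, not a DocArray API; it assumes that DocArray's `TorchTensor`/`TorchEmbedding` types subclass `torch.Tensor` and its `NdArray`/`NdArrayEmbedding` types subclass `np.ndarray`, so plain `isinstance` checks cover them.

```python
import numpy as np
import torch


def describe_embedding(value):
    """Hypothetical debugging helper (not part of DocArray).

    Returns a short description of the value stored in an embedding
    attribute: its class name and, for tensors, the device it lives on.
    """
    if value is None:
        return 'None'
    if isinstance(value, torch.Tensor):
        # torch tensors (and subclasses such as TorchEmbedding) carry a device
        return f'{type(value).__name__} on {value.device}'
    if isinstance(value, np.ndarray):
        # numpy arrays (and subclasses such as NdArrayEmbedding) are in host memory
        return f'{type(value).__name__} on cpu'
    return type(value).__name__


print(describe_embedding(torch.rand(4)))
print(describe_embedding(np.zeros(4)))
```

Called on each `doc.subdocs[0].embedding` before indexing, it should report `on cpu` for every entry once the conversion back to NumPy has taken effect; a `cuda` device in the output would mean the GPU tensor is still what is stored.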

As a workaround, I call `.cpu()` on the tensor before assigning it to the attribute.
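The workaround amounts to moving the data into host memory before it ever reaches the attribute. A minimal sketch, assuming the goal is a NumPy array on the CPU; `to_cpu_numpy` is a hypothetical helper name, and only standard PyTorch calls (`detach`, `cpu`, `numpy`) are used:

```python
import numpy as np
import torch


def to_cpu_numpy(tensor: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: copy a (possibly GPU-resident) tensor to a host numpy array."""
    # detach() drops autograd history (numpy() refuses tensors that require grad),
    # cpu() copies the data to host memory (a no-op for CPU tensors),
    # and numpy() then returns an array sharing that host buffer.
    return tensor.detach().cpu().numpy()


# Use a GPU if one is present; otherwise the same code runs on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
source = torch.rand(128, device=device, requires_grad=True)
emb = to_cpu_numpy(source)
print(type(emb).__name__, emb.shape)
```

The resulting array could then be assigned to the `Optional[AnyEmbedding]` attribute, so nothing GPU-resident is left for the indexer to convert.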

I am not sure whether this is a bug or intended behavior, but I am reporting the scenario nonetheless.

Example Code

No response

Python, DocArray & OS Version

docarray version: 0.39.1

Affected Components


JoanFM · 2023-11-06T20:11:33Z

Hey @vincetrep ,

Do you have a minimal reproducible example that we can use to debug this?

JohannesMessner · 2023-11-08T14:55:20Z

I also cannot reproduce the issue; the following runs without problems:

```python
from docarray.index.backends.weaviate import EmbeddedOptions
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from typing import Optional
from docarray.typing import AnyEmbedding, NdArrayEmbedding
from pydantic import Field, parse_obj_as
from docarray import DocList, BaseDoc
import numpy as np
import torch

class SubDoc(BaseDoc):
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True, n_dim=128)


class MyDoc(BaseDoc):
    text: str
    subdocs: DocList[SubDoc]


dbconfig = WeaviateDocumentIndex.DBConfig(embedded_options=EmbeddedOptions())
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)

# creating tensors on GPU
gpu_tensors = [torch.rand(128).to('cuda:2') for i in range(3)]
assert all(t.is_cuda for t in gpu_tensors)
# storing them in DocList
data = DocList[MyDoc](
    [
        MyDoc(text='hello world', subdocs=[SubDoc(embedding=t)]) for t in gpu_tensors
    ]
)
# converting to NdArray on CPU
for doc in data:
    doc.subdocs[0].embedding = parse_obj_as(NdArrayEmbedding, doc.subdocs[0].embedding)
    assert isinstance(doc.subdocs[0].embedding, np.ndarray)  # not on GPU
# insert into index
doc_index.index(data)

root_docs, sub_docs, scores = doc_index.find_subindex(np.random.rand(128), subindex='subdocs', limit=3)

print(scores)
```

Sumant02 · 2023-12-23T14:51:35Z

```python
from docarray.index.backends.weaviate import WeaviateDocumentIndex
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)
```

```python
from docarray.index.backends.weaviate import EmbeddedOptions
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from typing import Optional
from docarray.typing import AnyEmbedding, NdArrayEmbedding
from pydantic import Field, parse_obj_as
from docarray import DocList, BaseDoc
import numpy as np
import torch

class SubDoc(BaseDoc):
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True, n_dim=128)

class MyDoc(BaseDoc):
    text: str
    subdocs: DocList[SubDoc]

dbconfig = WeaviateDocumentIndex.DBConfig(embedded_options=EmbeddedOptions())
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)

# creating tensors on GPU
gpu_tensors = [torch.rand(128).to('cuda:2') for i in range(3)]
assert all(t.is_cuda for t in gpu_tensors)

# storing them in DocList
data = DocList[MyDoc](
    [
        MyDoc(text='hello world', subdocs=[SubDoc(embedding=t)]) for t in gpu_tensors
    ]
)

# converting to NdArray on CPU
for doc in data:
    doc.subdocs[0].embedding = parse_obj_as(NdArrayEmbedding, doc.subdocs[0].embedding)
    assert isinstance(doc.subdocs[0].embedding, np.ndarray)  # not on GPU

# insert into index
doc_index.index(data)

root_docs, sub_docs, scores = doc_index.find_subindex(np.random.rand(128), subindex='subdocs', limit=3)
print(scores)
```