Error on subindice Embedding type for Torch Tensor moving from GPU to CPU. #1830
Comments
Hey @vincetrep, do you have a minimal reproducible example that we can use to debug?
I also cannot reproduce the issue; the following executes without problems:

from typing import Optional

import numpy as np
import torch
from pydantic import Field, parse_obj_as

from docarray import BaseDoc, DocList
from docarray.index.backends.weaviate import EmbeddedOptions
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from docarray.typing import AnyEmbedding, NdArrayEmbedding


class SubDoc(BaseDoc):
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True, n_dim=128)


class MyDoc(BaseDoc):
    text: str
    subdocs: DocList[SubDoc]


dbconfig = WeaviateDocumentIndex.DBConfig(embedded_options=EmbeddedOptions())
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)

# creating tensors on GPU
gpu_tensors = [torch.rand(128).to('cuda:2') for _ in range(3)]
assert all(t.is_cuda for t in gpu_tensors)

# storing them in DocList
data = DocList[MyDoc](
    [
        MyDoc(text='hello world', subdocs=[SubDoc(embedding=t)]) for t in gpu_tensors
    ]
)

# converting to NdArray on CPU
for doc in data:
    doc.subdocs[0].embedding = parse_obj_as(NdArrayEmbedding, doc.subdocs[0].embedding)
    assert isinstance(doc.subdocs[0].embedding, np.ndarray)  # not on GPU

# insert into index
doc_index.index(data)

root_docs, sub_docs, scores = doc_index.find_subindex(np.random.rand(128), subindex='subdocs', limit=3)
print(scores)
Description
I have found an issue with the following sequence:
I have an object containing a subindex object (SubIndiceObject) that has an attribute of type Optional[AnyEmbedding].
I run an operation on the GPU and store the resulting TorchEmbedding in that attribute.
I then convert the value to an NdArrayEmbedding on the CPU and store it back in the same attribute.
When the data is indexed into Weaviate, the attribute is still tracked as a torch tensor on the GPU, and the index tries to convert it to an NdArrayEmbedding.
To work around it, I called cpu() on the tensor before storing it in the attribute.
I don't know whether this is a bug or by design, but I am logging the scenario nonetheless.
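The workaround described above can be sketched as follows. This is only an illustration, not code from the report: the helper name to_cpu_ndarray is hypothetical, and the snippet uses plain PyTorch and NumPy rather than DocArray. It falls back to the CPU when CUDA is unavailable so that it runs anywhere.

```python
import numpy as np
import torch


def to_cpu_ndarray(tensor: torch.Tensor) -> np.ndarray:
    """Return a NumPy array backed by host memory (hypothetical helper)."""
    # detach() drops autograd tracking; cpu() is a no-op for tensors
    # that already live on the CPU, so this is safe for both devices.
    return tensor.detach().cpu().numpy()


# Assumption: use the GPU if one is present, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
gpu_tensor = torch.rand(128, device=device)

# Move the data to host memory *before* assigning it to the document field.
embedding = to_cpu_ndarray(gpu_tensor)
assert isinstance(embedding, np.ndarray)
assert embedding.shape == (128,)
```

The resulting array could then be assigned to the embedding attribute in place of the GPU tensor, which is what calling cpu() before storing the value amounts to.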
Example Code
No response