
Error on subindice Embedding type for Torch Tensor moving from GPU to CPU. #1830

Open
2 of 6 tasks
vincetrep opened this issue Nov 6, 2023 · 3 comments

Comments

@vincetrep

Initial Checks

  • I have read and followed the docs and still think this is a bug

Description

I have found an issue with the following sequence:

A root Object that contains a subindex object (SubIndiceObject) with an attribute of type Optional[AnyEmbedding].

I run an operation on the GPU and store the resulting TorchEmbedding in that attribute.

Then I convert the value back into an NdArrayEmbedding on the CPU and store it in the same attribute.

When indexing the data into Weaviate, DocArray still treats that attribute as a TorchTensor on the GPU and tries to convert it to an NdArrayEmbedding, which raises an error.

As a workaround, I called cpu() on the tensor before storing it in the attribute.
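The cpu() workaround can be wrapped in a small helper so the attribute never receives a GPU tensor in the first place. This is a minimal sketch, not a DocArray API: `to_cpu_embedding` is a hypothetical name, and the code assumes the embedding is either a `torch.Tensor` (possibly on a GPU) or something NumPy can convert directly. torch is imported optionally so the helper also works where it is not installed.

```python
from typing import Any

import numpy as np

try:
    import torch  # optional: only needed when the embedding is a torch tensor
except ImportError:
    torch = None


def to_cpu_embedding(value: Any) -> np.ndarray:
    """Return `value` as a NumPy array held in host (CPU) memory.

    Hypothetical helper (not part of DocArray) mirroring the workaround of
    calling .cpu() before assigning the embedding to the document attribute.
    """
    if torch is not None and isinstance(value, torch.Tensor):
        # detach() drops autograd history; cpu() copies the data off the GPU
        # (it returns the same tensor if it is already on the CPU)
        return value.detach().cpu().numpy()
    # NumPy arrays and array-likes such as lists already live on the CPU
    return np.asarray(value)
```

In the scenario above, one would assign `to_cpu_embedding(gpu_tensor)` to the subindex attribute instead of the raw tensor, so the stored value is a plain host-side array before it ever reaches the Weaviate index.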

I don't know whether this is a bug or by design, but I'm logging the scenario nonetheless.

Example Code

No response

Python, DocArray & OS Version

docarray version:  0.39.1

Affected Components

@JoanFM
Member

JoanFM commented Nov 6, 2023

Hey @vincetrep,

do you have a minimal reproducible example that we can use to debug?

@JohannesMessner
Member

I also cannot reproduce the issue; the following runs without problems:

```python
from docarray.index.backends.weaviate import EmbeddedOptions
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from typing import Optional
from docarray.typing import AnyEmbedding, NdArrayEmbedding
from pydantic import Field, parse_obj_as
from docarray import DocList, BaseDoc
import numpy as np
import torch

class SubDoc(BaseDoc):
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True, n_dim=128)


class MyDoc(BaseDoc):
    text: str
    subdocs: DocList[SubDoc]


dbconfig = WeaviateDocumentIndex.DBConfig(embedded_options=EmbeddedOptions())
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)

# creating tensors on the default GPU
gpu_tensors = [torch.rand(128).to('cuda') for _ in range(3)]
assert all(t.is_cuda for t in gpu_tensors)
# storing them in DocList
data = DocList[MyDoc](
    [
        MyDoc(text='hello world', subdocs=[SubDoc(embedding=t)]) for t in gpu_tensors
    ]
)
# converting to NdArray on CPU
for doc in data:
    doc.subdocs[0].embedding = parse_obj_as(NdArrayEmbedding, doc.subdocs[0].embedding)
    assert isinstance(doc.subdocs[0].embedding, np.ndarray)  # not on GPU
# insert into index
doc_index.index(data)

root_docs, sub_docs, scores = doc_index.find_subindex(np.random.rand(128), subindex='subdocs', limit=3)

print(scores)
```
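Because the reproduction relies on every sub-document embedding being a NumPy array before `doc_index.index(data)` is called, a quick pre-indexing check can help pinpoint where a tensor slips through. This is a hypothetical helper (`find_non_numpy_embeddings` is not part of DocArray); it only assumes each root document exposes a `subdocs` sequence whose items have an `embedding` attribute, as `MyDoc` and `SubDoc` do here. The example uses `SimpleNamespace` stand-ins so it runs without DocArray installed.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

import numpy as np


def find_non_numpy_embeddings(docs: Iterable) -> List[Tuple[int, int, str]]:
    """Return (doc index, subdoc index, type name) for every sub-document
    embedding that is not a NumPy array (e.g. a torch tensor left on the GPU).

    Hypothetical pre-indexing check; not a DocArray API.
    """
    problems = []
    for i, doc in enumerate(docs):
        for j, sub in enumerate(doc.subdocs):
            emb = sub.embedding
            # None is allowed because the field is Optional[AnyEmbedding]
            if emb is not None and not isinstance(emb, np.ndarray):
                problems.append((i, j, type(emb).__name__))
    return problems


# Example with stand-in objects instead of real BaseDoc instances
docs = [
    SimpleNamespace(subdocs=[SimpleNamespace(embedding=np.zeros(128))]),
    SimpleNamespace(subdocs=[SimpleNamespace(embedding=[0.0] * 128)]),
]
print(find_non_numpy_embeddings(docs))  # [(1, 0, 'list')]
```

Since `NdArrayEmbedding` is a subclass of `np.ndarray` (the `isinstance` assertion in the snippet relies on this), converted embeddings pass the check; placing `assert not find_non_numpy_embeddings(data)` right before `doc_index.index(data)` would fail fast if a torch tensor were still stored on any sub-document.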

@Sumant02

Sumant02 commented Dec 23, 2023

```python
from docarray.index.backends.weaviate import EmbeddedOptions
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from typing import Optional
from docarray.typing import AnyEmbedding, NdArrayEmbedding
from pydantic import Field, parse_obj_as
from docarray import DocList, BaseDoc
import numpy as np
import torch

class SubDoc(BaseDoc):
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True, n_dim=128)

class MyDoc(BaseDoc):
    text: str
    subdocs: DocList[SubDoc]

dbconfig = WeaviateDocumentIndex.DBConfig(embedded_options=EmbeddedOptions())
doc_index = WeaviateDocumentIndex[MyDoc](db_config=dbconfig)

# creating tensors on the default GPU
gpu_tensors = [torch.rand(128).to('cuda') for _ in range(3)]
assert all(t.is_cuda for t in gpu_tensors)

# storing them in DocList
data = DocList[MyDoc](
    [
        MyDoc(text='hello world', subdocs=[SubDoc(embedding=t)]) for t in gpu_tensors
    ]
)

# converting to NdArray on CPU
for doc in data:
    doc.subdocs[0].embedding = parse_obj_as(NdArrayEmbedding, doc.subdocs[0].embedding)
    assert isinstance(doc.subdocs[0].embedding, np.ndarray)  # not on GPU

# insert into index
doc_index.index(data)

root_docs, sub_docs, scores = doc_index.find_subindex(np.random.rand(128), subindex='subdocs', limit=3)
print(scores)
```

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants