bug(python): update function is not accepting scalars #1228

Open
westonpace opened this issue Apr 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@westonpace
Contributor

LanceDB version

v0.6.7

What happened?

I want to use update to change the value of a fixed-size list column. Passing a list of Python floats as the value works, but passing a FixedSizeListScalar does not. Supporting the latter case seems useful, since a FixedSizeListScalar is exactly what you get when you read a value back from the database (see the reproduction).

Are there known steps to reproduce?

import pyarrow as pa

import lancedb

schema = pa.schema({"id": pa.int64(), "vector": pa.list_(pa.float32(), 4)})

db = lancedb.connect("/tmp/my_database")
tbl = db.create_table("my_tbl", schema=schema)

tbl.add([{"id": 1, "vector": [1.0, 2.0, 3.0, 4.0]}])
records = tbl.search().select(["vector"]).limit(1).to_arrow()
vectors = records.column("vector")
value = vectors[0]

# This workaround works
tbl.update("id == 1", {"vector": value.as_py()})

# This should work, but does not work today
tbl.update("id == 1", {"vector": value})

westonpace added the bug (Something isn't working) label on Apr 18, 2024
@wjones127
Contributor

The patch here is very simple, and could even be done in user code:

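import pyarrow as pa
from lancedb.util import value_to_sql

# Convert any Arrow scalar to a Python object, then dispatch again on the result
@value_to_sql.register(pa.Scalar)
def _(value: pa.Scalar) -> str:
    return value_to_sql(value.as_py())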

@Sciumo

Sciumo commented Apr 18, 2024

Here is a test case:

from typing import List
import ollama
import lancedb
from lancedb.pydantic import LanceModel, Vector

def embeddings(data, model="mxbai-embed-large") -> List[float]:
    embed = ollama.embeddings(model, data)
    return list(embed["embedding"])

HELLO = embeddings("hello")
class Doc(LanceModel):
    filename: str
    text: str
    vector: Vector(len(HELLO))
    def update(self, text):
        self.text = text
        self.vector = embeddings(text)


def newdoc(text,filename):
    return Doc(filename=filename, text=text, vector=embeddings(text))

def docsfrom(docs):
    return [
            Doc(
                text=docs["text"][idx].as_py(),
                filename=docs["filename"][idx].as_py(),
                vector=docs["vector"][idx].as_py().copy(),
            )
            for idx in range(len(docs))
        ]


db = lancedb.connect('./lancedb')
if "test_table" in db.table_names():
    db.drop_table("test_table")
table = db.create_table('test_table', schema=Doc)

data = [newdoc("Hello world", "doc1"),
        newdoc("Goodbye world", "doc2")]

table.add(data, mode="overwrite")

found = docsfrom(table.search(embeddings("Hello world")).limit(1).to_arrow())[0]
found.update("This is a list of integer 10,11,12")
criteria = f"filename == \"{found.filename}\""
table.update(criteria, {"vector":found.vector})

@Sciumo

Sciumo commented Apr 18, 2024

I added the following:

import pyarrow as pa
from lancedb.util import value_to_sql

@value_to_sql.register(pa.Scalar)
def _(value: pa.Scalar) -> str:
    return value_to_sql(value.as_py())

I still get:
ValueError: Invalid user input: LanceError(IO): Only arrays of literals are supported in lance., /home/runner/work/lance/lance/rust/lance/src/io/exec/planner.rs:522:39, /home/runner/work/lance/lance/rust/lance/src/dataset/write/update.rs:126:14

@westonpace
Contributor Author

I think that, for Pydantic support, we also need to handle whatever type lancedb.pydantic.Vector produces.

@wjones127
Contributor

> I think, for pydantic support, we need to handle whatever lancedb.pydantic.Vector is

Indeed. That type is defined inside a closure, so we need to refactor that code before a conversion can be registered for it.

@Sciumo

Sciumo commented Apr 18, 2024

Bizarrely, the following version breaks:

def embeddings(data, model="mxbai-embed-large") -> List[float]:
    embed = ollama.embeddings(model, data)
    real = list(embed["embedding"])
    fake = [0.0] * len(real)
    # With this copy loop commented out (so fake stays all zeros), it works
    for i in range(len(real)):
        fake[i] = real[i]
    return fake
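With the copy loop commented out, fake stays all zeros, so the failure seems to depend on the values rather than on how the list was built. One hypothesis worth checking, not confirmed anywhere in this thread, is that some floats are rendered into SQL text in a form the parser does not treat as a bare literal, for example a negative number (which a SQL parser may read as a unary minus applied to a literal) or exponent notation. Assuming the values are formatted with Python's default str(), the hypothetical diagnostic below flags such elements so a failing vector can be compared with a working one.

```python
from typing import List, Tuple


def suspicious_literals(values: List[float]) -> List[Tuple[int, str]]:
    """Hypothetical diagnostic: return (index, text) for each element whose
    default str() form has a leading minus sign or an exponent.

    Assumes (unverified) that the vector is rendered into SQL text via str().
    """
    flagged = []
    for i, v in enumerate(values):
        text = str(v)
        if text.startswith("-") or "e" in text.lower():
            flagged.append((i, text))
    return flagged


print(suspicious_literals([0.0] * 4))                 # []
print(suspicious_literals([0.25, -0.5, 1e-05, 3.0]))  # [(1, '-0.5'), (2, '1e-05')]
```

If the working all-zero vector yields an empty list while the real embedding does not, that would support the hypothesis; if both are empty, the cause lies elsewhere.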
