Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc(js): remove duplicate legacy require("vectordb") for installation #1004

Open
wants to merge 382 commits into
base: main
Choose a base branch
from

Conversation

changhiskhan
Copy link
Contributor

No description provided.

changhiskhan and others added 30 commits December 20, 2023 15:41
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
This is a pretty direct binding to the underlying lance capability
This command hasn't been run for a while...
Modify some grammar, punctuation, and spelling errors.
For object detection, each row may correspond to an image and each image
can have multiple bounding boxes of x-y coordinates. This means that a
`bbox` field is potentially "list of list of float". This adds support
in our pydantic-pyarrow conversion for nested lists.
Closes #721 

fts will return results as a pyarrow table. Pyarrow tables has a
`filter` method but it does not take sql filter strings (only pyarrow
compute expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

Default path: If duckdb is available then use duckdb to execute the sql
filter string on the pyarrow table.
Backup path: Otherwise, write the pyarrow table to a lance dataset and
then do `to_table(filter=<filter>)`

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on ssd) to write the
resultset to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk, this makes the post filter evaluate much
quicker (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly, one possibility being that we use
Substrait to generate pyarrow compute expressions from sql string. Or if
there's enough progress on pyarrow, it could support Substrait
expressions directly (no ETA)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
API has changed significantly, namely `openai.Embedding.create` no
longer exists.
openai/openai-python#742

Update the OpenAI embedding function and put a minimum on the openai sdk
version.
issue separate requests under the hood and concatenate results
Add support for adding lists of string input (e.g., list of categorical
labels)

Follow-up items: #757 #758
Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>
I found that it was quite incoherent to have to read through the
documentation and having to search which submodule that each class
should be imported from.

For example, it is cumbersome to have to navigate to another
documentation page to find out that `EmbeddingFunctionRegistry` is from
`lancedb.embeddings`
If the input text is None, Tantivy raises an error
complaining it cannot add a NoneType. We handle this
upstream so None's are not added to the document.
If all of the indexed fields are None then we skip
this document.
westonpace and others added 4 commits February 22, 2024 19:56
This also renames the new experimental node package to lancedb. The
classic node package remains named vectordb.

The goal here is to avoid introducing piecemeal breaking changes to the
vectordb crate. Instead, once the new API is stabilized, we will
officially release the lancedb crate and deprecate the vectordb crate.
The same pattern will eventually happen with the npm package vectordb.
westonpace and others added 16 commits February 23, 2024 14:08
…984)

BREAKING CHANGE: users will now need to npm install `apache-arrow` and
`@apache-arrow/ts` themselves.
Initial work for #959. This exposes the basic functionality for each in
all of the APIs. Will add user guide documentation in a later PR.
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.

The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.

Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.
…table (#1022)

this will work after upgrading lance with
lancedb/lance#1995 merged
see #884 for details

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
upgrade to lance 0.10.1 and update doc string to reflect dynamic
projection options
#1036)

A simple base usage that install the dependencies necessary to use FTS
and Hybrid search

---------

Co-authored-by: Nat Roth <natroth@Nats-MacBook-Pro.local>
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
This will eventually replace the remote table implementations in python
and node.
…PI (#1031)

I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking
changes from sync to async python.
prrao87 and others added 4 commits March 3, 2024 15:59
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.
alexkohler pushed a commit to alexkohler/lancedb that referenced this pull request Apr 20, 2024
* quick documentation for merge method

* format and check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet