Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): add Tensor pydantic type #985

Draft
wants to merge 347 commits into
base: main
Choose a base branch
from

Conversation

changhiskhan
Copy link
Contributor

@changhiskhan changhiskhan commented Feb 17, 2024

  • Can be used to declare data model
  • Can be used to ingest data

closes #958

QianZhu and others added 30 commits December 7, 2023 14:47
We had some build issues with npm publish for cross-compiling arm64
macos on an x86 macos runner. Switching to m1 runner for now until
someone has time to deal with the feature flags.

follow-up tracked here: #688
adding some badges
added a gif to readme for the vectordb repo

---------

Co-authored-by: kaushal07wick <kaushalc6@gmail.com>
- Register open_table as event 
- Because we're dropping 'seach' event currently, changed the name to
'search_table' and introduced throttling
- Throttled events will be counted once per time batch so that the user
is registered but event count doesn't go up by a lot
Co-authored-by: Will Jones <willjones127@gmail.com>
…rch (#693)

Note this currently the filter/where is only implemented for LocalTable
so that it requires an explicit cast to "enable" (see new unit test).
The alternative is to add it to the Table interface, but since it's not
available on RemoteTable this may cause some user experience issues.
Most recent release failed because `release` depends on `node-macos`,
but we renamed `node-macos` to `node-macos-{x86,arm64}`. This fixes that
by consolidating them back to a single `node-macos` job, which also has
the side effect of making the file shorter.
pass vector column name to remote as well.

`vector_column` is already part of `Query` just declearing it as part to
`remote.VectorQuery` as well
Closes lancedb/lance#1738

We add a `flatten` parameter to the signature of `to_pandas`. By default
this is None and does nothing.
If set to True or -1, then LanceDB will flatten structs before
converting to a pandas dataframe. All nested structs are also flattened.
If set to any positive integer, then LanceDB will flatten structs up to
the specified level of nesting.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
westonpace and others added 29 commits February 8, 2024 09:40
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
1. improved error msg for SaaS create_table and create_index

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
remote SDK tests were completed through lancedb_integtest
…964)

- Rename safe_import -> attempt_import_or_raise (closes
#923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
- Fixed typos and added some clarity to the hybrid search docs
- Changed "Airbnb" case to be as per the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized", and the text in the document aligns with this
- Fixed headers in nav bar
This PR also reworks the table creation utilities significantly so that
they are more consistent, built on top of each other, and thoroughly
documented.
When we turned on fat LTO builds, we made the release build job **much**
more compute and memory intensive. The ARM runners have particularly low
memory per core, which makes them susceptible to OOM errors. To avoid
issues, I have enabled memory swap on ARM and bumped the side of the
runner.
Using datasets is preferred way to allow filter and projection pushdown,
as well as aggregated larger-than-memory tables.
We depend on C static runtime, but not all Windows machines have that.
So might be worth statically linking it.

reorproject/reor#36 (comment)
Currently if a batch request is given to the remote API, each query is
sent sequentially. We should allow the user to specify a threadpool.
- [x] Can be used to declare data model
- [ ] Can be used to ingest data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: add Tensor type to pydantic module