feat(python): enable polars predicate pushdown #924

Open
wants to merge 314 commits into
base: main
from


changhiskhan
Contributor

@westonpace i tried turning on lazy frame pushdown, and the is_in filter test is failing with

polars.exceptions.ComputeError: ArrowNotImplementedError: conversion to substrait::Type from large_string

albertlockett and others added 30 commits November 3, 2023 10:24
#623

Fixes issue trying to print response status when using remote client. If
the error is not an HTTP error (e.g. dns/network failure), there won't
be a response.
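A minimal sketch of the kind of guard this fix describes, assuming the raised exception may or may not carry a `response` attribute with a `status_code`. The helper name `describe_error` is hypothetical and is not taken from the LanceDB client.

```python
def describe_error(err: Exception) -> str:
    """Build a log message without assuming an HTTP response exists."""
    # Network-level failures (DNS, refused connection) carry no response,
    # so look the attribute up defensively instead of dereferencing it.
    response = getattr(err, "response", None)
    if response is None:
        return f"request failed with no response: {err}"
    return f"request failed with HTTP status {response.status_code}: {err}"
```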
The mirrored store is not guaranteed to have all the files. Ignore the ones
that don't exist.
- added check whether a table exists in SaaS open_table
- remove prefilter not supported warning in SaaS search
- fixed issues for SaaS table_names
Colab has pydantic 1.x by default, and pydantic 1.x BaseModel objects
don't support weakref creation by default, which we rely on to cache embedding
models (see
https://github.com/lancedb/lancedb/blob/main/python/lancedb/embeddings/utils.py#L206
). `__weakref__` needs to be added to `__slots__`.
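The underlying Python behavior can be illustrated without pydantic. This is a plain-Python sketch, not the actual LanceDB fix; the class names and the `can_weakref` helper are hypothetical.

```python
import weakref


class NoWeakref:
    # __slots__ without "__weakref__": instances cannot be weakly referenced.
    __slots__ = ("name",)

    def __init__(self, name: str) -> None:
        self.name = name


class WithWeakref:
    # Listing "__weakref__" in __slots__ restores weak-reference support.
    __slots__ = ("name", "__weakref__")

    def __init__(self, name: str) -> None:
        self.name = name


def can_weakref(obj: object) -> bool:
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
        return True
    except TypeError:
        return False
```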
Please note: this is not tested as we don't have a server here and
testing against a mock object wouldn't be that interesting.
Sitemap improves SEO by ranking pages and tracking updates.
This is to enable #641.
Should be merged after lancedb/lance#1587 is
released.
Readying for the next Lance release.
In this PR, I add a guide that lets you use Roboflow Inference to
calculate CLIP embeddings for use in LanceDB. This post was reviewed by
@AyushExel.
expose prefilter flag in vectordb rust code.
enable prefiltering in node js, both native and remote
was passing this at the wrong position
raghavdixit99 and others added 25 commits January 30, 2024 23:56
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
- The JS/TS library actually expects named parameters via an object in
`.createTable()` rather than individual arguments
- Added example on how to search rows by criteria without a vector
search. TS type of `.search()` currently has the `query` parameter as
non-optional so we have to pass undefined for now.
Adds some more quick advice on how to set up AWS S3 with LanceDB.

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
Passed the following tests

```ts
const keyId = process.env.AWS_ACCESS_KEY_ID;
const secretKey = process.env.AWS_SECRET_ACCESS_KEY;
const sessionToken = process.env.AWS_SESSION_TOKEN;
const region = process.env.AWS_REGION;

const db = await lancedb.connect({
  uri: "s3://bucket/path",
  awsCredentials: {
    accessKeyId: keyId,
    secretKey: secretKey,
    sessionToken: sessionToken,
  },
  awsRegion: region,
} as lancedb.ConnectionOptions);

console.log(await db.createTable("test", [{ vector: [1, 2, 3] }]));
console.log(await db.tableNames());
console.log(await db.dropTable("test"));
```
- fix the repo link on npm
- add links for homepage and bug report
… runtime (#909)

GitHub Actions is deprecating the old node-16 runtime.
This adds the Python bindings requested in #870. The JavaScript/Rust
bindings will be added in a future PR.
Adds capability to the remote python SDK to retry requests (fixes #911)

This can be configured through environment variables:
- `LANCE_CLIENT_MAX_RETRIES`= total number of retries. Set to 0 to
disable retries. default = 3
- `LANCE_CLIENT_CONNECT_RETRIES` = number of times to retry request in
case of TCP connect failure. default = 3
- `LANCE_CLIENT_READ_RETRIES` = number of times to retry request in case
of HTTP request failure. default = 3
- `LANCE_CLIENT_RETRY_STATUSES` = http statuses for which the request
will be retried. passed as comma separated list of ints. default `500,
502, 503`
- `LANCE_CLIENT_RETRY_BACKOFF_FACTOR` = controls time between retry
requests. see
[here](https://github.com/urllib3/urllib3/blob/23f2287eb526d9384dddeedb6f6345e263bb9b86/src/urllib3/util/retry.py#L141-L146).
default = 0.25
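As a rough illustration of how these settings could be read, here is a minimal sketch that parses the variables above into a config object using the listed defaults. The `RetryConfig` class and `load_retry_config` function are hypothetical names, not the SDK's actual implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class RetryConfig:
    # Defaults mirror the values listed in the description above.
    max_retries: int = 3
    connect_retries: int = 3
    read_retries: int = 3
    retry_statuses: Tuple[int, ...] = (500, 502, 503)
    backoff_factor: float = 0.25


def load_retry_config(env: Optional[Mapping[str, str]] = None) -> RetryConfig:
    """Read retry settings from the environment (or a supplied mapping)."""
    if env is None:
        env = os.environ
    raw_statuses = env.get("LANCE_CLIENT_RETRY_STATUSES", "500, 502, 503")
    return RetryConfig(
        max_retries=int(env.get("LANCE_CLIENT_MAX_RETRIES", "3")),
        connect_retries=int(env.get("LANCE_CLIENT_CONNECT_RETRIES", "3")),
        read_retries=int(env.get("LANCE_CLIENT_READ_RETRIES", "3")),
        # Comma-separated list of ints; whitespace around items is ignored.
        retry_statuses=tuple(
            int(part) for part in raw_statuses.split(",") if part.strip()
        ),
        backoff_factor=float(env.get("LANCE_CLIENT_RETRY_BACKOFF_FACTOR", "0.25")),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the parsing easy to exercise in tests.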

Only read requests will be retried:
- list table names
- query
- describe table
- list table indices

This does not add retry capabilities for writes, because retrying a write
that isn't idempotent could cause issues. For example, if the load balancer
times out the request but the server completes it anyway, we would not want
to blindly retry an insert request.
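The read-only retry policy described above can be sketched as an allowlist check. This is an illustration only: the operation names, `RETRYABLE_OPERATIONS`, and `call_with_retries` are hypothetical, and the real SDK may well delegate retries to its HTTP library instead.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical names for the four read operations listed above.
RETRYABLE_OPERATIONS = frozenset(
    {"list_table_names", "query", "describe_table", "list_table_indices"}
)


def call_with_retries(operation: str, func: Callable[[], T], max_retries: int = 3) -> T:
    """Retry func only when the operation is a known read request."""
    # Writes get exactly one attempt, since a retried insert may be applied twice.
    attempts = max_retries + 1 if operation in RETRYABLE_OPERATIONS else 1
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
    assert last_error is not None
    raise last_error
```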
<img width="837" alt="Screenshot 2024-02-01 at 4 23 34 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/4f0f5c5a-2a24-4b00-aad1-ef80a593d964">
<img width="838" alt="Screenshot 2024-02-01 at 4 26 03 PM"
src="https://github.com/lancedb/lancedb/assets/1305083/ca073bc8-b518-4be3-811d-8a7184416f07">

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
@westonpace
Contributor

I'll take a look this week.

@changhiskhan changed the title from "feat(python): enable polars predict pushdown" to "feat(python): enable polars predicate pushdown" on Feb 17, 2024
alexkohler pushed a commit to alexkohler/lancedb that referenced this pull request Apr 20, 2024