PGVector Support for Custom Connection Object #2566
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
10493810 | Triggered | Generic Password | 8d19f65 | test/agentchat/contrib/vectordb/test_pgvectordb.py | View secret |
10493810 | Triggered | Generic Password | 4b7ba2b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 4b7ba2b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 4b7ba2b | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | fdbc3d5 | test/agentchat/contrib/vectordb/test_pgvectordb.py | View secret |
10493810 | Triggered | Generic Password | 6e91d73 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 10e2c2e | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | 10e2c2e | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely, following best practices for secret management.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.
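As one illustration of storing secrets outside the code, a test or notebook could read the database password from the environment instead of embedding it. This is only a minimal sketch: the variable name `PGVECTOR_PASSWORD` and the helper `get_db_password` are hypothetical and do not come from this PR.

```python
import os


def get_db_password(env_var: str = "PGVECTOR_PASSWORD") -> str:
    """Return the password stored in env_var, failing loudly if it is unset.

    The default variable name is a placeholder chosen for this sketch.
    """
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f"Set {env_var} before running; no password is hardcoded.")
    return password
```

In CI, the same variable would typically be populated from the platform's secret store, so the value never appears in a commit.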
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
* Added fixes and tests for basic auth format
* User can provide their own connection object. Added test for it.
* Updated instructions on how to use. Fully tested all 3 authentication methods successfully.
* Get password from gitlab secrets.
* Hide passwords.
* Update notebook/agentchat_pgvector_RetrieveChat.ipynb Co-authored-by: Li Jiang <bnujli@gmail.com>
* Hide passwords.
* Added connection_string test. 3 tests total for auth.
* Fixed quotes on db config params. No other changes found.
* Ran notebook
* Ran pre-commits and updated setup to include psycopg[binary] for windows and mac.
* Corrected list extension.
* Separate connection establishment function. Testing pending.
* Fixed pgvectordb auth
* Update agentchat_pgvector_RetrieveChat.ipynb Added autocommit=True in example
* Rerun notebook
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Hi @Knucklessg1, thanks for this awesome added feature! Not sure if this is the right place to ask this question, but I would appreciate any help with it. Is chunk token size used to split docs when pgvector is the vector database? I can't find the code where it splits based on chunk token size (the normal usage for local files); instead it seems to use the model's max tokens by default for each doc (link), which would mean that the full local docs/files are added to the vectordb and fed into the context directly based on vector distance.
@chenyanbiao did you take a look at retrieve_utils.py? This is where the splitting logic lives. Docs are split the same way regardless of the vectordb backend.
@Knucklessg1 Thanks for the response. Yes, it is what I am looking at. I understand that both ways use the same logic of splitting. My confusion is that the non-vectordb solution parse the parameter of |
@thinkall do you have any thoughts around this? |
Why are these changes needed?
This PR adds support for custom psycopg connection objects.
A user can define the connection object and pass it into the db_config for the retrieve agent.
This is important because a connection may need to be heavily customized for certain environments, so the end user should be able to specify the connection object for their own environment.
This PR also includes:
- A fix to .gitattributes so that certain files are committed with LF line endings instead of CRLF. (The CRLF endings were breaking bash scripts in the repo.)
- A fix so that the psycopg[binary] dependency is installed on Windows and Mac; Linux can use the pure-Python psycopg implementation.
- A fix for the psycopg.connect() call using the username field directly.
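The three authentication styles mentioned in this PR (custom connection object, connection string, and basic auth fields) could be selected along the lines of the sketch below. It is not the code from this PR: the helpers `build_db_config` and `open_custom_connection`, the key names `conn` and `connection_string`, and the basic-auth field names are assumptions made for illustration. Only `psycopg.connect(..., autocommit=True)` is an actual psycopg 3 call.

```python
from typing import Any, Optional

# Assumed field names for the basic-auth style; not confirmed by the PR text.
BASIC_AUTH_FIELDS = ("host", "port", "dbname", "username", "password")


def build_db_config(conn: Optional[Any] = None,
                    connection_string: Optional[str] = None,
                    **basic_auth: Any) -> dict:
    """Return a db_config dict that uses exactly one of three auth styles.

    Precedence (an assumption of this sketch): connection object first,
    then connection string, then individual basic-auth fields.
    """
    if conn is not None:
        # The caller built the connection themselves (custom SSL, proxies, ...).
        return {"conn": conn}
    if connection_string:
        return {"connection_string": connection_string}
    missing = [f for f in BASIC_AUTH_FIELDS if basic_auth.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing basic-auth fields: {', '.join(missing)}")
    return {f: basic_auth[f] for f in BASIC_AUTH_FIELDS}


def open_custom_connection(dsn: str):
    """Open a psycopg (v3) connection with autocommit enabled."""
    # Imported lazily so build_db_config works even where psycopg is absent.
    import psycopg
    return psycopg.connect(dsn, autocommit=True)
```

A caller with special connection requirements would build the connection first, for example `build_db_config(conn=open_custom_connection(dsn))`, and hand the resulting dict to the retrieve agent as its db_config.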
Related issue number
NA
Checks