New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contribute the WordLift Vector Store #13028
base: main
Are you sure you want to change the base?
Conversation
…wordlift-vector-store
…2107-add-support-for-the-wordlift-vector-store
…' of github.com:wordlift/llama_index into feature/12107-add-support-for-the-wordlift-vector-store
…2107-add-support-for-the-wordlift-vector-store
…-wordlift-vector-store Feature/12107 add support for the wordlift vector store
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@ziodave Seems like the tests aren't working. Have you tried running them locally? |
@logan-markewich yes, sorry I converted the PR to draft until we fix it. May I ask, we recreated the ubuntu-latest-unit-tester in our organization GH Runners so that we can have the GH Actions run on our fork, https://github.com/wordlift/llama_index/actions/runs/8800952266. Oddly enough the tests pass there, is there a special configuration we need to apply to the GH Runner? This is basically the configuration we did: |
@ziodave I don't think any special config is needed. The errors in the test seem unrelated to env though
|
@logan-markewich we're on it, we'll update the PR soon. Thanks! |
@logan-markewich we're ready for review 🙏 |
llama-index-integrations/vector_stores/llama-index-vector-stores-wordlift/README.md
Outdated
Show resolved
Hide resolved
def __init__(self, key: str): | ||
self.key = key | ||
|
||
async def for_add(self, nodes: List[BaseNode]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why these methods have to be async? Or what this is even for actually (they all return the same thing 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its async because the API key is stored on a different server and accessed by API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if I'm understanding properly, I can't use just any embedding model with wordlift, it has to be a specific one.
I'm unfamiliar with wordlift, but what's the reason for that restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary, we removed any reference to the use of a specific embedding model.
node_id=node.node_id, | ||
embeddings=node.get_embedding(), | ||
text=node.get_content(metadata_mode=MetadataMode.NONE) or "", | ||
metadata={}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to not store the node metadata here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have any metadata to store.
for node in nodes: | ||
node_dict = node.dict() | ||
metadata: Dict[str, Any] = node_dict.get("metadata", {}) | ||
entity_id = metadata.get("entity_id", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is an entity id required? What happens if it's not provided?
(I really should go read more about wordlift haha)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wordlift Knowledge Graphs are built on the principles of fully Linked Data, where each entity is assigned a permanent dereferentiable URI. When adding nodes to an existing Knowledge Graph, it's essential to include an "entity_id" in the metadata of each loaded document. In the future we may provide an endpoint that generates the ID
from manager_client.rest import RESTResponseType | ||
|
||
|
||
class VectorSearchQueriesApi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans to publish this as its own package? Like a client SDK? Its fine if it lives here for now though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we’ll publish the API as its own package in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good to me, just worried about a few UX things that might trip up users. I wonder if we can smooth out some of this or not?
Hello @logan-markewich yes, we'll review your comments and be back soon with updates and answers. |
Description
Add support for the WordLift Vector Store.
Fixes #12107
New Package?
Did I fill in the
tool.llamahub
section in thepyproject.toml
and provide a detailed README.md for my new integration or package?Version Bump?
Did I bump the version in the
pyproject.toml
file of the package I am updating? (Except for thellama-index-core
package)Type of Change
How Has This Been Tested?
Suggested Checklist:
make format; make lint
to appease the lint gods