Performance Benchmarks #523
Comments
Also very interested in comparable benchmarks! Btw @do-me, could you share what route you ended up going, and if it was more or less a smooth ride?
Yes absolutely! Just a disclaimer: my personal view might be somewhat biased towards the usage in SemanticFinder, where I only append items to a variable, filter them and then perform cosine distance calculations based on the filtered values.

The main bottleneck in frontend applications with embeddings is usually storage and transferring data to the frontend. For the actual cosine distance calculations there is no need for an in-browser vector DB, as these are super fast anyway. To give you a practical example: 133k embeddings searched in ~70ms. Keep in mind that the query embedding calculation with transformers.js takes around 700ms, depending on your system. Definitely feel free to reuse the logic. If you create something nice, please ping me!

So in a nutshell: plain JSON and JS are enough to build everything. There were no pain points, and I think this way I had even more freedom to customize SemanticFinder, e.g. for hybrid search. Also, I learned so much more by building the logic myself, but of course that meant investing more time.

At the moment I have no plans to integrate Orama unless someone convinces me otherwise (with some nice features for example ;) ). However, if I were to start over again, I would definitely consider Orama, as it just works fine out of the box and is a nice component to build on!
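For readers who want to picture the "append, filter, then cosine distance" approach described above, here is a minimal TypeScript sketch. It is not SemanticFinder's actual code: the `Item` shape, the `search` helper, and the toy two-dimensional embeddings are all assumptions made for illustration, and real embeddings would come from a model such as those run with transformers.js.

```typescript
// Hypothetical record shape; the real data layout in SemanticFinder may differ.
interface Item {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force search: filter first, then score every remaining item
// against the query and keep the topK highest-scoring results.
function search(
  items: Item[],
  query: number[],
  keep: (item: Item) => boolean,
  topK: number
): { item: Item; score: number }[] {
  return items
    .filter(keep)
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy data with made-up 2D embeddings, purely for demonstration.
const items: Item[] = [
  { text: "apple", embedding: [1, 0] },
  { text: "banana", embedding: [0, 1] },
  { text: "apricot", embedding: [0.9, 0.1] },
];

const results = search(items, [1, 0], (it) => it.text.startsWith("a"), 2);
console.log(results.map((r) => r.item.text));
```

A plain linear scan like this has no index to build or persist, which fits the point above that storage and transfer, not the distance computation, tend to dominate in the browser.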
Thanks for the quick reply & sharing your insights! SemanticFinder and the CORDIS search are quite spectacular demos! I noticed that in the dropdown you recommended certain models over others. Regarding evaluating the quality of embeddings/retrieved results: what methodology/benchmarks/metrics did you use to assess the best model fit for a given task? (Mediapipe's
Let's maybe head over to do-me/SemanticFinder#32 so that we don't drift too far off-topic for this issue!
Hey folks - awesome project!
The homepage states that it's fast so I was wondering how fast :)
Having read #428, I was wondering whether you could share some performance benchmarks. Projects like Qdrant offer some scripts to do so, e.g. here.
Background
I was looking for a vector DB in JS for the last couple of months but couldn't quite find anything suitable. Orama seems just like the tool I was looking for!
There are plenty of people developing (static) web apps with transformers.js from @xenova (tagging you e.g. for your image search) and Orama seems like the perfect fit.
We're currently evaluating in do-me/SemanticFinder#43 whether rewriting the DB logic of SemanticFinder might be worth it. Performance would be the main reason to adopt it, but so would cleaner code and features like easy import/export of data dumps.