
Performance Benchmarks #523

Open
do-me opened this issue Oct 19, 2023 · 4 comments

Comments

@do-me

do-me commented Oct 19, 2023

Hey folks - awesome project!

The homepage states that it's fast, so I was wondering how fast :)
Having read #428, could you share some performance benchmarks? Projects like Qdrant offer scripts to do so, e.g. here.


Background

I had been looking for a vector DB in JS for the past couple of months but couldn't quite find anything suitable. Orama seems like just the tool I was looking for!

There are plenty of people developing (static) web apps with transformers.js from @xenova (tagging you e.g. for your image search), and Orama seems like the perfect fit.

We're currently evaluating in do-me/SemanticFinder#43 whether rewriting the DB logic of SemanticFinder might be worth it. Performance would be the main reason to adopt it, but cleaner code and features like easy import/export of data dumps also play a role.

@MentalGear

Also very interested in comparable benchmarks! Btw @do-me, could you share which route you ended up taking, and whether it was a more or less smooth ride?

@do-me

do-me commented Feb 16, 2024

Yes, absolutely! Just a disclaimer: my personal view might be somewhat biased towards the usage in SemanticFinder, where I only append items to a variable, filter them, and then perform cosine distance calculations on the filtered values.

The main bottleneck in frontend applications with embeddings is usually storage and transferring data to the frontend. For the actual cosine distance calculations there is no need for an in-browser vector DB, as these are super fast anyway. To give you a practical example: 133k embeddings searched in ~70ms. Keep in mind that the query embedding calculation with transformers.js takes around 700ms, depending on your system. Definitely feel free to reuse the logic. If you create something nice, please ping me!
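
To illustrate the kind of brute-force approach described above, here is a minimal sketch of a cosine similarity search over embeddings held in a plain JS array. The item shape and function names are hypothetical, for illustration only, and not SemanticFinder's actual code.

```js
// Assumes each item looks like { text: string, embedding: number[] } and the
// query embedding comes from transformers.js (or any other encoder).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query embedding and return the top-k matches.
function search(queryEmbedding, items, topK = 10) {
  return items
    .map(item => ({ ...item, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```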

So in a nutshell: plain JSON and JS are enough to build everything. There were no pain points, and I think this way I had even more freedom to customize SemanticFinder, e.g. for hybrid search (see the sketch below). Also, I learned a lot more by building the logic myself, though of course that meant investing more time. At the moment I have no plans to integrate Orama unless someone convinces me otherwise (with some nice features, for example ;) ).
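
Hybrid search can be as simple as a keyword pre-filter followed by cosine ranking. The sketch below shows one possible flavor of this, reusing cosineSimilarity from the sketch above; it is an assumption for illustration, not necessarily how SemanticFinder implements it.

```js
// Hypothetical hybrid search: keep only items whose text contains the query
// string, then rank the survivors by cosine similarity to the query embedding.
function hybridSearch(query, queryEmbedding, items, topK = 10) {
  const keyword = query.toLowerCase();
  return items
    .filter(item => item.text.toLowerCase().includes(keyword))
    .map(item => ({ ...item, score: cosineSimilarity(queryEmbedding, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```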

However, if I were to start over again, I would definitely consider Orama, as it just works fine out of the box and is a nice component to build on!

@MentalGear

MentalGear commented Feb 16, 2024

Thanks for the quick reply & sharing your insights! SemanticFinder and the CORDIS search are quite spectacular demos!

I noticed that in the dropdown you recommend certain models over others. Regarding evaluating the quality of embeddings/retrieved results: what methodology, benchmarks, or metrics did you use to assess the best model fit for a given task? (MediaPipe's universal sentence encoder also seems quite fast. Did you try it out as well in terms of embedding retrieval quality?)

@do-me

do-me commented Feb 16, 2024

Let's maybe head over to do-me/SemanticFinder#32 so we don't go too off-topic for this issue!
