Vector Search and RAG Using MongoDB Atlas + Embedding Models + LLMs

About

This repo contains sample code showing how to build Vector Search and RAG (Retrieval-Augmented Generation) applications using the built-in Vector Search capabilities of MongoDB Atlas, embedding models, and LLMs (Large Language Models).

What is Vector Search?

vector search explained
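The core idea fits in a few lines of Python. The sketch below is a brute-force illustration, not how Atlas works internally: it scores every stored vector against the query with cosine similarity and keeps the best k. Atlas builds a vector index so it does not have to score every document. The toy 3-dimensional vectors are made up; real embedding models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Exact (brute-force) nearest neighbours: score every doc, keep the best k ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings": documents about similar topics point in similar directions.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "cars": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], docs, k=2))  # → ['cats', 'dogs']
```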

Hackathon-specific notes

2024-03-23 - Mistral AI hackathon

Labs

Setup: Set Up the Python Environment

Follow setup-python-env.md

Lab-1: Connect to MongoDB Atlas

Set up Atlas in the cloud and make sure we can connect to it.

Lab-1
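A quick way to confirm the connection works is to send MongoDB's `ping` command with PyMongo. This is a minimal sketch, not the lab's own code: the environment variable name `ATLAS_URI` and the helper names are hypothetical (check `env.sample` for the name the repo actually uses), and it assumes `pymongo` is installed.

```python
import os


def get_atlas_uri(env_var="ATLAS_URI"):
    """Read the Atlas connection string from an environment variable.

    ATLAS_URI is a hypothetical name; use whatever env.sample defines.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Set {env_var} to your Atlas connection string (mongodb+srv://...)")
    return uri


def check_connection(uri):
    """Connect with PyMongo and send 'ping'; raises if Atlas is unreachable."""
    # Imported here so get_atlas_uri() still works without pymongo installed.
    from pymongo import MongoClient

    # Fail after 5 seconds instead of PyMongo's default 30 s server selection wait.
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")
    return client
```

Calling `check_connection(get_atlas_uri())` should return a client without raising when the cluster is reachable and your IP is on the Atlas access list.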

Lab-2: Vector Search Using OpenAI Embeddings

Perform vector search on an already indexed collection. This collection is pre-populated with embeddings using an OpenAI embedding model.

lab-2
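Querying an indexed collection in Atlas is done with a `$vectorSearch` aggregation stage. The sketch below only builds the pipeline; the index name `vector_index`, the field `embedding`, and the projected `title` field are placeholders, not the names used in the lab's dataset. The query vector must come from the same OpenAI embedding model that produced the stored vectors, or the dimensions and similarity scores will not line up.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=5, num_candidates=100):
    """Build an Atlas $vectorSearch aggregation pipeline.

    index_name, path and the projected 'title' field are placeholders;
    substitute the names from your own collection and index.
    """
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,                  # field holding the stored embeddings
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered before ranking
                "limit": limit,                   # results actually returned
            }
        },
        {
            # Keep only a few fields and expose the similarity score Atlas computed.
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the pipeline is passed as-is: `collection.aggregate(build_vector_search_pipeline(query_vector))`. Raising `numCandidates` relative to `limit` generally trades speed for recall.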

Lab-3: Vector Search Using Custom Embeddings

We will populate collections with custom embeddings generated by open-source embedding models, and then query them.

lab-3
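Populating a collection with custom embeddings boils down to two steps: attach a vector to each document, and define a vector index whose dimension matches the model. The sketch below keeps the model pluggable: `embed` is any callable that maps a list of strings to a list of vectors (for example, the `encode` method of a sentence-transformers model). The helper names and the `text`/`embedding` field names are hypothetical, not taken from the lab.

```python
def add_embeddings(docs, embed, text_field="text", vector_field="embedding"):
    """Return copies of docs with an embedding vector attached to each.

    embed: callable mapping list[str] -> sequence of vectors
    (e.g. a sentence-transformers model's .encode; an assumption, not the lab's code).
    """
    texts = [d[text_field] for d in docs]
    vectors = embed(texts)
    if len(vectors) != len(docs):
        raise ValueError("embed() must return one vector per document")
    out = []
    for doc, vec in zip(docs, vectors):
        new_doc = dict(doc)  # leave the caller's documents unchanged
        # Convert to plain floats so numpy arrays are stored as BSON arrays.
        new_doc[vector_field] = [float(x) for x in vec]
        out.append(new_doc)
    return out


def vector_index_definition(num_dimensions, path="embedding", similarity="cosine"):
    """Atlas Vector Search index definition; num_dimensions must match the model."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }
```

The enriched documents can be written with `collection.insert_many(...)`, and the index definition can be pasted into the Atlas UI's JSON editor. Using a different embedding model later means re-embedding the documents and recreating the index with the new dimension.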

Sample Streamlit app

streamlit app

screencast | screenshot 1 | screenshot 2

Lab-4: RAG (Retrieval-Augmented Generation)

Index PDF files, store the index (with embeddings) in Atlas, and ask questions about the documents using LLMs.

lab-4
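Two pieces of a RAG pipeline are easy to show without any external service: splitting extracted document text into overlapping chunks before embedding, and stuffing the chunks retrieved by vector search into the LLM prompt. This is a generic sketch, not the lab's implementation: it assumes the PDF text has already been extracted, uses simple character-based chunking, and the prompt wording and function names are hypothetical.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary present in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_rag_prompt(question, retrieved_chunks):
    """Number the retrieved chunks and ask the LLM to answer from them only."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
```

In a full pipeline, each chunk would be embedded and stored in Atlas, the question embedded with the same model, the top chunks fetched with vector search, and the resulting prompt sent to the LLM.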

Dockerizing and Deploying the App

dockerize.md

Some Fun Benchmarks

Vector search results using different embedding models

Local embedding models benchmark

LLMs performance on RAG

Useful Resources

RAG Series Part 1: How to Choose the Right Embedding Model for Your Application by Apoorva Joshi

Repository: sujee/mongodb-atlas-vector-search
