Efficient RAG retrieval system for article fragments from the Kaggle dataset available here. The system supports retrieval with popular vector stores and includes Question Answering (Q/A) functionality with Large Language Models (LLMs). By default it uses FAISS for vector-store retrieval and the Mixtral-8x7B LLM. More details can be found in the report.
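The core retrieval idea can be sketched without the full stack: embed the fragments, embed the query, and return the nearest fragments by cosine similarity (FAISS does exactly this at scale with approximate indexes). A minimal NumPy illustration with stand-in random embeddings — the fragment texts, embedding dimension, and function names here are hypothetical, not the project's actual code:

```python
import numpy as np

# Hypothetical toy corpus of article fragments (not from the real dataset).
fragments = [
    "FAISS enables fast similarity search over dense vectors.",
    "Streamlit turns Python scripts into web apps.",
    "Mixtral-8x7B is a sparse mixture-of-experts LLM.",
]

# Stand-in embeddings: in the real system these come from an embedding model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(fragments), 8))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the k fragments whose embeddings are closest (cosine) to the query."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ query_vec          # cosine similarity of unit vectors
    top = np.argsort(scores)[::-1][:k]       # indices of the k highest scores
    return [fragments[i] for i in top]

# Querying with a fragment's own embedding returns that fragment first.
print(retrieve(embeddings[1])[0])
```

In the full Q/A flow, the retrieved fragments are then passed to the LLM as context for answering the question.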
To download the dataset, choose one of two options:

- Download it manually from the link, create a folder named `data` in the project root directory, and store the `medium.csv` file in that folder.
- Use the Kaggle API: download your account token from this link and overwrite the existing `kaggle.json` file.
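Once `data/medium.csv` is in place, the articles have to be split into fragments before indexing. A generic sketch of that preprocessing using only the standard library — the `title,text` column layout, the chunk sizes, and the overlapping-window scheme are assumptions; the project's real schema and chunking parameters may differ:

```python
import csv
import io

def chunk(text: str, size: int = 60, overlap: int = 20) -> list[str]:
    """Split an article into overlapping character windows
    (a generic fragmenting scheme, not the project's exact one)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical two-row sample standing in for medium.csv;
# the dataset's real column names may differ.
sample = io.StringIO(
    "title,text\n"
    "Intro to FAISS,FAISS is a library for efficient similarity search.\n"
)

for row in csv.DictReader(sample):
    fragments = chunk(row["text"], size=30, overlap=10)
    print(row["title"], len(fragments))
```

Each resulting fragment is what gets embedded and stored in the vector index.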
This step is not obligatory, but it is necessary if you want to use the Q/A system with Large Language Model support. To obtain your HuggingFaceHub API token, generate it in your HuggingFace account, copy it, and paste it into the `.env` file, overwriting the `<YOUR_TOKEN>` placeholder.
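The `.env` file holds simple `KEY=VALUE` lines that the app reads at startup. A minimal illustration of parsing such a file — the loader function, the example file name, and the token value below are all hypothetical (the project may well use a library such as python-dotenv instead):

```python
def parse_dotenv(path: str) -> dict[str, str]:
    """Read KEY=VALUE lines from a .env-style file into a dict,
    skipping blank lines and # comments."""
    env: dict[str, str] = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Hypothetical sample file standing in for the project's .env.
with open("example.env", "w") as fh:
    fh.write("# HuggingFace credentials\n")
    fh.write("HUGGINGFACEHUB_API_TOKEN=hf_example_token\n")

print(parse_dotenv("example.env")["HUGGINGFACEHUB_API_TOKEN"])
```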
```shell
# install dependencies
pip install -r requirements.txt

# download the dataset via the Kaggle API
chmod +x kaggle.sh
./kaggle.sh

# launch the app
streamlit run app.py
```
```shell
# build and run with Docker
sudo docker build -t ars-app:latest .
sudo docker container run -it -p 8501:8501 ars-app:latest
```