MUICT CHATBOT

Chatbot application for MUICT, a project for ITCS498 Special Topic in Computer Science, semester 2 of 2023, at the Faculty of ICT, Mahidol University. The models are trained to know about MUICT, mostly information on 2 year courses. Because the range of training data is limited, the models might not generate relevant responses, and might even generate biased responses or incorrect information.

On this file

MUICT CHATBOT
Members of group 4
License
Directory structure
Our models
Prerequisites
First things first
Instructions (Model Inference)
Model Evaluation
Instructions (Python interpreter)
Instructions (Docker)
Deployment (Backend)
- Things to know before proceeding
- Deployment (Backend) instructions
Deployment (Frontend)
- Prerequisites for following these instructions
- Deployment (Frontend) instructions
Screenshot of Chatbot
Disclaimer

Members of group 4

NAME	ID	GITHUB	CONTACT	RESPONSIBILITY
Kittipich Aiumbhornsin	6488004	https://github.com/ngzh-luke	kittipich.aiu@student.mahidol.ac.th or contact@lukecreated.com	Backend/Frontend, deployment and documentation
Tawan Chaidee	6488011	https://github.com/tawan-chaidee	tawan.jhaidee@gmail.com	Model training, inference, evaluation and data pre-processing
Linfeng Zhang	6488168	https://github.com/Lr1zz		Data collecting

License

This project is released under the MIT license; for details, please refer to the LICENSE.md file.

Directory structure

├── .vscode                                              <- Local VS Code folder
│
├── src                                                  <- Source code folder of the project (the main package)
│   ├── __init__.py
│   ├── apis                                             <- Backend code
│   │   ├─── __init__.py
│   │   ├─── docker.py
│   │   ├─── main.py                                     <- FastAPI entry point that contains all routes
│   │   └─── model.py                                    <- Handles everything related to our models
│   ├── model                                            <- Folder containing the .ipynb files
│   │   ├─── Muict_Chatbot_inference.ipynb               <- Inference file
│   │   ├─── QA_pair_Augmentor_(Pre_processs).ipynb      <- Augmentor file
│   │   └─── Muict_Chatbot_Trainner.ipynb                <- Model training file
│   ├── ui                                               <- Frontend code
│   │   ├─── __init__.py
│   │   ├─── main.py                                     <- Defines how the frontend looks
│   │   └─── connectionHandling.py                       <- Utility file that handles all connections to the backend server
│   └── config.py                                        <- Loads the environment files
│
├── chat20240425T1532.pdf                                <- Screenshot of the application before major changes
│
├── chat20240425T1532.png                                <- Screenshot of the application before major changes
│
├── Llama2_evaluation.xlsx                               <- Llama 2 evaluation file
│
├── Mistral_evaluation.xlsx                              <- Mistral evaluation file
│
├── .example.env                                         <- Example of the environment variables to put in the .env or .dev.env file
│
├── .gitignore                                           <- Files and directories to be ignored by git
│
├── .dockerignore                                        <- Files to be ignored by Docker
│
├── .python-version                                      <- States which Python version is used for development
│
├── resources.md                                         <- Contains useful reference information
│
├── Dockerfile.ui                                        <- Dockerfile for building the image of the application's frontend
│
├── Dockerfile.api                                       <- Dockerfile for building the image of the application's backend (not used)
│
├── requirements.txt                                     <- States the project dependencies (both frontend and backend)
│
├── requirements.ui.txt                                  <- States the project's frontend dependencies
│
├── LICENSE.md                                           <- Information about the project's license
│
└── README.md                                            <- Useful information about the project and instructions (this file)

Our models

We used Llama2 and Mistral as base models and fine-tuned them to know specific information about MUICT, mostly information on 2 year courses. Below are HuggingFace links to our models:

Prerequisites

Google Colab access (to test the inference)
A machine capable of handling a heavy workload, with high-performance GPU(s)
Python 3
Docker (if you would like to run the chat UI (frontend) using Docker)
A web browser
A cloned or downloaded copy of the project

First things first

For the backend, the model is large and requires a huge amount of machine resources. The machine specifications that we tested and that worked fine (with low traffic) are listed below:
- CPU: Intel(R) Xeon(R) CPU @ 2.00GHz: 2 vCPU (1 core) with 13GB of RAM
- GPU: 1 Nvidia T4
For the frontend, you may choose to run it using Docker or directly using the Python interpreter.
- Instructions (Python interpreter)
- Instructions (Docker)

Instructions (Model Inference)

To try out our chatbot model, please go to /src/model/Muict_Chatbot_inference.ipynb and run it on Google Colab. Alternatively, you can use this link: "https://colab.research.google.com/drive/1YBzJvVwAk2Vc8Bc0c7xnMzURW3t3946p?usp=sharing"

Model Evaluation

We evaluated the performance of our two models through human evaluation by our group members, using two metrics:
- Accuracy: is the response factual or not?
- Relevance: does the response match what the user asked for?
To see the evaluation results, please use the links below:
- Llama2-Fine-tuned: Llama2_evaluation.xlsx
- Mistral-Fine-tuned: Mistral_evaluation.xlsx
- Llama2-Finetune (mirror): https://docs.google.com/spreadsheets/d/1mYbO1b3D1JQe_gcB0o4YWB937OXScfkaEYz6-hulaGE/edit?usp=sharing
- Mistral-Finetune (mirror): https://docs.google.com/spreadsheets/d/1lqzw_hh_-L_QKWGjA136RRUsKfbxYo3lUymNiG0coJ4/edit#gid=759401991
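
As a rough illustration of how the two metrics above could be summarized, the sketch below computes the share of responses that reviewers marked as accurate and as relevant. The `Rating` record, its field names, and the sample values are hypothetical; they are not taken from the evaluation spreadsheets, whose actual layout may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rating:
    """One human judgement of a single chatbot response (hypothetical layout)."""
    accurate: bool  # Accuracy: is the response factual?
    relevant: bool  # Relevance: does it match what the user asked for?


def summarize(ratings: list[Rating]) -> dict[str, float]:
    """Return the fraction of responses judged accurate and judged relevant."""
    if not ratings:
        raise ValueError("at least one rating is required")
    total = len(ratings)
    return {
        "accuracy": sum(r.accurate for r in ratings) / total,
        "relevance": sum(r.relevant for r in ratings) / total,
    }


# Made-up sample data, for illustration only.
sample = [
    Rating(accurate=True, relevant=True),
    Rating(accurate=False, relevant=True),
    Rating(accurate=True, relevant=False),
    Rating(accurate=True, relevant=True),
]
print(summarize(sample))
```

Reporting the two fractions separately keeps a response that is on topic but factually wrong from being hidden inside a single score.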

Instructions (Python interpreter)

change the working directory in the terminal, using the cd command, to where the project is saved.
create a virtual environment by running the command: python -m venv venv
activate the virtual environment (macOS) by running the command: source venv/bin/activate
activate the virtual environment (Windows) by running the command: venv\Scripts\activate
check which environment is active by running the command: pip --version
install the project dependencies by running the command: pip install -r requirements.txt
create the .env or .dev.env file and specify all of the key-value pairs; please refer to the .example.env file for details of the key-value pairs.
start the API server with the command: uvicorn src.apis.main:app
open another terminal and start the application (the browser will not open automatically) with the command: streamlit run src/ui/main.py --server.headless true
if you want the browser to open automatically, run this instead: streamlit run src/ui/main.py
check out the running application in the browser by navigating to the URL shown in the terminal.
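
The .env step above expects plain key-value pairs. The sketch below shows one way such a file could be parsed using only the standard library. It is not the project's src/config.py, and the keys `BACKEND_URL` and `MODEL_NAME` are made-up placeholders; the real keys are the ones listed in .example.env.

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"not a key-value pair: {raw_line!r}")
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical file content; the real keys are listed in .example.env.
example = """
# backend location (placeholder)
BACKEND_URL=http://127.0.0.1:8000
MODEL_NAME="example-model"
"""
print(parse_env(example))
```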

Instructions (Docker)

make sure you have specified all key-value pairs in the .env file and that the backend server is running.
change the working directory in the terminal, using the cd command, to where the project is saved.
build the Docker image using the command: docker build -f Dockerfile.ui -t chatui:v1 .
after the build succeeds, run the frontend server using the command: docker run --name chatui -it -p 8501:8501 chatui:v1
open the browser and visit the chat UI at 127.0.0.1:8501 or 0.0.0.0:8501
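
Before opening the browser, it can help to confirm that something is listening on the published port. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is made up for this example, and the host and port are the ones used in the steps above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8501 is the port published by the docker run command above.
    print("chat UI reachable:", is_port_open("127.0.0.1", 8501))
```

A successful TCP connection only shows that the port is published; it does not prove that the UI can reach the backend.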

Deployment (Backend)

In these instructions, we deploy our backend to a cloud Linux instance, using NGINX as the web server.

Things to know before proceeding

These instructions are adapted from 3 blog posts, which you can find in the resources.md file.
We run the server with Uvicorn instead of Gunicorn because of some worker problems; there may be workarounds, but to keep the process simple we use Uvicorn.
You need a Linux instance with GPU(s).
There are quite a lot of command-line operations, so you should be familiar with the terminal.
You may replace the command nano with vim, or vice versa, in the later instructions.
Miniconda should be preinstalled on the Linux instance to manage dependencies.
This GitHub repository is used as the source to deploy in these instructions.

Deployment (Backend) instructions

connect to the cloud instance via SSH using the command: ssh [your instance username]@[your instance IP]
once connected to the instance, create a new conda environment using the command: conda create -n [environment name such as 'myenv'] python=[Python version]
activate the newly created environment using the command: conda activate [your env name]
clone the Git repository (in this case, this project's) using the command: git clone https://github.com/ngzh-luke/muict-498prj-ictchat.git
change into the folder of the cloned project using the cd command
install the project dependencies from the requirements.txt file into the environment by running the command: pip install -r requirements.txt
create a new folder called logs by running the command: mkdir logs
create the .env file required by the project and fill in the necessary key-value pairs; to create and edit the file, run the command: nano .env
install the 2 necessary packages, NGINX and Supervisor, on the instance by running the command: sudo apt install supervisor nginx -y
enable and start Supervisor by running the commands: sudo systemctl enable supervisor and sudo systemctl start supervisor
create the start script file with the command: vim start_script and put in the following (the path assumes a conda environment named 'myenv' under /opt/conda; adjust it to match your installation and environment name):

      #!/bin/bash

      exec /opt/conda/envs/myenv/bin/uvicorn src.apis.main:app --host 0.0.0.0 --port 8000

make the start script executable by running the command: chmod u+x start_script
create a Supervisor configuration file by running the command: sudo vim /etc/supervisor/conf.d/ictchat.conf
put the following into the Supervisor configuration file you just created and edit every place that requires your instance username; note that you may replace 'ictchat' with another name, but you then have to replace every occurrence of 'ictchat' in the rest of these instructions

      [program:ictchat]
      command=/home/[your instance username]/muict-498prj-ictchat/start_script
      user=replace_this_with_your_instance_username_here
      autostart=true
      autorestart=true
      redirect_stderr=true
      stdout_logfile=/home/[your instance username]/muict-498prj-ictchat/logs/run.log

run sudo supervisorctl reread to reread the Supervisor configuration, and run sudo supervisorctl update to apply the changes
sudo supervisorctl status ictchat checks the app status; in this case our app name is ictchat
to restart ictchat, run the command: sudo supervisorctl restart ictchat; to start or stop ictchat, simply replace restart with start or stop
next, configure NGINX: run the command sudo vim /etc/nginx/sites-available/ictchat and fill in the following

      server{
          server_name domainOrIP; # Replace 'domainOrIP' with the IP address of your server or a domain pointing to that IP (e.g., ictchat-backend.com or www.ictchat-backend.com)
          location / {
              include proxy_params;
              proxy_pass http://127.0.0.1:8000;
          }
      }

run sudo ln -s /etc/nginx/sites-available/ictchat /etc/nginx/sites-enabled/ to enable our app's configuration by creating a symbolic link from the file in sites-available into sites-enabled
test that the NGINX configuration is valid by running the command: sudo nginx -t
restart NGINX to apply the new configuration by running the command: sudo systemctl restart nginx
open a browser and go to your instance's public IP, or a domain name that points to that IP; you will see something like the following

      {"Hello": "From MUICT CHAT"}

enabling HTTPS is not covered in these instructions; you can find out how from online instructions
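
The final browser check above can also be scripted. The sketch below fetches the root route and compares the body with the JSON shown in these instructions. The function names are made up for this example, and it assumes the root route returns exactly that JSON.

```python
import json
from urllib.request import urlopen

# Response shown in the instructions above for the root route.
EXPECTED = {"Hello": "From MUICT CHAT"}


def is_expected_greeting(body: bytes) -> bool:
    """Return True if body is JSON equal to the expected greeting."""
    try:
        return json.loads(body.decode("utf-8")) == EXPECTED
    except (UnicodeDecodeError, json.JSONDecodeError):
        # An HTML error page from NGINX, for example, is not valid JSON.
        return False


def check_backend(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch the root route of base_url and compare it with EXPECTED."""
    with urlopen(base_url, timeout=timeout) as response:
        return is_expected_greeting(response.read())


# Usage (replace the bracketed placeholder with your own address):
#   check_backend("http://[your instance IP or domain]/")
```

If NGINX is running but the app is not, `urlopen` typically raises an HTTP error (for example 502) rather than returning a body, so callers may want to catch `urllib.error.URLError` as well.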

Deployment (Frontend)

We will deploy our frontend to a cloud Linux instance using a Docker container.

Prerequisites for following these instructions

A container registry (in this case, Google Artifact Registry); you may also have to consult the official documentation for updated instructions
Google Cloud CLI
- Already logged in, with the Google Cloud project set up
The frontend Docker image
You may replace some of the commands to fit your situation
A Google Cloud VM instance (you can select a container-optimized machine image)

Deployment (Frontend) instructions

on your local machine, update the Docker config so that it can push to Artifact Registry, using the command: gcloud auth configure-docker [Google cloud Region ID such as 'us-central1']-docker.pkg.dev (note that you can also find this command in the Artifact console)
tag the Docker image to be pushed to the registry using the command: docker tag [the image you built]:[your image tag] [Google cloud Region ID such as 'us-central1']-docker.pkg.dev/[your Google Cloud project ID]/[Artifact repository name]/[image name to show in Artifact]:[image tag]
push the Docker image to Artifact Registry using the command: docker push [Google cloud Region ID such as 'us-central1']-docker.pkg.dev/[your Google Cloud project ID]/[Artifact repository name]/[image name to show in Artifact]:[image tag]
connect to the cloud instance via SSH
edit the Docker configuration by running the command: docker-credential-gcr configure-docker [Google cloud Region ID such as 'us-central1']-docker.pkg.dev
pull the image from the registry by running the command: docker pull [Google cloud Region ID such as 'us-central1']-docker.pkg.dev/[your Google Cloud project ID]/[Artifact repository name]/[image name to show in Artifact]:[image tag]
copy the image ID; to get it, list the Docker images on the VM instance by running the command: docker image ls -a
run our frontend

8.1 it is a best practice to specify a container name; to do that, add --name followed by the name, e.g. --name chatui

8.2 to run it without specifying a custom name, use the command: docker run -p 80:8501 [image ID] (note that this makes the app available over HTTP only; to also publish port 443 for HTTPS, you may add -p 443:8501, but HTTPS additionally requires TLS to be configured for the app or a reverse proxy)

8.3 to attach to the container immediately after it starts, add -it to the command

8.4 altogether, this gives us the command: docker run --name chatui -it -p 80:8501 [image ID]
open a web browser and visit your VM instance's public IP address, or a domain that points to that IP, and you will see the frontend!
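
The tag, push, and pull steps above all reuse the same long image reference. The sketch below assembles that reference from its parts, following the pattern used in the commands above. The function name and the sample values (`my-project`, `my-repo`) are placeholders for illustration, not values from this project.

```python
def artifact_image_ref(region: str, project_id: str, repository: str,
                       image: str, tag: str) -> str:
    """Build '<region>-docker.pkg.dev/<project>/<repository>/<image>:<tag>'."""
    parts = {"region": region, "project_id": project_id,
             "repository": repository, "image": image, "tag": tag}
    for name, value in parts.items():
        # Reject values that would silently produce a broken reference.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Placeholder values; substitute your own project details.
ref = artifact_image_ref("us-central1", "my-project", "my-repo", "chatui", "v1")
print(f"docker tag chatui:v1 {ref}")
print(f"docker push {ref}")
```

Building the reference once and reusing it avoids a mismatch between the tag that was pushed and the one pulled on the VM.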

Screenshot of Chatbot

This screenshot was taken before we made major changes to our models and frontend design, but it should be enough to give you an idea of what our Chatbot looks like:

PDF file
PNG image:

Disclaimer

We are not affiliated with any offers/products/services/sources listed in this project.

Last updated by Luke/Kan on May 1, 2024 @23.58
