
VizWiz VQA Challenge: Term Project for Multimodal Machine Learning (11-777) @ CMU

Running Instructions

  1. Download data:

Download skill data:

cd data/skill
bash download_data.sh

Download VQA data:

cd data/VQA
bash download_data.sh
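
After both scripts finish, a quick sanity check confirms the data directories were populated. This is a minimal sketch, assuming only the data/skill and data/VQA layout shown above; the individual files fetched by download_data.sh are not verified.

import sys
from pathlib import Path

# Hedged check: only the two directory names come from the instructions
# above; everything else here is illustrative.
for sub in ("data/skill", "data/VQA"):
    p = Path(sub)
    n_files = sum(1 for f in p.rglob("*") if f.is_file()) if p.is_dir() else 0
    print(f"{sub}: {n_files} file(s)")
    if n_files == 0:
        sys.exit(f"{sub} is missing or empty; re-run its download_data.sh")
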
  2. Run model (SkillCLIP) variants (a rough sketch of the fusion follows the commands below):

With everything (skill embeddings, object tags, and scene text):

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip

Without skill embeddings:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_unaware_clip

Without object tags:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip_nobj -nobj

Without scene text:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip_nsctxt -nsctxt

With multi-task training:

python -m src.main_model.clip_multitasking -t -de "cuda:0" -exp skill_aware_clip_multitasking -pred_file pred.json
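
The variants above toggle which signals are fused with the CLIP features. As a rough illustration of the late-fusion idea, here is a minimal sketch, assuming CLIP image and question features are concatenated with a skill embedding and CLIP-encoded auxiliary text (object tags / scene text) before an MLP classifier. The module name, dimensions, and design below are assumptions for illustration, not the actual architecture in src/main_model/clip_late_fusion.py.

import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    # Illustrative sketch only: fuses frozen-CLIP image/question features
    # with a skill embedding and an auxiliary text feature (object tags or
    # scene text), then classifies over a fixed answer vocabulary.
    # All sizes below are assumed.
    def __init__(self, clip_dim=512, skill_dim=64, aux_dim=512, num_answers=5000):
        super().__init__()
        fused_dim = 2 * clip_dim + skill_dim + aux_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 1024),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(1024, num_answers),
        )

    def forward(self, img_feat, q_feat, skill_emb, aux_feat):
        # Late fusion: concatenate per-modality features, then classify.
        x = torch.cat([img_feat, q_feat, skill_emb, aux_feat], dim=-1)
        return self.classifier(x)  # answer logits

In this picture, the -nobj and -nsctxt flags correspond to dropping the object-tag or scene-text part of aux_feat, and the skill-unaware variant drops skill_emb.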

Interesting Object Detections

Keys of a keyboard are detected as microwaves with relatively high confidence scores:

  1. Path: val_objects_detected/VizWiz_val_00001474_objects.png
     Potential reason: the image is very zoomed in, which likely makes it atypical of the detector's training distribution.
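
For reference, a detection like the one above can be reproduced with any COCO-pretrained detector ("microwave" is a COCO category). The repository does not state which detector produced val_objects_detected/, so the following is a hedged sketch using torchvision's Faster R-CNN; the input file name is inferred from the annotated output path and may differ.

import torch
from PIL import Image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

# Assumed source image for the annotated output above.
img = to_tensor(Image.open("VizWiz_val_00001474.jpg").convert("RGB"))
with torch.no_grad():
    out = model([img])[0]

for label, score in zip(out["labels"], out["scores"]):
    if score > 0.5:  # confidence threshold, tune as needed
        print(categories[int(label)], round(float(score), 3))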

Illustrative Examples

Here are some illustrative examples from our error analysis. FusionCLIP refers to the SkillCLIP model without the skill embeddings.

[Table, rows 1-5: comparison between our model (SkillCLIP) and FusionCLIP]

Some more examples:

[Qualitative example 1] [Qualitative example 2]