Structured Triplet Learning with Pos-tag Guided Attention for Visual Question Answering

This is the code for "Structured Triplet Learning with Pos-tag Guided Attention for Visual Question Answering", WACV 2018 (Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes). The good practices in this VQA system, such as pos-tag guided attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task. A minimal sketch of the pos-tag guided attention idea is given below.
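To give a rough feel for pos-tag guided attention: the idea is to bias attention over question words with their part-of-speech tags, so that content words such as nouns receive more weight than function words. The sketch below is purely illustrative, not the code in this repository; the tag-to-weight table and all names are assumptions.

```python
import numpy as np
import spacy  # listed under Prerequisites; requires an English model, e.g. en_core_web_sm

nlp = spacy.load("en_core_web_sm")

# Illustrative prior weights per coarse POS tag (not the paper's learned values).
POS_PRIOR = {"NOUN": 2.0, "PROPN": 2.0, "VERB": 1.5, "ADJ": 1.5}

def pos_guided_attention(question, word_scores):
    """Re-weight per-word attention logits with a POS-tag prior.

    question    -- the question string
    word_scores -- np.array of raw attention logits, one per token
    """
    tokens = nlp(question)
    prior = np.array([POS_PRIOR.get(t.pos_, 1.0) for t in tokens])
    logits = word_scores[: len(tokens)] + np.log(prior)  # additive bias toward content words
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()  # softmax over question words

# Example: "color" and "car" end up with higher weight than "what"/"is"/"the"
print(pos_guided_attention("what is the color of the car", np.zeros(7)))
```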

If you find the code useful, please cite the paper:

Zhe Wang, Xiaoyi Liu, Liangjian Chen, Limin Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes. "Structured Triplet Learning with Pos-tag Guided Attention for Visual Question Answering." WACV 2018.

If you have feedback for the code, please contact:

buptwangzhe2012 at gmail dot com

Performance

Below is a step-by-step verification of the effectiveness of each component of our method. Note that, to speed up verification, we use 7×7 image features instead of 14×14 features. (Illustrative sketches of the Conv N-Gram and triplet-loss ideas follow the tables.)

| Method | V7W | VQA validation |
| --- | --- | --- |
| Our baseline | 65.6 | 58.3 |
| +POS-tag guided attention (POS-Att) | 66.3 | 58.7 |
| +Convolutional N-Gram (Conv N-Gram) | 66.2 | 59.3 |
| +POS-Att +Conv N-Gram | 66.6 | 59.5 |
| +POS-Att +Conv N-Gram +Triplet attention-Q | 66.8 | 60.1 |
| +POS-Att +Conv N-Gram +Triplet attention-A | 67.0 | 60.1 |
| +POS-Att +Conv N-Gram +Triplet attention-Q+A | 67.3 | 60.2 |
| +POS-Att +Conv N-Gram +Triplet attention-Q+A +Structured learning of triplets | 67.5 | 60.3 |
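The Convolutional N-Gram rows refer to smoothing each question word's feature with its local n-gram context. Below is a minimal sketch using uniform, unlearned windows, meant only to illustrate the idea; the paper uses learned convolution filters, and all names here are assumptions.

```python
import numpy as np

def conv_ngram(word_vectors, kernel_sizes=(1, 2, 3)):
    """Blend each word's embedding with its n-gram context (illustrative).

    word_vectors -- (T, d) array of per-word embeddings
    Averages uniform n-gram windows, one output per kernel size.
    """
    T, _ = word_vectors.shape
    outputs = []
    for k in kernel_sizes:
        out = np.zeros_like(word_vectors)
        for t in range(T):
            lo = max(0, t - k + 1)            # window over the previous k words
            out[t] = word_vectors[lo : t + 1].mean(axis=0)
        outputs.append(out)
    return np.mean(outputs, axis=0)

# Example: 7 words with 300-dim embeddings
print(conv_ngram(np.random.rand(7, 300)).shape)  # (7, 300)
```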

Our full model performance

| Method | V7W Telling | VQA Test-Standard | VQA Test-Dev | VQA Test-Dev Y/N | VQA Test-Dev Num | VQA Test-Dev Other |
| --- | --- | --- | --- | --- | --- | --- |
| Ours | 68.2 | 69.6 | 69.7 | 81.9 | 44.3 | 64.7 |
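For intuition about the triplet terms above: a correct (image, question, answer) triplet should score higher than triplets formed with incorrect candidate answers, by at least a margin. The sketch below shows only the generic margin-based form; the paper's exact loss, negative sampling, and margin may differ, and the margin value here is an assumption.

```python
import numpy as np

def triplet_hinge_loss(pos_score, neg_scores, margin=0.5):
    """Margin-based hinge loss over answer triplets.

    pos_score  -- scalar score of the correct (image, question, answer) triplet
    neg_scores -- np.array of scores for triplets with wrong candidate answers
    margin     -- illustrative value, not the one used in the paper
    """
    # Penalize every negative that comes within `margin` of the positive.
    return np.maximum(0.0, margin + neg_scores - pos_score).sum()

# Example: the hard negative at 3.9 violates the margin against the positive at 4.0
print(triplet_hinge_loss(4.0, np.array([1.2, 3.9, 0.5])))  # 0.4
```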

Prerequisites

tensorflow, torch (Lua Torch, used for image feature extraction), pandas, h5py, ipdb, cv2 (OpenCV), pdb, spacy, sklearn (scikit-learn), matplotlib, PIL, nltk

Quick Demo

Download the V7W telling features shared at https://drive.google.com/open?id=1Hofquxw22j8soyjE0vuZqxcNuvJd-e9V and run:

CUDA_VISIBLE_DEVICES=0 python v7w.py

Data pre-processing

Download Visual7W from http://web.stanford.edu/~yukez/visual7w/ and the GloVe word vectors from http://nlp.stanford.edu/data/wordvecs/glove.6B.zip (see https://github.com/stanfordnlp/GloVe). Also download the pre-trained ResNet-200 model from https://d2j0dndfm35trm.cloudfront.net/resnet-200.t7. Then run the following (a sketch of parsing the GloVe file comes after the commands):

python data_preprocessing_7w.py --data_set telling

python prepro_7w.py

th prepro_img_residule.lua
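For reference, the glove.6B files are plain text with one word followed by its vector on each line. A minimal, illustrative loader (not part of prepro_7w.py; the file name depends on which dimensionality you unzip) might look like:

```python
import numpy as np

def load_glove(path="glove.6B.300d.txt"):
    """Parse the GloVe text format: `word v1 v2 ... vd`, one entry per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove()
print(glove["car"].shape)  # (300,) for the 300-dimensional vectors
```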

Visualization

Architecture:

(architecture figure)

Good Practice:

python comparisons_wacv.py

(good-practice comparison figure)

Good Samples:

python draw_heat_new.py

(good-sample heatmaps)

Bad Samples:

(bad-sample heatmaps)
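As a rough idea of how attention heatmaps like those produced by draw_heat_new.py can be rendered (a minimal sketch with illustrative names, not the repository's code): upsample the coarse 7×7 attention map to the image size and draw it translucently over the image.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def overlay_attention(image_path, attention, out_path="heatmap.png"):
    """Overlay a coarse attention map (e.g. 7x7) on the input image."""
    img = Image.open(image_path).convert("RGB")
    # Upsample the attention map to the image size with bilinear interpolation.
    att = Image.fromarray((attention / attention.max() * 255).astype(np.uint8))
    att = att.resize(img.size, Image.BILINEAR)
    plt.imshow(img)
    plt.imshow(np.asarray(att), cmap="jet", alpha=0.4)  # translucent heatmap
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight")

# Example with a random 7x7 map; real maps come from the attention module.
overlay_attention("example.jpg", np.random.rand(7, 7))
```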

License

MIT
