How should I use blip2 for vqa task training? #688
Apologies, my answer won't be the most in-depth. I haven't seen any of the devs responding to questions like this, probably for good reason, since they're busy people, so I will try my best to assist. Yes, there is only code for caption-task training, but the training config contains a task designation, which is set to image captioning. If you have the time and patience to spend on it, you can check out the code for the VQA dataset loader to determine the expected input. My suggestion: switch the model to flan_t5xl, since that's what the demos use for VQA, then determine the changes needed to make FlanT5 work. In the tasks you can find VQA; I checked and compared some eval scripts against the available tasks, and I am actually doing VQA training on my blip2_flant5xl model at this very moment.

I can try to go more in depth, but I am not a software or deep-learning engineer by trade (I have been a deep-learning hobbyist for the last four years), so it took me a couple of days of reverse engineering, testing, and evaluating output to determine what I needed to fix to get this working. Apologies, I don't fully remember all of the iterations and changes I had to make. That said, it is definitely worth the effort: for some reason most models like this do not use ViT-g/14 from EVA-CLIP, but BLIP-2 does, and if my memory is correct that is the highest-performing CLIP variant among models of a similar size, so it has heightened visual capability compared to many models.
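To make the steps above a bit more concrete, here is a rough sketch of what the config changes might look like, modeled loosely on the repo's caption-training configs. This is not an official config: the dataset name, processor names, file layout, and hyperparameter values are assumptions from my own experiments, so check each field against the actual VQA dataset builder and the task registry before using it.

```yaml
# Hypothetical VQA fine-tuning config sketch (names are assumptions, verify
# them against the dataset builders and registered tasks in the repo).
model:
  arch: blip2_t5              # switch from the caption model to the Flan-T5 variant
  model_type: pretrain_flant5xl
  load_finetuned: False
  freeze_vit: True            # BLIP-2 keeps the EVA-CLIP ViT-g/14 frozen

datasets:
  coco_vqa:                   # must match a registered VQA dataset builder
    vis_processor:
      train:
        name: "blip_image_train"
        image_size: 224
    text_processor:
      train:
        name: "blip_question"

run:
  task: vqa                   # this is the field that was set to image captioning
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  batch_size_train: 16
  num_workers: 4
  max_epoch: 10
  seed: 42
  output_dir: "output/BLIP2/VQA"
```

With a config along these lines, training would be launched the same way as the caption task, pointing the training script's `--cfg-path` at this file.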
Hi, thank you. I am now trying to finish it.
Hello, have you completed this project yet? I would like to ask some questions. Could you share your contact information? Thank you very much.
Hello, thank you very much for open-sourcing this. I found that for the BLIP-2 model there is only code for caption-task training, and no code for the VQA task. What should I do?