The model wasn't really trained to perform region-level reasoning; it was only trained to do region-level captioning. If you look at the region-level dataset classes, they only use the REGION_QUESTIONS and REGION_GROUP_QUESTIONS prompt templates as questions for LLM training, and those are all captioning questions. If you want region-level reasoning capabilities, GLaMM might not be the best solution for you. If you don't really need segmentation masks in the output, I'd try something like Shikra, for example.
The generated results only describe the content of the region; they do not answer the specified prompt.
result: