Question: OCR scoring in the data pipeline #377

Closed
Puiching-Memory opened this issue May 7, 2024 · 1 comment

Comments

@Puiching-Memory

Your report says that OCR is used to detect text in the images, and that scenes with too much text are then removed.

Why does too much text hurt the model's generation?

If it does, does that mean it will be difficult to improve the model on text-heavy content, such as newspapers, streets with billboards, and the various signs and markings on roads?

@zhengzangw
Collaborator

We follow SVD's (Stable Video Diffusion) data pipeline. If a video contains a lot of text, it is hard to generate, because the captioning model cannot capture that text.
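To make the filtering idea concrete, here is a minimal sketch of dropping clips by text coverage. It is only an illustration, not the project's actual code: the names `text_area_ratio` and `keep_clip`, the box format, and the `max_ratio` threshold of 0.1 are all assumptions, and the boxes are assumed to come from whatever OCR text detector is used.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; assumed output format of an OCR detector.
Box = Tuple[float, float, float, float]


def text_area_ratio(boxes: List[Box], width: int, height: int) -> float:
    """Fraction of the frame covered by text boxes (overlaps are counted twice, result capped at 1)."""
    total = 0.0
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the frame before measuring its area.
        x0, x1 = max(0.0, x0), min(float(width), x1)
        y0, y1 = max(0.0, y0), min(float(height), y1)
        total += max(0.0, x1 - x0) * max(0.0, y1 - y0)
    return min(1.0, total / (width * height))


def keep_clip(frames_boxes: List[List[Box]], width: int, height: int,
              max_ratio: float = 0.1) -> bool:
    """Keep a clip only if its most text-heavy sampled frame stays at or below max_ratio.

    max_ratio is a hypothetical threshold chosen for illustration.
    """
    worst = max((text_area_ratio(b, width, height) for b in frames_boxes), default=0.0)
    return worst <= max_ratio
```

Using the worst sampled frame rather than the average is a design choice in this sketch: a single frame full of text is enough to make the caption incomplete.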

In the future, we plan to use an OCR model to generate additional captions, so that the model becomes capable of generating text.
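As a rough sketch of what such caption augmentation might look like (this is an assumption about one possible approach, not a planned implementation), recognized strings could be appended to the existing caption. The function name `caption_with_ocr`, the "Visible text:" phrasing, and the `max_items` limit are hypothetical.

```python
from typing import Iterable


def caption_with_ocr(caption: str, ocr_texts: Iterable[str], max_items: int = 5) -> str:
    """Append up to max_items de-duplicated OCR strings to a caption."""
    seen = set()
    kept = []
    for text in ocr_texts:
        if len(kept) >= max_items:
            break
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        key = cleaned.lower()             # de-duplicate case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            kept.append(cleaned)
    if not kept:
        return caption  # nothing recognized: leave the caption unchanged
    quoted = ", ".join(f'"{t}"' for t in kept)
    return f"{caption.rstrip('. ')}. Visible text: {quoted}."


print(caption_with_ocr("A street at night.", ["OPEN", "open", " Joe's  Diner "]))
```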
