Question: OCR scoring in the data pipeline #377

Puiching-Memory · 2024-05-07T10:39:54Z

In your report, you use OCR to detect text in the images and then discard scenes that contain too much text.

I would like to know why too much text hurts the model's generation.

If it does, does that mean it will be difficult to improve the model at generating text-heavy content, such as newspapers, streets with billboards, and the signs and markings on road lanes?

zhengzangw · 2024-05-09T09:07:54Z

We follow SVD's pipeline. If a video contains a lot of text, it is hard to generate, because the captioning model cannot capture that text in its captions.
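For readers wondering what such a filter might look like, here is a minimal sketch, not the project's actual code. It assumes some OCR detector has already returned text bounding boxes for a frame; `TextBox`, `text_area_ratio`, `keep_clip`, and the `max_ratio` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Axis-aligned box around detected text, in pixels (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def text_area_ratio(boxes: List[TextBox], frame_w: int, frame_h: int) -> float:
    """Fraction of the frame covered by text boxes.

    Overlapping boxes are counted twice, so the result is an upper bound
    that is clamped to 1.0.
    """
    if frame_w <= 0 or frame_h <= 0:
        raise ValueError("frame dimensions must be positive")
    area = sum(max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0) for b in boxes)
    return min(1.0, area / (frame_w * frame_h))


def keep_clip(boxes: List[TextBox], frame_w: int, frame_h: int,
              max_ratio: float = 0.1) -> bool:
    """Keep the clip only if text covers at most max_ratio of the frame.

    The default of 0.1 is an arbitrary illustrative threshold.
    """
    return text_area_ratio(boxes, frame_w, frame_h) <= max_ratio
```

A clip with one small sign would pass this check, while a full-screen newspaper page would be dropped. The threshold would need tuning on real data.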

In the future, we plan to use an OCR model to produce additional captions, so that the model becomes able to generate text.
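One way the OCR-augmented captions could work is sketched below. This is only a guess at the idea, not a planned implementation: it assumes the OCR step yields a list of recognized strings, and `augment_caption`, `max_items`, and the "Visible text:" suffix format are hypothetical.

```python
from typing import List


def augment_caption(caption: str, ocr_texts: List[str], max_items: int = 5) -> str:
    """Append de-duplicated OCR strings to a caption (illustrative format)."""
    seen: List[str] = []
    for text in ocr_texts:
        text = text.strip()
        # Skip empty strings and exact repeats, keeping first-seen order.
        if text and text not in seen:
            seen.append(text)
    if not seen:
        return caption
    quoted = ", ".join(f'"{t}"' for t in seen[:max_items])
    # Drop trailing periods and spaces so the joined sentence reads cleanly.
    return f"{caption.rstrip('. ')}. Visible text: {quoted}."


print(augment_caption("A busy street at night.", ["OPEN", " OPEN ", "24h"]))
# prints: A busy street at night. Visible text: "OPEN", "24h".
```

With captions like this, the text in the frame becomes part of the conditioning signal instead of being invisible to the captioner.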

zhengzangw closed this as completed May 9, 2024
