[Bug]: Crash on multiple .pdf files #1312
Attached a verbose (-V 2) logfile:
olafure changed the title from "[Bug]: Crash on a multiple .pdf files" to "[Bug]: Crash on multiple .pdf files" on May 15, 2024.
Can't reproduce here. Possibly this is a Tesseract bug. What is the output of
Can you try upgrading to Tesseract 5.x?
Yep, that solves it, thanks!
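Since upgrading to Tesseract 5.x resolved the crash, it can help to confirm which major version is actually on the PATH before reporting. The sketch below is illustrative only and not part of OCRmyPDF; the helper names `parse_tesseract_major` and `installed_tesseract_major` are hypothetical. It assumes the first line of `tesseract --version` looks like "tesseract 5.3.0", and it checks both stdout and stderr because, as an assumption, older builds may print the version to stderr.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_major(version_text: str) -> Optional[int]:
    """Return the major version from `tesseract --version` output, or None.

    Assumes a line like "tesseract 5.3.0" or "tesseract v5.0.0..." (hypothetical helper).
    """
    match = re.search(r"tesseract\s+v?(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def installed_tesseract_major() -> Optional[int]:
    """Run the tesseract binary found on PATH and return its major version, if any."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Look at both streams, since where the version is printed may vary by release.
    return parse_tesseract_major(result.stdout + result.stderr)


if __name__ == "__main__":
    major = installed_tesseract_major()
    if major is None:
        print("tesseract not found or version not recognized")
    elif major < 5:
        print(f"tesseract {major}.x detected; consider upgrading to 5.x")
    else:
        print(f"tesseract {major}.x detected")
```

Parsing is kept separate from running the binary so the version logic can be checked without Tesseract installed.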
jbarlow83 added a commit that referenced this issue on May 19, 2024:
Addresses [Bug]: Crash on multiple .pdf files #1312. Not actually a fix, but at least it will get us better diagnostics. It appears old Tesseract 4.x generates bad line boxes at times.
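To illustrate what "bad line boxes" could mean in practice, here is a minimal sketch of a sanity check that produces a diagnostic instead of crashing. It is not the code from the commit above. It assumes the line boxes come from Tesseract's hOCR output, where geometry is stored in the `title` attribute as `bbox x0 y0 x1 y1`. The names `parse_bbox` and `describe_bad_box` are hypothetical.

```python
import re
from typing import Optional, Tuple

BBox = Tuple[int, int, int, int]

# hOCR title attributes look like "bbox 36 92 618 116; baseline 0 -5".
_BBOX_RE = re.compile(r"\bbbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def parse_bbox(title: str) -> Optional[BBox]:
    """Extract (x0, y0, x1, y1) from an hOCR title string, or None if absent."""
    m = _BBOX_RE.search(title)
    if m is None:
        return None
    x0, y0, x1, y1 = (int(g) for g in m.groups())
    return x0, y0, x1, y1


def describe_bad_box(bbox: BBox) -> Optional[str]:
    """Return a diagnostic message if the box is degenerate or inverted, else None."""
    x0, y0, x1, y1 = bbox
    if x0 < 0 or y0 < 0:
        return f"negative coordinate in {bbox}"
    if x1 <= x0 or y1 <= y0:
        return f"zero-area or inverted box {bbox}"
    return None


if __name__ == "__main__":
    for title in ["bbox 36 92 618 116; baseline 0 -5", "bbox 100 50 90 60"]:
        box = parse_bbox(title)
        problem = describe_bad_box(box) if box else "no bbox found"
        print(title, "->", problem or "ok")
```

Reporting the offending box, rather than failing deep inside rendering, is the kind of diagnostic the commit message describes, though the real check may differ.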
Describe the bug
Crash on multiple .pdf files. Latest master version.
Steps to reproduce
Files
https://archive.org/download/PopularMechanics1945/Popular_Mechanics_09_1945.pdf
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.), source build
OCRmyPDF version
16.2.1.dev5+g5caf654
Relevant log output