Indian Numbers on Arabic text #1311

Open
MedoHamdani opened this issue May 15, 2024 · 0 comments
Comments

@MedoHamdani

The digits 0, 1, 2, 3 are commonly called Arabic numbers, while ٠, ١, ٢, ٣ are often called Indian numbers (Unicode names them Arabic-Indic digits). The latter are widely used in Arabic text. When I ran the command below to extract the text, the output did not match the document.
The attached file has 32 pages, including the index, which was also not extracted well.

This is the command that was used:
ocrmypdf --sidecar Test.txt --deskew -l ara+eng Test.pdf Test_ocrd.pdf
Test.pdf
Test.txt

Both files are attached above.
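To make it easier to compare the sidecar text against the source pages, here is a minimal Python sketch that counts Arabic-Indic digits (U+0660 to U+0669) in a string and maps them to ASCII digits. It is only a post-processing aid for inspecting the output, not a fix for the OCR itself, and the helper names `normalize_digits` and `count_arabic_indic_digits` are made up for illustration.

```python
# Map each Arabic-Indic digit (U+0660..U+0669) to its ASCII counterpart.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}


def normalize_digits(text: str) -> str:
    """Return text with Arabic-Indic digits replaced by ASCII 0-9."""
    return text.translate(ARABIC_INDIC_TO_ASCII)


def count_arabic_indic_digits(text: str) -> int:
    """Count how many Arabic-Indic digits appear in text."""
    return sum(1 for ch in text if 0x0660 <= ord(ch) <= 0x0669)


if __name__ == "__main__":
    sample = "صفحة ١٢ من ٣٢"  # hypothetical line, not taken from Test.txt
    print(count_arabic_indic_digits(sample))
    print(normalize_digits(sample))
```

Reading Test.txt with `open("Test.txt", encoding="utf-8")` and passing each line to these helpers would show whether the digits were recognized at all, or were dropped or replaced by other characters.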

Thanks for the help. Once this issue is resolved, I hope to make a video explaining how to use OCRmyPDF, because the installation is not beginner friendly, but with Medo it will be friendly :)

Thanks,

Medo Hamdani
