ocr

Sunny is a simple and beautiful screenshot tool that supports Windows, macOS, and Linux. It also supports OCR and image translation.

screenshot image ocr snapshot screen capture translate

Updated May 21, 2024

doo / scanbot-sdk-ios-spm

pdf ios ocr sdk scanner barcode image-processing qr-code document mrz image-filter

Updated May 21, 2024
Swift

ballerine-io / ballerine

Open-source infrastructure and data orchestration platform for risk decisioning

Updated May 21, 2024
TypeScript

paperless-ngx / paperless-ngx

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

pdf machine-learning django angular ocr archiving dms document-management optical-character-recognition document-management-system

Updated May 21, 2024
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated May 21, 2024
Python

Villavu / SimbaTesseract

Bring the power of Tesseract-OCR to Simba

ocr tesseract simba

Updated May 21, 2024
Pascal

siyuan-note / siyuan

A privacy-first, self-hosted, fully open-source personal knowledge management system, written in TypeScript and Golang.

electron markdown pdf ocr notebook s3 webdav self-hosted openai note-taking evernote anki knowledge-base obsidian pkm notion notes-app local-first chatgpt

Updated May 21, 2024
TypeScript

hiroi-sora / Umi-OCR

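Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in language libraries for multiple languages.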

ocr ocr-python paddleocr

Updated May 21, 2024
QML

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

ocr deep-learning pytorch text-recognition text-detection optical-character-recognition text-detection-recognition tensorflow2 document-recognition

Updated May 21, 2024
Python

Dedoc is a library (service) for automated document parsing that brings documents into a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Document parsing; document content extraction; document logical structure extraction; PDF parser; scanned document parser; DOCX parser; HTML parser)

html pdf ocr table-of-contents excel html-parser docx documents doc scanned-documents txt odt pdf-parser table-recognition docx-parser document-content-extraction logical-extraction

Updated May 21, 2024
Python

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

ocr db crnn ocrlite chineseocr