Apache Tika adapter in Go
Updated Jan 4, 2017 - Go
Converts a PDF file into a text file while keeping the layout of the original PDF.
JRuby gem to convert PDF to text while keeping the layout of the original PDF file
A PDFBox wrapper for extracting text and text coordinates from a printed PDF document (no OCR)
C# demo for converting PDF to images, extracting text from PDF, adding digital signatures and watermarks to PDF, and compressing PDF
Standalone .NET converter library that requires neither Adobe Acrobat components nor Microsoft Office Interop Assemblies to convert PDF, DOCX, XLSX, HTML, Image, CSV, RTF, and TXT files in the .NET Framework
IO management for PCU project
PDF parser component (Apache Tika) for PCU project
A book reader with voice control functionality for blind people
Table structure recognition dataset of the paper: Complicated Table Structure Recognition
Converts PDF and FB2 documents to text or to a list of articles.
PDF.co Gem plugin for Ruby on Rails
Batch-convert PDF to text and extract data from PDF in Python
Perl client for SelectPdf Online REST API
Ruby client for SelectPdf Online REST API
Node.js client for SelectPdf Online REST API
The notebook in this repository uses pytesseract to extract text from a PDF document. The script can be used to automate text acquisition from a large body of printed resources such as books. The acquired text can then be used for downstream tasks, such as training language models, topic modeling, and document summarization.
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.