The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Elasticsearch File System Crawler (FS Crawler)
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Extract text from a document with Apache Tika
File Extension Fix Tool - Find and rename files with wrong extensions.
Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.
Book Management System for e-bibliomaniacs
A TYPO3 CMS extension that provides Apache Tika functionality
OpenSearch-related code
A cross-platform command line tool for parallelised content extraction and analysis.
Incremental crawling capabilities for Apache Tika. Crawl content from file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers that provide these crawling capabilities.
Processing system for the search engine service in Liquid Investigations.