tika

pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.

nodejs tika pdf-converter pdfbox thumbnail pdftohtml

Updated May 23, 2024
JavaScript

albertus82 / extfix

File Extension Fix Tool - Find and rename files with wrong extensions.

java tika rename-files filenames renamer filename-extension filename-utils

Updated May 21, 2024
Java

apache / tika-docker

Convenience Docker images for Apache Tika Server

docker image tika

Updated May 21, 2024
Shell

OpenSextant / Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

nlp geoparsing geonames geocoding document-conversion solr tika information-extraction geotagging

Updated May 21, 2024
Java

apache / tika-helm

A Helm chart to deploy Apache Tika on Kubernetes.

chart helm tika

Updated May 20, 2024
Smarty

Dimous / tsundoku

Book Management System for e-bibliomaniacs

tika javafx guava clean-architecture hibernate ebook-manager hibernate-search hibernate-orm guice-injection

Updated May 19, 2024
Java

TYPO3-Solr / ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality

search php metadata cms cms-extension tika language-detection typo3 typo3-cms-extension file-indexing text-extraction

Updated May 16, 2024
PHP

EricLondon / Docker-Rails-Tika-Elasticsearch

Docker Rails Tika Elasticsearch

rails docker sidekiq elasticsearch tika paperclip

Updated May 14, 2024
Ruby

sarbanandabhikkhu / tipitaka-xml

Roman Tipitaka (CSCD)

tika tipitaka atthakatha

Updated Apr 29, 2024
JavaScript

bcgov / nr-bcws-opensearch

OpenSearch-related code

aws aws-lambda for terraform tika clamav aws-sqs terragrunt opensearch wildfire forest-fire terraform-managed flnr bcws flnro natural-resource-ministries aws-opensearch wf1 wfdm

Updated May 16, 2024
Java

ICIJ / extract

A cross-platform command line tool for parallelised content extraction and analysis.

etl solr tika index ediscovery

Updated Apr 24, 2024
Java

DFKI / leechcrawler

Incremental crawling capabilities for Apache Tika. Crawl content from, e.g., file systems, http(s) sources (web crawling), imap(s) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers that provide these crawling capabilities.

metadata incremental tika crawling extraction

Updated Apr 15, 2024
Java

liquidinvestigations / hoover-snoop2

Processing system for the search engine service in Liquid Investigations.

docker elasticsearch django tika celery tesseract-ocr

Updated Apr 9, 2024
Python

lguberan / LuceneFx

Tiny unofficial JavaFX demo application for Apache Lucene and Tika.

tika javafx lucene

Updated Apr 6, 2024
Java

Here are 141 public repositories matching this topic...

apache / tika

dadoonet / fscrawler

vaites / php-apache-tika

kestra-io / plugin-tika

quarkiverse / quarkus-tika

shelfio / tika-text-extract

shebinleo / pdf2html

albertus82 / extfix

apache / tika-docker

OpenSextant / Xponents

apache / tika-helm

Dimous / tsundoku

TYPO3-Solr / ext-tika

EricLondon / Docker-Rails-Tika-Elasticsearch

sarbanandabhikkhu / tipitaka-xml

bcgov / nr-bcws-opensearch

ICIJ / extract

DFKI / leechcrawler

liquidinvestigations / hoover-snoop2

lguberan / LuceneFx