Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
Table detection and table structure recognition using YOLOv5
A Repo For Document AI
ReadingBank: A Benchmark Dataset for Reading Order Detection
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
spaCy for key:value pairs
FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.
AI & Data, Google Cloud Skills Boost
Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).
OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Code for the EMNLP 2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
A hands-on sample CLI tool showcasing the integration of Dart with Google Cloud's Document AI.
Custom data extractors that use Google Cloud's Document AI
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI 2023)
This repository includes all computer vision, audio, document AI, and multimodal projects.