OCR_with_LLMs

LLaVA OCR Script (ocr_Llava.py)

This script uses LLaVA with a pre-trained model to extract text from images. It drives the LLaVA terminal interface automatically, sending each image and the extraction prompt in turn, and assumes a fixed workflow for processing a list of images.

Setup and Dependencies

Before running the script, ensure the following dependencies are installed:

# Clone the LLaVA repository
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA

# Create and activate a virtual environment
conda create -n llava python=3.10 -y
conda activate llava

# Install required packages
pip install --upgrade pip
pip install -e .
pip install protobuf
pip install --upgrade transformers

# If using llava-v1.6-vicuna-7b, additional steps may be required:

# Navigate to the cloned LLaVA repository
cd LLaVA

# Update the repository
git pull

# Reinstall dependencies
pip install -e .
pip uninstall psutil
pip install psutil

Usage

Place the script in the 'LLaVA\llava\serve' folder.
Update the image_dir variable in the script to the path containing the images to be processed.
Run the following command in the terminal:

python -m llava.serve.tcli --model-path liuhaotian/llava-v1.6-vicuna-7b --load-4bit

Note: The script waits a fixed 35 seconds for each image to be processed. If errors occur, consider increasing the wait by changing time.sleep(35) to time.sleep(40).

Workflow

1. The script collects a list of images from the specified directory and sorts them.
2. It starts the LLaVA terminal command using subprocess.Popen.
3. For each image in the list:

a. It waits for the 'Image path:' prompt, then sends the image path to the LLaVA terminal.

b. It waits for the 'USER:' prompt and sends the command 'Extract text in the image.'

c. It waits for 35 seconds to allow processing.

d. It prints and saves the 'ASSISTANT:' output.

e. It waits for 2 seconds before moving to the next image.

4. The script logs its progress in the terminal and saves it to the 'output_llava.txt' file.
5. The process is terminated after all images have been processed, and the total time is displayed.

Note: The script assumes the LLaVA terminal interface follows specific prompts ('Image path:', 'USER:', 'ASSISTANT:'). Adjustments may be needed based on updates to LLaVA.
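
Prompts such as 'Image path:' and 'USER:' are typically not followed by a newline, so a line-based read could block while waiting for them. The following is a minimal sketch of one way to wait for a prompt and answer it; it is not the code from ocr_Llava.py. The helper names read_until, send_line, and send_image are hypothetical, and the streams are assumed to be text-mode pipes such as the stdin and stdout of a subprocess.Popen object.

```python
import io


def read_until(stream, marker):
    """Read one character at a time until `marker` appears.

    Returns the text that preceded the marker. Reading per character avoids
    blocking on prompts that are not terminated by a newline.
    """
    buffer = ""
    while not buffer.endswith(marker):
        char = stream.read(1)
        if char == "":  # end of stream: the process has exited
            raise EOFError(f"stream closed before {marker!r} was seen")
        buffer += char
    return buffer[: -len(marker)]


def send_line(stream, text):
    """Write one line to the process and flush so it is delivered immediately."""
    stream.write(text + "\n")
    stream.flush()


def send_image(stdin, stdout, image_path, prompt="Extract text in the image."):
    """Wait for each prompt in turn and answer it (steps a and b above)."""
    read_until(stdout, "Image path:")
    send_line(stdin, image_path)
    read_until(stdout, "USER:")
    send_line(stdin, prompt)
    read_until(stdout, "ASSISTANT:")  # the model's answer follows this marker


# Exercise the helpers against in-memory streams instead of a real process.
fake_stdout = io.StringIO("Image path: USER: ASSISTANT: Hello world")
fake_stdin = io.StringIO()
send_image(fake_stdin, fake_stdout, "scan_01.png")
print(repr(fake_stdin.getvalue()))  # what the process would have received
print(repr(fake_stdout.read()))     # what is left to read as the answer
```

With a real session, the same calls would receive proc.stdin and proc.stdout, followed by the fixed wait described in step c before the answer is read.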

Prompts

The prompt used to create the LLaVA script can be found in the Jupyter notebook Prompt_GPT4_Llava.ipynb.

Image Text Extraction with Pytesseract (ocr_pytesseract.py)

Purpose

The script processes a folder of images. For each image, it extracts text from the original and from copies rotated in 90-degree steps (90, 180, and 270 degrees). The results are compiled into a Microsoft Word document that contains each image together with its name, rotation angle, and extracted text.

Dependencies

Ensure the following packages are installed before running the script:

opencv-python: Image processing library
pytesseract: OCR (Optical Character Recognition) tool
Pillow: Image processing library for opening, manipulating, and saving many different image file formats
tqdm: Progress bar for iteration
python-docx: Library for creating and updating Word documents
All Python requirements are listed in requirements.txt, and the system-level (sudo) commands to run are listed in sudo_commands.txt.

External Dependencies and Commands

Additional system dependencies and commands are provided in the comments at the beginning of the script.
These include updating the system, installing necessary libraries, setting up Tesseract OCR, and installing required Python packages.

Script Workflow

1. Rotate Image Function (`rotate_image`):

Takes an image and a rotation angle in degrees.
Applies rotation to the image using OpenCV's warpAffine function.
Returns the rotated image.
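
warpAffine takes a 2x3 affine matrix and an output size. The sketch below shows, in plain Python, the arithmetic for a rotation about the image centre on a canvas enlarged to hold the whole rotated image, so that nothing is cropped. It is an illustration under assumptions, not the script's actual code: the function name rotation_matrix_expanded is hypothetical, and the matrix layout follows the one documented for cv2.getRotationMatrix2D with scale 1.

```python
import math


def rotation_matrix_expanded(width, height, angle_deg):
    """Return (matrix, (new_width, new_height)) for a rotation about the centre.

    The translation column is shifted so that the rotated image is centred
    on the enlarged canvas.
    """
    cx, cy = width / 2.0, height / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))

    # Bounding box of the rotated rectangle.
    new_width = round(height * abs(sin_a) + width * abs(cos_a))
    new_height = round(height * abs(cos_a) + width * abs(sin_a))

    matrix = [
        [cos_a, sin_a, (1 - cos_a) * cx - sin_a * cy + (new_width / 2.0 - cx)],
        [-sin_a, cos_a, sin_a * cx + (1 - cos_a) * cy + (new_height / 2.0 - cy)],
    ]
    return matrix, (new_width, new_height)


for angle in (0, 90, 180, 270):
    _, size = rotation_matrix_expanded(400, 300, angle)
    print(angle, size)

# With OpenCV and NumPy installed, the result would typically be applied as:
#   rotated = cv2.warpAffine(image, np.float32(matrix), size)
```

For 90 and 270 degrees the width and height swap, which is why the output size has to be computed rather than reused from the input.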

2. Resize Image Function (`resize_image`):

Takes an image and a scale factor.
Resizes the image while maintaining the original aspect ratio.
Returns the resized image.
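
One detail that is easy to get wrong when resizing with OpenCV is argument order: image.shape is (height, width, ...) while cv2.resize expects the target size as (width, height). The sketch below computes the target size from a scale factor; the helper name scaled_size is hypothetical and the rounding rule is an assumption, not taken from the script.

```python
def scaled_size(shape, scale):
    """Return (width, height) for cv2.resize from a NumPy-style image shape.

    `shape` is (height, width) or (height, width, channels), as in image.shape.
    Both sides are multiplied by the same factor, so the aspect ratio is kept
    up to rounding; each side is at least 1 pixel.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    height, width = shape[:2]
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size((300, 400, 3), 0.5))

# With OpenCV installed, the size would typically be used as:
#   resized = cv2.resize(image, scaled_size(image.shape, 0.5),
#                        interpolation=cv2.INTER_AREA)
```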

3. Text Extraction Function (`extract_text`):

Uses pytesseract to extract text from an image.
Returns the extracted text.
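
A minimal sketch of such a function is shown below. It assumes pytesseract.image_to_string is the call used, which the description suggests but does not confirm. The ocr parameter is a hypothetical hook, not part of the original script, added so the example can run without Tesseract installed; stripping surrounding whitespace is likewise an assumption.

```python
def extract_text(image, ocr=None, lang="eng"):
    """Return the text recognised in `image`, stripped of surrounding whitespace.

    `ocr` is any callable taking (image, lang=...) and returning a string.
    When omitted, pytesseract is imported lazily and image_to_string is used.
    """
    if ocr is None:
        import pytesseract  # requires the Tesseract binary to be installed

        ocr = pytesseract.image_to_string
    return ocr(image, lang=lang).strip()


def fake_ocr(image, lang="eng"):
    """Stand-in engine so the example runs without Tesseract."""
    return "  INVOICE 2024\n\x0c"


print(repr(extract_text(object(), ocr=fake_ocr)))
```

In real use the function would be called as extract_text(image) with an image loaded by OpenCV or Pillow.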

4. Add Image to Document Function (`add_image_to_doc`):

Takes a Word document (doc), image path, image name, extracted text, rotated image, and rotation angle.
Sanitizes text to remove problematic characters.
Adds a paragraph to the document with image name, rotation angle, and sanitized text.
Resizes the rotated image, converts it to a format suitable for Word (PIL to BytesIO), and adds it to the document.
Adds a line break between images.
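
The sanitizing step can be sketched as follows. This is an assumption about what "problematic characters" means here: the example removes characters that XML 1.0 does not allow, such as NULL bytes and most control characters, which OCR output can contain. The name sanitize_for_xml is hypothetical and the script's own rule may differ.

```python
import re

# Characters XML 1.0 does not allow: everything outside tab, newline,
# carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD,
# U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_for_xml(text):
    """Remove characters that cannot be stored in an XML document."""
    return _INVALID_XML_CHARS.sub("", text)


print(repr(sanitize_for_xml("Total:\x00 42\x0c\n")))
```

Removing the characters, rather than replacing them with a placeholder, keeps the extracted text readable in the Word document.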

5. Main Function (`main`):

Accepts a folder path as a command-line argument.
Verifies the existence of the folder.
Gets a list of image files in the folder.
Initializes a Word document.
Iterates through each image, rotating it at 90-degree intervals and extracting text.
Saves the Word document with processed information on the desktop.
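
The folder handling in these steps can be sketched with the standard library alone. The names below (IMAGE_EXTENSIONS, list_image_files, output_docx_path, plan_jobs) are hypothetical, the set of accepted extensions is an assumption, and the Desktop location is assumed to be a "Desktop" folder in the user's home directory.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")  # assumed
ROTATIONS = (0, 90, 180, 270)  # original image plus three rotated copies


def list_image_files(folder):
    """Return the sorted image file names in `folder`, or raise if it is missing."""
    if not os.path.isdir(folder):
        raise NotADirectoryError(f"folder not found: {folder}")
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def output_docx_path(folder, desktop=None):
    """Build '<Desktop>/<folder name>_word.docx'."""
    if desktop is None:
        desktop = os.path.join(os.path.expanduser("~"), "Desktop")
    folder_name = os.path.basename(os.path.normpath(folder))
    return os.path.join(desktop, folder_name + "_word.docx")


def plan_jobs(folder):
    """Return every (file name, rotation angle) pair to be processed."""
    return [(name, angle) for name in list_image_files(folder) for angle in ROTATIONS]


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.png", "a.JPG", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    print(plan_jobs(folder)[:4])
```

Each (name, angle) pair would then go through rotation, text extraction, and insertion into the document.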

6. Command-Line Execution:

The script can be executed from the command line with the folder path as an argument.
Example: python ocr_pytesseract.py /path/to/image/folder

Notes

An error related to XML compatibility can occur when the extracted text contains characters that cannot be written to the Word document. The add_image_to_doc function handles this by sanitizing the text before adding it.
The script saves the Word document on the desktop with the folder name and "_word.docx" appended.
Rotated images are not cropped to fit the original dimensions; they are adapted to their new size when added to the Word document.
The script provides a progress bar using tqdm.
The execution time is printed at the end of the script.

nsourlos/OCR_with_LLMs
