Add OCR functionality to GDrive connectors #1316

Open
wants to merge 1 commit into main

markbotterill

This change modifies `cross_connector_utils/file_utils.py`, which is shared across connectors, so it should be reasonably simple to roll out to other connectors.

The `read_pdf_file` function gets an additional `use_ocr` keyword argument:

def read_pdf_file(file: IO[Any],
                  file_name: str,
                  pdf_pass: str | None = None,
                  use_ocr: bool = False) -> str:

N.B. The backend Dockerfile currently has its "Cleanup" step commented out, which likely results in a slightly larger image. The image should function as is, but it needs further investigation to determine which specific part of the cleanup causes Tesseract to break.
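One way to narrow down which cleanup command breaks Tesseract is to re-enable the commands one at a time and run a smoke check inside the built image after each change. The sketch below assumes the stock `tesseract` CLI (its `--version` and `--list-langs` flags) and English language data; `command_succeeds` and `tesseract_smoke_check` are hypothetical helpers, not part of the repository.

```python
import shutil
import subprocess


def command_succeeds(binary: str, *args: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run `binary *args` if it can be found; return (exited with 0, stdout + stderr)."""
    path = shutil.which(binary)
    if path is None:
        return False, f"{binary} not found on PATH"
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    # Older Tesseract releases may print to stderr, so check both streams.
    return result.returncode == 0, result.stdout + result.stderr


def tesseract_smoke_check(binary: str = "tesseract", lang: str = "eng") -> list[str]:
    """Return a list of problems found; an empty list means Tesseract looks usable."""
    ok, output = command_succeeds(binary, "--version")
    if not ok:
        return [f"`{binary} --version` failed: {output.strip()}"]
    # `--list-langs` prints one installed language per line after a header line;
    # missing traineddata is one plausible way a cleanup step could break OCR.
    ok, output = command_succeeds(binary, "--list-langs")
    installed = {line.strip() for line in output.splitlines()}
    if not ok or lang not in installed:
        return [f"language data for {lang!r} not found: {output.strip()}"]
    return []
```

For example, call `tesseract_smoke_check()` in a container started from the image after re-enabling each cleanup command; the first command after which it returns a non-empty list is the likely cause.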


vercel bot commented Apr 10, 2024

@markbotterill is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.
