Chatbot for multi-column PDFs #11893
Replies: 1 comment
Hey @PradiptaAtByteIQMind-Graph2023! Back at it with another intriguing challenge, I see. Ready to dive into the world of multi-column PDFs together? 😎
To create a query engine for multi-column PDFs or documents, you can follow an approach similar to one used in the LlamaIndex repository.
Here's a simplified example of how you might set up a basic query engine for a DataFrame created from text extracted from PDFs:

import pandas as pd

# Example DataFrame creation
data = {
    "document_id": [1, 2, 3],
    "title": ["Document 1", "Document 2", "Document 3"],
    "content": ["Text of document 1", "Text of document 2", "Text of document 3"],
}
df = pd.DataFrame(data)

# Simplified query engine
class SimpleQueryEngine:
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def query(self, keyword):
        # Return the rows whose content contains the keyword (case-insensitive).
        # regex=False matches the keyword literally; na=False skips missing text.
        mask = self.dataframe["content"].str.contains(keyword, case=False, regex=False, na=False)
        return self.dataframe[mask]

# Usage
query_engine = SimpleQueryEngine(df)
results = query_engine.query("document 1")
print(results)

This example is quite basic and would need to be expanded with more sophisticated querying capabilities, error handling, and security measures similar to those in the LlamaIndex repository. The LlamaIndex repository also contains functionality for parsing and extracting text from PDFs and other document types.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
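One thing the DataFrame example does not cover is the multi-column part itself: if words are read row by row across the whole page, text from neighbouring columns gets interleaved. Below is a minimal sketch of one way to restore reading order. It assumes your PDF extractor can give you each word with its left edge and vertical position, and that the page has equal-width columns. The `Word` type, the `words_to_text` function, and its parameters are hypothetical names for illustration, not part of LlamaIndex or any PDF library.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word box, as a PDF extractor might report it."""
    text: str
    x0: float   # left edge of the word
    top: float  # distance from the top of the page


def _join_line(line_words):
    # Words on one line are joined left to right.
    return " ".join(w.text for w in sorted(line_words, key=lambda w: w.x0))


def words_to_text(words, page_width, n_columns=2, line_tol=3.0):
    """Join words in reading order: column by column, top to bottom.

    Assumes equal-width columns; real layouts may need gap detection instead.
    """
    column_width = page_width / n_columns
    columns = defaultdict(list)
    for word in words:
        # Assign each word to a column by its left edge.
        index = min(int(word.x0 // column_width), n_columns - 1)
        columns[index].append(word)

    lines = []
    for index in sorted(columns):
        current = []
        for word in sorted(columns[index], key=lambda w: w.top):
            # Start a new line when the word sits clearly below the current one.
            if current and word.top - current[0].top > line_tol:
                lines.append(_join_line(current))
                current = []
            current.append(word)
        if current:
            lines.append(_join_line(current))
    return "\n".join(lines)


# Two columns on a 600-unit-wide page, two lines each.
words = [
    Word("Left", 50, 100), Word("Right", 350, 100),
    Word("column", 80, 100.5), Word("column", 390, 100.4),
    Word("continues", 50, 115), Word("continues", 350, 115),
]
text = words_to_text(words, page_width=600)
print(text)
```

The resulting string keeps the left column together before the right one, so it could be stored in the "content" column of the DataFrame and queried as before.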
Hi
How can I create a query engine for PDFs or documents that have multiple columns?
Thanks and Regards
Pradipta