GitHub - kaustubhbhavsar/gpt-information-extractor: A GPT-3.5 based tool to extract, store, and retrieve diverse information.

SLICEMATE

GPT-driven Data Exploration: Unveiling Insights and Enabling Discoverability

What Is It?

No more forms or button navigation: simply type what is on your mind, and this application classifies, segments, and archives everything from shopping lists to random ideas.

The aim is to build a Natural Language Understanding (NLU) tool for quickly extracting, storing, and retrieving diverse personal information: shopping or to-do lists, phone numbers, email addresses, names, trivia, reminders, random ideas, and more. The motivation is the difficulty people face in finding the right form or navigating through various buttons for each category of information. With this application, users simply type their thoughts, and the tool classifies, segments, and archives the data for them.

Summary

[Click here to launch Streamlit application]

Slicemate performs two primary tasks using GPT-3.5 (text-davinci-003):

Gathering facts from Natural Language and adding them to a database: GPT converts the written sentences into well-organized pieces of information, which are then stored in a database.
Searching the database using keywords: GPT also helps improve searches by including synonyms and related terms along with the original keywords. This makes the search results more comprehensive and accurate.
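
The search task can be sketched as follows. This is a minimal illustration, not the project's code: `expand_keywords` stands in for the GPT call that suggests synonyms and related terms, and the `SYNONYMS` table, the `search_facts` function, and the dict-based fact records are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the GPT call that proposes related terms.
SYNONYMS: Dict[str, List[str]] = {
    "groceries": ["shopping", "food"],
    "phone": ["mobile", "contact"],
}


def expand_keywords(keywords: List[str]) -> List[str]:
    """Return the original keywords plus related terms, lowercased, no duplicates."""
    expanded: List[str] = []
    for word in keywords:
        for term in [word, *SYNONYMS.get(word.lower(), [])]:
            term = term.lower()
            if term not in expanded:
                expanded.append(term)
    return expanded


def search_facts(
    facts: List[Dict[str, str]],
    keywords: List[str],
    expand: Callable[[List[str]], List[str]] = expand_keywords,
) -> List[Dict[str, str]]:
    """Return the facts in which any field contains any expanded keyword."""
    terms = expand(keywords)
    return [
        fact
        for fact in facts
        if any(term in str(value).lower() for value in fact.values() for term in terms)
    ]


facts = [
    {"category": "Shopping", "type": "list", "people": "", "key": "buy", "value": "milk, eggs"},
    {"category": "Contacts", "type": "phone number", "people": "Ana", "key": "Ana", "value": "555-0100"},
]
matches = search_facts(facts, ["groceries"])
```

Here the query "groceries" never appears in the stored data, but the expanded term "shopping" matches the first record's category. Passing the expander as a parameter keeps the matching logic testable without calling the API.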

As mentioned above, the goal is to obtain data and enable its discoverability. The data extracted will include the following:

Category: The overall classification to which the data pertains (e.g., "Reminder", "Health", "Shopping").
Type: The inherent characteristics of the stored data (e.g., emails, phone numbers, prices, reminders).
People: Names of people or entities involved in the extraction.
Key: The primary entity to which a value is assigned. This field allows for more flexibility compared to the preceding ones.
Value: The specific entry associated with the key. It also permits greater flexibility compared to the other fields.

Please note that the categories are dynamic in nature and can be modified and updated while adding facts to the database.
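
As a rough sketch of how the five fields and the dynamic categories might fit together, the example below parses a model reply into records and registers any category it has not seen before. The `Fact` class, the `parse_facts` function, and the JSON reply format are assumptions made for illustration; the project may structure its output differently.

```python
import json
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Fact:
    category: str      # overall classification, e.g. "Reminder"
    type: str          # kind of data, e.g. "phone number"
    people: List[str]  # names of people or entities involved
    key: str           # primary entity the value is assigned to
    value: str         # entry associated with the key


def parse_facts(reply: str, categories: Set[str]) -> List[Fact]:
    """Parse a JSON list of facts and register any category not seen before."""
    facts = [Fact(**item) for item in json.loads(reply)]
    for fact in facts:
        categories.add(fact.category)  # the category set grows as facts are added
    return facts


# Assumed reply format (JSON); a hand-written example, not real model output.
reply = """[
  {"category": "Reminder", "type": "reminder", "people": ["Sam"],
   "key": "call Sam", "value": "Friday 5 pm"},
  {"category": "Pets", "type": "trivia", "people": [],
   "key": "cat's birthday", "value": "3 May"}
]"""

categories = {"Reminder", "Health", "Shopping"}
facts = parse_facts(reply, categories)
```

After the call, `categories` also contains "Pets", which shows how a new category can appear as a side effect of adding a fact.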

The notebooks/ directory contains two notebooks for prompt engineering studies, one for extracting facts and one for searching facts. Each provides a structured environment for exploring and refining the prompts for its task.

Within the project files, app.py is the Streamlit application: it creates the user interface and handles interactions with the user. engine.py holds the main logic of the application, including the underlying data processing.

The screenshots below show the Streamlit interface for extracting facts and for searching facts.

[Screenshots: EXTRACT FACTS | SEARCH FACTS]

Directory Structure

├── assets/                                        # assets such as images
├── config/                                        # configuration file
├── notebooks/                                     # study notebooks
├── src/                                           # main code files
│   ├── app.py                                     # streamlit app
│   ├── engine.py                                  # app logic
│   └── logger.py                                  # logger file

Tools and Libraries

Language: Python
GPT-3.5 (text-davinci-003): OpenAI
Web App: Streamlit
Other Prominent Libraries: Pandas

Additional libraries, with the exact version of each, are listed in the requirements.txt file.

Final Notes

Please make sure that you have installed all the necessary dependencies and libraries. You can refer to the requirements.txt file to find a complete list of the required libraries and their versions. The codebase relies on Python version 3.8.16.

Before running the application, make sure you have an OpenAI API key. The key is required to access the OpenAI API.
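
One common way to supply the key is through an environment variable. The helper below is a minimal sketch under that assumption: `get_openai_api_key` is a hypothetical function, and this README does not say how the application itself reads the key (it may use the config/ directory or Streamlit's own settings instead).

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before starting the app."
        )
    return key
```

Failing early with a clear message is easier to debug than an authentication error raised later, in the middle of a request.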

The code is documented with docstrings and comments. Please review them, as they explain how the code works.

Lastly, I would like to extend my appreciation to Paulo Salem for providing the initial foundation for this application.
