A GPT-3.5 based tool to extract, store, and retrieve diverse information.

kaustubhbhavsar/gpt-information-extractor


SLICEMATE

GPT-driven Data Exploration: Unveiling Insights and Enabling Discoverability

What Is It?

No more forms or button navigation: simply type what is on your mind, and this application classifies, segments, and archives everything from shopping lists to random ideas.

The goal is a Natural Language Understanding (NLU) tool for quickly extracting, storing, and retrieving diverse personal information: shopping or to-do lists, phone numbers, email addresses, names, trivia, reminders, random ideas, and more. The motivation is that finding the right form, or clicking through several buttons, for each category of information is tedious. With this application, users simply type their thoughts, and the tool classifies, segments, and archives the data they provide.

(back to top)

Summary

Slicemate performs two primary tasks using GPT-3.5 (text-davinci-003):

  • Gathering facts from Natural Language and adding them to a database: GPT converts the written sentences into well-organized pieces of information, which are then stored in a database.
  • Searching the database using keywords: GPT also helps improve searches by including synonyms and related terms along with the original keywords. This makes the search results more comprehensive and accurate.
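The keyword-expansion idea in the second task can be illustrated with a minimal sketch. It is not the project's implementation: the `SYNONYMS` table is a hypothetical stand-in for the terms GPT would suggest, the record layout (plain dictionaries with `key` and `value`) is assumed for illustration, and matching is a simple case-insensitive substring test.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for the related terms a GPT call would suggest.
SYNONYMS: Dict[str, List[str]] = {
    "groceries": ["shopping", "food"],
    "phone": ["mobile", "telephone"],
}


def lookup_synonyms(word: str) -> List[str]:
    """Return related terms for a keyword from the stand-in table."""
    return SYNONYMS.get(word.lower(), [])


def expand_keywords(keywords: Iterable[str],
                    suggest: Callable[[str], List[str]]) -> List[str]:
    """Return the keywords plus suggested terms, lowercased and de-duplicated."""
    expanded: List[str] = []
    for word in keywords:
        for term in [word, *suggest(word)]:
            term = term.lower()
            if term not in expanded:
                expanded.append(term)
    return expanded


def search(records: List[Dict[str, str]],
           keywords: Iterable[str],
           suggest: Callable[[str], List[str]]) -> List[Dict[str, str]]:
    """Keep records whose combined text contains any expanded term."""
    terms = expand_keywords(keywords, suggest)
    hits = []
    for record in records:
        text = " ".join(record.values()).lower()
        if any(term in text for term in terms):
            hits.append(record)
    return hits


# Assumed record layout, for illustration only.
records = [
    {"key": "shopping list", "value": "milk, eggs"},
    {"key": "Anna mobile", "value": "555-0100"},
    {"key": "idea", "value": "write a blog post"},
]
results = search(records, ["groceries"], lookup_synonyms)
```

Here a search for "groceries" still finds the shopping list, because the expanded term "shopping" matches even though the original keyword does not appear in the record.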

As mentioned above, the goal is to obtain data and enable its discoverability. The data extracted will include the following:

  • Category: The overall classification to which the data pertains (e.g., "Reminder", "Health", "Shopping").
  • Type: The inherent characteristics of the stored data (e.g., emails, phone numbers, prices, reminders).
  • People: Names of the people or entities mentioned in the extracted fact.
  • Key: The primary entity to which a value is assigned. This field allows for more flexibility compared to the preceding ones.
  • Value: The specific entry associated with the key. It also permits greater flexibility compared to the other fields.

Please note that the categories are dynamic in nature and can be modified and updated while adding facts to the database.
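As a rough sketch of how the five fields and the dynamic category list could fit together, the example below models one extracted fact as a dataclass and a small in-memory store whose category set grows as facts are added. The JSON reply format, the `Fact` and `FactStore` names, and the sample data are all assumptions made for illustration; the actual prompt output format and storage layer in engine.py may differ.

```python
import json
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Fact:
    """One extracted fact, mirroring the five fields listed above."""
    category: str
    type: str
    people: List[str]
    key: str
    value: str


def parse_facts(reply: str) -> List[Fact]:
    """Parse a model reply, assumed here to be a JSON array of objects."""
    items = json.loads(reply)
    return [
        Fact(
            category=item["category"],
            type=item["type"],
            people=list(item.get("people", [])),
            key=item["key"],
            value=item["value"],
        )
        for item in items
    ]


@dataclass
class FactStore:
    """Hypothetical in-memory store; categories grow as facts arrive."""
    facts: List[Fact] = field(default_factory=list)
    categories: Set[str] = field(default_factory=set)

    def add(self, fact: Fact) -> None:
        # A category not seen before is simply registered.
        self.categories.add(fact.category)
        self.facts.append(fact)


# Made-up reply in the assumed JSON format.
reply = json.dumps([
    {"category": "Shopping", "type": "list", "people": [],
     "key": "groceries", "value": "milk, eggs"},
    {"category": "Contacts", "type": "phone number", "people": ["Anna"],
     "key": "Anna phone", "value": "555-0100"},
])

store = FactStore(categories={"Shopping", "Reminder"})
for fact in parse_facts(reply):
    store.add(fact)
```

After the loop, the store holds both facts and its category set has gained "Contacts", which was not among the starting categories.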

Included in the resources are two separate notebooks designed to facilitate prompt engineering studies for extracting facts and searching facts, respectively. These notebooks provide a structured environment for exploring and refining prompt engineering techniques specific to each task.

Within the project files, you will find the app.py file, which serves as the Streamlit application. This file contains the necessary code to create the user interface and handle the interactions with the application. Additionally, the engine.py file houses the main logic and functionalities of the application, providing the underlying implementation and data processing capabilities.

Displayed below are the accompanying screenshots of the Streamlit application, which provide a visual representation of the user interface for both extracting factual information and conducting fact-based searches.

Screenshots: Extract Facts | Search Facts

(back to top)

Directory Structure

├── assets/                                        # assets such as images
├── config/                                        # configuration file
├── notebooks/                                     # study notebooks
├── src/                                           # main code files
    ├── app.py                                     # streamlit app
    ├── engine.py                                  # app logic
    └── logger.py                                  # logger file

(back to top)

Tools and Libraries

  • Language: Python
  • GPT-3.5 (text-davinci-003): OpenAI
  • Web App: Streamlit
  • Other Prominent Libraries: Pandas

The additional libraries utilized, along with the precise versions of each library used, are specified in the requirements.txt file.

(back to top)

Final Notes

Please make sure that you have installed all the necessary dependencies and libraries. You can refer to the requirements.txt file to find a complete list of the required libraries and their versions. The codebase relies on Python version 3.8.16.

Before proceeding, make sure you have an OpenAI API key. The key is required to access the OpenAI API services.
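One common way to supply the key is through an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; this project may instead read it from the file in config/, so treat the variable name and the `load_api_key` helper as illustrative rather than as the application's actual mechanism.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    The variable name is an assumption, not taken from this project.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before starting the app."
        )
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised later, in the middle of a request.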

The codebase has been meticulously documented, incorporating comprehensive docstrings and comments. Please review these annotations, as they provide valuable insights into the functionality and operation of the code.

Lastly, I would like to extend my appreciation to Paulo Salem for providing the initial foundation for this application.

(back to top)
