
JSONL generator for building fine-tuning datasets for language models such as Google's PaLM 2 and OpenAI's GPT-3.5


ryhkml/fine-tune-forge


FineTuneForge

FineTuneForge is a tool for generating JSON Lines (JSONL) files used to fine-tune AI language models such as Google's PaLM 2 and OpenAI's GPT-3.5. It lets developers transform text data into JSONL, a machine-readable format that stores one JSON object per line.
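To illustrate the format, here is a minimal bash sketch that appends prompt/response pairs to a JSONL file, one object per line. It is not FineTuneForge's own code: the field names input_text and output_text are an assumption borrowed from the Vertex AI text-model tuning format used with PaLM 2, and the json_escape and append_example helpers are hypothetical. The escaper handles only backslashes, double quotes, newlines, carriage returns and tabs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write training examples as JSONL (one JSON object per line).
# Assumption: PaLM 2-style field names "input_text" / "output_text";
# FineTuneForge's actual output schema may differ.

# Escape a string for use inside a JSON string literal.
# Covers backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}        # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}        # double quote -> \"
  s=${s//$'\n'/\\n}      # newline -> \n
  s=${s//$'\r'/\\r}      # carriage return -> \r
  s=${s//$'\t'/\\t}      # tab -> \t
  printf '%s' "$s"
}

# Append one training example to FILE as a single JSONL line.
append_example() {
  local file=$1 input=$2 output=$3
  printf '{"input_text": "%s", "output_text": "%s"}\n' \
    "$(json_escape "$input")" "$(json_escape "$output")" >> "$file"
}
```

For example, append_example train.jsonl 'What is JSONL?' 'One JSON object per line.' appends the line {"input_text": "What is JSONL?", "output_text": "One JSON object per line."}. GPT-3.5 fine-tuning expects a different, chat-style record built around a messages array, so the printf template would need to change for that target.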

[Screenshot: FineTuneForge web app]

Getting Started

To get started with FineTuneForge, follow these steps:

Installation

git clone https://github.com/ryhkml/fine-tune-forge.git
cd fine-tune-forge
chmod +x ./install.sh
./install.sh
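The contents of install.sh are not described here. Because the later steps call npm, a pre-flight check like the following bash sketch can confirm that git, node and npm are on the PATH before you run the installer; check_prereqs is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before running ./install.sh.
# Assumption: the setup needs git (to clone) plus node and npm (used by the Usage commands).
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd" >&2   # report each absent tool
      missing=1
    fi
  done
  return "$missing"                       # non-zero if anything is missing
}

# Usage: check_prereqs git node npm && ./install.sh
```

The function prints every missing command to stderr before failing, so one run reports all gaps instead of stopping at the first.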

Usage

Build the JSONL generator with the following command:

npm run build

Then start the server:

npm run serve

Directory Structure

FineTuneForge is organized into several directories, each with a specific role in the JSONL generation workflow:

  • DATADOC_OCR: Temporary storage for images submitted for OCR (optical character recognition)
  • DATASET: Storage for completed datasets. Once the JSONL files are generated and ready for fine-tuning, they are placed here
  • DATATMP: Temporary storage for instruction content
  • tls: Storage for SSL/TLS certificates
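If you want to prepare these directories by hand, for example before mounting them into a container, a bash sketch such as the one below creates them and lists finished datasets. It assumes it runs from the repository root and that completed files in DATASET use the .jsonl extension; init_dirs and list_datasets are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume the current directory is the repository root.

# Create the working directories (mkdir -p is a no-op for existing ones).
init_dirs() {
  mkdir -p DATADOC_OCR DATASET DATATMP tls
}

# Print "<line count><TAB><path>" for each .jsonl file in DATASET.
list_datasets() {
  local f
  for f in DATASET/*.jsonl; do
    [ -e "$f" ] || continue   # glob matched nothing: DATASET has no .jsonl files
    printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done
}
```

Because JSONL stores one example per line, the line count printed by list_datasets is a quick proxy for the number of training examples in each file.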

Configuring SSL/TLS for HTTPS

To serve the application over HTTPS, you must configure SSL/TLS certificates.

Required Files

Before starting the server, place the following files in the tls directory:

  • fullchain.pem: Your certificate followed by any intermediate certificates, forming the full chain of trust
  • cert-key.pem: The private key for your certificate, used to establish encrypted connections. Keep this file secret
  • ca.crt (optional): A Certificate Authority (CA) certificate, needed only if you must specify an external CA
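Before starting the server, it can help to confirm that the key actually belongs to the certificate. The bash sketch below, which assumes the OpenSSL command-line tool is installed, compares the public key in the first certificate of fullchain.pem with the one derived from cert-key.pem. It can also generate a self-signed pair for local testing. The TLS_DIR variable, the CN=localhost subject and both function names are assumptions for illustration; browsers will not trust a self-signed certificate, and the optional ca.crt is not checked.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; TLS_DIR defaults to ./tls (assumption).
TLS_DIR=${TLS_DIR:-tls}

# Local testing only: create a self-signed certificate and an unencrypted key.
make_self_signed() {
  mkdir -p "$TLS_DIR"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=localhost" \
    -keyout "$TLS_DIR/cert-key.pem" -out "$TLS_DIR/fullchain.pem" 2>/dev/null
}

# Verify both required files exist and that the key matches the certificate.
check_tls() {
  local f cert_pub key_pub
  for f in fullchain.pem cert-key.pem; do
    [ -f "$TLS_DIR/$f" ] || { echo "missing $TLS_DIR/$f" >&2; return 1; }
  done
  # openssl x509 reads the first certificate in the chain (the server's own).
  cert_pub=$(openssl x509 -in "$TLS_DIR/fullchain.pem" -noout -pubkey)
  key_pub=$(openssl pkey -in "$TLS_DIR/cert-key.pem" -pubout)
  [ "$cert_pub" = "$key_pub" ] || { echo "key does not match certificate" >&2; return 1; }
}
```

For example, run make_self_signed once on a development machine and check_tls before each start; in production, replace the generated files with certificates issued by your CA.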

If you run the application with Docker, uncomment the PROTOCOL_SERVER environment variable in docker-compose.yaml.

License

This project is licensed under the MIT License - see the LICENSE file for details.
