LLaMA-Megatron

A LLaMA implementation built on Megatron-LM.

LLaMA

This repository is intended as a minimal, hackable, and readable example of using NVIDIA Megatron-LM to load LLaMA (arXiv) models and run inference. To download the checkpoints and tokenizer, fill out this Google form.

Setup

In a conda environment with PyTorch and CUDA available, run:

pip install -r requirements.txt

# Install NVIDIA APEX
git clone https://github.com/NVIDIA/apex
cd apex
# If pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# Otherwise (pip < 23.1):
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Install Megatron-LM
pip install --no-build-isolation git+https://github.com/MoFHeka/Megatron-LM.git

Then in this repository:

pip install -e .
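The choice between the two APEX install commands above depends on whether pip is at least 23.1. A minimal sketch of that check follows. The function names `version_ge` and `apex_flag_style` are hypothetical helpers, not part of this repository, and the comparison assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD sort).

```shell
# Succeeds (exit 0) if version $1 is >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints which APEX build-flag style fits the given pip version:
# "config-settings" for pip >= 23.1, "global-option" otherwise.
apex_flag_style() {
    if version_ge "$1" "23.1"; then
        echo "config-settings"
    else
        echo "global-option"
    fi
}
```

To apply it to the installed pip, something like `apex_flag_style "$(pip --version | awk '{print $2}')"` should report which of the two commands to run.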

Download

Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed URL provided in the email, then use it to download the model weights and tokenizer.

Model

The LLaMA modeling code is rebuilt on top of Megatron and lives in llama_model.py. The LLAMAModel class is the entry point.

Checkpoint Transform

tools/transform_huggingface_to_megatron.py and its counterpart for the reverse direction are provided for converting LLaMA model checkpoints between the Hugging Face and Megatron formats.

Pretrain

First, run tools/preprocess_data.py to generate a Megatron-style pretraining text dataset. Alternatively, write your own pretraining code, such as custom_pretrain_llama.py with custom_training.py.
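As a rough illustration of the input to the preprocessing step: upstream Megatron-LM's tools/preprocess_data.py reads a JSON-lines file with one document per line and takes the text from a "text" key by default. Assuming this fork keeps that behavior (not confirmed here), a tiny sample corpus could be written as below. The function name `make_sample_corpus` and the sample sentences are hypothetical.

```shell
# Writes a two-document JSON-lines corpus to the path given as $1.
# Each line is one JSON object with a "text" field (assumed input format).
make_sample_corpus() {
    printf '%s\n' \
        '{"text": "The quick brown fox jumps over the lazy dog."}' \
        '{"text": "Megatron-LM trains large transformer models."}' \
        > "$1"
}
```

Calling `make_sample_corpus sample.jsonl` produces a file that could then be passed to the preprocessing script. Check the script's own `--help` output for the exact arguments it expects.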

The provided pretrain_llama.py can be run on a single-GPU or multi-GPU node with torchrun and will output completions for two pre-defined prompts. Use pretrain_llama_distributed.sh to run it:

sh pretrain_llama_distributed.sh {dataset_folder} {ckpt_folder} {tokenizer_model} {tensorboard_folder} {tensor_parallel_size} {pipeline_parallel_size} {number_of_nodes}

Different models require different tensor-parallel (TP) sizes:

Model	TP
7B	1
13B	2
33B	4
65B	8
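The table above can be turned into a small lookup that fills in the {tensor_parallel_size} argument of the launch script. This is a sketch only: the helper names `tp_for_model`, `gpus_per_replica`, and `print_launch_command` are hypothetical, and every path in the printed command is a placeholder. The command is printed rather than executed.

```shell
# TP sizes copied from the table above.
tp_for_model() {
    case "$1" in
        7B)  echo 1 ;;
        13B) echo 2 ;;
        33B) echo 4 ;;
        65B) echo 8 ;;
        *)   echo "unknown model size: $1" >&2; return 1 ;;
    esac
}

# GPUs needed for one model replica: TP size ($1) times PP size ($2).
gpus_per_replica() {
    echo $(( $1 * $2 ))
}

# Prints (does not run) a launch command for model size $1,
# pipeline-parallel size $2, and node count $3.
# All paths are hypothetical placeholders.
print_launch_command() {
    model="$1"; pp="$2"; nodes="$3"
    tp="$(tp_for_model "$model")" || return 1
    echo "sh pretrain_llama_distributed.sh ./data ./ckpt ./tokenizer.model ./tensorboard $tp $pp $nodes"
}
```

For example, `print_launch_command 65B 1 1` prints a command with a tensor-parallel size of 8. The total GPU count across all nodes has to be a multiple of the value from `gpus_per_replica`, since the remaining factor becomes the data-parallel size.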

Reference

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

Model Card

See MODEL_CARD.md

License

See the LICENSE file.
