Finetuning on a custom dataset #366
Comments
@pankajtalk this works on my end; just want to make sure you have already installed llama-recipes, right?
@HamidShojanazeri The script invocation works fine for me if I do not specify the --dataset and --custom_dataset.file params; the samsum dataset is used in that case. Once I specify --dataset and --custom_dataset.file, I get the error I mentioned, i.e. "Unknown dataset: custom_dataset" from config_utils.py. Is there any param I can add to triage it further?
I think llama-recipes v0.0.1 (which seems to be the latest version) does not contain custom dataset support. I checked dataset_utils.py in the installed package and its dataset registry has no custom_dataset entry, whereas the one at https://github.com/facebookresearch/llama-recipes/blob/main/src/llama_recipes/utils/dataset_utils.py does.
As per https://pypi.org/project/llama-recipes/#history, the only release of llama-recipes was on Sep 7, 2023. Any plans to release a newer version with the latest code?
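To illustrate why this fails, here is a minimal sketch of the kind of registry lookup that produces "Unknown dataset: custom_dataset". The names below (DATASET_PREPROC_V001, generate_dataset_config) are simplified stand-ins for the real llama-recipes internals, and the registry contents are assumed to reflect an older release that predates custom-dataset support:

```python
# Hypothetical snapshot of the dataset registry in an older release
# (values stand in for the real preprocessing functions).
DATASET_PREPROC_V001 = {
    "samsum_dataset": "get_samsum_dataset",
    "grammar_dataset": "get_grammar_dataset",
    "alpaca_dataset": "get_alpaca_dataset",
}

def generate_dataset_config(name, registry):
    # Simplified mimic of the check in config_utils.py: the requested
    # --dataset name must already be a key in the preprocessing registry,
    # otherwise the run aborts before the custom file is ever loaded.
    if name not in registry:
        raise ValueError(f"Unknown dataset: {name}")
    return registry[name]
```

Under this model, passing --dataset custom_dataset to an install whose registry lacks that key raises the error before --custom_dataset.file is even consulted, which matches the reported behavior.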
@pankajtalk we are working on finalizing the release; in the meantime, can you please install from src?
I believe it should run Open Assistant.
System Info
Various versions
2024-01-10 08:35:17 - Successfully installed bitsandbytes-0.39.1 black-23.12.1 brotli-1.1.0 inflate64-1.0.0 llama-recipes-0.0.1 multivolumefile-0.2.3 pathspec-0.12.1 peft-0.6.0.dev0 py7zr-0.20.6 pybcj-1.0.2 pycryptodomex-3.19.1 pyppmd-1.0.0 pyzstd-0.15.9 texttable-1.7.0 tokenize-rt-5.2.0 tomli-2.0.1 torch-2.1.0+cu118 triton-2.1.0
Finetuning command being executed.
torchrun --nnode=4 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=10.0.1.14:29400 --rdzv_conf=read_timeout=600 examples/finetuning.py --dataset "custom_dataset" --custom_dataset.file "/mnt/scripts/custom_dataset.py" --enable_fsdp --use_peft --peft_method lora --pure_bf16 --mixed_precision --batch_size_training 1 --model_name $MODEL_NAME --output_dir /home/datascience/outputs --num_epochs 1 --save_model
Information
🐛 Describe the bug
I am using the command below to finetune the "Llama-2-7b-hf" model on a custom dataset. I have specified the --dataset and --custom_dataset.file params to finetuning.py.
torchrun examples/finetuning.py \
  --enable_fsdp \
  --dataset custom_dataset \
  --custom_dataset.file /mnt/scripts/custom_dataset.py \
  --use_peft \
  --peft_method lora \
  --pure_bf16 \
  --mixed_precision \
  --batch_size_training 1 \
  --model_name $MODEL_NAME \
  --output_dir /home/datascience/outputs \
  --num_epochs 1 \
  --save_model
However, I am running into the error below. Am I missing something?
Error logs
Expected behavior
Finetuning should work with custom dataset.
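For reference, the file passed via --custom_dataset.file is expected to expose a get_custom_dataset(dataset_config, tokenizer, split) entry point. Below is a minimal, self-contained sketch of such a file; the toy samples and the encode helper are purely illustrative assumptions, not the project's actual data pipeline:

```python
# Minimal sketch of a custom_dataset.py for llama-recipes.
# The loader looks for a `get_custom_dataset(dataset_config, tokenizer, split)`
# function in the file passed via --custom_dataset.file; everything else here
# (the toy samples, the encode step) is hypothetical.

def get_custom_dataset(dataset_config, tokenizer, split):
    # Hypothetical in-memory corpus; a real file would load your dataset here.
    samples = {
        "train": ["fine-tune me on this text.", "and on this one."],
        "validation": ["held-out example."],
    }.get(split, ["fine-tune me on this text.", "and on this one."])

    # Tokenize each sample into the fields a causal-LM trainer consumes:
    # input_ids, attention_mask, and labels (labels mirror input_ids here).
    def encode(text):
        ids = tokenizer.encode(text)
        return {"input_ids": ids, "attention_mask": [1] * len(ids), "labels": list(ids)}

    return [encode(s) for s in samples]
```

With a file shaped like this on the main branch (installed from source, since v0.0.1 lacks custom-dataset support), the --dataset custom_dataset run should locate and call the entry point.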