OOM while finetuning Starcoder #10

Open
ctoth opened this issue Jun 14, 2023 · 1 comment

Comments


ctoth commented Jun 14, 2023

I really appreciate you releasing this work.
I have been trying to do something similar with the original Starcoder finetuning code but have had a variety of issues.
Unfortunately, when I run this script on my own dataset (it's only around 6800 MOO verbs) I get a pretty rapid OOM on a machine with 8x A100 80gb cards.
At first I thought it was because I was trying to increase max_seq_size (I was hoping for 1024 tokens), but dropping it back to 512 gave me the same issue.
I then tried reducing the batch size to 1, but that also errored out with insufficient memory.
The only other thing I changed is the prompt, and only in minor ways: mostly adapting the language to my own and picking different columns out of my dataset.

Here is my run.sh:

#! /usr/bin/env bash
set -e # stop on first error
set -u # stop if any variable is unbound
set -o pipefail # stop if any command in a pipe fails

LOG_FILE="output.log"
export TRANSFORMERS_VERBOSITY=info # export so the setting reaches the torchrun-launched processes

get_gpu_count() {
  local gpu_count
  gpu_count=$(nvidia-smi -L | wc -l)
  echo "$gpu_count"
}

gpu_count=$(get_gpu_count)
echo "Number of GPUs: $gpu_count"

train() {
    local script="$1"
    shift 1
    local script_args="$@"
    
    if [ -z "$script" ] || [ -z "$script_args" ]; then
        echo "Error: Missing arguments. Please provide the script and script_args."
        return 1
    fi
    
    { torchrun --nproc_per_node="$gpu_count" "$script" $script_args 2>&1; } | tee -a "$LOG_FILE"
}


train train.py \
    --model_name_or_path "bigcode/starcoder" \
    --data_path ./verbs_augmented/verbs_augmented.jsonl \
    --bf16 True \
    --output_dir moocoder \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard" \
    --fsdp_transformer_layer_cls_to_wrap 'GPTBigCodeBlock' \
    --tf32 True
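
For reference, a quick back-of-the-envelope on what one optimizer step covers with the flags above (just restating those values; assumes all 8 GPUs are used and max_seq_size stays at 512):

# Illustrative arithmetic only, using the values from the command above.
gpus=8; per_device_batch=2; grad_accum=16; seq_len=512
echo "sequences per optimizer step: $(( gpus * per_device_batch * grad_accum ))"            # 256
echo "tokens per optimizer step:    $(( gpus * per_device_batch * grad_accum * seq_len ))"  # 131072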

Any idea what might be going wrong here/can I give you any more info to help me figure this out?

@minosvasilias
Owner

Hey @ctoth, thank you, glad it's useful!

The only thing I noticed about your training params is the lack of an auto_wrap option in the --fsdp argument (see godot_dodo_4x_60k_starcoder_15b_3ep, Transformers docs).

Could you try adding that and report back? The rest all looks correct to me, and I was using the same hardware for my training runs.
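
For example (a sketch of the suggested change; the space-separated FSDP options follow the Transformers Trainer convention), the relevant lines of run.sh would become:

    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'GPTBigCodeBlock' \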
