Medusa Training Loss #95

Open
TomYang-TZ opened this issue Apr 7, 2024 · 5 comments
Comments

@TomYang-TZ

When training with Axolotl, the training loss drops to 0 after the first gradient accumulation steps. Is this expected behaviour?
[screenshot: Axolotl training loss log]

With Torchrun, the training loss consistently remains NaN.
[screenshot: Torchrun training loss log showing NaN]

Thanks for the help!! Here is the training configuration:
base_model: teknium/OpenHermes-2.5-Mistral-7B
base_model_config: teknium/OpenHermes-2.5-Mistral-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: false

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: ShareGPT_Vicuna_unfiltered/ShareGPT_V4.3_unfiltered_cleaned_split.json
    type: sharegpt

dataset_prepared_path:
val_set_size: 0.1
output_dir: ./openhermes7B_medusa_stage1

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
use_reentrant: True

warmup_steps: 40
eval_steps: 0.01
evaluation_strategy: steps
save_strategy: steps
save_steps:
save_total_limit: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: "<|im_end|>"
  unk_token: ""

medusa_num_heads: 5
medusa_num_layers: 1
medusa_heads_coefficient: 0.2
medusa_decay_coefficient: 0.8
medusa_logging: true
medusa_scheduler: constant
medusa_lr_multiplier: 4.0
medusa_only_heads: true
ddp_find_unused_parameters: true
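
For reference, here is a rough sketch of how medusa_heads_coefficient and medusa_decay_coefficient typically weight the per-head losses. This is not the exact Axolotl/Medusa training code; the shift offsets and the decay exponent are assumptions based on the Medusa setup, and only the parameter names come from the config above. The comment about fully masked labels is relevant because that case yields exactly the NaN symptom described above.

# Hedged sketch of Medusa per-head loss weighting; shapes, shift offsets and
# the decay exponent are illustrative assumptions, not the Axolotl/Medusa code.
import torch
import torch.nn.functional as F

def medusa_head_loss(head_logits, labels, heads_coefficient=0.2, decay_coefficient=0.8):
    # head_logits: list of (batch, seq, vocab) tensors, one per Medusa head
    # labels: (batch, seq) LongTensor with -100 at masked (prompt/padding) positions
    total = 0.0
    for k, logits in enumerate(head_logits, start=1):
        # Head k predicts the token k+1 positions ahead, so shift before the CE.
        shift_logits = logits[:, : -(k + 1), :]
        shift_labels = labels[:, k + 1 :]
        # If every shifted label is -100 (a fully masked sample or batch),
        # cross_entropy with the default mean reduction returns NaN.
        loss_k = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,
        )
        # Each head's loss is scaled by the head coefficient and a decayed weight.
        total = total + loss_k * heads_coefficient * decay_coefficient ** k
    return total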

@vivekmadan2

I am also facing the same issue with the Mistral example listed in the repo.

@FatPigeorz

Same issue here.

@xiaoruirui356

Have you solved this problem?

@TomYang-TZ
Author

Unfortunately, no.

@xiaoruirui356

I found some problems with the data; you might want to check it.
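
For example, a quick check along these lines can surface samples with no trainable targets. The "conversations"/"from"/"value" field names are the common ShareGPT layout and may need adjusting for your copy of the file; this is a sketch, not an official Axolotl/Medusa tool.

# Hedged sanity check for the ShareGPT JSON referenced in the config above.
import json

path = "ShareGPT_Vicuna_unfiltered/ShareGPT_V4.3_unfiltered_cleaned_split.json"
with open(path) as f:
    data = json.load(f)

suspect = []
for i, sample in enumerate(data):
    convs = sample.get("conversations") or []
    has_assistant_text = any(
        turn.get("from") in ("gpt", "assistant") and str(turn.get("value", "")).strip()
        for turn in convs
    )
    # With train_on_inputs: false, samples without any non-empty assistant turn
    # end up with fully masked labels, which can drive the logged loss to 0 or NaN.
    if not has_assistant_text:
        suspect.append(i)

print(f"{len(suspect)} of {len(data)} samples have no trainable assistant text")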
