
Pre-training with MPT-7B went well but fine-tuning it further gives garbled/random outputs #1474

chanangad commented Apr 30, 2024

Discussion

After a few bug fixes, I ran the pre-training code using the mosaicml/mpt-7b model.

The pre-training script I used:
```bash
deepspeed train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path mpt-7b \
    --version mpt \
    --data_path LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
    --image_folder LLaVA-Pretrain/images \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-mpt-7b-vit-l-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 1 \
    --lazy_preprocess True \
    --report_to wandb
```
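As a quick sanity check before the fine-tuning stage reuses the projector from this run, I inspect the saved `mm_projector.bin` (a minimal sketch of my own; the path comes from `--output_dir` above, and the two-Linear-layer key layout is just what I expect from `--mm_projector_type mlp2x_gelu`):

```python
import torch

# Inspect the projector checkpoint written by the pre-training run above.
# With mlp2x_gelu I expect two Linear layers, i.e. keys ending in
# .0.weight/.0.bias and .2.weight/.2.bias.
ckpt = torch.load(
    "./checkpoints/llava-mpt-7b-vit-l-pretrain/mm_projector.bin",
    map_location="cpu",
)
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape), str(tensor.dtype))
```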
I then ran inference on this pre-trained model using the following script:

```bash
python -m llava.serve.cli \
    --model-base mpt-7b \
    --model-path ./checkpoints/llava-mpt-7b-vit-l-pretrain/ \
    --image-file "https://cdn.pixabay.com/photo/2024/02/28/07/42/european-shorthair-8601492_1280.jpg" \
    --temperature 0.1
```

The output looks as follows, which is good enough for the pre-training stage:
[Screenshot: CLI output of the pre-trained model describing the test image]

After this, I ran the following instruction-tuning script:

```bash
deepspeed train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path mpt-7b \
    --version mpt \
    --data_path LLaVA-InTune/llava_v1_5_mix665k.json \
    --image_folder LLaVA-InTune/ \
    --vision_tower openai/clip-vit-large-patch14 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-mpt-7b-vit-l-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-mpt-7b-vit-l-lora-fulldata \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --image_aspect_ratio pad \
    --group_by_modality_length True
```
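After this run I also check what the output directory contains (a sketch only; the `non_lora_trainables.bin` filename is my assumption based on the upstream LLaVA training code, with the adapter files coming from PEFT):

```python
import os
import torch

out_dir = "./checkpoints/llava-mpt-7b-vit-l-lora-fulldata"

# List what the LoRA run wrote (I expect adapter_model/adapter_config from PEFT
# plus non_lora_trainables.bin holding the fine-tuned mm_projector weights).
print(sorted(os.listdir(out_dir)))

non_lora = torch.load(
    os.path.join(out_dir, "non_lora_trainables.bin"), map_location="cpu"
)
print([k for k in non_lora if "mm_projector" in k])
```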

The training loss graph looked as follows:
[Screenshot: instruction-tuning training loss curve]

However, when I try to run inference on this saved checkpoint using this script:
```bash
python -m llava.serve.cli \
    --model-base mpt-7b \
    --model-path ./checkpoints/llava-mpt-7b-vit-l-lora-fulldata/ \
    --image-file "https://as1.ftcdn.net/v2/jpg/06/05/37/40/1000_F_605374009_hEUHatmKPzuHTIacg7rLneAgnLHUgegM.jpg" \
    --temperature 0.1
```

I get very random output, as shown below. (The Chinese text is generic boilerplate, roughly: "Docker is an open-source container technology that packages all of an application's dependencies and configuration into a single container so it can run on different operating systems.")

```
[2024-04-30 06:35:20,791] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
You are using a model of type mpt to instantiate a model of type llava_mpt. This is not supported for all configurations of models and can yield errors.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:05<00:00,  2.63s/it]
Some weights of LlavaMptForCausalLM were not initialized from the model checkpoint at /mnt/localssd/mpt-7b-test and are newly initialized: ['transformer.mm_projector.0.bias', 'transformer.mm_projector.0.weight', 'transformer.mm_projector.2.bias', 'transformer.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
user: describe this image
assistant:
[![Docker](https party/docker/images/docker.png)

Docker 容器技朝

前言

Docker 是一个开源的容器技朝,它可以将一个应用程序的所有依赖项和配置文件打包在一个单一的容器中,从而可以在不同的操作系统上运行。

.........
```
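The "newly initialized" warning above means the projector starts out random in the loaded model and must be overwritten by the "Loading additional LLaVA weights..." step. Here is a diagnostic sketch I am using to check that (my own code, not from the run above; the `non_lora_trainables.bin` filename and the `load_pretrained_model` call are assumptions based on the upstream LLaVA code, and I match tensors by shape because key prefixes differ between the PEFT checkpoint and the merged model):

```python
import torch
from llava.model.builder import load_pretrained_model

model_path = "./checkpoints/llava-mpt-7b-vit-l-lora-fulldata"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, "mpt-7b", "llava-mpt-7b-vit-l-lora-fulldata"
)

# Projector tensors saved by the LoRA run vs. those in the loaded/merged model.
saved = torch.load(f"{model_path}/non_lora_trainables.bin", map_location="cpu")
saved_proj = {k: v for k, v in saved.items() if "mm_projector" in k}
loaded_proj = {k: v for k, v in model.state_dict().items() if "mm_projector" in k}

# Compare norms as a quick sanity check that the fine-tuned weights were applied.
for sk, sv in saved_proj.items():
    for lk, lv in loaded_proj.items():
        if lv.shape == sv.shape:
            print(sk, "->", lk,
                  "| saved norm:", sv.float().norm().item(),
                  "| loaded norm:", lv.float().cpu().norm().item())
```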

Can anyone help me understand why this is happening and how I can resolve it?
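One thing I still plan to try, to rule out a checkpoint-loading problem, is merging the LoRA weights into a standalone model and pointing the CLI at it without `--model-base` (a sketch only; I am assuming `load_pretrained_model` merges the adapter, as the "Merging LoRA weights..." line in the log suggests, and the merged output directory name is hypothetical):

```python
from llava.model.builder import load_pretrained_model

# Load base + LoRA checkpoint (the loader prints "Merging LoRA weights..."),
# then save a standalone merged model that the CLI can load directly.
model_path = "./checkpoints/llava-mpt-7b-vit-l-lora-fulldata"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, "mpt-7b", "llava-mpt-7b-vit-l-lora-fulldata"
)

merged_dir = "./checkpoints/llava-mpt-7b-vit-l-merged"  # hypothetical output dir
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```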
