Training the example code but Crashed #22

Open
HildaM opened this issue Feb 11, 2024 · 2 comments

Comments

@HildaM

HildaM commented Feb 11, 2024

I am training on a 4090 graphics card, but the run crashes partway through. It always crashes at step 50.
Example code:

python MotionDirector_train.py --config ./configs/config_single_video.yaml

Error Output:

(motiondirector) PS D:\Coding\AILearning\AI_Art_Technology_Demo\MotionDirector> python MotionDirector_train.py --config ./configs/config_single_video.yaml
Initializing the conversion map
D:\Applications\Miniconda3\envs\motiondirector\lib\site-packages\accelerate\accelerator.py:359: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
02/11/2024 14:26:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'rescale_betas_zero_snr', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into UNet3DConditionModel.
{'rescale_betas_zero_snr', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
Caching Latents.:   0%|                                                                                                                       | 0/1 [00:00<?, ?it/s]D:\Applications\Miniconda3\envs\motiondirector\lib\site-packages\diffusers\models\attention_processor.py:1129: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  hidden_states = F.scaled_dot_product_attention(
{'rescale_betas_zero_snr', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:10<00:00,  4.91it/s]
Caching Latents.: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.88s/it]
02/11/2024 14:26:36 - INFO - __main__ - ***** Running training *****
02/11/2024 14:26:36 - INFO - __main__ -   Num examples = 1
02/11/2024 14:26:36 - INFO - __main__ -   Num Epochs = 150
02/11/2024 14:26:36 - INFO - __main__ -   Instantaneous batch size per device = 1
02/11/2024 14:26:36 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
02/11/2024 14:26:36 - INFO - __main__ -   Gradient Accumulation steps = 1
02/11/2024 14:26:36 - INFO - __main__ -   Total optimization steps = 150
Steps:  33%|███████████████████████████████████████▋                                                                               | 50/150 [00:46<01:31,  1.10it/s]
{'rescale_betas_zero_snr', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
(motiondirector) PS D:\Coding\AILearning\AI_Art_Technology_Demo\MotionDirector>
@HildaM
Author

HildaM commented Feb 11, 2024

I noticed that when training reaches step 50, memory usage climbs to nearly 32 GB, but my PC only has 32 GB of memory.
Does this mean that training the LoRA requires more than 32 GB of memory?
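
One way to confirm that host RAM, and not GPU memory, is what runs out near step 50 is to print memory usage from inside the training process. The snippet below is only a minimal sketch: it assumes the third-party psutil package is installed, and the helper names `format_gib` and `memory_report` are made up for illustration and are not part of MotionDirector.

```python
import os

try:
    import psutil  # third-party; assumed installed with `pip install psutil`
except ImportError:
    psutil = None


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


def memory_report():
    """Summarize RAM used by this process and by the whole system.

    Returns None when psutil is not available.
    """
    if psutil is None:
        return None
    rss = psutil.Process(os.getpid()).memory_info().rss  # resident memory of this process
    vm = psutil.virtual_memory()                          # system-wide RAM statistics
    return (
        f"process RSS {format_gib(rss)}, "
        f"system used {format_gib(vm.used)} of {format_gib(vm.total)}"
    )


if __name__ == "__main__":
    print(memory_report() or "psutil is not installed")
```

Calling `memory_report()` every few steps in the training loop would show whether usage grows steadily or jumps just before the process exits.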

@ruizhaocv
Collaborator

I haven't run into this issue; I used a GPU with 24 GB of memory. Could you please provide more information? For example, what value did you set for "validation_steps"?
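
To answer the "validation_steps" question quickly, the step-related settings can be pulled out of the config file used in the training command. This is a rough sketch, not a full YAML parser: it only scans for lines whose key ends in "steps", and it assumes such keys appear one per line as `key: value`. The function name `find_step_settings` is hypothetical. In the log above, the scheduler config notice is printed again right after step 50, which may mean a validation pass starts there, but that is a guess.

```python
import re
import sys

# Matches uncommented lines such as "validation_steps: 50" (any indentation).
STEP_KEY = re.compile(r"^\s*([A-Za-z_]*steps)\s*:\s*(\S+)")


def find_step_settings(config_text):
    """Return (key, value) pairs for every uncommented key ending in 'steps'."""
    settings = []
    for line in config_text.splitlines():
        match = STEP_KEY.match(line)
        if match:
            settings.append((match.group(1), match.group(2)))
    return settings


if __name__ == "__main__":
    # Default path is the one from the command in this issue.
    path = sys.argv[1] if len(sys.argv) > 1 else "./configs/config_single_video.yaml"
    with open(path, encoding="utf-8") as handle:
        for key, value in find_step_settings(handle.read()):
            print(f"{key} = {value}")
```

If a reported value matches the step at which the run stops, temporarily raising it is a cheap way to test whether validation is what triggers the memory spike.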
