Issues: microsoft/DeepSpeed
[BUG] When using 'DS_BUILD_FUSED_ADAM=1 pip3 install deepspeed', it can't install fused_adam
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#5554
opened May 21, 2024 by
MinFFFF
[BUG] Cannot replace pytorch.checkpoint with deepspeed.runtime.activation_checkpointing.checkpointing in accelerate
bug
Something isn't working
training
#5550
opened May 20, 2024 by
vkaul11
[BUG] Code blocking when training on multi-nodes using DS-Chat.
bug
Something isn't working
training
#5548
opened May 20, 2024 by
cai-jianfeng
[BUG] deepspeed overlap_comm data race
bug
Something isn't working
training
#5545
opened May 18, 2024 by
yangyihang-bytedance
[Question] How to run Mixtral inference on multiple nodes?
bug
Something isn't working
inference
#5544
opened May 17, 2024 by
leachee99
[REQUEST] DeepSpeed-Ulysses with the Pure Deepspeed Zero
enhancement
New feature or request
#5542
opened May 16, 2024 by
ppengtang
[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training
bug
Something isn't working
training
#5539
opened May 15, 2024 by
Coobiw
[BUG] Version >0.14.0 leads to RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
bug
Something isn't working
training
#5538
opened May 15, 2024 by
pacman100
[BUG] FlopsProfiler upsample flops compute bug
bug
Something isn't working
training
#5537
opened May 15, 2024 by
xgbj
[BUG] CUDA error in pipeline parallel
bug
Something isn't working
training
#5536
opened May 15, 2024 by
sunkun1997
[BUG] fp_quantizer is not correctly built when non-jit installation
bug
Something isn't working
inference
#5535
opened May 14, 2024 by
twaka
[BUG] AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
bug
Something isn't working
compression
#5534
opened May 14, 2024 by
harborsarah
[BUG] Zero3: Post backward hook is not triggered for submodules whose inputs have .required_grad=False
bug
Something isn't working
training
#5524
opened May 12, 2024 by
deepcharm
[BUG] Why were the results inconsistent in two identical tests with config zero2 + overlap_comm?
bug
Something isn't working
training
#5523
opened May 11, 2024 by
Suparjie
[BUG] Why does ZeroOneAdam hit OOM more easily than the Adam optimizer?
bug
Something isn't working
training
#5521
opened May 10, 2024 by
npuichigo
[BUG] BertLMHeadModel.from_pretrained hangs when using zero-3 / zero3-offload
bug
Something isn't working
training
#5520
opened May 10, 2024 by
XenonLamb
[BUG] Uneven work distribution caused by get_shard_size changes
#5515
opened May 9, 2024 by
oelayan7
[BUG] When initializing model_engine, if an mpu is specified, it can lead to an excessively large checkpoint size, and the checkpoint may not be convertible through the zero_to_fp32.py script.
bug
Something isn't working
training
#5514
opened May 9, 2024 by
Kwen-Chen
[REQUEST] Launcher mode with SSH bypass
enhancement
New feature or request
#5510
opened May 8, 2024 by
dogacancolak-kensho
[BUG] Mismatch between dtype settings in model and ds_config results in NaN loss
bug
Something isn't working
training
#5509
opened May 8, 2024 by
Taiki-azrs
[REQUEST] Enable both CPU and NVMe for optimizer
enhancement
New feature or request
#5508
opened May 8, 2024 by
shanhx2000
[BUG] Unexpected High Memory Usage (OOM) when finetuning Llama2-7B
bug
Something isn't working
training
#5507
opened May 8, 2024 by
shanhx2000