
[gemini] async grad chunk reduce (all-reduce&reduce-scatter) #5713

Merged: 15 commits merged into hpcaitech:main on May 24, 2024

Conversation

@botbw (Contributor) commented on May 13, 2024

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

This PR makes Gemini's per-chunk gradient reduction asynchronous, for both the all-reduce and reduce-scatter code paths, so the collective overlaps with the remaining backward computation. It is exercised through the --async-reduce flag in the llama benchmark; the trace and benchmark numbers are in the follow-up comment, and a sketch of the general pattern follows below.
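For readers unfamiliar with the technique, here is a minimal sketch of the overlap pattern described above, written against plain torch.distributed (PyTorch >= 1.13 for reduce_scatter_tensor). It illustrates the general idea only, not the actual Gemini code path; reduce_chunk_async, flush_pending, and the pending list are hypothetical names for illustration.

import torch
import torch.distributed as dist

# In-flight reductions: (work handle, buffer to finalize) pairs.
pending = []

def reduce_chunk_async(chunk, world_size, use_reduce_scatter):
    # Launch the collective without blocking; NCCL runs it on its own
    # stream while the backward kernels keep executing.
    if use_reduce_scatter:
        # ZeRO-style: each rank keeps only its shard of the reduced grads.
        # Assumes chunk.numel() is padded to a multiple of world_size.
        shard = torch.empty(chunk.numel() // world_size,
                            dtype=chunk.dtype, device=chunk.device)
        work = dist.reduce_scatter_tensor(shard, chunk,
                                          op=dist.ReduceOp.SUM, async_op=True)
        pending.append((work, shard))
    else:
        work = dist.all_reduce(chunk, op=dist.ReduceOp.SUM, async_op=True)
        pending.append((work, chunk))

def flush_pending(world_size):
    # Call once before the optimizer step: wait on every in-flight
    # reduction, then average the summed gradients.
    for work, buf in pending:
        work.wait()
        buf.div_(world_size)
    pending.clear()

In the real integration the wait happens wherever Gemini consumes the gradients; the benchmark comment below shows the measured effect of the overlap.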

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🥰 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@botbw (Contributor, Author) commented on May 14, 2024

Trace

[trace screenshots: previous vs. now]

Benchmark

Before (without --async-reduce):

colossalai run --nproc_per_node 8 --hostfile hosts.txt benchmark.py -g -x -b 4 -s 100

num_samples: 392, dp_world_size: 8, flop_megatron: 9.7808637396779e+16, flop: 86555325938794496, avg_duration: 567.2686767578125, avg_throughput: 5.528244601700203
Throughput: 5.53 samples/sec, TFLOPS per GPU by Megatron: 172.42, TFLOPS per GPU: 152.58
Max CUDA memory usage: 53503.48 MB

Now (with --async-reduce):

colossalai run --nproc_per_node 8 --hostfile hosts.txt benchmark.py -g -x -b 4 -s 100 --async-reduce

num_samples: 392, dp_world_size: 8, flop_megatron: 9.7808637396779e+16, flop: 86555325938794496, avg_duration: 562.5335083007812, avg_throughput: 5.574779019782774
Throughput: 5.57 samples/sec, TFLOPS per GPU by Megatron: 173.87, TFLOPS per GPU: 153.87
Max CUDA memory usage: 53504.40 MB
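The logged fields are internally consistent. Assuming avg_duration is wall-clock seconds over the measured steps (an inference from the numbers, not documented benchmark output), the derived metrics of the "now" run can be reproduced:

# Reproducing the logged metrics of the "now" run above.
num_samples, dp_world_size = 392, 8
avg_duration = 562.5335083007812        # assumed: seconds over measured steps
flop = 86555325938794496                # per-GPU FLOPs over the measured steps
flop_megatron = 9.7808637396779e16      # Megatron-style FLOP estimate

print(num_samples / avg_duration * dp_world_size)    # 5.57   samples/sec
print(flop / avg_duration / 1e12)                    # 153.87 TFLOPS per GPU
print(flop_megatron / avg_duration / 1e12)           # 173.87 TFLOPS per GPU (Megatron)

Net effect of --async-reduce in this run: throughput rises from 5.53 to 5.57 samples/sec (about 0.8%) at essentially unchanged peak memory (53503.48 MB vs. 53504.40 MB).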

@botbw marked this pull request as ready for review May 14, 2024 01:39
@botbw requested a review from a team as a code owner May 14, 2024 01:39
Review threads (now outdated and resolved): colossalai/zero/gemini/gemini_ddp.py, examples/language/llama/benchmark.py
@botbw self-assigned this May 16, 2024
@botbw added the gemini label (related to the gemini feature) May 16, 2024
@botbw marked this pull request as draft May 20, 2024 07:31
@botbw marked this pull request as ready for review May 20, 2024 09:32
@botbw marked this pull request as draft May 23, 2024 02:42
@botbw marked this pull request as ready for review May 23, 2024 13:43
@ver217 merged commit 2fc85ab into hpcaitech:main May 24, 2024
8 checks passed