
Releases: NVIDIA/Megatron-LM

NVIDIA Megatron Core 0.6.0

19 Apr 23:46
  • MoE (Mixture of Experts)
    • Performance optimization
      • Communication optimizations for multi-GPU and single-GPU setups
      • 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
      • GroupedMLP enhancement for Hopper
      • DP overlapping: supports overlapping computation with gradient reduction and parameter gathering
    • All-to-All based token dispatcher (see the sketch after this list)
    • Layer-wise logging for load-balancing loss
    • Improved expert parallel support, including the distributed optimizer
  • Distributed optimizer
  • RETRO
    • Data processing
  • BERT
    • Distributed checkpointing
  • Distributed checkpointing
    • PyTorch native distributed backend
    • Improved saving/loading speed
  • TensorRT-LLM Export
    • Integration with TensorRT Model Optimizer post-training quantization (PTQ)
    • Text generation driver to perform PTQ in Megatron-LM
    • Llama2 and Nemotron3-8B examples using the TensorRT-LLM unified build API to build engines after training
  • Several minor enhancements, bug fixes, and documentation updates
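
To make the All-to-All token dispatcher above concrete, here is a minimal sketch of all-to-all token dispatch for expert parallelism. It assumes an initialized `torch.distributed` default process group that spans exactly the expert-parallel ranks; `dispatch_tokens` and its arguments are illustrative names, not Megatron Core's actual API:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor, dest_rank: torch.Tensor, ep_size: int):
    """Illustrative all-to-all token dispatch for expert parallelism.

    tokens:    (num_tokens, hidden) local token embeddings
    dest_rank: (num_tokens,) expert-parallel rank owning each token's expert
    """
    # Sort tokens by destination rank so each peer's slice is contiguous.
    order = torch.argsort(dest_rank)
    tokens = tokens[order]

    # Number of tokens this rank sends to each peer.
    send_counts = torch.bincount(dest_rank, minlength=ep_size)

    # Exchange counts so every rank knows how many tokens it will receive.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # Exchange the tokens themselves in a single all-to-all.
    recv_tokens = tokens.new_empty((int(recv_counts.sum()), tokens.shape[1]))
    dist.all_to_all_single(
        recv_tokens,
        tokens,
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )
    # 'order' is needed later to un-permute expert outputs back to input order.
    return recv_tokens, order
```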

NVIDIA Megatron Core 0.5.0

22 Mar 16:44

Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: EP can be combined with other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token dropless architecture with Top-K routing (see the routing sketch after this list)
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
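
As a rough illustration of the Top-K routing and load-balancing bullets above, the sketch below implements a Top-K router with a Switch-Transformer-style auxiliary load-balancing loss. It follows the common formulation rather than Megatron Core's exact implementation, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def topk_route(hidden: torch.Tensor, gate_w: torch.Tensor, k: int = 2):
    """hidden: (tokens, d); gate_w: (d, num_experts).

    Returns per-token expert assignments, their gate weights, and the
    auxiliary load-balancing loss.
    """
    logits = hidden @ gate_w                      # (tokens, experts)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)  # chosen experts per token

    num_experts = gate_w.shape[1]
    # Fraction of tokens dispatched to each expert ...
    dispatch = F.one_hot(topk_idx, num_experts).float().sum(dim=1)
    f = dispatch.mean(dim=0)
    # ... times the mean router probability per expert, summed and scaled.
    p = probs.mean(dim=0)
    aux_loss = num_experts * (f * p).sum()
    return topk_idx, topk_probs, aux_loss
```

The auxiliary loss is minimized when tokens are spread evenly over experts, which is what the load-balancing bullet refers to; Z-loss (a penalty on the router logits' log-partition function) and Sinkhorn routing are alternative balancing mechanisms.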

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets

Parallelism

Performance

  • Activation offloading to CPU
  • RoPE and SwiGLU fusion (see the sketch after this list)
  • Sliding window attention (via Transformer Engine)
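
For reference, the SwiGLU activation targeted by the fusion computes the following in unfused form; a fused kernel produces the same result in a single pass over memory. This is a reference sketch, not Megatron Core's fused implementation:

```python
import torch
import torch.nn.functional as F

def swiglu(x: torch.Tensor, w_gate: torch.Tensor, w_up: torch.Tensor) -> torch.Tensor:
    """Unfused SwiGLU reference: SiLU(x @ w_gate) * (x @ w_up).

    The fusion bullet above refers to computing this gated product in a
    single kernel instead of three separate ops.
    """
    return F.silu(x @ w_gate) * (x @ w_up)
```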

General Improvements

  • Timers

NVIDIA Megatron Core 0.4.0

14 Dec 23:18

Key Features and Enhancements

Models

  • BERT
  • RETRO
  • T5

Parallelism

  • Mixture of Experts support for GPT
  • Model-parallel-efficient Distributed Data Parallel (DDP)
  • Context Parallel (2D Tensor Parallel) support
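
As a conceptual sketch of context parallelism, each rank below holds a slice of the sequence and gathers keys/values from the other context-parallel ranks before attending. Megatron Core's actual implementation is more sophisticated (it overlaps communication with compute and handles causal masking across shards), so treat this as illustrative only:

```python
import torch
import torch.distributed as dist

def context_parallel_attention(q, k, v, cp_group=None):
    """q, k, v: (local_seq, heads, dim) shards of the full sequence.

    All-gather K and V across context-parallel ranks so each rank attends
    over the full sequence while holding only its Q shard. Causal masking
    across shards is omitted for brevity.
    """
    cp_size = dist.get_world_size(cp_group)
    k_full = [torch.empty_like(k) for _ in range(cp_size)]
    v_full = [torch.empty_like(v) for _ in range(cp_size)]
    dist.all_gather(k_full, k, group=cp_group)
    dist.all_gather(v_full, v, group=cp_group)
    k_full = torch.cat(k_full, dim=0)
    v_full = torch.cat(v_full, dim=0)

    # Standard scaled dot-product attention over the gathered sequence,
    # with (heads, seq, dim) layout expected by the fused kernel.
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(0, 1), k_full.transpose(0, 1), v_full.transpose(0, 1)
    )
    return out.transpose(0, 1)
```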

Datasets

  • GPT Dataset
  • Blended Dataset

23.04

11 May 22:28
Merge branch 'pip_package' into 'main'

Add pip package for megatron.core

See merge request ADLR/megatron-lm!598

v2.5

11 Aug 17:52
Merge branch 'sc21' into 'main'

scripts for sc21

See merge request ADLR/megatron-lm!298