
Releases: NVIDIA/Megatron-LM

NVIDIA Megatron Core 0.6.0

19 Apr 23:46
  • MoE (Mixture of Experts)
    • Performance optimization
      • Communication optimizations for multi-GPU and single-GPU setups
      • 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
      • GroupedMLP enhancement for Hopper
      • DP overlapping: supports overlapping computation with gradient reduction and parameter gathering
    • All-to-All based token dispatcher (see the sketch after this list)
    • Layer-wise logging for load-balancing loss
    • Improved expert parallel support, including the distributed optimizer
  • Distributed optimizer
  • RETRO
    • Data processing
  • BERT
    • Distributed checkpointing
  • Distributed checkpointing
    • PyTorch native distributed backend
    • Improved saving/loading speed
  • TensorRT-LLM Export
    • Integration with TensorRT Model Optimizer post-training quantization (PTQ)
    • Text generation driver to perform PTQ in Megatron-LM
    • Llama2 and Nemotron3-8B examples using the TensorRT-LLM unified build API to build engines after training
  • Several minor enhancements, bug fixes, and documentation updates
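
To make the All-to-All token dispatcher above concrete, here is a minimal sketch of all-to-all token dispatch for expert parallelism. It assumes an initialized `torch.distributed` default process group that spans exactly the expert-parallel ranks; `dispatch_tokens` and its arguments are illustrative names, not Megatron Core's actual API:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor, dest_rank: torch.Tensor, ep_size: int):
    """Illustrative all-to-all token dispatch for expert parallelism.

    tokens:    (num_tokens, hidden) local token embeddings
    dest_rank: (num_tokens,) expert-parallel rank owning each token's expert
    """
    # Sort tokens by destination rank so each peer's slice is contiguous.
    order = torch.argsort(dest_rank)
    tokens = tokens[order]

    # Number of tokens this rank sends to each peer.
    send_counts = torch.bincount(dest_rank, minlength=ep_size)

    # Exchange counts so every rank knows how many tokens it will receive.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # Exchange the tokens themselves in a single all-to-all.
    recv_tokens = tokens.new_empty((int(recv_counts.sum()), tokens.shape[1]))
    dist.all_to_all_single(
        recv_tokens,
        tokens,
        output_split_sizes=recv_counts.tolist(),
        input_split_sizes=send_counts.tolist(),
    )
    # 'order' is needed later to un-permute expert outputs back to input order.
    return recv_tokens, order
```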

NVIDIA Megatron Core 0.5.0

22 Mar 16:44

Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: EP can be combined with other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token dropless architecture with Top-K routing (see the routing sketch after this list)
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
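
As a rough illustration of the Top-K routing and load-balancing bullets above, the sketch below implements a Top-K router with a Switch-Transformer-style auxiliary load-balancing loss. It follows the common formulation rather than Megatron Core's exact implementation, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def topk_route(hidden: torch.Tensor, gate_w: torch.Tensor, k: int = 2):
    """hidden: (tokens, d); gate_w: (d, num_experts).

    Returns per-token expert assignments, their gate weights, and the
    auxiliary load-balancing loss.
    """
    logits = hidden @ gate_w                      # (tokens, experts)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)  # chosen experts per token

    num_experts = gate_w.shape[1]
    # Fraction of tokens dispatched to each expert ...
    dispatch = F.one_hot(topk_idx, num_experts).float().sum(dim=1)
    f = dispatch.mean(dim=0)
    # ... times the mean router probability per expert, summed and scaled.
    p = probs.mean(dim=0)
    aux_loss = num_experts * (f * p).sum()
    return topk_idx, topk_probs, aux_loss
```

The auxiliary loss is minimized when tokens are spread evenly over experts, which is what the load-balancing bullet refers to; Z-loss (a penalty on the router logits' log-partition function) and Sinkhorn routing are alternative balancing mechanisms.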

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets

Parallelism

Performance

  • Activation offloading to CPU
  • RoPE and SwiGLU fusion (see the sketch after this list)
  • Sliding window attention (via Transformer Engine)
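
For reference, the SwiGLU activation targeted by the fusion computes the following in unfused form; a fused kernel produces the same result in a single pass over memory. This is a reference sketch, not Megatron Core's fused implementation:

```python
import torch
import torch.nn.functional as F

def swiglu(x: torch.Tensor, w_gate: torch.Tensor, w_up: torch.Tensor) -> torch.Tensor:
    """Unfused SwiGLU reference: SiLU(x @ w_gate) * (x @ w_up).

    The fusion bullet above refers to computing this gated product in a
    single kernel instead of three separate ops.
    """
    return F.silu(x @ w_gate) * (x @ w_up)
```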

General Improvements

  • Timers

NVIDIA Megatron Core 0.4.0

14 Dec 23:18

Key Features and Enhancements

Models

  • BERT
  • RETRO
  • T5

Parallelism

  • Mixture of Experts support for GPT
  • Model-parallel-efficient Distributed Data Parallel (DDP)
  • Context Parallel (2D Tensor Parallel) support
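
As a conceptual sketch of context parallelism, each rank below holds a slice of the sequence and gathers keys/values from the other context-parallel ranks before attending. Megatron Core's actual implementation is more sophisticated (it overlaps communication with compute and handles causal masking across shards), so treat this as illustrative only:

```python
import torch
import torch.distributed as dist

def context_parallel_attention(q, k, v, cp_group=None):
    """q, k, v: (local_seq, heads, dim) shards of the full sequence.

    All-gather K and V across context-parallel ranks so each rank attends
    over the full sequence while holding only its Q shard. Causal masking
    across shards is omitted for brevity.
    """
    cp_size = dist.get_world_size(cp_group)
    k_full = [torch.empty_like(k) for _ in range(cp_size)]
    v_full = [torch.empty_like(v) for _ in range(cp_size)]
    dist.all_gather(k_full, k, group=cp_group)
    dist.all_gather(v_full, v, group=cp_group)
    k_full = torch.cat(k_full, dim=0)
    v_full = torch.cat(v_full, dim=0)

    # Standard scaled dot-product attention over the gathered sequence,
    # with (heads, seq, dim) layout expected by the fused kernel.
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(0, 1), k_full.transpose(0, 1), v_full.transpose(0, 1)
    )
    return out.transpose(0, 1)
```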

Datasets

  • GPT Dataset
  • Blended Dataset

23.04

11 May 22:28
Merge branch 'pip_package' into 'main'

Add pip package for megatron.core

See merge request ADLR/megatron-lm!598

v2.5

11 Aug 17:52
Merge branch 'sc21' into 'main'

scripts for sc21

See merge request ADLR/megatron-lm!298