Releases: ml-explore/mlx

v0.14.1

31 May 19:34
0798824

🚀

v0.14.0

24 May 01:33
9f9cb7a

Highlights

  • Small-size build that JIT-compiles kernels and omits the CPU backend, resulting in a binary under 4 MB
    • Series of PRs 1, 2, 3, 4, 5
  • mx.gather_qmm, a quantized equivalent of mx.gather_mm, which speeds up MoE inference by ~2x (see the sketch after this list)
  • Grouped 2D convolutions
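
A minimal sketch of routing tokens through quantized expert weights with mx.gather_qmm. The argument names (rhs_indices, transpose, group_size, bits) are assumptions here, mirroring mx.gather_mm and mx.quantized_matmul:

```python
import mlx.core as mx

num_experts, d_out, d_in = 8, 128, 64
experts = [mx.random.normal((d_out, d_in)) for _ in range(num_experts)]

# Quantize each expert matrix, then stack into (num_experts, ...) arrays
w_q, scales, biases = (
    mx.stack(t)
    for t in zip(*(mx.quantize(w, group_size=64, bits=4) for w in experts))
)

x = mx.random.normal((4, 1, d_in))    # 4 tokens, one row each
rhs_indices = mx.array([0, 3, 3, 7])  # the expert chosen for each token

# One quantized matmul per token against its routed expert
# (argument names are assumptions, as noted above)
y = mx.gather_qmm(x, w_q, scales, biases, rhs_indices=rhs_indices,
                  transpose=True, group_size=64, bits=4)
mx.eval(y)
```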

Core

  • mx.conjugate
  • mx.conv3d and nn.Conv3d
  • List-based indexing
  • Started mx.distributed, which uses MPI (if installed) for communication across machines (see the sketch after this list)
    • mx.distributed.init
    • mx.distributed.all_gather
    • mx.distributed.all_reduce_sum
  • Support for conversion to and from DLPack
  • mx.linalg.cholesky on CPU
  • mx.quantized_matmul sped up for vector-matrix products
  • mx.trace
  • mx.block_masked_mm now supports floating point masks!
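
A minimal sketch of the new mx.distributed API over MPI; it is assumed that the group returned by mx.distributed.init exposes rank() and size():

```python
# Run with: mpirun -np 2 python example.py
import mlx.core as mx

world = mx.distributed.init()  # falls back to a single-process group without MPI
x = mx.ones((4,)) * world.rank()

# Sum the array across all processes
total = mx.distributed.all_reduce_sum(x)
mx.eval(total)
print(world.rank(), total)
```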

Fixes

  • Improved error messaging in eval
  • Added some missing docs
  • Fixed a scatter index bug
  • The extensions example now compiles and runs
  • Fixed a CPU copy bug with many dimensions

v0.13.1

17 May 03:52
6a9b584

🚀

v0.13.0

10 May 01:21
8bd6bfa

Highlights

  • Block sparse matrix multiply speeds up MoEs by >2x
  • Improved quantization algorithm should work well for all networks
  • Improved GPU command submission speeds up training and inference

Core

  • Bitwise ops added (see the sketch after this list):
    • mx.bitwise_[or|and|xor], mx.[left|right]_shift, and operator overloads
  • Groups added to Conv1d
  • Added mx.metal.device_info to better inform memory limits
  • Added resettable memory stats
  • mlx.optimizers.clip_grad_norm and mlx.utils.tree_reduce added
  • Add mx.arctan2
  • Unary ops now accept array-like inputs, i.e. one can do mx.sqrt(2)
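
A quick sketch of the new bitwise ops and scalar-friendly unary ops:

```python
import mlx.core as mx

a = mx.array([0b1100, 0b1010])
b = mx.array([0b0110, 0b0101])

mx.bitwise_and(a, b)  # equivalently a & b
mx.bitwise_or(a, b)   # equivalently a | b
mx.bitwise_xor(a, b)  # equivalently a ^ b
mx.left_shift(a, 1)   # equivalently a << 1

mx.arctan2(mx.array(1.0), mx.array(1.0))  # pi / 4
mx.sqrt(2)  # unary ops now accept array-like inputs
```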

Bugfixes

  • Fixed shape for slice update
  • Fixed a bug in quantize that used slightly wrong scales/biases
  • Fixed memory leak for multi-output primitives encountered with gradient checkpointing
  • Fixed conversion from other frameworks for all datatypes
  • Fixed index overflow for matmul with large batch size
  • Fixed initialization ordering that occasionally caused segfaults

v0.12.2

02 May 23:38
02a9fc7
Patch bump (#1067)

  • version
  • use 0.12.2

v0.12.0

25 Apr 21:31
82463e9

Highlights

  • Faster quantized matmul (sketched below)
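
A minimal sketch of a quantized vector-matrix product; the keyword names follow mx.quantize and mx.quantized_matmul, but treat the exact defaults as assumptions:

```python
import mlx.core as mx

x = mx.random.normal((1, 64))    # activation row vector
w = mx.random.normal((128, 64))  # weight matrix to quantize

# mx.quantize returns the packed weights plus per-group scales and biases
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

y = mx.quantized_matmul(x, w_q, scales, biases,
                        transpose=True, group_size=64, bits=4)
mx.eval(y)
```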

Core

  • mx.synchronize to wait for computation dispatched with mx.async_eval (see the sketch after this list)
  • mx.radians and mx.degrees
  • mx.metal.clear_cache to return to the OS the memory held by MLX as a cache for future allocations
  • Change quantization to always represent 0 exactly (relevant issue)
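
A small sketch of asynchronous evaluation and the new cache control:

```python
import mlx.core as mx

x = mx.random.normal((1024, 1024))
y = (x @ x.T).sum()

mx.async_eval(y)  # dispatch the computation without blocking
# ... overlap other Python-side work here ...
mx.synchronize()  # wait for everything dispatched so far

mx.metal.clear_cache()  # return MLX's cached buffers to the OS
```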

Bugfixes

  • Fixed quantization of a block with all 0s that produced NaNs
  • Fixed the len field in the buffer protocol implementation

v0.11.0

18 Apr 20:25
090ff65

Core

  • mx.block_masked_mm for block-level sparse matrix multiplication (see the sketch after this list)
  • Shared events for synchronization and asynchronous evaluation
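
A minimal sketch of block-masked matrix multiplication; mask_out has one entry per block_size × block_size tile of the output, and the keyword names are assumptions about the mx.block_masked_mm signature:

```python
import mlx.core as mx

block = 32
a = mx.random.normal((64, 64))
b = mx.random.normal((64, 64))

# Keep only the diagonal blocks of the 2x2 block grid of the output
mask_out = mx.eye(64 // block).astype(mx.bool_)

y = mx.block_masked_mm(a, b, block_size=block, mask_out=mask_out)
mx.eval(y)
```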

NN

  • nn.QuantizedEmbedding layer
  • nn.quantize for quantizing modules (see the sketch after this list)
  • gelu_approx now uses tanh for consistency with PyTorch
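
A minimal sketch of module quantization with nn.quantize; it is assumed to swap supported layers (e.g. Linear, Embedding) for quantized versions in place:

```python
import mlx.nn as nn

model = nn.Sequential(
    nn.Embedding(1024, 64),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Quantize supported submodules (dims must divide by group_size)
nn.quantize(model, group_size=64, bits=4)
```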

v0.10.0

11 Apr 19:53
d07e295

Highlights

  • Improvements for LLM generation
    • Reshapeless quant matmul/matvec
    • mx.async_eval
    • Async command encoding

Core

  • Slightly faster reshapeless quantized gemms
  • Option for precise softmax
  • mx.metal.start_capture and mx.metal.stop_capture for GPU debugging/profiling (see the sketch after this list)
  • mx.expm1
  • mx.std
  • mx.meshgrid
  • CPU-only mx.random.multivariate_normal
  • mx.cumsum (and other scans) for bfloat
  • Async command encoder with explicit barriers / dependency management
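
A short sketch of GPU capture for debugging/profiling; the trace path is illustrative, and capture typically needs to be enabled in the environment (e.g. MTL_CAPTURE_ENABLED=1):

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
mx.eval(a)

mx.metal.start_capture("mlx_trace.gputrace")  # writes a trace openable in Xcode
mx.eval(mx.softmax(a @ a.T, axis=-1))
mx.metal.stop_capture()
```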

NN

  • nn.Upsample now supports bicubic interpolation

Misc

  • Updated MLX Extension to work with nanobind

Bugfixes

  • Fix buffer donation in softmax and fast ops
  • Bug in layer norm vjp
  • Bug when initializing from lists with scalars
  • Bug in indexing
  • CPU compilation bug
  • Multi-output compilation bug
  • Fix stack overflow issues in eval and array destruction

v0.9.0

28 Mar 23:19
d8cb312

Highlights

  • Fast partial RoPE (used by Phi-2; see the sketch after this list)
  • Fast gradients for RoPE, RMSNorm, and LayerNorm
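
A minimal sketch of partial RoPE via mx.fast.rope, rotating only part of the head dimension as in Phi-2; the keyword arguments shown are assumptions about the mx.fast.rope signature:

```python
import mlx.core as mx

x = mx.random.normal((1, 8, 16, 64))  # (batch, heads, seq, head_dim)

# Rotate only the first 32 of 64 dimensions
y = mx.fast.rope(x, 32, traditional=False, base=10000.0, scale=1.0, offset=0)
mx.eval(y)
```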

Core

  • More overhead reductions
  • Partial fast RoPE (fast Phi-2)
  • Better buffer donation for copy
  • Type hierarchy and issubdtype
  • Fast VJPs for RoPE, RMSNorm, and LayerNorm

NN

  • Module.set_dtype
  • Chaining in nn.Module (model.freeze().update(…); see the sketch after this list)
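
A quick sketch of the new Module conveniences; mutating methods are assumed to return the module so calls chain:

```python
import mlx.core as mx
import mlx.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))

model.set_dtype(mx.float16)  # cast the floating point parameters

# Chaining: freeze() returns the module, so update() can follow directly
model.freeze().update(model.parameters())
```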

Bugfixes

  • Fix set item bugs
  • Fix scatter vjp
  • Check shape integer overflow on array construction
  • Fix bug with module attributes
  • Fix two bugs for odd shaped QMV
  • Fix GPU sort for large sizes
  • Fix bug in negative padding for convolutions
  • Fix bug in multi-stream race condition for graph evaluation
  • Fix random normal generation for half precision

v0.8.0

21 Mar 21:00
44390bd

Optimizers

  • Set minimum value in cosine decay scheduler (sketched below)
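
A minimal sketch of the cosine decay floor; the third argument is assumed to be the new minimum value of the schedule:

```python
import mlx.optimizers as optim

# Decay from 1e-2 over 1000 steps, floored at 1e-5
lr_schedule = optim.cosine_decay(1e-2, 1000, 1e-5)
optimizer = optim.SGD(learning_rate=lr_schedule)
```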

Bugfixes

  • Fix bug in multi-dimensional reduction