Fix sharded rms norm in MiniMax M2.5 by angeloskath · Pull Request #898 · ml-explore/mlx-lm · GitHub

angeloskath · 2026-02-16T04:11:57Z

MiniMax M2.5 sharding is currently broken because the QK norm is applied over the whole vector, which sharding splits across nodes.

This PR fixes it but introduces two extra communications per attention layer, which pretty much destroys the decoding performance improvement. I have a more complicated version that does one communication and overlaps it with some computation; it is better, but not by much. We may need something more fundamental here.

On the other hand, this is now correct, and prompt processing scales quite nicely: a 2.7x speedup across 4 nodes at 8k tokens. Add to that the fact that the KV cache is also 1/4 the size, and it still makes sense to shard for long agentic tasks or when using multiple subagents.
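To make the failure mode concrete, here is a minimal pure-Python sketch (not the PR's MLX code) of why an RMS norm over a sharded vector needs a cross-node reduction. The helper names `rms_norm` and `sharded_rms_norm_sim` are hypothetical, and a plain Python `sum` over the per-shard partial sums stands in for `mx.distributed.all_sum`. It assumes equal-sized shards, as the formula quoted in the review does.

```python
import math


def rms_norm(x, weight, eps):
    """Reference RMS norm over the whole (unsharded) vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def sharded_rms_norm_sim(shards, weight_shards, eps):
    """Simulated sharded RMS norm: each 'rank' holds one equal-sized shard."""
    # Each rank computes the sum of squares of its local slice only.
    local_sums = [sum(v * v for v in shard) for shard in shards]
    # Stand-in for the all-reduce: every rank ends up with the global sum.
    total = sum(local_sums)
    # Divide by the full dimension (local size times number of shards).
    full_dim = len(shards[0]) * len(shards)
    scale = 1.0 / math.sqrt(total / full_dim + eps)
    return [
        [v * scale * w for v, w in zip(shard, w_shard)]
        for shard, w_shard in zip(shards, weight_shards)
    ]


eps = 1e-6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weight = [0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0]

# Split the vector and the weight into 4 shards of 2 elements each.
shards = [x[i:i + 2] for i in range(0, len(x), 2)]
weight_shards = [weight[i:i + 2] for i in range(0, len(weight), 2)]

full = rms_norm(x, weight, eps)

# Incorrect: every shard normalizes by its own local statistics.
naive = [v for s, w in zip(shards, weight_shards) for v in rms_norm(s, w, eps)]

# Correct: the sum of squares is reduced across shards before scaling.
fixed = [v for s in sharded_rms_norm_sim(shards, weight_shards, eps) for v in s]
```

In this sketch `fixed` matches `full` while `naive` does not. The reduction of the partial sums is the extra communication that the description says hurts decoding performance.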

awni · 2026-02-16T14:58:42Z

+        return f"{self.weight.shape[0] * self.group.size()}, eps={self.eps}"
+
+    def __call__(self, x):
+        return sharded_rms_norm(self.group)(x, self["weight"], self.eps)


I think that will make a new function every time, so each call needs to be recompiled.

angeloskath · 2026-02-16

Oops, yeah. I imported lru_cache but forgot to add it.
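The caching issue discussed above can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: `make_norm_uncached` and `make_norm_cached` are hypothetical factories, an integer group size stands in for the distributed group object (which would need to be hashable to serve as an `lru_cache` key), and a counter stands in for the cost of rebuilding and recompiling the function.

```python
import math
from functools import lru_cache

# Counts how many times each factory actually builds a new function.
builds = {"uncached": 0, "cached": 0}


def make_norm_uncached(group_size):
    """Builds a fresh closure on every call (the problem raised in review)."""
    builds["uncached"] += 1

    def norm_scale(total_sum_sq, local_dim, eps=1e-6):
        return 1.0 / math.sqrt(total_sum_sq / (local_dim * group_size) + eps)

    return norm_scale


@lru_cache(maxsize=None)
def make_norm_cached(group_size):
    """Builds the closure once per distinct argument and reuses it afterwards."""
    builds["cached"] += 1

    def norm_scale(total_sum_sq, local_dim, eps=1e-6):
        return 1.0 / math.sqrt(total_sum_sq / (local_dim * group_size) + eps)

    return norm_scale


# Simulate three forward passes through the same layer.
for _ in range(3):
    make_norm_uncached(4)(204.0, 2)
    make_norm_cached(4)(204.0, 2)
```

After the loop, the uncached factory has built three separate functions while the cached one has built a single function and returned the same object each time. A compiler that keys on function identity can only reuse its work in the cached case.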

awni · 2026-02-16T22:06:02Z

+        norm2 = x.square().sum(-1, keepdims=True)
+        norm2 = mx.distributed.all_sum(norm2, group=group)
+        norm = mx.rsqrt(norm2 / (x.shape[-1] * group.size()) + eps)


Also, x should probably be upcast before the sum, for parity with mx.fast.rms_norm.
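As a rough illustration of why accumulation precision matters here, the sketch below simulates a naive sequential sum of squares in IEEE 754 half precision using only the standard library, and compares it with accumulating in a wider type. This is an assumption-laden toy: `to_half`, `sum_squares_half`, `sum_squares_upcast`, and `rms_scale` are hypothetical helpers, and real reduction kernels may order and accumulate their sums differently.

```python
import math
import struct


def to_half(v):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def sum_squares_half(x):
    """Sequential sum of squares, rounding to half precision after every step."""
    acc = 0.0
    for v in x:
        acc = to_half(acc + to_half(v * v))
    return acc


def sum_squares_upcast(x):
    """Sum of squares accumulated in a wider type (Python floats are doubles)."""
    return sum(float(v) * float(v) for v in x)


def rms_scale(sum_sq, dim, eps=1e-6):
    """The 1 / sqrt(mean(x^2) + eps) factor used by an RMS norm."""
    return 1.0 / math.sqrt(sum_sq / dim + eps)


# A vector of ones has a root mean square of exactly 1.
x = [1.0] * 4096

low = sum_squares_half(x)     # stalls once the running sum reaches 2048
high = sum_squares_upcast(x)  # exact

scale_low = rms_scale(low, len(x))
scale_high = rms_scale(high, len(x))
```

In half precision the spacing between representable values above 2048 is 2, so adding 1 no longer changes the running sum. The resulting scale is off by roughly a factor of the square root of 2, whereas the upcast version stays close to 1.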

awni

Thanks for fixing that!

Fix sharded rms norm in minimax

619a92c

angeloskath requested a review from awni February 16, 2026 04:12

awni reviewed Feb 16, 2026

Add forgotten lru_cache

d9566ae

awni reviewed Feb 16, 2026

awni approved these changes Feb 16, 2026

Upcast in normalization

d3060bd

angeloskath force-pushed the fix-m2.5-dist branch from 358cdca to d3060bd February 17, 2026 00:32

angeloskath merged commit d7b91e8 into main Feb 17, 2026
2 checks passed

angeloskath deleted the fix-m2.5-dist branch February 17, 2026 01:20

angeloskath merged 3 commits into main from fix-m2.5-dist
