Fix Kimi Linear by kernelpool · Pull Request #853 · ml-explore/mlx-lm · GitHub

kernelpool · 2026-02-06T23:28:02Z

This fixes several issues and improves the performance of Kimi Linear, e.g. mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit. I've tested the model in OpenCode and tool calling works fine. Note that the original HF repo is missing some tokens in its tokenizer config, but this seems to have since been fixed in the mlx-community models.

MLA absorption - similar to the DSV3 MLA change in #839
Fix ShortConv1d variable name - cache.lengths -> lengths
Fix ArraysCache compatibility - store conv states as a single concatenated array instead of a tuple (the cache behavior changed in #690, "Make MambaCache compatible with batch generation for nemotron-h")
Remove RoPE handling - the model should not use it, per the paper: "In Kimi Linear, we apply NoPE to all full attention (MLA) layers."
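
For readers unfamiliar with the first item: "absorption", as it is usually described for DeepSeek-style MLA, means folding the key up-projection into the query so that attention scores can be computed directly against the cached latents, without up-projecting every cached position. The sketch below only illustrates that identity in plain Python for a single head with tiny made-up dimensions. The names `w_uk`, `latents`, `scores_naive` and `scores_absorbed` are hypothetical and are not taken from the mlx-lm code.

```python
# Illustrative sketch only: single head, no softmax, no scaling.
# All names and values here are assumptions, not the mlx-lm implementation.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical key up-projection: latent_dim=2 -> head_dim=3.
w_uk = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
# Cached compressed latents, one per past position.
latents = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
# Query for the current position (head_dim=3).
q = [1.0, -2.0, 0.5]

def scores_naive(q, w_uk, latents):
    # Up-project every cached latent to a full key, then take q . k.
    return [dot(q, matvec(w_uk, c)) for c in latents]

def scores_absorbed(q, w_uk, latents):
    # Project the query into latent space once, then score against latents.
    q_latent = matvec(transpose(w_uk), q)
    return [dot(q_latent, c) for c in latents]

naive = scores_naive(q, w_uk, latents)
absorbed = scores_absorbed(q, w_uk, latents)
print(naive, absorbed)
```

Both functions produce the same scores because q · (W c) = (Wᵀ q) · c. The absorbed form does one projection per query instead of one per cached position, which is consistent with the generation speedup growing with context length in the table below.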

Context	Original Prompt TPS	Fixed Prompt TPS	Original Gen TPS	Fixed Gen TPS	Gen TPS Change
1k	1,485	2,185	99.2	93.8	-5%
2k	2,163	2,758	93.0	96.8	+4%
4k	2,427	2,771	89.4	95.2	+6%
8k	2,494	2,698	83.3	94.0	+13%
16k	2,374	2,484	71.4	93.3	+31%
32k	2,018	2,095	57.0	89.6	+57%

awni · 2026-02-07T00:46:29Z


        if cache is not None:
-            cache[0] = (q_state, k_state, v_state)
+            cache[0] = mx.concatenate([q_state, k_state, v_state], axis=-1)


It seems a bit wasteful to concatenate and split these just to store them in the cache. Wdyt about just changing the cache size so that it holds 4 arrays instead (1 for the ssm, 3 for the conv)?

kernelpool

Yep, that makes sense 👍
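
The two storage layouts discussed in this thread can be compared with a small stand-in. This is a sketch in plain Python, assuming a fixed-size, index-addressable cache. `SlotCache` is a hypothetical class, lists stand in for arrays, and the slot order (conv states in 0 to 2, SSM state in 3) is an assumption for illustration rather than the layout used in the PR.

```python
# Illustrative sketch only: SlotCache is a hypothetical stand-in, and plain
# lists stand in for arrays. The slot order is an assumption.

class SlotCache:
    """Fixed number of slots, addressed by index."""

    def __init__(self, size):
        self.slots = [None] * size

    def __getitem__(self, i):
        return self.slots[i]

    def __setitem__(self, i, value):
        self.slots[i] = value

# Layout A: pack the three conv states into one slot, split on load.
def store_packed(cache, q_state, k_state, v_state):
    cache[0] = q_state + k_state + v_state  # list concat stands in for mx.concatenate

def load_packed(cache, q_len, k_len):
    packed = cache[0]
    return packed[:q_len], packed[q_len:q_len + k_len], packed[q_len + k_len:]

# Layout B: one slot per conv state, no packing or splitting.
def store_slots(cache, q_state, k_state, v_state):
    cache[0], cache[1], cache[2] = q_state, k_state, v_state

def load_slots(cache):
    return cache[0], cache[1], cache[2]

q_state, k_state, v_state = [1, 2], [3, 4], [5, 6, 7]

packed_cache = SlotCache(size=2)   # 1 packed conv slot + 1 SSM slot
store_packed(packed_cache, q_state, k_state, v_state)
from_packed = load_packed(packed_cache, len(q_state), len(k_state))

slot_cache = SlotCache(size=4)     # 3 conv slots + 1 SSM slot
store_slots(slot_cache, q_state, k_state, v_state)
from_slots = load_slots(slot_cache)
print(from_packed, from_slots)
```

Both layouts return the same three states. Layout B avoids a copy on every store and a split on every load, at the cost of a larger cache size.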

awni

Looks awesome. I left one comment, let me know what you think. Otherwise it looks great and we should merge it!

kernelpool · 2026-02-07T01:21:36Z

Updated comparison

Context	Original Prompt TPS	Fixed Prompt TPS	Original Gen TPS	Fixed Gen TPS	Gen TPS Change
1k	1,485	2,220	99.2	99.2	0%
2k	2,163	2,772	93.0	100.3	+8%
4k	2,427	2,794	89.4	99.1	+11%
8k	2,494	2,709	83.3	97.4	+17%
16k	2,374	2,498	71.4	95.4	+34%
32k	2,018	2,104	57.0	90.7	+59%
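
The "Gen TPS Change" column is consistent with a plain relative change, (fixed - original) / original, rounded to the nearest percent. A minimal check in Python, using the generation figures from the updated table above (the variable and function names are illustrative):

```python
# Generation tokens/s from the updated comparison: (context, original, fixed).
rows = [
    ("1k", 99.2, 99.2),
    ("2k", 93.0, 100.3),
    ("4k", 89.4, 99.1),
    ("8k", 83.3, 97.4),
    ("16k", 71.4, 95.4),
    ("32k", 57.0, 90.7),
]

def percent_change(original, fixed):
    """Relative change from original to fixed, rounded to a whole percent."""
    return round((fixed - original) / original * 100)

changes = [percent_change(orig, fixed) for _, orig, fixed in rows]
for (context, _, _), change in zip(rows, changes):
    print(f"{context}: {change}%")
```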

awni · 2026-02-07T01:22:41Z

Very nice!

awni

Awesome! Will merge when tests clear.

Fix Kimi Linear

b049b38

awni reviewed Feb 7, 2026


kernelpool added 2 commits February 7, 2026 12:16

Avoid concat/split

8c9e476

Use fused rms_norm

d58a90d

awni approved these changes Feb 7, 2026


awni merged commit fd6959d into ml-explore:main Feb 7, 2026
2 checks passed

awni merged 3 commits into ml-explore:main from kernelpool:fix-kimi-linear
