Fix Kimi Linear #853

Merged: awni merged 3 commits into ml-explore:main from kernelpool:fix-kimi-linear on Feb 7, 2026

kernelpool (Contributor) commented Feb 6, 2026

This fixes several issues and improves the performance of Kimi Linear (e.g. mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit). I've tested the model in OpenCode, and tool calling works fine. Note that the original HF repo is missing some tokens in its tokenizer config, but this seems to have since been fixed for the mlx-community models.

  1. MLA absorption - similar to DSV3 MLA #839
  2. Fix ShortConv1d variable name - cache.lengths -> lengths
  3. Fix ArraysCache compatibility - store conv states as a single concatenated array instead of a tuple (the cache changed in "Make MambaCache compatible with batch generation for nemotron-h" #690)
  4. Remove RoPE handling - the model should not use it, per the paper: "In Kimi Linear, we apply NoPE to all full attention (MLA) layers."
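
For readers unfamiliar with item 1, here is a minimal sketch of the general idea behind MLA weight absorption. It is not the code from this PR: it uses NumPy rather than MLX, a single head, and made-up names and sizes (`c_kv`, `W_uk`, `W_uv`, `d_c`, `d_h` are all assumptions for illustration). It also assumes no positional rotation is applied to queries or keys, which matches the NoPE quote in item 4.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_c, d_h = 6, 8, 4  # cached tokens, latent dim, head dim (illustrative sizes)

c_kv = rng.standard_normal((T, d_c))    # compressed KV latents kept in the cache
W_uk = rng.standard_normal((d_c, d_h))  # key up-projection (hypothetical name)
W_uv = rng.standard_normal((d_c, d_h))  # value up-projection (hypothetical name)
q = rng.standard_normal(d_h)            # one decode-step query for one head


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Naive path: expand every cached latent into full keys and values first.
k = c_kv @ W_uk                                   # (T, d_h)
v = c_kv @ W_uv                                   # (T, d_h)
naive = softmax(k @ q / np.sqrt(d_h)) @ v         # (d_h,)

# Absorbed path: fold W_uk into the query and attend over the latents directly,
# then apply W_uv once to the weighted sum instead of once per cached token.
q_lat = W_uk @ q                                  # (d_c,)
weights = softmax(c_kv @ q_lat / np.sqrt(d_h))    # (T,)
absorbed = (weights @ c_kv) @ W_uv                # (d_h,)

print(np.allclose(naive, absorbed))
```

Both paths give the same output because matrix multiplication is associative; the absorbed path avoids materializing per-token keys and values, which is one plausible reason the benefit grows with context length.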

| Context | Original Prompt TPS | Fixed Prompt TPS | Original Gen TPS | Fixed Gen TPS | Gen TPS Change |
|---|---|---|---|---|---|
| 1k | 1,485 | 2,185 | 99.2 | 93.8 | -5% |
| 2k | 2,163 | 2,758 | 93.0 | 96.8 | +4% |
| 4k | 2,427 | 2,771 | 89.4 | 95.2 | +6% |
| 8k | 2,494 | 2,698 | 83.3 | 94.0 | +13% |
| 16k | 2,374 | 2,484 | 71.4 | 93.3 | +31% |
| 32k | 2,018 | 2,095 | 57.0 | 89.6 | +57% |
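
The "Gen TPS Change" column appears to be the relative change in generation tokens per second, rounded to a whole percent. The short sketch below recomputes it from the numbers in the first benchmark comparison; the helper name `pct_change` is made up for illustration, and the rounding rule is an assumption inferred from the reported values.

```python
# (context, original gen TPS, fixed gen TPS) taken from the first comparison.
rows = [
    ("1k", 99.2, 93.8),
    ("2k", 93.0, 96.8),
    ("4k", 89.4, 95.2),
    ("8k", 83.3, 94.0),
    ("16k", 71.4, 93.3),
    ("32k", 57.0, 89.6),
]


def pct_change(before: float, after: float) -> int:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


changes = {ctx: pct_change(orig, fixed) for ctx, orig, fixed in rows}
for ctx, delta in changes.items():
    print(f"{ctx}: {delta:+d}%")
```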

Comment thread on mlx_lm/models/kimi_linear.py (outdated)

  if cache is not None:
-     cache[0] = (q_state, k_state, v_state)
+     cache[0] = mx.concatenate([q_state, k_state, v_state], axis=-1)
Member

It seems a bit wasteful to concatenate and split these just to store them in the cache. Wdyt about just changing the cache size so that it holds 4 arrays instead (1 for the ssm, 3 for the conv)?
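
To make the trade-off concrete, here is a rough sketch of the two layouts under discussion. It uses NumPy and a plain Python list as stand-ins for MLX arrays and the cache object; the shapes, the slot order, and the placeholder `ssm_state` are assumptions for illustration, not the layout used in the PR.

```python
import numpy as np

# Hypothetical sizes: batch, conv taps kept, and q/k/v channel widths.
B, K, Dq, Dk, Dv = 1, 3, 4, 4, 6
rng = np.random.default_rng(0)
q_state = rng.standard_normal((B, K, Dq))
k_state = rng.standard_normal((B, K, Dk))
v_state = rng.standard_normal((B, K, Dv))
ssm_state = rng.standard_normal((B, 2, Dk, Dv))  # placeholder recurrent state

# Layout A (the outdated diff): pack the three conv states into one slot.
cache_a = [None, None]
cache_a[0] = np.concatenate([q_state, k_state, v_state], axis=-1)
cache_a[1] = ssm_state
# Reading back requires a split at the channel boundaries on every step.
q_a, k_a, v_a = np.split(cache_a[0], [Dq, Dq + Dk], axis=-1)

# Layout B (the review suggestion): four slots, so nothing is packed or split.
# Slot order is assumed here: three conv states, then the SSM state.
cache_b = [None] * 4
cache_b[0], cache_b[1], cache_b[2] = q_state, k_state, v_state
cache_b[3] = ssm_state
q_b, k_b, v_b = cache_b[0], cache_b[1], cache_b[2]

# Both layouts recover identical conv states; B just skips the extra copies.
same = all(
    np.array_equal(a, b) for a, b in [(q_a, q_b), (k_a, k_b), (v_a, v_b)]
)
print(same)
```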

Contributor Author

Yep, that makes sense 👍

awni (Member) left a comment

Looks awesome. I left one comment; let me know what you think. Otherwise it looks great and we should merge it!

kernelpool (Contributor Author) commented

Updated comparison

| Context | Original Prompt TPS | Fixed Prompt TPS | Original Gen TPS | Fixed Gen TPS | Gen TPS Change |
|---|---|---|---|---|---|
| 1k | 1,485 | 2,220 | 99.2 | 99.2 | 0% |
| 2k | 2,163 | 2,772 | 93.0 | 100.3 | +8% |
| 4k | 2,427 | 2,794 | 89.4 | 99.1 | +11% |
| 8k | 2,494 | 2,709 | 83.3 | 97.4 | +17% |
| 16k | 2,374 | 2,498 | 71.4 | 95.4 | +34% |
| 32k | 2,018 | 2,104 | 57.0 | 90.7 | +59% |

awni (Member) commented Feb 7, 2026

Very nice!

awni (Member) left a comment

Awesome! Will merge when tests clear.

awni merged commit fd6959d into ml-explore:main on Feb 7, 2026
2 checks passed