Fix Kimi Linear #853
Merged
awni
reviewed
Feb 7, 2026
      if cache is not None:
    -     cache[0] = (q_state, k_state, v_state)
    +     cache[0] = mx.concatenate([q_state, k_state, v_state], axis=-1)
Member
It seems a bit wasteful to concatenate and split these just to store them in the cache. Wdyt about just changing the cache size so that it holds 4 arrays instead (1 for the ssm, 3 for the conv)?
Contributor
Author
Yep, that makes sense 👍
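The suggestion in this thread can be illustrated with a minimal sketch. It is not the code from the PR: plain Python lists stand in for `mx.array` values, `SlotCache` and the four helper functions are hypothetical names, and the slot order (conv states in slots 0 to 2, SSM state in slot 3) is an assumption made only for illustration.

```python
# Hypothetical sketch: plain Python lists stand in for mx.array values,
# and SlotCache is an illustrative stand-in, not the mlx-lm cache class.


class SlotCache:
    """A fixed-size cache with one slot per stored array."""

    def __init__(self, size):
        self.slots = [None] * size

    def __getitem__(self, i):
        return self.slots[i]

    def __setitem__(self, i, value):
        self.slots[i] = value


def store_concatenated(cache, q_state, k_state, v_state):
    # Approach in the diff: pack the three conv states into slot 0.
    # List concatenation stands in for mx.concatenate(..., axis=-1).
    cache[0] = q_state + k_state + v_state


def load_concatenated(cache, q_len, k_len):
    # Reading back requires splitting at known offsets on every step.
    packed = cache[0]
    return packed[:q_len], packed[q_len:q_len + k_len], packed[q_len + k_len:]


def store_separate(cache, q_state, k_state, v_state):
    # Suggested approach: one slot per conv state, no packing.
    # Slot 3 is assumed to be reserved for the SSM state.
    cache[0], cache[1], cache[2] = q_state, k_state, v_state


def load_separate(cache):
    # No split is needed; each state is read directly from its slot.
    return cache[0], cache[1], cache[2]


q, k, v = [1, 2], [3, 4], [5, 6]

packed = SlotCache(2)  # 1 conv slot + 1 SSM slot
store_concatenated(packed, q, k, v)

separate = SlotCache(4)  # 3 conv slots + 1 SSM slot
store_separate(separate, q, k, v)

# Both layouts return the same states; the 4-slot one skips the round trip.
assert load_concatenated(packed, len(q), len(k)) == load_separate(separate)
```

Under these assumptions, the four-slot layout trades a slightly larger cache list for the removal of one concatenate and one split per step.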
awni
reviewed
Feb 7, 2026
Member
Looks awesome. I left one comment, let me know what you think. Otherwise it looks great and we should merge it!
Contributor
Author
Updated comparison
Member
Very nice!
awni
approved these changes
Feb 7, 2026
Member
Awesome! Will merge when tests clear.
This fixes some issues and improves the performance of Kimi Linear models, e.g. mlx-community/Kimi-Linear-48B-A3B-Instruct-4bit. I've tested the model in OpenCode, and tool calling works fine. Note that the original HF repo is missing some tokens in its tokenizer config, but this seems to have since been fixed in the mlx-community models.
cache.lengths->lengths