Fix Gemma 4 KV-shared layers creating unused projections by glyphVault · Pull Request #1158 · ml-explore/mlx-lm · GitHub

glyphVault · 2026-04-15T22:46:33Z

Summary

Gemma 4 E4B/E2B models share KV projections across later layers (num_kv_shared_layers). The Attention class was creating k_proj, v_proj, k_norm, and v_norm for all layers, but shared layers never use them — the forward pass routes KV from earlier layers via shared_kv.
This caused load_weights(strict=True) to fail for any Gemma 4 model saved through transformers (save_pretrained), since transformers correctly omits these weights for shared layers. This affects all derivative models: fine-tunes, merges, abliterations, etc.
Skip creating k_proj/v_proj/k_norm/v_norm for KV-shared layers, matching the transformers implementation.
Add a defensive ValueError if a shared layer somehow receives no shared_kv at runtime.
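The change described in this summary can be sketched in plain Python. This is a minimal illustration, not the code in gemma4_text.py: `Config`, `Proj`, and `o_proj` are hypothetical stand-ins, and the rule that the last `num_kv_shared_layers` layers are the shared ones is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical, trimmed-down config with only the fields this sketch needs.
    num_hidden_layers: int
    num_kv_shared_layers: int


class Proj:
    """Placeholder for a projection or norm module; it only tags its input."""

    def __init__(self, name):
        self.name = name

    def __call__(self, x):
        return f"{self.name}({x})"


class Attention:
    def __init__(self, config, layer_idx):
        # Assumption: the last `num_kv_shared_layers` layers are the shared ones.
        first_shared = config.num_hidden_layers - config.num_kv_shared_layers
        self.is_kv_shared_layer = (
            config.num_kv_shared_layers > 0 and layer_idx >= first_shared
        )
        self.q_proj = Proj("q_proj")
        self.o_proj = Proj("o_proj")
        if not self.is_kv_shared_layer:
            # Only layers that compute their own keys/values own these modules.
            self.k_proj = Proj("k_proj")
            self.v_proj = Proj("v_proj")
            self.k_norm = Proj("k_norm")
            self.v_norm = Proj("v_norm")

    def __call__(self, x, shared_kv=None):
        q = self.q_proj(x)
        if self.is_kv_shared_layer:
            if shared_kv is None:
                # Defensive check: a shared layer has nothing to fall back on.
                raise ValueError("KV-shared layer called without shared_kv")
            k, v = shared_kv
        else:
            k = self.k_norm(self.k_proj(x))
            v = self.v_norm(self.v_proj(x))
        # Real attention would combine q, k, v here; the sketch just returns them.
        return q, (k, v)


config = Config(num_hidden_layers=4, num_kv_shared_layers=2)
layers = [Attention(config, i) for i in range(config.num_hidden_layers)]
```

With this layout, layers 0 and 1 own their KV modules, while layers 2 and 3 have no `k_proj`/`v_proj`/`k_norm`/`v_norm` attributes and must be called with `shared_kv`.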

Test plan

All 8 existing gemma4 tests pass
New test test_gemma4_kv_shared_layers_omit_kv_projections verifies shared layers don't create KV modules
Verified forward pass produces identical top-5 logits vs transformers on OBLITERATUS/gemma-4-E4B-it-OBLITERATED
Verified cached generation produces coherent output
Formatted with black
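The test plan does not say how the top-5 comparison was carried out. One plausible way, sketched here with made-up logits and hypothetical helper names, is to compare the ranked top-5 token ids from the two implementations:

```python
def top_k_indices(logits, k=5):
    """Return the indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def same_top_k(logits_a, logits_b, k=5):
    """True when both logit vectors rank the same k token ids in the same order."""
    return top_k_indices(logits_a, k) == top_k_indices(logits_b, k)


# Made-up logits standing in for the last-position outputs of two implementations.
reference = [0.1, 2.5, -1.0, 3.2, 0.7, 1.9, -0.3, 2.8]
candidate = [0.1, 2.5, -1.0, 3.2, 0.7, 1.9, -0.3, 2.8]
print(same_top_k(reference, candidate))
```

A stricter check would also compare the logit values within a numerical tolerance, since two implementations rarely agree bit for bit.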

🤖 Generated with Claude Code

Gemma 4 E4B/E2B models share KV projections across later layers (controlled by num_kv_shared_layers). The Attention class was creating k_proj, v_proj, k_norm, and v_norm for all layers, but shared layers never use them: the forward pass routes KV from earlier layers via shared_kv.

This caused strict weight loading to fail for any Gemma 4 model saved through transformers (fine-tunes, merges, abliterations), since transformers correctly omits these weights for shared layers.

- Skip creating k_proj/v_proj/k_norm/v_norm for KV-shared layers
- Add defensive ValueError if a shared layer receives no shared_kv
- Add test verifying shared layers omit KV projections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
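The strict-loading failure described in this commit message comes down to a mismatch between the parameter names the model declares and the names present in the checkpoint. The sketch below is only an illustration of that idea: `check_strict` is a hypothetical helper, not the mlx `load_weights` implementation, and the parameter names are invented for the example.

```python
def check_strict(expected_names, checkpoint_names):
    """Raise ValueError if the two sets of parameter names differ."""
    missing = sorted(set(expected_names) - set(checkpoint_names))
    unexpected = sorted(set(checkpoint_names) - set(expected_names))
    if missing or unexpected:
        raise ValueError(f"missing: {missing}, unexpected: {unexpected}")


# Hypothetical names for one KV-shared layer as saved without KV weights.
checkpoint = [
    "layers.3.self_attn.q_proj.weight",
    "layers.3.self_attn.o_proj.weight",
]

# Before the fix the model also declared KV modules for the shared layer.
model_before_fix = checkpoint + [
    "layers.3.self_attn.k_proj.weight",
    "layers.3.self_attn.v_proj.weight",
]

# After the fix the shared layer declares only what the checkpoint contains.
model_after_fix = list(checkpoint)

try:
    check_strict(model_before_fix, checkpoint)
    failed_before_fix = False
except ValueError:
    failed_before_fix = True

check_strict(model_after_fix, checkpoint)  # passes silently
```

Dropping the unused modules makes the declared names match the checkpoint, so the strict check no longer has anything to report.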

angeloskath

Thanks, great catch.

fix #299

* Passed Qwen3 deterministic test. No regression.
* Passed Gemma4-E4B e2e smoke test: the model loaded successfully and output normal tokens (not gibberish).

Future plan: keep watching ml-explore/mlx-lm#1158. There might be more upstream bug fixes in the future.

---------

Signed-off-by: Ranran Haoran Zhang <haorzhang@ebay.com>
Signed-off-by: ran <hzz5361@psu.edu>

glyphVault and others added 4 commits April 15, 2026 15:43

Simplify shared_kv handling

b3091a7

Update KV projection assertions in test_models.py

a4708c7

Fix formatting issue in gemma4_text.py

634dfd5

angeloskath approved these changes Apr 21, 2026

angeloskath merged commit 4f5cbd2 into ml-explore:main Apr 21, 2026
2 checks passed

WindChimeRan mentioned this pull request Apr 25, 2026

Gemma-4-E4B-it fails to load vllm-project/vllm-metal#299

Closed

Fox13 mentioned this pull request Apr 26, 2026

fix(gemma4): drop KV-shared layer projections in sanitize #1205

Closed

WindChimeRan mentioned this pull request Apr 26, 2026

[Gemma4] Add temp patch for gemma4-E4B vllm-project/vllm-metal#303

Merged

mikeazo mentioned this pull request May 5, 2026

Error when using mlx-community/gemma-4-e4b-it-4bit #1242

Open

angeloskath merged 4 commits into ml-explore:main from glyphVault:fix/gemma4-kv-shared-layers
