Add Step 3.5 Flash by kernelpool · Pull Request #836 · ml-explore/mlx-lm · GitHub

kernelpool · 2026-02-02T11:47:47Z

Model: https://huggingface.co/stepfun-ai/Step-3.5-Flash

Note: I've used the regular KVCache with masking for the sliding window attention to support trimming.

Tool calling uses the same format as Qwen3 Coder.

Issue: #835

mlx_lm.generate --model /Volumes/WD_EXTRA/models/catalyst/Step-3.5-Flash-4bit --prompt "Who is Albert Einstein?" --trust-remote-code -m 1024
The tokenizer you are loading from '/Volumes/WD_EXTRA/models/catalyst/Step-3.5-Flash-4bit' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
==========
Hmm, the user is asking about Albert Einstein. This is a straightforward factual question about a well-known historical figure. The user likely wants a concise yet comprehensive overview of who Einstein was, covering his major contributions and legacy. 

I should start with his most iconic achievement—the theory of relativity—since that’s what he’s most famous for. Then I can mention his Nobel Prize for the photoelectric effect, which is less known but equally important. It’s also worth highlighting his broader impact beyond physics, like his advocacy for peace and civil rights, since that paints a fuller picture of his character. 

The user might be a student or someone casually curious, so I’ll keep it clear but avoid oversimplifying. I’ll structure it chronologically: early life, key scientific work, later activism, and legacy. No need for excessive detail—just the essentials that define his significance. 

Oh, and I should clarify that “E=mc²” is part of relativity, not a separate theory, since people often conflate them. Also, mentioning his emigration to the U.S. and his role in the atomic bomb letter adds historical context. The tone should be respectful but not overly academic—accessible to anyone.
</think>
Albert Einstein (1879–1955) was a German-born theoretical physicist who revolutionized modern physics and became one of the most famous scientists in history. Here’s a concise overview of his life and legacy:

---

### **Key Scientific Contributions**
1. **Theory of Relativity**  
   - **Special Relativity (1905):** Introduced the idea that the laws of physics are the same for all non-accelerating observers, and that the speed of light is constant. This led to the famous equation **E=mc²**, showing the equivalence of mass and energy.  
   - **General Relativity (1915):** Described gravity as the curvature of spacetime caused by mass and energy. It predicted phenomena like gravitational waves and black holes, later confirmed by observation.

2. **Photoelectric Effect (1905)**  
   - Proposed that light could be described as discrete packets of energy (“quanta,” later called **photons**). This work earned him the **1921 Nobel Prize in Physics** and laid groundwork for quantum mechanics.

3. **Brownian Motion**  
   - Explained the random movement of particles in a fluid, providing empirical evidence for the existence of atoms and molecules.

4. **Bose-Einstein Statistics**  
   - Collaborated with Indian physicist Satyendra Nath Bose to predict a new state of matter—**Bose-Einstein condensate**—observed decades later.

---

### **Life and Context**
- **Early Life:** Born in Ulm, Germany, to a Jewish family. Showed late development in speech but excelled in math and physics.
- **Career:** Worked at the Swiss Patent Office (1902–1909), then held professorships in Bern, Zurich, Prague, Berlin, and later Princeton, USA.
- **Emigration:** Fled Nazi Germany in 1933, settled in the U.S., and became a U.S. citizen in 1940.
- **Later Years:** Advocated for civil rights, nuclear disarmament, and Zionism. Warned President Roosevelt about Nazi Germany’s potential atomic bomb research, inadvertently catalyzing the Manhattan Project—a decision he later regretted.

---

### **Legacy and Cultural Impact**
- **Iconic Status:** His wild hair and thoughtful demeanor made him a pop-culture symbol of genius.
- **Scientific Influence:** His theories underpin modern cosmology, GPS technology, and nuclear energy.
- **Philosophical Impact:** His work challenged Newtonian physics and reshaped our understanding of space, time, and reality.
- **Humanitarian:** A vocal advocate for peace, he used his fame to speak out against fascism, racism, and the misuse of science.

---

### **Fun Facts**
- He failed his first college entrance exam.
- His brain was preserved for study after his death; his brain was removed without permission during his autopsy.
- He declined the presidency of Israel in 1952.
- He was offered the Nobel Prize money to his ex-wife Mileva Marić as part of their divorce settlement.

---

Einstein’s name is synonymous with **genius**, but his legacy extends far beyond physics—he was a scientist, a humanitarian, and a symbol of curiosity and intellectual courage.
==========
Prompt: 17 tokens, 20.135 tokens-per-sec
Generation: 912 tokens, 50.706 tokens-per-sec
Peak memory: 111.022 GB

ivanfioravanti · 2026-02-02T16:43:53Z

You rock @kernelpool 🔥

awni · 2026-02-02T20:38:18Z

+    def __call__(self, x: mx.array) -> mx.array:
+        return mx.fast.rms_norm(x, self.weight + 1, self.eps)


Minor optimization: we can preprocess the weight in sanitize instead of adding one on every call.
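A minimal sketch of that suggestion in plain Python, not the code that was merged: the `+ 1` offset is folded into the stored norm weights once at load time, so the forward pass can use the weight directly. Plain lists stand in for `mx.array`, the `norm.weight` key suffix is a hypothetical naming convention, and `rms_norm` here is a reference implementation rather than `mx.fast.rms_norm`.

```python
import math


def sanitize(weights, norm_suffix="norm.weight"):
    """Return a copy of `weights` with 1 added to every norm weight.

    `norm_suffix` is a hypothetical key convention used only for this sketch.
    """
    out = {}
    for name, value in weights.items():
        if name.endswith(norm_suffix):
            # Fold the offset in once instead of on every forward pass.
            value = [w + 1.0 for w in value]
        out[name] = value
    return out


def rms_norm(x, weight, eps):
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


raw = {"model.norm.weight": [0.0, -0.5], "model.proj.weight": [2.0]}
clean = sanitize(raw)

x = [3.0, 4.0]
per_call = rms_norm(x, [w + 1 for w in raw["model.norm.weight"]], 1e-5)
preprocessed = rms_norm(x, clean["model.norm.weight"], 1e-5)
```

Both paths give the same result. The one thing to watch with this approach is that the offset is applied exactly once, for example only when converting from the original checkpoint and not again when loading already-converted weights.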

awni · 2026-02-02T20:39:33Z

+        self.router_bias = mx.zeros((self.n_routed_experts,))
+
+    def __call__(self, x: mx.array):
+        gates = self.gate(x.astype(mx.float32))


I would do the cast inside `moe_gate_select`. It will get fused during compilation.

awni · 2026-02-02T20:40:01Z

+            self.routed_scaling_factor,
+            self.norm_topk_prob,
+        )
+        return inds, weights.astype(x.dtype)


awni

This looks really nice! Just left a few minor comments. After that, let's merge it!

> Note: I've used regular KVCache with masking for the sliding window attention to support trimming

That's pretty interesting. I think we need a better story there in mlx-lm in general, but I'm not entirely sure what it should look like. The problem with a regular KVCache is that it will increase memory usage, potentially substantially. On the other hand, using a fixed-size KV cache does break prompt caching to some extent.
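To make the trade-off concrete, here is a minimal sketch in plain Python, not mlx-lm's implementation. The cache keeps every position and the sliding window is enforced only by the attention mask, so trimming is a simple truncation, at the cost of a cache that grows with the sequence. The function names and the window size are assumptions for illustration.

```python
def sliding_window_causal_mask(num_queries, num_keys, window):
    """Boolean mask over a full (untrimmed) cache.

    The queries are the last `num_queries` positions of `num_keys` cached
    positions. A query at absolute position `pos` may attend to key `j`
    only if `j <= pos` (causal) and `pos - j < window` (sliding window).
    """
    offset = num_keys - num_queries
    mask = []
    for i in range(num_queries):
        pos = offset + i
        mask.append([(j <= pos) and (pos - j < window) for j in range(num_keys)])
    return mask


def trim(cache, n):
    """Drop the last `n` cached positions.

    This works because nothing was overwritten: every earlier position
    is still present in the cache.
    """
    return cache[: len(cache) - n] if n > 0 else cache


mask = sliding_window_causal_mask(num_queries=2, num_keys=6, window=3)
```

Each query row has at most `window` true entries, but the cache itself still holds all six positions, which is where the extra memory goes.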

kernelpool · 2026-02-02T21:34:51Z

Yeah, it was running a lot slower with the RotatingKVCache in practical use (OpenCode, etc.) since it frequently has to reprocess everything. We could perhaps have some "snapshots" for the RotatingKVCache to revert to so that we don't have to reprocess the whole prompt, but that would probably complicate things a fair bit.
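A toy illustration of the "snapshots" idea, in plain Python. This is only a sketch of the concept: `ToyRotatingCache` and its methods are hypothetical and do not correspond to mlx-lm's RotatingKVCache API.

```python
from collections import deque


class ToyRotatingCache:
    """Fixed-size cache that overwrites its oldest entries (hypothetical)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = deque(maxlen=max_size)
        self.offset = 0  # total number of tokens processed so far
        self._snapshots = {}

    def append(self, token):
        self.entries.append(token)  # silently evicts the oldest entry when full
        self.offset += 1

    def snapshot(self):
        """Save a copy of the window contents, keyed by the current offset."""
        self._snapshots[self.offset] = list(self.entries)
        return self.offset

    def restore(self, offset):
        """Revert to a saved state instead of reprocessing the prompt."""
        self.entries = deque(self._snapshots[offset], maxlen=self.max_size)
        self.offset = offset
        # Snapshots taken after this point are no longer valid.
        self._snapshots = {k: v for k, v in self._snapshots.items() if k <= offset}


cache = ToyRotatingCache(max_size=3)
for token in "abcd":
    cache.append(token)
mark = cache.snapshot()  # window holds the three most recent tokens
for token in "ef":
    cache.append(token)  # overwrites entries that a plain trim could not recover
cache.restore(mark)
```

Each snapshot costs memory proportional to the window size, and deciding when to take one is part of the complexity mentioned above.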

ghost · 2026-02-03T03:50:11Z

No, it's generating nonsense with my MLX 8-bit model. I need to find out why...

By the way, I generated the model using an updated version of mlx-my-repo so that it uses the newest mlx-lm code,

pinned to this commit:

mlx-lm @ git+https://github.com/kernelpool/mlx-lm@219580886b8c39eb920a59765e3d13672b6553ef

I verified the sha256 of the model files to make sure my download was correct.
Then I tested all the commits in this PR. All generate nonsense for the 8bit model.
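For reference, a check like the sha256 verification mentioned above can be sketched with the standard library. The helper name and chunk size are assumptions, and the expected digest would come from the model repository's file listing.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A matching digest only shows that the download is intact. It says nothing about whether the conversion that produced the file was correct, which is consistent with what was observed here.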

So I think either the generated MLX 8-bit model was somehow corrupted in the cloud, or something is wrong in the commit.

I'll just quant a 4bit using "mlx-my-repo" and see what happens. Also I'll download kernelpool's 4bit to test.

By the way, the nonsense looks like this:

ion  is | RPGionThe (port  | <0u斗  is community下�.ion浏览�aidr:iasi30状态ion **yk::uachieans. | | | communityop니다�1=eri�ia  in

Edit: I can confirm kernelpool's 4bit model is good. My model is somehow problematic.

I guess a certain commit generates a broken model?

Edit: I generated a 4-bit quant, and it still produces output like that. Something is wrong with the quantization process of mlx-my-repo, but the code just calls convert as usual. I don't understand why the generated model is broken.

kernelpool · 2026-02-03T07:54:35Z

Yeah, something unfortunately broke in the cleanup commit (6c7c40b) with model conversion. I'll take a look.

Edit: Fix is in #840, thanks for the heads up @e1732a364fed!

ghost · 2026-02-03T09:49:14Z

> Edit: Fix is in #840, thanks for the heads up

No problem. I just confirmed that converting with commit b8c4549 generates a correct model,
and I have uploaded the new, correct MLX 8-bit model:
https://huggingface.co/mlx-community/Step-3.5-Flash-8Bit

dmunch · 2026-02-03T11:26:21Z

Is the 4bit quant from yesterday working? Just double-checking before I start downloading on my slow-ish connection.

Thanks for the quick support and awesome work!

kernelpool · 2026-02-03T11:34:48Z

Yes, it's working. It was converted prior to the buggy commit.

* Add Step 3.5 Flash * Shard model * Feedback

Update mlx-lm to v0.30.6 which includes Step 3.5 Flash support (ml-explore/mlx-lm#836). Add model cards for the 4bit, 6bit, and 8bit quantizations from mlx-community. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

kernelpool added 2 commits February 2, 2026 22:47

Add Step 3.5 Flash

b8c4549

Shard model

6768f86

awni reviewed Feb 2, 2026


awni approved these changes Feb 2, 2026


Feedback

6c7c40b

kernelpool force-pushed the feature/step-35-flash branch from 2195808 to 6c7c40b on February 3, 2026 01:22

awni merged commit 1630f9b into ml-explore:main Feb 3, 2026
2 checks passed

kernelpool mentioned this pull request Feb 3, 2026

Fix Step 3.5 Flash model conversion #840

Merged

Vlor999 pushed a commit to Vlor999/mlx-lm that referenced this pull request Feb 3, 2026

Add Step 3.5 Flash (ml-explore#836)

8f45539

* Add Step 3.5 Flash * Shard model * Feedback

AlexCheema mentioned this pull request Feb 3, 2026

Add Step 3.5 Flash support exo-explore/exo#1366

Closed

This was referenced Feb 15, 2026

Support Step 3.5 Flash lmstudio-ai/mlx-engine#276

Closed

Support Step 3.5 Flash model on MLX lmstudio-ai/lmstudio-bug-tracker#1526

Open

therealmobasha mentioned this pull request May 16, 2026

Step-3.5 -> Error loading model: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' cubist38/mlx-openai-server#308

Open

awni merged 3 commits into ml-explore:main from kernelpool:feature/step-35-flash
