Add Step 3.5 Flash #836

Merged

awni merged 3 commits into ml-explore:main from kernelpool:feature/step-35-flash on Feb 3, 2026

@kernelpool
Contributor

@kernelpool kernelpool commented Feb 2, 2026

Model: https://huggingface.co/stepfun-ai/Step-3.5-Flash

Note: I've used a regular KVCache with masking for the sliding-window attention so that the cache supports trimming.
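To illustrate what masking a sliding window over a regular (non-rotating) cache means, here is a minimal pure-Python sketch; it is not the step3p5 implementation. It assumes a window in which the query at position i may attend to keys j with i - window < j <= i, and that the cache keeps every past key, so trimming is just a truncation. `sliding_window_mask` and `trim` are hypothetical helpers.

```python
def sliding_window_mask(num_queries, offset, window):
    """Boolean mask for `num_queries` new tokens after `offset` cached tokens.

    Row q is the query at absolute position offset + q; column k is the key at
    absolute position k in a full (never overwritten) cache. True = may attend.
    """
    total_keys = offset + num_queries
    mask = []
    for q in range(num_queries):
        pos = offset + q
        # Causal and windowed: only the `window` most recent keys, including itself.
        mask.append([pos - window < k <= pos for k in range(total_keys)])
    return mask


def trim(cache, n):
    """Drop the last n cached entries; a plain slice, since nothing was overwritten."""
    return cache[: max(len(cache) - n, 0)]


# 6 tokens already cached, 2 new queries, window of 3.
for row in sliding_window_mask(num_queries=2, offset=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
# ....xxx.
# .....xxx
```

The trade-off is visible in the mask width: it grows with the whole context (offset + num_queries) rather than staying at the window size.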

Tool calling uses the same format as Qwen3 Coder.
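For reference, the Qwen3 Coder convention wraps each call in XML-like `<tool_call>`, `<function=...>` and `<parameter=...>` tags; treat the exact layout below as an assumption rather than a specification. A minimal parser sketch (the `parse_tool_calls` helper is hypothetical and is not mlx-lm's parser) might look like this:

```python
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text):
    """Extract (function name, {parameter: value}) pairs from the markup."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            # Values are kept as stripped strings; no type coercion is attempted.
            params = {key: value.strip() for key, value in PARAM_RE.findall(body)}
            calls.append((name, params))
    return calls


sample = """Let me check.
<tool_call>
<function=get_weather>
<parameter=city>
Paris
</parameter>
<parameter=unit>
celsius
</parameter>
</function>
</tool_call>"""

print(parse_tool_calls(sample))
# [('get_weather', {'city': 'Paris', 'unit': 'celsius'})]
```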

Issue: #835

mlx_lm.generate --model /Volumes/WD_EXTRA/models/catalyst/Step-3.5-Flash-4bit --prompt "Who is Albert Einstein?" --trust-remote-code -m 1024
The tokenizer you are loading from '/Volumes/WD_EXTRA/models/catalyst/Step-3.5-Flash-4bit' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
==========
Hmm, the user is asking about Albert Einstein. This is a straightforward factual question about a well-known historical figure. The user likely wants a concise yet comprehensive overview of who Einstein was, covering his major contributions and legacy. 

I should start with his most iconic achievement—the theory of relativity—since that’s what he’s most famous for. Then I can mention his Nobel Prize for the photoelectric effect, which is less known but equally important. It’s also worth highlighting his broader impact beyond physics, like his advocacy for peace and civil rights, since that paints a fuller picture of his character. 

The user might be a student or someone casually curious, so I’ll keep it clear but avoid oversimplifying. I’ll structure it chronologically: early life, key scientific work, later activism, and legacy. No need for excessive detail—just the essentials that define his significance. 

Oh, and I should clarify that “E=mc²” is part of relativity, not a separate theory, since people often conflate them. Also, mentioning his emigration to the U.S. and his role in the atomic bomb letter adds historical context. The tone should be respectful but not overly academic—accessible to anyone.
</think>
Albert Einstein (1879–1955) was a German-born theoretical physicist who revolutionized modern physics and became one of the most famous scientists in history. Here’s a concise overview of his life and legacy:

---

### **Key Scientific Contributions**
1. **Theory of Relativity**  
   - **Special Relativity (1905):** Introduced the idea that the laws of physics are the same for all non-accelerating observers, and that the speed of light is constant. This led to the famous equation **E=mc²**, showing the equivalence of mass and energy.  
   - **General Relativity (1915):** Described gravity as the curvature of spacetime caused by mass and energy. It predicted phenomena like gravitational waves and black holes, later confirmed by observation.

2. **Photoelectric Effect (1905)**  
   - Proposed that light could be described as discrete packets of energy (“quanta,” later called **photons**). This work earned him the **1921 Nobel Prize in Physics** and laid groundwork for quantum mechanics.

3. **Brownian Motion**  
   - Explained the random movement of particles in a fluid, providing empirical evidence for the existence of atoms and molecules.

4. **Bose-Einstein Statistics**  
   - Collaborated with Indian physicist Satyendra Nath Bose to predict a new state of matter—**Bose-Einstein condensate**—observed decades later.

---

### **Life and Context**
- **Early Life:** Born in Ulm, Germany, to a Jewish family. Showed late development in speech but excelled in math and physics.
- **Career:** Worked at the Swiss Patent Office (1902–1909), then held professorships in Bern, Zurich, Prague, Berlin, and later Princeton, USA.
- **Emigration:** Fled Nazi Germany in 1933, settled in the U.S., and became a U.S. citizen in 1940.
- **Later Years:** Advocated for civil rights, nuclear disarmament, and Zionism. Warned President Roosevelt about Nazi Germany’s potential atomic bomb research, inadvertently catalyzing the Manhattan Project—a decision he later regretted.

---

### **Legacy and Cultural Impact**
- **Iconic Status:** His wild hair and thoughtful demeanor made him a pop-culture symbol of genius.
- **Scientific Influence:** His theories underpin modern cosmology, GPS technology, and nuclear energy.
- **Philosophical Impact:** His work challenged Newtonian physics and reshaped our understanding of space, time, and reality.
- **Humanitarian:** A vocal advocate for peace, he used his fame to speak out against fascism, racism, and the misuse of science.

---

### **Fun Facts**
- He failed his first college entrance exam.
- His brain was preserved for study after his death; his brain was removed without permission during his autopsy.
- He declined the presidency of Israel in 1952.
- He was offered the Nobel Prize money to his ex-wife Mileva Marić as part of their divorce settlement.

---

Einstein’s name is synonymous with **genius**, but his legacy extends far beyond physics—he was a scientist, a humanitarian, and a symbol of curiosity and intellectual courage.
==========
Prompt: 17 tokens, 20.135 tokens-per-sec
Generation: 912 tokens, 50.706 tokens-per-sec
Peak memory: 111.022 GB

@ivanfioravanti
Contributor

You rock @kernelpool 🔥

Comment thread mlx_lm/models/step3p5.py Outdated
Comment on lines +72 to +73
```python
def __call__(self, x: mx.array) -> mx.array:
    return mx.fast.rms_norm(x, self.weight + 1, self.eps)
```
Member

Minor optimization: we can preprocess the weight in `sanitize` instead of adding one on every call.
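The suggestion relies on rms_norm(x, w + 1) being the same as rms_norm(x, w') with w' = w + 1 computed once at load time. Below is a pure-Python sketch of the idea; `sanitize` here is a hypothetical stand-in working on plain lists, and selecting norm weights by a key ending in `norm.weight` is an assumed naming rule, not the model's actual keys. One caveat for a real version: if converted weights are saved after sanitizing and then sanitized again when reloaded, the shift would be applied twice, so the step should only touch the original checkpoint.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a 1-D list: x / sqrt(mean(x^2) + eps) * weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def sanitize(weights):
    """Hypothetical: pre-add 1 to every norm weight so the forward pass need not."""
    out = {}
    for key, value in weights.items():
        if key.endswith("norm.weight"):  # assumed naming convention
            out[key] = [w + 1.0 for w in value]
        else:
            out[key] = value
    return out


raw = {
    "layers.0.input_layernorm.weight": [0.0, 0.5, -0.25],
    "layers.0.mlp.up_proj.weight": [1.0, 2.0, 3.0],
}
x = [1.0, -2.0, 3.0]

# Before: the forward pass adds 1 on every call.
before = rms_norm(x, [w + 1 for w in raw["layers.0.input_layernorm.weight"]])
# After: the shift is folded in once and the forward pass uses the weight as-is.
after = rms_norm(x, sanitize(raw)["layers.0.input_layernorm.weight"])
assert all(abs(a - b) < 1e-12 for a, b in zip(before, after))
```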

Comment thread mlx_lm/models/step3p5.py Outdated
```python
        self.router_bias = mx.zeros((self.n_routed_experts,))

    def __call__(self, x: mx.array):
        gates = self.gate(x.astype(mx.float32))
```
Member

I would do the cast inside `moe_gate_select`. It will get fused during compilation.

Comment thread mlx_lm/models/step3p5.py Outdated
```python
            self.routed_scaling_factor,
            self.norm_topk_prob,
        )
        return inds, weights.astype(x.dtype)
```
Member

Same here.
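Both comments point the same way: keep the float32 upcast and the final downcast of the weights inside `moe_gate_select`, so the compiled function contains the casts alongside the routing math. That routing math is sketched below in pure Python for a single token. The sigmoid scoring, the router bias that affects only expert selection, the optional renormalization (`norm_topk_prob`) and the final `routed_scaling_factor` are assumptions modeled on DeepSeek-V3-style routers, not a transcription of step3p5.py.

```python
import math


def moe_gate_select(gates, router_bias, top_k, routed_scaling_factor, norm_topk_prob):
    """Hypothetical per-token router returning (expert indices, expert weights).

    In an MLX version, the float32 upcast would be the first op here and the
    cast back to the activation dtype the last, so compilation can fuse them.
    """
    scores = [1.0 / (1.0 + math.exp(-g)) for g in gates]  # sigmoid scores
    biased = [s + b for s, b in zip(scores, router_bias)]  # bias only steers selection
    inds = sorted(range(len(gates)), key=lambda i: biased[i], reverse=True)[:top_k]
    weights = [scores[i] for i in inds]  # weights come from the unbiased scores
    if norm_topk_prob:
        total = sum(weights)
        weights = [w / total for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return inds, weights


inds, weights = moe_gate_select(
    gates=[0.2, -1.0, 1.5, 0.0],
    router_bias=[0.0, 2.0, 0.0, 0.0],
    top_k=2,
    routed_scaling_factor=2.5,
    norm_topk_prob=True,
)
print(inds, [round(w, 4) for w in weights])
```

Note how the bias pulls expert 1 into the top-k even though its raw score is the lowest, while its weight still reflects that raw score.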

Member

@awni awni left a comment

This looks really nice! Just left a few minor comments. After that, let's merge it!

> Note: I've used regular KVCache with masking for the sliding window attention to support trimming

That's pretty interesting. I think we need a better story for this in mlx-lm in general, but I'm not entirely sure what it should look like. The problem with this approach is that it increases memory usage, potentially substantially. On the other hand, using a fixed-size KV cache does break prompt caching to some extent.
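To get a feel for the memory side of this trade-off, the sketch below compares KV-cache bytes for one sliding-window layer when keeping the full history (regular cache plus mask) versus only the window (fixed-size cache). All configuration numbers (window 512, 8 KV heads, head dim 128, 2-byte elements) are hypothetical placeholders, not Step 3.5 Flash's actual config.

```python
def kv_cache_bytes(tokens, kv_heads, head_dim, bytes_per_elem=2):
    """Bytes for the keys and values of one attention layer (batch size 1)."""
    return 2 * kv_heads * head_dim * tokens * bytes_per_elem  # 2 = keys + values


# Hypothetical sliding-window layer configuration (placeholders).
WINDOW, KV_HEADS, HEAD_DIM = 512, 8, 128

for context in (4_096, 32_768, 131_072):
    full = kv_cache_bytes(context, KV_HEADS, HEAD_DIM)  # regular cache + mask
    fixed = kv_cache_bytes(min(context, WINDOW), KV_HEADS, HEAD_DIM)  # fixed-size cache
    print(f"{context:>7} tokens: full {full / 2**20:7.1f} MiB, "
          f"fixed {fixed / 2**20:4.1f} MiB, ratio {full / fixed:.0f}x")
```

Per sliding-window layer the ratio is simply context / window; the total overhead scales with however many such layers the model has.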

@kernelpool
Contributor Author

Yeah, it was running a lot slower with the RotatingKVCache in practical use (OpenCode, etc.) since it frequently has to reprocess everything. We could perhaps keep some "snapshots" of the RotatingKVCache to revert to, so that we don't have to reprocess the whole prompt, but that would probably complicate things a fair bit.
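The trimming problem shows up even in a toy ring buffer: once it wraps, older entries are overwritten, so rolling back requires tokens that are gone and the prompt must be reprocessed. A conservative rule is to allow trimming only while nothing has been overwritten. The snapshot idea would keep a copy of the buffer at, say, the end of a shared prefix so the cache can revert to it. This is a sketch only; `RingCache`, `snapshot` and `restore` are invented names, not mlx-lm's RotatingKVCache API.

```python
class RingCache:
    """Toy fixed-size cache holding the last `max_size` tokens (hypothetical)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = [None] * max_size
        self.offset = 0  # total tokens ever written

    def append(self, token):
        self.buffer[self.offset % self.max_size] = token
        self.offset += 1

    def is_trimmable(self):
        # Conservative rule: nothing has been overwritten yet.
        return self.offset <= self.max_size

    def trim(self, n):
        if not self.is_trimmable():
            raise ValueError("entries were overwritten; the prompt must be reprocessed")
        n = min(n, self.offset)
        self.offset -= n
        return n

    def snapshot(self):
        # Hypothetical snapshot: a full copy to revert to instead of recomputing.
        return list(self.buffer), self.offset

    def restore(self, snap):
        buffer, offset = snap
        self.buffer, self.offset = list(buffer), offset


cache = RingCache(max_size=4)
for t in range(3):
    cache.append(t)
snap = cache.snapshot()        # state after a shared prompt prefix
for t in range(3, 7):
    cache.append(t)            # wraps: tokens 0..2 are overwritten
print(cache.is_trimmable())    # False
cache.restore(snap)            # revert to the prefix without recomputing it
print(cache.offset, cache.buffer[:3])  # 3 [0, 1, 2]
```

The cost is that each snapshot is another copy of the window per layer, plus bookkeeping for when to take and discard snapshots.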

@kernelpool force-pushed the feature/step-35-flash branch from 2195808 to 6c7c40b on February 3, 2026 01:22
@awni merged commit 1630f9b into ml-explore:main on Feb 3, 2026
2 checks passed
@ghost

ghost commented Feb 3, 2026

No, it's generating nonsense with my MLX 8-bit model. I need to find out why...

By the way, I generated the model using an updated version of mlx-my-repo so that it uses the newest mlx-lm code,

which used this commit:

mlx-lm @ git+https://github.com/kernelpool/mlx-lm@219580886b8c39eb920a59765e3d13672b6553ef

I verified the SHA-256 checksums of the model files to make sure my download was correct.
Then I tested all the commits in this PR. All of them generate nonsense with the 8-bit model.

So I think either the MLX 8-bit model generated in the cloud is somehow broken, or something is wrong in the commit.

I'll quantize a 4-bit model using mlx-my-repo and see what happens. I'll also download kernelpool's 4-bit model to test.

By the way, the nonsense looks like this:

ion  is | RPGionThe (port  | <0u斗  is community下�.ion浏览�aidr:iasi30状态ion **yk::uachieans. | | | communityop니다�1=eri�ia  in

Edit: I can confirm kernelpool's 4-bit model works. Mine is somehow broken.

I guess a certain commit generates a broken model?

Edit: I generated a 4-bit quant, and it still produces output like that. Something is wrong with the quantization process in mlx-my-repo, but the code just calls convert as usual. I don't understand why the generated model is broken.

@kernelpool
Contributor Author

kernelpool commented Feb 3, 2026

Yeah, unfortunately the cleanup commit (6c7c40b) broke model conversion. I'll take a look.

Edit: The fix is in #840. Thanks for the heads-up, @e1732a364fed!

@ghost

ghost commented Feb 3, 2026

> Edit: Fix is in #840 , thanks for the heads up

No problem. I just confirmed that commit b8c4549 generates a correct model.
I have also uploaded the new, corrected MLX 8-bit model:
https://huggingface.co/mlx-community/Step-3.5-Flash-8Bit

@dmunch

dmunch commented Feb 3, 2026

Is the 4-bit quant from yesterday working? Just double-checking before I start downloading on my slow-ish connection.

Thanks for the quick support and awesome work!

@kernelpool
Contributor Author

Yes, it's working. It was converted prior to the buggy commit.

Vlor999 pushed a commit to Vlor999/mlx-lm that referenced this pull request Feb 3, 2026
* Add Step 3.5 Flash

* Shard model

* Feedback
AlexCheema added a commit to exo-explore/exo that referenced this pull request Feb 3, 2026
Update mlx-lm to v0.30.6 which includes Step 3.5 Flash support
(ml-explore/mlx-lm#836). Add model cards for the 4bit, 6bit, and 8bit
quantizations from mlx-community.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

4 participants