Skip to content

Fix Step 3.5 Flash model conversion#840

Merged
awni merged 3 commits into
ml-explore:mainfrom
kernelpool:fix-step35-model
Feb 4, 2026
Merged

Fix Step 3.5 Flash model conversion#840
awni merged 3 commits into
ml-explore:mainfrom
kernelpool:fix-step35-model

Conversation

@kernelpool
Copy link
Copy Markdown
Contributor

@kernelpool kernelpool commented Feb 3, 2026

Fix to avoid applying the RMSNorm delta twice, at conversion and subsequently at load.
This simply reverts back to the original approach from b8c4549. Maybe theres a better way?

More info: #836 (comment)

@kernelpool kernelpool mentioned this pull request Feb 3, 2026
@awni
Copy link
Copy Markdown
Member

awni commented Feb 3, 2026

Yes you can just check if any of the original (un-mapped) keys are still in the weight keys. If they are it means it hasn't been converted yet to MLX format so it's safe to apply the +1.0

@ghost
Copy link
Copy Markdown

ghost commented Feb 3, 2026

Hi Guys,just some feedback. This time I run the 8bit uploaded by myself and also the 4bit model uploaded by kernelpool under many different chat situation rather than the given test command.

It looks like they all have repetition problems. They might (almost 100% when the question is long) repeat certain words forever, the words is context related. Some times a whole short sentence is repeated.

It's not production ready for now. Don't know why, I'm just a test user, sorry.

@kernelpool
Copy link
Copy Markdown
Contributor Author

@awni What do you think about the < 0.5 check? Otherwise we need to re-upload the existing models.

@kernelpool
Copy link
Copy Markdown
Contributor Author

Hi Guys,just some feedback. This time I run the 8bit uploaded by myself and also the 4bit model uploaded by kernelpool under many different chat situation rather than the given test command.

It looks like they all have repetition problems. They might (almost 100% when the question is long) repeat certain words forever, the words is context related. Some times a whole short sentence is repeated.

It's not production ready for now. Don't know why, I'm just a test user, sorry.

What model parameters (temperature, etc) are you using?

@awni
Copy link
Copy Markdown
Member

awni commented Feb 3, 2026

@awni What do you think about the < 0.5 check? Otherwise we need to re-upload the existing models.

I don't think we should do it that way. It's somewhat brittle to the mean of the weights and also breaks lazy loading to some extent.

I think we should just check the presence of a pattern in the weight keys to determine if it's already been converted. And I will re-uploading the models, that's not so difficult. (But others will have to reconvert or re-download).

@ghost
Copy link
Copy Markdown

ghost commented Feb 3, 2026

What model parameters (temperature, etc) are you using?

I tried different temperatures, like 0.6, 1, they all behave the same way. top-p 0.95/1 also.

Copy link
Copy Markdown
Member

@awni awni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@awni
Copy link
Copy Markdown
Member

awni commented Feb 4, 2026

I will re-upload the models as soon as this lands.

@awni awni merged commit b77ec6b into ml-explore:main Feb 4, 2026
2 checks passed
@ghost
Copy link
Copy Markdown

ghost commented Feb 4, 2026

Yup I confirm the old models are not working for the new commit.(Output nonsense again.) reupload required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants