Fix for Exception - MultiLinear.to_quantized() missing 'mode' #809

Merged: awni merged 2 commits into ml-explore:main from inferencers:patch-2 on Jan 29, 2026

Conversation

@inferencers
Contributor

Add a mode parameter to mixed_quant_predicate_builder, because MLX now requires mode to be specified for the nn.quantize class_predicate.

How to reproduce: Try to convert GLM-4.7-Flash with a mixed quant predicate
MLX: v0.30.3
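
The failure can be illustrated without MLX installed. The sketch below assumes, based on the exception in the PR title, that the settings returned by a class_predicate are forwarded as keyword arguments to the layer's to_quantized() method. StandInLayer, old_predicate, fixed_predicate, and quantize_layer are hypothetical names for illustration, not mlx or mlx-lm code, and the layer-name rule is made up.

```python
from typing import Any, Callable, Dict


class StandInLayer:
    """Hypothetical stand-in for a layer whose to_quantized() requires `mode`."""

    def to_quantized(self, group_size: int, bits: int, mode: str) -> Dict[str, Any]:
        return {"group_size": group_size, "bits": bits, "mode": mode}


def old_predicate(path: str, module: Any) -> Dict[str, Any]:
    # Pre-fix shape (assumed): per-layer settings without a "mode" entry.
    bits = 6 if "down_proj" in path else 4  # made-up rule for illustration
    return {"group_size": 64, "bits": bits}


def fixed_predicate(path: str, module: Any) -> Dict[str, Any]:
    # Post-fix shape (assumed): the same settings plus an explicit mode.
    return {**old_predicate(path, module), "mode": "affine"}


def quantize_layer(
    path: str,
    layer: StandInLayer,
    predicate: Callable[[str, Any], Dict[str, Any]],
) -> Dict[str, Any]:
    # Assumption: the predicate output is passed through as keyword arguments.
    return layer.to_quantized(**predicate(path, layer))


layer = StandInLayer()
try:
    quantize_layer("mlp.down_proj", layer, old_predicate)
    old_failed = False
except TypeError:
    # The required `mode` argument is missing, mirroring the reported exception.
    old_failed = True

fixed_result = quantize_layer("mlp.down_proj", layer, fixed_predicate)
```

Under these assumptions, the old predicate fails as soon as a layer requires mode, while the fixed predicate supplies it explicitly for every layer.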

@awni
Member

awni commented Jan 26, 2026

Thanks for the fix. Since the mixed builder really only makes sense for "affine", I would hardcode that mode in the function itself. Additionally, if a mode other than affine is requested with a mixed quant, I would either print a warning or raise an exception.
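
One way the suggested guard could look is sketched below. This is not the merged code: resolve_mixed_quant_mode and its strict flag are hypothetical, and "mxfp4" is used only as an example of a non-affine mode string. The helper always returns "affine" and either warns or raises when a different mode is requested.

```python
import warnings
from typing import Optional


def resolve_mixed_quant_mode(
    requested_mode: Optional[str] = None, strict: bool = False
) -> str:
    """Return the mode to use for a mixed quant recipe (always "affine" here)."""
    if requested_mode is not None and requested_mode != "affine":
        message = (
            "Mixed quantization only supports mode='affine'; "
            f"got mode={requested_mode!r}."
        )
        if strict:
            # Fail fast instead of silently changing the requested mode.
            raise ValueError(message)
        # Otherwise keep going, but tell the user the request was overridden.
        warnings.warn(message + " Falling back to 'affine'.")
    return "affine"


# Example: a non-affine request triggers a warning and falls back to "affine".
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mode = resolve_mixed_quant_mode("mxfp4")
```

Warning keeps existing conversion commands working, while raising makes the mismatch impossible to miss. Which of the two is appropriate depends on how strictly the caller wants the requested mode to be honored.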

Member

@awni awni left a comment

LGTM, thanks!

@awni awni merged commit 7f1b7fe into ml-explore:main Jan 29, 2026
2 checks passed
Contributor Author

@inferencers inferencers left a comment

Just a small note: q_mode is being passed to mixed_quant_predicate_builder, but it's no longer needed, as mode is now always set to "affine".
