JiwaniZakir left a comment
The Muon-specific flags (`--train_muon_adjust_lr_fn`, `--train_muon_nesterov`, `--train_muon_ns_steps`) have no apparent validation that they are only meaningful when `--train_optim muon` is selected; a user who accidentally sets `--train_muon_ns_steps 10` while using Adam gets a silent no-op with no warning. A guard in the option parsing or in `register_optimizer` that raises or warns when a Muon-specific option is set alongside a non-Muon optimizer would prevent this confusion, e.g. something like the sketch below.
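A minimal sketch of such a guard, assuming an argparse-style `opt` namespace; the helper name `check_muon_options` is hypothetical and the defaults are taken from the docs as written, so adjust to the actual registered defaults:

```python
import warnings

# Muon-specific options and their (assumed) registered defaults.
MUON_ONLY_OPTIONS = {
    "train_muon_adjust_lr_fn": None,
    "train_muon_nesterov": True,
    "train_muon_ns_steps": 5,
}

def check_muon_options(opt):
    """Warn when a Muon-specific flag is set but another optimizer is active."""
    if opt.train_optim == "muon":
        return
    for name, default in MUON_ONLY_OPTIONS.items():
        if getattr(opt, name, default) != default:
            warnings.warn(
                f"--{name} is set but --train_optim is '{opt.train_optim}'; "
                "the option is ignored unless --train_optim muon is selected."
            )
```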
The switch from `self.netG_A.parameters()` to `self.get_named_parameters(("G_A", self.netG_A))` in `b2b_model.py` is the right approach for Muon, since it needs to distinguish 2D parameters (eligible for Newton-Schulz orthogonalization) from 1D parameters (biases, norms) that fall back to AdamW. The diff is truncated, though, so it is not possible to verify whether `build_named_parameters` in `util/optimizer_factory.py` actually implements this split correctly; if 1D parameters such as LayerNorm weights are accidentally routed through the Nesterov/orthogonalization path, training will silently diverge. The split I would expect is sketched below.
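For reference, a hedged sketch of the split that `build_named_parameters` would need to perform; the real helper may differ since the diff is truncated, and `split_params_for_muon` is an illustrative name:

```python
def split_params_for_muon(named_parameters):
    """Partition PyTorch named parameters into Muon- and AdamW-eligible groups."""
    muon_params, adamw_params = [], []
    for name, p in named_parameters:
        if not p.requires_grad:
            continue
        # Weight matrices (ndim >= 2) can go through Newton-Schulz
        # orthogonalization; 1D tensors (biases, LayerNorm/BatchNorm
        # weights) must fall back to AdamW.
        if p.ndim >= 2:
            muon_params.append(p)
        else:
            adamw_params.append(p)
    return muon_params, adamw_params
```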
The default `--train_muon_ns_steps 5` is reasonable, but `--train_muon_nesterov` defaulting to True is unusual for a flag-type option: in most argparse setups a flag implies `store_true` with a default of False, so documenting it as defaulting to True in both `options.md` and `options.rst` may be misleading or incorrect, depending on how the option is actually registered (see the sketch below).
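A minimal sketch of the ambiguity, using plain argparse (the project's own option-registration machinery may differ; the flag name is reused only for illustration):

```python
import argparse

parser = argparse.ArgumentParser()

# Flag style: absent -> False, `--train_muon_nesterov` -> True.
# With this registration, documenting a default of True would be incorrect.
# parser.add_argument("--train_muon_nesterov", action="store_true")

# Value-taking style: absent -> True, and it can be disabled explicitly
# with `--train_muon_nesterov False`. Only this style supports a True default.
parser.add_argument(
    "--train_muon_nesterov",
    type=lambda s: s.lower() in ("true", "1", "yes"),
    default=True,
)

opt = parser.parse_args(["--train_muon_nesterov", "False"])
assert opt.train_muon_nesterov is False
```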