
feat(opti): muon optimizer #809

Open

beniz wants to merge 5 commits into master from feat_muon

Conversation

@beniz
Contributor

@beniz beniz commented Mar 12, 2026

No description provided.

@beniz beniz self-assigned this Mar 12, 2026
@beniz beniz force-pushed the feat_muon branch 3 times, most recently from 73b47a3 to 3a6aacc Compare March 12, 2026 13:02

@JiwaniZakir JiwaniZakir left a comment


The Muon-specific flags (--train_muon_adjust_lr_fn, --train_muon_nesterov, --train_muon_ns_steps) are apparently not validated against the selected optimizer — they are only meaningful when --train_optim muon is set, so a user who accidentally passes --train_muon_ns_steps 10 while using Adam gets a silent no-op with no warning. A guard in the option parsing or in register_optimizer that raises or warns when a Muon-specific option is set alongside a non-Muon optimizer would prevent this confusion.
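A minimal sketch of such a guard, assuming the options are available as a dict alongside their registered defaults — check_muon_options and its call site are illustrative, not code from this PR:

```python
import warnings

# Muon-specific option names from the PR; everything else here is hypothetical.
MUON_ONLY_OPTIONS = (
    "train_muon_adjust_lr_fn",
    "train_muon_nesterov",
    "train_muon_ns_steps",
)


def check_muon_options(opts, defaults):
    """Warn when a Muon-specific option differs from its default while a
    non-Muon optimizer is selected, so the silent no-op becomes visible."""
    if opts.get("train_optim") == "muon":
        return
    for name in MUON_ONLY_OPTIONS:
        if name in opts and opts[name] != defaults.get(name):
            warnings.warn(
                f"--{name} is set but --train_optim is "
                f"{opts.get('train_optim')!r}; the option will be ignored."
            )
```

Raising instead of warning would be stricter, but a warning avoids breaking existing configs that happen to carry stale Muon flags.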

The switch from self.netG_A.parameters() to self.get_named_parameters(("G_A", self.netG_A)) in b2b_model.py is the right approach for Muon, since it needs to distinguish 2D parameters (eligible for Newton-Schulz orthogonalization) from 1D parameters (biases, norms) that fall back to AdamW. However, the diff is truncated, so it's not possible to verify whether build_named_parameters in util/optimizer_factory.py actually implements this split correctly; if 1D parameters like LayerNorm weights are accidentally routed through the Nesterov/orthogonalization path, training will silently diverge.
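For reference, the split being asked about usually keys on tensor dimensionality — a sketch of what a correct build_named_parameters would need to do (the function below is hypothetical; the PR's actual implementation in util/optimizer_factory.py is not visible in the diff):

```python
def split_named_parameters(named_params):
    """Route parameters with ndim >= 2 to Muon (Newton-Schulz
    orthogonalization) and everything else (biases, LayerNorm/BatchNorm
    weights, scalars) to the AdamW fallback group.

    Works on any (name, param) pairs whose params expose .ndim,
    e.g. torch.nn.Module.named_parameters().
    """
    muon_params, adamw_params = [], []
    for name, p in named_params:
        (muon_params if p.ndim >= 2 else adamw_params).append((name, p))
    return muon_params, adamw_params
```

Embedding/conv weights (ndim >= 2) land in the Muon group; a ndim check like this is what keeps LayerNorm weights out of the orthogonalization path.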

The default --train_muon_ns_steps 5 is reasonable, but having --train_muon_nesterov default to True is unusual for a flag-type option — in most argparse setups a flag implies store_true with a default of False, so documenting it as defaulting to True in both options.md and options.rst may be misleading or incorrect, depending on how the option is actually registered.
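To illustrate the concern: a store_true flag with default=True could never be turned off from the CLI. If the True default is intended, registering the option with argparse.BooleanOptionalAction (Python 3.9+) gives an explicit --no-... form; this is a sketch of one way to register it, not necessarily how the PR does:

```python
import argparse

parser = argparse.ArgumentParser()
# BooleanOptionalAction auto-generates a "--no-train_muon_nesterov" negation,
# so a True default remains overridable from the command line.
parser.add_argument(
    "--train_muon_nesterov",
    action=argparse.BooleanOptionalAction,
    default=True,
)

print(parser.parse_args([]).train_muon_nesterov)  # True (the documented default)
print(parser.parse_args(["--no-train_muon_nesterov"]).train_muon_nesterov)  # False
```

If the option is instead registered with something like type=bool, note that bool("False") is True in Python, which is another way the documented default could diverge from observed behavior.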

