Sync random seed across ranks in distributed chat #801

Merged
awni merged 3 commits into ml-explore:main from kernelpool:sync-random-seed on Jan 23, 2026

@kernelpool
Contributor

This fixes an issue in distributed chat: each rank starts with a different random state, so with a temperature > 0 the ranks sample different tokens and diverge (often seen as artifacts in the output, shown below).

This fix follows the approach used in #741 and 1d76aab.

mlx.launch --verbose --backend jaccl --hostfile hosts-jaccl.json --env MLX_METAL_FAST_SYNCH=1 -- /Users/optimus/repo/mlx-lm/.venv/bin/mlx_lm.chat --model mlx-community/Qwen3-4B-Instruct-2507-8bit --temp 1.0
[INFO] Running /Users/optimus/repo/mlx-lm/.venv/bin/python /Users/optimus/repo/mlx-lm/.venv/bin/mlx_lm.chat --model mlx-community/Qwen3-4B-Instruct-2507-8bit --temp 1.0 
Fetching 10 files: 100% 10/10 [00:00<00:00, 106184.91it/s]
Download complete: : 0.00B [00:00, ?B/s]              
Fetching 11 files: 100% 11/11 [00:00<00:00, 38067.12it/s]
Download complete: : 0.00B [00:00, ?B/s]              
Fetching 10 files: 100% 10/10 [00:02<00:00,  4.07it/s]
Fetching 11 files: 100% 11/11 [00:49<00:00,  4.46s/it]
Download complete: 100% 4.27G/4.27G [00:49<00:00, 86.9MB/s]
Download complete: : 15.9MB [00:52, 306kB/s]
[INFO] Starting chat session with mlx-community/Qwen3-4B-Instruct-2507-8bit.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> hello!
Hello! � How can I assist you today?
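
For readers who want to see the failure mode in isolation, here is a minimal sketch that does not use MLX at all. It assumes every rank sees the same token distribution (the `VOCAB` and `WEIGHTS` values are made up for illustration) and that each rank samples with its own PRNG, modeled here with Python's `random.Random`. The helper names `sample_tokens` and `ranks_agree` are hypothetical and are not part of mlx-lm. In a real run the divergence compounds, because a rank that samples a different token then feeds different inputs into the next step.

```python
import random

# Hypothetical toy vocabulary and probabilities, identical on every rank.
VOCAB = ["the", "a", "cat", "dog", "sat"]
WEIGHTS = [0.3, 0.25, 0.2, 0.15, 0.1]


def sample_tokens(seed, num_tokens=16):
    """Sample a token sequence the way one rank would, using its own PRNG."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=WEIGHTS, k=num_tokens)


def ranks_agree(seeds, num_tokens=16):
    """Return True if every simulated rank samples the identical sequence."""
    sequences = [sample_tokens(seed, num_tokens) for seed in seeds]
    return all(seq == sequences[0] for seq in sequences)


if __name__ == "__main__":
    # Each rank seeded differently: sequences are expected to differ.
    print("unsynced seeds agree:", ranks_agree([1, 2]))
    # Every rank seeded with the same value: sequences always match.
    print("shared seed agrees:", ranks_agree([0, 0]))
```

With a shared seed the simulated ranks always produce the same sequence, which is the property the fix relies on.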

Comment thread mlx_lm/chat.py Outdated

if group.size() > 1:
    seed = mx.distributed.all_sum(mx.random.state[0]).view(mx.uint64).item()
    mx.random.seed(seed)
Member

Good catch! But this ignores the seed flag above. Wdyt about instead just setting a default seed and using that?

Contributor Author

Ah, makes sense

Comment thread mlx_lm/chat.py Outdated
Comment on lines +107 to +108
if args.seed is None:
    mx.random.seed(0)
Member

Nit: you can just set the DEFAULT_SEED=0 so the behavior is consistent in all cases. I think it's fine if the default behavior is seeded.
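
A rough sketch of what this suggestion could look like, for illustration only. It is not the merged diff: the parser layout and the `seed_from_args` helper are hypothetical, and Python's `random.seed` stands in for `mx.random.seed` so the snippet runs without MLX. The idea is that with `DEFAULT_SEED = 0` as the argparse default, the seed is applied unconditionally, so every rank ends up with the same value whether or not `--seed` is passed.

```python
import argparse
import random

# Fixed default, as suggested in the review, so the default behavior is seeded.
DEFAULT_SEED = 0


def build_parser():
    """Build a parser with only the seed flag (hypothetical, trimmed down)."""
    parser = argparse.ArgumentParser(description="Seed handling sketch")
    parser.add_argument("--seed", type=int, default=DEFAULT_SEED, help="PRNG seed")
    return parser


def seed_from_args(argv, seed_fn=random.seed):
    """Parse argv and seed unconditionally; no `is None` branch is needed.

    In the chat script, seed_fn would presumably be mx.random.seed (assumption).
    """
    args = build_parser().parse_args(argv)
    seed_fn(args.seed)
    return args.seed


if __name__ == "__main__":
    print(seed_from_args([]))               # falls back to DEFAULT_SEED
    print(seed_from_args(["--seed", "7"]))  # an explicit flag still wins
```

Compared with the `if args.seed is None` branch, this keeps a single code path, and an explicit `--seed` still takes effect.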

Member

@awni left a comment

Thx!

@awni merged commit 12073b1 into ml-explore:main on Jan 23, 2026
2 checks passed
2 participants