Supporting delay in mlx_lm benchmark by AndreasPlt · Pull Request #1010 · ml-explore/mlx-lm · GitHub

AndreasPlt · 2026-03-16T15:25:07Z

Added an optional argument named delay to mlx_lm.benchmark.py that controls the cooldown delay between consecutive benchmark runs. This is useful when benchmarking on passively cooled MacBook Airs, which otherwise throttle performance significantly and thus degrade benchmark results.
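
For readers who want the idea in code, here is a minimal sketch of a trial loop with a cooldown between runs. It is not the code from this PR: `run_trials`, `trial_fn`, and the injectable `sleep` parameter are hypothetical names, and sleeping only between trials (not before the first or after the last) is an assumption.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_trials(
    trial_fn: Callable[[], T],
    num_trials: int,
    delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run trial_fn num_trials times, pausing `delay` seconds between runs.

    Hypothetical helper for illustration only. The pause happens before
    every trial except the first, so no time is wasted after the last one.
    """
    results: List[T] = []
    for i in range(num_trials):
        if i > 0 and delay > 0:
            sleep(delay)  # cooldown so the machine can shed heat
        results.append(trial_fn())
    return results


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would time prompt processing
    # and generation here instead.
    print(run_trials(lambda: sum(range(100_000)), num_trials=3, delay=0.1))
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting.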

Edit: I also ran the same benchmark with and without a delay, and I think the results show that a cool-down can make quite a difference (Qwen2.5-7b-Instruct-4bit, with 2048 prompt length and 2048 generation length, on a MacBook Air M4 with 24GB of RAM):

Without delay (command: mlx_lm benchmark --model mlx-community/Qwen2.5-7b-Instruct-4bit -n 5 -p 2048 -g 2048 --delay 0):

Timing with prompt_tokens=2048, generation_tokens=2048, batch_size=1.
Trial 1:  prompt_tps=204.832, generation_tps=21.086, peak_memory=5.114
Trial 2:  prompt_tps=137.867, generation_tps=18.929, peak_memory=5.114
Trial 3:  prompt_tps=153.996, generation_tps=18.775, peak_memory=5.114
Trial 4:  prompt_tps=162.618, generation_tps=17.456, peak_memory=5.115
Trial 5:  prompt_tps=149.155, generation_tps=18.051, peak_memory=5.115
Averages: prompt_tps=161.694, generation_tps=18.859, peak_memory=5.114

With delay (120s) (command: mlx_lm benchmark --model mlx-community/Qwen2.5-7b-Instruct-4bit -n 5 -p 2048 -g 2048 --delay 120):

Running warmup..
Timing with prompt_tokens=2048, generation_tokens=2048, batch_size=1.
Trial 1:  prompt_tps=200.729, generation_tps=21.146, peak_memory=5.114
Trial 2:  prompt_tps=203.119, generation_tps=21.440, peak_memory=5.114
Trial 3:  prompt_tps=193.286, generation_tps=21.239, peak_memory=5.114
Trial 4:  prompt_tps=201.892, generation_tps=21.806, peak_memory=5.115
Trial 5:  prompt_tps=207.449, generation_tps=20.508, peak_memory=5.115
Averages: prompt_tps=201.295, generation_tps=21.228, peak_memory=5.114

The run without delay clearly shows a significant drop in both prompt and generation TPS after the first trial, which does not occur when running with a delay (where prompt and generation TPS stay steady).
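
The comparison can be reproduced from the trial numbers listed above. The sketch below recomputes the per-run averages and the relative difference between the two runs; the helper names `column_means` and `relative_gain` are made up for this example, and the data is copied from the two logs.

```python
from statistics import mean

# (prompt_tps, generation_tps) per trial, copied from the logs above.
no_delay = [
    (204.832, 21.086),
    (137.867, 18.929),
    (153.996, 18.775),
    (162.618, 17.456),
    (149.155, 18.051),
]
with_delay = [
    (200.729, 21.146),
    (203.119, 21.440),
    (193.286, 21.239),
    (201.892, 21.806),
    (207.449, 20.508),
]


def column_means(trials):
    """Return (mean prompt_tps, mean generation_tps) for a list of trials."""
    prompt, generation = zip(*trials)
    return mean(prompt), mean(generation)


def relative_gain(baseline, improved):
    """Fractional improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


nd_prompt, nd_gen = column_means(no_delay)
d_prompt, d_gen = column_means(with_delay)

print(f"no delay:   prompt_tps={nd_prompt:.3f}, generation_tps={nd_gen:.3f}")
print(f"with delay: prompt_tps={d_prompt:.3f}, generation_tps={d_gen:.3f}")
print(f"prompt gain:     {relative_gain(nd_prompt, d_prompt):.1%}")
print(f"generation gain: {relative_gain(nd_gen, d_gen):.1%}")
```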

angeloskath · 2026-03-16T20:59:02Z

Hm, interesting. That is definitely useful, but the Macs will also go into low power mode if the program sleeps for, say, 10 seconds, which means we'll still have some cold start effects 🤷‍♂️

AndreasPlt · 2026-03-16T21:46:38Z

> Hm, interesting. That is definitely useful, but the Macs will also go into low power mode if the program sleeps for, say, 10 seconds, which means we'll still have some cold start effects 🤷‍♂️

Appreciate the input! Feel free to correct me, but afaik calling time.sleep in a Python thread does not immediately cause a system sleep (i.e. low power mode), as system sleep is governed by system idle time (if set in the settings). In any case, system idling can (and for most benchmarks probably should) be deactivated in macOS settings when running on AC power ("Battery" > "Options" > turn on "Prevent automatic sleeping on power adapter when the display is off").
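
As an alternative to changing the setting by hand, idle sleep can also be held off for the duration of a single command with the macOS `caffeinate` utility. This is not something proposed in this thread, only a sketch: `with_caffeinate` is a hypothetical helper, and the benchmark command is the one quoted in the PR description. The example prints the command instead of running it.

```python
import shlex
import shutil
from typing import List, Optional


def with_caffeinate(cmd: List[str], caffeinate_path: Optional[str]) -> List[str]:
    """Prefix cmd with `caffeinate -i` (prevent idle sleep while cmd runs).

    Hypothetical helper: if no caffeinate binary is given (for example
    on a non-macOS machine), the command is returned unchanged.
    """
    if caffeinate_path is None:
        return list(cmd)
    return [caffeinate_path, "-i", *cmd]


# Benchmark invocation as quoted in the PR description.
benchmark_cmd = [
    "mlx_lm", "benchmark",
    "--model", "mlx-community/Qwen2.5-7b-Instruct-4bit",
    "-n", "5", "-p", "2048", "-g", "2048",
    "--delay", "120",
]

if __name__ == "__main__":
    # Look up caffeinate on PATH; None if it is not installed.
    full_cmd = with_caffeinate(benchmark_cmd, shutil.which("caffeinate"))
    print(shlex.join(full_cmd))
```

This only addresses idle sleep. Whether the hardware still drops to a lower power state during the delay is a separate question that this sketch does not settle.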

angeloskath

Yeah, whether the machine goes into low power mode depends on many factors. I was simply mentioning that there will still likely be a discrepancy. Either way, I think this looks great, and I especially like the numbers you added.

AndreasPlt added 2 commits March 16, 2026 15:19

add delay in benchmark

2ec7ee7

fix typo

c594395

angeloskath approved these changes Mar 16, 2026

angeloskath merged commit 564281f into ml-explore:main Mar 17, 2026
2 checks passed

ethannortharc mentioned this pull request Mar 29, 2026

v0.3.0rc1: dependency conflict between omlx and mlx-audio on mlx-lm version jundot/omlx#462

Closed

angeloskath merged 2 commits into ml-explore:main from AndreasPlt:benchmark-with-delay
