Supporting delay in mlx_lm benchmark #1010
Hm, interesting. That is definitely useful, but the Macs will also go into low power mode if the program sleeps for, say, 10 seconds, which means we'll still have some cold start effects 🤷‍♂️
Appreciate the input! Feel free to correct me, but afaik calling |
angeloskath
left a comment
Yeah, whether the machine goes into low power mode depends on many factors. I was simply mentioning that there will still likely be a discrepancy. Either way, I think this looks great, and I especially like the numbers you added.
fixes #1009
Added an optional argument to mlx_lm.benchmark.py named delay, which controls the cooldown delay between consecutive benchmark runs. This is useful when running benchmarks on passively cooled MacBook Airs, which otherwise throttle performance significantly and thus degrade benchmark results.

Edit: I also tried running the same benchmark with and without the delay, and I think the results show that cool-down can make quite a difference (running Qwen2.5-7b-Instruct-4bit, with 2048 prompt length and 2048 generation length, on a MacBook Air M4 with 24GB of RAM):

Without delay (command:
mlx_lm benchmark --model mlx-community/Qwen2.5-7b-Instruct-4bit -n 5 -p 2048 -g 2048 --delay 0):

With delay (120s) (command:
mlx_lm benchmark --model mlx-community/Qwen2.5-7b-Instruct-4bit -n 5 -p 2048 -g 2048 --delay 120):

The run without delay clearly shows a significant drop in both prompt and generation TPS after the first trial, which does not occur when running with a delay (where we observe steady prompt and generation TPS).
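
For readers who want to see the idea in isolation, here is a minimal sketch of a cooldown delay between consecutive benchmark trials. It is not the code from this PR: `run_trials`, `trial`, and the injectable `sleep` parameter are hypothetical names chosen for illustration, and the choice to skip the delay before the first trial is an assumption.

```python
import time
from typing import Callable, List


def run_trials(
    trial: Callable[[], float],
    num_trials: int,
    delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run `trial` `num_trials` times, pausing `delay` seconds between runs.

    Hypothetical helper: `trial` is assumed to return one measurement
    (for example, tokens per second) for a single benchmark run.
    """
    results: List[float] = []
    for i in range(num_trials):
        # Cool down before every trial except the first (assumed behaviour).
        if i > 0 and delay > 0:
            sleep(delay)
        results.append(trial())
    return results


if __name__ == "__main__":
    # Dummy trial standing in for a real prompt/generation benchmark.
    print(run_trials(lambda: 1.0, num_trials=3, delay=0.1))
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting. As noted in the review above, a long sleep may itself let the machine drop into low power mode, so the delay trades thermal throttling against possible cold start effects.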