Enhance load_config function to check for config file existence and i… by cubist38 · Pull Request #701 · ml-explore/mlx-lm · GitHub

cubist38 · 2025-12-27T08:30:29Z

Fix: Prioritize `eos_token_id` from `generation_config.json`

Problem

The load_config function only read configuration from config.json. However, many HuggingFace models omit eos_token_id from config.json and define it only in generation_config.json, which is the standard location in HuggingFace's model structure for generation-related parameters such as eos_token_id. That file should therefore be consulted as well: its eos_token_id should fill in the value when config.json lacks it, and take precedence when both files define it.

When eos_token_id was missing from config.json, it was never loaded, so generation ran with a missing or incorrect EOS token. As a result, generation could stop at the wrong token or fail to stop when it should.
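To see why a missing id matters, here is a toy decoding loop. It is not mlx-lm's actual generation code. It only assumes that generation stops when a sampled token is in a set of EOS ids. The token ids (1 for EOS, 106 for <end_of_turn>) and the fake_model stream are hypothetical, chosen only for illustration.

```python
from itertools import chain, repeat
from typing import Iterable, List, Set

# Hypothetical token ids, chosen only for illustration.
EOS = 1
END_OF_TURN = 106


def decode_until_eos(tokens: Iterable[int], eos_ids: Set[int], max_tokens: int) -> List[int]:
    """Consume tokens until one is in eos_ids or max_tokens tokens have been kept."""
    out: List[int] = []
    for tok in tokens:
        if tok in eos_ids:
            break  # a recognized EOS ends generation; it is not included in the output
        out.append(tok)
        if len(out) >= max_tokens:
            break  # no recognized EOS seen: generation runs to the length limit
    return out


def fake_model():
    # A fake "model" that answers with 3 tokens, then emits END_OF_TURN forever.
    return chain([10, 11, 12], repeat(END_OF_TURN))


missing = decode_until_eos(fake_model(), eos_ids={EOS}, max_tokens=8)
fixed = decode_until_eos(fake_model(), eos_ids={EOS, END_OF_TURN}, max_tokens=8)
print(missing)  # [10, 11, 12, 106, 106, 106, 106, 106]
print(fixed)    # [10, 11, 12]
```

When the end-of-turn id is not among the stop ids, the loop keeps emitting it until the length limit. This mirrors the repeated <end_of_turn> output reported later in this thread.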

Solution

Updated load_config to:

1. Load the base configuration from config.json (as before).
2. Check whether generation_config.json exists.
3. If it exists and contains eos_token_id, use that value to override or populate eos_token_id in the config.

This ensures that eos_token_id is correctly loaded even when it's missing from config.json, by reading it from generation_config.json where it's commonly defined in HuggingFace models.
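The steps above can be sketched in Python as follows. This is a minimal sketch, not the code merged into mlx_lm/utils.py. It assumes the function receives a model directory, raises FileNotFoundError when config.json is absent, and copies eos_token_id (an int or a list of ints) from generation_config.json whenever that file defines it. The example file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_config(model_path: Union[str, Path]) -> Dict[str, Any]:
    """Load config.json, letting generation_config.json supply eos_token_id."""
    model_path = Path(model_path)

    # Step 1: the base configuration is required.
    config_file = model_path / "config.json"
    if not config_file.exists():
        raise FileNotFoundError(f"config.json not found in {model_path}")
    with open(config_file, "r") as f:
        config = json.load(f)

    # Step 2: generation_config.json is optional.
    generation_config_file = model_path / "generation_config.json"
    if generation_config_file.exists():
        with open(generation_config_file, "r") as f:
            generation_config = json.load(f)
        # Step 3: populate or override eos_token_id when the file defines it.
        eos_token_id = generation_config.get("eos_token_id")
        if eos_token_id is not None:
            config["eos_token_id"] = eos_token_id

    return config


# Usage with hypothetical file contents: config.json lacks eos_token_id.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text(json.dumps({"model_type": "example"}))
    Path(tmp, "generation_config.json").write_text(json.dumps({"eos_token_id": [1, 106]}))
    eos = load_config(tmp)["eos_token_id"]

print(eos)  # [1, 106]
```

Overwriting rather than merging keeps the rule simple. When both files define eos_token_id, the value from generation_config.json wins, which matches the "prioritize" behavior described above.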

Changes

- Modified mlx_lm/utils.py::load_config() to check for and read generation_config.json.
- Added logic to prioritize eos_token_id from generation_config.json when available.
- Added proper error handling for a missing config.json file.

Impact

This fix ensures correct EOS token handling for models where eos_token_id is missing from config.json but present in generation_config.json. This improves generation accuracy and compatibility with standard HuggingFace model formats where generation-specific parameters are stored separately from the base model configuration.


cubist38 · 2025-12-27T08:38:52Z

Hi @awni,

I hope you’re doing well. Could you please take a look at this pull request when you have time?
This PR addresses an issue related to a missing eos_token_id. For example, when using the functiongemma-270m model with the following input:

<bos><start_of_turn>user
What is the weather in Tokyo?<end_of_turn>
<start_of_turn>model

Before the fix, the output contains repeated and unintended <end_of_turn> tokens, for example:

('I apologize, but I cannot provide the weather in Tokyo. My current capabilities are limited to assisting with travel information and planning. I recommend searching a dedicated weather website or app for the most accurate forecast for Tokyo.<end_of_turn>\n<end_of_turn>\n feliz!<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n<end_of_turn>\n"""\n*I apologize, but I cannot provide the weather', 89)....

After the fix, the output is properly terminated and behaves as expected:

('I apologize, but I cannot provide the weather in Tokyo. My current capabilities are limited to assisting with travel information and planning. I recommend searching a dedicated weather website or app for the most accurate forecast for Tokyo.', 89)

Thank you very much for your time and for reviewing this change. I’d really appreciate any feedback you may have.

awni

Looks good, thanks!

cubist38 · 2025-12-27T14:44:25Z

Thank you very much for taking a look, @awni. It’s a small fix, but I believe it can help improve generation for some models.

Enhance load_config function to check for config file existence and i…

3de289c

…ncorporate eos_token_id from generation_config.json if available

cubist38 mentioned this pull request Dec 27, 2025

Guides on using this with Claude Code cubist38/mlx-openai-server#122

Closed

awni approved these changes Dec 27, 2025


nits

b14694e

awni force-pushed the fix/prioritize-generation-config-eos-token branch from b2e7747 to b14694e on December 27, 2025 14:35

awni merged commit f5ae09a into ml-explore:main Dec 27, 2025
2 checks passed

cubist38 mentioned this pull request Dec 29, 2025

Enhance load_config to include generation_config.json and extract eos… Blaizzy/mlx-vlm#650

Merged


awni merged 2 commits into ml-explore:main from cubist38:fix/prioritize-generation-config-eos-token
