
Add 'mx.clear_cache()' to piecewise prompt processing in server. #917

Merged
awni merged 1 commit into ml-explore:main from N8python:main on Feb 23, 2026

Conversation

@N8python
Contributor

Prompts newly added to the server were being processed without cache clearing. This led to a massive memory buildup.

For prompts of length ~16K on GLM 4.7 Flash 6bit, this took peak memory from 50+GB to a more reasonable 28GB.
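For readers unfamiliar with the pattern, here is a minimal sketch of piecewise prompt processing with a cache clear after each chunk. It is not the server's actual code: `process_prompt_piecewise`, `step`, and `chunk_size` are hypothetical names used only for illustration, and the cache-clearing function is passed in as a parameter so the sketch runs without MLX installed. With MLX, one would presumably pass `mx.clear_cache` as `clear_cache`.

```python
from typing import Callable, Sequence


def process_prompt_piecewise(
    tokens: Sequence[int],
    step: Callable[[Sequence[int]], None],
    clear_cache: Callable[[], None],
    chunk_size: int = 2048,
) -> int:
    """Feed `tokens` to `step` in chunks, clearing the cache after each one.

    `step` is a stand-in (assumption) for whatever runs the model on one
    chunk of the prompt. `clear_cache` is a stand-in for a function such as
    `mx.clear_cache`, which releases cached memory buffers. Returns the
    number of chunks processed.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    n_chunks = 0
    for start in range(0, len(tokens), chunk_size):
        # Process one slice of the prompt.
        step(tokens[start:start + chunk_size])
        # Release cached buffers before the next chunk so they do not
        # accumulate across the whole prompt.
        clear_cache()
        n_chunks += 1
    return n_chunks
```

Without the `clear_cache()` call inside the loop, buffers cached while processing one chunk can stay allocated while later chunks are processed, which is consistent with the peak-memory difference reported above.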

@ivanfioravanti
Contributor

It works! A single line of code with a mega impact.

Member

@awni awni left a comment

Good catch! Not sure how we missed that.

@N8python
Contributor Author

Can we merge?

@awni awni merged commit d4701ba into ml-explore:main Feb 23, 2026
2 checks passed

3 participants