Inserting logits processors into BatchGenerator in batch_generate #1008
Merged
angeloskath merged 2 commits on Mar 31, 2026
By the way, in the old PR I mentioned that the API for passing in logits_processors with generate calls is inconsistent. E.g. in If you'd like to make this consistent, I'd be happy to add those changes to this PR too.
angeloskath
approved these changes
Mar 30, 2026
Member
angeloskath
left a comment
Moved logits_processors to a kwarg that is passed to BatchGenerator. As an aside, there is no need to make a dictionary to pass keyword arguments; passing them by name works fine.
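The aside about keyword arguments can be illustrated with a small, self-contained sketch. `ToyGenerator`, `build_with_dict`, and `build_by_name` are hypothetical stand-ins invented for this example, not the mlx-lm classes or functions; the only point is that naming the argument at the call site is equivalent to building a dictionary and unpacking it.

```python
class ToyGenerator:
    """Hypothetical stand-in for a generator that accepts logits processors."""

    def __init__(self, model, max_tokens=128, logits_processors=None):
        self.model = model
        self.max_tokens = max_tokens
        self.logits_processors = logits_processors or []


def build_with_dict(model, logits_processors=None):
    # Works, but the intermediate dictionary adds nothing.
    kwargs = {"logits_processors": logits_processors}
    return ToyGenerator(model, **kwargs)


def build_by_name(model, logits_processors=None):
    # Equivalent and more direct: pass the keyword argument by name.
    return ToyGenerator(model, logits_processors=logits_processors)


def identity(tokens, logits):
    # A do-nothing processor, used only to have something to pass along.
    return logits


a = build_with_dict("toy-model", [identity])
b = build_by_name("toy-model", [identity])
```

Both calls produce generators with the same processor list and the same defaults, so the by-name form is the simpler choice whenever the set of keyword arguments is known in advance.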
GoodOlClint added a commit to GoodOlClint/mlx-lm that referenced this pull request on Apr 30, 2026
…l-explore#845)

Squash of ml-explore#845 (closed unmerged), 6 commits, applied on top of Patch 1 (case-project-v0.31.3.1).

Adds outlines-based JsonSchemaLogitsProcessor wired into both batch and single generation paths in mlx_lm.server; OpenAI-compatible response_format extraction (json_schema, json_object, both nested and flat shapes).

Files:
- mlx_lm/structured.py (new): StructuredProcessorCache, a per-tokenizer LRU cache of compiled outlines indices.
- mlx_lm/server.py: request parsing + processor integration; routes through the existing logits_processor pipeline (ml-explore#1008 plumbing already in v0.31.3) rather than reintroducing parallel infra.
- setup.py: adds outlines==1.2.12 dependency.

Conflict resolution vs PR ml-explore#845 base:
- generate.py: PR ml-explore#845's bb2f48d added logits_processors=logits_processors to gen.insert() in batch_generate(). In v0.31.3 batch_generate() receives logits_processors via **kwargs into BatchGenerator's constructor, and BatchGenerator.insert_segments falls back to self.logits_processors when not passed explicitly. PR's hunk would have been a NameError. Skipped.
- server.py batch path: kept v0.31.3's GenerationContext + insert_segments + state_machines architecture. Built the structured processor and merged into the per-segment logits_processors list rather than swapping to PR ml-explore#845's older insert(prompts, max_tokens) call shape.
- setup.py: PR pinned outlines==1.2.9 + outlines_core==0.2.14, which is impossible (1.2.9 requires outlines_core==0.2.11). Bumped to outlines==1.2.12 (transitively requires outlines_core==0.2.14) because 0.2.11's Metal kernel has a bfloat16 cast bug that crashes generation with `assigning to bfloat16_t from incompatible type 'float'`. 0.2.14's kernel uses `static_cast<T>(-INFINITY)`.

Smoke-test (no --mtp): /v1/chat/completions with response_format={type:json_schema,json_schema:{Person schema}} returns valid JSON `{"name":"John Doe","age":30,"city":"New York"}`, all required keys, age is int, finish_reason=stop. ✓

KNOWN LIMITATION: --mtp + response_format crashes with `ValueError: No next state found for the current state ... with token ID ...` from outlines's stateful FSM. MTP's draft-rejection rollback is not compatible with outlines's Guide.advance() linear-progression assumption. Workaround: run the server WITHOUT --mtp when using response_format. A proper fix would teach structured.py to snapshot and roll back guide state on draft rejection; a non-trivial follow-up, not blocking this tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
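The snapshot-and-rollback idea named in the known limitation could look roughly like the sketch below. Everything here is hypothetical: `ToyGuide` is an invented linear state machine that only mimics the "advance or raise" behaviour described in the commit message, and `RollbackGuide` is an invented wrapper. Neither reflects the actual outlines or mlx-lm APIs, and whether a real guide object can be copied this way is an assumption that would need checking.

```python
import copy


class ToyGuide:
    """Hypothetical linear FSM: accepts a fixed token sequence one step at a time."""

    def __init__(self, expected):
        self.expected = list(expected)
        self.position = 0

    def advance(self, token):
        # Raise when the token does not match the next expected one.
        if (self.position >= len(self.expected)
                or self.expected[self.position] != token):
            raise ValueError(
                f"No next state for token {token} at position {self.position}"
            )
        self.position += 1


class RollbackGuide:
    """Hypothetical wrapper that lets speculative advances be undone."""

    def __init__(self, guide):
        self.guide = guide
        self._snapshot = None

    def begin_draft(self):
        # Copy the guide before any draft token touches its state.
        self._snapshot = copy.deepcopy(self.guide)

    def advance(self, token):
        self.guide.advance(token)

    def accept_draft(self):
        # Draft tokens were verified; the snapshot is no longer needed.
        self._snapshot = None

    def reject_draft(self):
        # Draft tokens were rejected; restore the pre-draft state.
        if self._snapshot is None:
            raise RuntimeError("no draft in progress")
        self.guide = self._snapshot
        self._snapshot = None


guide = RollbackGuide(ToyGuide([5, 6, 7]))
guide.advance(5)
guide.begin_draft()
guide.advance(6)       # draft token, later rejected
guide.reject_draft()
guide.advance(6)       # verified token applied from the restored state
position = guide.guide.position
```

Without the restore step, the second `advance(6)` would hit a guide that has already moved past that token and would raise, which is the same shape of failure as the `ValueError` quoted above.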
Tiny change:
logits_processors are currently included in the signature of generate.batch_generate() but are unused. To actually use them in BatchGenerator, they need to be included when inserting prompts. This PR does that.

P.S. I said I'd do this a long time ago (#845). Apologies for the delay; life hit hard and got in the way.
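The effect of an accepted-but-unused argument can be shown with a toy model of the situation. The names below (`ToyBatchGenerator`, `batch_generate_before`, `batch_generate_after`, `ban_token`) are hypothetical and written in plain Python; they are not the mlx-lm implementation. The sketch assumes a processor is a callable taking the tokens so far and the current logits and returning adjusted logits.

```python
class ToyBatchGenerator:
    """Hypothetical generator that applies processors before picking a token."""

    def __init__(self, logits_processors=None):
        self.logits_processors = logits_processors or []

    def step(self, tokens, logits):
        for processor in self.logits_processors:
            logits = processor(tokens, logits)
        # Greedy pick: index of the largest logit.
        return max(range(len(logits)), key=logits.__getitem__)


def ban_token(banned_id):
    """Build a processor that makes one token id impossible to select."""
    def processor(tokens, logits):
        out = list(logits)
        out[banned_id] = float("-inf")
        return out
    return processor


def batch_generate_before(logits, logits_processors=None):
    # The argument is in the signature but never reaches the generator.
    gen = ToyBatchGenerator()
    return gen.step([], logits)


def batch_generate_after(logits, logits_processors=None):
    # The argument is forwarded, so the processors take effect.
    gen = ToyBatchGenerator(logits_processors=logits_processors)
    return gen.step([], logits)


logits = [0.1, 2.0, 0.5]
before = batch_generate_before(logits, [ban_token(1)])
after = batch_generate_after(logits, [ban_token(1)])
```

In the "before" variant the banned token is still selected because the processor list is silently dropped; in the "after" variant the next-best token is chosen instead.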