Inserting logits processors into BatchGenerator in batch_generate #1008

Merged
angeloskath merged 2 commits into ml-explore:main from arthurhjorth:logits_processors_for_batch_generate on Mar 31, 2026

@arthurhjorth
Contributor

Tiny change: logits_processors is currently included in the signature of generate.batch_generate() but is never used. To actually use the processors in BatchGenerator, they need to be passed along when inserting prompts. This PR does that.

P.S. I said I'd do this a long time ago (#845). Apologies for the delay; life hit hard and got in the way.
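
The issue described above, an argument that is accepted in a signature but never forwarded, can be illustrated with a toy stand-in. This is only a sketch and not mlx-lm's actual code: `ToyBatchGenerator`, its `insert` and `step` methods, `toy_batch_generate`, and the `forward` flag are hypothetical names made up for illustration. The one assumption carried over is that a logits processor is a callable that takes the tokens so far plus the current logits and returns adjusted logits.

```python
from typing import Callable, List, Optional, Sequence

# A processor maps (tokens so far, logits) -> adjusted logits.
Processor = Callable[[Sequence[int], List[float]], List[float]]


class ToyBatchGenerator:
    """Hypothetical stand-in for a batch generator; not the mlx-lm class."""

    def __init__(self) -> None:
        self.entries = []

    def insert(self, prompts, logits_processors: Optional[List[Processor]] = None):
        # Store each prompt together with the processors that apply to it.
        for prompt in prompts:
            self.entries.append((list(prompt), list(logits_processors or [])))

    def step(self, logits: List[float]) -> List[List[float]]:
        # Apply every stored processor, in order, to a copy of the logits.
        results = []
        for tokens, processors in self.entries:
            out = list(logits)
            for proc in processors:
                out = proc(tokens, out)
            results.append(out)
        return results


def toy_batch_generate(prompts, logits_processors=None, forward=True):
    gen = ToyBatchGenerator()
    if forward:
        # Fixed behaviour: forward the processors when inserting prompts.
        gen.insert(prompts, logits_processors=logits_processors)
    else:
        # Buggy behaviour: the argument is accepted but silently dropped.
        gen.insert(prompts)
    return gen.step([0.0, 1.0, 2.0])


def ban_token_2(tokens, logits):
    # Example processor: make token id 2 impossible to sample.
    out = list(logits)
    out[2] = float("-inf")
    return out
```

With `forward=False` the processor has no effect and the logits come back unchanged, which is the kind of silent failure the PR description refers to.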

@arthurhjorth
Contributor Author

By the way, in the old PR I mentioned that the API for passing logits_processors to the generate calls is inconsistent. In batch_generate() they are a named argument, whereas in single generate() you have to pass them in via **kwargs, i.e.

g = generate(model, tokenizer, prompts[0], **{'logits_processors': logits_processors[0]})

If you'd like to make this consistent, I'd be happy to add those changes to this PR too.

Member

@angeloskath angeloskath left a comment

Moved logits_processors to a kwarg that is passed to BatchGenerator. As an aside, there is no need to build a dictionary to pass keyword arguments; passing them by name works fine.
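
The aside about keyword arguments is plain Python behaviour and can be checked with a small stand-in. `toy_generate` below is a hypothetical function that only collects its extra options in `**kwargs`; it is not the mlx-lm `generate` function.

```python
def toy_generate(prompt, **kwargs):
    # Hypothetical stand-in: returns what it received so the two call
    # styles can be compared directly.
    return prompt, kwargs


# A do-nothing processor, used only as a value to pass through.
processors = [lambda tokens, logits: logits]

# Unpacking a dictionary and naming the argument reach **kwargs identically.
by_dict = toy_generate("hi", **{"logits_processors": processors})
by_name = toy_generate("hi", logits_processors=processors)

assert by_dict == by_name
```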

@angeloskath angeloskath merged commit bdeac59 into ml-explore:main Mar 31, 2026
2 checks passed
GoodOlClint added a commit to GoodOlClint/mlx-lm that referenced this pull request Apr 30, 2026
…l-explore#845)

Squash of ml-explore#845 (closed unmerged), 6 commits, applied
on top of Patch 1 (case-project-v0.31.3.1). Adds outlines-based
JsonSchemaLogitsProcessor wired into both batch and single generation
paths in mlx_lm.server; OpenAI-compatible response_format extraction
(json_schema, json_object, both nested and flat shapes).

Files:
  mlx_lm/structured.py (new) — StructuredProcessorCache: per-tokenizer
    LRU cache of compiled outlines indices.
  mlx_lm/server.py — request parsing + processor integration; routes
    through the existing logits_processor pipeline (ml-explore#1008 plumbing
    already in v0.31.3) rather than reintroducing parallel infra.
  setup.py — adds outlines==1.2.12 dependency.

Conflict resolution vs PR ml-explore#845 base:
  - generate.py: PR ml-explore#845's bb2f48d added logits_processors=logits_processors
    to gen.insert() in batch_generate(). In v0.31.3 batch_generate()
    receives logits_processors via **kwargs into BatchGenerator's
    constructor, and BatchGenerator.insert_segments falls back to
    self.logits_processors when not passed explicitly. PR's hunk would
    have been a NameError. Skipped.
  - server.py batch path: kept v0.31.3's GenerationContext +
    insert_segments + state_machines architecture. Built the structured
    processor and merged into the per-segment logits_processors list
    rather than swapping to PR ml-explore#845's older insert(prompts, max_tokens)
    call shape.
  - setup.py: PR pinned outlines==1.2.9 + outlines_core==0.2.14, which
    is impossible (1.2.9 requires outlines_core==0.2.11). Bumped to
    outlines==1.2.12 (transitively requires outlines_core==0.2.14)
    because 0.2.11's Metal kernel has a bfloat16 cast bug that
    crashes generation with `assigning to bfloat16_t from incompatible
    type 'float'`. 0.2.14's kernel uses `static_cast<T>(-INFINITY)`.

Smoke-test (no --mtp): /v1/chat/completions with
response_format={type:json_schema,json_schema:{Person schema}} returns
valid JSON `{"name":"John Doe","age":30,"city":"New York"}`, all
required keys, age is int, finish_reason=stop. ✓

KNOWN LIMITATION: --mtp + response_format crashes with
`ValueError: No next state found for the current state ... with token ID ...`
from outlines's stateful FSM. MTP's draft-rejection rollback is not
compatible with outlines's Guide.advance() linear-progression
assumption. Workaround: run the server WITHOUT --mtp when using
response_format. A proper fix would teach structured.py to snapshot
and roll back guide state on draft rejection — non-trivial follow-up,
not blocking this tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
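
The snapshot-and-rollback idea named under KNOWN LIMITATION in the commit message above can be sketched with a toy state machine. Everything here is hypothetical: `ToyGuide` is not the outlines `Guide` class, its state is a single integer, and `advance_with_rollback` is an invented helper. A real guide may hold state that is more expensive to copy, so this only shows the shape of the approach.

```python
from typing import Dict, List, Tuple


class ToyGuide:
    """Hypothetical linear state machine; not the outlines API."""

    def __init__(self, transitions: Dict[Tuple[int, int], int]) -> None:
        self.transitions = transitions
        self.state = 0

    def advance(self, token: int) -> None:
        key = (self.state, token)
        if key not in self.transitions:
            raise ValueError(
                f"No next state for state {self.state} with token {token}"
            )
        self.state = self.transitions[key]


def advance_with_rollback(guide: ToyGuide, drafts: List[int], n_accepted: int) -> None:
    """Advance over all draft tokens, then keep only the accepted prefix."""
    snapshot = guide.state  # cheap here because the state is one int
    for token in drafts:
        guide.advance(token)  # speculative advance over every draft token
    if n_accepted < len(drafts):
        # Some drafts were rejected: restore and replay the accepted ones.
        guide.state = snapshot
        for token in drafts[:n_accepted]:
            guide.advance(token)


# Toy grammar: 10 -> 11 -> 12 is one path; after 10, token 20 is also valid.
TRANSITIONS = {(0, 10): 1, (1, 11): 2, (2, 12): 3, (1, 20): 4}
```

Without the restore step, the guide would stay in the state reached by the rejected drafts, and the next real token could have no valid transition from there, which matches the kind of `ValueError` quoted in the commit message.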