Comparing changes

Features
- LM Studio (closes #2): first-class provider via its OpenAI-compatible local server. `gauntlet run --model lmstudio/<name>`, `gauntlet discover` lists currently-loaded models, host configurable via LMSTUDIO_HOST env, `gauntlet config --lmstudio-host`, or default http://localhost:1234. Metadata (family/params/quant) inferred from the model ID.
- Cloud chat wiring: ChatClient now supports openai/*, anthropic/*, and google/* directly (previously NotImplementedError). Enables leaderboard baselines for GPT-4o, Claude, Gemini.
- MCP server:
  - Self-driving tool instructions with explicit "do NOT shell out" directives
  - Auto-detects client app via Context.session.client_params.clientInfo
  - New gauntlet_status(session_id) tool for resumability (replays current prompt without mutating runner state)

Fixes
- Temporal Reasoning probe: previous prompt said "Reply with ONLY the name" despite correct answer being Neither (both took 45min). Some models looped for minutes on the bind. Prompt now lists 'Alice' | 'Bob' | 'Neither' explicitly; verify logic unchanged.
- collect_fingerprint(r.model, "ollama") hardcode: leaderboard submissions now derive provider via detect_provider(), fixing mis-attribution for non-Ollama runs in `gauntlet quick` and TUI paths.

Safety
- Non-TTY agent-invocation guard on `gauntlet run`: refuses to benchmark local models (Ollama/LM Studio/llama.cpp) when stdin/stdout aren't TTYs unless GAUNTLET_ALLOW_LOCAL=1. Prevents MCP clients (Gemini CLI, Claude Code, Cursor) that shell out to the CLI from accidentally loading large local models and tanking the user's machine.

Polish
- Error messages, auto-detect, and interactive setup now include LM Studio alongside Ollama (no more "Is Ollama running?" when LM Studio is loaded).
- Host resolution honors config file (env > file > default) for both Ollama and LM Studio; persistent `gauntlet config --ollama-host` / `--lmstudio-host` flags now actually take effect.

Tests
- 12 new tests for LM Studio: host precedence, spec parsing, factory, metadata inference across 5 model-id patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
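
The host precedence described in this commit (env > file > default) could look roughly like the sketch below. It is not the project's code: `resolve_host` and the `lmstudio_host` config key are hypothetical names, and the config file is modeled as an already-loaded mapping. Only the `LMSTUDIO_HOST` variable and the `http://localhost:1234` default come from the commit message.

```python
import os
from typing import Mapping, Optional

# Default stated in the commit message.
DEFAULT_LMSTUDIO_HOST = "http://localhost:1234"


def resolve_host(
    env_var: str,
    file_key: str,
    default: str,
    env: Optional[Mapping[str, str]] = None,
    file_config: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a host with env > file > default precedence (hypothetical helper)."""
    env = os.environ if env is None else env
    file_config = {} if file_config is None else file_config

    # 1. An environment variable wins when it is set and non-empty.
    from_env = env.get(env_var, "").strip()
    if from_env:
        return from_env

    # 2. Otherwise fall back to the value persisted in the config file.
    from_file = str(file_config.get(file_key, "")).strip()
    if from_file:
        return from_file

    # 3. Otherwise use the built-in default.
    return default
```

Passing `env` and `file_config` explicitly keeps the precedence easy to test without touching the real environment, which is presumably what the "host precedence" tests exercise.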

- Add #lm-studio quicklink to the header nav
- Add Cloud Baselines section covering OpenAI/Anthropic/Google usage now that ChatClient supports cloud providers directly
- Update provider filter tables (leaderboard + API) to list all six: ollama, lmstudio, llamacpp, openai, anthropic, google

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
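
For illustration, the six provider identifiers listed above map naturally onto the `provider/<name>` model specs used with `--model`. The sketch below shows one way such a spec might be split. `split_model_spec` is a hypothetical helper, not the project's `detect_provider()`, and returning `None` for an unprefixed or unknown prefix is an assumption.

```python
from typing import Optional, Tuple

# The six provider identifiers named in the commit message.
KNOWN_PROVIDERS = {"ollama", "lmstudio", "llamacpp", "openai", "anthropic", "google"}


def split_model_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'provider/name' into (provider, name) (hypothetical helper).

    Assumption: a spec whose prefix is not a known provider is returned
    unchanged with provider None, so names containing '/' are not mangled.
    """
    prefix, sep, rest = spec.partition("/")
    if sep and rest and prefix.lower() in KNOWN_PROVIDERS:
        return prefix.lower(), rest
    return None, spec
```

For example, `split_model_spec("openai/gpt-4o")` yields `("openai", "gpt-4o")`, while a bare name such as `"llama3:8b"` comes back as `(None, "llama3:8b")`.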

LM Studio provider (closes #2), cloud ChatClient wiring, MCP polish, Temporal probe fix, non-TTY agent safety guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
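
A minimal sketch of how a non-TTY safety guard of this kind might decide whether to refuse a run. It assumes the guard fires when either stdin or stdout is not a TTY (the commit text does not say whether one or both are required). `should_refuse_local_run` is a hypothetical name, and only `GAUNTLET_ALLOW_LOCAL=1` and the provider identifiers are taken from the commits.

```python
import os
import sys
from typing import Mapping, Optional

# Local providers named in the commits (Ollama / LM Studio / llama.cpp).
LOCAL_PROVIDERS = {"ollama", "lmstudio", "llamacpp"}


def should_refuse_local_run(
    provider: str,
    stdin_is_tty: bool,
    stdout_is_tty: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True when a local-model run should be refused (hypothetical helper)."""
    env = os.environ if env is None else env

    # Cloud providers do not load a model on the user's machine.
    if provider not in LOCAL_PROVIDERS:
        return False

    # Explicit opt-in overrides the guard.
    if env.get("GAUNTLET_ALLOW_LOCAL") == "1":
        return False

    # Assumption: refuse if EITHER stream is not attached to a terminal.
    return not (stdin_is_tty and stdout_is_tty)


if __name__ == "__main__":
    refuse = should_refuse_local_run("ollama", sys.stdin.isatty(), sys.stdout.isatty())
    print("refuse" if refuse else "allow")
```

Taking the TTY states as parameters, instead of reading `sys.stdin` inside the function, keeps the decision pure so it can be checked without a real terminal.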

Commits on Apr 20, 2026
