Comparing changes

base repository: Basaltlabs-app/Gauntlet
base: v2.0.3
...
head repository: Basaltlabs-app/Gauntlet
compare: v2.1.0
  • 3 commits
  • 14 files changed
  • 2 contributors

Commits on Apr 20, 2026

  1. LM Studio provider, cloud chat wiring, MCP polish, probe fix

    Features
    - LM Studio (closes #2): first-class provider via its OpenAI-compatible
      local server. `gauntlet run --model lmstudio/<name>`, `gauntlet discover`
      lists currently-loaded models, host configurable via LMSTUDIO_HOST env,
      `gauntlet config --lmstudio-host`, or default http://localhost:1234.
      Metadata (family/params/quant) inferred from the model ID.
    - Cloud chat wiring: ChatClient now supports openai/*, anthropic/*, and
      google/* directly (previously NotImplementedError). Enables leaderboard
      baselines for GPT-4o, Claude, Gemini.
    - MCP server:
      - Self-driving tool instructions with explicit "do NOT shell out" directives
      - Auto-detects client app via Context.session.client_params.clientInfo
      - New gauntlet_status(session_id) tool for resumability (replays current
        prompt without mutating runner state)
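The `lmstudio/<name>` and `openai/*` style specs above suggest a provider-prefix convention. A minimal sketch of how such a spec might be split, assuming the six provider names listed in this changeset; `parse_model_spec`, `ModelSpec`, and the fallback to `ollama` are hypothetical and not taken from the Gauntlet source:

```python
from typing import NamedTuple

# Provider names as listed in this changeset; the parser itself is illustrative.
KNOWN_PROVIDERS = {"ollama", "lmstudio", "llamacpp", "openai", "anthropic", "google"}


class ModelSpec(NamedTuple):
    provider: str
    name: str


def parse_model_spec(spec: str, default_provider: str = "ollama") -> ModelSpec:
    """Split 'provider/name' on the first slash only.

    Anything after the first slash is kept verbatim, so model IDs that
    themselves contain slashes survive. Specs without a known provider
    prefix fall back to default_provider (an assumption of this sketch).
    """
    head, sep, tail = spec.partition("/")
    if sep and head in KNOWN_PROVIDERS and tail:
        return ModelSpec(head, tail)
    return ModelSpec(default_provider, spec)
```

Splitting on the first slash only is a deliberate choice here: it keeps namespaced model IDs intact after the provider prefix is removed.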
    
    Fixes
    - Temporal Reasoning probe: the previous prompt said "Reply with ONLY the
      name" even though the correct answer is Neither (both took 45 min), and
      some models looped for minutes on that contradiction. The prompt now
      lists 'Alice' | 'Bob' | 'Neither' explicitly; verify logic is unchanged.
    - collect_fingerprint(r.model, "ollama") hardcode: leaderboard submissions
      now derive provider via detect_provider(), fixing mis-attribution for
      non-Ollama runs in `gauntlet quick` and TUI paths.
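The Temporal Reasoning fix amounts to making every valid answer visible in the prompt. A rough sketch of that idea, assuming a simple string prompt and an exact-match check; `build_prompt` and `verify` are hypothetical names, and the real probe's verify logic is not shown in this changeset:

```python
# The three options come from the commit message; everything else is illustrative.
ALLOWED = ("Alice", "Bob", "Neither")


def build_prompt(question: str) -> str:
    """Append the explicit option list so 'Neither' is a legal reply."""
    options = " | ".join(f"'{o}'" for o in ALLOWED)
    return f"{question}\nReply with exactly one of: {options}"


def verify(reply: str, expected: str = "Neither") -> bool:
    """Compare case-insensitively after trimming whitespace, quotes, and end punctuation."""
    cleaned = reply.strip().strip("'\".!").strip().lower()
    return cleaned == expected.lower()
```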
    
    Safety
    - Non-TTY agent-invocation guard on `gauntlet run`: refuses to benchmark
      local models (Ollama/LM Studio/llama.cpp) when stdin/stdout aren't TTYs
      unless GAUNTLET_ALLOW_LOCAL=1. Prevents MCP clients (Gemini CLI, Claude
      Code, Cursor) that shell out to the CLI from accidentally loading large
      local models and tanking the user's machine.
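The guard described above can be sketched as a pure decision function. This assumes the run is refused when either stdin or stdout is not a TTY (the commit text does not say whether both must be non-TTY); `should_refuse_local_run` is a hypothetical name:

```python
import os
import sys

# Local providers named in the commit message.
LOCAL_PROVIDERS = {"ollama", "lmstudio", "llamacpp"}


def should_refuse_local_run(provider, stdin_is_tty, stdout_is_tty, env):
    """Return True when a local-model benchmark should be refused."""
    if provider not in LOCAL_PROVIDERS:
        return False  # cloud providers do not load models on the user's machine
    if env.get("GAUNTLET_ALLOW_LOCAL") == "1":
        return False  # explicit opt-in overrides the guard
    # Assumption: any non-TTY stream indicates a non-interactive caller.
    return not (stdin_is_tty and stdout_is_tty)


if __name__ == "__main__":
    refuse = should_refuse_local_run(
        "ollama", sys.stdin.isatty(), sys.stdout.isatty(), os.environ
    )
    print("refuse" if refuse else "allow")
```

Passing the TTY flags and environment in as arguments keeps the decision testable without a real terminal.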
    
    Polish
    - Error messages, auto-detect, and interactive setup now include LM Studio
      alongside Ollama (no more "Is Ollama running?" when LM Studio is loaded).
    - Host resolution honors config file (env > file > default) for both
      Ollama and LM Studio; persistent `gauntlet config --ollama-host` /
      `--lmstudio-host` flags now actually take effect.
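The env > file > default precedence can be illustrated with a small resolver. A sketch only: `LMSTUDIO_HOST` and the default URL come from the commit message, while `resolve_host` and the `lmstudio_host` config key are hypothetical:

```python
DEFAULT_LMSTUDIO_HOST = "http://localhost:1234"


def resolve_host(env, file_config,
                 env_key="LMSTUDIO_HOST",
                 file_key="lmstudio_host",  # hypothetical config-file key
                 default=DEFAULT_LMSTUDIO_HOST):
    """Pick the first non-empty value in the order env > config file > default."""
    value = env.get(env_key) or file_config.get(file_key) or default
    # Drop trailing slashes so later URL joins do not produce '//'.
    return value.rstrip("/")
```

For example, `resolve_host(os.environ, loaded_config)` would honor an exported `LMSTUDIO_HOST` over a persisted value, where `loaded_config` stands for whatever dict the config file is parsed into.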
    
    Tests
    - 12 new tests for LM Studio: host precedence, spec parsing, factory,
      metadata inference across 5 model-id patterns.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    JadeLyre and claude committed Apr 20, 2026
    Commit 40452cb
  2. README: LM Studio quicklink, Cloud Baselines section, provider lists

    - Add #lm-studio quicklink to the header nav
    - Add Cloud Baselines section covering OpenAI/Anthropic/Google usage
      now that ChatClient supports cloud providers directly
    - Update provider filter tables (leaderboard + API) to list all six:
      ollama, lmstudio, llamacpp, openai, anthropic, google
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    JadeLyre and claude committed Apr 20, 2026
    Commit 75622cc
  3. Release 2.1.0

    LM Studio provider (closes #2), cloud ChatClient wiring, MCP polish,
    Temporal probe fix, non-TTY agent safety guard.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    JadeLyre and claude committed Apr 20, 2026
    Commit bdedeea