The OpenAI-compatible gateway for every free-tier provider.
One local endpoint that fans out across OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras, Cloudflare Workers AI, and your own Ollama. Hit a rate limit, fail over to the next provider. Your agent never knows.
Also wraps Claude Code, OpenAI Codex, and Google Gemini CLI — run all three without paying any of their vendors.
102M+ tokens served in 35 days. $0 spent. Routed through community free-tier keys via this gateway. Daily traffic: free-ride.xyz/models
curl -sSL https://api.free-ride.xyz/install.sh | sh
freeride run claude # or: freeride run codex / freeride run geminiThat's it. No accounts, no subscriptions, no FreeRide cloud. Local-first, BYO keys, your machine talks to providers directly.
macOS / Linux:
curl -sSL https://api.free-ride.xyz/install.sh | shWindows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://api.free-ride.xyz/install.ps1 | iex"The installer picks up uv, then pipx, then plain pip — whichever is on your system. Or install from source.
freeride init # interactive — collects keys, writes ~/.freeride/.env
freeride serve # gateway listens on localhost:11343Get keys (any one is enough; more = better failover):
| Provider | Free tier | Get a key |
|---|---|---|
| OpenRouter | rotating free models | openrouter.ai/keys |
| Groq | daily token cap | console.groq.com/keys |
| NVIDIA NIM | credits per account | build.nvidia.com |
| HuggingFace | $0.10/mo Free, $2/mo PRO | huggingface.co/settings/tokens |
| Cerebras | RPM / TPM caps | cloud.cerebras.ai |
| Cloudflare Workers AI | 10K neurons/day | dash.cloudflare.com |
| Ollama (local) | no quota | install from ollama.com |
Three of the major coding CLIs ship a freeride run wrapper that works with no per-vendor key and no login. The gateway translates between each CLI's native wire protocol and our routing layer; you get the polished agent UX of each CLI, paid for entirely by free-tier providers.
freeride run claudeInside the session, switch routing per request via /model:
| You type | What happens |
|---|---|
/model claude-opus-4-7 |
Your Pro/Max subscription answers (passthrough to api.anthropic.com) — only if claude login has run |
/model freeride/free |
Free providers answer; smart-router picks the model |
/model freeride/fast |
Free; prefers Groq (low TTFT) |
/model freeride/quality |
Free; prefers OpenRouter (widest catalog) |
/model freeride/coding |
Free; pinned to a code-tuned model that reliably emits tool_use blocks |
Full guide: docs/agents/claude-code.md.
freeride run codexWhatever model the CLI picks (gpt-5-codex, gpt-5, etc.) is routed to a free upstream provider. The gateway translates the Responses-API wire format (with full SSE event protocol — response.output_item.added → output_text.delta → output_item.done → response.completed) so the CLI parses everything natively.
Note: codex uses bubblewrap for shell-tool sandboxing; on systems without it, file/shell tool calls fail (the model still works). Full guide: docs/agents/codex.md.
freeride run geminiAny gemini-* model name routes to a free upstream provider. Translator handles Google's {contents, tools, generationConfig} shape both directions. Full guide: docs/agents/gemini.md.
# Aider / Continue.dev / hermes / your-own-tool — anything that speaks OpenAI:
freeride bind aider
freeride bind continue
# or just point it at the gateway directly:
OPENAI_API_BASE=http://localhost:11343/v1
OPENAI_API_KEY=any-string-herePer-request the chain is (provider, key), sorted by recent health:
- Try the head pair.
RATE_LIMITorAUTHerror → mark the key as cooling, try the next key on the same provider.MODEL_NOT_FOUNDorQUOTA_EXHAUSTED→ skip to the next provider.- 5xx / TIMEOUT → next pair.
- First successful response — stamp
X-FreeRide-Provider+X-FreeRide-Request-Idheaders and ship.
If every pair fails, you get a structured 503 with a per-provider breakdown so debugging is one log line, not five round-trips. Mid-stream errors after the first chunk shipped are logged but don't break the client (we can't un-ship bytes).
Smart routing for model: "auto": the resolver scores every free model in the catalog by health × popularity (from the public models leaderboard) and picks the best one. Run freeride audit-models once after install to cache health probes locally so the first real request isn't a cold start.
Deeper: docs/architecture/failover.md.
| Provider | Surface | Notes |
|---|---|---|
| OpenRouter | chat, streaming, tools, vision, structured outputs, embeddings | full surface — the most-used provider in our routing |
| NVIDIA NIM | chat + embeddings | curated free-model allowlist; NVIDIA_NIM_FREE_MODELS_OVERRIDE to expand |
| Groq | chat | Llama 3.x, Gemma 2, Mixtral, DeepSeek-R1-distill; daily token cap |
| Cloudflare Workers AI | chat | cheap-per-neuron models; needs CLOUDFLARE_ACCOUNT_ID |
| HuggingFace Inference | chat + embeddings | full HF router catalog; budget governs access |
| Cerebras | chat | fastest Llama / Qwen inference; no embeddings |
| Ollama (local) | chat | local-only; can mix with remote in the same failover chain |
Adding a new provider: implement freeride.core.provider.Provider in freeride/providers/<name>.py, register it in the conformance suite. See CONTRIBUTING.md.
Provide more than one key per provider with a numbered suffix:
OPENROUTER_API_KEY=sk-or-v1-aaa # primary
OPENROUTER_API_KEY_2=sk-or-v1-bbb
OPENROUTER_API_KEY_3=sk-or-v1-cccThe router tries them in health order. A 429 on one key cools it for the next 60s and rotates to the sibling key — no provider switch needed. On startup freeride keys shows which keys are available vs cooling.
freeride doctor # static checks: keys, ports, /etc/hosts, common gotchas
freeride doctor --claude-code # the same + Claude-Code-specific probes
freeride audit-models # probe every free model on every key; cache the results
freeride bench # measure p50/p95/tok-s per providerTail live events:
tail -f ~/.freeride/events.jsonlEach line is a JSON event: routing decisions, provider attempts, response statuses, mid-stream errors. Same schema the marketing site reads to render the live token counter and provider leaderboard.
A small beacon ships hourly with counts only: tokens served, request count, active providers, uptime hours, OS, version, and a per-install UUID. Never sent: prompts, completions, model IDs, API keys, hostname, IP.
freeride telemetry # audit what the next beacon would post
freeride telemetry off # opt outThe aggregate is what powers free-ride.xyz/models. Default on; explicit disclosure banner prints on first run.
freeride init interactive setup wizard — prompts for keys, writes ~/.freeride/.env
freeride serve start the gateway on :11343
freeride run <cli> wrap a CLI (claude / codex / gemini / anything) — points it at the gateway
freeride bind <agent> write the agent's config so it uses the gateway permanently
freeride doctor pre-flight checks: keys, ports, hosts file, common gotchas
freeride keys which provider keys are available vs cooling
freeride audit-models probe every free model; cache health locally
freeride bench measure p50/p95/tok-s per provider
freeride list list available free models
freeride telemetry manage the hourly aggregate beacon
- Agents
docs/agents/claude-code.md— Claude Code setup,/modelmodes, troubleshootingdocs/agents/codex.md— OpenAI Codex setup, bwrap notes, model selectiondocs/agents/gemini.md— Google Gemini CLI setup, auth flow, model selectiondocs/agents/binders.md— Aider, Continue, OpenClaw — per-agentfreeride bindreferencedocs/agents/hermes.md— NousResearch Hermes agent integration
- Providers
docs/providers/SURVEY.md— per-provider fit (auth, free-tier semantics, error mapping)docs/providers/nvidia_nim.md— NVIDIA NIM specifics
- Architecture
docs/architecture/failover.md— failover chain, cooldown, health trackingdocs/architecture/translators.md— how the Anthropic / Google / OpenAI-Responses translators work
- Other
CONTRIBUTING.md— adding a provider, a CLI wrapper, or a binderSECURITY.md— reporting vulnerabilities
MIT.