Run Claude Code on freeOSS models. Upgrade whenyou need to.

InferAll is an Anthropic-compatible gateway with $0 input/output on 118+ NVIDIA-hosted open-source models (within the free-plan daily request limit), pay-as-you-go access to Claude, GPT-4, and Gemini, and automatic provider failover. Point Claude Code, Cline, Cursor, LangChain or any OpenAI-compatible client at one base URL — one key, one bill.

Drop into Claude Code with two env vars — no code changes.

Drop-in for Claude Code

export ANTHROPIC_API_KEY=ifu_...
export ANTHROPIC_BASE_URL=https://api.inferall.ai
claude

Or call the gateway directly with the official SDK — Python (inferall-ai 0.1.0 on PyPI) or TypeScript (@inferall/sdk 0.1.0 on npm).

Python SDK — install

pip install inferall-ai

Python SDK — usage

from inferall import Inferall

ai = Inferall()  # reads INFERALL_API_KEY
ai.text("Explain quantum computing in two sentences")

200 free trial requests on 118+ open NVIDIA NIM models to start. $5 starter pack unlocks ongoing use — see /pricing.

217 models live · 121 free · 33,800 requests / 30d · gateway healthy

Why InferAll

Open models at $0 rate

118+ open-source models hosted on NVIDIA NIM — Llama 3.1 70B, Mixtral, Nemotron, CodeLlama — bill at our open-model rate ($0 input, $0 output) once your account is activated with the $5 starter pack.

Multi-provider routing

One key fronts OpenAI, Anthropic, Google, NVIDIA NIM, Replicate, and Runway. Switch providers with a parameter, not a rewrite.

Drop-in compatibility

OpenAI-compatible at /ai/v1/generate. Anthropic-compatible at /v1/messages. Existing SDKs, agents, and editors work unchanged.

One bill

Pay-as-you-go on premium providers at their published token price with zero markup. Free OSS, premium when you need it, single invoice.

Resilience

Automatic failover across providers

Anthropic rate-limits. OpenAI has incidents. Google Gemini quotas reset. Production agents that call a single provider directly inherit every one of those failures.

InferAll's gateway watches for 429s, 529s, 5xx responses, and timeouts (30s default). When the primary provider can't serve a request, the same prompt is retried server-side against the next provider that can run an equivalent model — Anthropic to Google, OpenAI to NVIDIA, and so on. Your client code does not change.

Context-size-aware routing means a 1M-token prompt is only sent to providers whose models can actually fit it. Cheapest-first selection means you pay the least for the same quality bar.

Failover flow

# Anthropic returns 429 / 529 / 5xx?
# InferAll transparently retries against the next provider
# that can serve the requested model class.

POST https://api.inferall.ai/v1/messages
  → Anthropic (rate-limited)       [retry]
  → Google Gemini 2.5              [200 OK]

Providers

OpenAIGPT-4o, DALL-E 3, o1
AnthropicClaude Sonnet 4, Opus, Haiku
GoogleGemini 2.5, Veo 3, Imagen 4
NVIDIALlama 3.1 70B, Mixtral, 118+ free models
ReplicateFlux, Stable Diffusion, video
RunwayGen-4.5, video generation

Editor integration

Use with your favorite editor

Anything that speaks the OpenAI or Anthropic wire format works with InferAll. That covers most modern coding agents — including the three below.

Claude Code

Anthropic's terminal-native coding agent. Point ANTHROPIC_BASE_URL at InferAll and route Claude Code through free OSS models for cheap turns, premium models for hard ones.

Setup snippet

export ANTHROPIC_API_KEY=ifu_...
export ANTHROPIC_BASE_URL=https://api.inferall.ai
claude

Cline

The autonomous coding agent for VS Code. Cline supports Anthropic-compatible endpoints — set the base URL in settings and your free OSS allowance covers the chatty inner loop.

Setup steps

# In Cline's Anthropic provider settings:
#   API Key:  ifu_...
#   Base URL: https://api.inferall.ai

Cursor

Cursor lets you bring your own OpenAI-compatible endpoint. Configure InferAll as a custom provider to use the same key across editors.

Setup steps

# In Cursor → Settings → Models → Custom API:
#   OpenAI API Key:  ifu_...
#   OpenAI Base URL: https://api.inferall.ai/v1

Prefer a one-click VS Code install with inference already wired in? Use the dedicated InferAll for VS Code extension — the canonical InferAll-branded experience, with the upstream Cline + base-URL path above still fully supported.

Dedicated Cline and Cursor integration guides are coming to the docs.

How it works

01

Get your API key

Sign up at /keys and create an InferAll API key (prefix ifu_). Activate the key at /billing with the $5 starter pack — that $5 becomes spendable balance for any model (open NIM at $0 input/output, premium providers at the provider's rate with zero markup).

02

Point your tool at InferAll

Set ANTHROPIC_BASE_URL=https://api.inferall.ai for Claude Code and Cline, or OPENAI_BASE_URL=https://api.inferall.ai/v1 for the OpenAI SDK. No code changes.

03

Let the gateway route

Free OSS models for cheap turns, premium providers when you ask for them, automatic failover when something breaks. One bill at the end of the month.

Pricing

Free

$0/month

  • 100 chat / 50 text requests per day on 118+ open NIM models — $0 in/out
  • $5/month of usage included on premium providers (OpenAI, Anthropic, Google)
  • Anthropic + OpenAI compatible
  • Automatic provider failover

Pro

$29/month

  • $20/month of premium-provider usage included
  • Higher daily request limits (5,000 chat / 10,000 text / 500 image-generate)
  • All 207+ models, all providers — zero markup
  • Image and video generation, streaming, tool use

Team

$99/month

  • $100/month of premium-provider usage included
  • Same high request limits as Pro + 200/day video-generate
  • Multiple API keys
  • Priority routing

Enterprise

Custom

  • Custom included usage and request limits
  • Custom model hosting
  • Priority support
  • Dedicated account team
Get started free

Questions

What is InferAll?

InferAll is an Anthropic-compatible and OpenAI-compatible AI gateway. Point your existing Claude Code, Cline, Cursor, or SDK code at api.inferall.ai with one environment variable and get access to 207+ models across OpenAI, Anthropic, Google, NVIDIA NIM, Replicate, and Runway — including $0 input/output on 118+ open NVIDIA NIM models within the free-tier daily request limits.

How do I use InferAll with Claude Code?

Set ANTHROPIC_API_KEY to your InferAll key and ANTHROPIC_BASE_URL to https://api.inferall.ai, then run claude as usual. InferAll exposes an Anthropic-compatible /v1/messages endpoint, so Claude Code sends every request through InferAll. Route cheap turns to free NVIDIA-hosted OSS models and let premium Claude or GPT handle hard tasks.

How does automatic failover work?

When a provider rate-limits, returns a 5xx, or times out (30s default), InferAll's gateway transparently retries the same request against the next provider that can serve the model class. A single Anthropic outage or rate-limit no longer takes down your agent. Failover happens server-side — no client-side retry logic required.

How many free models does InferAll offer?

118+ open-source models hosted on NVIDIA NIM are billed at $0 input and $0 output once your account is activated, capped only by the free-plan daily request rate (100 chat / 50 text / 20 image-analyze / 10 image-generate / 5 video-generate per day). The roster includes Llama 3.1 70B, Mixtral, Nemotron, and CodeLlama. Activation is a $5 starter pack that becomes spendable balance on your account — the open NIM models cost $0 against that balance, premium providers (OpenAI, Anthropic, Google) bill at the provider's published per-token rate with zero markup.

How does InferAll compare to OpenRouter, LiteLLM, Portkey, or Vercel AI Gateway?

Other gateways aggregate paid providers. InferAll is the only gateway that bundles an Anthropic-compatible endpoint, an open NVIDIA NIM inference tier at $0 input/output, and automatic cross-provider failover into one product — at zero markup on premium tokens.

Is there a native InferAll SDK?

Yes — both SDKs are live. Python: pip install inferall-ai (0.1.0 on PyPI), import Inferall from inferall, and call text(), chat(), vision(), or generate(). TypeScript: npm install @inferall/sdk (0.1.0 on npm), import { Inferall } from "@inferall/sdk" with the same surface. Both read INFERALL_API_KEY from the environment (the legacy AI_GATEWAY_KEY is still accepted). Prefer the official OpenAI or Anthropic SDKs? Point them at api.inferall.ai and everything works unchanged.

Is InferAll open source?

The InferAll gateway is built and maintained by Kindly Robotics, Inc. The infrastructure code is available on GitHub at github.com/kindlyrobotics/infra. The gateway runs on Fly.io for low-latency global inference.