Free OSS tier
100,000 tokens/month on 186 open-source models hosted on NVIDIA NIM — Llama 3.1 405B, Mixtral, Nemotron, CodeLlama. No credit card to start.
InferAll is an Anthropic-compatible gateway with 100,000 free tokens/month on 186 NVIDIA-hosted open-source models, pay-as-you-go access to Claude, GPT-4, and Gemini, and automatic provider failover. Point Claude Code, Cline, or Cursor at one base URL — one key, one bill.
Drop into Claude Code with two env vars — no code changes.
Drop-in for Claude Code
export ANTHROPIC_API_KEY=ifa_...
export ANTHROPIC_BASE_URL=https://api.inferall.ai
claude
Or call the gateway directly from Python with the official SDK (inferall-ai 0.1.0 on PyPI). TypeScript SDK coming soon.
Python SDK — install
pip install inferall-ai
Python SDK — usage
from inferall import Inferall
ai = Inferall() # reads INFERALL_API_KEY
ai.text("Explain quantum computing in two sentences")
Why InferAll
Free tier: 100,000 tokens/month on 186 NVIDIA NIM-hosted open-source models, including Llama 3.1 405B, Mixtral, Nemotron, and CodeLlama. No credit card to start.
One key fronts OpenAI, Anthropic, Google, NVIDIA NIM, Replicate, and Runway. Switch providers with a parameter, not a rewrite.
OpenAI-compatible at /ai/v1/generate. Anthropic-compatible at /v1/messages. Existing SDKs, agents, and editors work unchanged.
Pay-as-you-go on premium providers at their published token price with zero markup. Free OSS, premium when you need it, single invoice.
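To make the Anthropic-compatible surface concrete, here is a minimal stdlib sketch that builds a request against the /v1/messages endpoint named above. The model name, max_tokens value, and header set are illustrative assumptions (the headers follow the standard Anthropic wire format); nothing is sent over the network.

```python
import json
import urllib.request

BASE_URL = "https://api.inferall.ai"

def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build an Anthropic-wire-format request for InferAll's /v1/messages.

    Header names follow the public Anthropic API convention; the gateway's
    exact accepted headers are an assumption here.
    """
    body = json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/messages",
        data=body,
        headers={
            "x-api-key": api_key,            # InferAll key, prefix ifa_
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

req = build_messages_request("ifa_example", "claude-sonnet", "Hello")
print(req.full_url)  # https://api.inferall.ai/v1/messages
```

Passing the request to `urllib.request.urlopen` (or swapping in any HTTP client) would send it; existing Anthropic SDKs do the equivalent of this for you when pointed at the base URL.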
Resilience
Anthropic rate-limits. OpenAI has incidents. Google Gemini quotas reset. Production agents that call a single provider directly inherit every one of those failures.
InferAll's gateway watches for 429s, 529s, 5xx responses, and timeouts (30s default). When the primary provider can't serve a request, the same prompt is retried server-side against the next provider that can run an equivalent model — Anthropic to Google, OpenAI to NVIDIA, and so on. Your client code does not change.
Context-size-aware routing means a 1M-token prompt is only sent to providers whose models can actually fit it. Cheapest-first selection means you pay the least for the same quality bar.
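The selection rule above can be sketched in a few lines: among providers whose model context window fits the prompt, pick the one with the lowest token price. Provider names, context windows, and prices below are made up for illustration; the real routing table lives server-side and is not public.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    context_window: int    # max tokens the model accepts
    price_per_mtok: float  # USD per million input tokens

# Illustrative numbers only; not InferAll's actual routing table.
PROVIDERS = [
    Provider("premium-large-context", 1_000_000, 3.00),
    Provider("oss-mid-context", 128_000, 0.00),
    Provider("premium-mid-context", 200_000, 1.00),
]

def route(prompt_tokens: int) -> Provider:
    """Cheapest provider whose context window fits the prompt."""
    fits = [p for p in PROVIDERS if p.context_window >= prompt_tokens]
    if not fits:
        raise ValueError("prompt exceeds every provider's context window")
    return min(fits, key=lambda p: p.price_per_mtok)

print(route(50_000).name)   # oss-mid-context (free tier fits)
print(route(500_000).name)  # premium-large-context (only 1M ctx fits)
```

The 1M-token case falls through to the sole large-context provider, exactly the behavior described: a big prompt is only ever sent where it can actually fit.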
Failover flow
# Anthropic returns 429 / 529 / 5xx?
# InferAll transparently retries against the next provider
# that can serve the requested model class.
POST https://api.inferall.ai/v1/messages
→ Anthropic (rate-limited) [retry]
→ Google Gemini 2.5 [200 OK]
Providers
Editor integration
Anything that speaks the OpenAI or Anthropic wire format works with InferAll. That covers most modern coding agents — including the three below.
Anthropic's terminal-native coding agent. Point ANTHROPIC_BASE_URL at InferAll and route Claude Code through free OSS models for cheap turns, premium models for hard ones.
Setup snippet
export ANTHROPIC_API_KEY=ifa_...
export ANTHROPIC_BASE_URL=https://api.inferall.ai
claude
The autonomous coding agent for VS Code. Cline supports Anthropic-compatible endpoints — set the base URL in settings and your free OSS allowance covers the chatty inner loop.
Setup steps
# In Cline's Anthropic provider settings:
# API Key: ifa_...
# Base URL: https://api.inferall.ai
Cursor lets you bring your own OpenAI-compatible endpoint. Configure InferAll as a custom provider to use the same key across editors.
Setup steps
# In Cursor → Settings → Models → Custom API:
# OpenAI API Key: ifa_...
# OpenAI Base URL: https://api.inferall.ai/v1
Prefer a one-click VS Code install with inference already wired in? Use the dedicated InferAll for VS Code extension — the canonical InferAll-branded experience, with the upstream Cline + base-URL path above still fully supported.
Dedicated Cline and Cursor integration guides are coming to the docs.
Solutions
01
Sign up at /keys and get an InferAll API key (prefix ifa_). The free tier needs no credit card.
02
Set ANTHROPIC_BASE_URL=https://api.inferall.ai for Claude Code and Cline, or OPENAI_BASE_URL=https://api.inferall.ai/v1 for the OpenAI SDK. No code changes.
03
Free OSS models for cheap turns, premium providers when you ask for them, automatic failover when something breaks. One bill at the end of the month.
Free: $0/month
Pro: $29/month
Team: $99/month
Enterprise: Custom
InferAll is an Anthropic-compatible and OpenAI-compatible AI gateway. Point your existing Claude Code, Cline, Cursor, or SDK code at api.inferall.ai with one environment variable and get access to 255+ models across OpenAI, Anthropic, Google, NVIDIA NIM, Replicate, and Runway — plus 100,000 free tokens/month on 186 open-source models.
Set ANTHROPIC_API_KEY to your InferAll key and ANTHROPIC_BASE_URL to https://api.inferall.ai, then run claude as usual. InferAll exposes an Anthropic-compatible /v1/messages endpoint, so Claude Code sends every request through InferAll. Route cheap turns to free NVIDIA-hosted OSS models and let premium Claude or GPT handle hard tasks.
When a provider rate-limits, returns a 5xx, or times out (30s default), InferAll's gateway transparently retries the same request against the next provider that can serve the model class. A single Anthropic outage or rate-limit no longer takes down your agent. Failover happens server-side — no client-side retry logic required.
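The failover behavior in that answer can be sketched as a loop over providers that treats 429, 529, any 5xx, and timeouts as retryable. The status codes and the 30-second timeout come from the text; the provider callables, exception types, and function names are illustrative, and the real logic runs server-side in the gateway.

```python
import socket

RETRYABLE_STATUSES = {429, 529}  # plus any 5xx, checked below

class ProviderError(Exception):
    """Illustrative error carrying an upstream HTTP status."""
    def __init__(self, status: int):
        super().__init__(f"upstream status {status}")
        self.status = status

def should_failover(exc: Exception) -> bool:
    if isinstance(exc, socket.timeout):  # 30s default in the gateway
        return True
    if isinstance(exc, ProviderError):
        return exc.status in RETRYABLE_STATUSES or 500 <= exc.status < 600
    return False

def call_with_failover(providers, request):
    """Try each provider in order; move on only for retryable failures."""
    last = None
    for provider in providers:
        try:
            return provider(request)
        except Exception as exc:
            if not should_failover(exc):
                raise
            last = exc
    raise RuntimeError("all providers failed") from last

# Simulated run: first provider is rate-limited, second succeeds.
def anthropic(req):
    raise ProviderError(429)

def gemini(req):
    return {"status": 200, "provider": "gemini"}

print(call_with_failover([anthropic, gemini], {"prompt": "hi"}))
# {'status': 200, 'provider': 'gemini'}
```

A 4xx that is not retryable (say, a 400 for a malformed request) propagates to the caller instead of burning through the provider list, which is the behavior you want for client errors.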
186 open-source models hosted on NVIDIA NIM are free up to 100,000 tokens per month. The roster includes Llama 3.1 405B, Mixtral, Nemotron, and CodeLlama. No credit card required.
Other gateways aggregate paid providers. InferAll is the only gateway that bundles an Anthropic-compatible endpoint, a free OSS inference tier on NVIDIA NIM, and automatic cross-provider failover into one product — at zero markup on premium tokens.
Yes — the Python SDK is live on PyPI today as inferall-ai 0.1.0. Install with pip install inferall-ai, import Inferall from inferall, and call text(), chat(), vision(), or generate(). It reads INFERALL_API_KEY from the environment (the legacy AI_GATEWAY_KEY is still accepted). A TypeScript SDK is in progress and not yet published; in the meantime, point the official OpenAI or Anthropic SDKs at api.inferall.ai and everything works unchanged.
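The environment-variable behavior that answer describes (INFERALL_API_KEY preferred, legacy AI_GATEWAY_KEY still accepted) can be sketched like this. The helper name is ours, not part of the SDK; only the two variable names come from the text.

```python
import os

def resolve_api_key(env=None) -> str:
    """Prefer INFERALL_API_KEY; fall back to legacy AI_GATEWAY_KEY."""
    env = os.environ if env is None else env
    key = env.get("INFERALL_API_KEY") or env.get("AI_GATEWAY_KEY")
    if not key:
        raise RuntimeError("set INFERALL_API_KEY (or legacy AI_GATEWAY_KEY)")
    return key

print(resolve_api_key({"AI_GATEWAY_KEY": "ifa_legacy"}))  # ifa_legacy
print(resolve_api_key({"INFERALL_API_KEY": "ifa_new",
                       "AI_GATEWAY_KEY": "ifa_legacy"}))  # ifa_new
```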
The InferAll gateway is built and maintained by Kindly Robotics, Inc. The infrastructure code is available on GitHub at github.com/kindlyrobotics/infra. The gateway runs on Fly.io for low-latency global inference.