Most AI apps are locked to one provider. When OpenAI has an outage, you're down. When Anthropic raises prices, you rebuild. When a better model launches, you rewrite your integration.
InferAll gives you one [AI inference API](/solutions/ai-inference-api) key that routes to any provider — OpenAI, Anthropic, Google, NVIDIA, Replicate, and Runway. Switching is a parameter change, not a rewrite.
---
### One prompt, four models
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys (no card required)
)

prompt = "What are the tradeoffs between SQL and NoSQL databases?"

# Route to any provider by changing just `model`
for model in [
    "meta/llama-3.1-70b-instruct",          # NVIDIA NIM, free
    "google/gemma-4-31b-it",                # Google Gemma, free
    "qwen/qwen3-coder-480b-a35b-instruct",  # Qwen, free
    "anthropic/claude-sonnet-4-6",          # Anthropic Claude, paid
]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    print(f"\n=== {model.split('/')[-1]} ===")
    print(response.choices[0].message.content)
```
The first three models are free and served through NVIDIA NIM. The last requires a card on file and bills at Anthropic's published per-token rate with zero markup.
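Per-token billing comes down to simple arithmetic: token count divided by one million, times the price per million tokens, summed over input and output. The sketch below assumes the gateway returns the standard OpenAI `usage` fields (`response.usage.prompt_tokens`, `response.usage.completion_tokens`); `estimate_cost_usd` is a hypothetical helper, and the rates are placeholders rather than Anthropic's actual price list.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_per_m: float,
    output_per_m: float,
) -> float:
    """Estimate a per-token bill: tokens / 1,000,000 * price per million tokens."""
    return (
        prompt_tokens / 1_000_000 * input_per_m
        + completion_tokens / 1_000_000 * output_per_m
    )

# Placeholder rates (USD per million tokens) for illustration only;
# check the provider's published price list for real numbers.
cost = estimate_cost_usd(
    prompt_tokens=20,
    completion_tokens=200,
    input_per_m=3.00,
    output_per_m=15.00,
)
print(f"${cost:.6f}")
```

In a real script you would pass the token counts from `response.usage`. For a free model both rates are 0, so the estimate is $0 regardless of token count.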
---
### Automatic failover
InferAll automatically falls back to the next provider when one fails. If the primary returns a 500 error, hits a rate limit, or times out, the gateway retries the request along the configured fallback chain, with no extra code in your application.
```python
# This call retries on NVIDIA if Anthropic fails:
response = client.chat.completions.create(
model="anthropic/claude-sonnet-4-6", # primary
messages=[{"role": "user", "content": "Explain neural networks."}],
)
# provider=anthropic attempted first, nvidia fallback on failure
```
No retries in your application code, no provider-specific error handling.
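The gateway does this server-side, so none of the following belongs in your application. As a rough mental model only, a fallback chain behaves like the sketch below: try each provider in order and return the first success. This is not InferAll's implementation; `ProviderError`, `call_with_fallback`, and the simulated provider functions are hypothetical stand-ins.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical stand-in for a 500, a rate limit, or a timeout from one provider."""


def call_with_fallback(
    chain: list[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each (provider, call) pair in order; return (provider, text) from the first success."""
    errors: list[str] = []
    for provider, call in chain:
        try:
            return provider, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in the chain
            errors.append(f"{provider}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Simulated providers: the primary is down, the fallback answers.
def anthropic_down(prompt: str) -> str:
    raise ProviderError("503 Service Unavailable")


def nvidia_ok(prompt: str) -> str:
    return f"answer to: {prompt}"


provider, text = call_with_fallback(
    [("anthropic", anthropic_down), ("nvidia", nvidia_ok)],
    "Explain neural networks.",
)
print(provider, "->", text)
```

The only error that escapes is the one raised after every provider in the chain has failed, which is the case your application still needs to handle.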
---
### Compare providers on the same task
```python
import asyncio

import httpx

API_URL = "https://api.inferall.ai/v1/chat/completions"
API_KEY = "ifu_your_key_here"


async def compare(prompt: str, models: list[str]) -> None:
    # Send the same prompt to every model concurrently
    async with httpx.AsyncClient(timeout=30) as http:
        tasks = [
            http.post(
                API_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                json={
                    "model": m,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 150,
                },
            )
            for m in models
        ]
        # return_exceptions=True returns errors in place of responses
        # instead of raising the first one
        responses = await asyncio.gather(*tasks, return_exceptions=True)

    for model, resp in zip(models, responses):
        if isinstance(resp, Exception):
            print(f"{model}: error ({resp!r})")
        elif resp.status_code != 200:
            print(f"{model}: HTTP {resp.status_code}")
        else:
            text = resp.json()["choices"][0]["message"]["content"]
            print(f"\n{model.split('/')[-1]}:\n{text[:300]}")


asyncio.run(compare(
    "Write a haiku about distributed systems.",
    ["meta/llama-3.1-70b-instruct", "google/gemma-4-31b-it", "qwen/qwen3.5-122b-a10b"],
))
```
---
### Route by task type
Different models excel at different tasks. Because every model sits behind the same endpoint, you can route at the application layer:
```python
# `client` is the OpenAI client configured in the first example
def get_model(task: str) -> str:
    if task == "code":
        return "qwen/qwen3-coder-480b-a35b-instruct"  # code-specialized, free
    elif task == "reasoning":
        return "nvidia/nemotron-3-super-120b-a12b"    # 120B general model, free
    elif task == "fast":
        return "meta/llama-3.1-8b-instruct"           # small 8B model for low latency, free
    else:
        return "meta/llama-3.1-70b-instruct"          # balanced default


task = "code"
model = get_model(task)
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
```
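The same routing can be written as a lookup table, which keeps the task-to-model mapping in one place and makes it easy to load from a config file later. A minimal sketch using the model IDs from the example above; `MODEL_BY_TASK` and `DEFAULT_MODEL` are illustrative names.

```python
DEFAULT_MODEL = "meta/llama-3.1-70b-instruct"  # balanced default

# Task -> model mapping kept in one place; could be loaded from config instead
MODEL_BY_TASK: dict[str, str] = {
    "code": "qwen/qwen3-coder-480b-a35b-instruct",
    "reasoning": "nvidia/nemotron-3-super-120b-a12b",
    "fast": "meta/llama-3.1-8b-instruct",
}


def get_model(task: str) -> str:
    """Return the model for a task, falling back to the default for unknown tasks."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


print(get_model("code"))
print(get_model("summarize"))
```

Adding a task or swapping a model becomes a one-line change to the table, with no new branches.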
---
### Free models available today
Every free model routes through NVIDIA NIM and costs $0. This one-liner counts the free token models and prints the first ten:
```sh
curl -s https://api.inferall.ai/ai/v1/models \
  | python3 -c "
import sys, json
models = json.load(sys.stdin)
free = [(k, v) for k, v in models.items() if v.get('inputPerM') == 0 and v.get('type') == 'token']
print(f'{len(free)} free token models')
for k, _ in sorted(free)[:10]:
    print(f'  {k}')
"
```
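If you would rather keep the filter in a Python script than in a shell one-liner, the same logic fits in one function. This sketch assumes the models endpoint returns a JSON object keyed by model ID, with `inputPerM` and `type` fields on each entry, as the one-liner implies; the sample payload and the `example/...` IDs are made up for illustration.

```python
import json


def free_token_models(models: dict[str, dict]) -> list[str]:
    """Return sorted IDs of token models whose input price per million tokens is 0."""
    return sorted(
        model_id
        for model_id, info in models.items()
        if info.get("inputPerM") == 0 and info.get("type") == "token"
    )


# Sample payload in the shape the one-liner assumes (values are made up)
sample = json.loads("""
{
  "meta/llama-3.1-70b-instruct": {"type": "token", "inputPerM": 0},
  "example/paid-model": {"type": "token", "inputPerM": 3.0},
  "example/image-model": {"type": "image", "inputPerM": 0}
}
""")
print(free_token_models(sample))
```

Only entries that are both token models and priced at 0 pass the filter, so the free image model and the paid token model are both excluded.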
---
### Get started
Create a key at [inferall.ai/keys](https://inferall.ai/keys); no credit card required. You get 200 free requests across any provider, then add a card to continue. Free models stay at $0, and paid providers bill at the provider's own rate with zero markup.
Switch between AI providers at runtime — one key, no code changes
How to route the same prompt to OpenAI, Anthropic, Google, or NVIDIA in one script using InferAll's unified API. Free to start, no credit card required.
InferAll Team