Anthropic's Claude 4 family is available through InferAll using the same Anthropic SDK you already have, with no separate Anthropic API key to manage.
Current Claude models available:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| `claude-opus-4-8` | $15.00/M | $75.00/M | Hardest reasoning, agentic tasks |
| `claude-opus-4-7` | $15.00/M | $75.00/M | Complex analysis, long documents |
| `claude-opus-4-6` | $15.00/M | $75.00/M | High-stakes generation |
| `claude-sonnet-4-6` | $3.00/M | $15.00/M | Most tasks — the daily driver |
| `claude-sonnet-4-5-20250929` | $3.00/M | $15.00/M | Sonnet 4.5 milestone |
| `claude-haiku-4-5-20251001` | $0.80/M | $4.00/M | Fast, cheap, high-volume |
All models are billed at Anthropic's published list rates, with zero markup.
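To get a feel for what these rates mean per request, the arithmetic can be sketched as below. The `RATES` dictionary copies three of the per-million-token prices from the table above. The `estimate_cost` helper and the token counts are illustrative assumptions, not part of any InferAll or Anthropic API.

```python
# Per-million-token rates in USD (input, output), copied from the table above.
RATES = {
    "claude-opus-4-8": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5-20251001": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Example: a 2,000-token prompt with a 500-token reply (made-up sizes).
for model in RATES:
    print(f"{model}: ${estimate_cost(model, 2_000, 500):.4f}")
```

At the listed rates, that example request works out to roughly $0.0675 on Opus, $0.0135 on Sonnet, and $0.0036 on Haiku.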
---
### With the Anthropic SDK
```python
import anthropic

client = anthropic.Anthropic(
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
    base_url="https://api.inferall.ai",
)

# Opus 4: for your hardest tasks
opus_message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Analyze this contract for risk clauses: ..."}],
)
print(opus_message.content[0].text)

# Sonnet 4.6: the everyday workhorse
sonnet_message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Write unit tests for this function: ..."}],
)
print(sonnet_message.content[0].text)
```
Two things change from a direct Anthropic setup: `api_key` becomes your InferAll key, and you add `base_url="https://api.inferall.ai"`. Your existing client code, tool use, streaming, and system prompts all work unchanged.
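As a hedged illustration of the streaming and system-prompt claim, the sketch below sends a system prompt and streams the reply through the same gateway. It assumes the key is stored in an `INFERALL_API_KEY` environment variable and that InferAll passes the SDK's streaming events through unchanged. `build_request` and `stream_reply` are hypothetical helpers introduced only for this example.

```python
import os


def build_request(prompt: str, system: str, model: str = "claude-sonnet-4-6") -> dict:
    """Assemble keyword arguments for the Messages API (hypothetical helper)."""
    return {
        "model": model,
        "max_tokens": 512,
        "system": system,  # the system prompt is a top-level field, not a message
        "messages": [{"role": "user", "content": prompt}],
    }


def stream_reply(client, request: dict) -> str:
    """Print text chunks as they arrive and return the full reply."""
    chunks = []
    with client.messages.stream(**request) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            chunks.append(text)
    print()
    return "".join(chunks)


# The live call only runs when a key is configured (assumed variable name).
if __name__ == "__main__" and os.environ.get("INFERALL_API_KEY"):
    import anthropic

    client = anthropic.Anthropic(
        api_key=os.environ["INFERALL_API_KEY"],
        base_url="https://api.inferall.ai",
    )
    request = build_request(
        "Summarize PEP 8 in two sentences.",
        "You are a terse code reviewer.",
    )
    stream_reply(client, request)
```

Keeping request assembly separate from the network call makes it easy to point the same request at a different model by changing only the `model` argument.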
---
### TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.INFERALL_API_KEY,
  baseURL: "https://api.inferall.ai",
});

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 256,
  messages: [{ role: "user", content: "Explain this stack trace in plain English." }],
});

// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first?.type === "text") {
  console.log(first.text);
}
```
---
### Claude Code with free inference
```bash
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
claude # Claude Code now routes through InferAll
```
By default, Claude Code requests route to free NVIDIA models (zero cost). Use a model prefix to force a specific provider:
```bash
# When you want actual Claude, set the model with the provider prefix,
# either in Claude Code settings or at launch:
claude --model anthropic/claude-sonnet-4-6
```
---
### OpenAI-compatible endpoint
If your stack uses the OpenAI SDK format, Claude models are reachable there too:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Refactor this code for clarity."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
---
### Why one key for Claude + everything else
The same `ifu_...` key that calls Claude also routes to GPT-4.1, Gemini 2.5 Flash, and 118+ free NVIDIA NIM models. When you want to benchmark Claude Sonnet against GPT-4.1-mini on a task, it's a single string change — no credential juggling.
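The single-string swap can be sketched as a loop over model IDs on the OpenAI-compatible endpoint. The `claude-sonnet-4-6` ID comes from the table above; `gpt-4.1-mini` is an assumed ID for GPT-4.1-mini, so check InferAll's model list for the exact string. `compare_models` is a hypothetical helper, and the key is read from an assumed `INFERALL_API_KEY` environment variable.

```python
import os
import time


def compare_models(client, models: list[str], prompt: str) -> dict[str, dict]:
    """Send one prompt to each model; record the reply and wall-clock latency."""
    results = {}
    for model in models:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,  # the only value that changes between providers
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
        )
        results[model] = {
            "text": response.choices[0].message.content,
            "seconds": time.perf_counter() - start,
        }
    return results


# The live call only runs when a key is configured (assumed variable name).
if __name__ == "__main__" and os.environ.get("INFERALL_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.inferall.ai/v1",
        api_key=os.environ["INFERALL_API_KEY"],
    )
    # "gpt-4.1-mini" is an assumed model ID; confirm it before relying on it.
    candidates = ["claude-sonnet-4-6", "gpt-4.1-mini"]
    results = compare_models(client, candidates, "Refactor this code for clarity: ...")
    for model, result in results.items():
        print(f"{model} ({result['seconds']:.2f}s): {result['text']}")
```

Latency from a single request is noisy, so treat the timing as a rough signal and compare the reply text on your own task before choosing a model.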
The free trial covers 200 requests with no credit card. A card is required only to continue past the trial or to call paid providers.
Get your key at [inferall.ai/keys](https://inferall.ai/keys).
Claude Opus 4 and Sonnet 4 — via one API key
How to call Claude Opus 4, Sonnet 4, and Haiku 4 through InferAll's Anthropic-compatible endpoint. Same SDK you already use — just change the base URL.
InferAll Team