
Claude Opus 4 and Sonnet 4 — via one API key

How to call Claude Opus 4, Sonnet 4, and Haiku 4 through InferAll's Anthropic-compatible endpoint. Same SDK you already use — just change the base URL.

InferAll Team

Tags: Anthropic, Claude, Claude Opus 4, Claude Sonnet 4, LLM API, AI gateway, developer tools

Anthropic's Claude 4 family is available through InferAll, using the same Anthropic SDK you already have, with no separate API key to manage.

Current Claude models available:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| `claude-opus-4-8` | $15.00 | $75.00 | Hardest reasoning, agentic tasks |
| `claude-opus-4-7` | $15.00 | $75.00 | Complex analysis, long documents |
| `claude-opus-4-6` | $15.00 | $75.00 | High-stakes generation |
| `claude-sonnet-4-6` | $3.00 | $15.00 | Most tasks; the daily driver |
| `claude-sonnet-4-5-20250929` | $3.00 | $15.00 | Sonnet 4.5 milestone |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 | Fast, cheap, high-volume |

All prices are Anthropic's published list rates, with zero markup.

---

### With the Anthropic SDK

```python
import anthropic

client = anthropic.Anthropic(
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
    base_url="https://api.inferall.ai",
)

# Opus 4: for your hardest tasks
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Analyze this contract for risk clauses: ..."}],
)
print(message.content[0].text)

# Sonnet 4.6: the everyday workhorse
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Write unit tests for this function: ..."}],
)
print(message.content[0].text)
```

The only change from calling Anthropic directly is the client configuration: add `base_url="https://api.inferall.ai"` and pass your InferAll key. Your existing client code, tool use, streaming, and system prompts all work unchanged.

---

### TypeScript

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.INFERALL_API_KEY,
  baseURL: "https://api.inferall.ai",
});

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 256,
  messages: [{ role: "user", content: "Explain this stack trace in plain English." }],
});

// Content blocks are a union type, so narrow to a text block before reading .text
const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```

---

### Claude Code with free inference

```bash
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
claude  # Claude Code now routes through InferAll
```

By default, Claude Code requests route to free NVIDIA models (zero cost). Use a model prefix to force a specific provider:

```bash
# In Claude Code, when you want actual Claude:
# Set model to anthropic/claude-sonnet-4-6 in settings
```

---

### OpenAI-compatible endpoint

If your stack uses the OpenAI SDK format, Claude models are reachable there too:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Refactor this code for clarity."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

---

### Why one key for Claude + everything else

The same `ifu_...` key that calls Claude also routes to GPT-4.1, Gemini 2.5 Flash, and 118+ free NVIDIA NIM models. When you want to benchmark Claude Sonnet against GPT-4.1-mini on a task, it's a single string change, with no credential juggling.

Free trial: 200 requests, no credit card. A card is required only to continue past the trial or to use paid providers.

Get your key at [inferall.ai/keys](https://inferall.ai/keys).
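
To get a feel for how the per-million-token prices listed in this post translate into per-request cost, here is a minimal sketch. The prices are copied from the model table above; the `estimate_cost` helper and the token counts are hypothetical and only illustrate the arithmetic, so treat the result as an estimate rather than a billing figure.

```python
# Prices in USD per million tokens (input, output), copied from the table in this post.
PRICES = {
    "claude-opus-4-8": (15.00, 75.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5-20251001": (0.80, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper, not an InferAll API)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 2,000 input tokens and 500 output tokens per request.
    for model in PRICES:
        cost = estimate_cost(model, input_tokens=2_000, output_tokens=500)
        print(f"{model}: ${cost:.4f} per request")
```

With these assumed token counts, the Opus request comes to about $0.0675 and the Sonnet request to about $0.0135, which is the 5x gap implied by the table.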