If you have shipped anything with LLMs lately, you have probably written the same plumbing twice. The OpenAI SDK wants one client, one auth scheme, one response shape. The Anthropic SDK wants another. Google's Gemini SDK wants a third. Add NVIDIA NIM or Replicate for open models and you are now maintaining four sets of credentials, four error-handling paths, and four mental models, all to do essentially the same thing: send a prompt, get tokens back.

This is the problem an **AI gateway** solves. Instead of integrating each provider's SDK directly, you point your code at a single endpoint that speaks the API formats you already know. One key, one base URL, and the ability to **switch between OpenAI, Anthropic, and Gemini with one parameter**, not a rewrite.

This is a practical, code-forward walkthrough of how that works with InferAll. Everything below is real and runnable.

### The core idea: a drop-in, OpenAI- and Anthropic-compatible endpoint

InferAll exposes two compatible surfaces:

- An **OpenAI-compatible** endpoint at `https://api.inferall.ai/v1`
- An **Anthropic-compatible** endpoint at `https://api.inferall.ai/v1/messages`

Both are drop-in. You do not learn a new client library. You take the SDK you are already using, change the base URL, and swap in a single InferAll key (it starts with `ifu_`). That one key routes to **190+ models across six providers**: OpenAI, Anthropic, Google Gemini, NVIDIA NIM, Replicate, and Runway.

Because the gateway is compatible at the protocol level, the same trick works for tools that wrap those SDKs (Claude Code, Cline, Cursor, your own agents) by setting `ANTHROPIC_BASE_URL` or `OPENAI_BASE_URL`. No code changes.

### (a) The OpenAI Python SDK, pointed at the gateway

Here is the entire change. You keep `openai`, keep `client.chat.completions.create`, keep your prompt-handling code.
You change two arguments:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ifu_...",  # your InferAll key
    base_url="https://api.inferall.ai/v1",  # the gateway, not api.openai.com
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize the CAP theorem in two sentences."}
    ],
)
print(resp.choices[0].message.content)
```

That is a working OpenAI call going through the gateway. The request and response shapes are exactly what the OpenAI SDK already expects.

### (b) Switching to Claude by changing only the model string

Now the payoff. Suppose you want to A/B that same prompt against a Claude model. With per-provider SDKs you would install `anthropic`, build a new client, translate `messages` into Anthropic's format, and re-handle the response object. Through the gateway, you change **one string**:

```python
resp = client.chat.completions.create(
    model="claude-sonnet-4-6",  # was "gpt-4o"
    messages=[
        {"role": "user", "content": "Summarize the CAP theorem in two sentences."}
    ],
)
print(resp.choices[0].message.content)
```

Same client. Same method. Same `messages`. Same response parsing. The gateway maps the OpenAI-shaped request onto the right provider behind the scenes and hands you back an OpenAI-shaped response.

Want to try Gemini next? Set `model` to a Gemini model ID and run it again. This is what makes real model comparison practical instead of a refactoring project: you can loop over a list of model strings and benchmark them against your own prompts and your own data.
```python
for model in ["gpt-4o", "claude-sonnet-4-6", "gemini-2.5-pro"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
    )
    print(model, "→", resp.choices[0].message.content)
```

### (c) Talking to the Anthropic-compatible endpoint with curl

If your stack is built around the Anthropic Messages format, or you are wiring up an agent that already speaks it, use the `/v1/messages` endpoint directly. Here it is with raw `curl`:

```bash
curl https://api.inferall.ai/v1/messages \
  -H "x-api-key: ifu_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      { "role": "user", "content": "Summarize the CAP theorem in two sentences." }
    ]
  }'
```

The only difference from calling Anthropic directly is the host and the key. That is the whole point of "drop-in": your existing request bodies keep working.

### Why a gateway beats hand-rolling per-provider SDKs

It is tempting to think you can just write a thin adapter layer yourself. You can. But the cost is rarely the first integration; it is everything that comes after:

- **Format translation, forever.** OpenAI, Anthropic, and Gemini disagree on message shapes, system prompts, tool-call schemas, and streaming formats. Every time a provider revises its API, your adapter is now your problem to maintain. A gateway absorbs that translation so your application code stays in one format.
- **One credential instead of N.** You manage a single `ifu_` key and one base URL rather than a key vault per provider, each with its own rotation and rate-limit quirks.
- **Switching is configuration, not engineering.** When a newer, cheaper, or better-suited model ships, you change a model string. Your architecture is not married to one vendor's roadmap.
- **A single, honest bill.** Premium provider tokens are passed through at the published provider price with **zero markup**, on one invoice, so the gateway is a convenience layer, not a tax. There is also a free tier on open-source models hosted via NVIDIA NIM, so cheap and exploratory traffic does not have to touch a paid provider at all.

The trade-off worth naming: you are adding a hop. That is the honest cost of any gateway. What you get back is that the hard, recurring work (protocol compatibility, credential sprawl, and the friction of trying a new model) collapses into changing a URL and a string.

### A realistic pattern: cheap inner loop, premium for hard tasks

Because every model lives behind the same interface, routing by task becomes trivial. Send the chatty, high-volume turns (retries, classification, simple summarization) to a free open-source model on NVIDIA NIM. Reserve a premium Claude, GPT, or Gemini model for the requests that actually need the extra capability. Same client, different model string. No second SDK, no second key.

### Get started

Grab a free key and try the switching example above:

1. Sign up and create an InferAll key (prefix `ifu_`).
2. Point your existing OpenAI SDK at `https://api.inferall.ai/v1`, or your Anthropic-format requests at `https://api.inferall.ai/v1/messages`.
3. Change the `model` string to compare OpenAI, Anthropic, and Gemini against your own prompts.

The free tier runs open models via NVIDIA NIM, so you can start experimenting before you ever reach for a premium provider. One key, one base URL, 190+ models, and switching is a parameter, not a project.
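
The cheap-inner-loop pattern described above could be sketched as a small routing function. This is only an illustration under stated assumptions: `pick_model`, the task labels, and the `example-open-model` ID are hypothetical placeholders, not names from InferAll, so substitute the model IDs your account actually lists.

```python
# Hypothetical model IDs: replace with the IDs your account actually exposes.
CHEAP_MODEL = "example-open-model"   # placeholder for a free open model on NVIDIA NIM
PREMIUM_MODEL = "claude-sonnet-4-6"  # premium model used earlier in this post

# Task kinds treated as cheap, high-volume work (labels are illustrative).
CHEAP_TASKS = {"retry", "classification", "simple_summarization"}


def pick_model(task_kind: str) -> str:
    """Return the model string to pass as `model=` for a given kind of task."""
    return CHEAP_MODEL if task_kind in CHEAP_TASKS else PREMIUM_MODEL


if __name__ == "__main__":
    for kind in ["classification", "architecture_review"]:
        print(kind, "->", pick_model(kind))
```

The returned string would go straight into the same call as before, for example `client.chat.completions.create(model=pick_model(kind), messages=...)`, so the routing decision stays a one-line change.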
