Switch between AI providers at runtime — one key, no code changes

How to route the same prompt to OpenAI, Anthropic, Google, or NVIDIA in one script using InferAll's unified API. Free to start, no credit card required.

InferAll Team

3 min read
AI gateway, multi-provider, OpenAI API, Anthropic API, NVIDIA NIM, free LLM API, provider switching

Most AI apps are locked to one provider. When OpenAI has an outage, you're down. When Anthropic raises prices, you rebuild. When a better model launches, you rewrite your integration.

InferAll gives you one [AI inference API](/solutions/ai-inference-api) key that routes to any provider: OpenAI, Anthropic, Google, NVIDIA, Replicate, and Runway. Switching is a parameter change, not a rewrite.

---

### One prompt, four providers

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys, no card required
)

prompt = "What are the tradeoffs between SQL and NoSQL databases?"

# Route to any provider by changing just `model`
for model in [
    "meta/llama-3.1-70b-instruct",          # NVIDIA NIM, free
    "google/gemma-4-31b-it",                # Google Gemma, free
    "qwen/qwen3-coder-480b-a35b-instruct",  # Qwen, free
    "anthropic/claude-sonnet-4-6",          # Anthropic Claude, paid
]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    print(f"\n=== {model.split('/')[-1]} ===")
    print(response.choices[0].message.content)
```

The first three models are free (NVIDIA NIM). The last requires a card on file but bills at Anthropic's published per-token rate with zero markup.

---

### Automatic failover

InferAll falls back to the next provider automatically when one fails. If the primary returns a 500, hits a rate limit, or times out, the gateway retries on the configured fallback chain, without any code in your application.

```python
# This call retries on NVIDIA if Anthropic fails:
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",  # primary
    messages=[{"role": "user", "content": "Explain neural networks."}],
)
# provider=anthropic attempted first, nvidia fallback on failure
```

No retries in your application code, no provider-specific error handling.

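
To make the fallback behavior concrete, here is a minimal sketch of what a fallback chain does conceptually. It is not InferAll's implementation: the gateway handles this server-side, and the names `call_with_fallback`, `ProviderError`, and the stub `fake_send` are hypothetical, used only for illustration.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical stand-in for a 500, rate limit, or timeout from a provider."""


def call_with_fallback(send: Callable[[str], str], chain: Sequence[str]) -> tuple[str, str]:
    """Try each model in `chain` in order; return (model_used, reply) from the first success."""
    last_error: Optional[Exception] = None
    for model in chain:
        try:
            return model, send(model)
        except ProviderError as exc:
            last_error = exc  # remember the failure and move on to the next model
    raise RuntimeError(f"all models in the chain failed: {list(chain)}") from last_error


# Stub transport for illustration: the primary always fails, the fallback answers.
def fake_send(model: str) -> str:
    if model.startswith("anthropic/"):
        raise ProviderError("simulated 500")
    return f"reply from {model}"


used, reply = call_with_fallback(
    fake_send,
    ["anthropic/claude-sonnet-4-6", "meta/llama-3.1-70b-instruct"],
)
print(used)   # meta/llama-3.1-70b-instruct
print(reply)  # reply from meta/llama-3.1-70b-instruct
```

With a gateway doing this for you, the application only ever sees the final successful reply, or a single error if every model in the chain fails.
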

---

### Compare providers on the same task

Because every model sits behind the same endpoint, you can send one prompt to several models concurrently and compare the answers:

```python
import asyncio

import httpx


async def compare(prompt: str, models: list[str]):
    async with httpx.AsyncClient() as http:
        # One request per model, all sent concurrently
        tasks = [
            http.post(
                "https://api.inferall.ai/v1/chat/completions",
                headers={"Authorization": "Bearer ifu_your_key"},
                json={
                    "model": m,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 150,
                },
                timeout=30,
            )
            for m in models
        ]
        # return_exceptions=True returns failures as values instead of raising on the first one
        responses = await asyncio.gather(*tasks, return_exceptions=True)

        for model, resp in zip(models, responses):
            if isinstance(resp, Exception):
                print(f"{model}: error ({resp!r})")
            elif resp.status_code != 200:
                print(f"{model}: HTTP {resp.status_code}")
            else:
                data = resp.json()
                text = data["choices"][0]["message"]["content"]
                print(f"\n{model.split('/')[-1]}:\n{text[:300]}")


asyncio.run(compare(
    "Write a haiku about distributed systems.",
    ["meta/llama-3.1-70b-instruct", "google/gemma-4-31b-it", "qwen/qwen3.5-122b-a10b"],
))
```

---

### Route by task type

Different providers excel at different tasks. InferAll lets you route at the application layer:

```python
def get_model(task: str) -> str:
    if task == "code":
        return "qwen/qwen3-coder-480b-a35b-instruct"  # best free coder
    elif task == "reasoning":
        return "nvidia/nemotron-3-super-120b-a12b"    # largest free model
    elif task == "fast":
        return "meta/llama-3.1-8b-instruct"           # fastest free model
    else:
        return "meta/llama-3.1-70b-instruct"          # balanced default


task = "code"
model = get_model(task)
# `client` is the OpenAI client created in the first example
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
```

---

### Free models available today

The free models all route through NVIDIA NIM and cost $0. To list them:

```sh
curl -s https://api.inferall.ai/ai/v1/models \
  | python3 -c "
import sys, json
models = json.load(sys.stdin)
free = [(k, v) for k, v in models.items()
        if v.get('inputPerM') == 0 and v.get('type') == 'token']
print(f'{len(free)} free token models')
for k, _ in sorted(free)[:10]:
    print(f'  {k}')
"
```

---

### Get started

Get a key at [inferall.ai/keys](https://inferall.ai/keys); no credit card required. You get 200 free requests across any provider. After that, add a card to continue: free models stay $0, and paid providers bill at the provider's rate with zero markup.
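
For a first request, one option is to keep the key out of source code and read it from the environment. The sketch below uses only the standard library; the variable name `INFERALL_API_KEY` and the helper `build_chat_request` are assumptions for illustration, not part of InferAll's documentation. It builds a request against the chat completions endpoint used earlier and leaves sending it to you.

```python
import json
import os
import urllib.request


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    return urllib.request.Request(
        "https://api.inferall.ai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# INFERALL_API_KEY is a hypothetical variable name; the placeholder key is used if it is unset.
api_key = os.environ.get("INFERALL_API_KEY", "ifu_your_key_here")
req = build_chat_request(api_key, "meta/llama-3.1-70b-instruct", "Say hello.")

# To actually send it:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Separating request construction from sending keeps the key handling easy to check before any of the 200 free requests are spent.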