The fastest way to kill a side project is friction on line one. You want to prototype a feature, test a prompt, or wire up an agent, and before you can print a single token you are staring at four billing pages and four sets of credentials. The friction is real enough that a lot of good ideas never get past "I'll try it this weekend."

This post is a practical, code-forward walkthrough of getting started with InferAll: **118+ open-source models plus every major premium provider behind one API key, using the SDK you already have installed**, for the cost of a $5 starter pack that becomes usage credit you can spend on whatever you want.

### What you actually get for $5

InferAll routes requests to **118+ open-source models** on **NVIDIA NIM** (Llama 3.1 70B/8B, Mixtral, Nemotron, CodeLlama, and more) plus paid endpoints at **OpenAI, Anthropic, Google, and others**, all behind a single `ifu_` key and a single invoice. Premium providers are billed **pay-as-you-go at their published token price with zero markup**. The gateway is a convenience layer, not a tax.

One honest detail up front: there is **no anonymous free tier**. You fund a key with a $5 starter pack at sign-up, and that $5 sits in your account as usage credit you can spend on open or premium models. We learned the hard way that "no card to start" attracted thousands of bot signups and zero real builders; the small friction filters those out. If you're a real developer prototyping a side project, $5 stretches a long way on NVIDIA NIM open models.

Everything runs through two drop-in endpoints:

- An **OpenAI-compatible** endpoint at `https://api.inferall.ai/v1`
- An **Anthropic-compatible** endpoint at `https://api.inferall.ai/v1/messages`

You do not learn a new client library. You take the SDK you already use, point it at the gateway, and pass any model ID, open or premium.
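Since one endpoint is OpenAI-compatible and the other Anthropic-compatible, what changes between them is the path and the request shape, not the key. The stdlib-only sketch below builds (but does not send) one request of each kind so you can compare them side by side. The helper names are hypothetical, and the Anthropic-side headers are an assumption: they mirror what Anthropic's own Messages API expects (`x-api-key` and `anthropic-version`), on the premise that an Anthropic-compatible gateway accepts the same headers an Anthropic client would send.

```python
import json
import urllib.request

GATEWAY = "https://api.inferall.ai"


def openai_style_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Chat Completions request for the OpenAI-compatible endpoint (Bearer auth, as in the curl example)."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def anthropic_style_request(
    api_key: str, model: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Messages request for the Anthropic-compatible endpoint.

    ASSUMPTION: the gateway accepts Anthropic's own auth headers
    (x-api-key plus anthropic-version), as the Anthropic SDK would send them.
    """
    # Unlike Chat Completions, the Messages format requires max_tokens.
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{GATEWAY}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON body (needs network access and a funded key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a funded key, the reply from the first format should sit under `["choices"][0]["message"]["content"]` and the reply from the second under `["content"][0]["text"]`, assuming the gateway returns each provider's standard response shape.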
### Open model IDs you can use right now

These three route to NVIDIA NIM open models and are good defaults to start with:

- `meta/llama-3.1-70b-instruct`: a strong general-purpose instruct model
- `meta/llama-3.1-8b-instruct`: smaller and snappier, for high-volume, simple turns
- `mistralai/mixtral-8x7b-instruct-v0.1`: a solid mixture-of-experts option

The full, current list is available from the API itself, so you don't have to trust a hardcoded table in a blog post that may drift:

```bash
curl https://api.inferall.ai/ai/v1/models \
  -H "Authorization: Bearer ifu_..."
```

### (a) The OpenAI Python SDK, pointed at the gateway

Here is a complete, runnable call. You keep `openai` and `client.chat.completions.create`, and change exactly two arguments (the key and the base URL) plus the model string:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ifu_...",                      # your InferAll key
    base_url="https://api.inferall.ai/v1",  # the gateway, not api.openai.com
)

resp = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",  # an open model on NVIDIA NIM
    messages=[
        {"role": "user", "content": "Explain backpropagation to a new grad in three sentences."}
    ],
)

print(resp.choices[0].message.content)
```

That is a working call to an open model, billed against your $5 starter balance at the listed open-model rate, using the exact request and response shapes the OpenAI SDK already expects. No new dependency, no provider-specific client.

### (b) The same thing with curl

If you are not in Python, or you just want to confirm your key works before writing any code, hit the OpenAI-compatible endpoint directly:

```bash
curl https://api.inferall.ai/v1/chat/completions \
  -H "Authorization: Bearer ifu_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Write a haiku about cold start latency."
      }
    ]
  }'
```

Same path, same JSON shape you would send to OpenAI; only the host and the key differ. Swap `meta/llama-3.1-8b-instruct` for `mistralai/mixtral-8x7b-instruct-v0.1` and you are testing a different model by changing a single string.

### (c) When you need premium: change one string

The point of starting on open models is that nothing is throwaway. The moment you need more capability than an open model offers (a genuinely hard task, a long-context job, a quality bar the prototype model misses), you reach for a premium provider by changing the **model string**. Same client, same key, same code:

```python
# Reuses the `client` configured in (a); only the model string changes.
resp = client.chat.completions.create(
    model="gpt-4o",  # premium provider, pay-as-you-go
    messages=[
        {"role": "user", "content": "Explain backpropagation to a new grad in three sentences."}
    ],
)
print(resp.choices[0].message.content)
```

That call crosses into paid premium usage at OpenAI's published token price, with **zero markup** from us. The trade is straightforward: you pay the provider's price; you just do not manage four sets of credentials to do it. Grab the exact premium model IDs from `GET /ai/v1/models`.

### The pattern that makes your balance go further

Because every model, open or premium, lives behind the same interface, you can route by task instead of by vendor. Send the chatty, high-volume, low-stakes turns to an open model: classification, retries, first-pass summarization, draft generation, the inner loop of an agent. Reserve a premium model for the requests that actually need the extra capability. In practice that looks like a single function with a model string chosen per call:

```python
def complete(prompt, hard=False):
    # Open model by default; premium only when the caller flags the task as hard.
    model = "gpt-4o" if hard else "meta/llama-3.1-70b-instruct"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

The cheap path stays on NVIDIA NIM at open-model rates. The expensive path is one boolean away when you need it.
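The boolean in `complete()` generalizes to a lookup table keyed by task type, which keeps the open-versus-premium decision in one place as a project grows. Below is a minimal, stdlib-only sketch: the task labels and the helpers `pick_model`, `build_request`, and `premium_share` are hypothetical names for illustration, and the assignments are one plausible split using only the model IDs named in this post, not an InferAll recommendation.

```python
# Hypothetical task labels mapped to model IDs that appear in this post.
OPEN_FAST = "meta/llama-3.1-8b-instruct"      # cheap, high-volume turns
OPEN_DEFAULT = "meta/llama-3.1-70b-instruct"  # general-purpose open default
PREMIUM = "gpt-4o"                            # reserved for hard requests

ROUTES = {
    "classify": OPEN_FAST,
    "retry": OPEN_FAST,
    "agent_step": OPEN_FAST,
    "summarize": OPEN_DEFAULT,
    "draft": OPEN_DEFAULT,
    "hard": PREMIUM,
}


def pick_model(task: str) -> str:
    """Return the model ID for a task label; unknown labels fall back to the open default."""
    return ROUTES.get(task, OPEN_DEFAULT)


def build_request(task: str, prompt: str) -> dict:
    """Keyword arguments for client.chat.completions.create(), routed by task label."""
    return {"model": pick_model(task), "messages": [{"role": "user", "content": prompt}]}


def premium_share(tasks: list[str]) -> float:
    """Fraction of a batch of task labels that would be sent to the premium model."""
    if not tasks:
        return 0.0
    return sum(pick_model(t) == PREMIUM for t in tasks) / len(tasks)
```

Because `build_request` returns plain keyword arguments, you can pass its result straight to the client from section (a), as in `client.chat.completions.create(**build_request("classify", text))`. Defaulting unknown labels to an open model, rather than to the premium one, means a typo costs you quality on one call instead of premium spend.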
Whichever path a call takes, there is no second SDK, no second key, and no architectural commitment to either side.

### Get started

You can have a key and a working call in a couple of minutes:

1. Sign up at [inferall.ai](https://inferall.ai) and fund a key with the $5 starter pack (that $5 becomes usage credit you can spend on open or premium models).
2. Point your existing OpenAI SDK at `https://api.inferall.ai/v1`, or send Anthropic-format requests to `https://api.inferall.ai/v1/messages`.
3. Pass a model ID like `meta/llama-3.1-70b-instruct` and ship your prototype.

Start on open models for the high-volume turns, and switch to premium with a single string only when your project earns it. Create a key at [inferall.ai](https://inferall.ai) and run the OpenAI SDK example from section (a).

