Meta's Llama 4 Maverick (`meta/llama-4-maverick-17b-128e-instruct`) is available free via NVIDIA NIM through InferAll. No credit card, no billing setup: create a key and start calling it now.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",
    messages=[{"role": "user", "content": "What makes Llama 4 Maverick different from Llama 3?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

---

### What is Llama 4 Maverick?

Llama 4 Maverick is Meta's Mixture of Experts (MoE) model with 17B active parameters and 128 experts (`17b-128e`). The MoE architecture activates only a subset of its 128 expert networks for each token, so the model performs well above what its active parameter count suggests while keeping inference costs low.

Maverick sits in the Llama 4 family alongside Llama 4 Scout (smaller, faster) and is optimized for instruction-following, reasoning, and code. It is hosted for free on NVIDIA NIM, which InferAll routes to at zero cost.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "meta/llama-4-maverick-17b-128e-instruct",
  messages: [{ role: "user", content: "Explain mixture of experts architectures." }],
});
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",
    messages=[{"role": "user", "content": "Walk me through the Llama 4 architecture."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Skip any chunk that arrives without choices.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
```

### Claude Code / Cline / Cursor

Point any Anthropic-compatible client at InferAll and Llama 4 Maverick routes under the "opus" tier:

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

---

### Free models on InferAll (selection)

| Model | Size | Notes |
|---|---|---|
| `meta/llama-4-maverick-17b-128e-instruct` | 17B×128E MoE | Latest Llama generation |
| `meta/llama-3.3-70b-instruct` | 70B | Strong general purpose |
| `meta/llama-3.1-70b-instruct` | 70B | Stable workhorse |
| `nvidia/nemotron-3-super-120b-a12b` | 120B | NVIDIA's largest free model |
| `mistralai/mixtral-8x7b-instruct-v0.1` | 46.7B MoE | Fast, efficient |
| `google/gemma-3-12b-it` | 12B | Google's compact model |

The [full free model list](https://api.inferall.ai/ai/v1/models) is always available from the API; filter by `inputPerM: 0`.

---

### Get started

Sign up free at [inferall.ai/keys](https://inferall.ai/keys), then activate via the $5 starter pack at [/billing](https://inferall.ai/billing). The $5 becomes spendable balance: 118+ open NIM models stay $0 in/out against it (within the free-plan daily request caps); premium providers (OpenAI, Anthropic, Google) bill at the provider's published per-token rate with zero markup.
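
The `inputPerM: 0` filter mentioned in the free-models section can be sketched in Python roughly as follows. This is a minimal sketch, not InferAll's documented client: only the URL and the `inputPerM` field come from the text above. The `id` field, the response shape (a JSON list, or an object with a `data` list), and the helper names `free_model_ids` and `fetch_models` are assumptions made for illustration.

```python
import json
import urllib.request

# URL taken from the model-list link in the free-models section.
MODELS_URL = "https://api.inferall.ai/ai/v1/models"


def free_model_ids(models):
    """Return the ids of entries whose `inputPerM` price is exactly 0.

    Assumes each entry is a dict with an `id` key (hypothetical field name).
    """
    return [m["id"] for m in models if m.get("inputPerM") == 0]


def fetch_models(url=MODELS_URL):
    """Download the model list.

    Assumes the endpoint returns either a JSON list or an object
    wrapping the list under a `data` key; adjust if the shape differs.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload["data"] if isinstance(payload, dict) else payload


# Offline demonstration with made-up entries (hypothetical sample data).
sample = [
    {"id": "meta/llama-4-maverick-17b-128e-instruct", "inputPerM": 0},
    {"id": "example/premium-model", "inputPerM": 3.0},
]
print(free_model_ids(sample))
```

Against the live API, `free_model_ids(fetch_models())` would apply the same filter, provided the response matches the assumed shape. If the endpoint requires a key, an `Authorization` header would need to be added to the request.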
