NVIDIA's Nemotron 3 Super 120B (`nvidia/nemotron-3-super-120b-a12b`) is one of the most capable open-weight models available today, and it runs at $0 input / $0 output on NVIDIA NIM through InferAll, within the free-plan daily request caps (100 chat / 50 text requests per day, reset at 00:00 UTC). Activate via the $5 starter pack at [/billing](https://inferall.ai/billing); the $5 becomes spendable balance for premium providers (OpenAI, Anthropic, Google, billed at the published per-token rate with zero markup).

Here's how to call it using the standard OpenAI SDK:

```python
from openai import OpenAI

# Point the standard OpenAI client at InferAll's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "What makes a good system prompt?"}],
    max_tokens=512,
)

print(response.choices[0].message.content)
```

That's it. The same call works with any OpenAI-compatible library: LangChain, LlamaIndex, LiteLLM, CrewAI, and any other framework that accepts a base-URL override such as `openai_api_base`.

---

### Why Nemotron 120B

**Scale.** At 120B parameters, Nemotron competes with paid commercial models on reasoning, instruction-following, and long-context tasks. It scores well on coding and math benchmarks without the rate-limit friction of commercial APIs.

**Free on NIM.** NVIDIA hosts Nemotron on their DGX Cloud infrastructure via NIM (NVIDIA Inference Microservices), which InferAll exposes at $0. There's no inference cost for us to pass through, so it stays free within the daily allowance.

**OpenAI-compatible.** The model is served through InferAll's OpenAI-compatible endpoint: you get standard `ChatCompletion` responses, streaming, tool use, and JSON mode, all working with whatever OpenAI client you already have.
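To make "OpenAI-compatible" concrete, here is a minimal sketch of the request the SDK sends under the hood, built with only the Python standard library. It assumes the gateway follows the usual OpenAI convention of `POST {base_url}/chat/completions` with a `Bearer` token; the helper names `build_chat_request` and `send` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.inferall.ai/v1"  # same base URL as the SDK example
MODEL = "nvidia/nemotron-3-super-120b-a12b"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a chat completions request.

    Assumption: the route and auth header follow the standard OpenAI
    convention, which is what the OpenAI SDK itself uses.
    """
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Example (needs a real key and network access):
# print(send(build_chat_request("ifu_your_key_here", "What makes a good system prompt?")))
```

Any HTTP client that can produce this request should work the same way, which is why frameworks only need a base-URL override.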

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "nvidia/nemotron-3-super-120b-a12b",
  messages: [{ role: "user", content: "Explain backpropagation." }],
});

// Print the assistant's reply.
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "Write a poem about distributed systems."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks may carry no choices; skip them.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
```

### Claude Code / Cline / Cursor

Set these two environment variables and any Anthropic-compatible agent routes through InferAll:

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

When your agent requests `claude-opus-4-8`, the gateway maps it to Nemotron, which sits in the same relative capability tier as an opus-class model. It works immediately, with no other configuration changes.

---

### Comparing the free models

All of these are $0 on InferAll, hosted on NVIDIA NIM:

| Model | Size | Best for |
|---|---|---|
| `nvidia/nemotron-3-super-120b-a12b` | 120B | Complex reasoning, coding, long context |
| `meta/llama-3.1-70b-instruct` | 70B | General chat, instruction following |
| `meta/llama-3.1-8b-instruct` | 8B | Fast responses, simple tasks |
| `mistralai/mixtral-8x7b-instruct-v0.1` | 46.7B (MoE) | Speed + quality balance |

See the [live model list](https://api.inferall.ai/ai/v1/models) for all 118+ free models.

---

### Get a key

Sign up free at [inferall.ai/keys](https://inferall.ai/keys), then activate via the $5 starter pack at [/billing](https://inferall.ai/billing). The $5 becomes spendable balance: the 118+ open NIM models stay $0 in/out against it (within the free-plan daily request caps), while premium providers (OpenAI, Anthropic, Google) bill at the provider's published per-token rate with zero markup.
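Because the free-plan caps reset at 00:00 UTC, a client can work out how long to wait once it hits the limit. The sketch below is local bookkeeping only: the `DailyBudget` class and `seconds_until_reset` helper are hypothetical, the default cap of 100 is taken from the chat figure quoted in this post, and the gateway's own counting is authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_reset(now: Optional[datetime] = None) -> float:
    """Seconds remaining until the next 00:00 UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


class DailyBudget:
    """Client-side request counter that resets when the UTC date changes."""

    def __init__(self, cap: int = 100):
        self.cap = cap    # assumed daily cap (100 chat requests per the post)
        self.used = 0
        self.day = None   # UTC date the current count belongs to

    def try_spend(self, now: Optional[datetime] = None) -> bool:
        """Record one request; return False if today's budget is exhausted."""
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        if now.date() != self.day:
            self.day, self.used = now.date(), 0
        if self.used >= self.cap:
            return False
        self.used += 1
        return True
```

A caller might check `budget.try_spend()` before each request and sleep for `seconds_until_reset()` when it returns `False`.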

```sh
# Verify the model is live
curl https://api.inferall.ai/ai/v1/models | jq '."nvidia/nemotron-3-super-120b-a12b"'
```
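The same check can be done from Python. The exact response shape of the models endpoint is not documented here, so this sketch accepts two shapes as an assumption: an object keyed by model ID (what the `jq` filter in the curl check implies) and the OpenAI-style `{"data": [{"id": ...}]}` list. The helper names `find_model` and `fetch_listing` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

MODELS_URL = "https://api.inferall.ai/ai/v1/models"  # URL from the curl check
MODEL_ID = "nvidia/nemotron-3-super-120b-a12b"


def find_model(listing: Any, model_id: str = MODEL_ID) -> Optional[dict]:
    """Return the entry for model_id, or None if it is not listed."""
    # Assumed shape 1: an object keyed by model ID.
    if isinstance(listing, dict) and model_id in listing:
        entry = listing[model_id]
        return entry if isinstance(entry, dict) else {"id": model_id}
    # Assumed shape 2: OpenAI-style {"data": [...]}, or a bare list of entries.
    entries = listing.get("data", []) if isinstance(listing, dict) else listing
    if isinstance(entries, list):
        for entry in entries:
            if isinstance(entry, dict) and entry.get("id") == model_id:
                return entry
    return None


def fetch_listing(url: str = MODELS_URL) -> Any:
    """Download and decode the model listing (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Example (requires network access):
# print(find_model(fetch_listing()) is not None)
```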
