
NVIDIA Nemotron 120B — free, via the OpenAI API

How to call NVIDIA Nemotron 3 Super 120B for free using any OpenAI-compatible SDK. No credit card required. Works with Python, TypeScript, LangChain, and Claude Code.

InferAll Team

3 min read
NVIDIA NIM, Nemotron, free LLM API, OpenAI API, open source, developer tools
NVIDIA's Nemotron 3 Super 120B (`nvidia/nemotron-3-super-120b-a12b`) is one of the most capable open-weight models available today, and it runs free on NVIDIA NIM through InferAll. No billing setup or credit card is needed to start: you get 200 free requests to evaluate it, then a card on file activates the full free allowance.

Here's how to call it using the standard OpenAI SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "What makes a good system prompt?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

That's it. The same call works with any OpenAI-compatible library: LangChain, LlamaIndex, LiteLLM, CrewAI, and any other framework that accepts an `openai_api_base` override.

---

### Why Nemotron 120B

**Scale.** At 120B parameters, Nemotron outperforms many models twice its cost on reasoning, instruction-following, and long-context tasks. It scores well on coding and math benchmarks without the rate-limit friction of commercial APIs.

**Free on NIM.** NVIDIA hosts Nemotron on their DGX Cloud infrastructure via NIM (NVIDIA Inference Microservices), which InferAll exposes at $0. There's no inference cost for us to pass through, so it stays free within the allowance.

**OpenAI-compatible.** The model is served through InferAll's OpenAI-compatible endpoint. You get standard `ChatCompletion` responses, streaming, tool use, and JSON mode, all working with whatever OpenAI client you already have.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "nvidia/nemotron-3-super-120b-a12b",
  messages: [{ role: "user", content: "Explain backpropagation." }],
});

// Print the assistant's reply
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "Write a poem about distributed systems."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks may arrive without choices; skip those.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
```

### Claude Code / Cline / Cursor

Set these two environment variables and any Anthropic-compatible agent routes through InferAll:

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

When your agent tries to use `claude-opus-4-8`, the gateway maps it to Nemotron (opus-class model, same relative capability tier). It works immediately, with no configuration changes.

---

### Comparing the free models

All of these are $0 on InferAll, hosted on NVIDIA NIM:

| Model | Size | Best for |
|---|---|---|
| `nvidia/nemotron-3-super-120b-a12b` | 120B | Complex reasoning, coding, long context |
| `meta/llama-3.1-70b-instruct` | 70B | General chat, instruction following |
| `meta/llama-3.1-8b-instruct` | 8B | Fast responses, simple tasks |
| `mistralai/mixtral-8x7b-instruct-v0.1` | 46.7B (MoE) | Speed + quality balance |

See the [live model list](https://api.inferall.ai/ai/v1/models) for all 110+ free models.

---

### Get a key

[inferall.ai/keys](https://inferall.ai/keys): no credit card required to start. You get 200 free requests, then add a card to unlock the full free allowance (still $0 within it). Paid providers (OpenAI, Anthropic, Google) bill at the upstream rate with zero markup.

```sh
# Verify the model is live
curl https://api.inferall.ai/ai/v1/models | jq '."nvidia/nemotron-3-super-120b-a12b"'
```
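
If you'd rather run that check from Python, here is a minimal sketch using only the standard library. The response shape is an assumption: the `jq` filter above suggests an object keyed by model id, while OpenAI-style endpoints usually return a `data` list, so the sketch accepts either. `fetch_models` and `model_is_listed` are hypothetical helper names, not part of any InferAll SDK.

```python
import json
from urllib.request import urlopen

# URL and model id taken from the curl command above.
MODELS_URL = "https://api.inferall.ai/ai/v1/models"
MODEL_ID = "nvidia/nemotron-3-super-120b-a12b"


def fetch_models(url=MODELS_URL, timeout=10):
    """Download and parse the model list (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_is_listed(payload, model_id=MODEL_ID):
    """Return True if model_id appears in a parsed model-list payload.

    Two payload shapes are assumed, since the exact format is not
    documented here:
      * an object keyed by model id (what the jq filter implies)
      * an OpenAI-style {"data": [{"id": ...}, ...]} list
    """
    if not isinstance(payload, dict):
        return False
    if model_id in payload:
        return True
    entries = payload.get("data")
    if isinstance(entries, list):
        return any(
            isinstance(entry, dict) and entry.get("id") == model_id
            for entry in entries
        )
    return False
```

Calling `model_is_listed(fetch_models())` returns `True` when the model shows up in the list. Keeping the fetch separate from the check means the check can be exercised against a saved response without touching the network.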