NVIDIA's Nemotron 3 Super 120B (`nvidia/nemotron-3-super-120b-a12b`) is one of the most capable open-weight models available today, and it runs free on NVIDIA NIM through InferAll. No billing setup or credit card is needed to start: you get 200 free requests to evaluate it, and adding a card on file then activates the full free allowance.
Here's how to call it using the standard OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "What makes a good system prompt?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
That's it. The same call works with any OpenAI-compatible library — LangChain, LlamaIndex, LiteLLM, CrewAI, and any other framework that accepts an `openai_api_base` override.
---
### Why Nemotron 120B
**Scale.** At 120B parameters, Nemotron outperforms many models twice its cost on reasoning, instruction-following, and long-context tasks. It scores well on coding and math benchmarks without the rate-limit friction of commercial APIs.
**Free on NIM.** NVIDIA hosts Nemotron on their DGX Cloud infrastructure via NIM (NVIDIA Inference Microservices), which InferAll exposes at $0. There's no inference cost for us to pass through, so it stays free within the allowance.
**OpenAI-compatible.** The model is served through InferAll's OpenAI-compatible endpoint — you get standard `ChatCompletion` responses, streaming, tool use, and JSON mode, all working with whatever OpenAI client you already have.
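To make the JSON mode claim concrete, here is a minimal sketch of a JSON-mode request. It assumes the gateway honors the standard OpenAI `response_format` parameter for this model. The helper names (`build_json_request`, `parse_json_reply`, `ask_json`) are hypothetical and only for illustration; `client` is any OpenAI-style client, such as the one created in the first example.

```python
import json

MODEL = "nvidia/nemotron-3-super-120b-a12b"


def build_json_request(question: str) -> dict:
    """Build keyword arguments for a JSON-mode chat completion."""
    return {
        "model": MODEL,
        "messages": [
            # JSON mode expects the prompt itself to ask for JSON.
            {"role": "system", "content": "Reply with a JSON object with keys 'answer' and 'confidence'."},
            {"role": "user", "content": question},
        ],
        # Standard OpenAI JSON-mode switch; assumed to be passed through by the gateway.
        "response_format": {"type": "json_object"},
        "max_tokens": 256,
    }


def parse_json_reply(content) -> dict:
    """Parse the reply text, falling back to the raw text if it is not valid JSON."""
    text = content or ""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"raw": text}


def ask_json(client, question: str) -> dict:
    """Send the request with an OpenAI-style client and parse the reply."""
    response = client.chat.completions.create(**build_json_request(question))
    return parse_json_reply(response.choices[0].message.content)
```

Calling `ask_json(client, "Is 17 prime?")` would send the request. Keeping the fallback in `parse_json_reply` means a malformed reply does not crash the caller.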
---
### TypeScript / Node.js
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

// Top-level await requires an ES module (for example a .mjs file or "type": "module").
const response = await client.chat.completions.create({
  model: "nvidia/nemotron-3-super-120b-a12b",
  messages: [{ role: "user", content: "Explain backpropagation." }],
});
console.log(response.choices[0].message.content);
```
### Streaming
```python
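# Reuses the `client` created in the first example.
with client.chat.completions.create(
    model="nvidia/nemotron-3-super-120b-a12b",
    messages=[{"role": "user", "content": "Write a poem about distributed systems."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Skip chunks that carry no choices; print each text delta as it arrives.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # finish with a newline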
```
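If you want the full text as well as the live output, one option is to collect the deltas as they arrive. The sketch below uses a hypothetical helper, `collect_stream`, and demonstrates it offline with stand-in chunk objects shaped like the SDK's streamed chunks, so it makes no network call.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas from an iterable of streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Some OpenAI-compatible servers emit chunks with an empty `choices` list.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # the delta can be None, for example on the final chunk
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only for this offline demo."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hello"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk(", world")]
print(collect_stream(demo))  # Hello, world
```

With a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.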
### Claude Code / Cline / Cursor
Set these two environment variables and any Anthropic-compatible agent routes through InferAll:
```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```
When your agent requests `claude-opus-4-8`, the gateway maps it to Nemotron, the opus-class model in the same relative capability tier. It works immediately, with no further configuration changes.
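For a sense of what such an agent sends, here is a sketch that builds an Anthropic Messages API request against the base URL above without sending it. The `/v1/messages` path, `x-api-key` header, and `anthropic-version` header are the standard Anthropic API conventions; that the gateway accepts them unchanged is an assumption here, and `build_messages_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_messages_request(prompt: str, base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an Anthropic-style Messages API request."""
    body = {
        "model": "claude-opus-4-8",  # model name from the text above
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_messages_request("Say hello.", "https://api.inferall.ai", "ifu_your_key_here")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. In practice the agent's own SDK does this for you once the two environment variables are set.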
---
### Comparing the free models
All of these are $0 on InferAll, hosted on NVIDIA NIM:
| Model | Size | Best for |
|---|---|---|
| `nvidia/nemotron-3-super-120b-a12b` | 120B | Complex reasoning, coding, long context |
| `meta/llama-3.1-70b-instruct` | 70B | General chat, instruction following |
| `meta/llama-3.1-8b-instruct` | 8B | Fast responses, simple tasks |
| `mistralai/mixtral-8x7b-instruct-v0.1` | 46.7B (MoE) | Speed + quality balance |
See the [live model list](https://api.inferall.ai/ai/v1/models) for all 110+ free models.
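Since every model in the table sits behind the same endpoint, a quick way to choose between them is to send one prompt to each and compare. This is a rough sketch: `compare_models` is a hypothetical helper, `client` is any OpenAI-style client such as the one from the first example, and wall-clock latency from a single request is only a loose signal.

```python
import time

# Model IDs copied from the table above.
FREE_MODELS = (
    "nvidia/nemotron-3-super-120b-a12b",
    "meta/llama-3.1-70b-instruct",
    "meta/llama-3.1-8b-instruct",
    "mistralai/mixtral-8x7b-instruct-v0.1",
)


def compare_models(client, prompt: str, models=FREE_MODELS, max_tokens: int = 128) -> list:
    """Send the same prompt to each model and record latency and reply text."""
    results = []
    for model in models:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        elapsed = time.perf_counter() - start
        results.append({
            "model": model,
            "seconds": round(elapsed, 3),
            "reply": response.choices[0].message.content,
        })
    return results
```

Each call counts against your request allowance, so keep `max_tokens` small while comparing.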
---
### Get a key
[inferall.ai/keys](https://inferall.ai/keys) — no credit card required to start. 200 free requests, then add a card to unlock the full free allowance (still $0 within it). Paid providers (OpenAI, Anthropic, Google) bill at the upstream rate with zero markup.
```sh
# Verify the model is live
curl https://api.inferall.ai/ai/v1/models | jq '."nvidia/nemotron-3-super-120b-a12b"'
```
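The same check can be done from Python once the response is parsed. The `jq` filter above implies a JSON object keyed by model ID. The sketch below also accepts the OpenAI-style `{"data": [{"id": ...}]}` shape, since the exact response format is an assumption. `model_is_listed` is a hypothetical helper.

```python
def model_is_listed(payload, model_id: str) -> bool:
    """Return True if model_id appears in a parsed model-list response."""
    if isinstance(payload, dict):
        # Shape implied by the jq filter: an object keyed by model ID.
        if model_id in payload:
            return True
        # OpenAI-style shape: {"data": [{"id": "..."}, ...]}.
        entries = payload.get("data") or []
    elif isinstance(payload, list):
        entries = payload
    else:
        entries = []
    return any(isinstance(entry, dict) and entry.get("id") == model_id for entry in entries)


sample = {"data": [{"id": "nvidia/nemotron-3-super-120b-a12b"}]}
print(model_is_listed(sample, "nvidia/nemotron-3-super-120b-a12b"))  # True
```

To run it against the live list, fetch the URL, decode the body with `json.loads`, and pass the result as `payload`.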
NVIDIA Nemotron 120B — free, via the OpenAI API
How to call NVIDIA Nemotron 3 Super 120B for free using any OpenAI-compatible SDK. No credit card required. Works with Python, TypeScript, LangChain, and Claude Code.
InferAll Team