Question 1

Is the free plan actually free?

Accepted Answer

Yes. The 118+ open-source models on NVIDIA NIM are $0 input and $0 output, capped only by the free-plan daily request rate (100 chat / 50 text / 20 image-analyze / 10 image-generate / 5 video-generate per day). NIM access unlocks after the $5 starter pack settles — that $5 is not a fee, it becomes spendable balance on your account and covers premium-model usage. Premium providers (OpenAI, Anthropic, Google) bill at the provider's published per-token rate with zero markup, drawn from your balance or your plan's monthly included usage.
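
As a rough illustration of the daily caps listed above, here is a minimal Python sketch. The table and the `remaining_today` helper are hypothetical names made up for this example, not part of any InferAll SDK; only the cap values come from the answer itself.

```python
# Free-plan daily request caps as quoted in the answer above (requests per day).
# The dictionary keys are illustrative labels, not official API identifiers.
FREE_PLAN_DAILY_CAPS = {
    "chat": 100,
    "text": 50,
    "image-analyze": 20,
    "image-generate": 10,
    "video-generate": 5,
}


def remaining_today(kind: str, used_today: int) -> int:
    """Return how many requests of this kind are left today (never negative)."""
    cap = FREE_PLAN_DAILY_CAPS[kind]
    return max(cap - used_today, 0)


if __name__ == "__main__":
    print(remaining_today("chat", 37))
```

A client could run a check like this before sending a request, so it backs off locally instead of hitting the cap.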

Question 2

What does 'zero markup' mean?

Accepted Answer

When you use a premium provider (OpenAI, Anthropic, Google), you pay exactly what that provider charges per token; InferAll adds nothing on top. For example, GPT-4o-mini input tokens that cost $0.15 per million at OpenAI cost $0.15 per million through InferAll.

Question 3

How does the monthly included usage work?

Accepted Answer

Each plan includes a monthly dollar amount of premium-provider usage: $5/month on Free, $20/month on Pro, $100/month on Team. Calls to premium providers (OpenAI, Anthropic, Google) draw against your included usage first, then against any prepaid balance you've added via the $5 or $100 packs. Calls to the 118+ NVIDIA NIM open models cost $0 and don't consume included usage. Included usage resets monthly.
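
The draw-down order described above (included usage first, then prepaid balance) can be sketched as follows. This is an illustration under assumptions, not InferAll's actual billing code: the function name, its return shape, and the idea of reporting an "unpaid" remainder are all hypothetical, since the answer does not say what happens when both buckets run out.

```python
from decimal import Decimal


def charge_premium_call(cost: Decimal, included_left: Decimal, prepaid_left: Decimal):
    """Split a premium call's cost across the two buckets.

    Included usage is drawn first, then prepaid balance.
    Returns (new_included, new_prepaid, unpaid). The 'unpaid' remainder is an
    assumption for illustration; the FAQ does not describe that case.
    """
    from_included = min(cost, included_left)
    from_prepaid = min(cost - from_included, prepaid_left)
    unpaid = cost - from_included - from_prepaid
    return included_left - from_included, prepaid_left - from_prepaid, unpaid


if __name__ == "__main__":
    # Pro plan: $20.00 included usage plus one $5.00 prepaid pack,
    # charged for $22.50 of premium-provider calls.
    included, prepaid, unpaid = charge_premium_call(
        Decimal("22.50"), Decimal("20.00"), Decimal("5.00")
    )
    print(included, prepaid, unpaid)
```

Calls to the NIM open models would pass a cost of 0 here, which leaves both buckets untouched.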

Question 4

What's the difference between Free, the $5 activation pack, and Pro?

Accepted Answer

Free is a 200-request trial to evaluate the 118+ NVIDIA NIM open-source models. The $5 activation pack is a one-time charge that unlocks ongoing free-plan use and becomes spendable balance for premium providers (OpenAI, Anthropic, Google) at the provider's published per-token rate with zero markup. Pro ($29/mo) gives you higher daily request limits and $20/month of premium-provider usage included — the right fit if you call paid providers regularly.

Question 5

Claude Opus 4 is $15/$75 per million tokens — what's the free alternative?

Accepted Answer

For Opus-class work (long-context reasoning, agentic tasks, complex analysis), the closest free-NIM alternative is nvidia/nemotron-3-super-120b-a12b, a 120B-parameter model with comparable quality on most benchmarks, billed at $0 input / $0 output. A typical Opus conversation (10K input + 4K output tokens) costs about $0.45 at Anthropic's published Opus 4 rate of $15/$75 per million tokens: $0.15 for input plus $0.30 for output. The same conversation through Nemotron 120B is $0. The $5 starter pack is most useful as a way to keep premium Opus available on demand for the few tasks where it matters, with NIM handling the bulk. For Sonnet- or Haiku-class work, the equivalents are meta/llama-3.1-70b-instruct (70B mid-class) and meta/llama-3.1-8b-instruct (8B fast class), respectively.
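
The per-conversation figure above follows directly from the per-million-token rates. A minimal sketch of that arithmetic, assuming the $15 input / $75 output rates quoted in the question (the function name is hypothetical):

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def conversation_cost(input_tokens: int, output_tokens: int,
                      input_rate_per_m: Decimal, output_rate_per_m: Decimal) -> Decimal:
    """Cost in dollars, given token counts and per-million-token rates."""
    return (Decimal(input_tokens) * input_rate_per_m
            + Decimal(output_tokens) * output_rate_per_m) / MILLION


if __name__ == "__main__":
    # 10K input + 4K output tokens at $15 / $75 per million tokens.
    opus = conversation_cost(10_000, 4_000, Decimal("15"), Decimal("75"))
    # The same conversation on a $0 / $0 NIM model.
    nim = conversation_cost(10_000, 4_000, Decimal("0"), Decimal("0"))
    print(opus, nim)
```

Swapping in a different provider's published rates gives the matching estimate for that model.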
