Pricing
Start free. Pay only for what you use.
110+ open-source models are $0 with no credit card. Premium providers (OpenAI, Anthropic, Google) bill at the provider's published rate — zero markup.
Free
Evaluate and prototype with open-source models.
- 200 free requests (no credit card required)
- 110+ open-source models on NVIDIA NIM
- Llama 3.1 70B, Mixtral, Nemotron, Codestral, Gemma 4
- OpenAI-compatible and Anthropic-compatible endpoints
- Automatic failover across providers
Pay-as-you-go· Most popular
Add a card to activate the full free tier and unlock premium providers.
- All free-tier models — still $0
- OpenAI, Anthropic, Google, Replicate, Runway
- Charged at the provider's published per-token rate
- Zero markup — you pay exactly what providers charge
- $10/month default spending cap (configurable)
Pro
For developers shipping production AI features.
- 2M tokens included
- All 190+ models across every provider
- Image generation (Flux, DALL·E, Imagen)
- Video generation (Kling, Runway, Veo)
- Priority routing
Enterprise
Custom token allowance, dedicated support, and compliance.
- Custom token allowance
- Custom model hosting
- Dedicated account team
- Priority support SLA
- Custom DPA and compliance
Token prices
InferAll charges the provider's published rate with zero markup. Prices shown are per 1M tokens (input / output) as of June 2026; see each provider's website for the current rate.
See inferall.ai/live for the full model list and pricing.
Common questions
Is the free tier actually free?
Yes. The 110+ open-source models on NVIDIA NIM are $0 to call. No credit card required to start — you get 200 free requests to evaluate. Add a card to continue past the trial and unlock paid providers; the free models stay $0.
What does 'zero markup' mean?
When you use a premium provider (OpenAI, Anthropic, Google), you pay exactly what that provider charges per token. InferAll does not add a markup. A $0.15/M-token call to GPT-4o-mini costs $0.15/M through InferAll.
How does the $10/month spending cap work?
By default, usage on premium providers is capped at $10/month to prevent unexpected bills. You can raise or remove this cap in your billing settings. The cap does not apply to free NVIDIA NIM models.
What's the difference between the free tier and pay-as-you-go?
Both give you the same 110+ free NVIDIA NIM models at $0. Pay-as-you-go additionally unlocks premium providers (OpenAI, Anthropic, Google) billed per token. The only requirement is a card on file.