
We removed the card gate: InferAll's free tier now requires no credit card

85% of developers who signed up for InferAll's free NVIDIA NIM tier never made a call because we put a card wall in front of a $0 product. We removed it.

InferAll Team

4 min read
Tags: free LLM API, no credit card, NVIDIA NIM, OpenAI API, Anthropic API, developer tools, AI gateway, Llama, open source

We had a problem: 85% of developers who signed up for InferAll never made their first API call. They created an account. They looked at the key creation page. And then they stopped, because we asked them to add a credit card before they could use a product that was, for the free tier, literally $0.

We fixed it. As of today, InferAll requires no credit card to start.

---

### What the free tier actually is

InferAll is an AI gateway: one API key, one endpoint, routing to any provider. The free tier is 110+ open-source models hosted on NVIDIA NIM: Llama 3.1 70B and 8B, Mixtral 8x7B, Nemotron 120B, CodeLlama, and the rest of the open-source stack that runs on DGX Cloud.

These models cost us $0 in provider fees. We're not giving away money. We're giving away access to a free compute tier that runs whether you use it or not.

The endpoints are fully OpenAI- and Anthropic-compatible:

```sh
# OpenAI-compatible: drop in to any existing app
export OPENAI_API_KEY=ifu_your_key
export OPENAI_BASE_URL=https://api.inferall.ai/v1

# Anthropic-compatible: Claude Code, Cline, Cursor work with two env vars
export ANTHROPIC_API_KEY=ifu_your_key
export ANTHROPIC_BASE_URL=https://api.inferall.ai
```

No adapter. No proxy. Your existing code works once you set the key and base URL for your client.

---

### Why we had a card gate on a $0 product

The short answer: we assumed it was the right B2D (business-to-developer) pattern because everyone else does it.

The real answer: we were using Stripe card-on-file as a proxy for "real developer, not a bot". It was there for anti-abuse, not monetization. The free tier has no marginal cost to us, but we were worried about throwaway accounts farming the free compute.

What we missed is that the cost of the friction is much higher than the cost of the abuse. Free NIM inference is $0 to us. The abuse scenario is someone getting free Llama calls, which is... the product. The actual downside is load, and we can handle that with per-user request caps instead of a card wall.

Meanwhile, the friction cost was brutal. A developer who finds an AI gateway via a search result, signs up, sees "add a card," and leaves is a lost developer, not a prevented bad actor. We were optimizing for the wrong tail.

---

### How it works now

No credit card. Create a key at [inferall.ai/keys](https://inferall.ai/keys) and it works immediately on free open-source models.

A card is required only for:

- **Paid providers:** OpenAI GPT-4o/5, Anthropic Claude, Google Gemini, Replicate, Runway. These cost us real money and we pass that cost through with no markup.
- **Past the free trial:** the no-card trial is capped at a modest number of requests. Enough to evaluate, prototype, and integrate. Add a card to continue; the free tier stays $0 within the free allowance.

The cap is per-user, not per-key. It's enforced against your account, so minting extra keys doesn't grant extra trial.

On the backend, the trial check runs at the gateway against the resolved model, not the requested model name. That matters because InferAll remaps bare paid model names (like `gpt-4o` or `claude-sonnet-4-6`) to a free open-source default, so you can test your code with the OpenAI client shape and a free model, then swap in the real model when you're ready to pay.

The gate is default-deny: anything that isn't a known free token model requires a card, which means unknown models, image generation, and video generation all require a card too. No paid spend leaks through on a no-card trial.

---

### Practical impact for your tooling

**Claude Code, Cline, Cursor:** set the two Anthropic env vars, run your agent. Cheap turns (code analysis, planning, chat) route through free Llama 70B. When you explicitly request `anthropic/claude-sonnet-4-6` or another paid model, the trial check tells you to add a card. You can do your entire prototype on free compute before committing a payment method.

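For readers who want a mental model of that trial check, here is a minimal sketch of the three rules described under "How it works now": resolve the model first, default-deny anything that isn't a known free token model, and count requests per account. It is not InferAll's gateway code. The names `Account`, `resolve_model`, `check_trial`, the remap target, the contents of the free-model set, and the cap of 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; the real free-model list,
# remap table, and trial cap are not specified in this post.
FREE_TOKEN_MODELS = {"meta/llama-3.1-70b-instruct"}
BARE_NAME_REMAP = {
    "gpt-4o": "meta/llama-3.1-70b-instruct",
    "claude-sonnet-4-6": "meta/llama-3.1-70b-instruct",
}
TRIAL_REQUEST_CAP = 3


@dataclass
class Account:
    has_card: bool = False
    trial_requests_used: int = 0  # tracked per account, so extra keys share it


def resolve_model(requested: str) -> str:
    """Map a bare paid model name to the free default; leave other names as-is."""
    return BARE_NAME_REMAP.get(requested, requested)


def check_trial(account: Account, requested: str) -> tuple[bool, str]:
    """Return (allowed, reason) for one request, judged on the resolved model."""
    if account.has_card:
        return True, "card on file"
    resolved = resolve_model(requested)
    # Default-deny: only models on the known-free list pass without a card.
    if resolved not in FREE_TOKEN_MODELS:
        return False, "card required: not a known free token model"
    if account.trial_requests_used >= TRIAL_REQUEST_CAP:
        return False, "card required: trial cap reached"
    account.trial_requests_used += 1
    return True, f"trial request served by {resolved}"


account = Account()
print(check_trial(account, "gpt-4o"))                       # bare name, remapped to free default
print(check_trial(account, "anthropic/claude-sonnet-4-6"))  # explicit paid model, denied
print(check_trial(account, "some/unknown-model"))           # unknown model, denied
```

Checking the resolved name rather than the requested one is what lets a bare `gpt-4o` pass on a no-card account while the provider-prefixed paid ID does not.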
**Python / TypeScript / any OpenAI-compatible client:**

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key",  # no card required to start
)

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",  # free, $0
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

**Agent frameworks (LangChain, LlamaIndex, CrewAI):** same three changes as any OpenAI client: `base_url`, key, and model ID. The free open models cover the inner loop where cost scales with iterations.

---

### Get a key

[inferall.ai/keys](https://inferall.ai/keys): no card required.

The full model list is at `https://api.inferall.ai/ai/v1/models`. The free models are the ones with `inputPerM: 0, outputPerM: 0`.

If you hit issues, open an issue or email [taylorm@kindly.fyi](mailto:taylorm@kindly.fyi). We read everything.
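
If you want to pick out the free models programmatically, a small filter over the model list is enough. The sketch below assumes the list has already been fetched and parsed into a list of dicts, and that each entry carries an `id` field next to `inputPerM` and `outputPerM`. The `id` field name, the flat list shape, and the sample entries are assumptions, so adjust them to whatever the endpoint actually returns.

```python
def free_model_ids(models: list[dict]) -> list[str]:
    """Return the IDs of models priced at 0 for both input and output."""
    return [
        m["id"]
        for m in models
        # Entries with missing pricing fields are treated as not free.
        if m.get("inputPerM") == 0 and m.get("outputPerM") == 0
    ]


# Hypothetical sample entries; the real response shape and prices may differ.
sample_models = [
    {"id": "meta/llama-3.1-70b-instruct", "inputPerM": 0, "outputPerM": 0},
    {"id": "example/paid-model", "inputPerM": 2.5, "outputPerM": 10},
    {"id": "example/no-pricing-listed"},
]

print(free_model_ids(sample_models))  # ['meta/llama-3.1-70b-instruct']
```

Treating entries without pricing as paid mirrors the gateway's default-deny stance: a model counts as free only when both fields are explicitly 0.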