> **2026-06-13 update.** The original version of this post celebrated removing the card requirement. We had to put it back. This is the honest follow-up: what broke, how, and where we landed.

We had a problem in May: 85% of developers who signed up for InferAll never made their first API call. Many of them bounced on the credit-card prompt before they could call a single free Llama model. We removed the gate on 2026-05-30 to fix that.

It worked for real developers. It also worked for bots, who showed up at scale within a week.

---

### What broke

The free plan on InferAll is $0 input/$0 output on 118+ open-source models hosted on NVIDIA NIM, within per-day request caps (100 chat / 50 text / 20 image-analyze / 10 image-generate / 5 video-generate, reset 00:00 UTC). Llama 3.1 70B, Mixtral, Nemotron, CodeLlama, and the rest of the open-source stack. These models cost **us** $0 in upstream provider fees, which is why we thought we could safely run them gate-less.

We forgot that we also offer Anthropic, OpenAI, Google, and Replicate through the same gateway. With no card requirement, we used a "no-card trial" allowance on the free models, but bots optimized against that limit. A signup wave on 2026-06-08 created ~826 throwaway-email accounts that all completed a $0.01/mo "metered" subscription on Stripe (which we'd added as a "frictionless activation" path), and then started funneling Anthropic Opus calls through the gateway. We burned at a rate of ~$7k/mo against $4/mo in Stripe revenue for 36 hours before we caught it.

The metered $0.01 plan turned out to be exactly the wrong shape for bots: cheap enough to spin up at scale, attached enough to a real Stripe customer to launder card reputation, and giving access to the paid upstream allotment we'd been bundling with paid tiers.

---

### Where we landed

We've put the card requirement back, with three changes that should keep the friction low for real evaluators:

1. **A $5 one-time Activation pack.** $5 is low enough that real evaluators will pay it, and high enough that bots can't economically burn cards on it at scale. Real card + real money cleared by Stripe = the bot defense. The $5 becomes spendable balance for paid providers (OpenAI / Anthropic / Google / Replicate) at the provider's published rate with zero markup, and it unlocks ongoing access to the 118+ free NIM models at $0 in/out (within the free-plan daily request caps; reset 00:00 UTC).
2. **Cloudflare Turnstile on signup.** Catches the obvious bots before they reach Stripe.
3. **Email pre-validation.** Disposable-domain emails (proton.me, mailinator, etc.) get rejected at the auth endpoint before Supabase ever sends an OTP. Saves us from the bounce-rate hit Supabase was warning about.

The combination means a real developer goes signup → email → $5 activation → key → call. Five minutes, ~$5 of credit on file, full free-NIM access on the other side.

A bot trying the same path bounces at Turnstile, at email validation, or at Stripe's Radar (their default fraud screen). We've seen the card-testing attempts continue post-fix; Stripe blocks them at `risk_level=highest` without us doing anything.
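For readers building a similar gate, here is a rough sketch of what the email pre-validation step could look like. It is not InferAll's actual implementation: the function names are hypothetical, and the two-entry blocklist only mirrors the examples named above. A real deployment would presumably load a maintained list of domains.

```python
from typing import Optional

# Illustrative blocklist only (assumption): the two examples named in this post.
BLOCKED_DOMAINS = frozenset({"mailinator.com", "proton.me"})


def email_domain(address: str) -> Optional[str]:
    """Return the lowercased domain of an address, or None if it is malformed."""
    local, sep, domain = address.strip().rpartition("@")
    domain = domain.lower().rstrip(".")
    if not sep or not local or "." not in domain:
        return None
    return domain


def should_send_otp(address: str, blocklist=BLOCKED_DOMAINS) -> bool:
    """Decide whether to send an OTP, before any email is actually sent."""
    domain = email_domain(address)
    if domain is None:
        return False
    # Check the domain and each parent domain, so that
    # "x.mailinator.com" is caught by the "mailinator.com" entry.
    parts = domain.split(".")
    candidates = {".".join(parts[i:]) for i in range(len(parts) - 1)}
    return candidates.isdisjoint(blocklist)
```

Running this check in the auth endpoint, ahead of the OTP send, is what keeps rejected addresses from ever counting toward the email provider's bounce rate.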
---

### What the free tier actually gets you

After the $5 activation:

```sh
# OpenAI-compatible: drop in to any existing app
export OPENAI_API_KEY=ifu_your_key
export OPENAI_BASE_URL=https://api.inferall.ai/v1

# Anthropic-compatible: Claude Code, Cline, Cursor work with two env vars
export ANTHROPIC_API_KEY=ifu_your_key
export ANTHROPIC_BASE_URL=https://api.inferall.ai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key",
)

# Free within the daily request cap ($0 in / $0 out):
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the model's reply text
print(response.choices[0].message.content)
```

The free models (`meta/llama-3.1-70b-instruct`, `meta/llama-3.1-8b-instruct`, `nvidia/nemotron-3-super-120b-a12b`, and 115 others) stay $0 within the daily request caps. The $5 covers paid-provider calls at the provider's published per-token rate, no markup. Top up in $5 or $100 increments whenever the balance runs out.

The full model list is at `https://api.inferall.ai/ai/v1/models`. The free models are the ones with `inputPerM: 0, outputPerM: 0`.

---

### What we got wrong, what we got right

We got the diagnosis wrong in the May post. We blamed friction for the 85% drop-off, removed friction, and watched the abuse pattern materialize within a week. The actual problem wasn't friction. It was that we hadn't built the right anti-abuse posture for a gateway that fronts both $0 models and real-money upstreams.

What we got right is doubling back when the data showed up. The $5 activation pack is the compromise we should have shipped in May: enough friction to filter bots, low enough that real evaluators don't bounce.

If you signed up during the no-card window and stalled at the $29/mo Pro option, sign back in: the $5 pack is on the /billing page now. The $5 becomes credit, so you're not paying "for nothing." We tested it; it works.
If you hit issues, open an issue or email [taylorm@kindly.fyi](mailto:taylorm@kindly.fyi). We read everything.
