What we've shipped

Changelog

Reverse-chronological list of what's actually shipped on the InferAll AI gateway and the marketing site. Each entry links to the public PR where the change landed; gateway-side changes that live in the private infra repo are marked as such.

2026-05-30

Start free — no credit card required

You no longer need a card on file to try InferAll. Create a key and start calling the 110+ free NVIDIA NIM open-source models (Llama 3.1 70B/8B, Mixtral, Nemotron, CodeLlama, and more) right away — $0. A card is needed only when you want a paid provider (OpenAI, Anthropic, Google) or to continue past the free trial. Same ifu_ key, same OpenAI- and Anthropic-compatible endpoints — the card wall that used to sit in front of the free tier is gone.

infra (private)

2026-05-29

Automated model-health checks — dead models get flagged, not served

The gateway now monitors its whole catalog. A daily check (and a public GET /ai/v1/models/health endpoint) compares every model against its provider's live model list and flags any the provider has stopped serving — so when a provider deprecates, renames, or rotates a model ID (the drift that retired DALL·E, sunset Gemini 1.5, and regularly rotates NVIDIA NIM IDs) it gets caught and fixed instead of 404'ing your calls. Detection is registry-based and $0, so monitoring never burns the free tier it's watching.

infra (private)

Deprecated model IDs keep working; catalog drift cleaned up

If your app is still pinned to a deprecated Claude snapshot the provider has retired, the gateway now transparently routes it to the current equivalent and labels the response with the model that actually served it — so you keep working while you migrate instead of failing during an incident. We also removed catalog IDs providers no longer serve (the catalog now lists only models that actually run), repointed a stale fallback hop, and moved OpenAI image generation to the current gpt-image-1 (DALL·E was retired upstream) at the same price.

infra (private)

Large requests no longer get cut off mid-generation

Raised the upstream timeout for the primary provider attempt so large, long-running generations complete instead of being aborted at 30 seconds — while keeping fallback hops short so a real provider outage still fails over quickly. Fixes a class of large requests that were erroring after the primary and every fallback hit the old 30s ceiling.

infra (private)

Gateway hardened for reliability — multi-region, health checks, more memory

api.inferall.ai now runs on two machines across two regions (sjc + iad) with HTTP health checks, so a single machine or a regional edge incident no longer takes the gateway down — the proxy routes around an unhealthy instance. Per-machine memory was raised 256MB → 512MB to remove GC stalls under streaming load. Resolves the intermittent connection resets observed during a Fly SJC edge incident on 2026-05-28.

infra fly.toml (private)

Catalog expanded: GPT-5.5 family + latest Claude

Added GPT-5.5, GPT-5.5 Pro, GPT-5.4 (+ mini/nano), GPT-5, and Claude Opus 4.8 / Sonnet 4.6 to the model catalog (GET /ai/v1/models). Premium tokens bill at the provider's published price with zero markup, on your existing ifu_ key.

gateway change (private)

First-call quickstart: free $0 curl on any stack

After you create a key, the quickstart now offers a copy-paste curl that makes a $0 call on a free NVIDIA NIM open model via the OpenAI-compatible endpoint — alongside the Claude Code snippet — so you can confirm your key works in one paste, with no premium spend.

PR #42

Marketing/docs model IDs now verified in CI

Fixed two broken model IDs in the OpenRouter comparison and Claude Code use-case pages (they named models not in the catalog, so copy-paste examples 404'd), and added a CI check that fails if any blog / comparison / use-case / docs page references a model ID not in the live catalog. Prevents the bug class going forward.

PR #43 · PR #44

2026-05-26

Gateway 429 messages now point at /billing for upgrades

Both 429 responses from the gateway's limiter — spending-limit-exceeded and daily-rate-limit-exceeded — now lead with https://inferall.ai/billing as the primary upgrade CTA. Previously the spending-limit message only pointed at the Stripe portal (meant for existing subs adjusting card cap), and the daily-rate-limit message had no upgrade pointer at all.

gateway change (private)

Billing UX: tier-specific CTAs, current-plan indicator, Most-popular badge, success banner

Replaced the generic "Select" buttons on /billing with tier-specific labels ("Upgrade to Pro", "Upgrade to Team", "Get free API key" → /keys, "Contact sales"). The current tier gets a left-border accent and a disabled "Current plan" label. Pro carries a "· Most popular" badge. Post-checkout, ?success=true now triggers a "Thanks — your subscription is active" banner.

PR #36 · PR #37 (hotfix)

scripts/founder-stats.sh — reusable Stripe-aggregate refresh

Single shell script anyone with the Stripe CLI authed can run to refresh founder metrics: customer count, signup velocity, active subscriptions + plan breakdown, paid invoices, churn, void invoices, funnel breakdown. Outputs human-readable markdown + a JSON snapshot. Aggregate-only — no customer IDs, emails, or payment details written to disk.

PR #35

PostHog activation-funnel analytics + Replicate passthrough note + honesty fixes

PostHog wired to the marketing site (project 359601) — pageviews on soft navigation, plus explicit funnel events (inferall_auth_success with identify, inferall_key_created with tier, inferall_first_call_snippet_copied). Removed a fabricated aggregateRating from JSON-LD (no real ratings exist; honesty + Google structured-data policy). /live now acknowledges that the gateway passes any Replicate or NIM model id through directly, so the routable catalog is much larger than the 207 enumerated.

PR #34

VS Code extension: security + Code-of-Conduct contacts routed to InferAll

SECURITY.md and all nine Code-of-Conduct translations (en + ja, ar-sa, ko, es, zh-tw, pt-BR, zh-cn) now route security reports and abuse reports to the InferAll team instead of the upstream Cline fork's contact addresses.

inferall-vscode#4 · inferall-vscode#5

2026-05-25

Clonable quickstart in /examples

Python (inferall-ai) and TypeScript (@inferall/sdk) hello-world scripts plus a three-path README (Claude Code env vars, Python SDK, TypeScript SDK). Linked from the docs Quick-start.

PR #32

/live page — per-vendor model breakdown + source mix

Server-rendered with 1h ISR. Shows total / free / paid counts, the table of every vendor (google, nvidia, openai, meta, mistralai, anthropic, runway, microsoft, qwen, …), and live vs static source mix. Counts match what `curl /ai/v1/models | jq 'keys|length'` returns.

PR #31

/keys: ready-to-paste Claude Code snippet right after key creation

When you create a new key, the page now embeds a two-env-var snippet with the actual key inline and its own Copy button. Time from key → working `claude` command is one paste.

PR #30

Live gateway stats on the homepage hero and /status

Homepage hero now shows a one-line live readout: total models, free models, gateway healthy/degraded — fetched server-side from the public endpoints. /status replaces its previous placeholder with the same data while the full multi-region status page is built out.

PR #29

User-key prefix: `ifu_` (legacy `kr_user_` still accepted)

Dashboard-created user keys now mint with the InferAll-branded `ifu_` prefix instead of the legacy `kr_user_`. Docs, SDK examples, /compare snippets, and the security page were updated to match. Existing keys remain valid forever.

gateway change (private) · PR #28

Model-count copy aligned to verifiable numbers

Site now says "200+ models" / "150+ free open-source models" / "120+ NVIDIA-hosted models" — each rounded down from what /ai/v1/models actually enumerates today (207 / 158 / 123). Survives catalog fluctuation without overclaiming.

PR #27

Free tier framing: card on file required, $0 within the allowance

Removed every "no credit card" claim. The Free tier still costs $0 within the 100,000-token monthly allowance on NVIDIA NIM open-source models, but activation requires a card on file — we say so upfront across the marketing site and on /keys.

PR #26

Branded social-share assets

Open Graph + Twitter image, generated favicon, web manifest, sitemap expansion, /docs metadata. Link previews on Hacker News, X, Reddit, Slack now render correctly.

PR #25

2026-05-24

TypeScript SDK published — `@inferall/sdk` 0.1.0

The native TypeScript SDK is live on npm. `npm install @inferall/sdk`. Reads `INFERALL_API_KEY` from the environment; same `Inferall` class shape as the Python SDK.

PR #24

SLA published — 99.9% monthly uptime target + tiered service credits

Concrete SLA at /sla: 99.9% gateway control-plane uptime per calendar month, tiered service credits (10% / 25% / 50% of monthly fee), exclusions, failover behavior, and incident response.

PR #22

2026-05-23

Billing page: Team tier aligned to 10M tokens + Enterprise contact-sales tile

Pricing tiles on /billing now reflect the real Team tier (10M tokens/month) and add an Enterprise contact-sales tile linking to contact@kindly.fyi.

PR #19

Docs: SDK examples refreshed to shipped `inferall-ai` + `ifa_` keys

/docs SDK examples now show the actual published Python (`inferall-ai`) and TypeScript (`@inferall/sdk`) shapes, with `ifa_…`-prefixed keys.

PR #20

Python SDK restored to the hero — `pip install inferall-ai` 0.1.0

Homepage hero leads with the published Python SDK install + a minimal usage snippet.

PR #16

Earlier

Gateway: automatic cross-provider failover on 429 / 529 / 5xx / timeout

Server-side retry against the next capable provider in the chain when an upstream rate-limits or errors. Context-size-aware, cheapest-first selection. A single Anthropic outage or Google quota reset no longer surfaces to your application.

gateway change (private)