Reliability
Service Level Agreement
InferAll targets 99.9% monthly uptime for the gateway control plane, backed by tiered service credits and automatic cross-provider failover. The commitments below apply to paid plans; the Free tier is provided as-is. For real-time health, see the status page.
What InferAll commits to
InferAll runs as a multi-provider AI gateway. Reliability depends on two layers: the InferAll gateway itself (request handling, auth, routing) and the upstream model providers (OpenAI, Anthropic, Google, NVIDIA, Replicate, Runway).
Our reliability commitment covers the layer we control: gateway availability and the cross-provider failover behavior that hides upstream incidents from your application. We do not guarantee the uptime of any individual upstream provider, because we do not operate them. Instead, failover routes requests around an individual provider's outage.
Gateway uptime target
InferAll targets 99.9% uptime per calendar month for the gateway control plane: request admission, authentication, routing, and billing-aware handling at api.inferall.ai. Over a 30-day month, 99.9% corresponds to roughly 43 minutes of allowable downtime.
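The downtime figure follows directly from the target and the length of the month. A minimal sketch of that arithmetic (the function name is illustrative, not part of any InferAll API):

```python
def downtime_budget_minutes(days_in_month: int, target: float = 0.999) -> float:
    """Minutes of downtime permitted in a month while still meeting the target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - target)


# The budget shifts slightly with the length of the calendar month.
for days in (28, 30, 31):
    print(f"{days}-day month: {downtime_budget_minutes(days):.1f} minutes")
```

Because the target is per calendar month, the budget is a little smaller in February and a little larger in 31-day months than the 30-day figure.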
“Downtime” means the gateway control plane is unable to accept and route otherwise-valid requests. The following are excluded from the uptime calculation:
- Outages of an individual upstream provider. These are handled by failover and are not counted against the gateway.
- Scheduled maintenance announced in advance (see below).
- Factors outside our reasonable control (force majeure, failures of upstream network or DNS providers, customer misconfiguration, or use that exceeds documented rate limits).
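One way to picture how the exclusions feed into the monthly figure is the sketch below. It is an illustration only: the cause labels, the `Outage` record, and the choice to keep the full month as the denominator are assumptions, not the calculation InferAll documents.

```python
from dataclasses import dataclass

# Hypothetical cause labels mirroring the three exclusions listed above.
EXCLUDED_CAUSES = {"upstream_provider", "scheduled_maintenance", "outside_control"}


@dataclass
class Outage:
    minutes: float
    cause: str  # e.g. "gateway" or one of EXCLUDED_CAUSES


def monthly_uptime_percent(days_in_month: int, outages: list[Outage]) -> float:
    """Uptime over the month, counting only non-excluded downtime.

    Assumption: excluded minutes are ignored rather than removed from
    the denominator.
    """
    total_minutes = days_in_month * 24 * 60
    counted = sum(o.minutes for o in outages if o.cause not in EXCLUDED_CAUSES)
    return 100.0 * (total_minutes - counted) / total_minutes


outages = [
    Outage(minutes=50, cause="gateway"),                 # counts against the target
    Outage(minutes=120, cause="scheduled_maintenance"),  # excluded
]
uptime = monthly_uptime_percent(30, outages)
print(f"{uptime:.3f}% uptime, target met: {uptime >= 99.9}")
```

In this example only the 50 gateway minutes count, which exceeds the roughly 43-minute budget of a 30-day month, so the target is missed despite the maintenance window being excluded.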
Service credits
If monthly gateway uptime falls below the 99.9% target, paid customers may request a service credit, calculated as a tiered percentage of that month's subscription fee.
Credits apply to the affected month's subscription fee, are issued against future invoices, and must be requested within 30 days of the affected month. Credits are the sole and exclusive remedy for any failure to meet the uptime target. The Free tier carries no subscription fee and is therefore not eligible for credits.
Failover behavior (in place today)
When an upstream provider returns 429, 529, or another 5xx status, or times out (30-second default), the gateway retries the same request server-side against the next provider capable of serving the requested model class. A single Anthropic rate limit, an OpenAI incident, or a Google quota reset does not surface to your application as a failure.
Context-size-aware routing means a 1M-token prompt is only sent to providers whose models can fit it. Cheapest-first selection keeps cost predictable across automatic retries.
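The failover and routing rules above can be sketched roughly as follows. This is a simplified illustration, not the gateway's implementation: the `Provider` record, its fields, and the `send` callable are hypothetical stand-ins for real provider clients.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    max_context_tokens: int
    cost_per_million_tokens: float
    # Hypothetical client call: (prompt, timeout_s) -> (status, body).
    # Expected to raise TimeoutError if the provider does not answer in time.
    send: Callable[[str, float], tuple[int, str]]


def is_retryable(status: int) -> bool:
    """429 and any 5xx (including 529) trigger failover to the next provider."""
    return status == 429 or 500 <= status <= 599


def route(prompt: str, prompt_tokens: int, providers: list[Provider],
          timeout_s: float = 30.0) -> tuple[str, int, str]:
    # Context-size-aware: keep only providers whose models can fit the prompt.
    # Cheapest-first: try the lowest-cost capable provider before the others.
    candidates = sorted(
        (p for p in providers if p.max_context_tokens >= prompt_tokens),
        key=lambda p: p.cost_per_million_tokens,
    )
    last_error = "no provider can fit the prompt"
    for provider in candidates:
        try:
            status, body = provider.send(prompt, timeout_s)
        except TimeoutError:
            last_error = f"{provider.name}: timed out after {timeout_s}s"
            continue
        if is_retryable(status):
            last_error = f"{provider.name}: HTTP {status}"
            continue
        # Success or a non-retryable client error is returned to the caller.
        return provider.name, status, body
    raise RuntimeError(f"all providers failed; last error: {last_error}")
```

Non-retryable responses, such as a 400 for a malformed request, are returned as-is in this sketch, since retrying them against another provider would not be expected to help.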
Incident response
Incidents affecting the gateway control plane are posted to the status page. Enterprise customers with a master services agreement receive the acknowledgement and resolution-time commitments specified in that agreement. Response targets for self-serve plans are best effort.
Maintenance windows
Routine changes are rolled progressively across regions without a fixed maintenance window and without expected downtime. If a change requires planned downtime, we announce it in advance; such announced maintenance is excluded from the uptime calculation above.
Live status
For real-time status and historical uptime, see the status page.