Reliability
Service Level Agreement
This page is a scaffold. Specific reliability numbers are gated on ops review and will be published only when they reflect production-measured behavior.
What InferAll commits to
InferAll runs as a multi-provider AI gateway. Reliability is composed of two layers: the InferAll gateway itself (request handling, auth, routing) and the upstream model providers (OpenAI, Anthropic, Google, NVIDIA, Replicate, Runway).
Our reliability commitment focuses on the layer we control: gateway availability and the cross-provider failover behavior that hides upstream incidents from your application. We do not guarantee any individual upstream provider's uptime, because we do not operate those providers.
Gateway uptime target
The monthly uptime target for the gateway control plane (auth, routing, billing-aware admission) will be published here once it has been confirmed against measured production data. Until that confirmation, we are deliberately not citing a percentage.
Failover behavior (in place today)
When an upstream provider returns 429, 529, or any other 5xx status, or times out (30-second default), the gateway retries the same request server-side against the next provider capable of serving the requested model class. As long as another capable provider is available, a single Anthropic rate limit, an OpenAI incident, or a Google quota reset does not surface to your application as a failure.
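The retry rule above can be sketched roughly as follows. This is an illustrative sketch, not the gateway's actual implementation: `Response`, `UpstreamTimeout`, `is_retryable`, and `call_with_failover` are hypothetical names, and each provider is modeled as a plain callable that either returns a response or raises on timeout.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


class UpstreamTimeout(Exception):
    """Hypothetical error raised when a provider call exceeds the timeout."""


@dataclass
class Response:
    status: int
    body: str = ""


# A provider is modeled as (name, callable(request, timeout_seconds) -> Response).
Provider = Tuple[str, Callable[[str, float], Response]]


def is_retryable(status: int) -> bool:
    # 429 plus the whole 5xx range; 529 is covered by the 5xx check.
    return status == 429 or 500 <= status <= 599


def call_with_failover(
    providers: Sequence[Provider],
    request: str,
    timeout_s: float = 30.0,  # the 30-second default described above
) -> Tuple[str, Response]:
    """Try each capable provider in order; return the first non-retryable response."""
    last_error: Optional[str] = None
    for name, call in providers:
        try:
            resp = call(request, timeout_s)
        except UpstreamTimeout:
            last_error = f"{name}: timeout"
            continue
        if is_retryable(resp.status):
            last_error = f"{name}: HTTP {resp.status}"
            continue
        # Success, or a non-retryable error such as 400, goes back to the caller.
        return name, resp
    raise RuntimeError(f"all providers failed; last error: {last_error}")
```

In this sketch a failure reaches the caller only when every provider in the list has returned a retryable status or timed out, which matches the behavior described above for the case where at least one capable provider is healthy.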
Context-size-aware routing means a 1M-token prompt is sent only to providers whose models can fit it. Cheapest-first selection keeps cost predictable across automatic retries.
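One way to picture the selection order is to filter by context window and then sort by price. The sketch below assumes a hypothetical `Candidate` record; the provider names, window sizes, and prices are made-up placeholders, not real provider data, and the actual gateway may weigh other factors.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    provider: str           # placeholder name, not a real provider
    context_window: int     # maximum prompt size in tokens (assumed field)
    price_per_mtok: float   # assumed price per million input tokens


def rank_candidates(candidates: Sequence[Candidate], prompt_tokens: int) -> List[Candidate]:
    """Keep only candidates whose window fits the prompt, cheapest first."""
    fitting = [c for c in candidates if c.context_window >= prompt_tokens]
    # sorted() is stable, so equally priced candidates keep their input order.
    return sorted(fitting, key=lambda c: c.price_per_mtok)


# Placeholder values for illustration only.
pool = [
    Candidate("provider-a", context_window=200_000, price_per_mtok=1.0),
    Candidate("provider-b", context_window=1_000_000, price_per_mtok=3.0),
    Candidate("provider-c", context_window=2_000_000, price_per_mtok=2.0),
]
retry_order = [c.provider for c in rank_candidates(pool, prompt_tokens=1_000_000)]
```

Here `provider-a` is excluded from `retry_order` because the prompt does not fit its window, even though it is the cheapest entry. The resulting list is the order in which a failover loop would try the remaining providers.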
Incident response
Per-tier incident response time targets (acknowledgment and resolution) will be published once they have been confirmed against staffing capacity. Enterprise customers should refer to their individual master services agreement for current commitments.
Maintenance windows
Scheduled-maintenance policy (announcement lead time, typical window length, time zones) will appear here. Today, all changes are rolled progressively across regions without a fixed maintenance window.
Live status
For real-time status and historical uptime, see the status page.