For OpenAI-SDK code, the migration is two environment variables. Swap the key, swap the base URL, ship. The wire formats are identical because both sides speak OpenAI Chat Completions.
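Concretely, with the openai Python SDK the whole diff is the client constructor. The InferAll host and env-var name below are placeholders, not documented values:

```python
import os
from openai import OpenAI

# Only these two values change; the request code stays identical.
client = OpenAI(
    api_key=os.environ["INFERALL_API_KEY"],      # was OPENROUTER_API_KEY
    base_url="https://api.inferall.example/v1",  # was https://openrouter.ai/api/v1
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```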
For Claude Code or any Anthropic-SDK consumer, you stop emulating the OpenAI format and point at the Anthropic-format endpoint directly. That removes a moving part: no adapter sitting between your agent and the gateway.
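With the anthropic SDK the swap looks like this; same caveat that the host is a placeholder. Claude Code itself reads ANTHROPIC_BASE_URL from the environment, so for it the change never touches code at all:

```python
import os
from anthropic import Anthropic

# The Anthropic SDK accepts a base_url override; the URL below is a placeholder.
client = Anthropic(
    api_key=os.environ["INFERALL_API_KEY"],
    base_url="https://api.inferall.example",
)

msg = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "ping"}],
)
print(msg.content[0].text)
```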
One detail to watch: model IDs. OpenRouter prefixes model names with the upstream provider (for example, anthropic/claude-3.5-sonnet). On InferAll's OpenAI- and Anthropic-compatible surfaces, you pass the model in the upstream's own naming — claude-sonnet-4-20250514 or gpt-4o. On the unified /ai/v1/generate endpoint, the provider goes in its own field. The mechanical change is small, but you do need to grep your code for provider-prefixed model strings.
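The rename in miniature (the grep pattern is just one way to flush out the old strings; a unified-endpoint sketch follows further down):

```python
# Finding provider-prefixed strings:  grep -rn "anthropic/" src/

MODEL_OPENROUTER = "anthropic/claude-3.5-sonnet"  # provider baked into the ID
MODEL_INFERALL = "claude-sonnet-4-20250514"       # upstream's own naming
```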
For tool calls, streaming, and vision inputs, both gateways forward the wire-format payloads through to the upstream provider. If your tool-call code worked against OpenRouter, it will work against InferAll on the equivalent endpoint.
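For instance, a standard Chat Completions tool definition carries over verbatim; only the client construction differs (placeholder host again, and get_weather is a made-up example tool):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["INFERALL_API_KEY"],
                base_url="https://api.inferall.example/v1")  # placeholder URL

# Unchanged Chat Completions tool schema, forwarded as-is to the upstream.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```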
If you depend on OpenRouter's per-request models array for client-side fallback, you don't need to port it. InferAll does the cross-provider retry server-side on 429, 529, 5xx, and timeout (30s default). You can still pass a preferred provider; the gateway will route to it first and only fall back if it has to.
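A sketch of a unified-endpoint call with a preferred provider. Beyond the provider field and the retry behavior described above, the host and the other field names here are assumptions, not a published schema:

```python
import os
import httpx

resp = httpx.post(
    "https://api.inferall.example/ai/v1/generate",  # placeholder host
    headers={"Authorization": f"Bearer {os.environ['INFERALL_API_KEY']}"},
    json={
        "provider": "anthropic",  # routed here first; gateway falls back on 429/529/5xx/timeout
        "model": "claude-sonnet-4-20250514",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=35.0,  # sits just above the gateway's 30s upstream timeout
)
resp.raise_for_status()
```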
The trade-off is worth naming explicitly. OpenRouter's model-list approach gives the caller fine-grained control: “try Sonnet, then a Llama 70B, then Mixtral, in that order.” InferAll's server-side default routes by model class — “serve a Claude-class model, and if Anthropic is rate-limited, route to whatever else can run an equivalent.” That's an opinion. If your fallback choreography is a load-bearing part of your application, the OpenRouter shape gives you the levers; if you want one less thing to maintain, InferAll's default is the cheaper code path.
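For reference, the choreography you'd be deleting: OpenRouter reads the fallback list from the request body, which the OpenAI SDK passes via extra_body. The non-Anthropic slugs below are illustrative; check the current catalog before copying them.

```python
import os
from openai import OpenAI

or_client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
)

resp = or_client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "ping"}],
    # OpenRouter tries each model in order until one answers.
    extra_body={"models": [
        "anthropic/claude-3.5-sonnet",
        "meta-llama/llama-3.1-70b-instruct",
        "mistralai/mixtral-8x7b-instruct",
    ]},
)
```

On InferAll, that list and the ordering logic simply disappear; the model-class routing happens behind the endpoint.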