The clearest fit is “I'm a developer, not a platform team.” If you want a base URL, one key, and a free open-source allowance to develop against, and if your observability needs are answered by “I check the dashboard occasionally for spend,” InferAll is the cheaper, smaller answer. You don't need to buy a platform to get an Anthropic-compatible endpoint with a free tier; you can just hit api.inferall.ai.
The free OSS inference tier is the second reason, and it's the one Portkey structurally can't match. Portkey routes; you bring the provider keys and you pay the upstream bills. InferAll bundles 100,000 free tokens per month across 186 NVIDIA-hosted models — Llama 3.1 405B, Mixtral, Nemotron, CodeLlama. For developer workflows that spend most of their tokens on cheap inner-loop turns — file reads, status checks, lint-style suggestions — that free allowance is the difference between a paid evaluation and a free one.
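To make that concrete, here is a back-of-envelope budget. The per-turn token counts are assumptions chosen for illustration, not measurements of any particular agent; swap in your own averages.

```python
# Back-of-envelope: how far does a 100,000-token monthly allowance go?
# The per-turn averages below are illustrative assumptions, not
# measured values for any specific workflow.
MONTHLY_BUDGET = 100_000

# Assumed average tokens (prompt + completion) per interaction type.
CHEAP_TURN = 400      # file read, status check, lint-style suggestion
HEAVY_TURN = 4_000    # multi-file refactor or long generation

cheap_turns = MONTHLY_BUDGET // CHEAP_TURN   # ~250 cheap turns/month
heavy_turns = MONTHLY_BUDGET // HEAVY_TURN   # ~25 heavy turns/month
print(cheap_turns, heavy_turns)
```

Under those assumptions, a month of casual inner-loop usage fits inside the free tier with room to spare; it's the heavy generations that eat the budget.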
Claude Code is the third reason. InferAll's Anthropic-format /v1/messages endpoint is the default surface, not a configuration to enable. Set ANTHROPIC_BASE_URL=https://api.inferall.ai, use your InferAll key, and the standard Claude Code flow works without per-request header gymnastics. The VS Code extension applies the same pattern to a Cline-based agent with the gateway pre-wired.
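For anyone hitting the endpoint outside Claude Code, a minimal sketch of an Anthropic-format request follows. The header names (`x-api-key`, `anthropic-version`) mirror Anthropic's own API and the model id is illustrative; whether InferAll expects exactly these is an assumption, so check its docs before relying on them.

```python
import json
import os
import urllib.request

# Same base URL Claude Code would use via ANTHROPIC_BASE_URL.
BASE_URL = os.environ.get("ANTHROPIC_BASE_URL", "https://api.inferall.ai")

def build_messages_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an Anthropic-format /v1/messages request."""
    payload = {
        "model": "claude-sonnet-4",  # illustrative model id, not confirmed
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            # Assumed to match Anthropic's header conventions.
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

req = build_messages_request("YOUR_INFERALL_KEY", "ping")
# urllib.request.urlopen(req) would send it; omitted here.
```

The point of the sketch is the shape: one base URL, one key, and the stock Anthropic wire format, with no per-request routing headers layered on top.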
And one structural thing worth saying directly: not storing prompt and completion bodies for analytics is a privacy posture some teams want. If “the gateway doesn't log my prompts” is a feature for your buyer instead of a gap, that's where InferAll currently sits. We may add logging later as an opt-in; today the gateway is closer to a pipe than to a recorder.