
InferAll vs Portkey

The short version: Portkey is the platform — observability, guardrails, prompt management, caching. InferAll is the gateway — one base URL, a free OSS inference tier, and Anthropic-compat that Claude Code can point at. If you're a platform team buying a governance layer, Portkey is built for that. If you're a developer who wants a base URL with a free tier and would rather treat observability as somebody else's problem, InferAll is the smaller, cheaper answer.

At a glance

| Feature | Portkey | InferAll |
| --- | --- | --- |
| Primary positioning | Platform: routing + observability + guardrails | Gateway: routing + free OSS tier + Anthropic-compat |
| Free OSS inference tier | None — you provide upstream keys | 100k tokens/month on 186 NVIDIA-hosted OSS models |
| Catalog size | 1,600+ LLMs via a unified API | 255+ models across 6 providers |
| Observability / tracing | Category leader — request traces, prompt logs, analytics | Per-key usage and spend only |
| Guardrails / policy | First-class: PII, jailbreak, regex, custom chains | None at the gateway layer |
| Prompt management | Versioned prompts, A/B, deployment IDs | None — prompts live in your application |
| Caching | Built-in semantic + exact-match cache | Pass-through only — no cache layer |
| Anthropic-format endpoint | Yes — /v1/messages, with Portkey-specific config headers | Yes — /v1/messages, default surface |
| OpenAI-format endpoint | Yes | Yes — /v1 |
| Failover / fallback | Configurable retry and fallback policy per request | Server-side cross-provider retry on 429/529/5xx/timeout |
| VS Code extension | No first-party branded extension | Yes — InferAll for VS Code (Cline-based, sign-in to use) |

Catalog and Anthropic-compat figures are pulled from portkey.ai and their public docs at the last-updated date below. Have a correction? Email contact@kindly.fyi.

When Portkey is the right choice

The clearest case is observability as a first-order requirement. If your team needs request-level traces, prompt and completion logging, latency and cost breakdowns by key and by model, and a UI that platform engineers actually open every day, Portkey leads the category and InferAll is not in the running today. We don't have that surface and we're not going to claim parity. If you're shopping for an LLM observability platform that happens to do routing, Portkey is the right product.

The second case is guardrails. Portkey ships a real guardrails layer — PII scrubbing, jailbreak detection, regex policies, chains of custom rules — at the gateway, where they apply to every call regardless of which application initiated it. For regulated industries or for any team where “model output must pass policy checks before reaching a user” is a hard requirement, having those checks in the gateway instead of in N application services is a real architectural advantage. InferAll forwards your request and trusts you to handle policy in-app or in another layer.

The third case is prompt management. Versioned prompts, deployment IDs, A/B test routing, evaluation harnesses — if your prompts are the product and you need to version them like code with a real management surface, Portkey is built for that. InferAll has no view on your prompts; they leave your application as request bodies and are not stored or versioned by the gateway.

The fourth case is caching. Portkey's semantic and exact-match cache layer can take real cost out of repetitive workloads. InferAll does not have a cache layer today; identical requests hit the upstream provider every time.

When InferAll is the right choice

The clearest fit is “I'm a developer, not a platform team.” If you want a base URL, one key, a free open-source allowance to develop against, and the observability needs are answered by “I check the dashboard occasionally for spend,” InferAll is the cheaper, smaller answer. You don't need to buy a platform to get an Anthropic-compatible endpoint with a free tier; you can just hit api.inferall.ai.
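What "just hit api.inferall.ai" looks like, sketched against the Anthropic-format surface described above. The header names follow Anthropic's wire format; whether InferAll requires the anthropic-version header is an assumption, and the model name is illustrative rather than a confirmed catalog id:

```shell
# Build a minimal Anthropic-format request body. The model id below is
# illustrative, not a confirmed InferAll catalog name.
BODY='{"model": "llama-3.1-405b", "max_tokens": 64, "messages": [{"role": "user", "content": "hello"}]}'

# Sketch of the call itself (commented out so this snippet runs offline).
# Header conventions are assumed to mirror Anthropic's API:
# curl -s https://api.inferall.ai/v1/messages \
#   -H "x-api-key: $ANTHROPIC_API_KEY" \
#   -H "anthropic-version: 2023-06-01" \
#   -H "content-type: application/json" \
#   -d "$BODY"

echo "$BODY"
```

One key, one URL, no config-header layer — that is the whole pitch of this surface.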

The free OSS inference tier is the second reason, and it's the one Portkey structurally can't match. Portkey routes; you bring the provider keys and you pay the upstream bills. InferAll bundles 100,000 tokens per month against 186 NVIDIA-hosted models — Llama 3.1 405B, Mixtral, Nemotron, CodeLlama. For developer workflows that spend most of their tokens on cheap inner-loop turns — file reads, status checks, lint-style suggestions — that free allowance is the difference between a paid evaluation and a free one.

Claude Code is the third reason. InferAll's Anthropic-format /v1/messages endpoint is the default surface, not a configuration to enable. Set ANTHROPIC_BASE_URL=https://api.inferall.ai, use your InferAll key, and the standard Claude Code flow works without per-request header gymnastics. The VS Code extension extends the same pattern to a Cline-based agent with the gateway pre-wired.

And one structural thing worth saying directly: not storing prompt and completion bodies for analytics is a privacy posture some teams want. If “the gateway doesn't log my prompts” is a feature for your buyer instead of a gap, that's where InferAll currently sits. We may add logging later as an opt-in; today the gateway is closer to a pipe than to a recorder.

Migrating from Portkey to InferAll

For OpenAI-SDK code that hits Portkey's OpenAI-compatible route, the mechanical migration is two environment variables — swap the base URL and the key. Portkey's virtual-key headers don't carry over; InferAll's routing is keyed off your account, not off per-request virtual keys, so any header configuration that encoded routing intent will move to model-name or provider-field selection instead.
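As a sketch of what "routing intent moves into the request body" means in practice: where a Portkey setup encoded intent in headers (virtual keys, config IDs), the same intent is expressed in the model field of a plain OpenAI-format request. The model id below is illustrative, not a confirmed catalog name:

```shell
# Build an OpenAI-format request body; provider/model selection lives in the
# "model" field rather than in gateway-specific headers. Model id is illustrative.
build_request() {
  # $1 = model id, $2 = user message
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

BODY=$(build_request "meta/llama-3.1-405b-instruct" "hello")

# Sketch of the call (commented out so this snippet runs offline);
# OPENAI_BASE_URL is assumed to already include /v1:
# curl -s "$OPENAI_BASE_URL/chat/completions" \
#   -H "Authorization: Bearer $OPENAI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"

echo "$BODY"
```

Note there is no virtual-key header to carry over: the only gateway-specific state is the base URL and the bearer token.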

For Anthropic-format consumers, you stop configuring an Anthropic-compat route and point at /v1/messages on api.inferall.ai directly.

The features you'll be giving up are real and named at the top of this page: deep observability, guardrails, prompt management, caching. If those are load-bearing in your setup, migration probably isn't the right move — or it's a partial migration where InferAll handles specific workloads (Claude Code, free-OSS development traffic) and Portkey continues to handle the workloads where its platform features earn their keep.

Environment swap

# Before: Portkey hosted gateway via the OpenAI SDK
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.portkey.ai/v1
# (Plus Portkey-specific headers for your virtual key.)

# After: InferAll via the same OpenAI SDK
export OPENAI_API_KEY=ifa_...
export OPENAI_BASE_URL=https://api.inferall.ai/v1

# Or, if you want Claude Code / Cline (Anthropic-format):
export ANTHROPIC_API_KEY=ifa_...
export ANTHROPIC_BASE_URL=https://api.inferall.ai
claude

Why we're writing this

This page is on inferall.ai and it's about an InferAll competitor, so the bias goes one way by default. We built InferAll because we wanted a developer-shaped gateway with a free OSS allowance and an Anthropic-format endpoint — a smaller product than the LLM-platform category Portkey operates in. Portkey is a real piece of software solving a real problem, and for the buyers shopping for observability and guardrails, it's the right answer. If we wrote a comparison that didn't name where they're ahead, you'd be right not to trust the rest of the page.

Frequently asked questions

Does InferAll have observability like Portkey does?

Not at parity. Portkey leads the category on observability — request traces, prompt logging, latency breakdowns, cost analytics by key and by model, caching dashboards. InferAll's dashboard at this point is rudimentary: per-key usage and spend, not deep request-level tracing or prompt-level analytics. If observability is a first-order requirement, Portkey is the right call. We're not going to pretend otherwise.

Does InferAll have guardrails and prompt management?

No. Portkey ships a guardrails layer (regex checks, PII scrubbing, jailbreak detection, custom rule chains) and a prompt-management product (versioned prompts, A/B tests, deployment IDs). InferAll does not. The gateway forwards your request to the upstream provider; guardrails and prompt versioning live in your application code or another tool. If you need policy enforcement at the gateway layer today, Portkey is the better fit.

Then why would I pick InferAll over Portkey?

Three honest reasons. First, price and free tier: InferAll bundles 100,000 tokens per month against 186 NVIDIA-hosted OSS models, which is a category of free inference Portkey doesn't offer because Portkey is a routing-and-observability layer over your own provider keys. Second, Claude Code: InferAll exposes a native Anthropic-format /v1/messages endpoint, so ANTHROPIC_BASE_URL=https://api.inferall.ai works for Claude Code without an adapter. Third, scope: if your problem is 'I want one base URL and a free OSS allowance,' InferAll is the smaller, cheaper answer; you don't need to buy the full platform.

Is InferAll cheaper than Portkey?

For straightforward gateway use, almost certainly. Portkey's free tier covers a request budget on their hosted plan, but you still pay your upstream provider keys directly; the value Portkey adds is the platform around the call (observability, caching, governance). InferAll's free tier is 100,000 tokens of actual model inference on NVIDIA NIM. For a small team or solo developer running Claude Code against the gateway, InferAll's free tier covers real work; Portkey's free tier covers the gateway wrapper but not the tokens. Different products, different math.

Does Portkey support Claude Code via ANTHROPIC_BASE_URL?

Yes — Portkey exposes /v1/messages and accepts requests from the Anthropic SDK and Claude Code-style flows. The integration carries Portkey's own header configuration (virtual keys, config IDs) on top of the Anthropic wire format. InferAll's Anthropic-compatible endpoint is the default surface with no extra header layer — set ANTHROPIC_BASE_URL=https://api.inferall.ai and the standard Claude Code flow works.

Can I use both InferAll and Portkey together?

In principle, yes — Portkey can sit in front of any OpenAI-compatible upstream, including InferAll. You'd get Portkey's observability and guardrails over InferAll's gateway and free OSS allowance. We haven't seen many people set this up because the gain over picking one isn't large, but the wire formats line up if you want to try it.

We don't have prompt logging — is that a problem?

For some buyers, yes — full request and response logging is exactly the feature they came to a gateway to get. For other buyers it's a benefit: fewer copies of your prompts and outputs sitting in a third party. We don't store prompt or completion bodies for analytics today. If that logging is a must-have, pick Portkey; if it's a liability, that's a privacy posture you can lean on with InferAll.

Related

InferAll home — the gateway, the free tier, the failover story.

InferAll for VS Code — Cline-based agent with the gateway pre-wired, free first run.

Pricing — free tier, Pro, Team, Enterprise.

AI inference API — endpoint surface, supported providers, code examples.

Unified AI API — one key, one bill, every provider.

Last updated: 2026-05-14.

Portkey facts on this page are drawn from portkey.ai and its public documentation. InferAll facts are drawn from this site and the gateway running at api.inferall.ai. Specifics change. Have a correction? Email contact@kindly.fyi.