Documentation
Quick start
# Python SDK
pip install inferall-ai

# TypeScript SDK
npm install @inferall/sdk
Want a runnable example? examples/ in the repo has minimal Python and TypeScript scripts you can clone, edit, and run.
Base URL
https://api.inferall.ai
All requests require an API key, sent as either Authorization: Bearer ifu_... or x-api-key: ifu_... (legacy kr_proj_ keys are still accepted).
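The two header forms described above can be sketched as a small helper. This is an illustration only: the function name auth_headers and its use_x_api_key flag are hypothetical and not part of any InferAll SDK; only the header names come from this page.

```python
from __future__ import annotations


def auth_headers(api_key: str, use_x_api_key: bool = False) -> dict[str, str]:
    """Build the HTTP header that carries an InferAll API key.

    Hypothetical helper: only the two header names are taken from the docs.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_x_api_key:
        # Alternative form: x-api-key: ifu_...
        return {"x-api-key": api_key}
    # Default form: Authorization: Bearer ifu_...
    return {"Authorization": f"Bearer {api_key}"}
```

Either dictionary can be passed as the headers argument of whichever HTTP client you use.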
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /ai/v1/generate | Generate text, chat, images, or video |
| GET | /ai/v1/models | List all models with pricing |
| GET | /ai/v1/health | Health check |
| POST | /v1/messages | Anthropic-compatible (Claude Code) |
| POST | /ai/v1/keys | Create API key (requires JWT) |
| GET | /ai/v1/keys | List your keys (requires JWT) |
| GET | /ai/v1/usage | Usage summary (requires JWT) |
| POST | /ai/v1/billing/checkout | Stripe checkout session |
| GET | /ai/v1/billing/status | Billing status and spend |
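If you want to call an endpoint from the table without an SDK, a minimal sketch using only the Python standard library might look like the following. It targets GET /ai/v1/models with a Bearer key. The function names are hypothetical, and the assumption that the response body is JSON is not stated on this page.

```python
import json
import urllib.request

BASE_URL = "https://api.inferall.ai"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for /ai/v1/models."""
    return urllib.request.Request(
        f"{BASE_URL}/ai/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_models(api_key: str):
    """Send the request and decode the body.

    Assumption: the endpoint returns JSON; the response shape is not
    documented in this section, so it is returned undecorated.
    """
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling list_models(os.environ["INFERALL_API_KEY"]) after importing os would perform the actual network request.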
TypeScript SDK
The native TypeScript SDK is live on npm as @inferall/sdk 0.1.0 (npm install @inferall/sdk). Prefer the official OpenAI or Anthropic SDK? Point either at the InferAll base URL — both work unchanged.
// Native TypeScript SDK — live on npm as @inferall/sdk 0.1.0
import { Inferall } from "@inferall/sdk";
const ai = new Inferall(); // reads INFERALL_API_KEY (ifu_...)
// Text (free OSS by default)
const text = await ai.text("Explain quantum computing in two sentences");
// Chat with a specific provider/model
const reply = await ai.chat(messages, {
  provider: "anthropic",
  model: "claude-sonnet-4-6",
});
// Vision
const analysis = await ai.vision(imageBase64, "What is this?");
// Prefer the official OpenAI/Anthropic SDKs instead? Point them at InferAll:
// OPENAI_BASE_URL=https://api.inferall.ai/v1 (key: ifu_...)
// ANTHROPIC_BASE_URL=https://api.inferall.ai (key: ifu_...)
Python SDK
from inferall import Inferall
ai = Inferall() # reads INFERALL_API_KEY from the environment
# Text generation (free via NVIDIA Llama by default)
text = ai.text("Explain quantum computing")
# Chat with any provider
reply = ai.chat(messages, provider="anthropic", model="claude-sonnet-4-6")
# Vision
analysis = ai.vision(image_base64, "What is this?")
# Generate (image or video)
video = ai.generate(
    provider="gemini",
    model="veo-2.0-generate-001",
    operation="video-generate",
    prompt="Drone shot of a city",
)
Claude Code integration
Requests are routed to free NVIDIA models by default.
# Use Claude Code with free inference
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=your_inferall_key

# Run Claude Code normally (uses free NVIDIA models by default)
claude

# Force a specific provider with a model prefix
# anthropic/claude-sonnet-4-6 → actual Claude
# gemini/gemini-2.5-flash → Google Gemini
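The provider/model prefix convention above can be read as "everything before the first slash names the provider". The sketch below illustrates that reading on the client side. It is an assumption about how the prefix decomposes, not a description of InferAll's actual routing, and split_model_prefix is a hypothetical name.

```python
from __future__ import annotations


def split_model_prefix(model: str) -> tuple[str | None, str]:
    """Split "provider/model" into (provider, model).

    Illustrative only: returns (None, model) when there is no usable
    prefix, which this page says is routed to the free default models.
    """
    provider, sep, name = model.partition("/")
    if sep and provider and name:
        return provider, name
    return None, model
```

For example, split_model_prefix("anthropic/claude-sonnet-4-6") yields the provider "anthropic" and the model "claude-sonnet-4-6", while an unprefixed name comes back with no provider.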
Providers
Live model list
View all available models with pricing at api.inferall.ai/ai/v1/models
More resources
Integrations — Claude Code, Cline, Cursor, LangChain, LlamaIndex, and more
Pricing — Free tier, pay-as-you-go, Pro, and Enterprise plans
Rate limits — Per-operation limits and spending caps