Documentation

Quick start

# Python SDK
pip install inferall-ai

# TypeScript SDK
npm install @inferall/sdk

Want a runnable example? examples/ in the repo has minimal Python and TypeScript scripts you can clone, edit, and run.

Base URL

https://api.inferall.ai

All requests require an API key via Authorization: Bearer ifu_... or x-api-key: ifu_... (legacy kr_proj_ keys are still accepted).

Endpoints

MethodPathDescription
POST/ai/v1/generateGenerate text, chat, images, or video
GET/ai/v1/modelsList all models with pricing
GET/ai/v1/healthHealth check
POST/v1/messagesAnthropic-compatible (Claude Code)
POST/ai/v1/keysCreate API key (requires JWT)
GET/ai/v1/keysList your keys (requires JWT)
GET/ai/v1/usageUsage summary (requires JWT)
POST/ai/v1/billing/checkoutStripe checkout session
GET/ai/v1/billing/statusBilling status and spend

TypeScript SDK

The native TypeScript SDK is live on npm as @inferall/sdk 0.1.0 (npm install @inferall/sdk). Prefer the official OpenAI or Anthropic SDK? Point either at the InferAll base URL — both work unchanged.

// Native TypeScript SDK — live on npm as @inferall/sdk 0.1.0
import { Inferall } from "@inferall/sdk";

const ai = new Inferall(); // reads INFERALL_API_KEY (ifu_...)

// Text (free OSS by default)
const text = await ai.text("Explain quantum computing in two sentences");

// Chat with a specific provider/model
const reply = await ai.chat(messages, {
  provider: "anthropic",
  model: "claude-sonnet-4-6",
});

// Vision
const analysis = await ai.vision(imageBase64, "What is this?");

// Prefer the official OpenAI/Anthropic SDKs instead? Point them at InferAll:
//   OPENAI_BASE_URL=https://api.inferall.ai/v1   (key: ifu_...)
//   ANTHROPIC_BASE_URL=https://api.inferall.ai   (key: ifu_...)

Python SDK

from inferall import Inferall

ai = Inferall()  # reads INFERALL_API_KEY from the environment

# Text generation (free via NVIDIA Llama by default)
text = ai.text("Explain quantum computing")

# Chat with any provider
reply = ai.chat(messages, provider="anthropic", model="claude-sonnet-4-6")

# Vision
analysis = ai.vision(image_base64, "What is this?")

# Generate (image or video)
video = ai.generate(
    provider="gemini",
    model="veo-2.0-generate-001",
    operation="video-generate",
    prompt="Drone shot of a city",
)

Claude Code integration

Requests are routed to free NVIDIA models by default.

# Use Claude Code with free inference
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=your_inferall_key

# Run Claude Code normally — uses free NVIDIA models by default
claude

# Force a specific provider with model prefix
# anthropic/claude-sonnet-4-6  → actual Claude
# gemini/gemini-2.5-flash            → Google Gemini

Providers

OpenAIGPT-4o, o1, DALL-E 3
AnthropicClaude Sonnet/Opus/Haiku
Google Gemini2.5 Flash/Pro, Veo, Imagen
NVIDIA NIM110+ free models (Llama, Mixtral)
ReplicateFlux, Stable Diffusion
RunwayGen-4.5, video generation

Live model list

View all available models with pricing at api.inferall.ai/ai/v1/models

More resources

Integrations — Claude Code, Cline, Cursor, LangChain, LlamaIndex, and more

Pricing — Free tier, pay-as-you-go, Pro, and Enterprise plans

Rate limits — Per-operation limits and spending caps