OpenAI's reasoning model family — o1, o3, o3-mini, o4-mini — is now available through InferAll's OpenAI-compatible endpoint. Same SDK you already use, same `ifu_...` key.
| Model | Input | Output | Best for |
|---|---|---|---|
| `o3` | $10.00/M | $40.00/M | Hardest reasoning, math proofs, complex code |
| `o4-mini` | $1.10/M | $4.40/M | Reasoning at lower cost — most tasks |
| `o3-mini` | $1.10/M | $4.40/M | Previous generation (same tier) |
| `o1` | $15.00/M | $60.00/M | Deliberate reasoning, slower |
All prices are in USD per million tokens, at OpenAI's published list rates.
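To get a feel for what these rates mean per request, here is a rough cost estimator. It is only a sketch: the rates are copied from the table above, and `estimate_cost` is a hypothetical helper, not part of any SDK. Keep in mind that hidden reasoning tokens are billed as output tokens, so the output count is usually larger than the visible answer.

```python
# List rates copied from the table above, in USD per million tokens:
# (input rate, output rate).
RATES_PER_MILLION = {
    "o3": (10.00, 40.00),
    "o4-mini": (1.10, 4.40),
    "o3-mini": (1.10, 4.40),
    "o1": (15.00, 60.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    output_tokens should be the billed completion count, which for these
    models includes hidden reasoning tokens.
    """
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Example: 2,000 prompt tokens and 6,000 billed completion tokens.
for name in ("o4-mini", "o3"):
    print(f"{name}: ${estimate_cost(name, 2_000, 6_000):.4f}")
```

With these example numbers the same request costs a few cents on o4-mini and roughly nine times that on o3.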
---
### Quick start
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one free at inferall.ai/keys
)

# o3: for hard problems
response = client.chat.completions.create(
    model="o3",
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many prime numbers."}
    ],
)
print(response.choices[0].message.content)

# o4-mini: reasoning at lower cost
response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "user", "content": "Debug this Python code: ..."}
    ],
)
print(response.choices[0].message.content)
```
The only difference from calling OpenAI directly: `base_url="https://api.inferall.ai/v1"`. Your existing code, streaming, and tool use all work unchanged.
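Since streaming is said to work unchanged, a streamed call might look like the sketch below. `stream_text` is a hypothetical helper name; it only relies on the standard `stream=True` option of the OpenAI SDK and assumes `client` is an `openai.OpenAI` instance configured with the InferAll base URL as in the quick start.

```python
def stream_text(client, model: str, prompt: str):
    """Yield text fragments from a streamed chat completion.

    `client` is expected to be an `openai.OpenAI` instance pointed at
    https://api.inferall.ai/v1 (an assumption carried over from the quick start).
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text
```

Typical use would be `for piece in stream_text(client, "o4-mini", "Explain this stack trace: ..."): print(piece, end="", flush=True)`. Reasoning models may pause before the first fragment arrives, because the hidden reasoning happens before any visible output.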
---
### TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "o4-mini",
  messages: [
    { role: "user", content: "Write a SQL query to find the top 10 customers by revenue in the last 90 days." },
  ],
});

console.log(response.choices[0].message.content);
```
---
### When to use each model
**o4-mini** is the default choice for most tasks that need reasoning. At the list rates above it costs about one-ninth as much as o3 ($1.10 vs. $10.00 per million input tokens, $4.40 vs. $40.00 per million output tokens) and handles most real-world problems: debugging, code generation, complex queries, structured data extraction. o3 is usually only necessary for research-grade proofs, very hard math, or safety-critical reasoning where maximum accuracy matters.
**o3** is for the problems where o4-mini genuinely struggles — multi-step deduction, long-horizon planning, formal proofs. If you're not sure, start with o4-mini and only step up when you see it fail.
**o1** is the previous generation of deliberate reasoner. o3 is significantly more capable at a lower list price ($10.00/$40.00 vs. $15.00/$60.00 per million input/output tokens), so there's no strong reason to use o1 in new projects.
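The "start with o4-mini, step up when it fails" advice can be sketched as a small escalation loop. Everything here is illustrative: `answer_with_escalation`, `ask`, and `is_acceptable` are hypothetical names, and what counts as a failure (a failing unit test, an unparsable answer) is for you to define.

```python
from typing import Callable


def answer_with_escalation(
    ask: Callable[[str, str], str],
    prompt: str,
    is_acceptable: Callable[[str], bool],
    models: tuple[str, ...] = ("o4-mini", "o3"),
) -> tuple[str, str]:
    """Try each model in order; return (model, answer) for the first acceptable answer.

    `ask(model, prompt)` should return the model's text reply.
    If no answer passes the check, the last model's answer is returned.
    """
    if not models:
        raise ValueError("models must not be empty")
    answer = ""
    used = models[0]
    for used in models:
        answer = ask(used, prompt)
        if is_acceptable(answer):
            break
    return used, answer
```

To wire it to the client from the quick start, `ask` could be `lambda model, prompt: client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}]).choices[0].message.content`. The design keeps the expensive model out of the common path, so you only pay o3 rates for prompts that o4-mini demonstrably failed.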
---
### Same key for everything else
The same `ifu_...` key that calls o3 also routes to:
- GPT-4.1 / GPT-4.1-mini / GPT-4.1-nano
- Claude Opus 4 / Sonnet 4
- Gemini 2.5 Flash / Pro
- 118+ free NVIDIA NIM models (no card needed)
Switch between providers by changing one string. Useful when you want to compare o4-mini vs Claude Opus 4 on a hard task, or fall back to GPT-4.1-mini for cheaper follow-up calls after o3 generates the plan.
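The plan-then-follow-up pattern mentioned above could look roughly like this. It is a sketch under assumptions: the model identifier `"gpt-4.1-mini"` is a guess at how InferAll names that model, and `plan_then_execute` and `ask` are hypothetical names.

```python
from typing import Callable

# Assumed model identifiers; check InferAll's model list for the exact strings.
PLANNER_MODEL = "o3"
WORKER_MODEL = "gpt-4.1-mini"


def plan_then_execute(ask: Callable[[str, str], str], task: str) -> list[str]:
    """Ask the planner for one step per line, then run each step on the cheaper model.

    `ask(model, prompt)` should return the model's text reply.
    """
    plan = ask(PLANNER_MODEL, f"Break this task into short steps, one per line:\n{task}")
    # Drop blank lines and surrounding whitespace from the planner's reply.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [
        ask(WORKER_MODEL, f"Task: {task}\nCarry out this step: {step}")
        for step in steps
    ]
```

Because both calls go through the same client and key, switching provider here is only a matter of the model string passed to `ask`.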
---
### Note on reasoning token behavior
OpenAI's o-series models think before they respond — they generate internal reasoning tokens that count toward your usage but aren't returned in the output. InferAll passes the full response through unchanged, so `usage.completion_tokens` includes both reasoning and output tokens as reported by OpenAI.
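To see how much of a bill is hidden reasoning, you can split the completion count. OpenAI's API reports the reasoning share in `usage.completion_tokens_details.reasoning_tokens`; the sketch below assumes InferAll passes that field through along with the rest of the response, and `split_completion_tokens` is a hypothetical helper.

```python
from types import SimpleNamespace


def split_completion_tokens(usage) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_tokens) from a chat completion `usage` object.

    Falls back to 0 reasoning tokens when the details field is absent.
    """
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", None) or 0
    return reasoning, usage.completion_tokens - reasoning


# Stand-in for `response.usage`, so the example runs without a network call.
example_usage = SimpleNamespace(
    completion_tokens=1_500,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=1_200),
)
print(split_completion_tokens(example_usage))  # (1200, 300)
```

With a real response you would call `split_completion_tokens(response.usage)`. A large reasoning share is normal for these models and explains why a short visible answer can still carry a sizeable output-token charge.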
---
Free trial: 200 requests, no credit card. Get your key at [inferall.ai/keys](https://inferall.ai/keys).
o3 and o4-mini API — OpenAI reasoning models via one key
How to call OpenAI's o3 and o4-mini reasoning models through InferAll's OpenAI-compatible endpoint. Same SDK, same key — no separate API access needed.
InferAll Team