OpenAI's reasoning model family (o1, o3, o3-mini, and o4-mini) is now available through InferAll's OpenAI-compatible endpoint. Same SDK you already use, same `ifu_...` key.

| Model | Input | Output | Best for |
|---|---|---|---|
| `o3` | $10.00/M | $40.00/M | Hardest reasoning, math proofs, complex code |
| `o4-mini` | $1.10/M | $4.40/M | Reasoning at lower cost; fits most tasks |
| `o3-mini` | $1.10/M | $4.40/M | Previous generation (same tier) |
| `o1` | $15.00/M | $60.00/M | Deliberate reasoning, slower |

Prices are per million tokens, all at OpenAI's published list rates.

---

### Quick start

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

# o3: for hard problems
response = client.chat.completions.create(
    model="o3",
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many prime numbers."}
    ],
)
print(response.choices[0].message.content)

# o4-mini: reasoning at lower cost
response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "user", "content": "Debug this Python code: ..."}
    ],
)
print(response.choices[0].message.content)
```

The only code change from calling OpenAI directly is `base_url="https://api.inferall.ai/v1"`, plus your `ifu_...` key. Your existing code, streaming, and tool use all work unchanged.

---

### TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "o4-mini",
  messages: [
    {
      role: "user",
      content: "Write a SQL query to find the top 10 customers by revenue in the last 90 days.",
    },
  ],
});

console.log(response.choices[0].message.content);
```

---

### When to use each model

**o4-mini** is the default choice for most tasks that need reasoning. It's roughly 9× cheaper than o3 ($1.10 vs. $10.00 per million input tokens) and handles most real-world problems: debugging, code generation, complex queries, structured data extraction. o3 is usually only necessary for research-grade proofs, very hard math, or safety-critical reasoning where maximum accuracy matters.

**o3** is for the problems where o4-mini genuinely struggles: multi-step deduction, long-horizon planning, formal proofs. If you're not sure, start with o4-mini and only step up when you see it fail.

**o1** is the previous generation of deliberate reasoner. o3 is significantly more capable at a lower list price ($10/$40 vs. $15/$60 per million tokens), so there's no strong reason to use o1 in new projects.

---

### Same key for everything else

The same `ifu_...` key that calls o3 also routes to:

- GPT-4.1 / GPT-4.1-mini / GPT-4.1-nano
- Claude Opus 4 / Sonnet 4
- Gemini 2.5 Flash / Pro
- 118+ NVIDIA NIM open models at our open-model rate

Switch between providers by changing one string. Useful when you want to compare o4-mini vs Claude Opus 4 on a hard task, or fall back to GPT-4.1-mini for cheaper follow-up calls after o3 generates the plan.

---

### Note on reasoning token behavior

OpenAI's o-series models think before they respond: they generate internal reasoning tokens that count toward your usage but aren't returned in the output. InferAll passes the full response through unchanged, so `usage.completion_tokens` includes both reasoning and output tokens as reported by OpenAI.

---

Sign up at [inferall.ai/keys](https://inferall.ai/keys) and fund a key with the $5 starter pack. That's usage credit you can spend on o3, o4-mini, or any other model at the provider's published rate with zero markup.
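
To make the pricing table and the reasoning-token note concrete, here is a minimal sketch of estimating the cost of a single call from its reported usage. The rates are copied from the table in this post; the helper name `estimate_cost_usd` and the token counts are hypothetical and only for illustration. It assumes, as described above, that `completion_tokens` already includes the hidden reasoning tokens, so they are billed at the output rate.

```python
# Prices in USD per million tokens, copied from the table in this post.
PRICES_PER_MILLION = {
    "o3": (10.00, 40.00),
    "o4-mini": (1.10, 4.40),
    "o3-mini": (1.10, 4.40),
    "o1": (15.00, 60.00),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one call from its reported usage counts.

    completion_tokens is assumed to include reasoning tokens, so the
    whole count is charged at the model's output rate.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / 1_000_000


# Hypothetical usage: 1,200 prompt tokens and 3,000 completion tokens
# (visible answer plus hidden reasoning).
for model in ("o4-mini", "o3"):
    print(model, round(estimate_cost_usd(model, 1_200, 3_000), 6))
```

With a real response, you would pass `response.usage.prompt_tokens` and `response.usage.completion_tokens` instead of the made-up numbers. Because reasoning tokens can outnumber the visible answer, the completion count is usually the figure to watch when comparing o4-mini against o3.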
