GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano — via one API key

How to call OpenAI's GPT-4.1 family through InferAll's OpenAI-compatible endpoint. Try all three tiers — nano to full — with the same key, same SDK, no provider switching.

InferAll Team

2 min read
OpenAI, GPT-4.1, LLM API, OpenAI API, developer tools, AI gateway

OpenAI's GPT-4.1 family is now available through InferAll, the same OpenAI-compatible endpoint that already routes to Anthropic, Gemini, and 110+ free NVIDIA NIM models. You get all three tiers with one key:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| `gpt-4.1` | $2.00 | $8.00 | Complex reasoning, long context |
| `gpt-4.1-mini` | $0.40 | $1.60 | Most production workloads |
| `gpt-4.1-nano` | $0.10 | $0.40 | High-volume, latency-sensitive |

Prices are OpenAI's published list rates; InferAll passes them through at zero markup.

---

### Drop-in with the OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one free at inferall.ai/keys
)

# Full model: complex tasks
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Review this code for security issues: ..."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

# Mini: most workloads, 5× cheaper than the full model
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarize this document in three bullets."}],
    max_tokens=256,
)
print(response.choices[0].message.content)

# Nano: high-volume classification, routing, structured extraction
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Pointing `base_url` at InferAll and using your InferAll key are the only changes. Your existing OpenAI SDK code, LangChain pipelines, and LlamaIndex retrievers all work unchanged.

---

### TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Explain async/await in one paragraph." }],
  max_tokens: 200,
});

console.log(response.choices[0].message.content);
```

---

### Also new: o3 and o4-mini

The same deploy that brought GPT-4.1 also added OpenAI's reasoning models. These calls reuse the `client` created above:

```python
# o3: strong reasoning, slower
response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)

# o4-mini: faster reasoning, lower cost
response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Debug this Python traceback: ..."}],
)
print(response.choices[0].message.content)
```

---

### Why route through InferAll

**One key, every provider.** The same `ifu_...` key routes to GPT-4.1, Claude Sonnet, Gemini Flash, and 110+ free NVIDIA models. You don't manage separate OpenAI, Anthropic, and Google credentials.

**Switch models without changing code.** Want to compare GPT-4.1-mini vs Claude Sonnet 4.6 on the same prompt? Change one string. The response shape is identical.

**Free trial, no card required.** New accounts get 200 free requests to try any model, including paid tiers. A card is required only to continue past the trial.

Get your key at [inferall.ai/keys](https://inferall.ai/keys).
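
### Estimating cost per tier

To make the price table above concrete, here is a minimal sketch that turns token counts into a dollar estimate for each tier. The prices are copied from the table in this post. The helper name `estimate_cost` and the sample workload (1,500 input tokens, 300 output tokens per request) are illustrative assumptions, not part of InferAll's API.

```python
# Per-million-token list prices in USD, copied from the table in this post.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list price.

    Hypothetical helper for illustration; raises KeyError for unknown models.
    """
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 1,500 input tokens and 300 output tokens per request.
    for model in PRICES_PER_MILLION:
        per_request = estimate_cost(model, 1_500, 300)
        per_100k = per_request * 100_000
        print(f"{model}: ${per_request:.5f} per request, ${per_100k:,.2f} per 100k requests")
```

With the OpenAI SDK, the token counts for a real call are normally available on `response.usage.prompt_tokens` and `response.usage.completion_tokens`. Assuming InferAll returns the standard `usage` object, you can pass those values straight into the helper to compare what the same request would cost on each tier.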