OpenAI's GPT-4.1 family is now available through InferAll — the same OpenAI-compatible endpoint that already routes to Anthropic, Gemini, and 110+ free NVIDIA NIM models.
You get all three tiers with one key:
| Model | Input | Output | Best for |
|---|---|---|---|
| `gpt-4.1` | $2.00/M | $8.00/M | Complex reasoning, long context |
| `gpt-4.1-mini` | $0.40/M | $1.60/M | Most production workloads |
| `gpt-4.1-nano` | $0.10/M | $0.40/M | High-volume, latency-sensitive |
Prices are OpenAI's published list rates — InferAll passes them through at zero markup.
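
To make the tier differences concrete, here is a minimal sketch that turns the list prices in the table into a per-request estimate. The `estimate_cost` helper and the workload figures (1M requests at 500 input and 100 output tokens each) are illustrative assumptions, not part of any InferAll or OpenAI API.

```python
# Per-million-token list prices in USD, copied from the table above.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list price."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload: 1M requests, each with 500 input and 100 output tokens.
for model in PRICES_PER_MILLION:
    total = 1_000_000 * estimate_cost(model, 500, 100)
    print(f"{model}: ${total:,.2f}")
```

Actual spend depends on your real token counts, so treat the output as a rough comparison between tiers rather than a quote.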
---
### Drop-in with the OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one free at inferall.ai/keys
)

# Full model: complex tasks
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Review this code for security issues: ..."}],
    max_tokens=1024,
)

# Mini: most workloads, 5× cheaper
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarize this document in three bullets."}],
    max_tokens=256,
)

# Nano: high-volume classification, routing, structured extraction
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
    max_tokens=64,
)

# Prints the reply from the most recent call (the nano request)
print(response.choices[0].message.content)
```
Swapping in the InferAll `base_url` and API key is the only change. Your existing OpenAI SDK code, LangChain pipelines, and LlamaIndex retrievers otherwise work unchanged.
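
For readers curious what the swap means on the wire: the OpenAI SDK appends `/chat/completions` to `base_url` and sends the key as a bearer token. The sketch below builds that request with only the standard library. It assumes InferAll accepts the standard OpenAI request shape, as the post describes; `build_chat_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.inferall.ai/v1"  # same value passed to the SDK above


def build_chat_request(
    api_key: str, model: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request in the OpenAI wire format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it (needs network access and a valid key):
#   with urllib.request.urlopen(build_chat_request(key, "gpt-4.1-nano", "Hello")) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

In practice the SDK is the more convenient route, since it also handles retries and typed responses. The raw form is mainly useful for debugging or for environments where the SDK is not available.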
---
### TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4.1-mini",
  messages: [{ role: "user", content: "Explain async/await in one paragraph." }],
  max_tokens: 200,
});

console.log(response.choices[0].message.content);
```
---
### Also new: o3 and o4-mini
The same deploy that brought GPT-4.1 also added OpenAI's reasoning models:
```python
# Reuses the `client` created in the Python example above.

# o3: strong reasoning, slower
response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)

# o4-mini: faster reasoning, lower cost
response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Debug this Python traceback: ..."}],
)
print(response.choices[0].message.content)
```
---
### Why route through InferAll
**One key, every provider.** The same `ifu_...` key routes to GPT-4.1, Claude Sonnet, Gemini Flash, and 110+ free NVIDIA models. You don't manage separate OpenAI, Anthropic, and Google credentials.
**Switch models without changing code.** Want to compare GPT-4.1-mini vs Claude Sonnet 4.6 on the same prompt? Change one string. The response shape is identical.
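
A side-by-side comparison like this can be sketched as a small loop. `compare_models` is a hypothetical helper, not part of the OpenAI SDK or InferAll. It takes any client exposing `chat.completions.create`, and the model names are whatever identifiers your account lists.

```python
from typing import Any


def compare_models(
    client: Any, prompt: str, models: list[str], max_tokens: int = 256
) -> dict[str, str]:
    """Send the same prompt to each model and collect the replies by model name."""
    replies = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,  # the only value that changes between calls
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        replies[model] = response.choices[0].message.content
    return replies


# Usage (needs network access and a valid key):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.inferall.ai/v1", api_key="ifu_your_key_here")
#   replies = compare_models(
#       client,
#       "Summarize this document in three bullets.",
#       ["gpt-4.1-mini", "gpt-4.1-nano"],
#   )
```
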
**Free trial, no card required.** New accounts get 200 free requests to try any model, including paid tiers. A card is required only to continue past the trial.
Get your key at [inferall.ai/keys](https://inferall.ai/keys).
GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano — via one API key
How to call OpenAI's GPT-4.1 family through InferAll's OpenAI-compatible endpoint. Try all three tiers — nano to full — with the same key, same SDK, no provider switching.
InferAll Team