Google's Gemini 2.5 Flash is available through InferAll — no Google Cloud project, no separate API key, no per-project billing setup. Your `ifu_...` key routes to Gemini the same way it routes to GPT-4.1, Claude Sonnet, and free NVIDIA models.
**Gemini 2.5 Flash pricing via InferAll:**
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| `gemini-2.5-flash` | $0.15 | $0.60 |
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.0-flash` | $0.10 | $0.40 |
Google's published rates, zero markup.
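To see what these rates mean for a real workload, here is a minimal sketch of a cost estimator. The prices are copied from the table above; the `estimate_cost` helper and the request sizes are hypothetical, chosen only for illustration.

```python
# Per-million-token prices in USD (input, output), copied from the table above.
PRICES = {
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.0-flash": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 requests, each with 2,000 input and 500 output tokens.
per_request = estimate_cost("gemini-2.5-flash", 2_000, 500)
print(f"per request: ${per_request:.6f}")
print(f"10,000 requests: ${per_request * 10_000:.2f}")
```

Actual token counts come back in the `usage` field of each response, so you can swap the assumed sizes for measured ones.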
---
### OpenAI SDK (drop-in)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one free at inferall.ai/keys
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Summarize the key differences between REST and GraphQL."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
No changes to your existing OpenAI code — just point `base_url` at InferAll.
---
### TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Write a regex to validate an email address." }],
  max_tokens: 256,
});
console.log(response.choices[0].message.content);
```
---
### LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gemini-2.5-flash",
    openai_api_base="https://api.inferall.ai/v1",
    openai_api_key="ifu_your_key_here",
)
response = llm.invoke("What is the capital of France?")
print(response.content)
```
---
### Vision (multimodal)
Gemini 2.5 Flash supports images. Pass base64-encoded images in the message content:
```python
import base64

# Reuses the `client` created in the OpenAI SDK example above.
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's wrong with this UI?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
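The example above hardcodes `image/png` in the data URL. If you send a mix of formats, a small helper can derive the MIME type from the file extension. This is a sketch: `image_part` is a hypothetical helper name, and which image formats the endpoint accepts is an assumption you should verify.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Build an OpenAI-style `image_url` content part from a local image file."""
    # Guess the MIME type from the file extension (e.g. ".jpg" -> "image/jpeg").
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


# Usage: drop the result into the message content list next to the text part.
# content = [{"type": "text", "text": "What's wrong with this UI?"}, image_part("screenshot.png")]
```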
---
### When to use Gemini 2.5 Flash
Gemini 2.5 Flash sits in the same price tier as GPT-4o mini ($0.15 input / $0.60 output per million tokens) but offers a much larger context window (roughly 1M tokens versus 128K) and performs well on structured extraction, summarization, and multilingual tasks. It's a good default for:
- **High-volume pipelines** where per-request cost matters
- **Document processing** — long-context summarization, extraction
- **Multilingual apps** — strong coverage across languages
- **Vision tasks** — image understanding, chart analysis
When you need stronger reasoning, step up to `gemini-2.5-pro` (still via the same key and endpoint).
---
### Why not use Google AI Studio directly?
You can — but you'd manage a separate Google API key, a separate billing account, and separate client configuration for every project. InferAll gives you one key that routes to Gemini, Claude, GPT-4.1, and free NVIDIA models. Switch between providers by changing one string. Useful when you're benchmarking, building fallback chains, or running multiple projects.
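A fallback chain over a single endpoint can be as simple as trying model IDs in order. The sketch below assumes an OpenAI-style client configured as in the examples above. `complete_with_fallback` and `make_caller` are hypothetical helper names, and the model IDs are the ones from the pricing table.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str],
) -> tuple[str, str]:
    """Try each model in order; return (model, text) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose for a sketch; narrow this in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def make_caller(client) -> Callable[[str, str], str]:
    """Wrap an OpenAI-style client into a (model, prompt) -> text callable."""
    def call(model: str, prompt: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
        )
        return response.choices[0].message.content
    return call


# Usage, given a `client` pointed at https://api.inferall.ai/v1:
# model, text = complete_with_fallback(
#     make_caller(client), "Summarize this ticket.", ["gemini-2.5-flash", "gemini-2.0-flash"]
# )
```

Passing the caller in as a function keeps the retry logic independent of any one SDK, so the same chain works for benchmarking by swapping the model list.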
Free trial: 200 requests, no credit card. Get your key at [inferall.ai/keys](https://inferall.ai/keys).
Gemini 2.5 Flash API — via one unified key
How to call Google's Gemini 2.5 Flash through InferAll's OpenAI-compatible endpoint. Same SDK, same key as your other models. No Google Cloud setup required.
InferAll Team