
Gemini 2.5 Flash API — via one unified key

How to call Google's Gemini 2.5 Flash through InferAll's OpenAI-compatible endpoint. Same SDK, same key as your other models. No Google Cloud setup required.

InferAll Team

Gemini, Google AI, Gemini 2.5 Flash, LLM API, OpenAI API, developer tools
Google's Gemini 2.5 Flash is available through InferAll: no Google Cloud project, no separate API key, no per-project billing setup. Your `ifu_...` key routes to Gemini the same way it routes to GPT-4.1, Claude Sonnet, and free NVIDIA models.

**Gemini 2.5 Flash pricing via InferAll:**

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| `gemini-2.5-flash` | $0.15 | $0.60 |
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.0-flash` | $0.10 | $0.40 |

Google's published rates, zero markup.

---

### OpenAI SDK (drop-in)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one free at inferall.ai/keys
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Summarize the key differences between REST and GraphQL."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

No changes to your existing OpenAI code: just point `base_url` at InferAll.

---

### TypeScript

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Write a regex to validate an email address." }],
  max_tokens: 256,
});
console.log(response.choices[0].message.content);
```

---

### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gemini-2.5-flash",
    openai_api_base="https://api.inferall.ai/v1",
    openai_api_key="ifu_your_key_here",
)

response = llm.invoke("What is the capital of France?")
print(response.content)
```

---

### Vision (multimodal)

Gemini 2.5 Flash supports images.
Pass base64-encoded images in the message content:

```python
import base64

# Read the image and encode it as base64 text for a data URL.
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

# Reuses the `client` created in the OpenAI SDK example.
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's wrong with this UI?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

---

### When to use Gemini 2.5 Flash

Gemini 2.5 Flash sits in the same cost tier as GPT-4o-mini ($0.15/$0.60 per 1M input/output tokens) but has a much larger context window (about 1M tokens versus 128K) and performs well on structured extraction, summarization, and multilingual tasks. It's a good default for:

- **High-volume pipelines** where per-request cost matters
- **Document processing**: long-context summarization, extraction
- **Multilingual apps**: strong coverage across languages
- **Vision tasks**: image understanding, chart analysis

When you need stronger reasoning, step up to `gemini-2.5-pro` (still via the same key and endpoint).

---

### Why not use Google AI Studio directly?

You can, but you'd manage a separate Google API key, a separate billing account, and separate client configuration for every project. InferAll gives you one key that routes to Gemini, Claude, GPT-4.1, and free NVIDIA models. Switch between providers by changing one string, which is useful when you're benchmarking, building fallback chains, or running multiple projects.

Free trial: 200 requests, no credit card. Get your key at [inferall.ai/keys](https://inferall.ai/keys).
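Because switching providers only means changing the `model` string, the fallback chains mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of InferAll's API: `complete_with_fallback` and `FALLBACK_CHAIN` are hypothetical names, and the sketch assumes a failed request surfaces as a Python exception from the `create` callable you pass in (as the OpenAI SDK does for API errors).

```python
from typing import Any, Callable, Sequence, Tuple


def complete_with_fallback(
    create: Callable[..., Any],
    models: Sequence[str],
    messages: Sequence[dict],
    **kwargs: Any,
) -> Tuple[str, Any]:
    """Try each model id in order; return (model_used, response) for the first success."""
    if not models:
        raise ValueError("models must contain at least one model id")

    errors = []
    for model in models:
        try:
            # Only the model string changes between attempts.
            return model, create(model=model, messages=messages, **kwargs)
        except Exception as exc:  # deliberately broad for a sketch; narrow it in real code
            errors.append((model, exc))

    summary = "; ".join(f"{m}: {e!r}" for m, e in errors)
    raise RuntimeError(f"all models failed: {summary}")


# Hypothetical preference order: cheaper model first, stronger model as backup.
# Both ids come from the pricing table in this post.
FALLBACK_CHAIN = ["gemini-2.5-flash", "gemini-2.5-pro"]
```

With the client from the OpenAI SDK example, a call might look like `complete_with_fallback(client.chat.completions.create, FALLBACK_CHAIN, messages, max_tokens=512)`. Taking the `create` callable as a parameter keeps the helper independent of any one SDK and makes it easy to exercise with a stub.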