Google's Gemma 4 31B (`google/gemma-4-31b-it`) is available free via NVIDIA NIM through InferAll. No credit card, no billing setup: create a key and call it now.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="google/gemma-4-31b-it",
    messages=[{"role": "user", "content": "What are Gemma 4's key improvements over Gemma 3?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

---

### What is Gemma 4?

Gemma 4 is Google's fourth generation of open-weight foundation models. The 31B instruction-tuned variant (`gemma-4-31b-it`) targets reasoning, coding, and instruction following, and is positioned as a clear step up from the Gemma 3 family.

Like all Gemma models, it is released under Google's Gemma Terms of Use and available for commercial use. Through NVIDIA NIM it is hosted without charge on NVIDIA's DGX Cloud infrastructure.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "google/gemma-4-31b-it",
  messages: [{ role: "user", content: "Write a Python function to parse JSON." }],
});
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses `client` from the first Python example.
with client.chat.completions.create(
    model="google/gemma-4-31b-it",
    messages=[{"role": "user", "content": "Explain transformer attention in plain English."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks may arrive without choices; skip them.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")
```

### Claude Code / Cline / Cursor

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

Gemma 4 routes as the "sonnet" tier equivalent for Anthropic-compatible clients.
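
If you would rather not export those two variables for your whole shell session, one option is to set them only for a single child process. The sketch below is illustrative: the helper names `inferall_env` and `launch` are hypothetical, and the base URL is copied from the `export` lines above.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Same value as the ANTHROPIC_BASE_URL export above.
INFERALL_ANTHROPIC_BASE_URL = "https://api.inferall.ai"


def inferall_env(api_key: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the two Anthropic variables set."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = INFERALL_ANTHROPIC_BASE_URL
    env["ANTHROPIC_API_KEY"] = api_key
    return env


def launch(command: List[str], api_key: str) -> int:
    """Run `command` with the overrides applied and return its exit code."""
    return subprocess.run(command, env=inferall_env(api_key)).returncode
```

For example, `launch(["claude"], os.environ["INFERALL_API_KEY"])` would start Claude Code with the overrides, assuming the `claude` CLI is on your `PATH` and your key is stored in `INFERALL_API_KEY`. The parent shell's environment is left untouched.
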

---

### Free Google models on InferAll

| Model | Size | Notes |
|---|---|---|
| `google/gemma-4-31b-it` | 31B | Newest Gemma generation |
| `google/gemma-3-12b-it` | 12B | Gemma 3, instruction-tuned |
| `google/gemma-3-4b-it` | 4B | Fast, compact Gemma 3 |
| `google/codegemma-7b` | 7B | Optimized for code |
| `google/gemma-3n-e4b-it` | E4B | Gemma 3 Nano efficient |

All are free on NVIDIA NIM. The [full model list](https://api.inferall.ai/ai/v1/models) is served live by the API.

---

### Compare with other free models

```python
# Gemma 4 vs Llama 4 vs Nemotron: one prompt, three free models.
# Reuses `client` from the first Python example.
models = [
    "google/gemma-4-31b-it",
    "meta/llama-4-maverick-17b-128e-instruct",
    "nvidia/nemotron-3-super-120b-a12b",
]

for model in models:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What year was the transformer paper published?"}],
        max_tokens=50,
    )
    # `content` can be None, so fall back to an empty string before stripping.
    answer = (resp.choices[0].message.content or "").strip()
    print(f"{model.split('/')[-1]}: {answer}")
```

---

### Get started

[inferall.ai/keys](https://inferall.ai/keys): sign up free, then activate via the $5 starter pack at [/billing](https://inferall.ai/billing).

The $5 becomes spendable balance. The 118+ open NIM models stay at $0 in/out against it (within the free-plan daily request caps), while premium providers (OpenAI, Anthropic, Google) bill at the provider's published per-token rate with zero markup.
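
Once you have a key, you may want to check which Google models are currently served before picking one. The following is a minimal sketch using only the standard library. It assumes the model-list URL linked above accepts the key as a Bearer token and returns an OpenAI-style `{"data": [{"id": ...}]}` body; neither is confirmed here, and the helper names `fetch_models` and `ids_with_prefix` are hypothetical.

```python
import json
import urllib.request
from typing import List

# The model-list URL linked in the table section above.
MODELS_URL = "https://api.inferall.ai/ai/v1/models"


def fetch_models(api_key: str, url: str = MODELS_URL) -> dict:
    """GET the model list. Bearer auth is an assumption, not confirmed by this post."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def ids_with_prefix(payload: dict, prefix: str = "google/") -> List[str]:
    """Pick model IDs by vendor prefix, assuming an OpenAI-style {"data": [...]} body."""
    return sorted(
        m["id"] for m in payload.get("data", []) if m.get("id", "").startswith(prefix)
    )
```

Calling `ids_with_prefix(fetch_models("ifu_your_key_here"))` would then return the sorted `google/...` IDs, if the response has the assumed shape.
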
