
Google Gemma 4 31B — free API, no credit card

How to call Google's Gemma 4 31B for free using any OpenAI-compatible SDK. Hosted on NVIDIA NIM through InferAll. No billing setup, no credit card required.

InferAll Team

Tags: Gemma 4, Google AI, free LLM API, NVIDIA NIM, OpenAI API, open source

Google's Gemma 4 31B (`google/gemma-4-31b-it`) is available free via NVIDIA NIM through InferAll. No credit card and no billing setup: create a key and call it right away.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="google/gemma-4-31b-it",
    messages=[{"role": "user", "content": "What are Gemma 4's key improvements over Gemma 3?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

---

### What is Gemma 4?

Gemma 4 is Google's fourth generation of open-weight foundation models. The 31B instruction-tuned variant (`gemma-4-31b-it`) offers strong performance on reasoning, coding, and instruction following, and is significantly more capable than the Gemma 3 family.

Like all Gemma models, it is open-weight under Google's Gemma Terms of Use, available for commercial use, and hosted without charge on NVIDIA's DGX Cloud infrastructure.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "google/gemma-4-31b-it",
  messages: [{ role: "user", content: "Write a Python function to parse JSON." }],
});

console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="google/gemma-4-31b-it",
    messages=[{"role": "user", "content": "Explain transformer attention in plain English."}],
    stream=True,
) as stream:
    for chunk in stream:
        # delta.content can be None on some chunks, so fall back to ""
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```

### Claude Code / Cline / Cursor

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

For Anthropic-compatible clients, Gemma 4 is routed as the equivalent of the "sonnet" tier.
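Returning to the streaming example: it prints fragments as they arrive but does not keep the full reply. The sketch below is one way to do both. The helper names `iter_text` and `stream_to_text` are hypothetical (they are not part of the OpenAI SDK or InferAll), and the code assumes the stream yields standard OpenAI-style chunks with `choices[0].delta.content`. It also skips chunks with an empty `choices` list, which OpenAI-compatible servers can send, for example when usage reporting is enabled.

```python
from typing import Any, Iterable, Iterator


def iter_text(stream: Iterable[Any]) -> Iterator[str]:
    """Yield the non-empty text fragments from a chat-completions stream."""
    for chunk in stream:
        # Some chunks (for example a trailing usage-only chunk) have no choices.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:  # role-only and final chunks carry None or ""
            yield text


def stream_to_text(stream: Iterable[Any], echo: bool = True) -> str:
    """Optionally print fragments as they arrive, then return the full reply."""
    parts = []
    for text in iter_text(stream):
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)
```

To use it, pass the object returned by `client.chat.completions.create(..., stream=True)` to `stream_to_text`, with the same arguments as in the streaming example. The return value is the complete reply as a single string.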
---

### Free Google models on InferAll

| Model | Size | Notes |
|---|---|---|
| `google/gemma-4-31b-it` | 31B | Newest Gemma generation |
| `google/gemma-3-12b-it` | 12B | Gemma 3, instruction-tuned |
| `google/gemma-3-4b-it` | 4B | Fast, compact Gemma 3 |
| `google/codegemma-7b` | 7B | Optimized for code |
| `google/gemma-3n-e4b-it` | E4B | Gemma 3 Nano efficient |

All are free on NVIDIA NIM. The [full model list](https://api.inferall.ai/ai/v1/models) is always live at the API.

---

### Compare with other free models

```python
# Gemma 4 vs Llama 4 vs Nemotron: one prompt, three free models
# Reuses the `client` created in the first Python example.
models = [
    "google/gemma-4-31b-it",
    "meta/llama-4-maverick-17b-128e-instruct",
    "nvidia/nemotron-3-super-120b-a12b",
]

for model in models:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What year was the transformer paper published?"}],
        max_tokens=50,
    )
    print(f"{model.split('/')[-1]}: {resp.choices[0].message.content.strip()}")
```

---

### Get started

[inferall.ai/keys](https://inferall.ai/keys): no credit card required. 200 free requests to evaluate, then add a card to unlock the full free allowance (still $0) and paid providers at zero markup.
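With a key in hand, a quick first check is to ask the endpoint which Google models it currently advertises. The sketch below uses the OpenAI SDK's `client.models.list()` call. It assumes InferAll serves the OpenAI-style model listing under the same base URL as the chat examples. The article links to a differently shaped path for the list, so treat that as unverified. The helper names `google_model_ids` and `list_google_models` are hypothetical.

```python
from typing import Any, Iterable, List


def google_model_ids(model_ids: Iterable[str]) -> List[str]:
    """Return, sorted, the model ids that live under the google/ namespace."""
    return sorted(m for m in model_ids if m.startswith("google/"))


def list_google_models(client: Any) -> List[str]:
    """List Google-hosted model ids via an OpenAI-compatible client."""
    # client.models.list() is the OpenAI SDK call for the model listing.
    # Assumption: InferAll exposes that route under the client's base URL.
    return google_model_ids(model.id for model in client.models.list())
```

Calling `list_google_models(client)` with the `client` from the first example should return ids such as those in the table above, if the assumption about the route holds.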