Meta's Llama 4 Maverick (`meta/llama-4-maverick-17b-128e-instruct`) is available for free via NVIDIA NIM through InferAll. There's no credit card or billing setup: create a key and start calling it right away.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",
    messages=[{"role": "user", "content": "What makes Llama 4 Maverick different from Llama 3?"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
---
### What is Llama 4 Maverick?
Llama 4 Maverick is Meta's Mixture of Experts (MoE) model with 17B active parameters and 128 routed experts (the `17b-128e` in its model ID), for roughly 400B parameters in total. Rather than running every weight for every token, each MoE layer sends a token to a shared expert plus one of the 128 routed experts. Per-token compute therefore stays close to that of a 17B dense model, while the model draws on a much larger pool of parameters.

Maverick sits in the Llama 4 family alongside Llama 4 Scout, a smaller and faster model, and is optimized for instruction following, reasoning, and code. NVIDIA hosts it for free on NIM, and InferAll routes requests there at no cost.
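To make the routing idea concrete, here is a toy, pure-Python sketch of a single MoE layer, loosely following Meta's published description of Maverick: every token goes to a shared expert plus the single highest-scoring routed expert. Everything else is an illustrative assumption, not Meta's implementation: the tiny dimensions, 8 experts instead of 128, single-matrix "experts" instead of feed-forward blocks, and the softmax gate.

```python
import math
import random

# Toy sizes chosen for illustration only; Maverick itself has 128 routed experts.
NUM_EXPERTS = 8
DIM = 4

random.seed(0)  # deterministic toy weights


def make_matrix(rows, cols):
    """Random weight matrix stored as a list of rows."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax (the gate function is an assumption)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Each "expert" is a single linear map here; real experts are feed-forward blocks.
routed_experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
shared_expert = make_matrix(DIM, DIM)
router = make_matrix(NUM_EXPERTS, DIM)  # produces one score per routed expert


def moe_layer(token):
    """Run the shared expert plus the single top-scoring routed expert."""
    gates = softmax(matvec(router, token))
    chosen = max(range(NUM_EXPERTS), key=lambda i: gates[i])  # top-1 routing
    shared_out = matvec(shared_expert, token)
    routed_out = matvec(routed_experts[chosen], token)
    # The other NUM_EXPERTS - 1 routed experts are never evaluated for this token.
    output = [s + gates[chosen] * r for s, r in zip(shared_out, routed_out)]
    return output, chosen


output, expert_id = moe_layer([0.5, -1.0, 0.25, 2.0])
print(f"routed to expert {expert_id}: {[round(v, 3) for v in output]}")
```

Because only the shared expert and one routed expert run per token, compute scales with those two experts rather than with the total expert count; Maverick makes the same trade-off at scale with 128 routed experts.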
---
### TypeScript / Node.js
```typescript
import OpenAI from "openai";

// Reads the key from the environment, e.g. `export INFERALL_API_KEY=ifu_...`.
const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

// Top-level await requires an ES module (an .mjs file or "type": "module").
const response = await client.chat.completions.create({
  model: "meta/llama-4-maverick-17b-128e-instruct",
  messages: [{ role: "user", content: "Explain mixture of experts architectures." }],
});

console.log(response.choices[0].message.content);
```
### Streaming
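```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",
    messages=[{"role": "user", "content": "Walk me through the Llama 4 architecture."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Skip chunks that carry no choices (e.g. a trailing usage-only chunk).
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # finish with a newline
```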
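If you also need the complete reply once streaming finishes, one option is to accumulate the deltas while printing them. `collect_stream` below is a hypothetical helper, not part of the OpenAI SDK. To keep the sketch runnable offline, it is exercised on fake chunk objects built with `SimpleNamespace` that only mimic the `choices[0].delta.content` fields a streamed chunk exposes.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def collect_stream(stream: Iterable) -> str:
    """Print each streamed delta as it arrives and return the full reply text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        piece = chunk.choices[0].delta.content or ""
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for an SDK chunk, for offline testing only."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [
    fake_chunk("Llama 4 "),
    fake_chunk("uses MoE."),
    fake_chunk(None),              # deltas can have no content
    SimpleNamespace(choices=[]),   # chunk with no choices
]
reply = collect_stream(fake_stream)
```

With a real stream, call `reply = collect_stream(stream)` inside the `with` block instead of the inline loop.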
### Claude Code / Cline / Cursor
Point any client that speaks the Anthropic API at InferAll with the two environment variables below; InferAll routes requests for the "opus" tier to Llama 4 Maverick:
```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```
---
### Free models on InferAll (selection)
| Model | Size | Notes |
|---|---|---|
| `meta/llama-4-maverick-17b-128e-instruct` | 17B active, ~400B total (MoE, 128 experts) | Latest Llama generation |
| `meta/llama-3.3-70b-instruct` | 70B | Strong general purpose |
| `meta/llama-3.1-70b-instruct` | 70B | Stable workhorse |
| `nvidia/nemotron-3-super-120b-a12b` | 120B total, 12B active | NVIDIA's largest free model |
| `mistralai/mixtral-8x7b-instruct-v0.1` | 46.7B total, ~12.9B active (MoE) | Fast, efficient |
| `google/gemma-3-12b-it` | 12B | Google's compact model |

The [full model list](https://api.inferall.ai/ai/v1/models) is always available from the API; filter it for `inputPerM: 0` to see every free model.
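As a sketch of that filtering step, the Python below keeps only models whose `inputPerM` is 0. It assumes the endpoint returns JSON shaped like `{"data": [{"id": ..., "inputPerM": ...}]}` and needs no authentication header; both are assumptions, so check a real response first. The demo runs on a hand-written sample payload (`example/paid-model` is a hypothetical ID), so no network request is made unless you call `fetch_models()` yourself.

```python
import json
import urllib.request

MODELS_URL = "https://api.inferall.ai/ai/v1/models"


def fetch_models(url: str = MODELS_URL) -> dict:
    """Download and parse the model list (network call; not run in the demo)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def free_model_ids(payload: dict) -> list[str]:
    """Return IDs of models whose input price per million tokens is 0.

    ASSUMPTION: payload looks like {"data": [{"id": ..., "inputPerM": ...}, ...]}.
    """
    return [m["id"] for m in payload.get("data", []) if m.get("inputPerM") == 0]


# Offline demo with a sample in the assumed shape (not real API output).
sample = {
    "data": [
        {"id": "meta/llama-4-maverick-17b-128e-instruct", "inputPerM": 0},
        {"id": "example/paid-model", "inputPerM": 2.5},  # hypothetical paid model
    ]
}
print(free_model_ids(sample))
# Against the live endpoint: print(free_model_ids(fetch_models()))
```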
---
### Get started
Create a key at [inferall.ai/keys](https://inferall.ai/keys); no credit card is required. The first 200 requests are free for evaluation. After that, adding a card unlocks the full free allowance (still $0) and access to paid providers at zero markup.
Meta Llama 4 Maverick — free API, no credit card
How to call Llama 4 Maverick (17B×128E MoE) for free using any OpenAI-compatible SDK. Hosted on NVIDIA NIM, routed through InferAll. No credit card required.
InferAll Team
Tags: Llama 4, Meta AI, free LLM API, NVIDIA NIM, OpenAI API, open source, mixture of experts