Meta's Llama 3.3 70B (`meta/llama-3.3-70b-instruct`) is available free via NVIDIA NIM through InferAll. Llama 3.3 70B is the refined final iteration of the Llama 3.x 70B line — more instruction-following polish and better benchmark performance than 3.1 70B, at the same model size.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys (no card required)
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain the difference between Llama 3.1, 3.3, and 4."}],
    max_tokens=400,
)
print(response.choices[0].message.content)
```
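In real projects you will probably want to keep the key out of source code. The sketch below is one way to do that, assuming the key lives in an `INFERALL_API_KEY` environment variable (the name used in the TypeScript example later in this post). `client_kwargs` is a hypothetical helper name, not part of the OpenAI SDK or InferAll.

```python
import os
from typing import Mapping

# Base URL copied from the example above.
BASE_URL = "https://api.inferall.ai/v1"


def client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the keyword arguments for OpenAI(...) from environment variables.

    Hypothetical helper: it only reads INFERALL_API_KEY and fails early
    with a clear message when the variable is missing or blank.
    """
    api_key = env.get("INFERALL_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set INFERALL_API_KEY before creating the client.")
    return {"base_url": BASE_URL, "api_key": api_key}


# Usage (needs the openai package and a real key):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs())
```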
---
### Llama 3.3 70B vs 3.1 70B vs Llama 4
**Llama 3.1 70B** (`meta/llama-3.1-70b-instruct`) — the original stable 70B model. Widely tested, very reliable baseline.
**Llama 3.3 70B** (`meta/llama-3.3-70b-instruct`) — the refined version. Better instruction following, improved math and reasoning, same 70B architecture. Use this when you need Llama 3.x reliability with better task performance.
**Llama 4 Maverick** (`meta/llama-4-maverick-17b-128e-instruct`) — Meta's newest generation. Mixture of Experts architecture (17B active / 128 expert networks). Higher ceiling for complex tasks but different architecture; some developers stick with 3.3 for stability.
All three are free on NVIDIA NIM through InferAll.
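
A practical way to choose between the three is to send each one the same prompt and read the answers side by side. Here is a minimal sketch of that idea. `compare_models` and `make_ask` are hypothetical helper names, the `max_tokens=200` cap is an arbitrary choice, and `make_ask` assumes an OpenAI-compatible client like the one created at the top of this post.

```python
from typing import Callable

# Model IDs copied from the comparison above.
MODELS = [
    "meta/llama-3.1-70b-instruct",
    "meta/llama-3.3-70b-instruct",
    "meta/llama-4-maverick-17b-128e-instruct",
]


def compare_models(ask: Callable[[str, str], str], prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and return the answers keyed by model ID."""
    return {model: ask(model, prompt) for model in MODELS}


def make_ask(client) -> Callable[[str, str], str]:
    """Wrap an OpenAI-compatible client in a (model, prompt) -> text function."""

    def ask(model: str, prompt: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200,  # arbitrary cap to keep the comparison short
        )
        return response.choices[0].message.content or ""

    return ask


# Usage (needs a configured client and network access):
#   answers = compare_models(make_ask(client), "Explain recursion in two sentences.")
#   for model, text in answers.items():
#       print(f"--- {model}\n{text}\n")
```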
---
### TypeScript / Node.js
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

// Top-level await requires an ES module (for example, "type": "module" in package.json).
const response = await client.chat.completions.create({
  model: "meta/llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Summarize the key differences between REST and GraphQL." }],
});
console.log(response.choices[0].message.content);
```
### Streaming
```python
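# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a guide to async programming in Python."}],
    stream=True,
) as stream:
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        print(chunk.choices[0].delta.content or "", end="", flush=True)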
```
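
If you want the full text as well as the live output, you can accumulate the deltas as they arrive. The sketch below assumes chunks shaped like the ones the OpenAI Python SDK yields (`choices[0].delta.content`). `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks exist only so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas from a chat-completions stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on a role-only or final chunk
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in exposing only the attributes collect_stream reads."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
print(collect_stream(demo))  # prints: Hello, world
```

With a real stream, pass the `stream` object from the example above to `collect_stream` in place of `demo`.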
### Claude Code / Cline / Cursor
```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```
Llama 3.3 70B serves as the "sonnet-tier" model for Anthropic-compatible clients — balanced performance, free.
---
### Free Llama models on InferAll
| Model | Size | Notes |
|---|---|---|
| `meta/llama-3.3-70b-instruct` | 70B | Best Llama 3.x, refined instruction following |
| `meta/llama-4-maverick-17b-128e-instruct` | 17B active, 128 experts | Meta's newest generation (MoE) |
| `meta/llama-3.1-70b-instruct` | 70B | Original 70B baseline |
| `meta/llama-3.1-8b-instruct` | 8B | Fast, lightweight |
| `meta/llama-3.2-90b-vision-instruct` | 90B | Vision + text |
All free on NVIDIA NIM.
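
Because every model in the table is called the same way, you can try them in order of preference and fall back when a call fails. This is a rough sketch, not an InferAll feature: `first_success` and `FALLBACK_ORDER` are hypothetical names, the ordering is only an example, and catching every `Exception` is a simplification you would likely narrow in production.

```python
from typing import Callable, Optional, Sequence

# Example preference order using model IDs from the table above.
FALLBACK_ORDER = [
    "meta/llama-3.3-70b-instruct",
    "meta/llama-3.1-70b-instruct",
    "meta/llama-3.1-8b-instruct",
]


def first_success(
    ask: Callable[[str], str],
    models: Sequence[str] = FALLBACK_ORDER,
) -> tuple[str, str]:
    """Call ask(model) for each model in order; return (model, answer) from the first success."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, ask(model)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    raise RuntimeError("all models failed") from last_error


# Usage (needs a configured OpenAI-compatible client):
#   def ask(model):
#       r = client.chat.completions.create(
#           model=model, messages=[{"role": "user", "content": "Hello"}]
#       )
#       return r.choices[0].message.content or ""
#   model, answer = first_success(ask)
```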
---
### Get started
[inferall.ai/keys](https://inferall.ai/keys) — no credit card required. 200 free requests to evaluate, then add a card to unlock the full free allowance (still $0 within it) and paid providers at zero markup.
Meta Llama 3.3 70B — free API, OpenAI-compatible
How to call Llama 3.3 70B for free using any OpenAI-compatible SDK. Hosted on NVIDIA NIM through InferAll. No credit card required.
InferAll Team