Meta's Llama 3.3 70B (`meta/llama-3.3-70b-instruct`) is available via NVIDIA NIM through InferAll at our open-model rate, the cheapest tier in the gateway. Llama 3.3 70B is the refined final iteration of the Llama 3.x 70B line: more instruction-following polish and better benchmark performance than 3.1 70B, at the same model size.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain the difference between Llama 3.1, 3.3, and 4."}],
    max_tokens=400,
)
print(response.choices[0].message.content)
```

---

### Llama 3.3 70B vs 3.1 70B vs Llama 4

**Llama 3.1 70B** (`meta/llama-3.1-70b-instruct`): the original stable 70B model. Widely tested and a very reliable baseline.

**Llama 3.3 70B** (`meta/llama-3.3-70b-instruct`): the refined version. Better instruction following, improved math and reasoning, same 70B architecture. Use this when you need Llama 3.x reliability with better task performance.

**Llama 4 Maverick** (`meta/llama-4-maverick-17b-128e-instruct`): Meta's newest generation. Mixture-of-Experts architecture (17B active parameters, 128 expert networks). It has a higher ceiling for complex tasks, but the architecture is different, and some developers stick with 3.3 for stability.

All three are open models on NVIDIA NIM through InferAll, at the same open-model rate.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "meta/llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Summarize the key differences between REST and GraphQL." }],
});

// Print the assistant's reply.
console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the Python example above.
with client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a guide to async programming in Python."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; print only real text.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
```

### Claude Code / Cline / Cursor

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

Llama 3.3 70B serves as the "sonnet-tier" model for Anthropic-compatible clients: balanced performance at the open-model rate.

---

### Llama models on InferAll

| Model | Size | Notes |
|---|---|---|
| `meta/llama-3.3-70b-instruct` | 70B | Best Llama 3.x, refined instruction following |
| `meta/llama-4-maverick-17b-128e-instruct` | 17B×128E | Meta's newest generation (MoE) |
| `meta/llama-3.1-70b-instruct` | 70B | Original 70B baseline |
| `meta/llama-3.1-8b-instruct` | 8B | Fast, lightweight |
| `meta/llama-3.2-90b-vision-instruct` | 90B | Vision + text |

All on NVIDIA NIM at our open-model rate.

---

### Get started

Sign up at [inferall.ai/keys](https://inferall.ai/keys) and fund a key with the $5 starter pack. That $5 becomes usage credit you can spend on any model (Llama, GPT, Claude, Gemini, NIM open models) at the provider's published rate with zero markup.
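
Once a key is funded, one way to put the model table above to use is a small lookup that maps a task type to a model ID and builds the request arguments. This is only a sketch: the `pick_model` and `build_request` helpers and the task labels (`"general"`, `"complex"`, `"baseline"`, `"fast"`, `"vision"`) are hypothetical names chosen for illustration, not part of InferAll's API. Only the model IDs are taken from the table.

```python
# Hypothetical helpers: the task labels are illustrative assumptions;
# the model IDs are the ones listed in the table above.
LLAMA_MODELS = {
    "general": "meta/llama-3.3-70b-instruct",
    "complex": "meta/llama-4-maverick-17b-128e-instruct",
    "baseline": "meta/llama-3.1-70b-instruct",
    "fast": "meta/llama-3.1-8b-instruct",
    "vision": "meta/llama-3.2-90b-vision-instruct",
}


def pick_model(task: str = "general") -> str:
    """Return a model ID for a task label, falling back to Llama 3.3 70B."""
    return LLAMA_MODELS.get(task, LLAMA_MODELS["general"])


def build_request(prompt: str, task: str = "general", max_tokens: int = 400) -> dict:
    """Build keyword arguments for a chat completion call."""
    return {
        "model": pick_model(task),
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
```

With the `client` from the Python example, a call would then look like `client.chat.completions.create(**build_request("Summarize this changelog.", task="fast"))`. Keeping the mapping in one place means swapping models later is a one-line change.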

