
Meta Llama 3.3 70B — free API, OpenAI-compatible

How to call Llama 3.3 70B for free using any OpenAI-compatible SDK. Hosted on NVIDIA NIM through InferAll. No credit card required.

InferAll Team

2 min read
Llama 3.3, Meta AI, free LLM API, NVIDIA NIM, OpenAI API, open source

Meta's Llama 3.3 70B (`meta/llama-3.3-70b-instruct`) is available free via NVIDIA NIM through InferAll. It is the refined final iteration of the Llama 3.x 70B line: more instruction-following polish and better benchmark performance than 3.1 70B, at the same model size.

```python
from openai import OpenAI

# Point the standard OpenAI SDK at InferAll's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys, no card required
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain the difference between Llama 3.1, 3.3, and 4."}],
    max_tokens=400,
)
print(response.choices[0].message.content)
```

---

### Llama 3.3 70B vs 3.1 70B vs Llama 4

**Llama 3.1 70B** (`meta/llama-3.1-70b-instruct`): the original stable 70B model. Widely tested and a very reliable baseline.

**Llama 3.3 70B** (`meta/llama-3.3-70b-instruct`): the refined version. Better instruction following, improved math and reasoning, same 70B architecture. Use this when you need Llama 3.x reliability with better task performance.

**Llama 4 Maverick** (`meta/llama-4-maverick-17b-128e-instruct`): Meta's newest generation. Mixture-of-Experts architecture (17B active parameters, 128 experts). It has a higher ceiling for complex tasks, but the architecture is different, and some developers stick with 3.3 for stability.

All three are free on NVIDIA NIM through InferAll.

---

### TypeScript / Node.js

```typescript
import OpenAI from "openai";

// Reads the key from the environment; top-level await requires an ES module.
const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "meta/llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Summarize the key differences between REST and GraphQL." }],
});

console.log(response.choices[0].message.content);
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
with client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a guide to async programming in Python."}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks carry no choices or no text; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```

### Claude Code / Cline / Cursor

```sh
export ANTHROPIC_BASE_URL=https://api.inferall.ai
export ANTHROPIC_API_KEY=ifu_your_key_here
```

Llama 3.3 70B serves as the "sonnet-tier" model for Anthropic-compatible clients: balanced performance at no cost.

---

### Free Llama models on InferAll

| Model | Size | Notes |
|---|---|---|
| `meta/llama-3.3-70b-instruct` | 70B | Best Llama 3.x, refined instruction following |
| `meta/llama-4-maverick-17b-128e-instruct` | 17B×128E | Meta's newest generation (MoE) |
| `meta/llama-3.1-70b-instruct` | 70B | Original 70B baseline |
| `meta/llama-3.1-8b-instruct` | 8B | Fast, lightweight |
| `meta/llama-3.2-90b-vision-instruct` | 90B | Vision + text |

All free on NVIDIA NIM.

---

### Get started

[inferall.ai/keys](https://inferall.ai/keys): no credit card required to get a key. You get 200 free requests to evaluate; after that, adding a card unlocks the full free allowance (still $0 within it) and paid providers at zero markup.
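
Every model in the table above is called through the same endpoint, so switching between them only means changing the `model` string. The sketch below shows one way to use that: try Llama 3.3 70B first and fall back to other Llama models if a call fails. The helper name `complete_with_fallback`, the `CANDIDATE_MODELS` ordering, and the broad exception handling are illustrative assumptions, not part of InferAll or the OpenAI SDK.

```python
from typing import Any, Optional, Sequence, Tuple

# Model IDs copied from the table above, in the order this sketch tries them.
# The ordering is an illustrative choice, not an InferAll recommendation.
CANDIDATE_MODELS = (
    "meta/llama-3.3-70b-instruct",
    "meta/llama-3.1-70b-instruct",
    "meta/llama-3.1-8b-instruct",
)


def complete_with_fallback(
    client: Any,
    messages: Sequence[dict],
    models: Sequence[str] = CANDIDATE_MODELS,
    **kwargs: Any,
) -> Tuple[str, str]:
    """Hypothetical helper: return (model_id, reply_text) from the first model that answers.

    `client` is expected to look like an `openai.OpenAI` instance, i.e. it
    exposes `client.chat.completions.create(...)`.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model, messages=list(messages), **kwargs
            )
            return model, response.choices[0].message.content
        except Exception as exc:  # deliberately broad for a sketch; narrow this in real code
            last_error = exc
    raise RuntimeError("no candidate model returned a response") from last_error
```

With the `client` from the first Python example, `complete_with_fallback(client, [{"role": "user", "content": "Hello"}], max_tokens=100)` would call Llama 3.3 70B first and only move to the next model if that call raises. Returning the model ID with the text makes it easy to log which model actually answered.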