DeepSeek V4 is one of the strongest open-weight model families for reasoning, coding, and agentic work — and through InferAll you can call it **free** via NVIDIA NIM. Both tiers are available: `deepseek-ai/deepseek-v4-pro` for maximum capability and `deepseek-ai/deepseek-v4-flash` for cost-efficient, lower-latency work. No credit card, $0 within the free tier, and it works with the OpenAI SDK you already have.
| Model | Best for |
|---|---|
| `deepseek-ai/deepseek-v4-pro` | Maximum reasoning, coding, agentic tasks |
| `deepseek-ai/deepseek-v4-flash` | Cost-efficient, lower-latency, high-volume |
Both are $0 on the free NVIDIA NIM tier.
---
### Quick start (Python)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys (no card required)
)

# Flash: fast, cost-efficient, great default
flash_response = client.chat.completions.create(
    model="deepseek-ai/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
    max_tokens=512,
)
print(flash_response.choices[0].message.content)

# Pro: for the hardest reasoning / agentic tasks
pro_response = client.chat.completions.create(
    model="deepseek-ai/deepseek-v4-pro",
    messages=[{"role": "user", "content": "Design a rate limiter with a sliding window. Explain the tradeoffs."}],
    max_tokens=1024,
)
print(pro_response.choices[0].message.content)
```
Compared with calling OpenAI directly, only the `base_url`, the API key, and the model name change. Your existing code, streaming, and tool use all work unchanged.
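Streaming uses the same call with `stream=True`. The sketch below is a minimal illustration, assuming the endpoint emits standard OpenAI-style chunks as described above. `stream_reply` is a hypothetical helper name, not part of any SDK, and `client` is the `OpenAI` client built in the quick start.

```python
from typing import Iterator


def stream_reply(client, prompt: str,
                 model: str = "deepseek-ai/deepseek-v4-flash",
                 max_tokens: int = 512) -> Iterator[str]:
    """Yield text fragments as they arrive (hypothetical helper)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,  # request incremental chunks instead of one final response
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage, with the `client` from the quick start:
# for piece in stream_reply(client, "Explain a sliding-window rate limiter."):
#     print(piece, end="", flush=True)
```

Passing the client in as an argument keeps the helper independent of how the key and base URL are configured.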
---
### TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferall.ai/v1",
  apiKey: process.env.INFERALL_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek-ai/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a SQL migration to add a nullable column." }],
  max_tokens: 400,
});

console.log(response.choices[0].message.content);
```
---
### Why call DeepSeek V4 through InferAll
**It's genuinely free.** DeepSeek V4 runs on NVIDIA NIM (NVIDIA Inference Microservices) on NVIDIA's DGX Cloud infrastructure, which InferAll exposes at $0. InferAll has no inference cost to pass through, so usage stays free within the allowance, and you don't need a credit card to start.
**One key for DeepSeek and everything else.** The same `ifu_...` key also calls Qwen 3.5, GLM-5.1, Kimi K2.6, Llama 3.1/3.3 70B, and 110+ more free open models — plus paid GPT-4.1, Claude Opus 4, and Gemini 2.5 when you need a frontier model. Switch by changing one string; no juggling provider keys.
**OpenAI-compatible.** Standard `chat.completion` responses, streaming, tool use, and JSON mode — all working with whatever OpenAI client you already have. Moving from `gpt-4o-mini` to `deepseek-ai/deepseek-v4-flash` is a one-line model-string change.
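As an illustration of JSON mode, the sketch below passes the OpenAI SDK's `response_format={"type": "json_object"}` parameter and parses the reply. It assumes the endpoint honors that parameter for DeepSeek V4, as this section says. `ask_json` is a hypothetical helper name, and the system prompt wording is only an example.

```python
import json


def ask_json(client, prompt: str,
             model: str = "deepseek-ai/deepseek-v4-flash",
             max_tokens: int = 400) -> dict:
    """Request a JSON object and parse it (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            # JSON mode expects the prompt itself to mention JSON.
            {"role": "system", "content": "Reply with a single JSON object and nothing else."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=max_tokens,
        response_format={"type": "json_object"},
    )
    content = response.choices[0].message.content or ""
    # Raises json.JSONDecodeError if the reply is empty or was cut off by max_tokens.
    return json.loads(content)


# Usage, with the `client` from the quick start:
# plan = ask_json(client, "Describe a migration that adds a nullable column, as JSON.")
```

If a reply is truncated by `max_tokens`, parsing fails, so it is worth catching `json.JSONDecodeError` and retrying with a larger limit.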
---
### Pro vs Flash — which to use
Start with **Flash**. It handles the majority of coding, refactoring, summarization, and structured-output tasks at lower latency. Step up to **Pro** when a hard reasoning or multi-step agentic problem produces Flash answers that are incomplete or wrong. Since both are $0 on the free tier, the only real cost of using Pro is latency.
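One way to encode the Flash-first rule is a small fallback wrapper. This is a sketch under stated assumptions, not a recommended policy: `ask_with_escalation` is a hypothetical helper, and `is_good_enough` is a check you supply yourself (for example "the answer parses" or "the tests pass").

```python
from typing import Callable, Tuple

FLASH = "deepseek-ai/deepseek-v4-flash"
PRO = "deepseek-ai/deepseek-v4-pro"


def ask(client, prompt: str, model: str, max_tokens: int) -> str:
    """Send one prompt to one model and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content or ""


def ask_with_escalation(client, prompt: str,
                        is_good_enough: Callable[[str], bool]) -> Tuple[str, str]:
    """Try Flash first; retry on Pro only if the caller's check rejects the answer."""
    answer = ask(client, prompt, FLASH, max_tokens=512)
    if is_good_enough(answer):
        return FLASH, answer
    # Flash fell short by the caller's own criterion, so pay the extra latency.
    return PRO, ask(client, prompt, PRO, max_tokens=1024)


# Usage, with the `client` from the quick start:
# model, answer = ask_with_escalation(client, "Design a rate limiter.", lambda a: len(a) > 200)
```

The function returns the model name alongside the answer so you can log how often escalation actually happens.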
---
### Compare it yourself
The best way to pick a model is to watch several answer the same prompt. The full free roster is one call away — `curl https://api.inferall.ai/ai/v1/models` — so you never hardcode a list that goes stale.
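A side-by-side comparison can be sketched as a loop over model IDs. This is a minimal illustration: `compare_models` is a hypothetical helper, the model list is whatever you pass in, and the timing is wall-clock for a single request, so treat it as a rough signal and not a benchmark.

```python
import time
from typing import Dict, List


def compare_models(client, models: List[str], prompt: str,
                   max_tokens: int = 300) -> List[Dict[str, object]]:
    """Send the same prompt to each model and record the answer and elapsed time."""
    results = []
    for model in models:
        started = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        elapsed = time.perf_counter() - started
        results.append({
            "model": model,
            "seconds": elapsed,
            "answer": response.choices[0].message.content or "",
        })
    return results


# Usage, with the `client` from the quick start:
# rows = compare_models(
#     client,
#     ["deepseek-ai/deepseek-v4-flash", "deepseek-ai/deepseek-v4-pro"],
#     "Explain the tradeoffs of a sliding-window rate limiter.",
# )
# for row in rows:
#     print(f"{row['model']} ({row['seconds']:.1f}s)\n{row['answer']}\n")
```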
Free trial: 200 requests, no credit card. Get your key at [inferall.ai/keys](https://inferall.ai/keys).
DeepSeek V4 — free API (Pro & Flash), OpenAI-compatible, no credit card
How to call DeepSeek V4 Pro and V4 Flash for free through InferAll's OpenAI-compatible endpoint. Hosted on NVIDIA NIM, $0 within the free tier, works with the OpenAI SDK you already have.
InferAll Team
Tags: DeepSeek, DeepSeek V4, free LLM API, NVIDIA NIM, OpenAI API, open source, developer tools