If you're building with LangChain or LlamaIndex, you probably have an OpenAI API key hardcoded somewhere and an eye on your usage bill. You can swap OpenAI for free open-source models (Llama 3.3 70B, Nemotron 120B, Gemma 4, and more) by setting two environment variables. No code changes beyond the model name. No credit card.
---
### LangChain
LangChain's `ChatOpenAI` accepts a custom `base_url`. Point it at InferAll:
```python
from langchain_openai import ChatOpenAI
# Before: ChatOpenAI(model="gpt-4o", openai_api_key="sk-...")
# After: free open-source models, same code
llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",  # free, no card required
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)
response = llm.invoke("What are the SOLID principles in software design?")
print(response.content)
```
Or use environment variables so your code stays unchanged:
```bash
export OPENAI_BASE_URL=https://api.inferall.ai/v1
export OPENAI_API_KEY=ifu_your_key_here
```
```python
from langchain_openai import ChatOpenAI
# Only the model name changes; the base URL and key come from the environment
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct")
```
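If one of the two variables is missing, requests typically fall through to OpenAI's default endpoint and fail there with an authentication error, which can be confusing to debug. A quick pre-flight check helps. The sketch below uses only the standard library; `check_openai_env` is a hypothetical helper name, and both the `/v1` suffix check and the `ifu_` prefix check are assumptions based on the placeholder values shown above.

```python
import os

def check_openai_env(env=None):
    """Return a list of problems found in the OpenAI-compatible env settings."""
    env = os.environ if env is None else env
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    api_key = env.get("OPENAI_API_KEY", "")
    if not base_url:
        problems.append("OPENAI_BASE_URL is not set")
    elif not base_url.rstrip("/").endswith("/v1"):
        # Assumption: the endpoint path ends in /v1, as in the URL above.
        problems.append("OPENAI_BASE_URL should end with /v1")
    if not api_key:
        problems.append("OPENAI_API_KEY is not set")
    elif not api_key.startswith("ifu_"):
        # Assumption: InferAll keys share the ifu_ prefix of the placeholder.
        problems.append("OPENAI_API_KEY does not look like an InferAll key")
    return problems
```

Calling `check_openai_env()` with no arguments inspects the real environment; an empty list means both variables look plausible.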
### LangChain with chains and agents
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)
# Standard LangChain chains work unchanged
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful code assistant."),
    ("user", "{question}"),
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "How do I implement a binary search tree in Python?"})
print(result)
```
### LangChain with streaming
```python
# Reuses the `llm` object defined above
for chunk in llm.stream("Explain gradient descent step by step."):
    print(chunk.content, end="", flush=True)
```
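When you need the full text as well as the live output (for logging or caching, say), you can accumulate chunks while printing them. The following is a minimal sketch: `stream_and_collect` is a hypothetical helper, and it assumes each chunk exposes a string (or empty) `.content` attribute, as in the streaming loop above. Stand-in chunks are used so it runs offline.

```python
import sys
from types import SimpleNamespace

def stream_and_collect(chunks, out=sys.stdout):
    """Write each chunk's text to `out` as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        text = chunk.content or ""  # some chunks may carry no text
        out.write(text)
        out.flush()
        parts.append(text)
    return "".join(parts)

# Stand-in chunks for an offline run; with a real model,
# pass llm.stream("...") instead of fake_chunks.
fake_chunks = [SimpleNamespace(content=c) for c in ("Gradient ", "descent", "")]
full_text = stream_and_collect(fake_chunks)
```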
---
### LlamaIndex
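LlamaIndex can also talk to OpenAI-compatible endpoints. Its `OpenAI` class looks up model names against OpenAI's own catalog, so for non-OpenAI model IDs use `OpenAILike` (from the `llama-index-llms-openai-like` package) instead:
```python
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings
# Set InferAll as the LLM backend
Settings.llm = OpenAILike(
    model="meta/llama-3.3-70b-instruct",
    api_base="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
    is_chat_model=True,  # use the chat completions endpoint
)
# Now use LlamaIndex normally; LLM calls route through free models.
# Note: VectorStoreIndex also needs an embedding model (Settings.embed_model),
# which is configured separately from the LLM.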
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
```
---
### Which free model for LangChain development?
| Model | Use case |
|---|---|
| `meta/llama-3.3-70b-instruct` | General purpose, instruction following |
| `nvidia/nemotron-3-super-120b-a12b` | Complex reasoning, longer context |
| `qwen/qwen3-coder-480b-a35b-instruct` | Code generation and review |
| `mistralai/codestral-22b-instruct-v0.1` | Fast code tasks |
| `meta/llama-3.1-8b-instruct` | Speed-critical tasks |
All free via NVIDIA NIM.
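
If you switch models per task, the table above can live in code as a small lookup. This is only a sketch: the task labels and the `pick_model` helper are hypothetical names chosen for illustration, while the model IDs are copied from the table.

```python
# Task labels are illustrative; model IDs come from the table above.
FREE_MODELS = {
    "general": "meta/llama-3.3-70b-instruct",
    "reasoning": "nvidia/nemotron-3-super-120b-a12b",
    "code": "qwen/qwen3-coder-480b-a35b-instruct",
    "fast-code": "mistralai/codestral-22b-instruct-v0.1",
    "fast": "meta/llama-3.1-8b-instruct",
}

def pick_model(task, default="general"):
    """Return the model ID for a task label, falling back to the general model."""
    return FREE_MODELS.get(task, FREE_MODELS[default])
```

Pass the result as the `model` argument when constructing `ChatOpenAI`, for example `model=pick_model("code")`.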
### Switch to paid models when you're ready to ship
```python
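from langchain_openai import ChatOpenAI

# Shared settings: same base_url and key for free and paid models
inferall = dict(base_url="https://api.inferall.ai/v1", api_key="ifu_your_key_here")
# Development: free model
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct", **inferall)
# Production: swap to a paid model at the provider's published rate (zero markup)
# Just change the model string
llm = ChatOpenAI(model="anthropic/claude-sonnet-4-6", **inferall)  # or gpt-4o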
```
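One way to keep that swap out of the code entirely is to read the model name from the environment. The sketch below assumes a hypothetical `LLM_MODEL` variable (not something LangChain or InferAll define) and falls back to the free model when it is unset or blank.

```python
import os

DEFAULT_MODEL = "meta/llama-3.3-70b-instruct"  # free development model

def resolve_model(env=None):
    """Return the model ID from LLM_MODEL (a hypothetical variable), or the free default."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL", "").strip() or DEFAULT_MODEL
```

In production you would set `LLM_MODEL=anthropic/claude-sonnet-4-6` and construct the client with `ChatOpenAI(model=resolve_model())`, leaving the rest of the code untouched.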
---
### Get started
[inferall.ai/keys](https://inferall.ai/keys) — no credit card required. 200 free requests to evaluate, then add a card to unlock the full free allowance (still $0) and paid providers at published rates with zero markup. See the [LLM API aggregator](/solutions/llm-api-aggregator) overview for full details on supported providers and models.
Use LangChain with free open-source LLMs — no credit card
How to use LangChain and LlamaIndex with free open-source LLMs via InferAll's OpenAI-compatible API. Two environment variables, no code changes, no credit card.
InferAll Team