
Use LangChain with free open-source LLMs — no credit card

How to use LangChain and LlamaIndex with free open-source LLMs via InferAll's OpenAI-compatible API. Two environment variables, no code changes, no credit card.

InferAll Team

3 min read
LangChain, LlamaIndex, free LLM API, OpenAI API, open source, NVIDIA NIM, AI gateway, developer tools

If you're building with LangChain or LlamaIndex, you probably have an OpenAI API key hardcoded somewhere and an eye on your usage bill. You can swap OpenAI for free open-source models (Llama 3.3 70B, Nemotron 120B, Gemma 4, and more) by setting two environment variables. No code changes. No credit card.

---

### LangChain

LangChain's `ChatOpenAI` accepts a custom `base_url`. Point it at InferAll:

```python
from langchain_openai import ChatOpenAI

# Before: ChatOpenAI(model="gpt-4o", openai_api_key="sk-...")
# After: free open-source models, same code
llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",  # free, no card required
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = llm.invoke("What are the SOLID principles in software design?")
print(response.content)
```

Or use environment variables so your code stays unchanged:

```bash
export OPENAI_BASE_URL=https://api.inferall.ai/v1
export OPENAI_API_KEY=ifu_your_key_here
```

```python
from langchain_openai import ChatOpenAI

# No changes to your existing code needed
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct")
```

### LangChain with chains and agents

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)

# Standard LangChain chains work unchanged
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful code assistant."),
    ("user", "{question}"),
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "How do I implement a binary search tree in Python?"})
print(result)
```

### LangChain with streaming

```python
# Reuses the `llm` object created above
for chunk in llm.stream("Explain gradient descent step by step."):
    print(chunk.content, end="", flush=True)
```

---

### LlamaIndex

LlamaIndex also uses the OpenAI client under the hood. Its stock `OpenAI` class checks model names against OpenAI's own list, so for non-OpenAI model IDs use `OpenAILike` from the `llama-index-llms-openai-like` package:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

# Set InferAll as the LLM backend
Settings.llm = OpenAILike(
    model="meta/llama-3.3-70b-instruct",
    api_base="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
    is_chat_model=True,  # send requests to the chat completions endpoint
)

# Now use LlamaIndex normally; LLM calls route through free models.
# Embeddings are configured separately via Settings.embed_model.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
```

---

### Which free model for LangChain development?

| Model | Use case |
|---|---|
| `meta/llama-3.3-70b-instruct` | General purpose, instruction following |
| `nvidia/nemotron-3-super-120b-a12b` | Complex reasoning, longer context |
| `qwen/qwen3-coder-480b-a35b-instruct` | Code generation and review |
| `mistralai/codestral-22b-instruct-v0.1` | Fast code tasks |
| `meta/llama-3.1-8b-instruct` | Speed-critical tasks |

All free via NVIDIA NIM.

### Switch to paid models when you're ready to ship

```python
from langchain_openai import ChatOpenAI

# Same endpoint and key for every model
common = dict(base_url="https://api.inferall.ai/v1", api_key="ifu_your_key_here")

# Development: free model
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct", **common)

# Production: swap to a paid model at the provider's published rate (zero markup)
# Just change the model string; same base_url, same key
llm = ChatOpenAI(model="anthropic/claude-sonnet-4-6", **common)  # or gpt-4o
```

---

### Get started

Get a key at [inferall.ai/keys](https://inferall.ai/keys); no credit card is required. You get 200 free requests to evaluate, then add a card to unlock the full free allowance (still $0) and paid providers at published rates with zero markup.

See the [LLM API aggregator](/solutions/llm-api-aggregator) overview for full details on supported providers and models.
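
If you would rather not edit the model string by hand when moving from development to production, one option is to pick it from an environment variable. The sketch below uses only the standard library and returns keyword arguments for `ChatOpenAI`. The `APP_ENV` variable name, the `MODELS` mapping, and the `llm_settings` helper are hypothetical names invented for this example; the endpoint and model IDs are the ones used earlier in this post.

```python
import os

# Endpoint and model IDs taken from the examples in this post.
INFERALL_BASE_URL = "https://api.inferall.ai/v1"
MODELS = {
    "development": "meta/llama-3.3-70b-instruct",  # free model
    "production": "anthropic/claude-sonnet-4-6",   # paid model
}


def llm_settings(env=None):
    """Return keyword arguments for ChatOpenAI for the given stage.

    APP_ENV is a hypothetical variable name used only in this sketch.
    """
    stage = env or os.environ.get("APP_ENV", "development")
    if stage not in MODELS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(MODELS)}")
    return {
        "model": MODELS[stage],
        # Fall back to the InferAll endpoint and a placeholder key if unset.
        "base_url": os.environ.get("OPENAI_BASE_URL", INFERALL_BASE_URL),
        "api_key": os.environ.get("OPENAI_API_KEY", "ifu_your_key_here"),
    }


if __name__ == "__main__":
    print(llm_settings("development")["model"])
```

With this in place, `llm = ChatOpenAI(**llm_settings())` would build the client for whichever stage `APP_ENV` names, so the rest of the application code never mentions a model ID.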