If you're building with LangChain or LlamaIndex, you probably have an OpenAI API key hardcoded somewhere and an eye on your usage bill. You can route the same code to open-source models (Llama 3.3 70B, Nemotron 120B, Gemma 4, and more) for a fraction of the per-token cost by setting two environment variables and changing the model name. One `ifu_` key reaches both open NVIDIA NIM endpoints and every major premium provider at the provider's published rate (zero markup).

---

### LangChain

LangChain's `ChatOpenAI` accepts a custom `base_url`. Point it at InferAll:

```python
from langchain_openai import ChatOpenAI

# Before: ChatOpenAI(model="gpt-4o", openai_api_key="sk-...")
# After: open-source model at NIM rate, same code
llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",  # NVIDIA NIM open model
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",  # get one at inferall.ai/keys
)

response = llm.invoke("What are the SOLID principles in software design?")
print(response.content)
```

Or use environment variables so the client setup in your code stays unchanged:

```bash
export OPENAI_BASE_URL=https://api.inferall.ai/v1
export OPENAI_API_KEY=ifu_your_key_here
```

```python
from langchain_openai import ChatOpenAI

# No base_url or api_key in code; only the model name differs
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct")
```

### LangChain with chains and agents

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="meta/llama-3.3-70b-instruct",
    base_url="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)

# Standard LangChain chains work unchanged
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful code assistant."),
    ("user", "{question}"),
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "How do I implement a binary search tree in Python?"})
print(result)
```

### LangChain with streaming

```python
# Reuses the llm defined in the previous example
for chunk in llm.stream("Explain gradient descent step by step."):
    print(chunk.content, end="", flush=True)
```

---

### LlamaIndex

LlamaIndex also uses the OpenAI client under the hood:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Set InferAll as the LLM backend.
# Note: if your LlamaIndex version rejects a non-OpenAI model name here,
# OpenAILike (package llama-index-llms-openai-like) takes the same arguments.
Settings.llm = OpenAI(
    model="meta/llama-3.3-70b-instruct",
    api_base="https://api.inferall.ai/v1",
    api_key="ifu_your_key_here",
)

# Now use LlamaIndex normally: LLM calls route through NIM open models at NIM rate
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
```

---

### Which open model for LangChain development?

| Model | Use case |
|---|---|
| `meta/llama-3.3-70b-instruct` | General purpose, instruction following |
| `nvidia/nemotron-3-super-120b-a12b` | Complex reasoning, longer context |
| `qwen/qwen3-coder-480b-a35b-instruct` | Code generation and review |
| `mistralai/codestral-22b-instruct-v0.1` | Fast code tasks |
| `meta/llama-3.1-8b-instruct` | Speed-critical tasks |

All of these run on NVIDIA NIM at our open-model rate, the cheapest tier in the gateway.

### Switch to premium models when production demands it

```python
from langchain_openai import ChatOpenAI

# Shared connection settings: same base_url and key for every model
inferall = dict(base_url="https://api.inferall.ai/v1", api_key="ifu_your_key_here")

# Development / high-volume inner loop: open model on NIM
llm = ChatOpenAI(model="meta/llama-3.3-70b-instruct", **inferall)

# Production hard task: swap to Claude Sonnet at Anthropic's published rate (zero markup)
# Just change the model string; the base_url and key stay the same
llm = ChatOpenAI(model="anthropic/claude-sonnet-4-6", **inferall)  # or gpt-4o
```

---

### Get started

Sign up at [inferall.ai/keys](https://inferall.ai/keys) and fund a key with the $5 starter pack. That $5 becomes usage credit you can spend on any model (open or premium) at the provider's published rate with zero markup.
See the [LLM API aggregator](/solutions/llm-api-aggregator) overview for full details on supported providers and models.
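
If you switch models per task, the model table earlier in this post can be kept in one place as a small lookup. The sketch below is only an illustration. The task labels, `MODEL_FOR_TASK`, and `client_kwargs` are hypothetical names, not part of LangChain or InferAll. The model IDs, base URL, and environment variable names are the ones used in this post.

```python
import os

# Task labels are hypothetical; model IDs come from the table in this post.
MODEL_FOR_TASK = {
    "general": "meta/llama-3.3-70b-instruct",
    "reasoning": "nvidia/nemotron-3-super-120b-a12b",
    "code": "qwen/qwen3-coder-480b-a35b-instruct",
    "fast-code": "mistralai/codestral-22b-instruct-v0.1",
    "speed": "meta/llama-3.1-8b-instruct",
}

DEFAULT_BASE_URL = "https://api.inferall.ai/v1"


def client_kwargs(task, env=None):
    """Build keyword arguments for an OpenAI-compatible client.

    Unknown task labels fall back to the general-purpose model.
    Connection settings are read from the environment when present.
    """
    env = os.environ if env is None else env
    return {
        "model": MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["general"]),
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("OPENAI_API_KEY", "ifu_your_key_here"),
    }
```

With LangChain installed, the result can be passed straight to the constructor, for example `ChatOpenAI(**client_kwargs("code"))`. Keeping the mapping in one dictionary means a model swap touches a single line.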
