The core move is to route by step class, not to pick one model for the whole agent. An open-source model and a frontier model have different sweet spots, and an agent conveniently separates into steps that map onto each.
The high-volume steps — classify an input, summarize a tool result, decide whether to keep looping, format the final answer, route to the right sub-task — run well on a free OSS model. They're the calls your agent makes the most of, so they're where the free allowance does the most work. Pin a model like meta/llama-3.1-8b-instruct for the fast, frequent ones and meta/llama-3.1-70b-instruct when a step needs more capability but still isn't the hard part.
The hard steps — decompose an ambiguous goal into an ordered plan, reason over a large heterogeneous context, recover from a failed multi-step sequence — are where you want a frontier model. Set the model to claude-sonnet-4-6 or gpt-4o for just that call, then drop back to the free model for the rest of the loop. Because the hard steps are the rare ones, you pay frontier rates on a small slice of traffic while the free tier absorbs the bulk.
For the heavier planning calls you can also reach for claude-opus-4-8 or a long-context Gemini model like gemini-2.5-pro (or gemini-2.5-flash when you want a cheaper frontier option). All of them are the same one-line model change through the same key and the same base URL.