The cheapest LLM call is the one you do not make. GitHub's 19-62% token cut, decoded
GitHub published an instrumented analysis of their agentic CI workflows and reported 19-62% token-cost reductions. The savings are the headline. The technique (pre-agentic data fetching and tool-registry hygiene) is the story most teams will miss.
In this post (3 sections)
GitHub published an instrumented analysis of their own agentic CI workflows last week and reported 19-62% reductions in API token cost across half a dozen production agents. The numbers are good. The technique is better, and most teams running agents in production are not yet doing the thing that produced those savings.
The cheapest LLM call is the one you do not make
That is GitHub's framing and it deserves to be on every wall of every team shipping agents. Their core finding: most "agent turns" in their CI workflows were doing deterministic work, fetching issue metadata, reading file contents, listing branches. Work that does not need a model.
The fix is pre-agentic data fetching: run the deterministic steps first with plain CLI or scripts, hand the assembled context to the agent in one shot, and let the model reason only about what genuinely needs reasoning. Their Auto-Triage workflow cut 62% of token spend doing exactly this, and it runs about 6.8 times a day, so the savings compound to millions of tokens per observation period.
The mindset shift is the hard part. "Use an agent" became the default reach for any task with a whiff of judgement in it, when "run a script, then call the agent once" is usually cheaper and more reliable. Every deterministic step you let the model perform is a step you pay for in tokens, latency, and a chance for the model to do it wrong.
Your tool registry is silently expensive
The number from the report most teams will not have measured: each unused MCP tool registration adds roughly 8-12 KB of schema overhead to every API call. GitHub had about 40 tools registered in one workflow. The cost of "we might use it later" tools shows up as a meaningful bill increase before the agent thinks a single thought, and it also lowers selection accuracy, as I cover in fix the registry, not the agent.
I have been writing about this for two years from a different angle: one tool, one purpose, descriptions are prompts, schemas matter. The cost data finally makes the case in numbers. If you have not audited your tool registry recently, do it this week.
How to apply this in IT services teams
- Build a tiny audit script that logs input tokens, output tokens, and cache-hit rate per agent run. You cannot optimise what you cannot see, and most teams cannot see this today.
- Audit the tool registry. Anything not called in the last 30 days, remove. The schema-overhead cost is real even when the tool is dormant.
- For every agent loop, ask which of these turns are deterministic. Move those out to a pre-agentic step. The model should reason, not fetch.
- Add a relevance gate before invoking the model at all. GitHub's Security Guard skips the LLM entirely for PRs that do not touch security-sensitive files. That is one cheap conditional saving every wasted run.
- Track cache-hit rate per route as a first-class metric. GitHub's Contribution Check workflows hit 82-83% cache reads on input tokens. That is the target shape for stable system prompts.
Two of these levers have their own deep dives: the caching side is in prompt caching is not optional anymore, and the way you watch all of it in production is the agent observability stack we ship to every client. Model tiering, covered in Haiku 4.5 made our router 5x cheaper, stacks on top of these for further savings. For the one-glance tier map before you touch routing code, see stop paying frontier prices for classification.
CI workflow cost is not the headline story here. The bigger lesson is that the teams shipping production agents in 2026 are the ones who treat every LLM call as a budget to defend, not a free lookup. If you want a cost audit of your agent stack, that is one of the most common engagements I run in consulting.
Agentic AI patterns, delivered Thursdays
What I am shipping, watching, and pruning out of client stacks each week. One email. No fluff.