What is pre-agentic data fetching?

Running the deterministic parts of a workflow (fetching metadata, reading files, listing branches) with plain scripts or CLI first, then handing the assembled context to the agent in one shot. The model only reasons about what genuinely needs reasoning. That change cut token spend by 62% in GitHub's Auto-Triage workflow.

How do unused tools increase cost?

Every registered tool ships its schema on every API call, roughly 8-12 KB each in GitHub's measurement. With dozens of tools loaded, that overhead is on the bill before the agent does any work, and the extra candidates also reduce tool-selection accuracy.

What is a relevance gate?

A cheap conditional that decides whether to invoke the model at all. GitHub's Security Guard skips the LLM entirely for pull requests that do not touch security-sensitive files, so every irrelevant run costs nothing instead of a full agent invocation.
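A minimal sketch of such a gate, under stated assumptions: the glob patterns, `touches_sensitive`, and `review_pull_request` are hypothetical names chosen for illustration, and this is not GitHub's actual Security Guard logic. The model call is passed in as a function so the gate itself stays a plain conditional.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; what counts as "security-sensitive" is repo-specific.
SENSITIVE_PATTERNS = ["auth/*", "*.pem", ".github/workflows/*", "*secrets*"]


def touches_sensitive(changed_files, patterns=SENSITIVE_PATTERNS):
    """True if any changed path matches any sensitive glob pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


def review_pull_request(changed_files, call_model):
    """Invoke the model only when the gate opens; otherwise spend nothing."""
    if not touches_sensitive(changed_files):
        return None  # gate closed: no model call, no tokens
    return call_model(changed_files)
```

Note that `fnmatchcase` lets `*` match across `/`, so `auth/*` also covers nested paths such as `auth/oauth/token.py`. Tighten the patterns if that is too broad for your repository.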

Why track cache-hit rate separately from usage?

Usage tells you how many tokens you sent; cache-hit rate tells you how many you actually paid full price for. Stable system prefixes can hit 80%+ cache reads, so without that metric you cannot tell whether caching is working or silently broken.

Does this only apply to CI workflows?

No. CI was GitHub's example, but the principles (do deterministic work without the model, trim the registry, gate invocations, cache stable prefixes) apply to any production agent. Treat every model call as a budget to defend rather than a free lookup.

The Cheapest LLM Call Is the One You Don't Make

GitHub published an instrumented analysis of their own agentic CI workflows last week and reported 19-62% reductions in API token cost across half a dozen production agents. The numbers are good. The technique is better, and most teams running agents in production are not yet doing the thing that produced those savings.

The cheapest LLM call is the one you do not make

That is GitHub's framing and it deserves to be on every wall of every team shipping agents. Their core finding: most "agent turns" in their CI workflows were doing deterministic work such as fetching issue metadata, reading file contents, and listing branches. That work does not need a model.

The fix is pre-agentic data fetching: run the deterministic steps first with plain CLI or scripts, hand the assembled context to the agent in one shot, and let the model reason only about what genuinely needs reasoning. Their Auto-Triage workflow cut 62% of token spend doing exactly this, and it runs about 6.8 times a day, so the savings compound to millions of tokens per observation period.
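Here is a rough sketch of that shape, not GitHub's implementation. It assumes the `gh` CLI and `git` are installed and authenticated; `run_cli`, `gather_context`, and `build_prompt` are hypothetical helper names, and the prompt wording is only a placeholder.

```python
import json
import subprocess
from typing import Callable, Sequence


def run_cli(args: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return result.stdout


def gather_context(issue_number: int,
                   run: Callable[[Sequence[str]], str] = run_cli) -> dict:
    """Deterministic steps only: no model is involved here."""
    issue_json = run(["gh", "issue", "view", str(issue_number),
                      "--json", "title,body,labels"])
    branches = run(["git", "branch", "--list", "--format=%(refname:short)"])
    return {
        "issue": json.loads(issue_json),
        "branches": [b for b in branches.splitlines() if b],
    }


def build_prompt(context: dict) -> str:
    """Assemble everything into one message so the agent needs one call."""
    return (
        "Triage the issue below. All metadata is already fetched; "
        "do not call tools to fetch it again.\n\n"
        + json.dumps(context, indent=2, sort_keys=True)
    )
```

The `run` parameter exists so the fetch step can be exercised with a stub in tests, without touching the network or a real repository. The single string returned by `build_prompt` is what you would send to whichever model client you use.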

The mindset shift is the hard part. "Use an agent" became the default reach for any task with a whiff of judgement in it, when "run a script, then call the agent once" is usually cheaper and more reliable. Every deterministic step you let the model perform is a step you pay for in tokens, latency, and a chance for the model to do it wrong.

Your tool registry is silently expensive

The number from the report most teams will not have measured: each unused MCP tool registration adds roughly 8-12 KB of schema overhead to every API call. GitHub had about 40 tools registered in one workflow. The cost of "we might use it later" tools shows up as a meaningful bill increase before the agent thinks a single thought, and it also lowers selection accuracy, as I cover in "fix the registry, not the agent".
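To put a number on your own registry, something like the sketch below is enough. It assumes each tool is a plain dict with a `name` key, and that `last_called_days` is a mapping you build from your own logs; both are illustrative assumptions, as are the function names. For scale: if all 40 tools were near the quoted 8-12 KB, that would be roughly 320-480 KB per call, but that product is my arithmetic, not a figure from the report.

```python
import json


def schema_bytes(tool: dict) -> int:
    """Approximate size of one tool definition when serialised into a request."""
    return len(json.dumps(tool, separators=(",", ":")).encode("utf-8"))


def registry_overhead(tools: list) -> int:
    """Total bytes of tool definitions sent along with every call."""
    return sum(schema_bytes(tool) for tool in tools)


def trim_registry(tools: list, last_called_days: dict, max_idle_days: int = 30) -> list:
    """Keep tools called within max_idle_days; tools with no record are dropped."""
    return [
        tool for tool in tools
        if last_called_days.get(tool["name"], float("inf")) <= max_idle_days
    ]
```

Comparing `registry_overhead(tools)` before and after `trim_registry` gives a byte count you can put next to the bill. Bytes are only a proxy for tokens, so treat the result as a relative measure rather than an exact cost.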

I have been writing about this for two years from a different angle: one tool, one purpose, descriptions are prompts, schemas matter. The cost data finally makes the case in numbers. If you have not audited your tool registry recently, do it this week.

Four levers GitHub used, ranked by how often teams miss them

Lever	What it does	Typical miss
Pre-agentic fetch	Deterministic work runs as scripts	Letting the model fetch and read
Registry trim	Removes per-call schema overhead	Keeping "just in case" tools loaded
Relevance gate	Skips the model on irrelevant runs	Invoking the model on every event
Cache-hit tracking	Proves the prefix is being reused	Measuring usage but not cache reads

How to apply this in IT services teams

Build a tiny audit script that logs input tokens, output tokens, and cache-hit rate per agent run. You cannot optimise what you cannot see, and most teams cannot see this today.
Audit the tool registry. Anything not called in the last 30 days, remove. The schema-overhead cost is real even when the tool is dormant.
For every agent loop, ask which of these turns are deterministic. Move those out to a pre-agentic step. The model should reason, not fetch.
Add a relevance gate before invoking the model at all. GitHub's Security Guard skips the LLM entirely for PRs that do not touch security-sensitive files. That is one cheap conditional saving every wasted run.
Track cache-hit rate per route as a first-class metric. GitHub's Contribution Check workflows hit 82-83% cache reads on input tokens. That is the target shape for stable system prompts.
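The audit script from the first step and the per-route metric from the last can start as small as the sketch below. The field names on `RunUsage` are hypothetical: providers report cached and uncached input tokens differently, so map your provider's usage fields onto them yourself. Cache-hit rate here is defined as cached input tokens divided by all input tokens, which is my reading of "cache reads on input tokens".

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunUsage:
    route: str
    uncached_input_tokens: int  # input tokens billed at full price
    cached_input_tokens: int    # input tokens served from the prompt cache
    output_tokens: int

    @property
    def input_tokens(self) -> int:
        return self.uncached_input_tokens + self.cached_input_tokens

    @property
    def cache_hit_rate(self) -> float:
        total = self.input_tokens
        return self.cached_input_tokens / total if total else 0.0


def log_line(usage: RunUsage) -> str:
    """One JSON line per agent run, ready to append to a log file."""
    record = asdict(usage)
    record["input_tokens"] = usage.input_tokens
    record["cache_hit_rate"] = round(usage.cache_hit_rate, 3)
    return json.dumps(record, sort_keys=True)


def hit_rate_by_route(runs) -> dict:
    """Token-weighted cache-hit rate per route across many runs."""
    totals = {}
    for usage in runs:
        cached, total = totals.get(usage.route, (0, 0))
        totals[usage.route] = (cached + usage.cached_input_tokens,
                               total + usage.input_tokens)
    return {route: (c / t if t else 0.0) for route, (c, t) in totals.items()}
```

Weighting by tokens rather than averaging per-run rates keeps one tiny run from skewing the route-level number. A route whose rate drops sharply after a prompt change is the "silently broken" case worth alerting on.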

Two of these levers have their own deep dives: the caching side is in "prompt caching is not optional anymore", and the way you watch all of it in production is the agent observability stack we ship to every client. Model tiering, covered in "Haiku 4.5 made our router 5x cheaper", stacks on top of these for further savings. For the one-glance tier map before you touch routing code, see "stop paying frontier prices for classification".

CI workflow cost is not the headline story here. The bigger lesson is that the teams shipping production agents in 2026 are the ones who treat every LLM call as a budget to defend, not a free lookup. If you want a cost audit of your agent stack, that is one of the most common engagements I run in consulting.

