What is the difference between RAG and CAG?

RAG retrieves relevant content at request time from an external store. CAG keeps frequently-needed content in the prompt cache so it is reused across calls without a retrieval step. RAG trades a round-trip for unbounded scale; CAG trades context budget for speed and simplicity on stable content.

When should I choose RAG over CAG?

When the corpus is too large or grows unboundedly, when content changes frequently, or when different users must see different subsets under access control. Those are the cases where a fixed cached prefix cannot work.

When is CAG the better choice?

For stable reference material everyone needs, workloads with a high repeat-query rate against the same corpus, and latency-sensitive paths where a retrieval round-trip would cost more than reading the cached prefix.

Can I use RAG and CAG together?

Yes, and most production systems do. Cache the stable reference material and retrieve the volatile or per-user content. A simple rule is to cache anything that does not change within 24 hours and retrieve everything else.

How do I know if I got the RAG and CAG split right?

Measure cache-hit rate per route after you set the split. A low hit rate on content you expected to be stable means it is changing more than you thought or a dynamic value is breaking the cached prefix.

RAG vs CAG: How to Actually Decide

RAG vs CAG: how to actually decide

A decision framework from real implementations. RAG retrieves. CAG stores in cache. Knowing which to use, and when to combine both, determines whether your agent finds the right answer at the right cost.

Jigar JoshiAgentic AI Architect and Consultant

In this post (4 sections)

In this post

RAG retrieves at request time. CAG, for cache or context augmented generation, stores frequently-needed content in the prompt cache so it is reused across calls. They solve overlapping problems with different costs, and the choice is not ideological, it is a function of how your corpus behaves. This is the framework version of the question I raised in Opus 4.7's 1M context: RAG or just stuff it.

When to RAG

The corpus is too large to fit in context, or it grows unboundedly.
Content changes frequently, such as product catalogues or ticket queues.
Per-user access controls apply, where different users must see different subsets.

When to CAG

Stable reference material everyone needs, such as style guides, schemas, or framework docs.
A high repeat-query rate against the same corpus, so the cached prefix is reused constantly.
Latency-sensitive paths where a retrieval round-trip costs more than reading the cached prefix.

RAG and CAG on the dimensions that decide it

Dimension	RAG (retrieve)	CAG (cache)
Corpus size	Large or unbounded	Fits in the prompt budget
Volatility	Changes often	Stable for hours or days
Access control	Per-user subsets	Shared by everyone
Latency	Adds a retrieval round-trip	No round-trip after first call
Best repeat rate	Low or one-off	High

When to combine

Most production systems end up doing both. CAG the stable reference material, RAG the volatile or per-user content. The practical rule: cache anything that does not change in 24 hours, retrieve everything else. Then measure cache-hit rate, which tells you whether you got the split right; the instrumentation for that is in prompt caching is not optional anymore and the agent observability stack we ship.

Common mistakes

Caching per-user content into a shared prefix, which is both a cache-pollution problem and a data-leak risk.
Retrieving stable reference material on every call, paying a round-trip for content that never changes.
Choosing one approach for the whole system instead of splitting by how each slice of content behaves.
Picking a split and never measuring cache-hit rate to confirm it.

RAG and CAG are not rivals, they are two tools for two kinds of content. Get the split right and your agent finds the right answer at the right cost; get it wrong and you pay in latency, spend, or stale answers. Designing that split for a specific workload is a common consulting starting point.

RAG vs CAG: how to actually decide

When to RAG

When to CAG

When to combine

Common mistakes

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

RAG vs CAG: how to actually decide

When to RAG

When to CAG

When to combine

Common mistakes

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

Cursor cloud subagents in 2026: /in-cloud, /babysit, and /automate without losing your local guardrails

Claude Fable 5 for agent builders: when the frontier model is worth the routing change

Agentic RAG vs vanilla RAG: why a Sufficient Context Agent beats retrieve-then-pray