AI Engineering. Turning AI prototypes into production systems.

Cursor cloud subagents in 2026: /in-cloud, /babysit, and /automate without losing your local guardrails

Cursor 3.7 lets you spin up subagents in cloud VMs with /in-cloud, iterate on a PR until merge-ready with /babysit, and hand off between local and cloud sessions. Cursor 3.8 adds /automate and five GitHub review triggers. Here is the workflow I use so parallel cloud work does not bypass Auto-review, environment snapshots, or pre-push /review.

Jun 18, 2026 · 13 min

Agentjacking is real: poisoned Sentry errors can hijack Cursor, Claude Code, and Codex without touching your repo

Tenet Threat Labs injected a fake stack trace through a public Sentry DSN and watched 100+ coding agents execute attacker commands during normal triage. No git write access required. The agent treats the error as ground truth. Here is how I harden observability MCP feeds, scope triage prompts, and block auto-exec on untrusted telemetry.

Jun 17, 2026 · 13 min

The June 15 Claude billing change: Agent SDK credits, model retirement, and the checklist I run before anything breaks

Two Anthropic changes land on the same day: programmatic Claude usage moves to a separate monthly credit pool, and claude-opus-4-20250514 plus claude-sonnet-4-20250514 stop answering on the API. Interactive Claude Code is fine. Cron jobs and CI agents are not. Here is how I audit auth paths, claim credits, and grep for retiring model IDs before the first failed run.

Jun 15, 2026 · 14 min
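
The "grep for retiring model IDs" step in the summary above can be sketched in a few lines of Python. This is a minimal sketch, not the checklist from the post: the two model IDs are the ones named in the summary, while the helper name `find_retiring_model_ids` and the choice to do plain substring matching file by file are assumptions made for illustration.

```python
from pathlib import Path

# Model IDs named in the summary above as retiring on June 15.
RETIRING_MODEL_IDS = (
    "claude-opus-4-20250514",
    "claude-sonnet-4-20250514",
)


def find_retiring_model_ids(root: str) -> list[tuple[str, int, str]]:
    """Return (file path, line number, model ID) for every hit under root.

    Hypothetical helper: walks regular files, skips anything that cannot be
    read as UTF-8 text, and does plain substring matching.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file, nothing to search
        for line_no, line in enumerate(text.splitlines(), start=1):
            for model_id in RETIRING_MODEL_IDS:
                if model_id in line:
                    hits.append((str(path), line_no, model_id))
    return hits
```

Pointing it at a repository root returns one tuple per occurrence. It only sees IDs checked in as literal text, so IDs that arrive through environment variables or a secrets manager would need a separate check.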

Governing agent autonomy in 2026: Auto-review, pre-push review, and why approval prompts are not a security model

Cursor made Auto-review the default run mode and shipped /review so Bugbot runs before you push. Together they treat agent autonomy as a dial: low-stakes actions flow, high-stakes actions slow down. Here is how I wire that pattern into local agents, SDK headless runs, and CI without mistaking convenience for a hard security boundary.

Jun 11, 2026 · 14 min

Claude Fable 5 for agent builders: when the frontier model is worth the routing change

Anthropic shipped Claude Fable 5 on June 9: a Mythos-class model with tiered safeguards, mandatory 30-day retention on traffic, and $10/$50 per-million pricing. Days later access was suspended globally pending export-control review. Even if you never touched Fable, the launch tells you how frontier routing, retention policy, and governance will work for agent builders in the second half of 2026.

Jun 9, 2026 · 14 min

Agentic RAG vs vanilla RAG: why a Sufficient Context Agent beats retrieve-then-pray

Google Research shipped Agentic RAG on Gemini Enterprise with a Sufficient Context Agent that refuses to answer when retrieval is incomplete. On factuality benchmarks they report up to 34% higher accuracy versus standard RAG. Here is when one-shot RAG is still enough, when you need iterative retrieval, and how I wire the pattern without blowing latency budgets.

Jun 6, 2026 · 14 min

Agentic transformation is an operating-model problem, not a model problem

Microsoft published a 6-step playbook for rolling agents out across an enterprise, and the line that matters is "you do not need a bigger model, you need a better operating model." That matches what I see in consulting: the pilots that die do not die on model quality, they die on ownership, evals, and governance. Here is how I read the playbook for IT services teams, and the operating-model gaps that actually stall agent rollouts.

Jun 4, 2026 · 11 min

The anatomy of an AI agent: memory, tools, the loop, and guardrails

Strip the hype off an AI agent and four parts are left: a memory, a set of tools, a loop that decides what to do next, and a guardrail that vets every action before it runs. Here is what each part is for, the order they fail in, and where I have written about fixing each one.

Jun 2, 2026 · 10 min
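
The four parts named in the summary above (memory, tools, loop, guardrail) fit in a toy sketch. Everything here is an assumption made for illustration: the tool names, the allowlist policy, and the fixed plan that stands in for a model deciding the next action. It is not the implementation from the post.

```python
from typing import Callable

# Tools: a name -> function registry. Both tools are toy placeholders.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "shout": lambda arg: arg.upper(),
}

# Guardrail policy (hypothetical): only allowlisted tools may run.
ALLOWED_TOOLS = {"echo"}


def guardrail(tool_name: str) -> bool:
    """Vet an action before it runs."""
    return tool_name in ALLOWED_TOOLS and tool_name in TOOLS


def run_agent(plan: list[tuple[str, str]], max_steps: int = 5) -> list[str]:
    """Run a fixed plan through the loop and return the memory log.

    In a real agent a model would pick each action from memory; a fixed
    plan stands in for that decision so the sketch stays runnable.
    """
    memory: list[str] = []  # memory: what the agent has seen so far
    for step, (tool_name, arg) in enumerate(plan):
        if step >= max_steps:
            memory.append("stopped: step budget exhausted")
            break
        if not guardrail(tool_name):
            memory.append(f"blocked: {tool_name}")
            continue
        memory.append(f"{tool_name} -> {TOOLS[tool_name](arg)}")
    return memory
```

Calling `run_agent([("echo", "hi"), ("shout", "hi")])` returns `["echo -> hi", "blocked: shout"]`: the guardrail check sits between the decision and the tool call, so a blocked action never executes.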

Your coding agent has amnesia. Persistent memory is the fix.

Claude Code forgets your architecture, your decisions, and why you ruled things out the moment a session ends. The reliability tax is not tokens, it is re-establishing context every morning. Here is what persistent agent memory actually is, how an open-source engine like Cortex implements it, and how to evaluate a memory layer for your own agents.

May 29, 2026 · 11 min

Your agent's supply chain is the attack surface now

A poisoned VS Code extension spent eighteen minutes on the marketplace and walked off with Claude Code credentials and MCP configs. The model was never the target. Your agent's supply chain is: the extensions, skills, MCP servers, tool definitions, and keys it is allowed to touch. Here is how I harden all four layers, and the checklist I run on every deployment.

May 27, 2026 · 12 min

MCP

MCP just went stateless: what the 2026 spec release candidate changes for your servers

The biggest revision of MCP since 1.0 locked as a release candidate on May 21. The protocol goes stateless, extensions move out of the core, and authorization finally speaks OAuth properly. Most of your servers keep working. Here is what actually changes, what breaks, and the migration I would run in the ten weeks before the final spec lands.

May 26, 2026 · 11 min

Multi-Agent

Your agents aren't broken, your tools are: three questions to ask before you build one

When an agent misbehaves, almost everyone reaches for the prompt or the model. The fault is usually further down, in a tool that does too much, lies when it fails, or buries the answer in a wall of raw data. An AI tool is not a function. It is a contract the model has to trust. Here are the three questions I run before writing a single line of any tool.

May 25, 2026 · 11 min

Inside Recruiting Atelier: a runnable reference for the primitives of an agentic system

A working open studio that vets duplicates, plans the run, screens, scores, shortlists, and notifies. The whole pipeline lives in roughly ninety lines of supervisor code and a tool registry you can read in one sitting. Here is what is inside, why every piece is there, and what you can copy into your own stack.

May 24, 2026 · 14 min

How an agentic studio screens, scores and shortlists candidates for your hiring team

Open Recruiting Atelier and you do not see a generic AI dashboard. You see five named specialists doing the work a screening team would do: catching duplicates, checking the brief, scoring on four dimensions, ranking, drafting the dispatch. Drop one CV or fifty. Click any candidate to see exactly why they landed where they did. This is what AI for recruitment looks like when it respects your judgment instead of replacing it.

May 24, 2026 · 10 min

Code agents vs skill agents: when to give an agent the keyboard and when to give it the toolbox

Two ways to let an agent act in the world. Code agents write fresh code into a sandbox. Skill agents pick from a curated menu. The choice should be made in the kickoff, not the postmortem. Here is the framing I use with clients, the four axes where they diverge, and the hybrid pattern most production systems become.

May 22, 2026 · 11 min

Tool registry design for agentic AI: how the wrong registry kills accuracy before the prompt is read

I reviewed a system last month with 47 tools in its registry and a 22 percent wrong-tool-selection rate. The team was about to migrate from Sonnet to Opus to fix it. The prompt was fine. The registry was the bug. This is the audit pattern I run on every client codebase before we change anything else, the seven failure modes I see in production, and the numbers from the cleanup.

May 22, 2026 · 12 min

AI agent vs agentic AI: what the distinction actually means when you ship one

Vendors blur the line because "agentic" sells. The two terms describe different architectures, with different cost shapes, different observability needs, and different scoping conversations. Here is the framing I use with clients and the three-question test for which one your project actually needs.

May 22, 2026 · 12 min

Gemini 3.5 Flash vs Sonnet 4.6: should you re-route your agent stack?

Google shipped 3.5 Flash this week with a "frontier intelligence plus action" pitch and a 4x output-tokens-per-second claim. If your routing layer is on Sonnet 4.6 today, this is the week to re-benchmark. Here is what I am actually moving, what I am leaving alone, and the cost-per-completed-task maths nobody is doing in public.

May 20, 2026 · 10 min
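
The cost-per-completed-task figure mentioned in the summary above can be written down as a short calculation. The definition used here is an assumption (the post's own definition may differ): total spend across all runs, including failed ones, divided by the number of runs that completed. The `Run` fields and the price parameters are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One agent run. All fields are illustrative assumptions."""
    input_tokens: int
    output_tokens: int
    completed: bool


def cost_per_completed_task(runs, input_price_per_mtok, output_price_per_mtok):
    """Total spend over all runs divided by the number of completed runs.

    Failed runs still cost money, so they stay in the numerator.
    Returns None when nothing completed, since the ratio is undefined.
    """
    total_cost = sum(
        r.input_tokens / 1_000_000 * input_price_per_mtok
        + r.output_tokens / 1_000_000 * output_price_per_mtok
        for r in runs
    )
    completed = sum(1 for r in runs if r.completed)
    if completed == 0:
        return None
    return total_cost / completed
```

Because failed runs stay in the numerator, a model with a lower per-token price but a lower completion rate can still come out more expensive per completed task, which is why the metric is computed per route rather than read off a price sheet.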

MCP governance just became a product: what Databricks Unity AI Gateway changes for enterprise agents

Every enterprise MCP deployment I have audited in the last six months has been hand-rolling tool-access policy, payload logging, and per-team cost limits on top of a gateway someone wrote in two days. Databricks just shipped that as a product. Here is what it actually changes, where the gaps still are, and the migration I would run for a Databricks shop.

May 20, 2026 · 12 min

The vector store is not your agent's memory

A new survey from BigAI-NLCO splits LLM memory into three layers. Most production agents I review have built the middle one, called it memory, and skipped the layer on top. Here is what the taxonomy actually buys you.

May 17, 2026 · 6 min

Tool descriptions are prompts. Fix the registry, not the agent.

When an agent picks the wrong tool, the registry is broken, not the agent. Three rules I now apply before debugging anything in a multi-tool system: precise names, "when to use" triggers, and a curated load list. Anthropic's new tool-selection telemetry finally puts numbers on what changes accuracy.

May 13, 2026 · 9 min

The cheapest LLM call is the one you do not make. GitHub's 19-62% token cut, decoded

GitHub published an instrumented analysis of their agentic CI workflows and reported 19-62% token-cost reductions. The savings are the headline. The technique (pre-agentic data fetching and tool-registry hygiene) is the story most teams will miss.

May 11, 2026 · 9 min

Multi-Agent

Claude Opus 4.7's 1M context: when to RAG and when to just stuff it

A reliable million-token context is real now, but it does not retire RAG. It changes the calculus. Cost, latency, recency, and the prompt-cache angle nobody is talking about.

May 6, 2026 · 8 min

MCP

MCP 1.0 is here. What changes for the servers you already wrote

The protocol stabilised. Most working servers will keep working. Three places the new spec actually requires changes (auth profile, server registry, streaming-response semantics) with diffs from a real migration.

May 1, 2026 · 8 min

Why I am replacing supervisor patterns with handoffs

Supervisors looked clean on paper and shipped slow in production. Handoffs read messier in the code but recover better when an agent loses the plot. Two real systems and where supervisors still earn their keep.

Apr 26, 2026 · 8 min

Prompt caching is not optional anymore. Measuring a 47% cost drop

A walkthrough from a client engagement: identifying stable prefixes, restructuring the system prompt for cacheability, and the telemetry that proved caching was actually working.

Apr 19, 2026 · 7 min

Tool descriptions are prompts. Stop treating them like docstrings

A docstring tells a developer what a function does. A tool description tells a model when to call it. Different audience, different writing. Six concrete edits that lifted tool-call accuracy.

Apr 8, 2026 · 8 min

The agent observability stack we ship to every client

Traces, spans, evals, cost-per-completed-task, and the one dashboard panel that catches 80% of regressions. Vendor-agnostic; covers Langfuse, Honeycomb, and rolling your own.

Mar 28, 2026 · 8 min

Multi-Agent

Three patterns I broke in 2025, and what I do instead now

Self-correction loops without budgets, single-agent solutions to multi-domain problems, and using JSON mode to force structure I should have built into the schema. An honest review.

Mar 14, 2026 · 8 min

Haiku 4.5 made our router 5x cheaper. The trade-off matters

Replacing Sonnet with Haiku in the dispatcher role cut our orchestration cost dramatically. It also cost us in two specific places I did not predict.

Feb 22, 2026 · 7 min

MCP

Why every team's first MCP server should be "list-files"

Smallest useful server. Hardest one to mess up. Teaches the protocol without distracting domain logic. The 60-line server we hand to teams during training.

Feb 4, 2026 · 7 min

Prompt Engineering

Eval datasets: stop testing your agents on the happy path

If your eval set is the demos you showed the client, you are testing the wrong thing. How we build evals from production failures and the minimum viable suite to ship.

Jan 19, 2026 · 8 min

I was wrong about JSON mode. Here is what changed my mind

For two years I told teams to avoid forced JSON outputs and use structured tool calls. That was right then and partially wrong now. Schema enforcement got better, latency penalties got smaller.

Dec 12, 2025 · 7 min

Why your agent keeps failing after 3 steps

The exit condition problem nobody talks about. Most agents are built for the happy path, where every tool call succeeds and the task completes cleanly. Real production agents are different.

Nov 8, 2025 · 7 min

The one rule for designing agent tools that actually work

One tool, one purpose. Every tool that does two things will fail you on the third call. I have watched this pattern fail in every team I have trained, and the fix is the same refactor.

Oct 17, 2025 · 7 min

RAG vs CAG: how to actually decide

A decision framework from real implementations. RAG retrieves. CAG stores in cache. Knowing which to use, and when to combine both, determines whether your agent finds the right answer at the right cost.

Sep 21, 2025 · 7 min

8 carousel notes

Visual breakdowns on AI Engineering

Production

Agentic AI content quality: 5 agents, one pipeline.

Separate eval from rewrite, route models per agent, guard inputs and outputs. Run it on every page before publish.

Architecture

Stop paying frontier prices for classification.

Four model tiers. Build the router agent first. Same quality, up to 10x cost spread if you route wrong.

Multi-Agent

Sequential or parallel? Draw the flow.

Most multi-agent systems pick the wrong execution flow. One question tells you which to use.

Architecture

Wrong memory. Dead agent.

Four memory types. Four use cases. Pick wrong and your agent forgets, hallucinates, or costs 10x.

Tooling

Your agents aren't broken. Your tools are.

An AI tool is not a function. It is a strict contract.

Architecture

Your agent has no memory. That is the problem.

Three memory types fix this permanently. Context is temporary. Memory is permanent.

Tooling

You have 3 tools. Are you using them correctly?

Cursor drafts. Claude ships. Copilot reviews. One job each — no overlap.

Tool Design

Your agent called the wrong tool.

Fix the description. Not the agent.

See the full AI Engineering update feed

62 ship-news updates

Latest in AI Engineering

Claude

Fable 5's included subscription access window closes today: Anthropic planned a June 23 move to usage credits, but the global suspension in place since June 12 still blocks all access

June 22, 2026 · via Anthropic

Tools

Claude Code ships Artifacts in beta: Team and Enterprise sessions publish live, org-private review pages that update in place at a claude.ai URL

June 19, 2026 · via Claude Code Docs

Tools

Claude Code v2.1.183 tightens auto-mode safety: blocks destructive git resets, agent-amend commits, and infrastructure destroy unless you asked for them

June 19, 2026 · via Claude Code

Tools

GitHub Copilot usage metrics API adds ai_credits_used per user for enterprise and org-level attribution

June 19, 2026 · via GitHub

Open Source

Xiaomi MiMo V2-Flash and TTS endpoints auto-route to MiMo-V2.5 on June 18: legacy model IDs retire June 30

June 18, 2026 · via Xiaomi MiMo

Tools

Cursor Automations add the /automate skill, five GitHub review triggers, and computer-use demos for always-on cloud agents

June 18, 2026 · via Cursor

Tools

Cursor adds /in-cloud subagents, /babysit for PR iteration, and reliable handoff between local and cloud agent sessions

June 17, 2026 · via Cursor

Architecture

Tenet demonstrates Agentjacking: a poisoned Sentry error report hijacks Cursor, Claude Code, and Codex into running attacker code with no repo compromise

June 17, 2026 · via Tenet Security

Open Source

Zhipu ships GLM-5.2: MIT open weights, 1M context, and Anthropic-compatible API for long-horizon coding agents

June 16, 2026 · via Z.ai

Enterprise solutions

How AI Engineering ships in our engagements

The pages below are the buyer-focused, conversion-grade versions of this topic — deliverables, methodology, ROI, security considerations, and CTAs to scope a real engagement.

Agentic AI Consulting

Designed, built, and handed off — production agentic systems for enterprise teams.

MCP Integration

Custom Model Context Protocol servers that turn your systems into agent tools.

AI Guardrails

Multi-layer safety, policy, and audit controls for agents in regulated environments.

AI Systems Engineering Training

Eight-day corporate training programs that take dev teams from AI-assisted coding to production agentic systems.

Enterprise AI Architecture

Reference architectures for organisations standing up an AI platform — not one agent, but the foundation for many.

AI Observability

Tracing, eval, cache-hit telemetry, and cost attribution for production agents.

Multi-Agent Workflows

Supervisor + handoff orchestration for portfolios of agents that need to cooperate without arguing.