Solution

Many agents. One reliable outcome.

Supervisor + handoff orchestration for production portfolios.

One agent works. Two agents talk. Five agents argue forever unless the orchestration is right. We design the orchestration patterns that make multi-agent systems reliable — supervisor for the plan, parallel sub-agents for the read phase, handoffs for graceful recovery — and ship them with the per-step observability that lets you debug across agent boundaries.

4–5×
speedup on parallelisable read-heavy workflows
Per-agent
eval scoring at every boundary
Single
replayable trace across all agents in the workflow
Graceful
recovery via handoff context, not silent failure
Use cases

Where multi-agent pays back

PR review pipelines

Parallel reviewer + security + test-generator agents triggered on every PR; consolidated comment back within 90 seconds.

Content + SEO pipelines

Research → outline → write → review with role-specialised agents.

Document analysis at scale

Splitter → parallel section-readers → integrator. Adversarial documents handled without context blow-up.

Customer ops triage

Classifier (Haiku) → specialist agents per category → handoff back to human or system action.
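The triage flow above can be sketched as a classifier that picks a category, a lookup of specialist handlers, and a fallback to a human when no specialist matches. This is a minimal illustration only: `classify`, the category names, and the handler functions are hypothetical stand-ins, and a keyword check takes the place of the small-model call.

```python
from typing import Callable, Optional

def classify(ticket: str) -> str:
    """Stand-in for the cheap routing-model call (hypothetical keyword rules)."""
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "unknown"

# Hypothetical specialist agents, one per category.
def billing_agent(ticket: str) -> str:
    return f"billing handled: {ticket}"

def technical_agent(ticket: str) -> str:
    return f"technical handled: {ticket}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "technical": technical_agent,
}

def triage(ticket: str) -> dict:
    """Route a ticket to a specialist, or hand off to a human if none fits."""
    category = classify(ticket)
    handler: Optional[Callable[[str], str]] = SPECIALISTS.get(category)
    if handler is None:
        # Unrecognised category: hand off instead of guessing.
        return {"category": category, "route": "human", "result": None}
    return {"category": category, "route": "agent", "result": handler(ticket)}
```

Keeping the fallback explicit means an unrecognised category ends in a handoff instead of a wrong specialist.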

Industries served
IT Services · Enterprise Software · Content + Marketing · Customer Ops
System architecture

How the system is wired

Supervisor + parallel sub-agent pattern
  • Supervisor: plans the work
  • Fan-out: parallel sub-agents
  • Integrator: merges results
  • Validator: cross-checks output
  • Action: commit · post · escalate
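One way to picture the supervisor, fan-out, integrator, validator, and action stages is the `asyncio` sketch below. Everything in it is an assumption made for illustration: the role names, the `sub_agent` placeholder (which stands in for a model call), and the rule that a failed validation escalates.

```python
import asyncio

async def sub_agent(role: str, task: str) -> str:
    """Hypothetical sub-agent; a real one would call a model and tools."""
    await asyncio.sleep(0)  # placeholder for the actual call
    return f"{role}:{task}"

def plan(request: str) -> list[tuple[str, str]]:
    """Supervisor: decompose the request into (role, task) pairs."""
    roles = ("reviewer", "security", "test-generator")  # assumed roles
    return [(role, request) for role in roles]

def integrate(results: list[str]) -> str:
    """Integrator: merge sub-agent outputs into one report."""
    return "\n".join(sorted(results))

def validate(report: str, expected_roles: set[str]) -> bool:
    """Validator: check that every planned role contributed a line."""
    seen = {line.split(":", 1)[0] for line in report.splitlines()}
    return seen == expected_roles

async def run_workflow(request: str) -> dict:
    steps = plan(request)
    # Fan-out: all sub-agents run concurrently.
    results = await asyncio.gather(*(sub_agent(r, t) for r, t in steps))
    report = integrate(list(results))
    ok = validate(report, {role for role, _ in steps})
    # Action: post on success, escalate otherwise (assumed policy).
    return {"report": report, "action": "post" if ok else "escalate"}
```

Running `asyncio.run(run_workflow("PR-1"))` returns a merged report and the chosen action; `"PR-1"` is just a sample request string.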
Technology

Multi-agent technology stack

  • Routing model: Claude Haiku 4.5 (cheap, fast, surprisingly accurate)
  • Specialist agents: Sonnet 4.6 per role (review, security, test, draft, etc.)
  • Orchestration: Supervisor + handoff patterns · parallel sub-agent execution
  • Shared MCP: Canonical tools available to all agents in the workflow
  • Observability: Per-agent trace · cross-agent integration trace · score deltas
Methodology

Multi-agent delivery methodology

01

Role decomposition

Identify the specialists. Each agent gets one clear job and one scoped tool registry.

02

Orchestration choice

Supervisor for known workflows. Handoffs for recovery-critical paths. Swarm only for genuinely parallel exploration.

03

Integration trace design

Per-agent trace plus an integration trace that surfaces the handoff context. Otherwise multi-agent debugging is guesswork.
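A minimal sketch of what an integration trace carrying handoff context could look like, using only the standard library. The field names (`reason`, `state`) and the event shape are assumptions for illustration; they are not the schema of Langfuse or any other tracing product.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class HandoffContext:
    """What the receiving agent (or human) needs to pick up the work."""
    trace_id: str
    from_agent: str
    to_agent: str
    reason: str
    state: dict = field(default_factory=dict)

@dataclass
class IntegrationTrace:
    """One trace shared by every agent in the workflow (hypothetical shape)."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record_step(self, agent: str, summary: str) -> None:
        self.events.append({"kind": "step", "agent": agent, "summary": summary})

    def handoff(self, from_agent: str, to_agent: str,
                reason: str, state: dict) -> HandoffContext:
        # Copy the state so later mutation by the sender cannot alter the record.
        ctx = HandoffContext(self.trace_id, from_agent, to_agent, reason, dict(state))
        self.events.append({"kind": "handoff", "from": from_agent,
                            "to": to_agent, "reason": reason})
        return ctx
```

Because every step and handoff shares one `trace_id`, a failure in a downstream agent can be followed back to the handoff that produced its input.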

04

Eval at every boundary

Each agent has its own eval set. The integration step has an end-to-end eval. Drift in either is visible.

05

Production rollout

Behind feature flags per agent. Parallel run with the existing manual workflow before cutover. Cost and accuracy compared explicitly.

Security & scalability

Multi-agent security & scale

Per-agent scopes

Each agent gets the minimum tool registry it needs. The reviewer cannot deploy. The deployer cannot read PII.
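As a rough illustration of per-agent scopes, the sketch below checks each tool call against an allowlist before dispatching it. The tool names, agent names, and `ScopeError` are hypothetical; a real deployment would enforce the same rule in the MCP server or gateway as well as in application code.

```python
class ScopeError(PermissionError):
    """Raised when an agent requests a tool outside its registry."""

# Hypothetical tools; each lambda stands in for a real tool implementation.
TOOLS = {
    "read_diff": lambda: "diff",
    "post_comment": lambda: "posted",
    "deploy": lambda: "deployed",
    "read_customer_record": lambda: "record",
}

# Minimum registry per agent (assumed roles and assignments).
AGENT_SCOPES = {
    "reviewer": frozenset({"read_diff", "post_comment"}),
    "deployer": frozenset({"deploy"}),
}

def call_tool(agent: str, tool: str) -> str:
    """Dispatch a tool call only if it is in the agent's scope."""
    allowed = AGENT_SCOPES.get(agent, frozenset())
    if tool not in allowed:
        raise ScopeError(f"{agent} is not allowed to call {tool}")
    return TOOLS[tool]()
```

An agent missing from `AGENT_SCOPES` gets an empty registry, so the default is deny.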

Bounded fan-out

Parallel sub-agent counts are bounded by configuration, not by the supervisor's imagination.
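A bounded fan-out can be sketched with an `asyncio.Semaphore`: the supervisor may submit any number of tasks, but only a configured number run at once. The constant `MAX_PARALLEL_SUB_AGENTS` and its value are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

MAX_PARALLEL_SUB_AGENTS = 4  # assumed configuration value

async def bounded_fan_out(
    tasks: Iterable,
    worker: Callable[[object], Awaitable],
    limit: int = MAX_PARALLEL_SUB_AGENTS,
) -> list:
    """Run worker(task) for every task, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task):
        async with semaphore:  # wait here while `limit` workers are active
            return await worker(task)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(t) for t in tasks))
```

The limit comes from configuration, so a supervisor that plans fifty sub-tasks still runs only `limit` of them concurrently.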

Cross-agent audit

A single audit trail covers the full workflow even when five agents touched it.

Integrations

Multi-agent integration surface

  • Shared MCP servers across all agents in the workflow
  • Queue-based fan-out for high-throughput pipelines
  • GitHub Actions / GitLab CI / custom orchestrators
  • Langfuse for per-agent and integration traces
Business impact

Why multi-agent beats one-big-agent

A single agent loaded with every tool gets worse at every step. Specialists with narrow scopes are more accurate, cheaper, and easier to debug.

4–5×
speedup on read-parallel workflows
~30%
lower cost per completed task vs. one-big-agent
< 90 s
PR pipeline end-to-end on a typical change
Case studies

How recent engagements actually shipped

IT Services · 6 weeks discovery → handoff

PR review pipeline cuts senior-engineer time 4×

Mid-market IT services firm · Ahmedabad · 180 engineers

Problem

Senior engineers were spending 8–12 hours per week each on first-pass PR review across a 6-team monorepo. Junior PRs waited 2+ days for sign-off; velocity stalled; the highest-judgement people were doing the lowest-judgement work.

Solution

A multi-agent CI workflow triggered on every PR open. Three specialist agents run in parallel: a reviewer (Claude Sonnet 4.6) for code correctness and convention, a security agent for risk patterns, and a test-generator agent for coverage gaps. Outputs are consolidated into a single PR comment within 90 seconds. Humans review the agents' synthesis, not the raw diff.

  • Claude Sonnet 4.6 (review
  • Custom MCP server: GitHub API
  • GitHub Actions
  • Langfuse traces
~36 hrs/wk
senior engineer time reclaimed across the team
< 3 days
payback period at loaded-cost rate
review throughput per senior engineer
0
production regressions traced to AI-passed reviews in 90 days
Workshop / Public Build · 1 day · 8 hours hands-on

The Agentic Operating System — workshop build

AIMED · public workshop · ~40 engineers

Problem

Most teams meeting agentic AI for the first time get stuck on one of three blockers: tool design, orchestration choice, or the gap between a working demo and a system that survives Monday morning. The AIMED workshop format compresses the answers into one day of hands-on building.

Solution

A day-long live build of "the Agentic Operating System" — a multi-agent shell with a supervisor (planning, decomposition), handoff agents (parallel reads, sequenced writes), shared tool registry via MCP, and observability wired in from line one. Every attendee leaves with a running shell on their own laptop, the source, and the patterns to extend it.

  • Claude Sonnet 4.6 + Haiku 4.5 (free Claude Code tier worked)
  • Three MCP servers built from scratch: files
  • Python supervisor + handoff context passing
  • Langfuse traces from the first agent call
40
engineers shipped a running multi-agent shell on their own laptops
3
MCP servers per attendee, written from scratch
8 hrs
concept to working artefact
Open-Source / Research · 3 weeks weekend builds

Multi-agent research synthesis — open PoC for swarm vs supervisor

Public R&D · open-source on GitHub

Problem

Every team building multi-agent systems faces the same orchestration question and answers it from intuition, not measurement. "Supervisor is cleaner" vs "swarm is faster" gets stated as fact in a hundred conference talks without a single side-by-side benchmark anyone can reproduce. This PoC builds and measures both, on a task with a defensible ground truth.

Solution

A reproducible benchmark: same task (synthesise a literature review across 12 papers on a given topic), same model, same MCP tool registry, same eval rubric. Three runners — single-agent (baseline), supervisor pattern, swarm pattern — each scored on factuality, citation accuracy, coverage, and cost. Code + eval data + raw runs all open-sourced.
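The shape of such a benchmark can be sketched as a harness that feeds the same task to each runner and applies the same rubric to every answer. This is not the PoC's code: the `Runner` and `Scorer` signatures, the `cost_usd` key, and the rubric-as-functions design are assumptions for illustration.

```python
from typing import Callable

# Assumed contracts: a runner returns {"answer": str, "cost_usd": float};
# a scorer maps an answer to a number on whatever scale the rubric defines.
Runner = Callable[[str], dict]
Scorer = Callable[[str], float]

def benchmark(task: str,
              runners: dict[str, Runner],
              rubric: dict[str, Scorer]) -> dict[str, dict[str, float]]:
    """Run every runner on the same task and score it with the same rubric."""
    rows: dict[str, dict[str, float]] = {}
    for name, run in runners.items():
        output = run(task)
        # Same scorers for every runner, so the columns are comparable.
        row = {dim: score(output["answer"]) for dim, score in rubric.items()}
        row["cost_usd"] = output["cost_usd"]
        rows[name] = row
    return rows
```

Passing the runners and the rubric in as plain dictionaries keeps the comparison honest: adding a fourth orchestration pattern means adding one entry, not changing the scoring.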

  • Claude Sonnet 4.6 (all three runners use the same model)
  • Custom MCP servers: paper-fetch
  • Three parallel implementations sharing the same tool registry
  • Open eval rubric
Deep dives

Read what we publish on this

Production

Claude Code Artifacts turn terminal output into live review pages: what Team and Enterprise buyers should pilot first

Artifacts in Claude Code beta publish self-contained HTML to claude.ai that republishes to the same URL as the session progresses, with version history and org-only sharing. Strict CSP, no external fetch, no backend. Requires Team or Enterprise and claude.ai login. Here is the workflow I use for PR walkthroughs and incident timelines without screenshot threads in Slack.

MCP

MCP Enterprise-Managed Authorization is stable: how IdP-provisioned connector access replaces per-server OAuth hell

EMA makes the organization IdP the decision-maker for which MCP servers a user can reach. Admins enable connectors once; clients exchange an Identity Assertion JWT for scoped tokens without redirecting every employee through OAuth per server. Anthropic ships it across Claude, Claude Code, and Cowork; VS Code supports it; Okta is the first IdP. Here is the pilot I run before July 28 stateless transport work lands.

Architecture

Cursor cloud subagents in 2026: /in-cloud, /babysit, and /automate without losing your local guardrails

Cursor 3.7 lets you spin subagents in cloud VMs with /in-cloud, iterate on a PR until merge-ready with /babysit, and hand off between local and cloud sessions. Cursor 3.8 adds /automate and five GitHub review triggers. Here is the workflow I use so parallel cloud work does not bypass Auto-review, environment snapshots, or pre-push /review.

Production

Agentjacking is real: poisoned Sentry errors can hijack Cursor, Claude Code, and Codex without touching your repo

Tenet Threat Labs injected a fake stack trace through a public Sentry DSN and watched 100+ coding agents execute attacker commands during normal triage. No git write access required. The agent treats the error as ground truth. Here is how I harden observability MCP feeds, scope triage prompts, and block auto-exec on untrusted telemetry.

Frequently asked

Multi-Agent Workflows — questions buyers ask

Map your first multi-agent workflow

Bring the workflow you want to automate. We sketch the agent boundaries, the orchestration pattern, and the cost / latency envelope in a 60-minute session.