AI Infrastructure

Anthropic Just Open-Sourced the Blueprint for Scaling AI Agents

Anthropic's Managed Agents splits the brain from the hands. The 90% latency drop tells you why this matters if you're building agents.

Augmi Team
ai-agents · anthropic · managed-agents · infrastructure · agent-architecture · claude · devtools
They solved their own infrastructure mess, then published the fix.

Anthropic’s AI agents kept dying. Not dramatically — quietly. A container would fail, and with it, the entire session: the conversation history, the work in progress, everything. Engineers had to open shells inside containers holding user data just to figure out what went wrong. Every agent was a pet they couldn’t afford to lose.

So they tore the whole thing apart and rebuilt it around an idea borrowed from operating systems circa 1970.

The Decoupled Mind

The old way was breaking

Anthropic’s first agent architecture was straightforward: put everything in one container. The session log, the agent harness, and the sandbox all shared an environment. File edits were direct syscalls. No service boundaries to design.

But coupling everything into one container created what infrastructure engineers call a “pet” — a named, hand-tended individual you can’t afford to lose. When a container failed, the session was lost. When it went unresponsive, the only debugging window was a WebSocket event stream that couldn’t distinguish between a harness bug, a packet drop, or a container going offline.

The breaking point came when customers asked to connect Claude to their own virtual private cloud. The harness assumed everything it needed lived in the same container. That assumption became a wall.

The Pet That Died

Decouple the brain from the hands

The fix was to split the agent into three independent interfaces: the session (an append-only event log), the harness (the brain — Claude’s control loop), and the sandbox (the hands — where code actually runs).

Each became cattle instead of pets. If a container dies, the harness catches it as a tool-call error. Claude decides whether to retry, and a new container spins up from a standard recipe. If the harness itself crashes, a new one boots with wake(sessionId), pulls the event log, and resumes from the last event. Nothing needs to survive a crash because nothing important lives in any single component.
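The resume path above can be sketched in a few lines. This is an illustrative toy, not Anthropic's actual API: `SessionStore` and `wake` are hypothetical names, and the event shapes are assumed. The point is that the append-only log is the only durable state, so a fresh harness can rebuild everything by replaying it.

```python
# Hypothetical sketch of the event-log pattern: the session is an
# append-only list of events, and a freshly booted harness rebuilds
# its state by replaying the log.
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Append-only event log; the only durable state in the system."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)


def wake(session: SessionStore) -> dict:
    """Boot a replacement harness: replay the log to recover where we left off."""
    state = {"messages": [], "pending_tool": None}
    for event in session.events:
        if event["type"] == "message":
            state["messages"].append(event["content"])
        elif event["type"] == "tool_call":
            state["pending_tool"] = event["name"]
        elif event["type"] == "tool_result":
            state["pending_tool"] = None
    return state


session = SessionStore()
session.append({"type": "message", "content": "fix the failing test"})
session.append({"type": "tool_call", "name": "bash"})
# Harness crashes here; a replacement calls wake() and resumes.
state = wake(session)
print(state["pending_tool"])  # → bash
```

Because the harness derives its state from the log rather than holding it in memory, a crash costs nothing but the time to replay.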

The interface connecting brain to hands is almost comically simple: execute(name, input) → string. The harness doesn’t know whether the sandbox is a Docker container, a phone, or — as Anthropic’s engineers note — a Pokémon emulator.
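That one-method boundary is easy to express as a structural interface. The sandbox class names below are hypothetical; what matters is that the harness only ever sees strings, so the backend can be swapped freely, and an infrastructure failure surfaces as an ordinary tool-call error the model can reason about.

```python
# Sketch of the harness-to-sandbox boundary: a single method,
# execute(name, input) -> str. Everything behind it is replaceable.
from typing import Protocol


class Sandbox(Protocol):
    def execute(self, name: str, input: str) -> str: ...


class LocalSandbox:
    """A trivially simple in-process backend."""
    def execute(self, name: str, input: str) -> str:
        if name == "echo":
            return input
        return f"unknown tool: {name}"


class FlakySandbox:
    """Stands in for a container that has died mid-session."""
    def execute(self, name: str, input: str) -> str:
        raise ConnectionError("container gone")


def run_tool(sandbox: Sandbox, name: str, input: str) -> str:
    # The harness catches infrastructure failure as a tool-call error;
    # the model then decides whether to retry on a fresh sandbox.
    try:
        return sandbox.execute(name, input)
    except ConnectionError as err:
        return f"tool error: {err}"


print(run_tool(LocalSandbox(), "echo", "hi"))   # → hi
print(run_tool(FlakySandbox(), "echo", "hi"))   # → tool error: container gone
```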

The Cattle Grid

The numbers are hard to argue with

The architectural change showed up immediately in the metrics. Median time-to-first-token dropped 60%. The 95th percentile dropped over 90%. That’s not an optimization. That’s the difference between an agent that feels responsive and one users close after 10 seconds.

The security improvement might matter more. In the old design, any code Claude generated ran in the same container as credentials. A prompt injection only had to convince Claude to read its own environment variables, and the attacker had everything. The new architecture keeps tokens out of the sandbox entirely. Git credentials get wired into the local remote during initialization. OAuth tokens sit in a vault, accessed through an MCP proxy that the harness never touches.
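A rough sketch of that separation, with entirely hypothetical names (`Vault`, `CredentialProxy`) and none of Anthropic's real plumbing: code in the sandbox names the service it wants, and the credential is attached on the other side of a boundary the sandbox can't read.

```python
# Illustrative sketch of keeping tokens out of the sandbox: sandbox code
# addresses services by name only; a proxy outside the sandbox boundary
# injects the secret on the way out.
class Vault:
    """Holds secrets; lives entirely outside the sandbox."""
    def __init__(self) -> None:
        self._tokens = {"github": "ghp_secret"}  # never enters the sandbox

    def token_for(self, service: str) -> str:
        return self._tokens[service]


class CredentialProxy:
    def __init__(self, vault: Vault) -> None:
        self._vault = vault

    def forward(self, service: str, request: dict) -> dict:
        # The secret is attached here, outside the sandbox boundary.
        headers = {"Authorization": f"Bearer {self._vault.token_for(service)}"}
        return {"request": request, "headers": headers}


def sandbox_code(proxy: CredentialProxy) -> dict:
    # A prompt injection running here can only name a service;
    # the token itself is never visible to generated code.
    return proxy.forward("github", {"method": "GET", "path": "/user"})


result = sandbox_code(CredentialProxy(Vault()))
```

In a real deployment the proxy would sit across a process or network boundary, so sandbox code could not introspect it the way this in-process toy allows.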

The Vault

The line most people will skip past

There’s a sentence in the engineering blog that I keep coming back to: “Harnesses encode assumptions that go stale as models improve.”

Here’s what they mean. Claude Sonnet 4.5 would wrap up tasks prematurely as it sensed its context limit. They called it “context anxiety.” So they added context resets to the harness. When they upgraded to Opus 4.5, the anxiety was gone. The resets were dead weight.

Think about what that means for anyone building agents right now. The specific harness you build today will be wrong tomorrow. Not because you made a mistake, but because models keep changing what they can handle on their own. Managed Agents is a meta-harness: opinionated about the interfaces (session, harness, sandbox) but deliberately agnostic about what runs behind them.

The Shifting Harness

What builders should take from this

The pattern Anthropic published isn’t unique to them. Anyone running AI agents at scale hits the same problems: containers that become pets, sessions that die with their hosts, credentials leaking into sandboxes, harnesses that calcify around model-specific workarounds.

Notion, Rakuten, and Sentry are already running production workloads on Managed Agents. Rakuten deployed enterprise agents across sales, marketing, and finance, integrated with Slack and Teams, in a single week. At $0.08 per session-hour plus standard token costs, a 4-6 hour agent session runs roughly $1.50-$3.50.

You don’t need Anthropic’s hosted service to apply the architecture. Externalize your session. Make your harness stateless. Treat sandboxes as disposable. That works with any agent framework. Across the 7 sources I read covering the launch, the technical community seems more excited about the pattern than the product.

The Many Minds

Where this leaves us

The 1970s solved a version of this problem. Operating systems virtualized hardware into abstractions — process, file — general enough for programs that didn’t exist yet. The read() system call works the same whether it’s hitting a 1970s disk pack or a modern SSD.

Anthropic is betting that agent infrastructure needs the same treatment. Virtualize the components. Make the interfaces stable. Let the implementations change underneath without breaking everything above.

It’s a 50-year-old idea. I think they’re right that it’s the correct one for agents.

If you’re building agents and don’t want to deal with the infrastructure, Augmi deploys AI agents with one click. No container management, no credential plumbing. Start building at augmi.world.


Based on 7 sources including Anthropic’s engineering blog, The Register, The Decoder, TechRadar, Epsilla, DEV Community, and BuildFastWithAI.
