Anthropic is giving Claude agents something closer to a work diary than a bigger brain: a new “Dreaming” system that reviews past sessions, spots repeated mistakes, and turns the lessons into playbooks for future tasks. The feature is part of Claude Managed Agents, and it is meant to help Claude improve at the same kinds of tasks over time.
The company unveiled the feature at its Code with Claude conference in San Francisco, alongside two other tools now in public beta: Outcomes, which checks work against quality criteria, and Multi-Agent Orchestration, which splits complex jobs across specialist agents. Dreaming is the headliner because it tries to solve a problem that keeps showing up in real deployments: long agent sessions can drift, degrade, and quietly get worse the more they do.
How Dreaming works inside Claude Managed Agents
Dreaming does not retrain the model or rewrite its weights. Instead, it runs as a separate background process that periodically combs through previous sessions, looks for recurring patterns, failures, and successful tactics, and converts them into notes and structured instructions. In Anthropic’s framing, the agent is not learning like a neural network so much as building its own operating manual.
That distinction matters. Memory tools usually store conversation context or user preferences; this is more like postmortem analysis. Anthropic’s pitch is that an agent can gradually become better at the same class of task without the expensive, risky loop of model retraining every time it stumbles.
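To make the mechanism concrete, here is a minimal Python sketch of that kind of postmortem pass. It is not Anthropic’s implementation, and nothing in it calls a real Claude API: the `Step` record, the `build_playbook` function, the `min_repeats` threshold, and the sample drone-landing actions are all hypothetical names chosen for illustration. The sketch only counts which actions keep failing or keep working across past sessions and writes them down as plain-text notes, leaving any model untouched.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One logged step from a past session (hypothetical record format)."""
    action: str      # what the agent tried
    succeeded: bool  # whether that step worked


def build_playbook(sessions: List[List[Step]], min_repeats: int = 2) -> List[str]:
    """Turn patterns that recur across sessions into plain-text notes."""
    failures: Counter = Counter()
    successes: Counter = Counter()
    for session in sessions:
        # Count each action once per session so one noisy run cannot dominate.
        for action in {s.action for s in session if not s.succeeded}:
            failures[action] += 1
        for action in {s.action for s in session if s.succeeded}:
            successes[action] += 1

    notes = []
    for action, count in sorted(failures.items()):
        if count >= min_repeats:
            notes.append(f"AVOID: {action} (failed in {count} sessions)")
    for action, count in sorted(successes.items()):
        # Only recommend tactics that never failed in the reviewed sessions.
        if count >= min_repeats and failures[action] == 0:
            notes.append(f"PREFER: {action} (worked in {count} sessions)")
    return notes


# Illustrative session logs; the actions are invented for this example.
sessions = [
    [Step("descend without terrain scan", False), Step("hover before touchdown", True)],
    [Step("descend without terrain scan", False), Step("hover before touchdown", True)],
    [Step("retry with cached map", True)],
]
playbook = build_playbook(sessions)
```

In this toy version the playbook is just a list of strings that could be prepended to a future task’s instructions. The design point it illustrates is the one the article describes: the lessons live in notes outside the model, so improving the agent does not require touching its weights.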
The Lumara demo shows the pitch in miniature
To make the idea less abstract, Anthropic demoed Dreaming with a fictional aerospace startup called Lumara, building autonomous drones meant to land on the Moon. Several agents handled different parts of the job: landing site selection, navigation, and mission success. After a run of mediocre simulations, Dreaming was switched on overnight and produced a detailed landing playbook. The next day, the simulation results improved.
The demo is obviously staged, but the underlying strategy is familiar across the AI industry. OpenAI, Google, and others have all been pushing agentic systems toward longer-running workflows with tools, sub-agents, and verification layers. The difference here is that Anthropic is selling persistence: not just doing the task, but learning how to do that task better the next time.
Why Anthropic is leaning on multi-agent systems
Anthropic also argues that one agent checking its own work inside a long chain of reasoning is less reliable than a separate checker operating in a clean context window. That is a neat way of saying that AI, like people, gets sloppy when it tries to be planner, worker, and reviewer all at once. Multi-Agent Orchestration is meant to avoid that by letting a lead agent hand off sub-tasks to specialists with their own tools and memory limits.
- Dreaming: reviews old sessions and builds playbooks from mistakes and successes
- Outcomes: checks work against predefined quality standards
- Multi-Agent Orchestration: divides complex tasks across specialized agents
That push lines up with a broader industry trend: more modular systems, more verification, and less faith that a single giant prompt can carry the whole burden. It also reflects a practical constraint. When agents run for longer stretches, they are more likely to lose focus, miss edge cases, or optimistically declare victory over something they barely solved.
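The split between worker and reviewer can be sketched in a few lines of Python. This is an assumption-laden toy, not the Outcomes or Multi-Agent Orchestration API: `orchestrate`, `check_outcome`, the role names, and the single `non_empty` criterion are hypothetical, and the "specialists" are plain functions standing in for agents. What it shows is the structure: a lead routine hands each sub-task to a specialist, and a separate checker judges only the finished result against predefined criteria.

```python
from typing import Callable, Dict, List

Specialist = Callable[[str], str]   # takes a brief, returns a result
Criterion = Callable[[str], bool]   # takes a result, returns pass/fail


def orchestrate(subtasks: Dict[str, str],
                specialists: Dict[str, Specialist]) -> Dict[str, str]:
    """Lead agent: route each sub-task to the specialist registered for its role."""
    return {role: specialists[role](brief) for role, brief in subtasks.items()}


def check_outcome(result: str, criteria: Dict[str, Criterion]) -> List[str]:
    """Independent checker: sees only the result, never the worker's reasoning.

    Returns the names of the criteria that the result fails.
    """
    return [name for name, passes in criteria.items() if not passes(result)]


# Hypothetical specialists; the navigation one "declares victory" with nothing.
specialists = {
    "site_selection": lambda brief: f"site report for {brief}",
    "navigation": lambda brief: "",
}
subtasks = {"site_selection": "crater rim", "navigation": "descent path"}
criteria = {"non_empty": lambda result: bool(result.strip())}

results = orchestrate(subtasks, specialists)
failed = {role: check_outcome(output, criteria) for role, output in results.items()}
```

Here `failed` maps each role to the criteria it missed, so the empty navigation result is flagged even though the worker itself reported nothing wrong. Keeping the checker as a separate function with no access to the worker’s state is the code-level analogue of a reviewer in a clean context window.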
The real bet is autonomous work, not smarter chat
Anthropic CEO Dario Amodei said the company’s growth has been far faster than expected, with usage and revenue growing at an annualized rate of roughly 80 times rather than the tenfold pace the company had planned for, which is why compute remains tight. The company also said it is partnering with SpaceX to expand infrastructure through the Colossus data center. That is a reminder that the AI race is no longer just about benchmark scores; it is about who can keep agents online, productive, and profitable without falling apart mid-task.
If Dreaming works beyond the demo stage, the next fight will not be whether Claude can answer faster. It will be whether these agents can accumulate useful experience fast enough to justify being trusted with real production work, and whether that “sleep mode” turns into a durable advantage or just another clever layer on top of brittle automation.

