- Claude Opus 4.8 writes its own orchestration scripts and runs hundreds of agents in parallel. You describe the goal; Claude designs the workflow, dispatches the workers, and cross-checks the results.
- Opus 4.8 is roughly 4x less likely than 4.7 to let code flaws pass unflagged, scores 84% on Online-Mind2Web, and leads the Legal Agent Benchmark. Fast mode runs at 2.5x speed and costs 3x less than previous fast modes; base pricing stays at $5 input and $25 output per million tokens.
- Dynamic workflows fix three documented failure modes: agentic laziness, self-preferential bias, and goal drift. The plan moves out of the context window and into a script the runtime executes.
Describe the goal. Claude designs the factory. That is the real story of Claude Opus 4.8 and dynamic workflows: tell Claude "refactor all 500 files in this codebase to use the new auth pattern" or "scan every API endpoint for security vulnerabilities," and it writes the plan, splits the work, dispatches the workers, cross-checks the results, and brings you a verified answer.
On the surface, Anthropic's release looked like a developer feature: Claude can now write orchestration scripts that run hundreds of agents in parallel. What it actually represents is bigger. For the first time, a model is intelligent enough to design its own work structure on the fly instead of relying on a human to predefine how the task should be broken down.
What Is New in Opus 4.8
Before getting to dynamic workflows, it is worth understanding the model underneath them. Opus 4.8 is a meaningful step up from 4.7 across several dimensions that matter in production.
Benchmark improvements. Opus 4.8 holds the highest score on the Legal Agent Benchmark and is the first model to break 10% on its all-pass standard, the hardest version of that test. It scores 84% on Online-Mind2Web, a browser automation benchmark built on real production tasks rather than synthetic setups, which is a clear improvement over 4.7. Scores on Terminal-Bench 2.1, which covers coding tasks, also improved.
Honesty and reliability. The reliability improvement is arguably more important than the benchmark numbers. Opus 4.8 is approximately 4x less likely than 4.7 to let code flaws pass without flagging them. It proactively surfaces uncertainties rather than producing confident-sounding answers that turn out to be wrong. It also shows lower rates of misaligned behaviour across safety benchmarks.
Speed and cost. Fast mode operates at 2.5x speed and costs 3x less than previous fast modes. Pricing for the base model is unchanged: $5 per million input tokens, $25 per million output tokens.
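As a rough worked example of the base pricing above, the cost of a run can be estimated from its token counts. This is a minimal Python sketch: the function name and the token counts are illustrative assumptions, and only the two per-million rates come from the text.

```python
# Base rates quoted in the text, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the base-model cost of a run from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical run: 2M input tokens and 400k output tokens.
print(f"${estimate_cost_usd(2_000_000, 400_000):.2f}")  # prints $20.00
```

Output tokens cost five times as much as input tokens, so a run that generates a lot of text is dominated by the output term.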
The Problem Dynamic Workflows Solve
To understand why dynamic workflows matter, you first need to understand why single-agent approaches fall apart on large tasks. This is not a hardware problem or a cost problem. It is a structural problem, and it shows up in three specific failure modes documented in the official Anthropic release notes.
Agentic laziness. Claude stops before finishing a complex multi-part task and declares the job done after partial progress. For example, it reviews 20 of 50 security issues and reports back, and the other 30 are never touched.
Self-preferential bias. Claude prefers its own results when asked to verify them. If you ask the same instance that did the work to check the work, it tends to agree with itself, even when it was wrong. The review is not independent.
Goal drift. Over a very long session, the original objective gets compressed and distorted as the context window fills. Edge-case requirements, constraints, and specific formatting rules gradually disappear from working memory.
These are not edge cases. They are the predictable outcome of asking one agent to hold a large plan in its context window while simultaneously executing it. The plan and the execution compete for the same limited space.
Dynamic workflows address all three by moving the plan out of Claude's context window and into a JavaScript script that the runtime executes. Different agents are assigned different subtasks, each with its own context window, so no single agent has to finish everything. An adversarial agent can be assigned to check another agent's work without knowing it is reviewing a peer. And the goal cannot drift when it lives in the script rather than in the context.
How Dynamic Workflows Actually Work
A dynamic workflow is a JavaScript script that Claude writes on the fly for your specific task. You do not write the script. You describe what you want. Claude reads the task, designs the orchestration, writes the script, and then the runtime executes it, running dozens or hundreds of agents in parallel while your session stays free.
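To make the "plan lives in the script" idea concrete, here is a minimal sketch. It is written in Python purely for illustration (the article describes the real scripts as JavaScript), and `run_agent`, `run_workflow`, and the subtask strings are hypothetical stand-ins, not part of any Anthropic API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def run_agent(subtask: str) -> Optional[str]:
    """Hypothetical stand-in for one worker agent with its own fresh context."""
    return f"done: {subtask}"


def run_workflow(plan: List[str], max_workers: int = 8) -> Dict[str, Optional[str]]:
    """Run every subtask in the plan and return results keyed by subtask."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        results = dict(zip(plan, pool.map(run_agent, plan)))

    # Completeness check: the script, not an agent, decides when the job is done.
    missing = [subtask for subtask, result in results.items() if result is None]
    if missing:
        raise RuntimeError(f"unfinished subtasks: {missing}")
    return results


# The plan is plain data in the script, outside any agent's context window.
plan = [f"refactor file_{i}.py to the new auth pattern" for i in range(5)]
results = run_workflow(plan)
print(len(results))  # prints 5
```

Because the list of subtasks is ordinary data, the script can check that every item has a result before reporting, which is the kind of check a single agent tends to skip.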
The key difference from how AI agents worked before:
| Old approach | Dynamic workflow |
|---|---|
| Claude holds the plan in its context | Plan lives in the script |
| All results land in Claude's context window | Results stored in script variables |
| Scale: a few delegated tasks per turn | Scale: dozens to hundreds of agents per run |
| If interrupted, restarts from scratch | Resumable from where it stopped |
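
One way the "resumable" row could work is by checkpointing results as they arrive. The sketch below is an assumption about the general technique, not a description of Anthropic's runtime: `run_agent`, `run_resumable`, and the JSON checkpoint file are all hypothetical, and Python is used only for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def run_agent(subtask: str) -> str:
    """Hypothetical stand-in for one worker agent."""
    return f"done: {subtask}"


def run_resumable(subtasks: List[str], checkpoint_path: str) -> Tuple[Dict[str, str], int]:
    """Run subtasks, skipping any already recorded in the checkpoint file.

    Returns the full results and how many subtasks were executed this time.
    """
    results: Dict[str, str] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            results = json.load(f)

    executed = 0
    for subtask in subtasks:
        if subtask in results:
            continue  # finished in an earlier run
        results[subtask] = run_agent(subtask)
        executed += 1
        # Persist after every subtask so an interruption loses at most one.
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            json.dump(results, f)
    return results, executed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    _, first = run_resumable(["a", "b"], path)             # simulates an interrupted run
    _, second = run_resumable(["a", "b", "c", "d"], path)  # resumes and finishes
    print(first, second)  # prints 2 2
```

The second call does only the two subtasks the first call never reached, instead of starting over.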
To trigger a dynamic workflow, you include the word "workflow" in your prompt. Claude detects it, writes the script, and begins execution. You can watch it run with /workflows, which shows each phase, the agent count, token spend, and elapsed time.
For maximum effort mode, set /effort ultracode. Claude then decides on its own when a task warrants a full workflow rather than single-agent execution.
What You Can Actually Do With This
This is where it gets concrete. The following examples are drawn directly from the Anthropic documentation and early user reports, not cherry-picked demos.
What you can describe and Claude will execute
- "Run a workflow to audit every API endpoint for missing auth checks." Claude reviews the entire codebase in parallel, not sequentially.
- "Use a workflow to rename our User model to Account everywhere." A 500-file migration, cross-checked for consistency.
- "Go through my last 50 sessions and mine them for corrections I keep making, turn recurring ones into CLAUDE.md rules." Learns from your own behaviour across sessions.
- "Go through #incidents in Slack for the past 6 months and find recurring root causes where nobody has filed a ticket." Cross-functional signal mining.
- "Take my business plan and run a workflow where different agents tear it apart from an investor's, a customer's, and a competitor's perspective." Adversarial review from independent angles.
- "Here's a folder of 80 resumes. Use a workflow to rank them for the backend role and double-check the top ten." Structured evaluation with verification.
The Patterns Claude Uses to Build Workflows
Claude does not invent a new structure for every task. It draws on a small set of orchestration patterns, choosing the one that fits. Understanding these patterns helps you describe tasks in ways that produce better workflows.
Classify-and-act. Claude first classifies items (which files need migration, which tickets are security-related, which resumes meet the criteria) and then sends only the relevant items to the appropriate agents. This avoids wasting compute on work that does not need it. Think of it as triage before execution.
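A minimal Python sketch of classify-and-act, under stated assumptions: the file-extension rule in `classify` and the two handlers are hypothetical placeholders for what would be agent calls in a real workflow.

```python
from typing import List


def classify(path: str) -> str:
    """Hypothetical triage rule: decide which handler, if any, an item needs."""
    if path.endswith(".py"):
        return "migrate"
    if path.endswith(".md"):
        return "update_docs"
    return "skip"


def migrate(path: str) -> str:
    """Hypothetical handler for source files."""
    return f"migrated {path}"


def update_docs(path: str) -> str:
    """Hypothetical handler for documentation files."""
    return f"updated docs in {path}"


HANDLERS = {"migrate": migrate, "update_docs": update_docs}


def classify_and_act(items: List[str]) -> List[str]:
    """Triage every item, then run a handler only on the items that need one."""
    outcomes = []
    for item in items:
        handler = HANDLERS.get(classify(item))
        if handler is not None:  # "skip" items never reach an agent
            outcomes.append(handler(item))
    return outcomes


print(classify_and_act(["auth.py", "README.md", "logo.png"]))
# prints ['migrated auth.py', 'updated docs in README.md']
```

The cheap classification step runs on everything, while the expensive step runs only where it is needed.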
Fan-out and aggregate. Claude fans out the same task to dozens of agents working in parallel, then aggregates their findings into a single verified report. What would take a single agent hours takes minutes when 50 agents work simultaneously. The aggregate step is not just concatenation: Claude synthesises and deduplicates before surfacing results.
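A minimal Python sketch of fan-out and aggregate. The `scan_endpoint` worker and its string-matching rules are hypothetical; in a real workflow each call would be a separate agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def scan_endpoint(endpoint: str) -> List[str]:
    """Hypothetical worker: return the findings for one endpoint."""
    findings = []
    if "admin" in endpoint:
        findings.append("missing auth check")
    if endpoint.endswith("/export"):
        findings.append("no rate limit")
    return findings


def fan_out_and_aggregate(endpoints: List[str]) -> Dict[str, List[str]]:
    """Scan endpoints in parallel, then group the endpoints by finding."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Fan out: one worker call per endpoint, results in input order.
        per_endpoint = list(pool.map(scan_endpoint, endpoints))

    # Aggregate: merge identical findings instead of concatenating reports.
    report: Dict[str, List[str]] = {}
    for endpoint, findings in zip(endpoints, per_endpoint):
        for finding in findings:
            report.setdefault(finding, []).append(endpoint)
    return report


report = fan_out_and_aggregate(["/admin/users", "/admin/export", "/health"])
print(report)
# prints {'missing auth check': ['/admin/users', '/admin/export'], 'no rate limit': ['/admin/export']}
```

Grouping by finding rather than by endpoint is one simple form of the deduplication described above: each distinct problem appears once, with every affected endpoint listed under it.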
Adversarial review. A separate agent, one that did not do the original work, reviews the findings before they are reported. This directly addresses the self-preferential bias problem. The reviewer does not know it is checking a peer's output. It sees only the work product and the task criteria.
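A minimal Python sketch of the blind-review idea, assuming a hypothetical `Submission` record and a deliberately simple reviewer rule. The point it illustrates is structural: the reviewer's inputs contain the work product and the criteria, but nothing about who produced the work.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Submission:
    author: str    # which agent produced the work
    claim: str     # the work product itself
    evidence: str  # what the agent says it checked


def blind_review(claim: str, evidence: str, criteria: List[str]) -> bool:
    """Hypothetical reviewer: accept only if the evidence covers every criterion.

    The signature takes no author field, so the reviewer cannot know
    whose work it is checking.
    """
    return all(criterion in evidence for criterion in criteria)


def review_all(submissions: List[Submission], criteria: List[str]) -> Dict[str, bool]:
    """Pass each submission to the reviewer with the author stripped out."""
    return {s.claim: blind_review(s.claim, s.evidence, criteria) for s in submissions}


submissions = [
    Submission("agent-1", "endpoint /pay is safe", "checked auth; checked input validation"),
    Submission("agent-2", "endpoint /export is safe", "checked auth"),
]
verdicts = review_all(submissions, criteria=["auth", "input validation"])
print(verdicts)
# prints {'endpoint /pay is safe': True, 'endpoint /export is safe': False}
```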
Tournament. Multiple agents produce independent solutions to the same problem, then a judging agent picks the best. This is used for naming, design decisions, or any task where you want genuine independent perspectives rather than one attempt iterated upon. You get diversity of approach, not diversity of phrasing.
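A minimal Python sketch of the tournament pattern. The contestant (`propose_name`), the candidate names, and the judging rule are all hypothetical; a real workflow would use separate agents for both roles.

```python
def propose_name(seed: int) -> str:
    """Hypothetical contestant agent: each seed yields an independent proposal."""
    candidates = ["AuthGate", "Keyring", "AccessPoint", "Sentinel"]
    return candidates[seed % len(candidates)]


def judge(proposal: str) -> int:
    """Hypothetical judging rule: prefer shorter names (lower score wins)."""
    return len(proposal)


def tournament(num_contestants: int) -> str:
    """Collect independent proposals, then let the judge pick one winner."""
    proposals = [propose_name(seed) for seed in range(num_contestants)]
    return min(proposals, key=judge)


print(tournament(4))  # prints Keyring
```

The contestants never see each other's proposals, and the judge is a separate function from any contestant, which is what distinguishes this from one agent revising its own draft.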
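The contrast with conventional orchestration is simple. A conventional workflow is predefined, generic, and fixed. A dynamic workflow is written live, custom to the task, and resumable.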
What This Means for a Business
Dan Shipper at Every.to reviewed Opus 4.8 and said Anthropic should have rounded up to 5. The model is that significant a step. The combination of improved honesty (4x less likely to let errors pass) and dynamic workflows changes what a small team can actually accomplish.
The previous ceiling for AI deployment was roughly: one agent, one context window, one task at a time. The new ceiling is: describe the goal, let Claude design the factory, run hundreds of workers in parallel, get a cross-checked result.
For a company deploying this in production, that shift is not incremental. Consider what it unlocks:
- A compliance audit that previously required a specialist team to manually review hundreds of documents can be orchestrated as a workflow: parallel review, adversarial cross-check, synthesised findings.
- A legacy codebase migration, the kind that stalls engineering teams for months, can be broken into parallel tracks and executed in days, with consistency verification built in.
- A research project that needs dozens of sources cross-checked can produce a verified report rather than a single agent's best guess: different agents, independent reads, one synthesised output.
The bottleneck shifts. It is no longer "can the model handle this task?" It is "do we have the right data access, permission layers, and architecture to feed the workflow what it needs?" That is a systems question, not a model question.
"The workflow needs to be redesigned for agents, not for people. Half your data state is not even ready." — Aaron Levie, Box CEO
Levie said this about agentic deployment generally, but it is even more true now. Dynamic workflows amplify the gap between companies that have their data architecture ready and companies that do not. A workflow that cannot reach your live systems is just an expensive demo.