Key takeaways
  • Claude Opus 4.8 writes its own orchestration scripts and runs hundreds of agents in parallel. You describe the goal; Claude designs the workflow, dispatches the workers, and cross-checks the results.
  • Opus 4.8 is roughly 4x less likely than 4.7 to let code flaws pass unflagged, scores 84% on Online-Mind2Web, and leads the Legal Agent Benchmark. Fast mode runs at 2.5x speed and 3x lower cost; base pricing stays at $5 input and $25 output per million tokens.
  • Dynamic workflows fix three documented failure modes: agentic laziness, self-preferential bias, and goal drift. The plan moves out of the context window and into a script the runtime executes.

Describe the goal. Claude designs the factory. That is the real story of Claude Opus 4.8 and dynamic workflows: tell Claude "refactor all 500 files in this codebase to use the new auth pattern" or "scan every API endpoint for security vulnerabilities," and it writes the plan, splits the work, dispatches the workers, cross-checks the results, and brings you a verified answer.

On the surface, Anthropic's release looked like a developer feature: Claude can now write orchestration scripts that run hundreds of agents in parallel. What it actually represents is bigger. For the first time, a model is intelligent enough to design its own work structure on the fly instead of relying on a human to predefine how the task should be broken down.

What Is New in Opus 4.8

Before getting to dynamic workflows, it is worth understanding the model underneath them. Opus 4.8 is a meaningful step up from 4.7 across several dimensions that matter in production.

Benchmark improvements. Opus 4.8 now holds the highest score on the Legal Agent Benchmark. It is the first model to break 10% on the all-pass standard, the hardest version of that test. It scores 84% on Online-Mind2Web, a browser automation benchmark built on real production tasks rather than synthetic setups. That is a meaningful jump from 4.7. Scores on Terminal-Bench 2.1 for coding tasks also improved.

Honesty and reliability. The reliability improvement is arguably more important than the benchmark numbers. Opus 4.8 is approximately 4x less likely than 4.7 to let code flaws pass without flagging them. It proactively surfaces uncertainties rather than producing confident-sounding answers that turn out to be wrong. It also shows lower rates of misaligned behaviour across safety benchmarks.

Speed and cost. Fast mode operates at 2.5x speed and costs 3x less than previous fast modes. Pricing for the base model is unchanged: $5 per million input tokens, $25 per million output tokens.
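To make the pricing concrete, here is a minimal sketch of the arithmetic in JavaScript. The two rates are the base-model figures quoted above. The workload (100 agents, each reading 40,000 tokens and writing 4,000) is an invented example rather than a measurement, and fast mode is left out because the article gives no dollar figure for it.

```javascript
// Base prices quoted above, in US dollars per million tokens.
const INPUT_USD_PER_MTOK = 5;
const OUTPUT_USD_PER_MTOK = 25;

// Estimate the base-model cost of a run from its total token counts.
function estimateCostUsd(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
  return inputCost + outputCost;
}

// Hypothetical workload: 100 agents, 40,000 input and 4,000 output tokens each.
const agentCount = 100;
const totalUsd = estimateCostUsd(agentCount * 40_000, agentCount * 4_000);

// 4M input tokens cost $20, 0.4M output tokens cost $10.
console.log(totalUsd.toFixed(2)); // prints "30.00"
```

Output tokens cost five times as much as input tokens, so in a sketch like this the length of each agent's report moves the bill more than the amount of code it reads.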

Opus 4.8 by the numbers
  • 4x more honest. 4x less likely than 4.7 to let code flaws pass without flagging them. Proactively surfaces uncertainties.
  • 84% browser automation. Online-Mind2Web score. A meaningful jump. Real production browser tasks, not synthetic benchmarks.
  • 2.5x faster. Fast mode, at 3x lower cost than the previous fast mode. Same quality, meaningfully lower inference spend.
  • Legal Agent #1. First model to break 10% on the all-pass standard for the Legal Agent Benchmark. Sets a new category high.

The Problem Dynamic Workflows Solve

To understand why dynamic workflows matter, you first need to understand why single-agent approaches fall apart on large tasks. This is not a hardware problem or a cost problem. It is a structural problem, and it shows up in three specific failure modes documented in the official Anthropic release notes.

Failure mode 1
Agentic laziness

Claude stops before finishing a complex multi-part task and declares the job done after partial progress. It reviewed 20 of the 50 security issues and reported back. The other 30 were never touched.

Failure mode 2
Self-preferential bias

Claude prefers its own results when asked to verify them. If you ask the same instance that did the work to check the work, it tends to agree with itself, even when it was wrong. The review is not independent.

Failure mode 3
Goal drift

Over a very long session, the original objective gets compressed and distorted as the context window fills. Edge-case requirements, constraints, and specific formatting rules gradually disappear from working memory.

These are not edge cases. They are the predictable outcome of asking one agent to hold a large plan in its context window while simultaneously executing it. The plan and the execution compete for the same limited space.

Dynamic workflows address all three by moving the plan out of Claude's context window and into a JavaScript script that the runtime executes. Different agents are assigned different subtasks, each with its own context window. An adversarial agent can be assigned to check another agent's work without knowing it is reviewing a peer. Goal drift has far less room to occur when the goal lives in the script rather than in the context.
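A rough sketch of what "the plan lives in the script" could look like. This is not the script Claude generates, and it is not Anthropic's runtime API: `runAgent` is a hypothetical function standing in for whatever starts one agent with a fresh context, and the plan contents are made up. The point it illustrates is that the work list and the completeness check sit in ordinary code, so no single agent gets to declare the job done early.

```javascript
// The goal and the work list live in the script, not in any agent's context.
// The contents are invented for illustration.
const examplePlan = {
  goal: "Flag endpoints that are missing auth checks",
  items: ["GET /users", "POST /orders", "DELETE /sessions"],
};

// runAgent is a hypothetical async function: it receives { goal, item },
// starts one agent with its own context, and resolves to that agent's output.
async function executePlan(plan, runAgent) {
  // Every worker gets the goal verbatim from the script, plus exactly one item.
  const settled = await Promise.allSettled(
    plan.items.map((item) => runAgent({ goal: plan.goal, item }))
  );

  const done = [];
  const unfinished = [];
  settled.forEach((outcome, index) => {
    const item = plan.items[index];
    if (outcome.status === "fulfilled" && outcome.value != null) {
      done.push({ item, output: outcome.value });
    } else {
      // A crash or an empty answer counts as "not done".
      unfinished.push(item);
    }
  });

  // The script, not an agent, reports which items still need work.
  return { done, unfinished };
}
```

Calling `executePlan(examplePlan, runAgent)` with any async function in place of `runAgent` returns both lists, so a caller can retry `unfinished` instead of trusting a summary that says everything was covered.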

How Dynamic Workflows Actually Work

A dynamic workflow is a JavaScript script that Claude writes on the fly for your specific task. You do not write the script. You describe what you want. Claude reads the task, designs the orchestration, writes the script, and then the runtime executes it, running dozens or hundreds of agents in parallel while your session stays free.

The key difference from how AI agents worked before:

Old approach | Dynamic workflow
Claude holds the plan in its context | Plan lives in the script
All results land in Claude's context window | Results stored in script variables
Scale: a few delegated tasks per turn | Scale: dozens to hundreds of agents per run
If interrupted, restarts from scratch | Resumable from where it stopped
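Resumability, the last point in the comparison above, is the easiest one to picture in code. The following is a minimal sketch under stated assumptions, not a description of how the real runtime stores progress: `worker` stands in for one agent call, `checkpoint` is a plain object mapping each item to its result, and `save` is a hypothetical function that persists that object somewhere durable.

```javascript
// Process items in order, skipping any that an earlier run already finished.
// worker, checkpoint and save are illustrative stand-ins (see the text above).
async function runResumable(items, worker, checkpoint, save) {
  for (const item of items) {
    const alreadyDone = Object.prototype.hasOwnProperty.call(checkpoint, item);
    if (alreadyDone) continue;

    checkpoint[item] = await worker(item);
    // Persist after every item so an interruption loses at most one result.
    await save(checkpoint);
  }
  return checkpoint;
}
```

If the process dies halfway through, calling `runResumable` again with the saved checkpoint only dispatches the items that have no recorded result. The sketch runs items one at a time to keep the idea visible; a parallel version would save as each agent finishes.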

To trigger a dynamic workflow, you include the word "workflow" in your prompt. Claude detects it, writes the script, and begins execution. You can watch it run using /workflows: each phase, agent count, token spend, and elapsed time.

For maximum effort mode, set /effort ultracode. Claude then decides on its own when a task warrants a full workflow rather than single-agent execution.

What You Can Actually Do With This

This is where it gets concrete. The following examples are drawn directly from the Anthropic documentation and early user reports, not cherry-picked demos.

Dynamic workflow examples

What you can describe and Claude will execute

  • "Run a workflow to audit every API endpoint for missing auth checks." Claude reviews the entire codebase in parallel, not sequentially.
  • "Use a workflow to rename our User model to Account everywhere." A 500-file migration, cross-checked for consistency.
  • "Go through my last 50 sessions and mine them for corrections I keep making, turn recurring ones into CLAUDE.md rules." Learns from your own behaviour across sessions.
  • "Go through #incidents in Slack for the past 6 months and find recurring root causes where nobody has filed a ticket." Cross-functional signal mining.
  • "Take my business plan and run a workflow where different agents tear it apart from an investor's, a customer's, and a competitor's perspective." Adversarial review from independent angles.
  • "Here's a folder of 80 resumes. Use a workflow to rank them for the backend role and double-check the top ten." Structured evaluation with verification.

The Patterns Claude Uses to Build Workflows

Claude does not invent a new structure for every task. It draws on a small set of orchestration patterns, choosing the one that fits. Understanding these patterns helps you describe tasks in ways that produce better workflows.

Classify-and-act. Claude first classifies items (which files need migration, which tickets are security-related, which resumes meet the criteria) and then sends only the relevant items to the appropriate agents. This avoids wasting compute on work that does not need it. Think of it as triage before execution.
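As a sketch of the triage idea, assuming nothing about the scripts Claude actually writes: `classify` and the functions stored in `handlers` are hypothetical stand-ins for agent calls, and the labels are whatever the task needs.

```javascript
// classify(item) resolves to a label; handlers is a Map from label to an
// async function that does the real work. Both are illustrative stand-ins.
async function classifyAndAct(items, classify, handlers) {
  // Step 1: a cheap pass that labels every item in parallel.
  const labels = await Promise.all(items.map((item) => classify(item)));

  // Step 2: only items whose label has a handler are sent on for real work.
  const jobs = [];
  const skipped = [];
  items.forEach((item, index) => {
    const handler = handlers.get(labels[index]);
    if (handler) {
      jobs.push(handler(item));
    } else {
      skipped.push(item);
    }
  });

  const acted = await Promise.all(jobs);
  return { acted, skipped };
}
```

Returning `skipped` alongside `acted` keeps the triage auditable: you can see which items were judged irrelevant rather than having them silently vanish.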

Fan-out and aggregate. Claude fans out the same task to dozens of agents working in parallel, then aggregates their findings into a single verified report. What would take a single agent hours takes minutes when 50 agents work simultaneously. The aggregate step is not just concatenation: Claude synthesises and deduplicates before surfacing results.
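A minimal sketch of the fan-out and aggregate shape, with two loud assumptions. First, `worker` is a hypothetical stand-in for one agent call that resolves to a list of findings. Second, the aggregate step here is plain mechanical deduplication by a key you choose; the synthesis Claude performs is richer than that and is not modelled.

```javascript
// Run worker over every item with at most `limit` calls in flight at once.
// limit is assumed to be a positive integer.
async function fanOut(items, worker, limit) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unclaimed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as items
}

// Merge the per-agent finding lists, keeping the first finding for each key.
function aggregate(findingLists, keyOf) {
  const unique = new Map();
  for (const finding of findingLists.flat()) {
    const key = keyOf(finding);
    if (!unique.has(key)) unique.set(key, finding);
  }
  return [...unique.values()];
}
```

The `limit` parameter is a design choice of this sketch rather than something the article specifies. Capping the number of calls in flight is a common way to stay inside rate limits while still finishing far sooner than a sequential loop.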

Adversarial review. A separate agent, one that did not do the original work, reviews the findings before they are reported. This directly addresses the self-preferential bias problem. The reviewer does not know it is checking a peer's output. It sees only the work product and the task criteria.
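One way to picture the "sees only the work product and the task criteria" rule is a function that builds the reviewer's input and leaves everything else out. This is an illustrative sketch: `reviewer` is a hypothetical stand-in for a separate agent that resolves to `{ pass, notes }`, and the field names are invented.

```javascript
// Build the reviewer's input from the work product and the criteria only.
// The author's identity and reasoning are deliberately not copied across.
function buildReviewRequest(submission, criteria) {
  return { criteria, workProduct: submission.output };
}

// reviewer(request) is a hypothetical async call to a separate agent that
// resolves to { pass: boolean, notes: string }.
async function adversarialReview(submissions, criteria, reviewer) {
  const verdicts = await Promise.all(
    submissions.map((submission) =>
      reviewer(buildReviewRequest(submission, criteria))
    )
  );

  return submissions.map((submission, index) => ({
    item: submission.item,
    // Anything other than an explicit pass is treated as a rejection.
    accepted: verdicts[index].pass === true,
    notes: verdicts[index].notes,
  }));
}
```

Because the request is assembled in one place, it is easy to check that no author metadata leaks to the reviewer, which is what keeps the review independent in this sketch.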

Tournament. Multiple agents produce independent solutions to the same problem, then a judging agent picks the best. This is used for naming, design decisions, or any task where you want genuine independent perspectives rather than one attempt iterated upon. You get diversity of approach, not diversity of phrasing.
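The tournament pattern can be sketched in a few lines. As with the other sketches, this is an assumption-laden illustration: each entry in `contestants` is a hypothetical async function standing in for an independent agent, and `judge` stands in for a judging agent that returns the index of the entry it prefers.

```javascript
// contestants: async functions that each take the task and return an entry.
// judge: async function that takes (task, entries) and returns a winning index.
async function tournament(task, contestants, judge) {
  // Each contestant sees only the task, never another contestant's entry.
  const entries = await Promise.all(
    contestants.map((contestant) => contestant(task))
  );

  // The judge receives the entries without any contestant names attached.
  const winnerIndex = await judge(task, entries);
  const isValid =
    Number.isInteger(winnerIndex) &&
    winnerIndex >= 0 &&
    winnerIndex < entries.length;
  if (!isValid) {
    throw new Error("Judge returned an invalid index: " + winnerIndex);
  }

  return { winner: entries[winnerIndex], entries };
}
```

Returning every entry along with the winner lets you inspect the runners-up, which is often where the useful alternatives are in naming and design tasks.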

Aspect | Static workflow (old approach) | Dynamic workflow (Opus 4.8)
Summary | Predefined. Generic. Fixed. | Written live. Custom. Resumable.
Structure | You define the structure in advance | Claude writes the structure for your specific task on the fly
Best for | Well-specified, repeatable tasks | Novel, complex, large-scale tasks
Design | Generic. Built to handle all edge cases | Custom-built. Optimised for your exact use case
Adaptability | Cannot adapt to the specific shape of your task | Adapts to what it discovers as it works
If interrupted | Rigid once written | Resumable from where it stopped

What This Means for a Business

Dan Shipper at Every.to reviewed Opus 4.8 and said Anthropic should have rounded up to 5. The model is that significant a step. The combination of improved honesty (4x less likely to let errors pass) and dynamic workflows changes what a small team can actually accomplish.

The previous ceiling for AI deployment was roughly: one agent, one context window, one task at a time. The new ceiling is: describe the goal, let Claude design the factory, run hundreds of workers in parallel, get a cross-checked result.

For a company deploying this in production, that shift is not incremental.

The bottleneck shifts. It is no longer "can the model handle this task?" It is "do we have the right data access, permission layers, and architecture to feed the workflow what it needs?" That is a systems question, not a model question.

"The workflow needs to be redesigned for agents, not for people. Half your data state is not even ready." — Aaron Levie, Box CEO

Levie said this about agentic deployment generally, but it is even more true now. Dynamic workflows amplify the gap between companies that have their data architecture ready and companies that do not. A workflow that cannot reach your live systems is just an expensive demo.

Sources
1. Anthropic, Claude Opus 4.8 release announcement, June 2026. Benchmarks, pricing, and capability overview.
2. Anthropic, Orchestrate subagents at scale with dynamic workflows, Claude Code documentation. Full technical reference for dynamic workflow patterns.
3. @trq212, A harness for every task: dynamic workflows in Claude Code, X, June 2026. Early user patterns and workflow failure modes.
4. @ClaudeDevs, Dynamic workflows launch announcement, X, May 2026.
5. Dan Shipper, "Vibe Check: Opus 4.8 - Anthropic Should've Rounded Up to 5", Every.to, June 2026. Review of Opus 4.8 capabilities.
6. Steve Yegge, The Anthropic Hive Mind, Medium, February 2026. On Anthropic's culture and pace of development.

John Tan

Fractional AI & Product Founder at nativefirst.ai. Ex-CEO, Depict (Y Combinator). Embeds on-site with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.