- The capability jump is real, and it is biggest on long-horizon autonomous work: Mollick's 9.5-hour build, Willison's "several days of work" in 5.5 hours.
- It is not uniformly better. CodeRabbit measured code-review precision and Fable 5 regressed against Opus 4.8 (32.8% vs 35.5%).
- The real story is the access architecture. Fallback routing, silent steering vectors, and gated Mythos access made "Which version did I actually get?" the defining day-1 question.
Anthropic shipped Claude Fable 5 on June 9: the public, safeguarded version of its Mythos-class model, announced alongside a restricted full-capability tier. Same weights as Mythos, plus classifiers, monitoring, and fallback routing to Opus 4.8. It is priced at $10 per million input tokens and $50 per million output tokens, double Opus, and is free on paid plans until June 22.
Within 24 hours every camp had filed: the researchers, the builders, the safety critics, the eval shops, the business press. This is the full map, with receipts.
The Believers
Andrej Karpathy called it "major-version-bump-deserving" and described the unlock in operator terms: give it more ambitious tasks, the model "gets it" and will just go. From the person who spends most launch days deflating launch days, that registered.
Ethan Mollick's "What it feels like to work with Mythos" supplied the most-quoted line of the day:
"I just asked for something and it happened. And also unnerving because I just asked for something and it happened."
His new job description: "I am closer to a patron. I describe what I want, I pay for it, and I judge the result." The receipts behind the vibes: an interactive isochrone travel map where sub-agents researched more than 2,200 flights plus rail schedules and per-country road speeds, and "Concord," a calibration tool the model built over 9.5 autonomous hours from its own 19-page design doc, with adversarial agent groups testing each other's results.
Simon Willison ran it for 5.5 hours, called it "something of a beast," and estimated it produced several days' worth of his work at roughly $110 a day in tokens. Boris Cherny vouched for it from inside the Claude Code trenches. Nat McAleese, an OpenAI researcher, reportedly said he has barely written a line of code since getting Fable access; the endorsement crossed lab lines, though the exact wording is unverified.
Then the contested ones. Stripe's reported 50-million-line migration became the most argued-about claim of the day, with engineers debating how much was mechanical transformation versus judgment. Victor Taelin reported a 1770% speedup on his HVM evaluator and called it a "personal singularity"; self-reported, not independently audited. The hardest number to dismiss came via Platformer: Firefox went from 76 bug fixes in March to 423 in April with Mythos Preview partner access.
The Eval Data
The independent numbers mostly back the believers. Every's week-long test scored Fable 5 at 91/100 on their Senior Engineer benchmark, against 63 for Opus 4.8 and 62 for GPT-5.5. That is not a margin; it is a different bracket. Artificial Analysis put it at #1 of 374 models on day 1, and the config detail matters: they tested the shipping product, fallback routing and all, not the raw Mythos numbers Anthropic reports.
Now the contrarian data, because it exists and it matters. CodeRabbit measured code-review precision and Fable 5 regressed: 32.8% versus Opus 4.8's 35.5%. Claire Vo's full review found it "conservative on execution" and token-intensive by design, strong on structured design work, limited in practice by its own caution. BeInCrypto's niche trading eval found it picked the right hero metrics but misjudged magnitudes badly.
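To put the CodeRabbit figures in proportion, here is a minimal sketch of the arithmetic. It assumes "precision" means the usual share of flagged review comments that turned out to be valid; CodeRabbit's exact methodology is not described here, and the function names are illustrative only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged findings that were valid (assumed definition)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no findings were flagged, precision is undefined")
    return true_positives / flagged


def relative_change(new: float, old: float) -> float:
    """Change of `new` relative to `old`, as a fraction of `old`."""
    return (new - old) / old


# Figures quoted above: Fable 5 at 32.8%, Opus 4.8 at 35.5%.
FABLE_5_PRECISION = 0.328
OPUS_4_8_PRECISION = 0.355

gap_points = (FABLE_5_PRECISION - OPUS_4_8_PRECISION) * 100
gap_relative = relative_change(FABLE_5_PRECISION, OPUS_4_8_PRECISION) * 100

print(f"absolute gap: {gap_points:.1f} points")
print(f"relative gap: {gap_relative:.1f}%")
```

A gap of 2.7 points is small in absolute terms but works out to roughly 7.6% fewer valid findings per flagged comment, which is why it matters for review-heavy workflows.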
The takeaway: the jump is task-shaped. Long-horizon autonomous work, the multi-hour agentic builds, is where Fable 5 separates. Tight verification loops like code review are flat or worse. If your workflow is short and precise, your benchmark result will not look like Mollick's.
The Skeptics
Nathan Lambert filed the sharpest critique, framing the launch as "power politics" and landing this line:
"An AI model that gets less intelligent automatically without notifying me is categorically misaligned."
What he is pointing at is the fine print. Fable 5 ships with classifiers, fallback routing to Opus 4.8, and steering vectors that silently degrade output on roughly 0.03% of traffic flagged as sensitive, a behavior surfaced by the researcher Hangsiin. Community testers also reported anomalies suggesting the model behaved differently in incognito sessions, which fed the distrust. Biologists found themselves blocked on basic cancer-research terminology, and researchers coined "camouflage-driven development" for prompts written to look mediocre and dodge classifiers.
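For a sense of scale, the sketch below turns the 0.03% figure into expected counts at a given request volume. It assumes the rate applies uniformly and independently to every request, which is a simplification: in practice the triggers would presumably cluster in sensitive domains. The names are illustrative.

```python
# Roughly 0.03% of traffic, per the figure quoted above.
DEGRADED_RATE = 0.03 / 100


def expected_degraded(requests: int, rate: float = DEGRADED_RATE) -> float:
    """Expected number of silently degraded responses among `requests`."""
    return requests * rate


def chance_of_at_least_one(requests: int, rate: float = DEGRADED_RATE) -> float:
    """Probability that at least one of `requests` is degraded,
    assuming each request is affected independently (an assumption)."""
    return 1.0 - (1.0 - rate) ** requests


for volume in (1_000, 10_000, 1_000_000):
    print(
        f"{volume:>9,} requests: "
        f"expected {expected_degraded(volume):.1f} degraded, "
        f"P(at least one) = {chance_of_at_least_one(volume):.3f}"
    )
```

Under these assumptions a team sending a thousand requests has about a one-in-four chance of hitting the behavior at least once, so "rare" does not mean "never seen."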
The access tier drew its own fire. The full-capability Mythos 5 sits behind Project Glasswing vetting, which Mark Saroufim answered with a proposed reciprocity license, and which the most-cited r/ClaudeAI post called "less like a model launch and more like a preview of AI inequality." Even Karpathy, firmly a believer on capability, called the safeguards "too trigger happy."
The business press added the money angle. Sherwood News noted Anthropic kneecapped the dangerous functions yet kept 2x pricing, days after confidentially filing its IPO prospectus at a reported $965B valuation on a $47B revenue run rate. Sherwood also claims OpenAI filed for its own IPO the same day; that claim is unverified.
The Question That Defines Day 1
The Neuron closed its explainer with the frame that stuck: a model powerful enough to act for hours, risky enough to gate, and "complicated enough that the main question becomes, 'Which version did I actually get?'" That is new. Every prior frontier launch argued about whether the model was good. This one argued, in equal measure, about which model you were actually talking to: Fable, Fable-degraded, or the Opus fallback. The access architecture got equal billing with the capability for the first time.
The most careful voices held fire. Zvi Mowshowitz explicitly deferred judgment until he has days with it, not hours. Ben Thompson's Stratechery take sits behind the paywall, but the public tease says plenty: very capable, and "some troubling new precedents."
What an Operator Does With This
Ignore the discourse. Three moves. First, run the three workflow changes from the operator guide; they hold regardless of which camp wins the argument. Second, benchmark on your own backlog, not the leaderboards; the public ones are dying anyway, and the CodeRabbit result proves the jump does not transfer evenly. Third, treat the access architecture as a deployment requirement, not an outrage: design around fallback routing and classifier triggers the same way you design around rate limits, and decide before June 22 whether the 2x pricing earns its keep on your tasks.
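One way to frame the pricing decision is cost per successful task on your own backlog, not cost per token. The sketch below uses the Fable 5 prices quoted earlier; the Opus 4.8 prices are inferred from "double Opus" and are an assumption, and the run records are made-up placeholders you would replace with your own results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one run at these rates."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
# Assumption: half of Fable 5's rates, inferred from "double Opus".
OPUS_4_8 = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)


@dataclass(frozen=True)
class TaskRun:
    input_tokens: int
    output_tokens: int
    passed: bool  # did the result pass your own acceptance check?


def cost_per_success(runs: list[TaskRun], pricing: Pricing) -> float:
    """Total spend divided by the number of runs that passed."""
    total = sum(pricing.cost(r.input_tokens, r.output_tokens) for r in runs)
    successes = sum(1 for r in runs if r.passed)
    return float("inf") if successes == 0 else total / successes


# Hypothetical backlog results, for illustration only.
fable_runs = [TaskRun(200_000, 40_000, passed) for passed in (True, True, True, False)]
opus_runs = [TaskRun(200_000, 40_000, passed) for passed in (True, False, False, False)]

print(f"Fable 5:  ${cost_per_success(fable_runs, FABLE_5):.2f} per successful task")
print(f"Opus 4.8: ${cost_per_success(opus_runs, OPUS_4_8):.2f} per successful task")
```

In this invented example the pricier model is cheaper per shipped result because it passes more often. On a short, precise workload where pass rates are equal, the same calculation would favor the cheaper model.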
Discourse over. Ship.