Operator notes

What it takes to actually transform with AI.

Field notes on AI deployment, Level-3 agents, MCP servers, and the gap between wanting AI and running it in production.

Start here — The Guide
Working with Claude Fable 5: The Operator Guide
Eight posts, one path: the role shift, the workflow mechanics, the async operating model, the economics.
Read the guide →
Latest

Uber Burned Its AI Budget. Microsoft Cancelled the Licenses. Same Problem.

Two of the biggest enterprise AI deployments in 2025 hit the same wall: access without architecture. The companies that get AI right are not spending less. They know where the money goes.

Read article →

Claude Fable 5 Reactions: The Day 1 Roundup

Every notable launch-day reaction with receipts: Karpathy, Mollick, Willison, the eval data, the safety fight, and the question nobody can answer yet.

Read article →

Claude Fable 5 Didn't Replace You. It Promoted You.

The Claude Code team put it plainly after Fable 5 launched: we used to verify that Claude did the work right. Now we verify it is doing the right work. That is not a threat. That is a promotion.

Read article →

Claude Fable 5: Give It Goals, Not Tasks

Most teams are running Claude like a task runner. Fable 5 is designed for goals. The difference is not just workflow. It is the gap between Level 2 and Level 3.

Read article →

Claude Fable 5: Stop Briefing Your AI. Start Interviewing It.

Most teams dump a brief into Claude and wait for output. The Anthropic team changed how they work with Fable 5: ask Claude to interview you first. Here is why that changes everything.

Read article →

Claude Fable 5: Context, Not Constraints

"Keep it simple" is a constraint. "This feature might be deleted in a month" is context. The Anthropic team's Fable 5 insight: context lets Claude catch things you did not think of. Constraints just limit it.

Read article →

Claude Fable 5 Runs for Hours. Stop Watching Every Step.

Claude Fable 5 can run autonomously for hours, test its own work, and produce better output than human reviewers. Most teams are still watching every step. That is not safety. It is a bottleneck.

Read article →

The Work That Can't Be Trained Away with AI

Claude Fable 5 just landed. Models keep getting smarter. And yet there is a category of enterprise work that gets more valuable, not less, as models improve. Private context. Permission. Accountability.

Read article →

For Every Dollar of Software, Six Dollars of Services

For every dollar spent on software, companies spend six on services. AI does not eliminate that six dollars. With Fable 5, it lets smart operators capture both sides. Here is how the math changes.

Read article →

SWE-bench Update (June 2026): Fable 5 Tops a Dying Leaderboard

Claude Fable 5 hits 80.3% on SWE-bench Pro while OpenAI kills Verified for contamination and FrontierCode resets every frontier model below 30%.

Read article →

GDPval Update (June 2026): The Benchmark That Actually Matters

Claude Fable 5 leads the GDPval-AA leaderboard at 1932 Elo. What expert parity on real deliverables means, and the perfect-brief catch in the fine print.

Read article →

What Is the Artificial Analysis Intelligence Index?

The industry's most-cited single number for model capability: what the ten component evals measure, why v4 cut the top score from 73 to 50, and what a composite hides.

Read article →

What Is SWE-bench? The AI Coding Benchmark, Explained

How the AI coding benchmark works, why SWE-bench Verified died, what SWE-bench Pro and FrontierCode actually measure, and how to read a score.

Read article →

What Is GDPval? The AI Benchmark for Real Work, Explained

OpenAI's benchmark for AI on real economic deliverables across 44 occupations. How it is graded, what expert parity means, and what the scores hide.

Read article →

Data Isn't the Moat Anymore

The CRM owned thirty years of enterprise value by owning the database. The orchestration layer is the new gravity well. Switching costs migrate to accumulated reasoning.

Read article →

The Three-Act Playbook Is Dead

Wedge, suite, platform used to take ten years. AI collapsed it to eighteen months. Cursor replaced VS Code at seed stage. Ambition beats timing now.

Read article →

Your Codebase Is Fighting Your AI

540,000 lines of code plus 276,000 lines of tests equals a cage built for a model that no longer needs one. The economics flipped. Most codebases didn't.

Read article →

More Automation Creates More Human Work

AI raises the floor and floods the zone with close-but-not-right output. Demand for expert judgment goes up, not down. The paradox every scaling company hits.

Read article →

Taste Is the New Technical Skill

You can outsource your thinking but never your understanding. Karpathy's agentic engineering thesis, and why taste is recognizing failure before it ships.

Read article →

Claude Opus 4.8 and Dynamic Workflows: What Changes When AI Can Spawn 100 Agents

Claude Opus 4.8 introduced dynamic workflows — Claude writes its own orchestration script, then runs hundreds of agents in parallel for migrations, audits, and tasks too large for any single conversation.

Read article →

OpenAI Codex 5.5: Not Just for Coders. An OS for Knowledge Work.

Codex is named after its coding origins, but it has become something broader: a tool-using agentic workspace powered by GPT-5.5 that handles email, research, writing, planning, and operations alongside code.

Read article →

Company Structures Are Based on the Roman Empire. AI Is About to Break That.

The Roman legion was the best management technology of its time. Most companies today are organised the same way: humans as conduits for information at every layer. AI removes the need for the conduit. Here is what changes.

Read article →

The Middle Manager Isn't Being Replaced. The Role Is.

Cloudflare laid off 20% of its workforce while growing at 30%. The people let go were not underperformers. They were measurers — people whose primary work was moving information between layers that could not talk directly.

Read article →

The Difference Between AI Adoption and AI Transformation

AI adoption gives people better tools. The company stays the same. AI transformation redesigns the company around what AI makes possible. Most companies are doing the first and calling it the second.

Read article →

What Does a Company Built Around Intelligence Actually Look Like?

Not a theory. YC, Browserbase, Airtable, Every.to. Real companies doing this right now. Here is what it looks like in practice — the systems, the structure, and what it produces.

Read article →

Why McKinsey Can't Make You AI-Native (And What Can)

McKinsey samples your organisation, delivers a roadmap, and exits. AI transformation touches every function, every workflow, every role. You cannot sample your way to a transformation. Here is why the method has to change.

Read article →

Information Used to Need People to Move It. Now It Doesn't.

Every layer in your company exists because information needed a human to carry it. Meetings. Reports. Middle management. That constraint is lifting. Here is what changes when information moves itself.

Read article →

What Is an Agent Teammate? (And Why It's Not Just a Better Tool)

A tool does what you ask, then stops. An agent teammate takes ownership of a task, makes decisions within defined boundaries, and reports back. Here is the difference — and why it matters for your company.

Read article →

What Is an Agent Operating System? Your Company Needs One.

When you run multiple AI agents, they each start from scratch. They do not know what the others know. They do not follow the same rules. An Agent OS fixes this. Here is what it is and why it matters.

Read article →

AI Models Are Ready. Your Company Isn't.

OpenAI benchmarked AI on real professional tasks across 44 occupations. The models are approaching expert quality. The three things that unlock that performance are context, scaffolding, and oversight. Your company has none of them.

Read article →

Your AI Doesn't Know How Your Company Actually Works. Yet.

Enterprise AI projects fail because they're built on the org-chart version of your company. The agent needs the real one. That version only exists in the field.

Read article →

7 Functions to Deploy AI First. Ranked by Payback Speed.

Most founders ask where to start with AI. The wrong first function wastes 3–6 months. Here's the ranked list: seven functions, ordered by payback speed, deployment difficulty, and compliance overhead.

Read article →

Anthropic Writes 90% of Its Code With AI. Here's What That Actually Takes.

Anthropic says 90% of its code is AI-written. Google says 75%. A founder built 1,000+ PRs with no engineering team. Here's what a software factory actually is — and why most companies are nowhere close.

Read article →

AI Tools Won't Transform Your Company. Redesigning Around AI Will.

Early factories replaced steam engines with electric motors and kept the same floor plan. Marginal gains. The ones that redesigned around electricity got 10x. Most companies are making the same mistake with AI.

Read article →

Your CRM Is Becoming AI Infrastructure

For 30 years, the CRM was where enterprise value lived. AI agents don't need the UI. They need structured data at the API layer. The value is moving — and the window to position above it is open.

Read article →

SWE-bench Update (May 2026): Opus 4.8 Takes the Lead

Claude Opus 4.8 hits 69.2% on SWE-bench Pro, open-weights models close within 6 points at 8x lower cost, and Verified becomes a zombie metric.

Read article →

GDPval Update (May 2026): The Leaderboard Reshuffles

Opus 4.8 takes the lead at 1890 Elo, Grok 4.3 jumps 321 points, and Gemini 3.5 Flash beats Google's own Pro tier on real work.

Read article →

Your AI Pilot Isn't Stuck in Procurement. It's Stuck in Open-Loop.

Most AI pilots fail for one reason: the workflow they're trying to automate was never instrumented. No machine-readable artifacts, no queryable state, no closed loop. You cannot automate what you cannot observe.

Read article →

Stop Hiring a Head of AI. Here's What You Actually Need.

76% of organizations now have a Chief AI Officer. Most haven't shipped a single agent to production. The hire who will get AI into your systems in week one is not the hire who needs six months to understand your company.

Read article →

The EU AI Act Deadline Is August 2026. Most Scaling Companies Haven't Started.

On August 2, GPAI enforcement goes live and high-risk AI system obligations activate. Most B2B SaaS internal agents are limited-risk — but one category catches almost every founder off guard.

Read article →

Why the Next Model Won't Fix Your AI Deployment Problem

GPT-4 became GPT-4o, o1, o3, 4.1. Claude 3 became 3.5, 3.7, 4. The models kept improving. The workflows never got built. The gap isn't capability — it's assembly.

Read article →

3 Waves of AI. Most Companies Are Still in the First.

ChatGPT made AI accessible. Vibe coding made it fast. Agentic engineering makes it useful. Most companies are still in wave 1. Here is what wave 3 actually looks like — and what it takes to get there.

Read article →

What Is a Level-3 AI Agent? (And Why It's the Only Kind Worth Building)

Most companies think they're deploying AI. They're running Level-1 tools at best. Here's the full capability spectrum and what it takes to reach Level-3 — where AI closes operational loops without human intervention.

Read article →

The Operator Gap: Why AI Deployment Fails After the Demo

The models are good. The APIs are accessible. So why isn't your AI pilot in production? The blockers are data access, permission architecture, and the absence of someone who owns the outcome after handoff.

Read article →

What Is an MCP Server and Why Does Every AI Deployment Need One?

Model Context Protocol is the infrastructure layer that connects AI agents to your live internal systems. Without it, agents are isolated from the data that makes them useful. Here's what it is and how it works.

Read article →

On-Prem AI for European Companies: What You Actually Need to Know

GDPR and data residency aren't the blocker most people assume — if you architect for them from the start. A practical guide to on-prem AI for European scaling companies, including why Claude and Codex beat open-weight models for most use cases.

Read article →

SWE-bench Update (April 2026): The Month the Benchmark Broke

Berkeley researchers break 8 agent benchmarks with a 10-line exploit, Mythos Preview exposes the Verified-vs-Pro gap, and GPT-5.5 lands at 58.6%.

Read article →

GDPval Update (April 2026): GPT-5.5 Sets the Bar

GPT-5.5 launches at 84.9% expert parity, economists start writing about AI eating analyst work, and Grok 4.3 enters beta.

Read article →

SWE-bench Update (March 2026): GPT-5.4 Takes Pro

GPT-5.4 leads the standardized SWE-bench Pro set at 59.1%. Post-Verified, the honest-low scores show where deployment work actually lives.

Read article →

GDPval Update (March 2026): GPT-5.4 Crowds the Top

GPT-5.4 moves to the top of GDPval-AA at 1674 Elo with three labs within 70 points. The differentiator shifts to price, context, and your workflows.

Read article →

SWE-bench Update (February 2026): The Month Verified Died

OpenAI deprecates SWE-bench Verified after models reproduce gold patches from task IDs alone. The 80% cluster made it meaningless anyway.

Read article →

GDPval Update (February 2026): Anthropic Takes Both Top Slots

Opus 4.6 retakes #1 at 1606 Elo, then Sonnet 4.6 tops it at 1633 for $3/$15. Gemini 3.1 Pro proves exam brilliance does not transfer to deliverables.

Read article →

AI Benchmarks Update (January 2026): The Index Overhaul

Artificial Analysis rebuilds its Intelligence Index around work-shaped evals. The top score falls from 73 to 50. The models did not get worse.

Read article →

GDPval Update (December 2025): The Leaderboard Arrives

GPT-5.2 hits 70.9% win/tie against professionals and Artificial Analysis launches independent Elo grading. Vendors stop marking their own homework.

Read article →

SWE-bench Update (November 2025): Opus 4.5 Breaks 80

Four frontier releases in twelve days. Claude Opus 4.5 becomes the first model over 80% on Verified, and the 35-point Pro spread is the warning.

Read article →
