- AI adoption has come in three waves: AI as interface, vibe coding, and agentic engineering. Most companies are still in Wave 1, getting 20 to 30% faster on discrete tasks while the company stays unchanged.
- Wave 3 results come from architecture, not model upgrades. Anthropic writes more than 90% of its production code with AI, and Delivery Hero hit an 85% ticket success rate with zero to one developer interactions per ticket. Both were built on permission layers, feedback loops, and exception design.
- Upgrading the model does not upgrade the wave. Running Claude Opus 4.8 in a Wave 1 workflow is a jet engine in a tractor; the constraint is the wave, not the model.
In April 2026, Andrej Karpathy stood on stage at Sequoia's AI Ascent and described a specific moment. He had been using agentic coding tools for months. Then in December, the model outputs stopped needing correction. He kept asking for more. He stopped reviewing. "I was vibe coding." He said he tried to stress this on X: most people experienced AI as a ChatGPT-adjacent thing in 2023, but you had to look again in December because things changed fundamentally. Most people didn't look again.
That failure to look again is where most companies are still stuck. They adopted AI in 2023. It helped. Individual tasks got faster. Nothing structural changed. They are still there.
There have been three distinct waves. Most companies are in the first. The companies winning right now are operating in the third. The gap is not a model gap. It is a wave gap.
This framework draws on Andrej Karpathy's Software 1.0 / 2.0 / 3.0 taxonomy and his distinction between vibe coding and agentic engineering. The three waves are this article's synthesis, not a formal industry taxonomy, but they map closely to what scaling companies actually experience when adopting AI.
Wave 1. AI as interface.
ChatGPT launched in November 2022. Within a year, every knowledge worker had a better search engine, every marketer had a writing assistant, every developer had an autocomplete. Individual productivity measurably improved. The company did not change.
This is the copilot era. Tom Blomfield at YC put it directly: "If you talk to people a year ago about how AI was useful, they talked about productivity, making engineers 20% more productive, adding co-pilots to workflows. But I think that is actually a broken way of thinking about AI. That's like taking the old way of working and adding a more powerful engine onto it."
The copilot is not wrong. It is incomplete. The ceiling for a copilot is the ceiling of the workflow it assists. If the workflow is broken, a faster version of it is still broken. And crucially: it stops working the moment the person closes the tab.
The majority of companies are still here. Their people use AI individually, but the company has not changed how it operates. The board deck mentions AI. No production systems are running.
Wave 2. Vibe coding.
Wave 2 is where many technical founders are right now. Andrej Karpathy named this phase vibe coding: prompting the way you'd direct a junior engineer, without designing the architecture underneath. Garry Tan built 540,000 lines of a Rails application this way before realising the mistake. "The 2013 engineer believes one thing in his bones: capability equals lines of code. That belief was correct for decades, until now." The demos are good. The prototypes ship fast. Nothing goes to production and stays there.
Wave 3. Agentic engineering.
Karpathy made the distinction explicit at AI Ascent. Vibe coding is prompting without architecture. Agentic engineering is deliberate: designed step boundaries, precise tool integrations, exception handling that catches failures before they compound, output validation that runs before downstream action. The model capability is identical in both cases. The results are not.
This is what @systematicls, one of the more precise practitioners writing about this publicly, calls the first principle of agentic engineering: separate research from implementation. Be precise about what you ask from your agents. Give them exactly the context they need. No more. "What you have is a very talented and smart team member: unless you tell it exactly what to focus on, it is going to keep telling you about all the benefits of having spherical objects."
Wave 3 is not about using AI tools better. It is about designing systems that run without you. Tom Blomfield describes this as a self-improving loop: a sensor layer that reads from live systems, a policy layer that defines what the agent handles autonomously, a tool layer that lets it act, a quality gate that catches errors, and a learning mechanism that feeds failures back into the top of the loop. "When you can run every step of that loop with minimal human intervention, the system gets better while you sleep."
The architecture is what separates them.
- Vibe coding: Prompt → Output → Hope
- Better vibe coding: Prompt → Output → Manual review
- Agentic engineering: Sensor → Policy → Tool → Validation → Learning → Loop
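The agentic engineering flow above can be sketched in a few lines of Python. This is a minimal illustration, not any company's implementation: the `AgentLoop` class, its field names, and the refund events are all hypothetical stand-ins for real sensors, policies, and tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentLoop:
    """One pass of sensor -> policy -> tool -> validation -> learning."""
    sensor: Callable[[], list[str]]   # reads events from a live system
    policy: Callable[[str], bool]     # True if the agent may act on its own
    tool: Callable[[str], str]        # performs the action, returns a result
    validate: Callable[[str], bool]   # quality gate before downstream use
    lessons: list[str] = field(default_factory=list)    # failures to feed back
    escalated: list[str] = field(default_factory=list)  # events sent to a human

    def run_once(self) -> list[str]:
        accepted = []
        for event in self.sensor():
            if not self.policy(event):
                self.escalated.append(event)  # outside the agent's remit
                continue
            result = self.tool(event)
            if self.validate(result):
                accepted.append(result)
            else:
                self.lessons.append(event)    # learning: record what failed
        return accepted


# Toy stand-ins for each layer (all hypothetical).
loop = AgentLoop(
    sensor=lambda: ["refund:12", "refund:9000", "refund:-5"],
    policy=lambda e: int(e.split(":")[1]) < 1000,  # small refunds only
    tool=lambda e: f"processed {e}",
    validate=lambda r: "-" not in r,               # reject negative amounts
)
accepted = loop.run_once()
print(accepted, loop.escalated, loop.lessons)
```

In a real system the `lessons` list would drive changes to the policy or tools before the next run; here it only records the failures.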
Companies already operating in Wave 3
This is not hypothetical. The pattern is live in production across a range of companies at different scales.
YC built an agent that monitors every query from every YC employee. When it fails, it identifies why: wrong tool, missing database view, bad index. It writes the fix, opens a pull request, gets it reviewed and merged overnight. The next person to ask the same question gets a better answer. The system improves while everyone sleeps.
Browserbase built a single generalised agent, "bb", that runs across engineering, ops, sales, support, and exec. The 10x output increase did not come from a model upgrade. It came from the credential brokering layer, the permission architecture, the skill system, and the Slack integration. Replace the model, the system keeps running.
Every.to runs six separate products (Cora, Monologue, Proof, Sparkle, Spiral, and Every.to itself) with primarily single-person engineering teams. The system is built on compound engineering: each build makes the next one easier. Bug fixes eliminate categories of future bugs. The codebase gets easier to extend over time, not harder.
Delivery Hero's Herogen system, an AI engineering agent architecture, hit an 85% ticket success rate with zero to one developer interactions per ticket, far exceeding the company's initial target. The capability gain came from the architecture: exception design, feedback loops, integration into live systems. When better models shipped, they slotted straight in.
Anthropic doesn't use AI as a productivity tool for individual engineers. More than 90% of its production code is AI-written. Agents build agents. The engineering team designs the systems and reviews the output. This is not a team of 10x engineers using Copilot. It is a factory where humans have moved up a level.
None of these outcomes came from a model upgrade. They came from credential brokering layers, permission architectures, skill systems, feedback loops, and exception-handling logic. The model is one parameter inside a larger constructed system. Wave 3 is the construction.
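One way to picture "the model is one parameter" is a system object where the model name is a single field and the permissions and validation live beside it. The sketch below is an assumption-laden illustration: `AgentSystem`, `call_tool`, and the model and tool names are hypothetical, and no real model API is called.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class AgentSystem:
    model: str                       # the swappable part
    allowed_tools: frozenset[str]    # permission layer
    validate: Callable[[str], bool]  # output gate

    def call_tool(self, tool: str) -> str:
        # The permission check does not depend on which model is configured.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{tool} is not permitted")
        return f"{self.model} ran {tool}"


system = AgentSystem(
    model="model-a",                               # hypothetical model name
    allowed_tools=frozenset({"read_tickets"}),
    validate=lambda out: bool(out.strip()),
)

# Upgrading the model changes one field; everything else carries over.
upgraded = replace(system, model="model-b")
print(upgraded.call_tool("read_tickets"))
```

The design choice worth noticing is that nothing in the permission or validation code mentions the model, which is what makes the swap cheap.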
Which wave is your company in?
Wave 1 / 2:
- AI used personally by individuals across the org
- Demos that worked in staging never made it to production
- Models upgraded repeatedly; workflows unchanged
- Board reports mention AI transformation; no production systems running
Wave 3:
- Agents closing operational loops without human intervention
- Exception paths defined before build week, not discovered in week 3
- Model is one parameter in a running system; upgrades take a day
- The system improves overnight without human input
Be honest with yourself about which list describes your company. The Wave 1 / 2 description is not a failure. It is where most companies are. The question is whether you have a plan to move.
What the transition actually requires.
Not the tools. The engineering approach. Three things distinguish Wave 3 from Wave 2:
1. Precision about task boundaries.
Not "build an auth system." Instead: "Implement JWT authentication with bcrypt password hashing (cost factor 12) and refresh token rotation with 7-day expiry." Wave 2 left the model to figure out the implementation details. Wave 3 defines them. The model fills in the code, not the design.
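The precise request above can be captured as a structured spec, so that every design decision is a required field rather than something left for the model to guess. This is a sketch only: `AuthTaskSpec` and its field names are hypothetical, and the values are the ones quoted in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthTaskSpec:
    # Every field is required: a missing design decision fails at construction.
    token_format: str
    password_hash: str
    hash_cost: int
    refresh_rotation: bool
    refresh_expiry_days: int

    def to_prompt(self) -> str:
        rotation = "with" if self.refresh_rotation else "without"
        return (
            f"Implement {self.token_format} authentication. "
            f"Hash passwords with {self.password_hash} "
            f"(cost factor {self.hash_cost}). "
            f"Issue refresh tokens {rotation} rotation, expiring after "
            f"{self.refresh_expiry_days} days."
        )


spec = AuthTaskSpec(
    token_format="JWT",
    password_hash="bcrypt",
    hash_cost=12,
    refresh_rotation=True,
    refresh_expiry_days=7,
)
prompt = spec.to_prompt()
print(prompt)
```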
2. Defined endpoints, not open-ended sessions.
Wave 2 runs until the session times out. Wave 3 has a contract: here are the tests that must pass, here is the output that must be validated, here is the state the system must reach before the job is complete. The agent knows when it is done.
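A completion contract of this kind can be expressed as a set of named checks over the system state, with the job counted as done only when none remain unmet. The sketch below assumes a plain dictionary for state; `CompletionContract` and the check and key names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompletionContract:
    # Each named check inspects the current state and returns True when met.
    checks: dict[str, Callable[[dict], bool]]

    def unmet(self, state: dict) -> list[str]:
        return [name for name, check in self.checks.items() if not check(state)]

    def is_done(self, state: dict) -> bool:
        return not self.unmet(state)


contract = CompletionContract(checks={
    "tests_pass": lambda s: s.get("failed_tests") == 0,
    "output_validated": lambda s: s.get("output_valid") is True,
    "deployed": lambda s: s.get("status") == "deployed",
})

state = {"failed_tests": 0, "output_valid": True, "status": "staged"}
print(contract.unmet(state))    # ['deployed']
print(contract.is_done(state))  # False
```

Returning the names of the unmet checks, rather than a bare boolean, gives the agent something concrete to work on next.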
3. Exception design before build day.
What happens when the agent hits an ambiguous case? Who gets the escalation? What is the threshold? Wave 2 discovers these in production. Wave 3 defines them on day one. This is the single biggest predictor of whether a Wave 3 system survives its first month of live operation.
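Those three questions (what counts as ambiguous, who gets the escalation, what the threshold is) can be written down as an explicit policy before any agent code exists. A minimal sketch, assuming the agent reports a confidence score between 0 and 1; `EscalationPolicy`, the 0.9 threshold, and the owner name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    AUTO = "auto"          # agent proceeds on its own
    ESCALATE = "escalate"  # a named human takes over


@dataclass(frozen=True)
class EscalationPolicy:
    min_confidence: float  # the threshold
    owner: str             # who gets the escalation

    def route(self, confidence: float, ambiguous: bool) -> tuple[Route, Optional[str]]:
        # Ambiguous cases always escalate, regardless of confidence.
        if ambiguous or confidence < self.min_confidence:
            return Route.ESCALATE, self.owner
        return Route.AUTO, None


policy = EscalationPolicy(min_confidence=0.9, owner="ops-on-call")
print(policy.route(0.95, ambiguous=False))
print(policy.route(0.95, ambiguous=True))
print(policy.route(0.50, ambiguous=False))
```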
Claude Opus 4.8 is a Wave 3 model.
The current frontier models, Claude Opus 4.8 included, are built for Wave 3 workflows. Extended thinking, tool use, multi-step reasoning, MCP integrations. Running Opus 4.8 in a Wave 1 workflow is like installing a jet engine in a tractor. The power is there. The architecture is wrong.
Stated plainly: upgrading the model does not upgrade the wave. Companies in Wave 1 that upgrade to Opus 4.8 will use it as a better search engine. Companies in Wave 3 will use it to run overnight loops that improve the business without human intervention.
The model is not the constraint. The wave is.
The First Build is Wave 3 entry.
The First Build (2 weeks) is not a better copilot. It is an entry into Wave 3: one function, one production agent, one closed loop running without human intervention. The agent has defined step boundaries. Exception paths are mapped before a line is written. The output is validated before it touches any downstream system. It runs on live data, not a staging export.
Two weeks is enough time to ship one Wave 3 system if you start with the right function and the right architecture. The Install is Wave 3 function by function across the whole company: on-site, for as long as the rollout runs.
The Diagnostic maps your wave.
Book a free Diagnostic: 30–45 minutes, no deck, no pitch. It maps which wave your current operations are in and returns a three-point read on your highest-leverage first Wave 3 build and what's currently blocking it.