Two product teams. Same model. Same prompts, roughly. One ships workflows that hold up in production. The other ships demos that fall apart the moment a real user touches them.

The difference is not the model. Both teams have access to the same frontier APIs. The difference is not prompt engineering, either. Both teams have people who know how to write a system prompt.

The difference is one team can tell when the output is good. The other one cannot.

That sounds soft. It is not. It is the sharpest technical divide in any AI-native org right now.

Vibe Coding vs. Agentic Engineering

Andrej Karpathy coined "vibe coding" in early 2025. He described it as fully giving in to the vibes: you accept what the model produces, stop reading the diffs closely, and stay in flow with it. The output is good enough, so you ship it.

But at Sequoia's AI Ascent in April 2026, he drew a harder line. Vibe coding gets you started. Agentic engineering is what actually ships to production.

The distinction matters. Vibe coding is a mood. Agentic engineering is a discipline. It requires context management: knowing what to put in the window, what to leave out, and when the window has degraded. It requires verifiability: being able to tell whether the model has done the right thing, not just a plausible thing. And it requires judgment about when to trust the output and when to stop and redirect.

Most teams are stuck in vibe coding and calling it agentic work. The output ships. Then it breaks in production. Then nobody knows why.

The Jagged Skills Problem

Here is the thing that catches even experienced builders off guard. Models do not fail gradually or predictably. They fail sideways.

A model can write a sophisticated multi-step reasoning chain for one task and hallucinate a function signature on the adjacent task without any warning that it has left its zone of competence. Karpathy calls this the jagged skills problem. The capability surface is not smooth. It has spikes and cliffs.

This is not a bug you can patch. It is structural. The model is what Karpathy describes as "jagged, statistical, summoned." Not a tool you wield with consistent results. Something you direct with constant awareness of where the edge is.

"You can outsource your thinking but never your understanding." — Andrej Karpathy, AI Ascent 2026

If you do not know the failure modes, you will not catch them before they reach production. A code review checklist will not save you. Neither will running evals against a benchmark. You need someone who has seen the model fail in this specific way on tasks like this one, and who recognizes the signature before it ships.

That recognition is taste.

What Taste Actually Means

Dan Shipper at Every put it well: when models raise the floor for everyone, taste determines the ceiling.

The floor is real. A junior analyst with a good prompt can now produce a market scan that would have taken a senior analyst two days. A solo developer can prototype a feature in an afternoon. The floor has moved up for everyone.

But the ceiling is where the competition actually happens. And the ceiling is set by the people who can look at the output and know, quickly and with confidence, whether it is right.

This is not about skepticism. Skeptics slow everything down by second-guessing every output. Taste is faster than that. It is pattern recognition built from enough reps with the model that you have internalized where it tends to go wrong. You can scan a 400-line code output and spot the hallucinated import in 30 seconds. You can read a drafted strategy memo and notice that the model has correctly structured it but subtly shifted the framing to something adjacent and wrong.


Taste is the ability to pick out the right output from the flood of close-but-not-right output that AI now produces. It is a technical skill. It compounds with use. And right now it is unevenly distributed across teams.

What This Means for Hiring

The interview question most companies are still asking is: "Can you use AI tools?" That is the wrong question. Everyone can use AI tools. The model is available to anyone with a credit card.

The right question is harder to screen for: Do they know when the output is good?

You can test for this. Give a candidate a set of AI-generated outputs across a range of tasks. Some are correct. Some are plausible but wrong. Some are subtly off in ways that matter. Ask them to sort and explain. The ones with taste will move fast and articulate exactly what tipped them off. The ones without will defer to the model or hedge with "it depends."
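If you run this exercise across many candidates, it helps to score the sort the same way every time. Below is a minimal Python sketch, assuming you keep a hidden answer key that assigns each output one of three buckets matching the ones above. The label strings (`correct`, `plausible_but_wrong`, `subtly_off`) and the `score_sort` function are hypothetical, not an established rubric. The sketch scores only the sort. The explanation, which is where taste actually shows, still needs a human read.

```python
from collections import Counter

# Hypothetical bucket names mirroring the exercise: correct outputs,
# plausible-but-wrong outputs, and outputs that are subtly off.
LABELS = ("correct", "plausible_but_wrong", "subtly_off")


def score_sort(answer_key: dict[str, str], candidate: dict[str, str]) -> dict[str, float]:
    """Score a candidate's sort against a hidden answer key.

    Returns overall accuracy plus recall for each bucket, so you can see
    whether someone catches the subtle failures or only the obvious ones.
    Items the candidate left unsorted count as misses.
    """
    if not answer_key:
        raise ValueError("answer key is empty")

    totals = Counter(answer_key.values())  # items per bucket in the key
    hits = Counter()                       # items the candidate placed correctly
    for item_id, true_label in answer_key.items():
        if candidate.get(item_id) == true_label:
            hits[true_label] += 1

    scores = {
        f"recall_{label}": hits[label] / totals[label]
        for label in LABELS
        if totals[label]  # skip buckets with no items in this set
    }
    scores["accuracy"] = sum(hits.values()) / len(answer_key)
    return scores


if __name__ == "__main__":
    key = {"a": "correct", "b": "plausible_but_wrong", "c": "subtly_off", "d": "subtly_off"}
    answers = {"a": "correct", "b": "correct", "c": "subtly_off", "d": "correct"}
    print(score_sort(key, answers))
    # {'recall_correct': 1.0, 'recall_plausible_but_wrong': 0.0, 'recall_subtly_off': 0.5, 'accuracy': 0.5}
```

In this example the candidate gets every correct item right but waves through the plausible-but-wrong one and half of the subtly-off ones. High recall on `correct` paired with low recall on the other two buckets is one way that deferring to the model might show up in the numbers.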

You are looking for people who have internalized the failure modes. Who have run enough reps to have a feel for where the model loses the thread. Who can redirect without starting over.

In practice, this tends to be people who have been building with AI in production, not in demos. People who have felt the cost of a missed failure mode. Demos never catch the edge cases. Production always does.

Taste Is a Team Property Too

One person with great taste can only cover so much surface area. The real leverage is when taste is distributed across a team and the team has shared calibration.

This is what makes the difference between an AI team and a team that uses AI. An AI team has shared standards. Everyone knows what good output looks like for the workflows they own. They have seen the failure modes together, debriefed on them, and adjusted. The judgment is not trapped in one person's head.

Getting there requires deliberate work. It means reviewing AI outputs as a team, not just shipping them. It means naming the failure modes when you catch them. It means building a shared vocabulary for what "close but wrong" looks like in your specific domain.

None of this is complicated. Most teams skip it entirely.

The New Technical Divide

Karpathy described feeling "more behind as a programmer than ever before" at AI Ascent. Not because he is slower. Because the paradigm shifted. Software 3.0, as he frames it, means the LLM is the interpreter and your context is the program. The technical skill is not writing code. It is knowing what to put in the window and recognizing when what comes out of it is right.

That is a different skill from what most senior engineers, product managers, and operators were trained on. And most teams have not reckoned with the gap yet.

AI raises the floor for everyone. Models are good enough now that almost any team can produce something that looks like an AI workflow. The gap between a demo that runs once and a workflow that holds up in production is not model capability. It is the judgment of the people directing it.

Taste determines who rises above the floor. That is not a soft skill. It is the most technical skill in an AI-native org.

Find out if your team has agentic taste.

Bring it to a free Diagnostic. 30–45 minutes, one conversation. We'll look at where your AI direction is bottlenecked by taste, not tooling, and what to do about it.

Sources
1. Andrej Karpathy and Stephanie Zhan, "From Vibe Coding to Agentic Engineering," AI Ascent 2026, Sequoia Capital, April 2026. On Software 3.0, jagged model skills, and taste as technical judgment.
2. Dan Shipper, "After Automation," Every.to, May 2026. On AI commoditizing expertise residue and raising demand for judgment.
John Tan

Fractional Chief of AI at nativefirst.ai. Former YC CEO (Depict). Embeds with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.