Every few months somebody declares prompt engineering dead. They are half right, every time. The tricks keep dying. The skill keeps moving.

If your prompts still open with "act as a world-class marketer," you are talking to a model from 2023. The models have leveled up three times since then, and each level quietly changed what good communication with them looks like. Here is the arc, with no engineering degree required.

2023: Magic Words

Early models were pattern-completion machines, and the right incantation could transform them. Adding one sentence, "Let's think step by step," lifted a model's score on a math benchmark from 17.7 percent to 78.7 percent.1 One sentence, four times the performance. An entire folklore grew around this: role prompts, fake tips, threats, begging.

The research that followed has a punchline. When scientists tested example chains whose reasoning was logically broken, performance barely dropped: invalid logic kept 80 to 90 percent of the benefit.2 The magic was never the logic. It was the shape. You were not instructing a mind. You were starting a pattern it would complete.

That is why the tricks worked, and why they expired. They compensated for weak models. The compensation became unnecessary.

2024: Context

Then models got good enough that phrasing stopped being the bottleneck. The new bottleneck was what the model knew. It had read the whole internet and precisely none of your company: not your customers, not your pricing logic, not the reason deal X died in March.

So the skill moved from wording to feeding. Custom instructions, uploaded documents, retrieval, projects. The people getting great output were not better writers. They were better librarians. We have a whole post on this gap: your AI doesn't know how your company actually works.

2025: The Brief

Agents arrived, and the unit of communication changed again. You stopped writing clever sentences and started writing job briefs: the goal, the constraints, the success criteria, the tools available. Exactly what you would hand a contractor.

This is the point where managers quietly became better at AI than engineers. The skill had become delegation: goals, not tasks. A precise spec beat a clever phrase every single time.

2026: The Commission

Claude Fable 5 works for hours on a single brief. It asks you clarifying questions before it starts. It spins up its own team of sub-agents, checks its own work, and comes back with something finished. Ethan Mollick, the Wharton professor who has tested every generation of these models, described what that does to your role:

Ethan Mollick
Ethan
Mollick

"Last year I called this working with a wizard: you chant the spell and something happens. With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The work has shifted from process to outcome. I no longer steer; I commission."

One Useful Thing  ·  June 2026

Patron, not wizard. You describe the outcome, you provide the context, you judge the result. The craft moved one more level up: from writing the brief to knowing what good looks like when it comes back.

Andrej Karpathy made the same point to the technical crowd at Sequoia's AI Ascent: the discipline now is keeping your quality bar while delegating the execution. His version of the boundary: "You can outsource your thinking but not your understanding."4 The model does the work. Whether the work is right is still your job.

Fig. 1
The skill keeps moving up
2023 · MAGIC WORDS "let's think step by step" 2024 · CONTEXT what does it know about you? 2025 · THE BRIEF goal + success criteria 2026 · THE COMMISSION describe, pay, judge model capability each level down stops being an advantage and becomes table stakes
The tricks expire. The skill compounds.

What Actually Stayed the Same

Look at the staircase and one thing runs through every era: the people who got the most out of each model generation were the ones who communicated as if the model were slightly more capable than everyone assumed. In 2023 that meant trusting it with reasoning. In 2024, trusting it with your real documents. In 2025, trusting it with a whole job. In 2026, trusting it with the process and judging the outcome.

And the underlying skill was never syntax. It was always the same three things, moving up a level each time: say what you want clearly, provide the context that makes it possible, and know what good looks like. Wording, then knowledge, then delegation, then taste.

That last one is where it lands today. When the execution is delegated, taste is the technical skill: the ability to judge, in minutes, whether nine hours of autonomous work is excellent or confidently wrong.

How to Update, This Week

Step 1
Retire the incantations

Delete the role-play openers and the magic phrases. Write plainly. When output disappoints, the cause is almost always missing context, not missing magic words. Add what the model could not have known and run it again.

Step 2
Write the context once

Build one living document of your company's reality: what you sell, to whom, how you decide, what good output looks like. Every AI conversation starts from it. This is the 2024 skill, and it is still load-bearing under everything above it.

Step 3
Brief outcomes, judge results

State the goal, the success criteria, and what you will check on delivery. Then review the result like a client, not each step like a supervisor. If you cannot say what you will check, you are not ready to delegate the work yet.

The model outgrew your prompts. The good news is that what it wants from you now, clear goals, honest context, sharp judgment, is the same thing your best people always wanted.

Stop casting spells. Start commissioning work.

We'll train your team on this.

The Diagnostic is a free 30–45 minute conversation. We'll look at how your team actually talks to AI today, and what moving up a level would unlock.

Book the Diagnostic →
Sources
1Kojima et al., 2022 (arXiv 2205.11916). Zero-shot chain-of-thought: "Let's think step by step" lifted MultiArith from 17.7% to 78.7%.
2Wang et al., ACL 2023. Demonstrations with logically invalid reasoning retain 80–90% of chain-of-thought performance; relevance and ordering drive the gains.
3Ethan Mollick, One Useful Thing, June 2026. The patron-not-wizard frame, from his early-access review of Claude Fable 5.
4Andrej Karpathy, Sequoia AI Ascent 2026. "From Vibe Coding to Agentic Engineering."
John Tan
John Tan

Founder and CEO of nativefirst.ai. Embeds with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.