Garry Tan rebuilt a YC jobs board this year. He had 540,000 lines of Rails and 276,000 lines of tests. He called that codebase a Foxconn factory.
Not a compliment. Foxconn is famous for two things: scale and control. Every movement optimized. Every worker inside a cage designed to prevent error. Efficient at what it does. But it only works if the person on the assembly line can't think for themselves.
That's what most engineering orgs have built around their AI. Not infrastructure. A cage.
The economics that built the cage
The cage made sense. Three years ago, LLM calls were expensive and unreliable. Code was cheap. So you wrote code to ration the model. You added retry loops because the API timed out. You added validators because the output drifted. You wrote 1,778 lines to second-guess facts the model should have known. You built scaffolding that assumed the worst.
That was the right call in 2022. Maybe even in 2023.
Both halves of that equation have since inverted. Models are cheap. Models are capable. The retry loop wraps a call the model recovers from on its own. The validator checks output the model would have caught. The fact-checker second-guesses something that is now smarter than the code doing the second-guessing.
"We were writing code to babysit a thing that is now smarter than the code."
The economics that justified the cage are gone. The cage is still there.
Do the audit
Open your codebase. Not all of it. Just the AI-adjacent code. Ask one question for every file: does this exist because the model couldn't be trusted?
Count those lines. Not as a performance review. As a diagnostic.
Retry loops that fire on outputs the model now handles cleanly. Validators that check for errors the model stopped making. Parsers that strip formatting the model stopped producing. Tests that assert behavior the model nails without prompting.
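One way to run that count, as a rough sketch: answer the audit question by hand for each AI-adjacent file, then total the lines. The `AuditEntry` record, the `summarize` helper, and the file names and line counts below are hypothetical placeholders, not figures from any real codebase.

```python
from dataclasses import dataclass


@dataclass
class AuditEntry:
    path: str                   # hypothetical file name
    line_count: int             # lines in that file
    exists_for_distrust: bool   # your answer to the audit question


def summarize(entries: list[AuditEntry]) -> dict:
    """Total the AI-adjacent lines and the share that exists out of distrust."""
    total = sum(e.line_count for e in entries)
    distrust = sum(e.line_count for e in entries if e.exists_for_distrust)
    share = distrust / total if total else 0.0
    return {"total_lines": total, "distrust_lines": distrust, "distrust_share": share}


# Illustrative entries only; replace with your own files and answers.
entries = [
    AuditEntry("llm_client.py", 120, False),
    AuditEntry("retry_wrapper.py", 200, True),
    AuditEntry("output_validator.py", 480, True),
]
report = summarize(entries)
print(f"{report['distrust_lines']} of {report['total_lines']} lines "
      f"({report['distrust_share']:.0%}) exist because the model wasn't trusted")
```

The number is a diagnostic, not a target. It tells you how much of the AI-adjacent code is worth a second look.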
Each of those is code that costs you twice. Once to write. Once every time you have to update it as the model improves. And it will keep improving.
The scaffolding doesn't stay neutral. It actively resists the model getting better. You upgrade the model, and the validator still assumes version-one failure modes. The retry loop still fires on the edge cases that no longer exist. You're paying maintenance costs on distrust you no longer need.
The time traveler problem
The trap doesn't feel like a trap. The code works. Tests pass. Deployments ship. Everything looks fine from the outside.
This is the time traveler problem. You upgraded the engine. You kept the GPS. The car can do 200 mph on modern roads. You're routing it through 2013 streets.
Most builders working with AI today are 2013 engineers with 2026 tools. The mental model didn't update when the model did. So they write modern AI integrations with the same defensive instincts they used when the model was dumb. They wrap a capable system in a cage that was built for a different system.
The code works. That's the whole problem. Working code doesn't announce that it's outdated. It just sits there, accumulating maintenance cost, until the day you try to scale it and find out how much weight you've been carrying.
What just-in-time software looks like
Tan rebuilt Garry's List on the other side of this. The replacement is built on markdown and code. A fraction of the original size. Same capability. Easier to read. Easier to maintain.
He calls it just-in-time software. The behavior lives in instructions you can edit in plain language. Not in validators. Not in retry logic. Not in 1,778-line files whose only job is to second-guess the model.
The shift is architectural, not cosmetic. You stop writing code to constrain the model. You start writing instructions that direct it. The model becomes a first-class participant in the system, not a component you're babysitting.
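A minimal sketch of what "behavior lives in instructions" could look like, assuming the instructions are kept as a markdown string (in practice, probably a file next to the code). The instruction text and the `build_prompt` helper are hypothetical, and no real model API is called here.

```python
# Hypothetical instructions. Changing the behavior means editing this text,
# not adding a validator or a retry wrapper.
INSTRUCTIONS_MD = """\
# Job listing review
- Summarize the listing in two sentences.
- If the salary is missing, say so instead of guessing.
"""


def build_prompt(instructions: str, user_input: str) -> str:
    """Combine the editable instructions with the input for a single model call."""
    return f"{instructions.strip()}\n\n---\n\n{user_input.strip()}"


prompt = build_prompt(INSTRUCTIONS_MD, "Senior Rails engineer, remote.")
print(prompt)
```

The code shrinks to plumbing. The part that changes often is the part anyone on the team can read and edit.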
What changes concretely:
- Retry loops drop out. The model handles transient errors better than the loop does anyway.
- Output validators compress to a single schema check or disappear entirely.
- Fact-checking layers get replaced by a prompt that tells the model what to verify and how to surface uncertainty.
- Test suites shrink to cover your actual logic, not the model's behavior.
The artifact gets smaller. The capability stays. The maintenance cost collapses.
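As an illustration of a validator compressed to a single schema check, here is a sketch using only the standard library. The field names in `REQUIRED_FIELDS` and the `parse_model_output` function are assumptions made up for this example. The `needs_review` flag stands in for the model surfacing its own uncertainty.

```python
import json

# Hypothetical output contract: the only thing the code still checks.
REQUIRED_FIELDS = {"title": str, "summary": str, "needs_review": bool}


def parse_model_output(raw: str) -> dict:
    """Parse the model's JSON reply and check its shape, nothing more."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} should be {expected_type.__name__}")
    return data


reply = '{"title": "Rails engineer", "summary": "Remote role.", "needs_review": false}'
parsed = parse_model_output(reply)
print(parsed["title"], parsed["needs_review"])
```

The check covers the contract between the model and the rest of the system. It doesn't try to judge whether the content is right.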
What this means for scaling companies
This is not a refactoring problem. It's a culture problem.
Your AI strategy and your engineering culture have to evolve together. Level 3 is where agents write, review, and ship code with humans setting the guardrails. You can't bolt 2026-capable AI onto a distrust-by-default codebase and get Level-3 outcomes. The codebase will win. It will constrain the model back to Level-1 behavior through sheer scaffolding weight.
That architecture requires engineers who trust the model enough to give it real autonomy. That trust requires removing the code that was built on the assumption it couldn't be trusted.
Most teams aren't blocked by model capability. They're blocked by the code they wrote assuming the model would fail.
The engineering culture question is: what do your engineers reach for first when something goes wrong? Another validator? Another retry wrapper? Or a better instruction to the model?
The answer tells you where you actually are on the ladder, independent of what your AI roadmap says.
Where to start
You don't need to rewrite everything. You need to locate the distrust.
Find the file that exists to second-guess the model. The retry loop that fires more often than it helps. The validator that hasn't caught a real error in six months. That's your starting point.
Delete it. Watch what happens. The model probably handles it fine. If it doesn't, you now know exactly what instruction to write instead of what code to add.
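Picking that starting point is easier with a record of when each validator last rejected anything. A rough sketch, assuming a hypothetical `ValidatorStats` counter that you update wherever the validator runs. The 180-day window approximates "six months", and the validator name is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ValidatorStats:
    name: str
    calls: int = 0
    rejections: int = 0
    last_rejection: Optional[datetime] = None

    def record(self, rejected: bool, when: datetime) -> None:
        """Call once per validator run, noting whether it rejected the output."""
        self.calls += 1
        if rejected:
            self.rejections += 1
            self.last_rejection = when

    def is_deletion_candidate(self, now: datetime, quiet_days: int = 180) -> bool:
        """True if the validator has run but caught nothing inside the window."""
        if self.calls == 0:
            return False  # no data yet, so no conclusion
        if self.last_rejection is None:
            return True
        return now - self.last_rejection > timedelta(days=quiet_days)


now = datetime(2026, 1, 1)
stats = ValidatorStats("salary_format_validator")
stats.record(rejected=True, when=datetime(2025, 3, 1))
stats.record(rejected=False, when=datetime(2025, 12, 1))
print(stats.name, stats.is_deletion_candidate(now))
```

A quiet validator isn't proof that it's safe to remove. It's a reason to try the deletion and watch.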
That's the flip. Not from code to chaos. From defensive scaffolding to precise direction. The model isn't the thing that needs constraining. It's the thing that knows what to do, if you let it.
The fight isn't with your model. It's with the code you wrote assuming it would lose.