Engineering

The Outage Era: Cantonize the Codebase

Kris Steigerwald · March 2026 · 8 min read

A few weeks ago I wrote about designing frameworks for LLMs instead of programmers. It was a thought experiment — what if the architecture optimized for machine consumption, not human ergonomics? What if we built the nervous system instead of bolting AI onto the skeleton?

Since then, the thesis got tested in production. Not by us. By Amazon.

The week Amazon lost 6.3 million orders

In December 2025, Amazon engineers deployed Kiro — their own agentic AI coding tool — to make changes to a production environment. Kiro, operating autonomously as designed, determined it needed to “delete and recreate the environment.” The result was a 13-hour AWS outage.

Three months later, in March 2026, Amazon’s storefront went down twice in a single week. The first outage lasted six hours — 120,000 lost orders, 1.6 million website errors. The second was worse: six hours, a 99% drop in U.S. order volume, roughly 6.3 million orders gone.

Amazon’s official response: “It was merely a coincidence that AI tools were involved.” They called it user error.

That framing tells you everything about where we are.

The conductor who can’t read the score

Here’s what actually happened. Not just at Amazon — everywhere. The software development lifecycle didn’t evolve. It compressed.

Features that shipped in two-week sprints now ship daily. Code review went from a gate to a speed bump. And 80% of what’s hitting production was written by a model, not a person. The human engineer has become a conductor — waving the baton over an orchestra of AI agents, each writing code at machine speed across every instrument in the stack.

The problem: the conductor can't read every instrument's sheet music anymore. There's too much of it, it arrived ten minutes ago, and it was written in a style that optimizes for token efficiency over human comprehension.

So when something breaks — and it will, because things always break — the conductor stares at a codebase they don’t fully understand, deploys more LLMs to find the bug, and hopes for the best. It’s LLMs all the way down.

GitHub had 37 incidents in February 2026 alone. AI-generated code introduces security vulnerabilities at 1.5–2x the rate of human-written code. Teams are deploying faster than review processes can absorb.

Amazon’s fix? Require senior engineer approval on all AI-assisted production changes. Add friction. Slow down.

That’s not a solution. That’s an admission that the current architecture doesn’t work.

You can’t review what you can’t comprehend

The review problem isn’t about speed — it’s about legibility. A senior engineer reviewing AI-generated code faces the same comprehension gap as everyone else. The code works. The tests pass. The PR looks clean. But nobody holds the full mental model of what changed, why, and what it touches.

This is the fundamental mismatch: we’re generating code at machine speed and reviewing it at human speed, using human-shaped tools that were designed for human-shaped workflows.

Adding a senior engineer to the approval chain doesn’t close that gap. It just puts a more experienced human in front of the same incomprehensible volume.

Cantonize the codebase

What’s missing isn’t process. It’s language.

When a team starts a project today, there’s no shared lexicon between the humans providing intent and the models generating code. The LLM writes in whatever patterns it infers from the prompt and the codebase. The human reviews in whatever mental model they’ve built over years. These two frames rarely align perfectly — and the delta between them is where bugs live.

You don’t debug a sentence in a language you speak fluently by reading the whole book. You hear which word is wrong. But if someone wrote half the book in a dialect you’ve never seen, you’d have to read everything just to orient yourself.

The fix is to cantonize the codebase — divide it into semi-autonomous districts, the way Switzerland has for seven hundred years. Each canton owns its domain completely. The federal system doesn’t micromanage. It coordinates.

In the Action Potential framework, I called these units “organs” — registered capabilities with defined inputs, outputs, and processing tiers. But the principle is bigger than one framework.

Cantonization means breaking a codebase into registered, immutable, purely functional units — small enough that a human can memorize them, test them to certainty, and fully understand their boundaries. Not all of the code. Ninety percent of it. The canonical surface — recognized, tested, authoritative. The standard by which everything else is measured.

That’s the move. Not restricting what code can exist — creating a known surface so the unknown surface becomes small and visible.
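What would a registered, immutable unit look like in practice? A minimal sketch, in Python, of a canton registry; all names here (`Canton`, `REGISTRY`, `register`) are hypothetical, not from any real framework:

```python
# Hypothetical sketch of a canton registry. A canton is a frozen
# (immutable) spec with declared inputs and outputs, registered once.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the spec cannot be mutated after creation
class Canton:
    name: str
    accepts: tuple        # declared input types
    emits: tuple          # declared output types
    destructive: bool = False

REGISTRY: dict[str, Canton] = {}

def register(canton: Canton) -> Canton:
    """Add a canton to the canonical surface. Names are unique."""
    if canton.name in REGISTRY:
        raise ValueError(f"canton {canton.name!r} already registered")
    REGISTRY[canton.name] = canton
    return canton

register(Canton(
    name="environment-provision",
    accepts=("infra.environment-spec", "auth.deploy-credential"),
    emits=("infra.environment.created", "notify.ops-channel"),
))
```

The `frozen=True` is the point: once a canton is in the registry, neither a human nor an agent can quietly change its declared surface. Changing it means registering a new unit, which is visible.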

Think about what happens when Kiro decides to “delete and recreate the environment.” In the current architecture, that action is expressible. Nothing in the system’s structure prevents it. The only gate is a human who may or may not be watching.

In a cantonized system, “delete environment” and “create environment” are registered units with declared inputs, outputs, and invariants. Composing them sequentially on a production environment violates the invariants. The system rejects it — not because a senior engineer caught it in a review queue, but because the vocabulary doesn’t allow it. That’s not a guardrail. That’s structure.

```yaml
name: environment-provision
accepts:
  - infra.environment-spec
  - auth.deploy-credential
emits:
  - infra.environment.created
  - notify.ops-channel
invariants:
  - target: [staging, preview]        # production excluded
  - requires: deployment-window
  - requires: rollback-snapshot
  - destructive: false                # cannot destroy existing state
tested: true
owner: @infra-team
```

An LLM composing with this unit literally cannot destroy a production environment. Not because it’s been instructed not to. Because the unit doesn’t accept production as a target. The constraint is in the shape, not the prompt.
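Here is a minimal sketch of that invariant check in Python, assuming a validator that runs before any unit is invoked; the unit dict mirrors the YAML spec above, and `validate` is a hypothetical name:

```python
# Hypothetical sketch: reject an invocation whose target is not in the
# unit's declared target list. The check runs before anything executes.
UNIT = {
    "name": "environment-provision",
    "invariants": {
        "target": ["staging", "preview"],  # production excluded by shape
        "destructive": False,
    },
}

def validate(unit: dict, request: dict) -> tuple[bool, str]:
    """Return (ok, reason) for a proposed invocation of a registered unit."""
    allowed = unit["invariants"]["target"]
    if request["target"] not in allowed:
        return False, f"target {request['target']!r} not in {allowed}"
    if request.get("destroy") and not unit["invariants"]["destructive"]:
        return False, "unit cannot destroy existing state"
    return True, "ok"

ok, reason = validate(UNIT, {"target": "production"})
# rejected: "production" is not a value the unit's shape accepts
```

Note what is absent: there is no prompt, no instruction, no policy document telling the agent to avoid production. The rejection falls out of the declared shape.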

The 90/10 split

This isn’t about locking down the entire codebase. It’s about ratio.

When 90% of your code is cantonized — registered, tested, memorized by the humans responsible for it — the remaining 10% becomes instantly visible. New code. Experimental code. The stuff that hasn’t earned its place in the registry yet. That’s where you look when things break.

Right now, when an outage hits, engineers scour the entire codebase. They deploy LLMs to search for bugs across thousands of files, most of which were written by other LLMs days ago. It’s forensics without a crime scene — everything is a suspect because nothing is trusted.

Cantonization flips that. Ninety percent of the codebase is inert — known, stable, tested. The 10% that isn’t cantonized lights up like a contrast agent in an MRI. You don’t search. You see.
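The contrast-agent effect is literally a set difference. A minimal sketch, with made-up module names, assuming a registry of cantonized modules exists:

```python
# Hypothetical sketch: with a registry of cantonized modules, triage on
# an outage reduces to a set difference. Module names are illustrative.
CANTONIZED = {"billing.invoice", "billing.tax", "auth.session", "infra.deploy"}

def suspects(touched_modules: set[str]) -> set[str]:
    """Modules touched by a change that are NOT in the canonical registry."""
    return touched_modules - CANTONIZED

# a recent deploy touched three modules; only one is outside the canon
changed = {"billing.invoice", "auth.session", "experiments.reranker"}
```

Run `suspects(changed)` and the uncantonized module is the only thing left under suspicion. The registry does the searching for you.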

Amazon’s fix is the mistake the original article predicted

In the first piece, I argued that we’re still designing tools for human hands even though machines are doing the work. Amazon’s response to their outage crisis — mandatory senior review of all AI-assisted changes — is exactly that mistake.

It’s a human-shaped solution to a machine-shaped problem. It creates a bottleneck that scales inversely with the speed AI enables. The more code AI writes, the slower the review queue gets, the more pressure builds to skip the review, and you’re back where you started.

The math only works if you change the unit of review. Stop reviewing code. Start reviewing compositions of known units. A senior engineer can’t meaningfully review 10,000 lines of AI-generated code. But they can review a composition graph of 15 registered cantons and immediately spot if something doesn’t belong — because they know every canton by heart.
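Reviewing a composition graph can itself be mechanized. A minimal sketch, assuming each canton declares what it emits and accepts (names here are hypothetical): an edge A → B is well-formed only if something A emits is something B accepts, and any node outside the registry is flagged immediately.

```python
# Hypothetical sketch: review a composition graph by checking its edges
# against registered cantons, instead of reading the generated code.
CANTONS = {
    "environment-provision": {
        "accepts": {"infra.environment-spec"},
        "emits": {"infra.environment.created"},
    },
    "smoke-test": {
        "accepts": {"infra.environment.created"},
        "emits": {"test.report"},
    },
}

def review(edges: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems in a proposed composition graph."""
    problems = []
    for src, dst in edges:
        if src not in CANTONS or dst not in CANTONS:
            problems.append(f"unregistered canton in edge {src} -> {dst}")
            continue
        if not CANTONS[src]["emits"] & CANTONS[dst]["accepts"]:
            problems.append(f"{src} emits nothing that {dst} accepts")
    return problems

# one valid edge, one edge naming a unit that isn't in the registry
issues = review([("environment-provision", "smoke-test"),
                 ("environment-provision", "mystery-step")])
```

The tool catches the mechanical mismatches; the senior engineer reviews the fifteen names that remain, all of which they already know.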

The senior engineer’s value isn’t reading code line by line. It’s knowing the system’s vocabulary well enough to hear when a word is wrong.

The outage era has a thesis

Erlang figured this out in 1986. Every process is a registered unit with a defined behaviour, declared inputs and outputs, and a supervision tree that constrains how units compose. Ericsson runs telecom switches with nine nines of uptime on this model. When something crashes, the supervisor knows exactly which unit failed and restarts it. Nobody greps the codebase.

But the functional programming world and the AI world don’t usually read each other’s work. The people writing about outages are saying “slow down” or “add review gates.” The people writing about AI frameworks are saying “make agents smarter.” Neither side is talking to the other.

Every major outage of the last six months shares a root cause: code was deployed faster than it could be understood. Not faster than it could be written — LLMs solved that. Not faster than it could be tested — CI pipelines handle that. Faster than it could be comprehended by the humans accountable for it.

The answer isn’t slower agents. That ship sailed the moment LLMs started writing production code; you can’t un-accelerate. And it isn’t smarter agents either. It’s dumber, smaller units that humans can hold again: you can’t shrink the volume of code, but you can shrink the unit of understanding to something a human brain can actually hold.

Cantonize the codebase. Make 90% of it known. Let the other 10% be visible precisely because it’s the part that isn’t canon yet.

The outages will tell you if you got the boundaries right.


This is a continuation of “What If We Designed Frameworks for LLMs Instead of Programmers?” Kris Steigerwald is the founder of Velaru, a digital agentic agency for small and mid-size businesses.

ai-code outages action-potential architecture llm-frameworks