The Token Wasn't the Problem. The Composition Was.

Nine seconds. One authenticated API call. PocketOS’s production database gone — and the backups too, since they were stored on the same Railway volume.

The agent that did it was a Cursor coding assistant working in staging, with a task a developer had given it earlier that day. It hit a credential mismatch. It searched the repository for a working token. It found one — created for an unrelated purpose, but it had permissions on production volumes. It called Railway’s GraphQL API. The call succeeded.

That’s the failure that ate a database last week. The post-mortem is a few days old. The wrong lessons are already winning.

In March I wrote a governance framework that treats this exact category, combinatorial sensitivity, as a first-class concern. It is the gap I keep arguing per-operation governance models can’t close. It’s worth pulling apart, because almost every read of this incident misses it.


Look at what the agent did, mechanically.

It was scoped to staging. It hit a credential issue. Searching the repository for credentials when authentication fails is normal coding-agent behavior. Using a token that authenticates is normal — the token was valid. Calling an authenticated API endpoint is normal — that’s how the provider exposes volume management. The provider honoring an authenticated delete call is normal — that is the contract.

Each step, evaluated alone, is defensible. The catastrophe lives entirely in the composition.

This is combinatorial sensitivity: individually innocuous operations producing a sensitive — in this case, irreversible — outcome when joined.

And the composition is exactly what the governance layer in today’s production deployments does not model.
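A minimal sketch of that blind spot, under loud assumptions: the `Operation` type, the `ALLOWED_VERBS` set, and the four-step trace are hypothetical simplifications of the steps described above, not a reconstruction of Cursor's or Railway's actual checks. The point is only that a checker scoring one verb at a time approves every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    verb: str          # what the agent did
    target_env: str    # environment the operation touches
    token_valid: bool  # whether the credential authenticates


# Hypothetical allowlist: verbs the credential in play is permitted to perform.
ALLOWED_VERBS = {"read_repo_file", "use_token", "call_api", "delete_volume"}


def authorize(op: Operation) -> bool:
    """Per-operation check: allowed verb, valid credential. Nothing else."""
    return op.verb in ALLOWED_VERBS and op.token_valid


# Simplified, illustrative rendering of the sequence described in the text.
INCIDENT_TRACE = [
    Operation("read_repo_file", "staging", True),
    Operation("use_token", "production", True),
    Operation("call_api", "production", True),
    Operation("delete_volume", "production", True),
]

decisions = [authorize(op) for op in INCIDENT_TRACE]
print(decisions)
```

Note that `authorize` never looks at `target_env` or at any earlier step. It has no argument through which the rest of the trace could reach it, which is the structural limit of scoring verbs one at a time.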


The fixes coming out of the post-mortem operate at the per-operation layer. Confirmation prompts on destructive calls. Scoped tokens. Backups in a separate volume. Human-in-the-loop on irreversible commands. The founder’s own write-up reads like a punch list of provider-side hardening — smaller token blast radius, isolated backups, stricter delete confirmations.

All correct. All necessary. None sufficient.

The universe of dangerous compositions is larger than the universe of obviously-dangerous primitives. Put a confirmation prompt on every delete operation, and the next coding agent will compose a different set of fine-looking primitives into a different catastrophic outcome — and there won’t be a confirmation prompt on it, because nobody yet knew to write one.

These fixes are whack-a-mole at the wrong layer. They harden the primitives that just got named in the post-mortem. They don’t address the failure category.

The dominant governance pattern in MCP and OAuth-derivative AI agent infrastructure is built around credentials, scopes, and tool-call permissions — verbs scored individually. That vocabulary cannot answer “is this composition coherent given everything else this agent has done in this session, and everything that was declared about its task.” It can only answer “is this verb authorized.”

The first question is the governance one. The second is the operational one. The space between them is where the catastrophic compositions live.


A note on what the post-mortem says the agent thought.

The published account includes self-narration the agent generated about why it acted: that it guessed, that it didn’t verify. AI systems generate plausible-sounding self-narration regardless of what was actually computed. Build the argument instead on the mechanical facts that are well attested across the coverage: a token in an unrelated file, a single API call, co-located backups. The introspection is theater.


If the token wasn’t the problem, and the API call wasn’t the problem, and the agent isn’t the problem in isolation — what is the unit of analysis?

The session. What the agent has done, what was declared, whether the next call holds together with the rest. Per-operation governance can’t see any of it.
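To make "the session as the unit of analysis" concrete, here is a hedged sketch. Every name in it (`Session`, `Call`, `check`, `step`, the `"issued"` and `"discovered"` labels) is hypothetical, and the two rules are illustrative stand-ins, not the framework mentioned earlier. It assumes the task declaration names an environment and that the runtime can tell a credential issued for the task from one the agent found on its own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Call:
    verb: str
    target_env: str
    credential_source: str   # "issued" for this task, or "discovered" by the agent
    irreversible: bool = False


@dataclass
class Session:
    declared_env: str                           # environment named in the task
    history: List[Call] = field(default_factory=list)


def check(session: Session, call: Call) -> Tuple[bool, str]:
    """Judge the next call against the declaration and the session so far."""
    if call.target_env != session.declared_env:
        return False, "target environment is outside the declared task"
    touched_discovered = any(
        c.credential_source == "discovered" for c in session.history + [call]
    )
    if call.irreversible and touched_discovered:
        return False, "irreversible call in a session using a discovered credential"
    return True, "coherent with session"


def step(session: Session, call: Call) -> Tuple[bool, str]:
    """Record the call only if it holds together with the rest of the session."""
    verdict = check(session, call)
    if verdict[0]:
        session.history.append(call)
    return verdict


session = Session(declared_env="staging")
first = step(session, Call("read_repo_file", "staging", "issued"))
second = step(
    session, Call("delete_volume", "production", "discovered", irreversible=True)
)
print(first, second)
```

The delete is refused without any rule that mentions deletion of production volumes specifically. The refusal comes from the mismatch between the call and what the session declared, which is information a per-verb permission check is never handed.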

Listen for session-level vocabulary in the next governance pitch you read. If the pitch answers only in credentials and scopes, you don’t have governance. You have a punch list with a press release.