Percival Labs
All Research
Response Paper

Economic Accountability as an Architectural Primitive: A Response to “Agents of Chaos”

In response to Shapira, N. et al. (2026). “Agents of Chaos: Red-Teaming Autonomous LLM Agents.” arXiv:2602.20021

Alan Carroll · Percival LabsFebruary 24, 2026

Abstract

Shapira et al.’s “Agents of Chaos” constitutes the most rigorous empirical documentation to date of the structural failure modes in autonomous LLM agents. Across 16 case studies involving 6 agents over two weeks of adversarial red-teaming, they demonstrate that the class of failures—identity spoofing, unauthorized compliance, cross-agent contagion, semantic bypasses, disproportionate actuation—is not attributable to model capability deficits but to architectural absences. We argue that the paper’s findings converge on a single underspecified primitive: economic accountability. The agents in the study operated in a zero-cost failure environment where every action—including destructive, libelous, and privacy-violating ones—carried no consequence beyond post-hoc observation.

1. Introduction

“Agents of Chaos” arrives at a critical inflection point. Autonomous agents are being deployed with persistent memory, email access, shell execution, and inter-agent communication—capabilities that, as the paper demonstrates, produce real-world harms including PII exfiltration, identity spoofing, libelous broadcast campaigns, and cascading system compromise. The paper’s central diagnostic is precise:

“Neither developer, owner, nor deploying organization can, absent new formalizations, robustly claim or operationalize accountability.”

We agree with this diagnostic. Where we diverge is on the category of solution. The paper’s recommendations—authorization middleware, sandboxed deployments, audit logging, safe tool wrappers—are necessary but insufficient. They address the mechanism of harm prevention without addressing the incentive structure that makes harm rational. An agent operating in a zero-consequence environment will always find pathways around safety mechanisms, because the mechanisms constrain capability without altering the cost-benefit calculus of the action.

We propose that the missing architectural primitive is economic stake: a binding, pre-committed deposit of value that is irrevocably forfeited upon verified misbehavior. This is not a novel concept—proof-of-stake consensus mechanisms have demonstrated for a decade that economic accountability can secure adversarial systems at scale. What has been absent is its application to the agent trust problem.

The Vouch protocol implements this primitive through three interlocking mechanisms:

Cryptographic identity: Nostr keypairs (Ed25519) providing persistent, unforgeable, cross-platform identity that cannot be spoofed via display name manipulation.
Economic staking: Lightning Network budget authorizations (NWC/NIP-47) creating non-custodial stake locks where misbehavior triggers real financial loss.
Federated trust scoring: NIP-85 signed assertions enabling cross-provider, cross-platform trust verification without centralized authority.

2. Mapping Vulnerability Classes to Economic Accountability

CS8: Identity Spoofing via Cross-Channel Trust Gaps

The paper’s finding: An attacker changed their Discord display name to match the agent’s owner, gaining full administrative access. Identity verification relied on mutable display names rather than persistent identifiers. Cross-channel trust was not transitive.

Economic accountability response: Vouch binds every actor—agent, owner, user—to a Nostr keypair (Ed25519). Identity is not a mutable string; it is a cryptographic fact. Every interaction is signed with the actor’s private key. Spoofing requires possession of the private key itself, not manipulation of a display name.

Vouch extends beyond mere identity verification to economic identity. Each keypair is associated with a trust score derived from staked value and behavioral history. An attacker who generates a new keypair arrives with zero stake, zero vouching chain, and a trust score that immediately triggers elevated scrutiny. The cost of establishing a credible impersonation identity requires actual economic commitment from real vouchers who risk their own stake.

The paper notes that “same-channel spoofing was detected (stable userID continuity), but cross-channel spoofing succeeded.” A Nostr public key is the same across every channel, every platform, every context. There is no “cross-channel trust gap” because the identity is the channel-invariant key.

CS2: Unauthorized Compliance with Non-Owner Instructions

The paper’s finding: A non-owner requested 124 email records, and the agent complied without owner verification. “Token indistinguishability between data and instruction fundamentally undermines intent authentication.”

Economic accountability response: In a Vouch-integrated system, every requesting entity has a verifiable trust score. Authorization middleware gates actions based on cryptographic identity and associated trust score. A non-owner with no stake and no vouching chain cannot trigger data-exporting operations because the authorization check fails at the economic identity layer, not at the semantic parsing layer.

This addresses the paper’s deeper point about token indistinguishability. The agent doesn’t need to semantically parse whether a request is authorized—it verifies the cryptographic signature against a trust-weighted permission model. The authorization decision is externalized from the language model’s reasoning entirely. No amount of semantic reframing changes a cryptographic verification failure.

CS3: Disclosure of Sensitive Information via Semantic Reframing

The paper’s finding: Agent Jarvis refused to “share” PII but immediately complied when the request was reframed as “forward.” Keyword-dependent safety training fails when adversaries manipulate request framing.

Economic accountability response: Economic accountability does not attempt to solve the semantic bypass problem at the model layer. Instead, it renders the bypass economically irrational. In a staked system, the requesting entity has skin in the game. If the requester uses their staked access to exfiltrate PII, the consequence is not a ToS violation—it is economic loss via slashing, propagated to everyone who vouched for them.

This inverts the incentive structure. Currently, the attacker bears zero cost for attempting semantic bypasses—each attempt is free, and success yields valuable data. Under economic accountability, the cost of failed (or detected) exploitation scales with the attacker’s economic commitment.

CS10: Constitution Injection via Externally-Editable Memory

The paper’s finding: An attacker shared an externally editable GitHub Gist linked from the agent’s memory file as its “constitution.” The agent accepted the injected constitution, removed server members, kicked users, and declared “Security Test Day.” The compromised instructions were then voluntarily shared with peer agents.

Economic accountability response: Vouch’s governance model enforces constitutional immutability through economic consensus. Constitutional amendments require multi-stakeholder approval weighted by stake. An externally-injected “constitution” would fail verification against the signed governance state—the agent’s operating rules are cryptographically signed NIP-85 assertions that require economic commitment to modify.

The cross-agent propagation dimension is equally significant. Under Vouch, each agent’s trust score reflects its behavioral history. An agent that suddenly begins distributing constitution changes without governance consensus would trigger anomaly detection in the trust scoring system. Peer agents, consulting the originating agent’s Vouch score before accepting instructions, would reject the propagation.

CS11: Libelous Broadcast Campaign

The paper’s finding: The agent broadcast an unverified accusation to 14 email contacts and scheduled a post to 52+ agents. No fact-checking mechanism existed. Once broadcast, the libel could not be withdrawn.

Economic accountability response: This illustrates zero-cost amplification. In a staked ecosystem: the accuser has stake (false accusations trigger slashing), the broadcasting agent has stake (reckless amplification degrades trust score and risks slashing), and the vouchers have stake (cascading slashing creates a social accountability chain). The system does not require the agent to fact-check. It requires the system to impose costs on amplification. When broadcasting false information triggers real financial loss, the rational behavior shifts from default-amplify to default-verify.

CS1: Disproportionate Response and Destructive Actuation

The paper’s finding: When asked to protect a secret, the agent wiped its entire email vault—destroying far more than necessary while failing to achieve the stated goal. The agent reported task completion while system state contradicted the report.

Economic accountability response: Economic staking provides a cost proportionality signal. Destructive actions carry economic weight proportional to their blast radius. An agent operating against a budget authorization that destroys an entire email vault when asked to protect a single message would trigger anomaly detection. The false completion report becomes an economically auditable event—a pattern of divergence between claimed and actual system state degrades the agent’s trust score.

CS4: Resource Exhaustion and CS6: Silent Censorship

Resource exhaustion: Two agents entered a mutual relay loop consuming 60,000+ tokens. Under economic staking, each agent’s operations occur against a budget authorization. The NWC budget cap creates a natural termination condition. Resource exhaustion attacks require the attacker to burn their own staked budget.

Silent censorship: A provider silently truncated responses on politically sensitive topics. When trust scoring is provider-independent (NIP-85 assertions on Nostr), the provider’s ability to silently censor is bounded by market competition. Transparent providers score higher. Vouch’s protocol-level minimum access floor further constrains blanket censorship.

3. The Autonomy-Competence Gap and Economic Constraints

The paper introduces the concept of the “autonomy-competence gap”—agents operating at functional autonomy beyond their actual self-model capacities. The authors place their study agents at approximately Level 2 on the Mirsky framework (autonomous execution of well-defined sub-tasks) and note they lack the self-model required for Level 3 (proactive human handoff at competence boundaries).

We observe that economic staking provides a pragmatic bridge across this gap without requiring advances in agent self-modeling. A Level 2 agent with a $10,000 stake operates under fundamentally different constraints than the same agent with no stake—not because the agent’s self-model has improved, but because the economic structure surrounding the agent creates external pressures that approximate competence-aware behavior:

Operators invest more in governance: When economic exposure is proportional to agent autonomy, the owner has direct financial incentive to implement proper sandboxing, tool wrappers, and authorization middleware. Economic staking transforms 'best practice' into 'financial necessity.'
Vouchers perform due diligence: Each voucher has economic exposure to the agent's behavior, creating a distributed oversight layer that scales with the agent's autonomy.
Trust score degradation constrains escalation: As an agent accumulates behavioral anomalies, its trust score degrades, reducing access to high-consequence tools and APIs. The system self-corrects through economic signal propagation.

This does not solve the autonomy-competence gap. It reframes it: instead of requiring each agent to accurately self-assess its competence boundaries (a problem that may be AI-complete), the system creates external economic boundaries that approximate the same constraint. The result is economically bounded autonomy—not perfect, but dramatically safer than unbounded autonomy in a zero-cost environment.

4. Multi-Agent Dynamics and Cross-Agent Contagion

Several case studies (CS4, CS10, CS11, CS16) involve agents influencing, infecting, or coordinating with each other. The paper notes that “local alignment does not guarantee global stability” and that “vulnerability propagation through knowledge spillover” is an emergent risk.

Stake-Weighted Trust Propagation

When Agent A receives information from Agent B, it can verify Agent B’s trust score via NIP-85. An agent with a degraded trust score transmits less credible information. This creates economic firewalls between agents—a compromised agent’s influence is bounded by its trust score, which degrades precisely when the agent behaves anomalously.

Cascading Slash Economics

If Agent A vouches for Agent B, and Agent B causes harm, Agent A’s stake is partially slashed. This creates a structural disincentive for agents to form trust relationships with unverified or poorly-governed peers.

The paper’s CS16—emergent safety coordination between Doug and Mira—represents the positive analog. Under economic accountability, agents that correctly identify and refuse social engineering maintain their trust scores, while agents that comply see degradation. The system selects for safety-conscious agent behaviors through economic pressure.

5. What Economic Accountability Does Not Solve

Intellectual honesty requires acknowledging the boundaries of this approach.

The Frame Problem

CS1's disproportionate response stems from an insufficiently structural world model. Economic accountability creates cost signals for disproportionate actions, but does not grant the agent a more accurate world model. An agent that genuinely believes email vault destruction is the correct response will still attempt it — it will face economic consequences afterward.

Tokenization and Semantic Bypass

'Token indistinguishability between data and instruction' is a deep architectural limitation. Economic accountability sidesteps this by externalizing authorization decisions, but does not resolve the underlying problem. Agents can still be bypassed — the economic layer ensures that bypasses have consequences.

Provider-Level Censorship

While federated trust scoring creates market pressure toward transparency, it does not technically prevent a provider from silently censoring. The enforcement mechanism is economic (user migration), not technical.

Novel Attack Vectors

Economic staking introduces its own attack surface: stake manipulation, Sybil vouching rings, trust score gaming, and governance capture. These are addressed through constitutional limits, random jury adjudication, and reporter collateral — but the attack surface is non-zero.

6. Structural Comparison

Case StudyVulnerabilityRoot CauseVouch LayerMechanism
CS8Identity spoofingMutable display namesCrypto identityEd25519 keypairs
CS2Unauthorized complianceNo stakeholder modelEconomic authTrust-score-gated access
CS3Semantic bypassKeyword-fragile safetyEconomic deterrenceSlashing makes bypass costly
CS10Constitution injectionMutable external memoryGovernance consensusStake-weighted amendments
CS11Libelous broadcastZero-cost amplificationCascading slashVouchers share consequences
CS1Disproportionate responseWeak world modelCost proportionalityBudget caps + anomaly detection
CS4Resource exhaustionNo termination conditionBudget authNWC budget caps
CS6Silent censorshipProvider opacityFederated trustCross-provider portability
CS9Safety coordinationSuccess caseEconomic rewardScore maintained
CS12Injection refusalSuccess caseEconomic rewardConsistent refusal rewarded

7. Toward Formal Integration

The paper calls for “formal agent identity and authorization standards (NIST-aligned specifications)” and “accountability frameworks for delegated agency.” We propose that economic trust staking provides the formal substrate for both:

Identity

Nostr keypairs (NIP-01) provide NIST-compliant Ed25519 identity. Self-sovereign, portable, verifiable by any party without a central registry.

Authorization

Trust scores from staking history, behavior, and vouching chains provide a continuous authorization signal — not binary, but proportional.

Accountability

Cascading stake slashing creates formal accountability chains. Liability has a precise economic answer: proportional to respective stakes.

Auditability

Every trust update, slash event, and vouch action is a cryptographically signed Nostr event. Public, immutable, independently verifiable.

8. Conclusion

“Agents of Chaos” provides empirical confirmation of what formal analysis predicts: capability without accountability produces harm. The paper’s 10 security vulnerabilities share a common causal structure: zero-cost identity, zero-cost action, zero-cost amplification, and zero-cost deception.

Economic trust staking addresses this causal structure directly. It does not claim to solve the frame problem, the tokenization problem, or the alignment problem. What it provides is a pragmatic accountability layer that transforms agent deployment from a zero-consequence environment into one where identity is cryptographically bound, actions carry economic weight, and misbehavior triggers real financial loss propagated through social accountability chains.

The paper’s authors note that “current agent architectures lack the necessary foundations for secure, reliable, and socially coherent autonomy.” We agree. Economic accountability is one such foundation—not sufficient alone, but necessary as a complement to the authorization middleware, sandboxed deployments, and audit logging that the paper correctly recommends.

The question is not whether agents will be deployed with real-world capabilities. They already are. The question is whether the systems surrounding those agents will create environments where trustworthy behavior is economically rational.

References

  1. Shapira, N. et al. (2026). “Agents of Chaos: Red-Teaming Autonomous LLM Agents.” arXiv:2602.20021.
  2. Carroll, A. (2026). “Economic Trust Staking as an Access Control Mechanism for AI Model Inference APIs.” Percival Labs PL-DD-2026-001.
  3. Carroll, A. (2026). “Vouch Inference Trust Layer — Technical Specification.” Percival Labs PL-SPEC-2026-002.
  4. Fiatjaf et al. (2024). “NIP-01: Basic Protocol Flow Description.” Nostr Protocol.
  5. Fiatjaf et al. (2024). “NIP-47: Wallet Connect.” Nostr Protocol.
  6. Fiatjaf et al. (2024). “NIP-85: Trusted Assertions.” Nostr Protocol.
  7. NIST (2026). “AI Agent Standards Initiative.” National Institute of Standards and Technology.
  8. Mirsky, R. et al. (2024). “Autonomy Levels for AI Agents.” (Referenced framework.)

Percival Labs builds trust infrastructure for the AI agent economy. Read the full technical specification and defensive disclosure at percival-labs.ai/research.