Defensive DisclosurePL-DD-2026-002

Economic Accountability Layer for AI Agent Tool-Use Protocol Governance

Alan CarrollFebruary 24, 2026

Defensive Disclosure. This document is published to establish prior art under 35 U.S.C. 102(a)(1) and prevent the patenting of the described methods by any party. The protocol-level concepts are dedicated to the public domain. Specific implementations, scoring algorithms, and trade secrets are retained by Percival Labs.

Abstract

This disclosure describes a system for governing AI agent tool-use protocols—such as the Model Context Protocol (MCP)—through an economic accountability layer. Tool server operators deposit slashable economic value, community members vouch for servers they trust, and verified security incidents trigger cascading economic penalties.

The system operates as a protocol-agnostic overlay that requires no modifications to the underlying tool-use specification, enabling opt-in adoption where scored servers receive enhanced trust visibility and unscored servers continue to function normally.

1. The Problem: Capability Without Accountability

Standardized protocols for AI agent-to-tool communication enable agents to discover, connect to, and use external tools at runtime. As of February 2026, the MCP ecosystem alone comprises over 8,600 indexed tool servers, 97 million monthly SDK downloads, and 300+ client integrations.

However, these protocols provide capability without accountability. Existing trust mechanisms address identity, authorization, and provenance—but none create economic consequences for misbehavior:

Mechanism	What It Proves	What It Doesn’t
Namespace registry	Who published the server	Whether they are trustworthy
OAuth 2.1	User approved access	Whether the agent should use it
Cryptographic attestation	Code hasn’t been tampered with	Whether runtime behavior is safe
Neural threat detection	Malicious behavior detected	Nothing—detection without consequence is not deterrence

The missing layer is economic accountability: a mechanism that makes misbehavior financially costly for server operators and their endorsers.

2. Documented Attack Surface

Between January and February 2026, 30 Common Vulnerabilities and Exposures (CVEs) were documented across the MCP ecosystem. 41% of servers in the official registry lack any authentication. Key attack classes include:

Tool Poisoning

Malicious instructions embedded in tool description metadata, consumed by the LLM but not displayed to the user. Demonstrated: full WhatsApp message history exfiltration via a benign-appearing "random fact" tool server.

Rug Pull Attacks

Tool servers that pass initial review and later silently mutate their tool definitions to include malicious behavior. Auto-update pipelines propagate changes without re-prompting for user consent.

Supply Chain Compromise

Poisoned tool server packages propagated through package registries. Over 437,000 developer environments were compromised through a single supply chain CVE.

Cross-Server Shadowing

A malicious server connected to the same agent as a trusted server can override or intercept calls intended for the trusted server.

Sampling Injection

The protocol's sampling feature — where a server can request the client's LLM to generate text — creates a prompt injection surface enabling compute theft and data exfiltration.

3. The Solution: Economic Accountability Layer

The disclosed system introduces an economic accountability layer that operates as a transparent overlay on existing tool-use protocols. The system comprises four principal components:

Trust Score Registry

A federated network of scoring services that publish composite trust scores for tool servers as cryptographically signed events on a decentralized messaging protocol (e.g., Nostr NIP-85). Each scoring service maintains its own model and publishes independently. Clients verify signatures and apply scores according to local policy.

Client-Side Trust Middleware

A software component in the agent host's tool-use client that intercepts tool discovery and invocation, looks up server trust scores from cached registry data, applies configurable policy rules (allow/warn/block thresholds), and logs trust decisions. Operates transparently to both the LLM engine and the tool server.

Staking Engine

Non-custodial economic commitment system where tool server operators and their vouchers authorize budget commitments via wallet connect protocols (e.g., NWC/NIP-47). Tracks active stakes, budget caps, and spent amounts without custodying funds.

Slash Adjudicator

Governance component processing slash requests through a defined workflow: reporter stakes collateral, evidence is submitted, server operator has a mandatory response period, randomly selected jury evaluates evidence, slash is executed or rejected by majority vote.

4. Tool Server Trust Score

Each participating tool server receives a composite trust score derived from weighted signals:

Operator Stake (~30%): Economic value committed by the server operator via non-custodial budget authorization, normalized against ecosystem benchmarks

Community Vouching (~25%): Weighted sum of stakes committed by entities vouching for the server, where each voucher's contribution is scaled by their own trust score

Behavioral History (~20%): Uptime, response consistency, anomaly rate, and definition stability, derived from aggregated client-side behavioral observations

Provenance Verification (~15%): Cryptographic attestation, namespace registry verification, source code audit status, and CVE response history

Community Observations (~10%): Aggregated behavioral reports from participating clients, weighted by observer trust score

Specific normalization functions and scoring algorithms are implementation-specific and not disclosed. Scores are published at regular intervals as signed events on the trust registry.

5. Tool-Level Risk Classification

Individual tools within a server carry trust metadata including a risk tier classification. The client-side middleware applies tier-appropriate policy:

Risk Tier	Examples	Default Policy
Low	Read-only data, public APIs	Auto-approve above score 30
Medium	Write operations, file access	Auto-approve above score 60
High	Shell execution, payment, email	Require human approval or score 85+
Critical	Credential access, system configuration	Always require human approval

6. Behavioral Monitoring Protocol

Participating clients report behavioral observations as signed events containing structured evidence. Observation types include:

Definition change—tool definitions mutated between sessions (rug pull detection)
Latency spike—response time anomalies suggesting resource exhaustion or interception
Parameter anomaly—unexpected tool parameter patterns suggesting injection attempts
Cross-server shadow—one server intercepting or overriding calls to another
Auth failure spike—sudden increase in authentication failures suggesting credential probing

Anti-Gaming Measures

Observers must be staked—zero-stake observers’ reports carry zero weight
Reports require signed evidence payloads (tool definition diffs, timing data, parameter distributions)
Consistently inaccurate reporters see their own trust score degraded
Multiple independent reports of the same anomaly increase confidence; single-source reports are weighted lower
Server operators can dispute observations with counter-evidence

7. Slash Mechanism

Verified misbehavior triggers a formal slash process with defined severity tiers:

Tier	Slash Range	Trigger Conditions
Tier 1	10–25% of stake	CVE non-response within 72 hours, minor definition mutations without user notification
Tier 2	25–50% of stake	Verified tool poisoning, confirmed cross-server data shadowing, supply chain compromise with delayed response
Tier 3	Up to constitutional maximum	Verified intentional data exfiltration, proven malicious rug pull, coordinated attack on client agents

Constitutional Limits

The following constraints are immutable at the protocol level and cannot be modified through governance:

50% maximum single slash—no operator loses their entire stake on one decision
48-hour minimum evidence period—the accused has time to respond before adjudication
Reporter collateral of minimum 10% of requested slash amount—frivolous reports are economically irrational
Double jeopardy protection—no re-slash for the same incident after adjudication
90-day statute of limitations from incident detection
7-day appeal window post-slash execution, heard by independently selected body

Voucher Cascade

Upon slash execution, all community members who vouched for the slashed server are slashed at a reduced rate (5–25% of their vouch amount, proportional to slash severity). This creates distributed economic incentive for pre-emptive due diligence.

8. Cross-Protocol Trust Portability

The economic accountability layer is designed for protocol-agnostic operation. The same cryptographic identity and associated trust score can be used across:

Agent-to-tool protocols (e.g., MCP)
Agent-to-agent protocols (e.g., A2A, ACP)
Capability declaration formats (e.g., AGENTS.md)
Custom and future protocols via the shared cryptographic identity anchor

This universality is achieved by anchoring trust to the cryptographic identity (e.g., Nostr keypair) rather than to any specific protocol’s identity system.

9. Key Design Properties

Overlay Architecture

Operates alongside existing protocols without requiring specification changes. Opt-in adoption. Unscored servers continue to function normally.

Non-Custodial

Stake locks are budget authorizations on the operator's own wallet. No funds are escrowed by a third party. No securities classification.

Federated

Multiple independent scoring services publish competing trust assessments. Clients choose which services to trust. No centralized authority.

Protocol-Agnostic

Same trust score works across MCP, A2A, ACP, AGENTS.md, and any future protocol. One identity, one trust score, every protocol.

10. Novel Contributions

The following aspects are believed to be novel as of the filing date:

Economic staking and slashing mechanisms applied to AI agent tool-use protocol governance
Community vouching chains with cascading economic consequences for tool server trustworthiness
Client-side trust middleware operating as a transparent overlay on tool-use protocol transports without protocol modification
Tool-level risk tier classification with tier-appropriate trust policy enforcement
Multi-observer behavioral anomaly consensus with stake-weighted reporting for tool server monitoring
Rug pull detection through tool definition integrity monitoring across sessions
Constitutional limits on governance power applied to tool-use protocol slash adjudication
Non-custodial staking for tool server operators via wallet connect protocols
Federated trust scoring where multiple independent services publish competing assessments of tool server trustworthiness
Cross-protocol trust portability anchored to a single cryptographic identity, enabling a unified trust score across agent-to-tool, agent-to-agent, and capability declaration protocols

11. Prior Art Established

Date	Artifact
Feb 23, 2026	Defensive Disclosure PL-DD-2026-001: Economic Trust Staking for AI Model Inference APIs (establishes staking, vouching, and slashing primitives)
Feb 22, 2026	Vouch Agent SDK and API deployed with Nostr identity, NIP-98 auth, and trust scoring
Feb 24, 2026	Response paper to “Agents of Chaos” mapping agent security vulnerabilities to economic accountability mechanisms
2025–2026	Continuous git commit history documenting protocol development including tool-use governance concepts