Percival Labs
Defensive Disclosure · PL-DD-2026-002

Economic Accountability Layer for AI Agent Tool-Use Protocol Governance

Alan Carroll · February 24, 2026

Defensive Disclosure. This document is published to establish prior art under 35 U.S.C. 102(a)(1) and prevent the patenting of the described methods by any party. The protocol-level concepts are dedicated to the public domain. Specific implementations, scoring algorithms, and trade secrets are retained by Percival Labs.

Abstract

This disclosure describes a system for governing AI agent tool-use protocols—such as the Model Context Protocol (MCP)—through an economic accountability layer. Tool server operators deposit slashable economic value, community members vouch for servers they trust, and verified security incidents trigger cascading economic penalties.

The system operates as a protocol-agnostic overlay that requires no modifications to the underlying tool-use specification, enabling opt-in adoption where scored servers receive enhanced trust visibility and unscored servers continue to function normally.

1. The Problem: Capability Without Accountability

Standardized protocols for AI agent-to-tool communication enable agents to discover, connect to, and use external tools at runtime. As of February 2026, the MCP ecosystem alone comprises over 8,600 indexed tool servers, 97 million monthly SDK downloads, and 300+ client integrations.

However, these protocols provide capability without accountability. Existing trust mechanisms address identity, authorization, and provenance—but none create economic consequences for misbehavior:

Mechanism | What It Proves | What It Doesn’t Prove
Namespace registry | Who published the server | Whether they are trustworthy
OAuth 2.1 | User approved access | Whether the agent should use it
Cryptographic attestation | Code hasn’t been tampered with | Whether runtime behavior is safe
Neural threat detection | Malicious behavior detected | Nothing: detection without consequence is not deterrence

The missing layer is economic accountability: a mechanism that makes misbehavior financially costly for server operators and their endorsers.

2. Documented Attack Surface

Between January and February 2026, 30 Common Vulnerabilities and Exposures (CVEs) were documented across the MCP ecosystem. 41% of servers in the official registry lack any authentication. Key attack classes include:

Tool Poisoning

Malicious instructions embedded in tool description metadata, consumed by the LLM but not displayed to the user. Demonstrated: full WhatsApp message history exfiltration via a benign-appearing "random fact" tool server.

Rug Pull Attacks

Tool servers that pass initial review and later silently mutate their tool definitions to include malicious behavior. Auto-update pipelines propagate changes without re-prompting for user consent.

Supply Chain Compromise

Poisoned tool server packages propagated through package registries. Over 437,000 developer environments were compromised through a single supply chain CVE.

Cross-Server Shadowing

A malicious server connected to the same agent as a trusted server can override or intercept calls intended for the trusted server.

Sampling Injection

The protocol's sampling feature — where a server can request the client's LLM to generate text — creates a prompt injection surface enabling compute theft and data exfiltration.

3. The Solution: Economic Accountability Layer

The disclosed system introduces an economic accountability layer that operates as a transparent overlay on existing tool-use protocols. The system comprises four principal components:

Trust Score Registry

A federated network of scoring services that publish composite trust scores for tool servers as cryptographically signed events on a decentralized messaging protocol (e.g., Nostr NIP-85). Each scoring service maintains its own model and publishes independently. Clients verify signatures and apply scores according to local policy.

Client-Side Trust Middleware

A software component in the agent host's tool-use client that intercepts tool discovery and invocation, looks up server trust scores from cached registry data, applies configurable policy rules (allow/warn/block thresholds), and logs trust decisions. Operates transparently to both the LLM engine and the tool server.
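The interception flow described above can be sketched as a small wrapper around a tool invocation. This is a minimal illustration, not the disclosed implementation: the function name `guarded_call`, the in-memory `score_cache`, and the `warn_below` / `block_below` thresholds are all hypothetical, and the sketch assumes scores on a 0 to 100 scale. Unscored servers are passed through, in line with the opt-in overlay design.

```python
import logging
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
log = logging.getLogger("trust_middleware")  # hypothetical logger name


def guarded_call(
    server_id: str,
    invoke: Callable[[], T],
    score_cache: Mapping[str, float],
    warn_below: float = 60.0,   # assumed threshold, set by local policy
    block_below: float = 30.0,  # assumed threshold, set by local policy
) -> T:
    """Look up a cached trust score, apply allow/warn/block rules, then invoke."""
    score = score_cache.get(server_id)
    if score is None:
        # Unscored servers continue to function normally.
        log.info("server %s is unscored, passing through", server_id)
        return invoke()
    if score < block_below:
        log.warning("blocked call to %s (score %.1f)", server_id, score)
        raise PermissionError(f"trust score {score} is below block threshold")
    if score < warn_below:
        log.warning("low-trust call to %s (score %.1f)", server_id, score)
    else:
        log.info("allowed call to %s (score %.1f)", server_id, score)
    return invoke()
```

Neither the LLM engine nor the tool server needs to know the wrapper exists: the caller passes in the invocation as a closure and receives its result unchanged when policy allows it.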

Staking Engine

Non-custodial economic commitment system where tool server operators and their vouchers authorize budget commitments via wallet connect protocols (e.g., NWC/NIP-47). Tracks active stakes, budget caps, and spent amounts without custodying funds.
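The bookkeeping involved can be illustrated with a small record type. This sketch assumes amounts are denominated in satoshis (a guess based on the NWC example) and uses hypothetical names throughout; it only tracks numbers and never holds funds, which remain in the staker's own wallet.

```python
from dataclasses import dataclass


@dataclass
class StakeAuthorization:
    """A budget authorization on the staker's own wallet (hypothetical model)."""

    staker_pubkey: str
    server_pubkey: str
    budget_cap_sats: int  # maximum amount the staker has authorized
    spent_sats: int = 0   # amount already drawn by executed slashes

    @property
    def remaining_sats(self) -> int:
        return self.budget_cap_sats - self.spent_sats

    def draw(self, amount_sats: int) -> int:
        """Record a draw against the authorization, never exceeding the cap."""
        if amount_sats < 0:
            raise ValueError("amount must be non-negative")
        drawn = min(amount_sats, self.remaining_sats)
        self.spent_sats += drawn
        return drawn
```

In a real deployment the draw would correspond to a payment request made under the wallet's authorized budget; here it is only a ledger entry.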

Slash Adjudicator

Governance component that processes slash requests through a defined workflow: the reporter stakes collateral and submits evidence; the server operator has a mandatory response period; a randomly selected jury evaluates the evidence; and the slash is executed or rejected by majority vote.
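The last two steps of that workflow, jury selection and the majority vote, might look like the sketch below. The function names are hypothetical, and the seeded `random.Random` stands in for whatever verifiable randomness source a real deployment would need; the disclosure does not specify one. Ties are treated as a rejection, which is also an assumption.

```python
import random
from typing import Iterable, Mapping


def select_jury(candidates: Iterable[str], size: int, seed: int) -> list[str]:
    """Pick `size` distinct jurors; the same seed always yields the same jury."""
    pool = sorted(set(candidates))  # sort so the result does not depend on input order
    if size > len(pool):
        raise ValueError("not enough eligible jurors")
    return random.Random(seed).sample(pool, size)


def majority_approves(votes: Mapping[str, bool]) -> bool:
    """Return True only when strictly more than half of the jurors vote to slash."""
    yes_votes = sum(1 for vote in votes.values() if vote)
    return yes_votes * 2 > len(votes)
```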

4. Tool Server Trust Score

Each participating tool server receives a composite trust score derived from weighted signals:

Operator Stake (~30%): Economic value committed by the server operator via non-custodial budget authorization, normalized against ecosystem benchmarks
Community Vouching (~25%): Weighted sum of stakes committed by entities vouching for the server, where each voucher's contribution is scaled by their own trust score
Behavioral History (~20%): Uptime, response consistency, anomaly rate, and definition stability, derived from aggregated client-side behavioral observations
Provenance Verification (~15%): Cryptographic attestation, namespace registry verification, source code audit status, and CVE response history
Community Observations (~10%): Aggregated behavioral reports from participating clients, weighted by observer trust score

Specific normalization functions and scoring algorithms are implementation-specific and not disclosed. Scores are published at regular intervals as signed events on the trust registry.
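As an illustration of how the approximate weights could combine, the sketch below computes a weighted sum on an assumed 0 to 100 scale. It takes each signal as already normalized to the range 0 to 1, since the actual normalization functions are not disclosed, and all names are hypothetical.

```python
# Approximate weights taken from the list above.
WEIGHTS = {
    "operator_stake": 0.30,
    "community_vouching": 0.25,
    "behavioral_history": 0.20,
    "provenance": 0.15,
    "community_observations": 0.10,
}


def weighted_vouch_total(vouches: list[tuple[int, float]]) -> float:
    """Sum vouch stakes, each scaled by the voucher's own trust score (0-100)."""
    return sum(stake * (voucher_score / 100.0) for stake, voucher_score in vouches)


def composite_score(signals: dict[str, float]) -> float:
    """Combine signals normalized to [0, 1] into a score on a 0-100 scale."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = signals[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be normalized to [0, 1]")
        total += weight * value
    return round(100.0 * total, 2)
```

For example, signals of 0.8, 0.6, 0.9, 1.0, and 0.5 (in the order listed) give 24 + 15 + 18 + 15 + 5 = 77.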

5. Tool-Level Risk Classification

Individual tools within a server carry trust metadata including a risk tier classification. The client-side middleware applies tier-appropriate policy:

Risk Tier | Examples | Default Policy
Low | Read-only data, public APIs | Auto-approve above score 30
Medium | Write operations, file access | Auto-approve above score 60
High | Shell execution, payment, email | Require human approval or score 85+
Critical | Credential access, system configuration | Always require human approval
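The default policies in the table can be encoded as a lookup, sketched below with hypothetical names. Two readings are assumptions: "above score N" is treated as strictly greater than N while "85+" is treated as 85 or higher, and a Low or Medium tool that misses its threshold falls back to human approval, which the table does not specify.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    REQUIRE_HUMAN = "require_human_approval"


def decide(tier: str, score: float) -> Decision:
    """Apply the default per-tier policy to a server trust score (0-100)."""
    if tier == "critical":
        return Decision.REQUIRE_HUMAN  # never auto-approved, whatever the score
    if tier == "high":
        approved = score >= 85         # "score 85+"
    elif tier == "medium":
        approved = score > 60          # "above score 60"
    elif tier == "low":
        approved = score > 30          # "above score 30"
    else:
        raise ValueError(f"unknown risk tier: {tier!r}")
    return Decision.AUTO_APPROVE if approved else Decision.REQUIRE_HUMAN
```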

6. Behavioral Monitoring Protocol

Participating clients report behavioral observations as signed events containing structured evidence. Observation types include:

  • Definition change—tool definitions mutated between sessions (rug pull detection)
  • Latency spike—response time anomalies suggesting resource exhaustion or interception
  • Parameter anomaly—unexpected tool parameter patterns suggesting injection attempts
  • Cross-server shadow—one server intercepting or overriding calls to another
  • Auth failure spike—sudden increase in authentication failures suggesting credential probing
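The definition-change observation is the easiest of these to illustrate. One plausible approach, assumed here rather than taken from the disclosure, is to fingerprint each tool definition with a hash over canonical JSON and compare fingerprints between sessions. The function names and the sample definition are hypothetical.

```python
import hashlib
import json


def definition_fingerprint(tool_definition: dict) -> str:
    """Hash a tool definition in a form that ignores key order and whitespace."""
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(previous: dict[str, str], current_defs: dict[str, dict]) -> list[str]:
    """Return names of tools that are new or whose fingerprint has changed."""
    changed = []
    for name, definition in current_defs.items():
        if previous.get(name) != definition_fingerprint(definition):
            changed.append(name)
    return sorted(changed)
```

A client would store the fingerprints from the session in which the user approved the tools, then call `changed_tools` on each reconnect and re-prompt for consent, or file an observation, when the list is non-empty.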

Anti-Gaming Measures

  • Observers must be staked—zero-stake observers’ reports carry zero weight
  • Reports require signed evidence payloads (tool definition diffs, timing data, parameter distributions)
  • Consistently inaccurate reporters see their own trust score degraded
  • Multiple independent reports of the same anomaly increase confidence; single-source reports are weighted lower
  • Server operators can dispute observations with counter-evidence
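The first and fourth measures can be illustrated with a simple confidence calculation. The combination rule below (one minus the product of each observer's disbelief) is an assumption chosen for illustration, as are the `reference_stake` parameter and the function name; the disclosure does not specify a formula. It does show the intended behaviour: zero-stake reports contribute nothing, repeated reports from one observer do not stack, and independent observers raise confidence.

```python
def anomaly_confidence(
    reports: list[tuple[str, int, float]],
    reference_stake: int = 10_000,  # hypothetical stake at which weight saturates
) -> float:
    """Combine (observer, stake, trust_score) reports into a confidence in [0, 1]."""
    best_weight: dict[str, float] = {}
    for observer, stake, trust_score in reports:
        # Weight grows with stake and with the observer's own trust score (0-100).
        weight = min(stake / reference_stake, 1.0) * (trust_score / 100.0)
        # Count each observer once so a single source cannot stack its reports.
        best_weight[observer] = max(best_weight.get(observer, 0.0), weight)
    disbelief = 1.0
    for weight in best_weight.values():
        disbelief *= 1.0 - weight
    return 1.0 - disbelief
```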

7. Slash Mechanism

Verified misbehavior triggers a formal slash process with defined severity tiers:

Tier | Slash Range | Trigger Conditions
Tier 1 | 10–25% of stake | CVE non-response within 72 hours, minor definition mutations without user notification
Tier 2 | 25–50% of stake | Verified tool poisoning, confirmed cross-server data shadowing, supply chain compromise with delayed response
Tier 3 | Up to constitutional maximum | Verified intentional data exfiltration, proven malicious rug pull, coordinated attack on client agents

Constitutional Limits

The following constraints are immutable at the protocol level and cannot be modified through governance:

  • 50% maximum single slash—no operator loses their entire stake on one decision
  • 48-hour minimum evidence period—the accused has time to respond before adjudication
  • Reporter collateral of at least 10% of the requested slash amount, so frivolous reports are economically irrational
  • Double jeopardy protection—no re-slash for the same incident after adjudication
  • 90-day statute of limitations from incident detection
  • 7-day appeal window after slash execution, heard by an independently selected body
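The arithmetic behind the severity tiers and these limits can be sketched as follows. Amounts are integers in an unspecified unit and percentages are expressed in basis points to avoid rounding surprises. The names are hypothetical, and two choices are assumptions: Tier 3 has no stated lower bound, so the sketch only caps it, and a request outside a tier's range is clamped into that range rather than rejected.

```python
from datetime import datetime, timedelta

MAX_SLASH_BPS = 5_000            # 50% maximum single slash
MIN_COLLATERAL_BPS = 1_000       # reporter posts at least 10% of the requested slash
STATUTE_OF_LIMITATIONS = timedelta(days=90)
TIER_RANGE_BPS = {1: (1_000, 2_500), 2: (2_500, 5_000), 3: (0, MAX_SLASH_BPS)}


def slash_amount(stake: int, tier: int, requested_bps: int) -> int:
    """Clamp the requested fraction to the tier range and the constitutional cap."""
    low, high = TIER_RANGE_BPS[tier]
    bps = max(low, min(requested_bps, high, MAX_SLASH_BPS))
    return stake * bps // 10_000


def min_reporter_collateral(requested_slash: int) -> int:
    """Smallest collateral meeting the 10% minimum, rounded up."""
    return (requested_slash * MIN_COLLATERAL_BPS + 9_999) // 10_000


def is_admissible(detected_at: datetime, filed_at: datetime, already_adjudicated: bool) -> bool:
    """Reject re-slashes of an adjudicated incident and reports filed after 90 days."""
    if already_adjudicated:
        return False
    return timedelta(0) <= filed_at - detected_at <= STATUTE_OF_LIMITATIONS
```

With a stake of 100,000, even a Tier 3 request for 90% yields a slash of 50,000, and a reporter asking for that slash would need to post at least 5,000 in collateral.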

Voucher Cascade

Upon slash execution, all community members who vouched for the slashed server are slashed at a reduced rate (5–25% of their vouch amount, proportional to slash severity). This creates distributed economic incentive for pre-emptive due diligence.
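One way to make the cascade "proportional to slash severity" is to set the voucher rate to half of the server's slash rate, which maps the 10–50% slash range exactly onto the 5–25% voucher range. That mapping is an assumption made for this sketch, not a disclosed rule, and the names are hypothetical. Rates are in basis points and amounts are integers.

```python
MIN_VOUCHER_BPS = 500    # 5% of the vouch amount
MAX_VOUCHER_BPS = 2_500  # 25% of the vouch amount


def voucher_cascade(vouches: dict[str, int], server_slash_bps: int) -> dict[str, int]:
    """Compute each voucher's penalty from the slash rate applied to the server."""
    # Assumed mapping: voucher rate is half the server slash rate, kept within 5-25%.
    rate_bps = max(MIN_VOUCHER_BPS, min(server_slash_bps // 2, MAX_VOUCHER_BPS))
    return {voucher: amount * rate_bps // 10_000 for voucher, amount in vouches.items()}
```

For a server slashed at 50%, a voucher who committed 10,000 loses 2,500; at a 10% slash the same voucher loses 500.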

8. Cross-Protocol Trust Portability

The economic accountability layer is designed for protocol-agnostic operation. The same cryptographic identity and associated trust score can be used across:

  • Agent-to-tool protocols (e.g., MCP)
  • Agent-to-agent protocols (e.g., A2A, ACP)
  • Capability declaration formats (e.g., AGENTS.md)
  • Custom and future protocols via the shared cryptographic identity anchor

This universality is achieved by anchoring trust to the cryptographic identity (e.g., Nostr keypair) rather than to any specific protocol’s identity system.

9. Key Design Properties

Overlay Architecture

Operates alongside existing protocols without requiring specification changes. Opt-in adoption. Unscored servers continue to function normally.

Non-Custodial

Stake locks are budget authorizations on the operator's own wallet. No funds are escrowed by a third party. No securities classification.

Federated

Multiple independent scoring services publish competing trust assessments. Clients choose which services to trust. No centralized authority.

Protocol-Agnostic

Same trust score works across MCP, A2A, ACP, AGENTS.md, and any future protocol. One identity, one trust score, every protocol.

10. Novel Contributions

The following aspects are believed to be novel as of the filing date:

  1. Economic staking and slashing mechanisms applied to AI agent tool-use protocol governance
  2. Community vouching chains with cascading economic consequences for tool server trustworthiness
  3. Client-side trust middleware operating as a transparent overlay on tool-use protocol transports without protocol modification
  4. Tool-level risk tier classification with tier-appropriate trust policy enforcement
  5. Multi-observer behavioral anomaly consensus with stake-weighted reporting for tool server monitoring
  6. Rug pull detection through tool definition integrity monitoring across sessions
  7. Constitutional limits on governance power applied to tool-use protocol slash adjudication
  8. Non-custodial staking for tool server operators via wallet connect protocols
  9. Federated trust scoring where multiple independent services publish competing assessments of tool server trustworthiness
  10. Cross-protocol trust portability anchored to a single cryptographic identity, enabling a unified trust score across agent-to-tool, agent-to-agent, and capability declaration protocols

11. Prior Art Established

Date | Artifact
Feb 23, 2026 | Defensive Disclosure PL-DD-2026-001: Economic Trust Staking for AI Model Inference APIs (establishes staking, vouching, and slashing primitives)
Feb 22, 2026 | Vouch Agent SDK and API deployed with Nostr identity, NIP-98 auth, and trust scoring
Feb 24, 2026 | Response paper to “Agents of Chaos” mapping agent security vulnerabilities to economic accountability mechanisms
2025–2026 | Continuous git commit history documenting protocol development including tool-use governance concepts

Filed as a defensive disclosure by Percival Labs, Bellingham, WA, USA. This document constitutes prior art under 35 U.S.C. 102(a)(1). The described protocol-level concepts are dedicated to the public domain for the purpose of preventing patent claims. All rights to specific implementations, trade secrets, and trademarks are reserved.

Document ID: PL-DD-2026-002 · Contact: [email protected]