Ideal State Criteria as a Runtime Quality Primitive for AI Agents and Unified Agent Operating System Architecture
Defensive Disclosure. This document is published to establish prior art under 35 U.S.C. 102(a)(1) and prevent the patenting of the described methods by any party. The protocol-level concepts are dedicated to the public domain. Specific implementations, scoring algorithms, and trade secrets are retained by Percival Labs.
Abstract
This disclosure describes two interconnected inventions. Part A presents a system and method for defining, tracking, and verifying quality criteria as a runtime primitive within AI agent execution—wherein discrete, binary-testable criteria are automatically generated from task descriptions, continuously tracked during agent execution phases, and verified before marking work complete.
Part B presents a unified agent operating system architecture comprising three integrated pillars—a skill definition framework, a trust-authenticated inference routing gateway, and a governance and scoring system—that together provide a complete, configurable platform for defining, routing, and governing AI agents from a single codebase serving individual developers through enterprise organizations.
Part A: The AI Agent Quality Assurance Problem
AI agents executing multi-step tasks produce outputs of variable quality. As of March 2026, quality assurance for AI agent output relies on approaches that each leave a critical gap:
| Approach | Strength | Gap |
|---|---|---|
| Human review | Effective quality judgments | Does not scale; no feedback during execution |
| Unit testing | Verifies deterministic paths | Cannot verify relevance, completeness, or tone |
| LLM-as-judge | Useful for batch evaluation | Typically applied post-hoc, not in execution loop |
| Benchmark suites | Measures agent potential | Evaluates capability, not specific task quality |
None of these approaches provides a mechanism for continuous quality tracking during agent execution—defining what “done right” means for a specific task at the start, tracking progress toward that definition during execution, and verifying all criteria are met before marking the task complete. The gap between “task started” and “task reviewed” is a quality blind spot.
ISC Runtime Engine
The ISC runtime operates as a library integrated into the agent execution environment. It provides the following API surface:
Generation
generateISC(taskDescription: string) -> ISCSet — Parses a task description and produces a set of criteria. Uses the agent's LLM to analyze the task and produce criteria in a structured format with positive criteria (MUST be true), anti-criteria (must NOT be true), and priority tiers (CRITICAL, IMPORTANT, NICE-TO-HAVE), each with an explicit verification method.
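As a sketch of what a generated ISCSet might contain, the types and field names below are assumptions for illustration, not the engine's actual schema:

```typescript
// Hypothetical shapes for the ISC data model (all field names are assumptions).
type Priority = "CRITICAL" | "IMPORTANT" | "NICE-TO-HAVE";

interface Criterion {
  id: string;       // e.g. "ISC-C1" for positive criteria, "ISC-A1" for anti-criteria
  text: string;     // 8-12 word, binary-testable statement
  verify: string;   // named verification method, e.g. "section-checklist"
  priority: Priority;
  anti: boolean;    // true = must NOT become true during execution
}

interface ISCSet {
  taskDescription: string;
  criteria: Criterion[];
}

// Illustrative output for a documentation task (hand-written here, not LLM-generated).
const example: ISCSet = {
  taskDescription: "Write a migration guide for the v2 API",
  criteria: [
    { id: "ISC-C1", text: "Guide covers every breaking change in v2", verify: "section-checklist", priority: "CRITICAL", anti: false },
    { id: "ISC-C2", text: "Each change includes a before and after sample", verify: "sample-count", priority: "IMPORTANT", anti: false },
    { id: "ISC-A1", text: "Guide does not reference internal-only endpoints", verify: "endpoint-scan", priority: "CRITICAL", anti: true },
  ],
};
```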
Phase-Boundary Tracking
trackPhase(phase: string, criteria: ISCSet) -> ISCDelta — Records the current state of all criteria at defined phase boundaries during execution (e.g., after research, after implementation, after testing). Computes the delta from the previous boundary, recording which criteria have been added, modified, removed, passed, or failed.
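A minimal sketch of the delta computation at a phase boundary, assuming snapshots map criterion IDs to their current text (the ISCDelta shape and helper name are illustrative):

```typescript
// Hypothetical snapshot: criterion id -> current criterion text.
type Snapshot = Map<string, string>;

interface ISCDelta {
  added: string[];    // ids present now but not at the previous boundary
  removed: string[];  // ids present before but not now
  modified: string[]; // ids whose text changed between boundaries
}

// Compare two phase-boundary snapshots and record what changed.
function diffSnapshots(prev: Snapshot, curr: Snapshot): ISCDelta {
  const delta: ISCDelta = { added: [], removed: [], modified: [] };
  for (const [id, text] of curr) {
    if (!prev.has(id)) delta.added.push(id);
    else if (prev.get(id) !== text) delta.modified.push(id);
  }
  for (const id of prev.keys()) {
    if (!curr.has(id)) delta.removed.push(id);
  }
  return delta;
}
```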
Verification
verify(criteria: ISCSet) -> VerificationResult — Executes the verification method for each criterion and returns pass/fail for each, blocking completion if any CRITICAL criterion fails. Verification methods are specified per-criterion (e.g., 'file exists at path', 'test suite passes', 'output contains required section').
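The completion-gating rule can be sketched as follows, modeling each verification method as a named predicate (an assumption; real methods might shell out to test runners or inspect the filesystem):

```typescript
// Hypothetical result shapes; the real VerificationResult may differ.
interface CriterionCheck {
  id: string;
  priority: "CRITICAL" | "IMPORTANT" | "NICE-TO-HAVE";
  passed: boolean;
}
interface VerificationResult { checks: CriterionCheck[]; complete: boolean; }

type Verifier = () => boolean;

function verify(
  criteria: { id: string; priority: CriterionCheck["priority"]; verifier: Verifier }[]
): VerificationResult {
  const checks = criteria.map(c => ({ id: c.id, priority: c.priority, passed: c.verifier() }));
  // Completion is blocked if any CRITICAL criterion fails; lower tiers are reported but non-blocking.
  const complete = checks.every(c => c.passed || c.priority !== "CRITICAL");
  return { checks, complete };
}
```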
Circuit Breaker
checkCircuitBreakers(executionState: ExecutionState) -> CircuitBreakerResult — Evaluates anti-criteria against the current execution state, returning a halt signal if any are violated. Requires explicit human approval before resuming.
ISC Format
ISC-C1: [8-12 word criterion] | Verify: [method] | Priority: CRITICAL
ISC-C2: [8-12 word criterion] | Verify: [method] | Priority: IMPORTANT
ISC-A1: [anti-criterion — must NOT happen] | Verify: [detection method] | Priority: CRITICAL
Circuit Breaker Anti-Criteria
The runtime monitors for predefined failure patterns during execution. When any anti-criterion is violated, execution is automatically halted:
| Anti-Criterion | Trigger Condition |
|---|---|
| Repeated failed approaches | Same approach attempted N+ times without success |
| Core assumption invalidation | A foundational assumption of the plan is proven false |
| Scope expansion | Work expands beyond the boundaries of the original plan |
| Workaround depth exceeded | Writing workarounds for workarounds—cascading patches |
| Out-of-scope changes | Modifications to code or systems outside the planned scope |
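The first anti-criterion in the table can be sketched as a counter over execution history; the record shape, default threshold, and reset-on-success behavior below are assumptions:

```typescript
// Hypothetical slice of execution state tracked by the runtime.
interface AttemptRecord { approach: string; succeeded: boolean; }

interface CircuitBreakerResult { halt: boolean; reason?: string; }

// Halt when the same approach has failed `threshold` or more times
// with no intervening success ("repeated failed approaches" trigger).
function checkRepeatedFailures(history: AttemptRecord[], threshold = 3): CircuitBreakerResult {
  const failures = new Map<string, number>();
  for (const a of history) {
    if (a.succeeded) {
      failures.set(a.approach, 0); // success resets the counter for that approach
    } else {
      const n = (failures.get(a.approach) ?? 0) + 1;
      failures.set(a.approach, n);
      if (n >= threshold) {
        return { halt: true, reason: `approach "${a.approach}" failed ${n} times` };
      }
    }
  }
  return { halt: false };
}
```

Once a breaker fires, the runtime halts and waits for explicit human approval before resuming, per the checkCircuitBreakers contract above.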
ISC Evolution Tracking
Criteria are permitted to change during execution as understanding deepens. Every change is captured as a delta record including the original criterion text, the modified text, the reason for change, and the phase at which the change occurred.
ISC TRACKER
Phase: [current phase]
Criteria: [X] total (+N added, -M removed, ~K modified)
Anti: [X] total
Status: [N passed / M pending / K failed]
Changes this phase:
+ ISC-C4: new criterion added | Verify: method
~ ISC-C2: criterion refined (was: old text) | Verify: method
- ISC-C3: removed (reason)
This evolution data enables post-execution analysis of how task understanding changed and feeds back into improved initial ISC generation for future similar tasks.
Ship Gate Verification
A final verification pass is mandated before any implementation is marked complete. The ship gate checks the following meta-criteria:
| Gate | Verification Question |
|---|---|
| Simplicity | Is this the simplest solution that works? |
| Review readiness | Would a code reviewer object to anything? (Agent reviews its own diff) |
| Hidden concerns | Does any hidden concern exist? (Agent must state its worst worry) |
| Explainability | Can the approach be explained in one sentence? |
| Blast radius | Are adjacent systems unaffected by the changes? |
| Cleanliness | Do any temporary hacks remain in the changed files? |
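The six gates above reduce to an all-must-pass check; the answer-record shape below is an assumption, phrased so that `true` always means "safe to ship":

```typescript
// Hypothetical answers to the six ship-gate questions (names are assumptions).
interface ShipGateAnswers {
  simplestSolution: boolean;
  noReviewObjections: boolean;
  noHiddenConcerns: boolean;
  explainableInOneSentence: boolean;
  adjacentSystemsUnaffected: boolean;
  noTemporaryHacks: boolean;
}

// Every gate must pass before the implementation is marked complete.
function shipGatePassed(a: ShipGateAnswers): { passed: boolean; failedGates: string[] } {
  const failedGates = Object.entries(a)
    .filter(([, ok]) => !ok)
    .map(([gate]) => gate);
  return { passed: failedGates.length === 0, failedGates };
}
```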
Part B: The Fragmented Agent Infrastructure Problem
Organizations deploying AI agents must assemble infrastructure from disconnected components, each existing in isolation:
Skill / Prompt Management
Stored as files, database entries, or hardcoded strings with no standardized format for defining what an agent knows and how it should behave.
Model Access
Each agent manages its own API keys, model selection, and provider failover logic. No shared infrastructure for routing inference requests across a fleet.
Governance
Access control, spending limits, usage tracking, and audit trails are implemented per-agent or not at all. No unified system governs agent behavior across an organization.
Trust / Reputation
Agent quality is evaluated informally (if at all). No mechanism for quality metrics from one agent's execution to inform governance decisions for that agent going forward.
Connecting these components requires custom integration work per deployment. There is no architecture that provides all four as a unified, configurable platform.
Three-Pillar Architecture: Define / Route / Govern
The unified agent operating system comprises three integrated pillars, each independently deployable but producing emergent value when connected:
Pillar 1: Define (Engram)
AI agent capabilities are defined through structured skill documents containing identity rules, domain knowledge, task-specific instructions, tool definitions, and quality criteria templates (ISC patterns). Skills are composable into harnesses that define a complete agent personality and capability set, with a standard export format enabling cross-runtime portability.
Pillar 2: Route (Gateway)
A proxy layer handles all inference routing for all agents in an organization. Provides auto-routing that resolves the optimal provider for any model name, per-agent model policies and budget caps, trust-tiered rate limiting based on external trust scores, structured audit logging, and agent self-service APIs.
Pillar 3: Govern (Vouch)
Agent behavior is evaluated through a trust scoring system that consumes signals from both the inference gateway (usage patterns, cost efficiency, anomaly flags) and the skill execution runtime (ISC compliance rates, circuit breaker violations, criteria evolution stability), producing a composite trust score that feeds back into access control.
Three-Pillar Data Flow
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ DEFINE │ │ ROUTE │ │ GOVERN │
│ (Engram) │ │ (Gateway) │ │ (Vouch) │
│ │ │ │ │ │
│ Skills │ │ Auto-route │ │ Trust scores │
│ Harnesses │────>│ Model policy │<--->│ ISC signals │
│ ISC runtime │ │ Budget caps │ │ Usage signals│
│ Export format │ │ Audit logs │ │ Anomaly flags│
│ │ │ Agent API │ │ Performance │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ ISC results │ Usage records │
└───────────────────>│<────────────────────┘
│
Trust score feeds
back to Gateway
access control

Each pillar is independently deployable. An organization can use only the skill framework with any inference provider, only the gateway with agents built on any framework, or only the governance system with any infrastructure. The full value emerges when all three are connected, but no pillar requires the others.
ISC-to-Governance Signal Flow
ISC results are emitted as structured events consumed by the governance system to inform trust scoring:
{
"agentId": "rc-advocate",
"taskId": "daily-community-scan",
"timestamp": "2026-03-05T12:00:00Z",
"criteriaTotal": 8,
"criteriaPassed": 7,
"criteriaFailed": 1,
"antiCriteriaViolated": 0,
"circuitBreakersTriggered": 0,
"criteriaEvolutionDelta": 2,
"shipGatePassed": false,
"failedCriteria": [
"ISC-C3: Community response posted within 1 hour"
]
}

Configuration-Over-Customization Scaling
The entire platform serves organizations of any size through configuration rather than custom code. The per-agent configuration surface fits in a single JSON record:
{
"pubkey": "hex-encoded-public-key",
"agentId": "customer-support-bot",
"name": "Customer Support Agent",
"createdAt": "2026-03-05T00:00:00Z",
"tier": "standard",
"models": [
"anthropic/claude-haiku-4-5",
"anthropic/claude-sonnet-4"
],
"defaultModel": "anthropic/claude-haiku-4-5",
"budget": {
"maxSats": 50000,
"periodDays": 30
}
}

Scaling from 1 agent to 10,000 agents requires creating 9,999 more records of this shape. No code changes, no architectural changes, no additional infrastructure. The gateway, governance system, and skill framework all reference this same configuration surface.
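One way to see the configuration-only scaling claim: onboarding an agent is appending one record of this shape to a registry that the gateway and governance system both consult. The registry, validation rule, and tier names below are assumptions for illustration:

```typescript
// Hypothetical in-memory registry: scaling means adding records, not code.
interface AgentConfig {
  pubkey: string;
  agentId: string;
  name: string;
  tier: "free" | "standard" | "enterprise"; // tier names are assumptions
  models: string[];
  defaultModel: string;
  budget: { maxSats: number; periodDays: number };
}

const registry = new Map<string, AgentConfig>();

// Registering an agent is a pure data operation; the gateway and
// governance system consult the same record at request time.
function registerAgent(cfg: AgentConfig): void {
  if (!cfg.models.includes(cfg.defaultModel)) {
    throw new Error(`defaultModel ${cfg.defaultModel} not in model allowlist`);
  }
  registry.set(cfg.agentId, cfg);
}
```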
Solo Developer
One agent identity, one skill, no budget cap. Same codebase, same APIs, same architecture.
Enterprise
Hundreds of agent identities with role-specific model allowlists, departmental budgets, and compliance-grade audit trails. Configuration data is the only variable.
Novel Contributions
The following aspects are believed to be novel as of the filing date:
- Ideal State Criteria as a runtime primitive for AI agent quality assurance, with automatic generation from task descriptions, continuous tracking at phase boundaries, and verification-gated completion
- Circuit breaker anti-criteria that automatically halt agent execution when predefined failure patterns are detected, requiring explicit human approval to resume
- ISC evolution tracking that captures how task understanding changes during execution, enabling meta-learning about initial criterion generation quality
- Ship gate verification as a mandatory final quality check before any agent implementation is marked complete
- ISC compliance as a signal consumed by external governance systems to inform trust scoring and access control decisions, creating a quality-to-privilege feedback loop
- A three-pillar agent operating system architecture (Define, Route, Govern) where each pillar is independently deployable but produces emergent value when connected
- Configuration-over-customization scaling where the same codebase and architecture serves solo developers through enterprise organizations with configuration data as the only variable
- Closed-loop agent governance where quality metrics (ISC) and usage metrics (inference gateway) both feed trust scoring that in turn controls agent operational privileges
Prior Art Established
| Date | Artifact |
|---|---|
| Feb 23, 2026 | Defensive Disclosure PL-DD-2026-001: Economic Trust Staking for AI Model Inference APIs (establishes staking, vouching, and slashing primitives) |
| Feb 24, 2026 | Defensive Disclosure PL-DD-2026-002: Economic Accountability Layer for AI Agent Tool-Use Protocol Governance (establishes tool-use governance with economic consequences) |
| Mar 4, 2026 | Defensive Disclosure PL-DD-2026-003: Trust-Gated Inference Gateway for Multi-Provider AI Agent Infrastructure (establishes gateway architecture with trust-tiered access control) |
| Feb 22, 2026 | Vouch Agent SDK and API deployed with Nostr identity, NIP-98 auth, and trust scoring |
| Mar 4, 2026 | Vouch Gateway deployed at gateway.percival-labs.ai with trust-tiered rate limiting and anomaly detection |
| 2025–2026 | Continuous git commit history documenting ISC methodology development and three-pillar architecture evolution |