Agency: Architecture & Design

Trust-Based Governance for Autonomous AI Agents

An overview of Agency's architecture: dynamic trust levels, peer-to-peer agent coordination, budget enforcement, MCP-based tool provisioning, and multi-user sandboxing. Designed for engineers and technical decision-makers evaluating governed autonomy as infrastructure.

Overview

The Governance Gap

AI agents in 2026 exist on a spectrum with a conspicuous hole in the middle.

On one end: sandboxed toys. They summarize text, answer questions, and generate boilerplate. They're safe because they can't do anything real. You babysit them through every step.

On the other end: uncontrolled wildcards. Give an agent full access and it might deploy broken code to production, burn through your API budget in minutes, or clobber another agent's work. Run three agents in parallel and you've tripled your surface area for disaster with zero coordination guarantees.

The missing middle ground is governed autonomy — agents that can do real work (deploy code, manage infrastructure, coordinate with each other) while operating within enforceable safety boundaries. Not theoretical guardrails. Structural enforcement.

The industry is building faster agents. Almost nobody is building governed agents. Agency closes the gap.

Trust vs. Static Orchestration

Most multi-agent frameworks use static DAG-based orchestration: define a workflow graph, hard-code which agents run in which order, route outputs from step A to step B. This approach works for repeatable pipelines, but it breaks down when agents need to adapt, coordinate, or earn autonomy over time.

Static DAG Orchestration

  • Fixed workflow graphs
  • All agents have the same permissions
  • No memory between runs
  • Central dispatcher bottleneck
  • New agents require workflow changes

Dynamic Trust (Agency)

  • Agents plan and adapt at runtime
  • Permissions scale with demonstrated reliability
  • Persistent identity and memory
  • Peer-to-peer coordination
  • New agents start restricted, earn autonomy

The core insight: trust is a better primitive than workflow graphs. Instead of encoding every possible execution path upfront, define what each agent is allowed to do and let them figure out how to accomplish goals within those boundaries. The system becomes more capable over time as agents earn higher trust levels — without changing a single line of orchestration logic.

Dynamic Trust Levels

Every agent in Agency has an explicit trust level that determines what it can do. Trust is enforced at the runtime level — a restricted agent literally cannot access capabilities above its level. This isn't policy documentation. It's architecture.

The Four Levels

  L1 — Individual Contributor
       Capabilities: Read files, use safe tools, basic research
       Restrictions: No agent spawning, no deploys, no scheduling

  L2 — Developer
       Capabilities: L1 + spawn L1 agents, write files, limited shell
       Restrictions: No deploys, no managing other agents' trust

  L3 — Manager
       Capabilities: L2 + spawn L2 agents, deploy to staging, schedule tasks, manage agent trust
       Restrictions: No production deploys, no system configuration

  L4 — Autonomous
       Capabilities: Full capability. All operations auto-approved but logged.
       Restrictions: None — but L4 promotion always requires human approval

Level boundaries are enforced structurally. When the runtime builds the tool set for an agent run, it filters based on trust level. Restricted agents don't get "permission denied" — the restricted tools simply don't exist in their environment. You can't call what you can't see.
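This filtering step can be sketched in a few lines. The registry below and its tool names are illustrative assumptions, not Agency's actual catalog:

```python
# Hypothetical registry: each tool declares the minimum trust level it requires.
TOOL_REGISTRY = {
    "read_file":      1,
    "web_search":     1,
    "write_file":     2,
    "spawn_agent":    2,
    "deploy_staging": 3,
    "deploy_prod":    4,
}

def build_tool_set(trust_level: int) -> dict:
    """Return only the tools at or below the agent's trust level.

    Tools above the level are filtered out entirely: the agent never
    sees their names, so there is nothing to "deny".
    """
    return {name: lvl for name, lvl in TOOL_REGISTRY.items() if lvl <= trust_level}
```

Because the filter runs before the tool list reaches the model, a restricted tool is absent rather than forbidden.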

How Trust Is Earned

Trust isn't manually assigned — it's earned through demonstrated reliability. Agency uses judgment-based performance reviews rather than formula-based scoring. A managing agent reads actual run logs, assesses quality, and makes promotion recommendations — the same way a human manager evaluates direct reports.

Why not formulas? A numeric weighting scheme can't distinguish between meaningful categories of failure:

  • Task was impossible vs. agent made a poor decision
  • Succeeded but wasted resources vs. succeeded efficiently
  • Failed due to bad assignment vs. failed due to agent error

These distinctions matter. A formula penalizes an agent for failures outside its control. A contextual review can read the actual logs, understand the situation, and make a judgment call.

The Promotion Cycle

  1. Data gathering — The system compiles a review packet: recent runs, tool usage patterns, escalation history, cost data.
  2. Bidirectional review — Each review includes an agent assessment (did they make good decisions?) and a management self-assessment (were tasks assigned well?).
  3. Sustained readiness — Promotion requires consistent performance across a full observation window. One good run isn't enough.
  4. Recommendation — The manager writes a promotion packet with specific evidence and reasoning.
  5. Human approval — The packet reaches the human operator for final sign-off.

Key asymmetry: promotion requires sustained evidence plus human approval. Demotion is instant. Trust is earned slowly and lost quickly.
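The promote-slowly, demote-instantly asymmetry can be captured in a small state machine. All names here (`TrustRecord`, the three-review observation window) are illustrative assumptions, not Agency's API:

```python
from dataclasses import dataclass

@dataclass
class TrustRecord:
    """Illustrative trust ledger for one agent."""
    level: int = 1
    good_reviews: int = 0          # consecutive positive review cycles
    OBSERVATION_WINDOW: int = 3    # sustained evidence required to promote

    def record_review(self, positive: bool) -> None:
        # One bad review resets the streak: readiness must be sustained.
        self.good_reviews = self.good_reviews + 1 if positive else 0

    def promote(self, human_approved: bool) -> bool:
        # Promotion is slow: sustained evidence AND human sign-off.
        if self.good_reviews >= self.OBSERVATION_WINDOW and human_approved:
            self.level += 1
            self.good_reviews = 0
            return True
        return False

    def demote(self) -> None:
        # Demotion is instant: no window, no approval step.
        self.level = max(1, self.level - 1)
        self.good_reviews = 0
```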

The L4 Invariant

L3-to-L4 promotion always requires human approval. This is hard-coded, non-configurable, and non-negotiable. L4 means full autonomous operation — no system should be able to grant itself unlimited autonomy without a human explicitly signing off.

Even if lower-level promotions are eventually automated as trust in the system grows, L3→L4 never will be. This is Agency's most important design invariant.

Safety Mechanisms

Agency implements layered safety mechanisms. Each is independently enforceable — they compose to create defense in depth. None are advisory. The runtime blocks violations structurally, before they reach the language model.

1 Trust-Gated Tool Access

Every tool has a required trust level. The runtime filters the tool list before the agent sees it. Restricted agents don't receive "permission denied" errors — the tools don't exist in their environment.

This is a fundamental design choice: positive whitelisting (deny-by-default) rather than blocklisting. New tools are restricted by default. Agents can only use what's explicitly granted for their level. The surface area of risk scales with trust, not with the total number of available tools.

2 Budget as a Kill Switch

Budgets are hard caps, not guidelines. They serve as an automatic kill switch at three levels:

  • Per-run — Each agent run has a maximum cost. When reached, the run terminates immediately.
  • Hierarchical — A child agent can't exceed its parent's remaining budget. Costs roll up the entire tree.
  • Per-user — For multi-tenant deployments, daily and monthly caps with automatic termination.
Goal budget: $10.00
└── Manager (L3) — $2.10 spent, $7.90 remaining
    ├── Agent A (L2) — $3.50 spent (can't exceed $7.90)
    │   └── Worker (L1) — $0.80 limit (can't exceed A's remaining)
    └── Agent B (L2) — $1.50 spent

Warnings fire at 80%. Hard stop at 100%. No exceptions, no "just one more API call." Budget enforcement runs pre-flight checks before every language model invocation. This makes runaway cost impossible by design — a misbehaving agent hits its ceiling and stops, regardless of what it wants to do next.
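A minimal sketch of the pre-flight check, assuming a simple per-run enforcer with an 80% warning threshold. The class and method names are hypothetical:

```python
class BudgetExceeded(Exception):
    """Raised when a run would cross its hard cap; the run terminates."""

class BudgetEnforcer:
    """Hard-cap check run before every model invocation (sketch)."""

    def __init__(self, cap_usd: float):
        self.cap = cap_usd
        self.spent = 0.0

    def preflight(self, estimated_cost: float) -> None:
        projected = self.spent + estimated_cost
        if projected >= self.cap:
            raise BudgetExceeded(f"cap ${self.cap:.2f} reached")  # hard stop
        if projected >= 0.8 * self.cap:
            print("warning: 80% of budget consumed")

    def record(self, actual_cost: float) -> None:
        self.spent += actual_cost
```

Because `preflight` runs before the model call rather than after, a run halts at the ceiling instead of one expensive call past it.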

3 Structured Escalation

When an agent legitimately needs a capability outside its trust level, it doesn't hit a dead end — it gets a structured escalation path. The agent requests temporary access, specifying the exact tool, pattern, and reason. The request routes to a human or managing agent for approval.

Escalation requests have three scopes:

  • One-shot — single use, expires after the action completes
  • Session — valid for the current run only
  • Permanent — added to the agent's profile (requires higher approval threshold)

Escalation history feeds back into promotion reviews. High approval rates indicate good judgment about when to ask for help — evidence for promotion. High denial rates indicate poor judgment — evidence against. Frequently approved patterns are surfaced as candidates for permanent whitelisting, creating a self-improving permissions loop.
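An escalation request in this model might carry a payload like the following. The field names and `Scope` enum are assumptions for illustration, not Agency's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    ONE_SHOT = "one_shot"    # expires after the action completes
    SESSION = "session"      # valid for the current run only
    PERMANENT = "permanent"  # added to the agent's profile

@dataclass
class EscalationRequest:
    """Illustrative escalation payload routed to a human or manager."""
    agent_id: str
    tool: str
    pattern: str    # the exact command or path pattern the grant covers
    reason: str
    scope: Scope

# Example: an L2 agent asking for one shell command outside its level.
req = EscalationRequest(
    agent_id="frontend-1",
    tool="shell",
    pattern="npm run build",
    reason="verify the production bundle compiles before handoff",
    scope=Scope.ONE_SHOT,
)
```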

4 Approval Gates

Destructive operations pause execution and wait for human review. This isn't a confirmation dialog — it's a genuine gate in the execution pipeline. The agent's state is preserved, the run suspends, and the human receives a notification with full context to make an informed decision.

At higher trust levels, most operations auto-approve (but are logged). The audit trail is non-negotiable at every level — even fully autonomous agents produce a complete record of every action they take.

5 Workspace Isolation

Parallel agents work in isolated workspaces — separate filesystem sandboxes that never share files during execution. No merge conflicts during parallel work. No clobbered files. Changes integrate through standard version control flow — diffs reviewed, conflicts resolved deliberately.

Automatic checkpoints before every agent dispatch ensure a known-good state always exists. If an agent produces broken output, the system can revert to the checkpoint without affecting other agents' work.

6 Full Audit Trail

Every action is logged with full context: which agent, what tool, what parameters, what cost, what outcome. This isn't observability for debugging — it's the core safety primitive.

You can replay any decision any agent made, trace the full execution tree, and understand exactly why a particular action was taken. The audit trail is immutable from the agent's perspective — agents can read their own events but cannot modify or delete them.
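The append-only property can be made structural by simply not exposing mutation. A sketch, with hypothetical field names:

```python
import json
import time

class AuditTrail:
    """Append-only event log (sketch). Agents get read access only;
    there is deliberately no update or delete method."""

    def __init__(self):
        self._events: list[str] = []

    def append(self, agent: str, tool: str, params: dict,
               cost: float, outcome: str) -> None:
        # Serialize at write time so later object mutation can't
        # retroactively change a recorded event.
        self._events.append(json.dumps({
            "ts": time.time(), "agent": agent, "tool": tool,
            "params": params, "cost": cost, "outcome": outcome,
        }))

    def read(self) -> list[dict]:
        return [json.loads(e) for e in self._events]
```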

Agent Architecture

Agents as Autonomous Entities

Agency treats agents as autonomous entities with persistent identities, accumulated knowledge, and defined specializations — not interchangeable API wrappers. Each agent has:

  • A personality profile defining expertise, communication style, and working patterns
  • Persistent memory accumulating knowledge across sessions
  • Performance history feeding into trust decisions
  • Peer relationships — agents message each other, coordinate on shared work, and build institutional knowledge

This isn't anthropomorphization for its own sake. It's a practical architecture decision. When a research agent sends findings to a frontend agent, that message is persistent, audited, and trust-gated. When an infrastructure agent learns a deployment failed due to a specific configuration issue, that knowledge persists across sessions. Context compounds over time, making each successive interaction more efficient.

Stateless Agent Pools

  • Generic workers, no specialization
  • Context rebuilt every run
  • No learning between sessions
  • Central orchestrator required

Persistent Agent Identities

  • Specialized roles with expertise
  • Memory carries across sessions
  • Trust earned through track record
  • Peer-to-peer coordination

Peer-to-Peer Messaging

Agents communicate through a persistent, audited messaging system. Unlike hub-and-spoke architectures where all communication routes through a central orchestrator, Agency agents can message each other directly.

Messages are:

  • Persistent — survive process restarts, available for later reference
  • Audited — logged with full context in the event trail
  • Trust-gated — messaging doesn't bypass trust level restrictions
  • Wake-capable — sending a message to an idle agent wakes it up to process the message

This enables real coordination without a bottleneck. A research agent can send findings directly to a developer agent. An infrastructure agent can alert the team about deployment status. A manager agent can coordinate without being a single point of failure.
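The four message properties above can be sketched as a tiny in-process bus. A production system would persist inboxes to durable storage; all names here are illustrative:

```python
from collections import defaultdict

class MessageBus:
    """Sketch of persistent, audited, wake-capable agent messaging."""

    def __init__(self):
        self.inboxes = defaultdict(list)  # persisted to disk in a real system
        self.audit_log = []               # audited: full event trail
        self.idle_agents = set()

    def send(self, sender: str, recipient: str, body: str) -> None:
        msg = {"from": sender, "to": recipient, "body": body}
        self.inboxes[recipient].append(msg)
        self.audit_log.append(msg)
        if recipient in self.idle_agents:  # wake-capable delivery
            self.wake(recipient)

    def wake(self, agent: str) -> None:
        # Placeholder: a real runtime would schedule the agent to run.
        self.idle_agents.discard(agent)
```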

Persistent Memory

Each agent maintains a structured memory store that accumulates learnings over time. This isn't a context window hack or prompt injection — it's a persistent knowledge base that the agent reads at the start of each session and writes to as it discovers new information.

Memory enables agents to avoid repeating mistakes, build on prior work, and develop genuine expertise in their domain. A frontend agent remembers the design system conventions. An infrastructure agent remembers which deployment configurations work. Knowledge compounds instead of resetting.
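A file-backed version of this read-at-start, write-on-discovery loop might look like the following sketch (paths and names are assumptions):

```python
import json
from pathlib import Path

class AgentMemory:
    """File-backed per-agent memory store (illustrative sketch)."""

    def __init__(self, agent_id: str, root: Path):
        self.path = root / f"{agent_id}.json"
        # Read at session start: prior learnings carry over.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def learn(self, key: str, value: str) -> None:
        # Write as the agent discovers new information.
        self.notes[key] = value
        self.path.write_text(json.dumps(self.notes, indent=2))
```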

Agents can also propose improvements to their own configuration — suggesting changes to their personality, tool access, or working patterns. Every proposal requires human approval before taking effect. The system improves over time, but never without explicit sign-off.

Hierarchical Run Trees

Every goal creates a tree of agent runs — not a flat task list. Agents spawn sub-agents who spawn their own sub-agents, with trust boundaries enforced at each level:

Goal: "Refactor authentication to JWT"
└── Manager (L3) — decomposes goal, delegates
    ├── Research Agent (L2) — JWT analysis, recommendations
    ├── Frontend Agent (L2) — token handling
    │   └── Worker (L1) — refresh logic
    └── [Manager reviews and integrates]

Costs roll up through the tree. Every node tracks its own cost and the total cost of its descendants. The root node gives you the total cost of accomplishing the goal. This is observable in real-time via the dashboard — you can watch agents coordinate, see budget consumption, and intervene at any point.

Recursive delegation with trust boundaries at every level means the system scales without increasing risk. An L2 agent can only spawn L1 workers. An L1 worker can't spawn anyone. The tree structure guarantees monotonically decreasing privilege.
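The monotonically-decreasing-privilege rule and the cost roll-up can be sketched as a small tree structure (class and field names are illustrative):

```python
class Run:
    """Node in a hierarchical run tree (sketch)."""

    def __init__(self, agent: str, trust: int, parent=None):
        self.agent, self.trust, self.parent = agent, trust, parent
        self.children = []
        self.own_cost = 0.0

    def spawn(self, agent: str) -> "Run":
        # A child always runs at a strictly lower trust level,
        # so an L1 run cannot spawn at all.
        if self.trust <= 1:
            raise PermissionError("L1 agents cannot spawn sub-agents")
        child = Run(agent, self.trust - 1, parent=self)
        self.children.append(child)
        return child

    def total_cost(self) -> float:
        # Costs roll up: each node reports itself plus all descendants.
        return self.own_cost + sum(c.total_cost() for c in self.children)
```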

MCP Tool Provisioning

Why MCP

Agency uses the Model Context Protocol (MCP) as its tool interface. MCP is an open standard for connecting AI models to external tools and data sources — think of it as a universal adapter between language models and the systems they interact with.

This is a deliberate choice to avoid vendor lock-in. Because tools are defined via an open protocol rather than a proprietary SDK, the same tool definitions work across different language model providers. Switching the underlying model doesn't require rewriting tool integrations.

Trust-Filtered Tool Sets

Agency extends MCP with trust-based filtering. Instead of giving every agent the same set of tools, the runtime dynamically builds a tool set based on the agent's trust level, role, and the specific context of the current task.

The filtering is structural, not advisory:

  • L1 agents see read-only tools: file reading, web search, research utilities
  • L2 agents see L1 tools plus: file writing, agent spawning (L1 only), limited shell access
  • L3 agents see L2 tools plus: staging deployment, task scheduling, trust management
  • L4 agents see everything, with a hard blocklist of permanently forbidden operations

Agent-specific tool customizations layer on top of the level-based defaults. A frontend specialist might get additional design tools. A research agent might get extended web access. The composition is: level defaults + agent-specific additions − global blocklist. The blocklist always wins.
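The composition rule reads naturally as set algebra. The tool names below are hypothetical:

```python
# Hypothetical level defaults and a globally forbidden operation.
LEVEL_DEFAULTS = {
    1: {"read_file", "web_search"},
    2: {"read_file", "web_search", "write_file", "spawn_l1", "shell_limited"},
}
GLOBAL_BLOCKLIST = {"delete_prod_db"}

def compose_tools(level: int, agent_extra: set[str]) -> set[str]:
    # level defaults + agent-specific additions - global blocklist;
    # the blocklist wins even over an explicit per-agent grant.
    return (LEVEL_DEFAULTS[level] | agent_extra) - GLOBAL_BLOCKLIST
```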

Multi-Engine Execution

Because tools are defined via MCP (not hard-coded to a specific model API), Agency can decouple the agent from the language model. The same agent identity — with its personality, trust level, and memory — can run on different models depending on the task:

Agent (personality + trust level + memory)
  × Engine (any MCP-compatible LLM — proprietary, open-source, local)
  × Backend (local process, container, remote host, cloud)

This separation makes intelligent model routing possible. Simple tasks route to fast, inexpensive models. Complex agentic tasks route to high-capability models. The personality and trust constraints stay the same regardless of which model executes — the governance layer is independent of the inference layer.

Task Type                              Model Tier               Relative Cost
File operations, triage                Fast model               1x
Standard coding, synthesis             Balanced model           3x
Complex agentic work, orchestration    High-capability model    5–10x
Bulk research, data processing         Open-source model        0.5x

Cascade routing starts cheap and escalates on failure: begin a coding task on a balanced model, escalate to a high-capability model if it fails twice, log the outcome for future optimization. The system self-tunes based on observed performance.
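Cascade routing can be sketched as an ordered walk over model tiers. The tier interface here (a callable returning `None` on failure) is an assumption for illustration, not Agency's actual API:

```python
def cascade_route(task, tiers, max_failures=2):
    """Try cheap models first; escalate after repeated failure (sketch).

    `tiers` is an ordered list of (name, run_fn) pairs, cheapest first.
    `run_fn(task)` returns a result, or None on failure.
    """
    for name, run_fn in tiers:
        for _attempt in range(max_failures):
            result = run_fn(task)
            if result is not None:
                # A real system would log (task, name) here to tune
                # future routing decisions.
                return name, result
    raise RuntimeError("all tiers exhausted")
```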

Multi-User Sandboxing

Agency supports multiple users with tiered access, workspace isolation, and independent budget enforcement. The multi-user model is designed for teams and shared infrastructure where different users need different levels of access.

Tiered Access

Three user tiers serve three distinct use cases:

┌────────────────────────────────────────────────────────────┐
│ ADMIN                                                      │
│ Full system access. No budget caps. Manages users.         │
│ Tools: All │ Max trust: unrestricted │ Shell: full         │
├────────────────────────────────────────────────────────────┤
│ POWER USER                                                 │
│ BYOK (bring your own keys). Sandboxed. No source access.   │
│ Tools: All except admin │ Max trust: L2 │ Shell: sandbox   │
├────────────────────────────────────────────────────────────┤
│ FRIEND                                                     │
│ Budget-capped. Read-only tools. Personal assistant.        │
│ Tools: Read, Search, Web │ No shell │ No spawning          │
└────────────────────────────────────────────────────────────┘
Capability       Admin          Power              Friend
Read files       Anywhere       Workspace only     Workspace only
Write/Edit       Anywhere       Workspace only     No
Shell access     Unrestricted   Sandboxed          Blocked
Web search       Yes            Yes                Yes
Spawn agents     Yes            Yes                No
Admin controls   Yes            No                 No
Budget           Exempt         Exempt (BYOK)      Daily + monthly caps

Workspace Sandbox

Non-admin users are sandboxed into per-user directories. All file operations for sandboxed users resolve within their workspace — escape attempts are blocked at the runtime level. Each user gets:

  • A private workspace directory for file operations
  • Per-agent memory files that accumulate context about their projects and preferences
  • Isolated session history — one user's conversations are invisible to others

Design principle: agent personality is shared (consistent behavior across all users), but agent memory is per-user (private context history). The same agent identity, same personality, same expertise — but it remembers different things about different users' projects.
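Escape-blocking path resolution for sandboxed users might look like this sketch (the function name is hypothetical; `is_relative_to` requires Python 3.9+):

```python
from pathlib import Path

def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve a user-supplied path inside the sandbox (sketch).

    Escape attempts (.., absolute paths, symlink tricks) are rejected
    before any file operation runs.
    """
    target = (workspace / user_path).resolve()
    if not target.is_relative_to(workspace.resolve()):
        raise PermissionError(f"path escapes workspace: {user_path}")
    return target
```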

Per-User Budget Enforcement

Budget-capped users have hard daily and monthly spending limits. The system tracks usage with rolling windows and enforces at multiple checkpoints:

  • Warning at 80% utilization
  • Hard stop at 100% — the run terminates immediately
  • No exceptions — even mid-conversation, the budget enforcer will halt execution

Power users are exempt from budget caps because they bring their own API keys. Their usage costs them directly, not the system operator. This makes it economically viable to share AI infrastructure without absorbing everyone's compute costs.

Open Standards & Design Principles

Model Context Protocol

Agency is built on MCP, the open standard for connecting AI models to tools and data sources. This means:

  • No vendor lock-in — tool definitions work across language model providers
  • Ecosystem compatibility — any MCP-compatible tool server works with Agency out of the box
  • Standardized interfaces — tools, resources, and prompts follow a shared specification
  • Community contributions — third-party MCP servers extend Agency's capabilities without modifying the core

Using an open standard for tool provisioning means the governance layer (trust, budgets, safety) is independent of any specific model or tool implementation. New models and new tools integrate without changing the safety architecture.

Portable Agent Identity

Agent identities are defined in human-readable configuration files — not database rows or opaque binary formats. This means:

  • Version-controllable — agent configurations live in source control with full history
  • Human-readable — anyone can read and understand an agent's personality, constraints, and capabilities
  • Editable outside the system — update an agent's personality with any text editor
  • Transparent by default — no hidden configuration, no magic behavior

Transparency is a safety property. When you can read exactly what an agent's instructions, constraints, and accumulated knowledge are, you can audit the system without needing specialized tools.

Design Principles

For engineers evaluating the approach, Agency's design choices reflect a specific philosophy:

  1. Enforce, don't advise. Every safety constraint is structural. Agents can't bypass trust levels through clever prompting. The runtime blocks violations before they reach the language model.
  2. Earn, don't assign. Trust is dynamic and evidence-based. Agents start restricted and prove themselves through consistent performance. This mirrors how human organizations actually work.
  3. Compose, don't centralize. Peer-to-peer messaging, distributed planning, and recursive delegation avoid the single-orchestrator bottleneck that limits most multi-agent systems.
  4. Open, don't lock in. MCP for tools, human-readable files for identity, standard protocols for communication. The governance layer works regardless of which model or provider you choose.
  5. Budget as architecture. Cost limits aren't a billing feature — they're a safety mechanism. A budget cap is the most reliable kill switch for a runaway agent. No amount of clever prompting can override a hard spending limit.

Agency's bet: trust-based governance is the missing infrastructure layer for autonomous AI. Not as a constraint on capability, but as the thing that unlocks it. You can't give an agent production deploy access without trust enforcement. You can't run parallel agents without workspace isolation. You can't share your AI infrastructure without budget caps and sandboxing.

The question isn't whether AI agents will run autonomously. It's whether they'll do it safely.