Autonomous by design. Accountable by default.
Safe but useless for real work. They can summarize text and answer questions, but they can't deploy code, manage infrastructure, or operate autonomously. You babysit them through every step.
Powerful but terrifying. Give an agent access to your codebase and it might deploy broken code to production. Give it API keys and it might burn through your budget in minutes. Run multiple agents and they'll overwrite each other's work.
The industry is building faster agents. Almost nobody is building governed agents.
| Trust Level | What They Can Do | Human Oversight |
|---|---|---|
| L1 | Read repos, run safe tools, basic research | Approval required for everything |
| L2 | Write files, delegate to sub-agents, web access | Approval for destructive ops only |
| L3 | Deploy to staging, manage teams, schedule tasks | Notification on major actions |
| L4 | Production deploys, system configuration, full autonomy | None (fully autonomous) |
Trust isn't configured — it's earned. Agents start restricted and gain autonomy through demonstrated reliability. The runtime enforces these levels. An L1 agent literally cannot call L3 tools. This isn't policy. It's architecture.
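A minimal sketch of what runtime-enforced trust gating could look like. All names here (`TrustLevel`, `Runtime`, the tool registry) are illustrative assumptions, not Agency's actual API — the point is that the level check happens in the dispatcher, so a low-trust agent has no code path to a higher-level tool:

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    L1 = 1  # read repos, safe tools, research
    L2 = 2  # file writes, delegation, web access
    L3 = 3  # staging deploys, scheduling
    L4 = 4  # production, full autonomy

class TrustError(PermissionError):
    pass

class Runtime:
    """Every tool call is checked against the agent's trust level
    before dispatch -- enforcement lives in the runtime, not in a policy doc."""
    def __init__(self):
        self._tools = {}  # tool name -> (minimum level, callable)

    def register(self, name, min_level, fn):
        self._tools[name] = (min_level, fn)

    def call(self, agent_level, name, *args):
        min_level, fn = self._tools[name]
        if agent_level < min_level:
            raise TrustError(
                f"{name} requires {min_level.name}, agent is {agent_level.name}")
        return fn(*args)

rt = Runtime()
rt.register("read_repo", TrustLevel.L1, lambda path: f"contents of {path}")
rt.register("deploy_staging", TrustLevel.L3, lambda svc: f"deployed {svc}")

print(rt.call(TrustLevel.L1, "read_repo", "src/"))   # allowed at L1
try:
    rt.call(TrustLevel.L1, "deploy_staging", "api")  # blocked by the runtime
except TrustError as e:
    print("blocked:", e)
```

There is no "override" argument on `call`: raising a level means changing the agent's trust record, not passing a flag.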
Most frameworks hardcode workflows as directed graphs. Agency lets agents earn autonomy through demonstrated reliability. Trust levels adapt based on performance — success rate, cost efficiency, safety record — calculated over a rolling window.
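One way a rolling-window trust score could be computed. The window size and the weights on success rate, cost efficiency, and safety record are invented for illustration; Agency's actual formula is not shown here:

```python
from collections import deque

class TrustTracker:
    """Trust score over a rolling window of recent runs.
    Window size and weights are illustrative, not Agency's defaults."""
    def __init__(self, window=50):
        # each entry: (succeeded, cost_ratio, safety_incident)
        self.runs = deque(maxlen=window)

    def record(self, succeeded, cost_ratio, safety_incident=False):
        # cost_ratio = actual spend / budgeted spend (lower is better)
        self.runs.append((succeeded, cost_ratio, safety_incident))

    def score(self):
        if not self.runs:
            return 0.0  # unproven agents start at the bottom
        n = len(self.runs)
        success = sum(s for s, _, _ in self.runs) / n
        efficiency = 1 - min(1.0, sum(c for _, c, _ in self.runs) / n)
        safe = sum(not i for _, _, i in self.runs) / n
        # weighted blend; safety counts more than cost
        return 0.5 * success + 0.2 * efficiency + 0.3 * safe
```

Because the window is bounded (`maxlen`), old behavior ages out: an agent that was reliable six months ago but flaky last week scores on last week.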
Agents coordinate directly, not just top-down. One agent asks another for research mid-task. Emergent collaboration, not scripted pipelines. Messages are persistent, audited, and trust-gated.
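A sketch of persistent, trust-gated peer messaging under those constraints. `MessageBus`, the minimum send level, and the log shape are all assumptions for illustration — the property being shown is that every message is appended to the audit log whether or not it is delivered:

```python
import time

class MessageBus:
    """Peer-to-peer agent messages: persisted to an append-only log,
    with delivery gated on the sender's trust level (names illustrative)."""
    def __init__(self, min_send_level=2):
        self.log = []       # audit trail: every attempt, delivered or not
        self.inboxes = {}   # recipient -> list of delivered messages
        self.min_send_level = min_send_level

    def send(self, sender, sender_level, recipient, body):
        entry = {
            "ts": time.time(),
            "from": sender,
            "to": recipient,
            "body": body,
            "delivered": sender_level >= self.min_send_level,
        }
        self.log.append(entry)  # audited even when the gate blocks it
        if entry["delivered"]:
            self.inboxes.setdefault(recipient, []).append(entry)
        return entry["delivered"]
```

So a mid-task "can you research X for me?" from one agent to a peer goes through the same gate and the same log as everything else — emergent collaboration, but never off the record.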
Hard caps per agent, per run, per day. Hierarchical — children can't exceed parents. When a budget is hit, the run stops. Not a billing alert you notice tomorrow. A kill switch in the runtime.
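The hierarchy invariant above — a child can never spend past its parent — can be sketched as a tree of caps where every charge is checked against the whole ancestor chain before it lands. Class and exception names are illustrative:

```python
class BudgetExceeded(RuntimeError):
    """Raised in the runtime; the agent run terminates immediately."""

class Budget:
    """Hierarchical spend caps: a child's remaining budget is capped by
    its parent's remaining budget, and charges propagate upward."""
    def __init__(self, cap, parent=None):
        self.cap = cap
        self.spent = 0.0
        self.parent = parent

    def remaining(self):
        own = self.cap - self.spent
        return own if self.parent is None else min(own, self.parent.remaining())

    def charge(self, amount):
        if amount > self.remaining():
            # a kill switch, not a notification
            raise BudgetExceeded("budget exhausted: run terminated")
        node = self
        while node is not None:   # count the spend at every ancestor
            node.spent += amount
            node = node.parent

team = Budget(cap=10.00)                 # daily cap for the whole team
worker = Budget(cap=8.00, parent=team)   # per-agent cap within it
worker.charge(6.00)                      # fine: within both caps
```

Note that the check happens before the spend, so a run is stopped at the boundary rather than flagged after crossing it.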
New agents need approval for everything. As they prove themselves, approval gates fade. Trusted agents operate autonomously. You define the guardrails — then get out of the way.
Agents learn from mistakes, remember what worked, build institutional knowledge across sessions. They propose improvements to their own behavior. Every change requires human approval. Better over time, never unsupervised.
A persistent service running scheduled tasks, monitoring agent health, and managing work even when you're away. Agents wake up on schedule, process queued work, and report results asynchronously.
Watch Agency orchestrate a real multi-agent sprint — from natural language to shipped feature.
DAGs are great for deterministic workflows. But real agent work isn't deterministic. An agent discovers a dependency mid-task, asks a peer for help, adjusts scope based on findings. Static graphs can't model that.
Billing alerts are reactive. By the time you see the notification, $200 is already gone. Agency enforces budgets at the runtime level — the agent run terminates the moment a cap is hit. No exceptions, no overruns.
Static approval workflows become bottlenecks. If every action needs sign-off, you're just a slower version of doing it yourself. Agency's approval gates fade as agents demonstrate reliability. The system scales down your involvement automatically.
Context is expensive. Every time an agent starts fresh, it re-learns your codebase, your preferences, your conventions. Agency agents persist memory across sessions. They build institutional knowledge. They get better at your work over time.
Dispatch a feature to three agents. They work in parallel on isolated branches — one on the API, one on the UI, one on tests. Each agent operates within its trust level, stays within budget, and checkpoints its work automatically.
When they finish, you review the diffs. Approve, merge, ship.
Manual task splitting, sequential AI pair programming, copy-pasting between chat windows, hoping nobody's changes break anybody else's work.
A three-agent sprint that runs while you're in a meeting. You come back to three PRs ready for review, not three chat windows waiting for input.
Your own AI staff, running on your own infrastructure. Agents that know your preferences and get better over time.
Six different AI subscriptions that don't talk to each other and forget everything between sessions.
A persistent AI team that knows your preferences, operates within your rules, and gets better over time.
The core is live — orchestrating real agent swarms, shipping real code, enforcing real budgets.
The question isn't whether agents will do real work. It's whether they'll do it safely.
Request Early Access →