When we design AI agent systems, we often jump straight to actions and decisions. But there’s a fundamental layer we’re missing — one that better reflects how autonomous systems actually coordinate and interact. This missing layer is the promise graph, built on foundations laid by Mark Burgess’s promise theory — a framework originally developed for distributed systems that has profound implications for how we architect agent AI.

The Problem with Command-Based Thinking

Traditional approaches to multi-agent systems inherit assumptions from centralized computing. We think in terms of commands, instructions, and control flows. Agent A tells Agent B what to do. A coordinator orchestrates a workflow. A master node delegates tasks to workers.

This command-oriented paradigm works in tightly controlled environments, but it breaks down as systems become more distributed, autonomous, and heterogeneous. When agents have their own goals, operate under different constraints, and can’t be directly controlled, the command model doesn’t just become inefficient — it becomes fundamentally misaligned with reality.

Even single-agent systems face this challenge. An AI assistant doesn’t operate in isolation — it interacts with tools, external APIs, databases, knowledge sources, and human users. Each of these interactions involves coordination without central control. The assistant can’t command an API to respond; it can only make requests and handle whatever comes back. The complexity multiplies exponentially when multiple autonomous agents interact.

Mark Burgess and the Foundation of Promise Theory

Mark Burgess, a computer scientist who spent years wrestling with configuration management and distributed systems, recognized this fundamental mismatch. In traditional system administration, you write scripts that command machines to be in certain states. But distributed systems don’t obey commands reliably — networks partition, machines fail, services become unavailable. The command model assumes control you don’t actually have.

Burgess developed promise theory as an alternative framework. Instead of thinking about what you command systems to do, you think about what each component promises to do for others. This isn’t just semantic wordplay — it’s a fundamental shift in how we model autonomous systems.

The Core Principles of Promise Theory

At the heart of promise theory are several key insights:

Agents Own Their Promises: No agent can make promises on behalf of another agent. Each agent is autonomous and responsible only for what it commits to do. This reflects the reality of distributed systems — you can’t force a remote service to behave a certain way; you can only work with what it promises to provide.

Promises Are Voluntary: Unlike obligations imposed from outside, promises are commitments an agent makes based on its own assessment of what it can deliver. An agent evaluates incoming requests (which are themselves promises to use a service) and decides whether to make a corresponding promise in return.

Bilateral Coordination: Effective coordination requires matching promises from both sides. One agent promises to provide a service; another promises to consume it appropriately. When these promises align — when they form what we might call a bilateral lock — cooperation emerges naturally without central orchestration.

Assessment Over Enforcement: Rather than enforcing compliance, promise theory focuses on assessment. Did an agent keep its promises? How well? Who observed this? The framework acknowledges that you can’t force distributed agents to behave, but you can observe, record, and reason about their promise-keeping.

Local Knowledge, Local Action: Each agent operates based on its own local knowledge and assessment. There’s no requirement for global consistency or perfect information. Agents make promises based on what they know, assess their own performance, and may receive assessments from witnesses — but they remain autonomous.

These principles emerged from managing distributed infrastructure, but they apply beautifully to AI agent systems. When multiple AI agents coordinate — or when a single agent interacts with tools and services — they’re engaging in promise-based cooperation whether we model it that way or not.

Beyond Actions: The Promise Layer

Traditional agent architectures focus on decision trees and action logs. But before an agent acts, before it even decides, it makes commitments. This is the insight we need to bring from promise theory into agent AI architecture.

When an agent receives a request, it doesn’t immediately act. First, it assesses: Can I fulfill this? Do I have the resources? Does this align with my goals and constraints? If the assessment is positive, it makes a promise. Only then does it move toward action.

This distinction matters because the promise layer captures intention and commitment separate from execution. An agent might promise to retrieve information, then fail to execute due to an API error. The promise was made and recorded; the action failed. This separation gives us richer causal information than logs that only show “action attempted, action failed.”

Think of promises as bilateral locks. One agent promises to perform work; another promises to consume the result appropriately. When both promises align and are satisfied, coordination happens naturally. Unlike command-based systems where you need central orchestration, promise-based systems are inherently decentralized — each agent assesses its own promises and the promises made to it.
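To make the bilateral-lock idea concrete, here is a minimal Python sketch, assuming promise theory's convention of give (+) and use (-) promises; the dictionary shapes, field names, and agent names are illustrative rather than a standard schema.

```python
# Minimal sketch of a bilateral lock: a promise to provide something only
# coordinates with a matching promise to use it. The "+"/"-" labelling follows
# promise theory's give/use convention; the dict shapes are illustrative.

give = {"from": "agent_a", "to": "agent_b", "polarity": "+", "body": "search_results"}
use = {"from": "agent_b", "to": "agent_a", "polarity": "-", "body": "search_results"}

def bilateral_lock(p_give: dict, p_use: dict) -> bool:
    """True when a give (+) promise is matched by a use (-) promise for the same body."""
    return (
        p_give["polarity"] == "+"
        and p_use["polarity"] == "-"
        and p_give["body"] == p_use["body"]
        and p_give["from"] == p_use["to"]
        and p_give["to"] == p_use["from"]
    )

print(bilateral_lock(give, use))  # True -> cooperation can proceed without an orchestrator
```

The check makes the decentralization visible: neither side can create cooperation alone, and no third party needs to be involved for the lock to form.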

Data Architecture: The Layers of a Promise Graph

A promise graph isn’t a single flat structure. It’s a multi-layered architecture where each layer captures different aspects of agent interaction. Before we can make decisions, we need data. Before we can trace decisions, we need to trace the data and connect it to epistemology — where did this information come from, how reliable is it, what promises were made about it?

Data Trace Layer: At the foundation, we have data signals flowing into an agent. These might be sensor readings, API responses, messages from other agents, or queries from users. Each data signal has provenance — where it came from, what promises were made about its quality or timeliness, what epistemological status it has. This layer connects raw information to the promises that govern its collection and delivery.
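A data-trace record might look something like the following sketch, where the field names and the epistemic-status vocabulary are assumptions chosen for illustration, not a fixed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Sketch of a data-trace record: a signal plus the provenance and promises
# attached to it. Field names and status vocabulary are illustrative.

@dataclass
class DataTrace:
    signal_id: str
    source: str                  # API, sensor, user, or another agent
    received_at: datetime
    promised_freshness_s: int    # what the source promised about timeliness
    epistemic_status: str        # e.g. "observed", "reported", "inferred"

trace = DataTrace(
    signal_id="sig-001",
    source="weather_api",
    received_at=datetime.now(timezone.utc),
    promised_freshness_s=3600,
    epistemic_status="reported",
)
```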

Promise Layer: Built on data traces, this layer contains the commitments agents make to each other. Critically, it includes not just successful promises but unfulfilled ones. An agent might assess incoming data and decide it cannot make a certain promise — this negative decision is itself valuable information for causal analysis. Why didn’t a certain outcome occur? Often because a necessary promise was never made.

The promise layer records bilateral commitments. Agent A promises to fetch data from an API. The API (or a wrapper agent) promises to respond within 200ms. Agent B promises to process whatever data A provides. These interlocking promises form a subgraph for each interaction.
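A minimal Python sketch of such promise records could look like this; the field names (promiser, promisee, body, deadline_ms) and the idea of recording declined promises via a made flag are illustrative choices, not a standard representation.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of promise-layer records for the interaction above. Field names
# are illustrative, not a standard schema.

@dataclass
class Promise:
    promiser: str                       # agent making the commitment
    promisee: str                       # agent the commitment is made to
    body: str                           # what is being promised
    deadline_ms: Optional[int] = None   # optional timing constraint
    made: bool = True                   # False records a promise the agent declined to make
    fulfilled: Optional[bool] = None    # None until assessed

# Interlocking promises for one interaction form a small subgraph.
interaction = [
    Promise("agent_a", "agent_b", "fetch data from the pricing API"),
    Promise("pricing_api", "agent_a", "respond to queries", deadline_ms=200),
    Promise("agent_b", "agent_a", "process whatever data A provides"),
    Promise("agent_b", "agent_c", "summarize results", made=False),  # declined promises matter too
]
```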

Assessment Layer: Promises without assessment are just intentions. The assessment layer captures self-assessment by each agent — did I fulfill my promise? — plus witness information from other agents who observed the interaction. This creates an auditable record with multiple perspectives.

An agent might assess its own promise as fulfilled, but witnesses might note it was late or incomplete. These differing assessments are valuable data. In human systems, reputation emerges from the gap between what people promise and what others observe them delivering. The same dynamics apply to agent systems.
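Here is a rough sketch of how the assessment layer can hold those multiple perspectives side by side; the verdict vocabulary and the 0-to-1 score scale are assumptions made for illustration.

```python
from dataclasses import dataclass

# Sketch of the assessment layer: one promise, several perspectives kept
# side by side rather than resolved into a single truth.

@dataclass
class Assessment:
    promise_id: str
    assessor: str       # the promiser itself, or a witness
    verdict: str        # e.g. "kept", "kept_late", "broken"
    score: float        # 0.0 to 1.0, the assessor's own grading
    note: str = ""

assessments = [
    Assessment("p-42", assessor="agent_a", verdict="kept", score=1.0,
               note="self-assessment"),
    Assessment("p-42", assessor="agent_b", verdict="kept_late", score=0.6,
               note="witness: delivered 15s after the deadline"),
]

# All perspectives are preserved for later reasoning.
print({a.assessor: a.verdict for a in assessments})
```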

Intent and Action Layer: Here we distinguish between the promise (commitment) and the action (execution). A promise represents intent, but intent can be voluntary or forced. An agent might make a promise freely, or it might make a promise because other promises or obligations constrain it.

This layer records actual actions taken in service of promises, along with metadata about whether actions were voluntary, what constraints existed, and what alternatives were considered. Not all promises lead to actions — some are superseded by events, some are explicitly cancelled, some simply expire. Capturing these non-actions is essential for understanding agent behavior.

Decision Layer: The points where promises either convert to actions or remain unfulfilled. An agent might have promised to perform a task, but when execution time arrives, circumstances have changed. Does it proceed? Does it renegotiate? Does it break the promise?

These decision points are where epistemology, promises, and actions intersect. The agent reasons about what it knows (data traces), what it committed to (promises), what it’s capable of (self-assessment), and what it should do (decision). The decision layer makes this reasoning explicit and traceable.
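A simplified sketch of such a decision point might look like the following, where the three outcomes (proceed, renegotiate, withdraw) and the inputs to the decision are deliberate simplifications of what a real agent would consider.

```python
from enum import Enum

# Sketch of a decision point: at execution time the agent re-checks its
# promise against what it now knows. Outcomes and inputs are illustrative.

class Decision(Enum):
    PROCEED = "proceed"
    RENEGOTIATE = "renegotiate"
    WITHDRAW = "withdraw"

def decide(promise_deadline_ms: int, estimated_cost_ms: int,
           inputs_still_valid: bool) -> Decision:
    if not inputs_still_valid:
        return Decision.WITHDRAW        # the promise's epistemic basis no longer holds
    if estimated_cost_ms > promise_deadline_ms:
        return Decision.RENEGOTIATE     # cannot keep the promise as originally stated
    return Decision.PROCEED

print(decide(promise_deadline_ms=200, estimated_cost_ms=350, inputs_still_valid=True))
# Decision.RENEGOTIATE
```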

Results Layer: The outcomes and deliverables produced by fulfilled promises, along with their consequences in the broader system. Results are assessed against the original promises — did we deliver what was promised? They’re also assessed in absolute terms — was the result valuable regardless of whether it matched the promise?

This distinction matters because agents can over-deliver or under-deliver relative to promises, and both cases provide learning signals. An agent that consistently over-delivers might be making promises too conservatively. One that under-delivers might be over-optimistic or facing systemic constraints.
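The two assessments can be kept explicitly separate, as in this small sketch; the numeric proxies (item counts, downstream uses) are illustrative stand-ins for whatever fulfillment and value metrics a real system would track.

```python
# Sketch of the two assessments the results layer supports: relative (did the
# result match the promise?) and absolute (was it useful regardless?).

def relative_delivery(promised_items: int, delivered_items: int) -> float:
    """Greater than 1.0 means over-delivery, less than 1.0 means under-delivery."""
    return delivered_items / promised_items if promised_items else 0.0

def absolute_value(downstream_uses: int) -> bool:
    """Crude proxy: did anything downstream actually consume the result?"""
    return downstream_uses > 0

# An agent promised five database searches, delivered three; the partial
# result was still consumed twice downstream.
print(relative_delivery(5, 3))   # 0.6 -> under-delivery learning signal
print(absolute_value(2))         # True -> the result had value anyway
```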

The Causal Power of Promise Graphs

Traditional action logs tell you what happened. Promise graphs tell you what was supposed to happen, what actually happened, why, and what prevented other possibilities. This is the difference between recording history and enabling causal analysis.

Consider a multi-agent system trying to complete a research task. Agent A promises to search academic databases. Agent B promises to summarize findings. Agent C promises to synthesize a report. The task fails — but why?

An action log might show: “Agent A executed search. Agent B attempted summary, failed. Task incomplete.”

A promise graph reveals: “Agent A promised to search five databases within 60 seconds. Agent A self-assessed as having searched three databases in 75 seconds — partial fulfillment. Agent B promised to summarize results within 200 tokens. Agent B received incomplete results and was unable to make a summary promise given its constraints. Agent C never received a summary promise from Agent B, therefore never initiated its own work. Witness agents noted that database timeouts caused Agent A’s delays.”

The difference is explanatory depth. The promise graph captures the cascade of broken commitments, the constraints that prevented promises from being made or fulfilled, and the dependency structure that caused downstream failures.
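A toy version of that causal question, asked in code, might look like this; the dictionary structures stand in for a real graph store, and the promise names and reasons simply mirror the example above.

```python
# Sketch of asking "why did the task fail?" against a promise graph rather
# than an action log: walk upstream dependencies and report missing or
# broken promises.

promises = {
    "A_search":  {"made": True,  "kept": False,
                  "reason": "searched 3 of 5 databases; timeouts caused delays"},
    "B_summary": {"made": False, "kept": None,
                  "reason": "received incomplete results, declined to promise"},
    "C_report":  {"made": False, "kept": None,
                  "reason": "no upstream summary promise to build on"},
}
depends_on = {"C_report": ["B_summary"], "B_summary": ["A_search"]}

def explain(promise_id: str, depth: int = 0) -> None:
    """Walk upstream dependencies, reporting missing or broken promises."""
    p = promises[promise_id]
    status = "never made" if not p["made"] else ("broken" if not p["kept"] else "kept")
    print("  " * depth + f"{promise_id}: {status} ({p['reason']})")
    for upstream in depends_on.get(promise_id, []):
        explain(upstream, depth + 1)

explain("C_report")
# C_report: never made (no upstream summary promise to build on)
#   B_summary: never made (received incomplete results, declined to promise)
#     A_search: broken (searched 3 of 5 databases; timeouts caused delays)
```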

Causal analysis becomes possible because we’re not just analyzing what happened — we’re analyzing the space of what could have happened and what prevented it. This is crucial for learning and optimization. You can’t improve a system by only studying its successes; you need to understand its failures, its bottlenecks, and its unrealized possibilities.

Applications: From Sessions to Reputation

Promise graphs aren’t just theoretical elegance — they solve real architectural challenges in agent systems.

Task Sessions: For a single interaction session, a promise graph captures the complete context of how agents coordinated to achieve a goal. It’s short-lived but information-rich, providing full traceability of who promised what, whether promises were kept, what data drove those promises, and what resulted.

Imagine an AI assistant helping you plan a trip. It promises to check flight prices, hotel availability, and weather forecasts. Each of these involves sub-promises to external APIs. The promise graph for this session shows the entire network of commitments, which ones were kept, which took longer than promised, and how all of this flowed toward the final recommendation.

When you ask “why did you recommend this hotel?” the assistant can trace back through the promise graph: “The weather API promised hourly forecasts but delivered daily forecasts, so I couldn’t assess rain probability for your specific arrival time. Given this constraint, I prioritized hotels with covered parking, which this option provides.” The reasoning is grounded in the actual promises and their fulfillment, not just the final decision.

Reputation Systems: When agents interact repeatedly across sessions, their promise graphs accumulate into long-term reputation data. Did this agent consistently fulfill its promises? How expensive were these interactions in terms of time or resources? What’s the pattern of promise-keeping over time?

This gives you a foundation for trust and reliability scoring that’s grounded in actual behavior, not just outcomes. An agent might achieve good outcomes by luck while breaking promises frequently. Another might have worse outcomes but reliably deliver on commitments. For building trustworthy agent ecosystems, the latter pattern is more valuable.

Traditional reputation systems often collapse complex interactions into scalar ratings. Promise graphs preserve the richness: this agent is excellent at fulfilling data retrieval promises within tight time constraints, but struggles with promises requiring synthesis of conflicting information. You can match agents to tasks based on their demonstrated promise-keeping patterns in relevant domains.
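One way to sketch that per-domain view is to accumulate assessments and compute a keep rate per promise domain rather than a single scalar. The record layout, agent name, and domain labels below are illustrative.

```python
from collections import defaultdict

# Sketch of reputation as a byproduct of accumulated assessments, kept per
# promise domain instead of being collapsed into one scalar rating.

history = [
    {"agent": "retriever-1", "domain": "data_retrieval", "kept": True},
    {"agent": "retriever-1", "domain": "data_retrieval", "kept": True},
    {"agent": "retriever-1", "domain": "synthesis",      "kept": False},
    {"agent": "retriever-1", "domain": "synthesis",      "kept": True},
]

def reliability_by_domain(agent: str) -> dict:
    totals, kept = defaultdict(int), defaultdict(int)
    for record in history:
        if record["agent"] != agent:
            continue
        totals[record["domain"]] += 1
        kept[record["domain"]] += record["kept"]
    return {domain: kept[domain] / totals[domain] for domain in totals}

print(reliability_by_domain("retriever-1"))
# {'data_retrieval': 1.0, 'synthesis': 0.5} -> match agents to tasks by domain
```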

Action Logs and Auditability: Where a plain action log records only what happened, a promise graph also records what was supposed to happen and why the two diverged. For regulated industries, compliance scenarios, or anywhere you need to explain agent behavior, this difference is crucial.

A financial AI agent makes a trade recommendation. A simple action log shows: “Agent analyzed market data, recommended buying stock X, user approved, trade executed.” This might satisfy minimal audit requirements, but it doesn’t explain the reasoning.

The promise graph shows: “Market data API promised real-time prices but delivered 15-minute delayed data — witnessed and assessed. Agent promised recommendations based on technical analysis of five indicators. Agent self-assessed as having analyzed three indicators completely and two partially due to data delays. Agent promised to flag high-confidence versus low-confidence recommendations. This recommendation was flagged low-confidence due to incomplete analysis. User acknowledged low-confidence flag before approving.”

For compliance and audit, the promise graph provides defensible explanations. For post-mortems after trading losses, it enables genuine causal analysis of what went wrong and where the system could be improved.

Learning and Optimization: The cognition loop becomes richer when you analyze promises alongside results. An agent can learn not just “this action produced this outcome” but “this promise pattern led to these assessment scores, which resulted in these outcomes, and here’s what prevented other outcomes.”

You’re optimizing at the level of commitments and coordination, not just individual actions. An agent might learn that promising 200ms response times leads to frequent promise-breaking and low witness assessments, while promising 500ms response times achieves better actual performance and higher trust scores. The learning signal comes from the relationship between promises, assessments, and outcomes — a three-way relationship that simple reinforcement learning on action-outcome pairs misses.
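As a rough illustration of that kind of learning signal, an agent could tune what it promises from its own keep rate; the 90 percent threshold and 95th-percentile rule below are arbitrary illustrative policies, not part of promise theory.

```python
# Rough sketch of tuning what to promise from assessment history rather than
# from action outcomes alone. Threshold and percentile are illustrative.

observed_latencies_ms = [180, 220, 250, 310, 190, 480, 210, 260]
current_promise_ms = 200

keep_rate = sum(t <= current_promise_ms for t in observed_latencies_ms) / len(observed_latencies_ms)

if keep_rate < 0.9:
    # Broken too often: promise what we can actually deliver most of the time.
    new_promise_ms = sorted(observed_latencies_ms)[int(0.95 * len(observed_latencies_ms))]
else:
    new_promise_ms = current_promise_ms

print(f"keep rate {keep_rate:.0%}, next promise {new_promise_ms}ms")
# keep rate 25%, next promise 480ms -> conservative promise, higher trust
```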

For multi-agent systems, promise graph analysis enables meta-learning about coordination patterns. Which promise structures lead to successful collaboration? When agents make conservative versus aggressive promises, how does this affect system-level performance? How do witness assessments correlate with actual outcomes, and when do they diverge?

Implementing Promise Graphs: Practical Considerations

Moving from theory to implementation requires thinking through data structures and storage strategies. A promise graph is inherently a multi-layered hypergraph where nodes can represent agents, promises, actions, assessments, and results, while edges represent relationships like “made by,” “witnessed by,” “led to,” and “caused.”

Each promise is itself a complex object containing bilateral commitments — what one agent promises to do and what another promises to do in response. This structure maps naturally to labeled property graphs where properties capture the rich metadata of each promise: time constraints, resource requirements, confidence levels, dependencies on other promises.
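A minimal sketch of that mapping follows, using networkx as a stand-in for a proper graph database; the layer and relation labels are assumptions for illustration, not a standard vocabulary.

```python
import networkx as nx

# Minimal sketch of a promise graph as a labeled property graph. A production
# system would more likely use a graph database; labels here are illustrative.

g = nx.MultiDiGraph()

# Nodes from different layers carry a layer label plus layer-specific properties.
g.add_node("agent_a", layer="agent", kind="retriever")
g.add_node("p-42", layer="promise", body="fetch prices", deadline_ms=200, confidence=0.8)
g.add_node("act-7", layer="action", status="failed", error="API timeout")
g.add_node("assess-3", layer="assessment", assessor="agent_b", verdict="broken")

# Edges carry relationship labels like the ones named in the text.
g.add_edge("p-42", "agent_a", relation="made_by")
g.add_edge("act-7", "p-42", relation="in_service_of")
g.add_edge("assess-3", "p-42", relation="assesses")

# Example traversal: every assessment attached to a promise made by agent_a.
for promise, _, data in g.in_edges("agent_a", data=True):
    if data["relation"] != "made_by":
        continue
    for assessment, _, d in g.in_edges(promise, data=True):
        if d["relation"] == "assesses":
            print(assessment, g.nodes[assessment]["verdict"])  # assess-3 broken
```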

For storage, you’re balancing query needs against write performance. Sessions might be short-lived and query-intensive, requiring fast graph traversal. Long-term reputation data might be append-mostly, with occasional complex analytics queries. The multi-layer architecture suggests different storage strategies for different layers — hot session data in memory-backed graphs, historical assessment data in columnar storage for analytical queries.

The assessment layer poses interesting challenges because it involves multiple perspectives on the same promise. An agent’s self-assessment, witness assessments from other agents, and later analytical assessment might all coexist and sometimes conflict. Rather than trying to resolve these to a single truth, the promise graph preserves multiple perspectives, allowing reasoning systems to weigh different sources of evidence.

Why This Matters for Agent AI

As we build increasingly complex multi-agent systems — whether it’s AI assistants coordinating with external tools, autonomous agents negotiating with each other, or hybrid human-AI teams — we need better primitives for modeling interaction. Actions and decisions aren’t enough. They capture what happened but not the fabric of commitments and assessments that make coordination possible.

Promise graphs give us:

Causal analysis that captures what didn’t happen and why, not just what did. In complex systems, understanding unrealized possibilities is often more valuable than cataloging realized outcomes. Why didn’t the agent pursue strategy X? Because it couldn’t secure a necessary promise from a data provider. This kind of reasoning requires the promise layer.

Decentralized coordination without requiring central orchestration. As Mark Burgess recognized in distributed systems, you can’t rely on centralized control in genuinely autonomous systems. Promise-based architectures embrace this reality. Agents coordinate through bilateral promises, assessed locally, without needing a global coordinator or consistent worldview.

Auditable traces that connect data signals through promises to actions and results. The full stack from epistemology to outcome becomes traceable. Where did this data come from? What promises were made about it? How did those promises influence agent commitments? How were those commitments fulfilled? What resulted? This end-to-end traceability is essential for building trustworthy AI systems.

Reputation and trust mechanisms that emerge from tracked promise-keeping. Rather than bolting reputation systems onto agent architectures as an afterthought, promise graphs make reputation a natural byproduct of the architecture itself. Every interaction produces evidence about promise-keeping that accumulates into reputation data.

Rich learning signals about what coordination patterns work. Machine learning on promise graphs enables agents to learn not just task-specific skills but meta-skills about coordination: when to make ambitious versus conservative promises, how to assess other agents’ reliability, what promise patterns lead to successful collaboration.

Natural alignment with how autonomous systems actually work. Perhaps most fundamentally, promise theory and promise graphs align with the reality of autonomous agents. You can’t command an LLM-based agent to think a certain way — you can only prompt it and work with what it produces. You can’t force an API to respond — you can only request and handle responses. You can’t control other agents in a multi-agent system — you can only coordinate through commitments. Promise graphs model this reality directly rather than papering over it with command-and-control abstractions that don’t actually work.

The Path Forward

The promise graph is a relatively new concept in agent AI systems, but it builds on decades of distributed systems research and promise theory developed by Mark Burgess. As we push toward more sophisticated multi-agent systems, the limitations of action-centric and decision-centric architectures become increasingly apparent.

We need frameworks that embrace autonomy rather than fighting it, that make coordination explicit rather than hiding it, that preserve causal richness rather than collapsing it. Promise graphs provide such a framework.

For anyone building multi-agent systems, reputation frameworks, or trying to make agent behavior more explainable and traceable, promise graphs deserve serious consideration. They’re not just another graph structure — they’re a different way of thinking about how autonomous systems coordinate, learn, and build trust.

The shift from commanding to promising, from isolated decisions to bilateral coordination, from action logs to causally-rich interaction histories — this shift mirrors the broader evolution in how we understand distributed intelligence. As AI systems become more capable and more autonomous, the promise-based perspective becomes not just useful but essential.

Mark Burgess built promise theory to tame the chaos of distributed systems. We’re now facing similar chaos in multi-agent AI. The solution may well follow the same principles: embrace autonomy, make commitments explicit, assess rather than enforce, and build coordination from voluntary bilateral promises rather than top-down control.

More in the book:

Beyond Context Graphs: Agentic Memory, Cognitive Processes, and Promise Graphs
Agentic Memory: Beyond context graphs. Build enterprise AI with decision traces, promise theory, causality & explainable multi-agent systems that learn.
https://leanpub.com/beyondcontextgraphs