
Beyond Code: The Foundations of Structured Specification

Position

Code has become a by-product of the specification. Previously, code stood as the source of truth because it was executable: what wasn’t in the code didn’t exist. This made sense as long as writing code was the slow, deliberate step — a step that, by its very duration, forced the underlying intent to be clarified. With AI agents, this step becomes the fastest. The slow part is now deciding what you want, agreeing on it, and proving that the running system fulfills the intent after the automatic generation step. The specification becomes the source of truth; the code becomes a by-product against which correctness checks are performed.

This shift from code to specification is only viable under one condition: the specification must be structured to be manipulated by an AI agent and read by a human. Thousands of lines of unstructured textual content suit neither the human nor the agent. Our proposal is to structure this specification in text format following an explicit metamodel in the form of a Domain-Specific Language (DSL).

Three Layers to Cleanly Separate Levels of Abstraction

The specification is structured in three layers: a Product Requirements Document, a set of System Requirements, and a Domain Model. Each answers a distinct question, uses specific concepts, addresses a defined audience, and remains agnostic of the decisions made in the downstream layer.

The PRD (Product Requirements Document) carries the why and the for whom. It describes personas, jobs-to-be-done, features, success metrics, hypotheses, and risks. It speaks the product language and reads as a document meant for stakeholders. It does not say how the system will behave; it says why the system must exist, for whom, and by which criteria its success will be evaluated.

System Requirements carry what the system must do. They translate the intent of the PRD into testable requirements written in EARS syntax: "[While …,] [When|If|Where …,] the system shall <response>." Each requirement is short, scoped to a single bounded context, and carries an identifier, a justification, a MoSCoW priority (Must, Should, Could, Won’t), and a source line pointing to the PRD feature it originates from.
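To make the shape of a System Requirement concrete, here is a minimal Python sketch of a requirement record with a structural check. The field names, the REQ-<CONTEXT>-<nnn> identifier pattern, and the regular expression for EARS are assumptions for illustration, not the actual .sysreq grammar. The regex only approximates the EARS templates (it also accepts the optional "then" of the unwanted-behavior form) and is no substitute for a real parser.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    MUST = "Must"
    SHOULD = "Should"
    COULD = "Could"
    WONT = "Won't"


# Loose approximation of the EARS template:
# [While <precondition>,] [When|If|Where <trigger>,] [then] the <system> shall <response>.
EARS = re.compile(
    r"^(While [^,]+, )?((When|If|Where) [^,]+, (then )?)?the [\w\s-]+? shall .+\.$",
    re.IGNORECASE,
)
REQ_ID = re.compile(r"REQ-[A-Z]+-\d{3}")      # hypothetical identifier convention
SOURCE = re.compile(r"Feature: .+ AC: .+")    # 'Feature: <name> ... AC: <text>'


@dataclass(frozen=True)
class Requirement:
    req_id: str
    context: str        # the single bounded context the requirement is scoped to
    text: str           # the EARS sentence
    rationale: str      # the justification
    priority: Priority
    source: str         # points back to the originating PRD feature

    def problems(self) -> list[str]:
        """List the structural rules this requirement breaks (empty if none)."""
        issues = []
        if not REQ_ID.fullmatch(self.req_id):
            issues.append("malformed identifier")
        if not EARS.match(self.text):
            issues.append("not in EARS form")
        if not self.rationale.strip():
            issues.append("missing rationale")
        if not SOURCE.fullmatch(self.source):
            issues.append("missing PRD source")
        return issues
```

A check like this is deterministic, so it can run on every edit an agent makes to a .sysreq file.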

The Domain Model carries the concepts that will be implemented to fulfill the requirements. DDD concepts are present, but with a high level of formalization: bounded contexts, aggregates, entities, value types, commands, events, invariants, agreements, reactions. The Domain Model is independent of the technical stack: the same domain model can be realized with different architectural styles or technical stacks, since those are implementation concerns. It nonetheless adheres to a precise structural model, and it is where the design decisions are made that the upstream System Requirements layer must not make.

These three layers cleanly separate responsibilities and provide the appropriate abstractions for each layer to fulfill its role. A single document that simultaneously contains personas, EARS requirements, and domain entities is bound to fail: it is too vague for engineers because product-related content dilutes precision, and too concrete for the product because implementation detail obscures the original intent.

In our RideNow example, the product decision “the passenger can cancel free of charge before a driver is assigned” exists simultaneously as a feature in a .prd file, as an EARS obligation in a .sysreq file, and as a command, an event, and an invariant in a .domain file: three files that any agent can traverse and any human can read.

In RideNow. The product feature “Cancellations with fair fees and penalties” remains a single paragraph in the PRD. It generates eight EARS requirements in the Ride Management context (REQ-RIDE-060 to REQ-RIDE-067) and three more across Payment (pre-auth release, post-assignment fee calculation) and Driver Management (driver suspension for excessive cancellations). These obligations would never have fit in the same paragraph as the feature. The domain, for its part, realizes them through a CancelRequestByRider command, a RideRequestCancelledByRider event, a noFeeBeforeAssignment invariant on the RideRequest aggregate, and a reaction that triggers the refund on the Payment side. The same intent appears at three granularities, in three vocabularies, without any loss of information.

The Domain Metamodel

A domain model is only exhaustive if it has three dimensions: structure, behavior, and properties. The first two are familiar in DDD; the third is what makes the model verifiable by construction, and it is in this dimension that most of the value of a metamodel designed to be read and written by AI agents resides.

Structure describes what exists: entities with their identity, value types with their constraints, aggregates with their root and their transactional invariants, bounded contexts with their boundaries and their context-map patterns. This is the static dimension of the model.

Behavior describes what happens in response to a stimulus (external, internal, or temporal):

  • operations on entities,
  • state machines that constrain lifecycle transitions,
  • domain services that orchestrate operations beyond the scope of a single aggregate,
  • commands (explicit modification intents),
  • queries (reads of the current state),
  • internal events (past facts within a context),
  • external events (past facts observed from another context),
  • temporal events (past facts produced by the passage of time),
  • reactions (automated rules that trigger effects in response to events).

This is the dynamic dimension of the model.
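Several of the behavioral concepts above can be sketched as plain Python types: events and commands are immutable records, and a reaction maps an event type to the commands it triggers. The Reactions registry, its decorator, and the RequestRefund command are hypothetical names chosen for this sketch; only the RideRequestCancelledByRider event comes from the RideNow example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RideRequestCancelledByRider:
    """Internal event of Ride Management: a past fact."""
    ride_request_id: str


@dataclass(frozen=True)
class RequestRefund:
    """Hypothetical command handled by the Payment context."""
    ride_request_id: str


Reaction = Callable[[object], list[object]]


class Reactions:
    """Registry of automated rules: event type -> reactions that emit commands."""

    def __init__(self) -> None:
        self._by_event: dict[type, list[Reaction]] = {}

    def on(self, event_type: type) -> Callable[[Reaction], Reaction]:
        def register(reaction: Reaction) -> Reaction:
            self._by_event.setdefault(event_type, []).append(reaction)
            return reaction
        return register

    def dispatch(self, event: object) -> list[object]:
        """Run every reaction registered for the event's type and collect the commands."""
        commands: list[object] = []
        for reaction in self._by_event.get(type(event), []):
            commands.extend(reaction(event))
        return commands


reactions = Reactions()


@reactions.on(RideRequestCancelledByRider)
def refund_on_rider_cancellation(event: RideRequestCancelledByRider) -> list[object]:
    # Cross-context effect: a Ride Management event triggers a Payment command.
    return [RequestRefund(event.ride_request_id)]
```

Keeping reactions as pure functions from events to commands makes them easy to test in isolation, separately from the code that delivers events between contexts.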

Properties describe what must be logically evaluable: invariants (synchronous state predicates within an aggregate), agreements (cross-aggregate predicates with an inconsistency window and reconciliation), preconditions and postconditions of operations (the contract of each operation), lifecycle constraints (transition guards, invariants scoped to a particular state), escalation chains (ordered sequences of compensations upon failure). This is the contractual dimension of the model — the one that answers the question “what must be true at any given moment?”.

These properties are first-class citizens of the metamodel, each carrying its predicate, its scope, and its enforcement strategy. Such details were left implicit in the pre-AI era; they can now be stated from the start and then used downstream by AI agents to generate and execute the verification code, so that verifiability is integrated from the outset.
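One way to read “first-class citizen” is that each property is a value the tooling can inspect and execute. The Python sketch below models a property as a record carrying its kind, scope, predicate, and enforcement strategy. The two Enforcement values (reject or alert) are plausible strategies assumed for illustration; reconciliation for agreements is left out, and the dictionary-shaped state is a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Kind(Enum):
    INVARIANT = "invariant"
    AGREEMENT = "agreement"
    PRECONDITION = "precondition"
    POSTCONDITION = "postcondition"
    STATE_CONSTRAINT = "state constraint"


class Enforcement(Enum):
    REJECT = "reject"   # refuse the change that would break the property
    ALERT = "alert"     # accept the change, record the violation


@dataclass(frozen=True)
class Property:
    name: str
    kind: Kind
    scope: str
    predicate: Callable[[Any], bool]
    enforcement: Enforcement


class PropertyViolation(Exception):
    pass


def enforce(prop: Property, state: Any, alerts: list[str]) -> bool:
    """Evaluate the predicate and apply the declared enforcement strategy."""
    if prop.predicate(state):
        return True
    if prop.enforcement is Enforcement.REJECT:
        raise PropertyViolation(f"{prop.scope}.{prop.name}")
    alerts.append(f"{prop.scope}.{prop.name}")
    return False


NO_FEE_BEFORE_ASSIGNMENT = Property(
    name="noFeeBeforeAssignment",
    kind=Kind.INVARIANT,
    scope="RideRequest",
    predicate=lambda s: s["assigned"] or s["fee"] == 0,
    enforcement=Enforcement.REJECT,
)
```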

In RideNow. On the RideRequest aggregate of the Ride Management context, all three dimensions coexist: structure declares the RideRequest entity, its fields, its aggregate root; behavior declares the CancelRequestByRider command and the state machine with the unassigned → cancelled and assigned → cancelledAfterAssignment transitions, as well as the RideRequestCancelledByRider and RideRequestCancelledAfterAssignment events, and finally the reaction that triggers the refund on the Payment side; properties declare the noFeeBeforeAssignment invariant (with enforcement strategy by rejection), the cancellationAllowedFromCurrentState precondition on the command, and the postcondition that constrains the resulting fee based on the prior state.
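A minimal Python sketch of the RideRequest aggregate shows how the three dimensions could meet in code. The state names, command, events, and the noFeeBeforeAssignment and cancellationAllowedFromCurrentState rules come from the text; assign_driver, the post_assignment_fee parameter, and the rule names assignmentAllowedFromCurrentState and feeConsistentWithPriorState are hypothetical. The postcondition and the invariant are evaluated on candidate values before the state is committed, which is one way to implement enforcement by rejection.

```python
from dataclasses import dataclass
from decimal import Decimal

# Lifecycle state machine: (current state, action) -> next state.
TRANSITIONS = {
    ("unassigned", "assign"): "assigned",
    ("unassigned", "cancel"): "cancelled",
    ("assigned", "cancel"): "cancelledAfterAssignment",
}


class DomainRuleViolated(Exception):
    """Raised when a precondition, invariant, or postcondition does not hold."""


@dataclass(frozen=True)
class RideRequestCancelledByRider:
    ride_request_id: str


@dataclass(frozen=True)
class RideRequestCancelledAfterAssignment:
    ride_request_id: str
    fee: Decimal


def no_fee_before_assignment(state: str, fee: Decimal) -> bool:
    """Invariant: a request that was never assigned carries no fee."""
    return state not in ("unassigned", "cancelled") or fee == 0


@dataclass
class RideRequest:  # aggregate root
    ride_request_id: str
    state: str = "unassigned"
    fee: Decimal = Decimal("0")

    def assign_driver(self) -> None:
        if (self.state, "assign") not in TRANSITIONS:
            raise DomainRuleViolated("assignmentAllowedFromCurrentState")
        self.state = TRANSITIONS[(self.state, "assign")]

    def cancel_request_by_rider(self, post_assignment_fee: Decimal) -> object:
        """Handle CancelRequestByRider and return the emitted event."""
        prior = self.state
        # Precondition: cancellationAllowedFromCurrentState.
        if (prior, "cancel") not in TRANSITIONS:
            raise DomainRuleViolated("cancellationAllowedFromCurrentState")
        new_state = TRANSITIONS[(prior, "cancel")]
        new_fee = Decimal("0") if prior == "unassigned" else post_assignment_fee
        # Postcondition (fee constrained by the prior state), checked on the candidate
        # values so that a violation rejects the command and leaves the aggregate unchanged.
        if new_fee < 0 or (prior == "unassigned" and new_fee != 0):
            raise DomainRuleViolated("feeConsistentWithPriorState")
        # Invariant, enforced by rejection.
        if not no_fee_before_assignment(new_state, new_fee):
            raise DomainRuleViolated("noFeeBeforeAssignment")
        self.state, self.fee = new_state, new_fee
        if prior == "unassigned":
            return RideRequestCancelledByRider(self.ride_request_id)
        return RideRequestCancelledAfterAssignment(self.ride_request_id, new_fee)
```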

Verifiability by Construction

Each property declared in the domain model is a predicate that traditional code can verify, provided the associated synthetic data generator is available. Every value type comes with a value generator. This covers simple value types, for example value PhoneNumber { countryCode: Integer, number: #"[0-9]*"}, but also composite value types built from more primitive values. For instance, the value type FareEstimate {...} is composed of the value types Amount, Distance, Duration, SurgeMultiplier, etc., each with its own generator; the FareEstimate generator preserves the consistency of its attributes through procedural generation (for example, distance is generated first, duration is derived from it, and amount is derived in turn).
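The generators described above can be sketched with Python's random and decimal modules. The phone-number length range, the average-speed range, the surge values, and the tariff (BASE, PER_KM, PER_MIN) are invented for illustration, and FareEstimate is flattened into plain Decimal fields rather than nested value types. What the sketch is meant to show is the order of derivation: distance, then duration, then amount.

```python
import random
import re
from dataclasses import dataclass
from decimal import Decimal

PHONE_DIGITS = re.compile(r"[0-9]*")  # the constraint declared as number: #"[0-9]*"


@dataclass(frozen=True)
class PhoneNumber:
    country_code: int
    number: str


def gen_phone_number(rng: random.Random) -> PhoneNumber:
    # Draw only strings the declared pattern accepts (lengths 0..12 are an assumption).
    digits = "".join(rng.choice("0123456789") for _ in range(rng.randint(0, 12)))
    return PhoneNumber(country_code=rng.randint(1, 999), number=digits)


@dataclass(frozen=True)
class FareEstimate:
    distance_km: Decimal
    duration_min: Decimal
    surge_multiplier: Decimal
    amount: Decimal


# Hypothetical tariff, for illustration only.
BASE, PER_KM, PER_MIN = Decimal("2.50"), Decimal("1.20"), Decimal("0.30")


def gen_fare_estimate(rng: random.Random) -> FareEstimate:
    # 1. Distance first: 0.5 to 50.0 km.
    distance = Decimal(rng.randint(5, 500)) / 10
    # 2. Duration derived from distance through a plausible average speed.
    speed_kmh = Decimal(rng.randint(15, 60))
    duration = (distance / speed_kmh * 60).quantize(Decimal("0.1"))
    # 3. Amount derived from distance, duration, and surge.
    surge = Decimal(rng.choice(["1.0", "1.2", "1.5", "2.0"]))
    amount = ((BASE + PER_KM * distance + PER_MIN * duration) * surge).quantize(Decimal("0.01"))
    return FareEstimate(distance, duration, surge, amount)
```

Passing an explicit random.Random makes every generated sample reproducible from its seed, which matters when a failing case has to be replayed.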

By verifiability by construction, we mean: every property declared in the model (invariant, agreement, precondition, postcondition, state constraint) carries a predicate and can therefore be verified with deterministic code that uses the synthetic data generators. This approach can be qualified as domain-driven property-based testing.

Another example: an invariant “the order total equals the sum of its lines” declared on the Order aggregate can be checked by a generator that produces arbitrary sequences of commands on orders (AddLine, RemoveLine, UpdateLine) and evaluates the invariant predicate after each command.
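Here is a hand-rolled Python sketch of this domain-driven property-based test. It assumes an Order that maintains its total incrementally, plus a deliberately broken BuggyOrder to show that the check catches a real defect; both classes and the cent-based prices are illustrative. A library such as Hypothesis provides stateful testing with shrinking for real use; plain random with fixed seeds keeps this example dependency-free.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Order:
    # line_id -> (quantity, unit price in cents); the total is maintained incrementally.
    lines: dict[str, tuple[int, int]] = field(default_factory=dict)
    total_cents: int = 0

    def add_line(self, line_id: str, qty: int, unit_price: int) -> None:
        self.lines[line_id] = (qty, unit_price)
        self.total_cents += qty * unit_price

    def remove_line(self, line_id: str) -> None:
        qty, unit_price = self.lines.pop(line_id)
        self.total_cents -= qty * unit_price

    def update_line(self, line_id: str, qty: int) -> None:
        old_qty, unit_price = self.lines[line_id]
        self.lines[line_id] = (qty, unit_price)
        self.total_cents += (qty - old_qty) * unit_price


class BuggyOrder(Order):
    def update_line(self, line_id: str, qty: int) -> None:
        _, unit_price = self.lines[line_id]
        self.lines[line_id] = (qty, unit_price)  # bug: total_cents is not adjusted


def total_equals_sum_of_lines(order: Order) -> bool:
    return order.total_cents == sum(q * p for q, p in order.lines.values())


def find_counterexample(make_order: Callable[[], Order], seed: int, steps: int = 200) -> Optional[list[str]]:
    """Apply a random command sequence; return the trace up to the first violation, or None."""
    rng = random.Random(seed)
    order, trace = make_order(), []
    for i in range(steps):
        action = rng.choice(["AddLine", "RemoveLine", "UpdateLine"])
        if action == "AddLine" or not order.lines:
            action = "AddLine"
            order.add_line(f"line-{i}", rng.randint(1, 10), rng.randint(1, 10_000))
        elif action == "RemoveLine":
            order.remove_line(rng.choice(sorted(order.lines)))
        else:
            order.update_line(rng.choice(sorted(order.lines)), rng.randint(1, 10))
        trace.append(action)
        if not total_equals_sum_of_lines(order):
            return trace
    return None
```

The returned trace is the sequence of commands that led to the violation, which is what a developer or an agent needs to reproduce the defect.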

In the terminology now established in the software industry, a computational guide is a verification that executes deterministically with traditional code: a unit test, an assertion, a static check, a PBT test. The machine answers yes or no unambiguously. In our context, every property declared in the model is a candidate for computational verification: the precondition, the postcondition, the invariant, the agreement predicate, the transition guard. All these assertions can be executed against the observable state of the system, in test or at runtime, without human intervention.

But not every verification is deterministic. Some verifications require an agent acting as reviewer — for example, judging that a feature name in the PRD respects the ubiquitous language used in the domain. These verifications are inferential guides: an LLM reads the relevant fragments of specification or code, applies heuristics, and returns an argued verdict.

The combination of both types of guides covers the verification space: deterministic guides for what can be expressed as a predicate, LLM-based guides for what requires contextual judgment. It is this combination that makes the AI agent useful as a verifier.

The same declarations feed runtime observability. Invariants and agreements can be continuously evaluated against production event streams; drift between the model and production becomes observable.
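As a sketch of continuous evaluation, the Python monitor below folds a stream of events into observed state and evaluates an invariant after each event, reporting the offsets where it fails. The event names (DriverAssigned, CancellationFeeCharged) and the tuple event shape are hypothetical. A production monitor would read from the real event stream and update its state incrementally instead of copying a dictionary per event.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

S = TypeVar("S")
E = TypeVar("E")


@dataclass
class InvariantMonitor(Generic[S, E]):
    name: str
    initial: S
    apply: Callable[[S, E], S]        # folds one event into the observed state
    predicate: Callable[[S], bool]    # the invariant, as declared in the model

    def run(self, events: Iterable[E]) -> list[int]:
        """Return the offsets of events after which the invariant did not hold."""
        state, violations = self.initial, []
        for offset, event in enumerate(events):
            state = self.apply(state, event)
            if not self.predicate(state):
                violations.append(offset)
        return violations


# Hypothetical event shape: (event type, ride request id, amount in cents).
Event = tuple[str, str, int]
RideState = dict[str, tuple[bool, int]]   # ride request id -> (driver assigned?, fee charged)


def apply_ride_event(state: RideState, event: Event) -> RideState:
    kind, ride_id, amount = event
    assigned, fee = state.get(ride_id, (False, 0))
    if kind == "DriverAssigned":
        assigned = True
    elif kind == "CancellationFeeCharged":
        fee += amount
    return {**state, ride_id: (assigned, fee)}


no_fee_before_assignment = InvariantMonitor(
    name="noFeeBeforeAssignment",
    initial={},
    apply=apply_ride_event,
    predicate=lambda s: all(assigned or fee == 0 for assigned, fee in s.values()),
)
```

Once a violation appears in the observed state, every later offset is reported as well, until the state is corrected; that persistence is the observable drift.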

In RideNow. The agreement “weeklyPayoutBalancesCaptures” declared in the Payment context states that the sum of captures for a given week must equal the sum of payouts to drivers for that same week, with an explicit inconsistency window of six hours after Sunday 11:59 PM UTC. “Eventually consistent” thus becomes an auditable assertion, with a predicate (sum(captures) == sum(payouts)), a scope (the week), and a named time window. The reconciliation that maintains the agreement is declared alongside it; if reconciliation fails, the escalation chain takes over, in order: retry, alternative compensation, alert, suspension, manual intervention. The chain must terminate, and the metamodel enforces this. None of these assertions is prose; all are predicates the machine can execute.
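The agreement and its escalation chain can be sketched in Python as follows. Two assumptions are ours: that a week starts Monday 00:00 UTC (the text only fixes the Sunday 11:59 PM UTC cutoff), and that “manual intervention” is the only terminal step. The termination check (non-empty, no repeated step, terminal step last) illustrates the kind of rule the metamodel could enforce; it is not the metamodel's actual rule.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Callable, Iterable

INCONSISTENCY_WINDOW = timedelta(hours=6)


def week_close(week_start: datetime) -> datetime:
    """Sunday 11:59 PM UTC for a week assumed to start Monday 00:00 UTC."""
    return week_start + timedelta(days=6, hours=23, minutes=59)


def weekly_payout_balances_captures(
    captures: Iterable[Decimal], payouts: Iterable[Decimal], week_start: datetime, now: datetime
) -> str:
    """Evaluate the agreement: 'holds', 'pending' (imbalance inside the window), or 'violated'."""
    if sum(captures, Decimal("0")) == sum(payouts, Decimal("0")):
        return "holds"
    if now <= week_close(week_start) + INCONSISTENCY_WINDOW:
        return "pending"
    return "violated"


# Ordered escalation chain; the last step is terminal.
ESCALATION_CHAIN = ("retry", "alternative compensation", "alert", "suspension", "manual intervention")
TERMINAL_STEPS = frozenset({"manual intervention"})


def chain_terminates(chain: tuple[str, ...]) -> bool:
    """Non-empty, no repeated step (so no cycle), and ends on a terminal step."""
    return bool(chain) and len(set(chain)) == len(chain) and chain[-1] in TERMINAL_STEPS


def escalate(chain: tuple[str, ...], attempt: Callable[[str], bool]) -> str:
    """Try each step in order; return the first step that succeeds or the terminal hand-off."""
    if not chain_terminates(chain):
        raise ValueError("escalation chain does not terminate")
    for step in chain:
        # The terminal step hands off to a human, so it always ends the chain.
        if attempt(step) or step in TERMINAL_STEPS:
            return step
    raise AssertionError("unreachable: the chain ends on a terminal step")
```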

Traceability and Ubiquitous Language

The three layers are linked together by relationships between concepts at the different abstraction levels. The traceability chain materializes through three simple markers, present in the specification files:

  • Each set of requirements declares prd-source "ride-now.prd" — a resolvable pointer to the upstream PRD.
  • Each EARS requirement carries a source "Feature: <name> — AC: <text>" line — a searchable reference to the exact PRD acceptance criterion it formalizes.
  • Each domain element that realizes a requirement declares satisfies [REQ-…] — a back-link, from implementation to obligation.
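Because the markers are plain text, extracting them takes a few regular expressions. The Python patterns below assume only the marker syntax quoted in the list above and nothing else about the .sysreq and .domain grammars; the surrounding lines in the sample data are hypothetical, and the separator between feature and acceptance criterion is matched as either an em dash or a hyphen.

```python
import re

PRD_SOURCE = re.compile(r'prd-source\s+"([^"]+)"')
REQ_SOURCE = re.compile(r'source\s+"Feature:\s*(.+?)\s+[\u2014-]\s+AC:\s*([^"]+)"')
SATISFIES = re.compile(r"satisfies\s*\[([^\]]*)\]")


def prd_sources(sysreq_text: str) -> list[str]:
    """Upstream PRD files named by prd-source markers."""
    return PRD_SOURCE.findall(sysreq_text)


def feature_sources(sysreq_text: str) -> list[tuple[str, str]]:
    """(feature name, acceptance criterion) pairs from source lines."""
    return REQ_SOURCE.findall(sysreq_text)


def satisfied_ids(domain_text: str) -> set[str]:
    """Every requirement ID named in a satisfies [...] list."""
    ids: set[str] = set()
    for group in SATISFIES.findall(domain_text):
        ids.update(part.strip() for part in group.split(",") if part.strip())
    return ids
```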

This chain is part of the file content, and an agent or human with filesystem access can traverse it easily. Two capabilities are enabled by this provenance chain:

  • Coverage: which Must requirements have no domain element declaring satisfies? These are uncovered requirements, and the query is a single grep on the domain files.
  • Impact analysis: if a PRD feature changes, which requirements name it in their source, and which domain elements satisfy those requirements?
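Once the markers are extracted, both queries reduce to set operations. In this Python sketch, the Requirement record and the satisfies mapping (domain element to requirement IDs) stand in for parsed file content; the record shape is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    priority: str   # MoSCoW: "Must", "Should", "Could", "Won't"
    feature: str    # feature name taken from the requirement's source line


def uncovered_must(requirements: list[Requirement], satisfies: dict[str, list[str]]) -> list[str]:
    """Must requirements that no domain element declares it satisfies."""
    covered = {req_id for ids in satisfies.values() for req_id in ids}
    return sorted(r.req_id for r in requirements if r.priority == "Must" and r.req_id not in covered)


def impact(
    feature: str, requirements: list[Requirement], satisfies: dict[str, list[str]]
) -> tuple[list[str], list[str]]:
    """Requirements whose source names the feature, and the domain elements that satisfy them."""
    affected = {r.req_id for r in requirements if r.feature == feature}
    elements = sorted(element for element, ids in satisfies.items() if affected & set(ids))
    return sorted(affected), elements
```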

The verification of ubiquitous language consistency is also a consequence of this structuring. This ubiquitous language is traditionally maintained by human discipline and collective work. With a structured metamodel and three layers that share a common vocabulary, this discipline becomes automatable: an agent can traverse the .prd, .sysreq, and .domain files, list names that appear in only one layer, and alert when the PRD talks about “passenger” while the domain only knows a “Customer”. The ubiquitous language becomes a verifiable criterion.
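A first, deterministic pass of this check can be sketched in Python as a word-level comparison across layers. The word-splitting rule and the input shape (a list of names per layer) are assumptions. This pass can only surface candidates such as “passenger” versus “customer”; deciding that two words name the same concept is the kind of judgment left to an inferential guide.

```python
import re


def words(name: str) -> set[str]:
    """Split prose or identifiers into lowercase words: 'cancellationCharge' -> {'cancellation', 'charge'}."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)}


def vocabulary_drift(layers: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each word missing from at least one layer, list the layers that do use it."""
    vocab = {layer: set().union(*(words(n) for n in names)) for layer, names in layers.items()}
    every_word = set().union(*vocab.values())
    return {
        w: sorted(layer for layer, ws in vocab.items() if w in ws)
        for w in sorted(every_word)
        if not all(w in ws for ws in vocab.values())
    }
```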

Traceability and ubiquitous language produce another, more structural benefit: the decoupling of change cadences, and the traceability of modifications by level. An upstream layer, by definition, remains agnostic of modifications that occur in the downstream layer — a design decision in the domain does not rewrite the PRD. Conversely, a change in an upstream layer explicitly propagates down the chain and identifies the downstream elements to reconsider. The coupling is unidirectional and auditable, which is exactly the property a team wants in order to allow each layer to evolve at its own pace.

In RideNow. When we revised the cancellation policy during construction, we traversed the chain in seconds: Cancellations with fair fees and penalties → eight requirements in Ride Management → approximately fifteen domain elements spread across RideRequest, Ride, and the cancellation reactions → cross-context obligations on Payment (refund) and Driver Management (suspension threshold). No tool to install; the references were in the files. A ubiquitous language variation surfaced at the same time: the PRD spoke of “cancellation fee”, the domain of “cancellationCharge”. Two names for one thing, and an alert that an agent could have raised during authoring.

From Verifiable Intent to Verified Result

The AI agent becomes useful in both directions of the chain: production and verification. Downstream, it produces the artifact from structured intent; upstream, it verifies the artifact against the intent.

Downstream (production), the agent reads the specification, identifies the requirements to satisfy, generates the code that realizes the relevant domain elements, declares the corresponding satisfies links, respects the ubiquitous language, and honors the invariants. The structure gives it a framework in which it cannot drift without leaving a trace: every produced artifact is inscribed in the chain, and every omission becomes detectable by coverage queries.

Upstream (verification), the agent reads the produced artifact (whether it produced it itself or another agent did), traverses the satisfies lists to verify coverage, executes the computational guides (PBT from preconditions and postconditions, validation of invariants against generated sequences, verification of agreements against event traces), and applies the inferential guides (judgment on respect of the ubiquitous language, clarity of descriptions, consistency of modeling choices). The structure gives it what it needs to judge whether the result matches the intent.

This dual directionality is the practical consequence of everything that precedes. The separation into three layers gives the agent a clear working framework with the right separations and abstractions for each type of task. The metamodels of each layer (PRD, System Requirements, Domain Model) give it precise targets for producing and verifying. The traceability chain gives it an impact map. The ubiquitous language gives it the names to respect and audit.

AI amplifies in the direction you point it. Provide it with structured intent (cleanly separated responsibilities, end-to-end traceability, verifiability by construction, a single shared vocabulary) and the amplification compounds: every code artifact produced, and ultimately every aspect of the structure and behavior of our software system, becomes verifiable.