Skip links

Formalizing Cross-Domain State Preservation in Isabelle/HOL

Cleanroom corridor with lone instrument, representing state preservation model simplification

TL;DR

Translating our semi-formal specification into Isabelle/HOL code required three key design decisions. First, we simplified the partially synchronous network model to an atomic sync function, securing proof tractability at the cost of an explicit gap with implementation reality. Second, we designed a 4-tier locale hierarchy in State_Preservation.thy (state_machine → state_preservation → symmetric_state_preservation → multi_domain_preservation) so the generic theory extends naturally beyond the regulatory domain. Third, in Regulatory_Instance.thy we discovered that the core definitions of the global state model, particularly update_all_chains and sync, would determine the difficulty of every subsequent proof. This article documents the choices made during the transition from specification to code, and their rationale.


In the previous article, we fixed the semi-formal specification for Property 1. Five states, seven actions, twelve valid transitions, a partially synchronous network assumption, a preemptive locking mechanism, and the mathematical expression of cross-domain state preservation homomorphism. Now that specification must be converted into .thy code that Isabelle/HOL can actually verify.

This conversion demanded far more decisions than expected. Assumptions left implicit in natural language surfaced one by one before the theorem prover, and some of those decisions changed the meaning of the model itself. This article records that process.

The First Decision: Abandoning Partial Synchrony

In the semi-formal specification, we adopted the partially synchronous network model of Dwork, Lynch, and Stockmeyer [1]. An upper bound Δ on message delivery exists but is not known a priori. Messages are eventually delivered. Order is not guaranteed.

As soon as we began writing Isabelle/HOL code, the cost of faithfully formalizing this model became clear.

Representing partial synchrony faithfully requires modeling the nondeterminism of message delivery as a relational definition (inductive). We flagged this tension in the previous article; at the code-writing stage, the cost materialized. In an inductive-based model, the synchronization outcome is not unique. The same initial state can yield multiple possible final states depending on message arrival order, and the proof obligation transforms into “the property holds across all possible execution paths.” This raises proof complexity by at least one order of magnitude.

What we want to prove in Property 1 is the logical correctness of the synchronization mechanism itself. “Does the result remain the same when messages are delayed or reordered?” is an important question, but it is a separate question from “does the synchronization logic produce the correct state?”

Based on this distinction, we made the decision: define sync as a deterministic function that executes atomically.

The implications of this decision must be understood precisely. Within the model, sync completes instantly. Lock acquisition, state validation, all-chain update, and lock release are processed as a single atomic step. In the actual implementation, this atomicity will be realized through the preemptive locking mechanism with timeouts and automatic release. But the model abstracts away those implementation details.

This is an explicit trade-off, and we do not hide the gap.

Failed proof on laptop beside handwritten notes reworking state preservation sync as atomic

The first proof attempt failed against the partially synchronous model. The handwritten note captures the decision point: tractability won over realism, and sync became atomic.

Model assumption → Implementation reality → Gap mitigation

Atomic sync execution → network delays and partial failures possible → preemptive lock + timeout + automatic release. Instant lock acquisition → distributed lock contention → lock queue + timeout. Honest nodes → Byzantine nodes possible → D-quencer BFT consensus (Property 2).

Proving eventual consistency under partial synchrony, including message delays, retransmissions, and out-of-order delivery, remains a future task within Property 1’s scope. Defining the partially synchronous model in the semi-formal specification was not wasted effort. It marks the point the model must ultimately reach; the current model is the first verification checkpoint on that path.

The State Preservation Locale Hierarchy

The design of the generic theory file was directly inspired by Lochbihler’s Merkle Functor [2]. Lochbihler defined a generic interface called merkle_interface while verifying Canton’s authenticated data structures, then implemented Canton’s transaction trees as an instance of that interface. We follow the same structure: define a generic theory for state preservation, then implement the regulatory domain as an instance.

State_Preservation.thy is 383 lines, with four locales stacked hierarchically.

Locale 1: state_machine

The most fundamental abstraction. It defines finite state sets, finite action sets, a deterministic transition function, and terminal states.

locale state_machine =
  fixes states :: "'s set"
    and actions :: "'a set"
    and transition :: "'s ⇒ 'a ⇒ 's option"
    and terminal :: "'s set"
  assumes finite_states: "finite states"
    and finite_actions: "finite actions"
    and transition_closed: ...
    and terminal_stuck: ...
    and terminal_subset: ...

The key function is apply_actions, which sequentially applies a list of actions. Once a terminal state is reached, all subsequent transitions return None. This single locale suffices to verify whether the ERC-TRUST regulatory state transition model is correctly expressed. Single-chain, single-domain state machine properties belong here.

Locale 2: state_preservation

Defines the structure-preservation relationship between two state machines. It inherits two state_machine locales (source and target) and adds a state mapping (state_map), an action mapping (action_map), and a synchronization function (sync).

locale state_preservation =
  source: state_machine + target: state_machine +
  fixes state_map :: "'s1 ⇒ 's2"
    and action_map :: "'a1 ⇒ 'a2"
    and sync :: "'s1 ⇒ 's2"
  assumes naturality: ...

The core assumption is the naturality condition: the transition function and the synchronization function commute. “First transition, then sync” yields the same result as “first sync, then transition.” The concept comes from natural transformations in category theory.

The key theorem proved at this level is sequential_preservation. If naturality holds for a single action, it holds for action sequences of arbitrary length. The proof proceeds by induction on the action list. Failure (None) preservation is also proved separately (sequential_preservation_none).

Locale 3: symmetric_state_preservation

Defines bidirectional preservation. It adds state_map_inv, action_map_inv, and sync_inv to enable reverse mapping and synchronization. Roundtrip guarantee (sync_inv ∘ sync = id) and state_map_injective (no information loss) are the key properties.

Oraclizer’s state synchronization is inherently bidirectional. Regulatory actions originating on-chain are reflected off-chain, and off-chain events are reflected on-chain. Guaranteeing this bidirectionality at the locale level is the role of this tier.

Locale 4: multi_domain_preservation

Extends beyond a single pair (source-target) to an arbitrary set of domains.

locale multi_domain_preservation =
  fixes domains :: "'d set"
    and domain_state :: "'d ⇒ 'id ⇒ 's option"
    and sm_transition :: "'s ⇒ 'a ⇒ 's option"
    ...
    and sync_all :: "'id ⇒ 'a ⇒ 's ⇒
      ('d ⇒ 'id ⇒ 's option) ⇒ ('d ⇒ 'id ⇒ 's option)"
  assumes finite_domains: "finite domains"
    and all_domains_are_state_machines: ...

The key definition is connected_domains (the set of domains where a given asset exists), and the key properties are cross_domain_consistency (all connected domains agree after sync_all) and sync_isolation (other assets remain unchanged).

The design rationale for this 4-tier structure is separation of concerns. Single state machine properties, pairwise preservation, bidirectional preservation, and multi-domain preservation are each proved independently, with each higher locale inheriting all theorems from below. When instantiating the regulatory domain in Regulatory_Instance.thy, we can choose how far up the hierarchy to instantiate.

Locale Hierarchy of State_Preservation.thy Four locales stacking from basic state machine to multi-domain preservation, with key properties at each level STATE_PRESERVATION.THY — LOCALE HIERARCHY 383 lines · 4 locales · domain-independent theory L1 locale state_machine fixes: states, actions, transition, terminal assumes: finite_states, finite_actions, transition_closed, terminal_stuck → apply_actions :: ‘s ⇒ ‘a list ⇒ ‘s option KEY PROPERTIES • Terminal states absorb all actions • Action sequences compose via fold • Transition determinism (from fun def.) Scope: single chain, single domain extends L2 locale state_preservation inherits: source + target (state_machine) fixes: state_map, action_map, sync assumes: naturality (commuting diagram) → sequential_preservation (induction) KEY PROPERTIES • F_target ∘ transition = sync ∘ F_source • Sequential preservation (list induction) • Failure (None) preservation Scope: source → target pair extends L3 locale symmetric_state_preservation adds: state_map_inv, action_map_inv, sync_inv assumes: roundtrip (sync_inv ∘ sync = id) → state_map_injective (no info loss) KEY PROPERTIES • Bidirectional sync (on-chain ↔ off-chain) • Injectivity of state mapping Scope: bidirectional pair extends L4 locale multi_domain_preservation fixes: domains, domain_state, sync_all assumes: finite_domains, all_domains_are_state_machines → cross_domain_consistency, sync_isolation KEY PROPERTIES • All connected domains consistent after sync • Sync isolation (other assets unchanged) • Parametric over arbitrary domain count Scope: N domains (the full cross-chain model) Regulatory_Instance.thy (1053 lines) instantiates this hierarchy via interpretation
STATE_PRESERVATION.THY
383 lines · 4 locales · domain-independent
L1 state_machine
fixes: states, actions, transition, terminal
assumes: finite_states, finite_actions, transition_closed, terminal_stuck
Scope: single chain, single domain
↓ extends
L2 state_preservation
inherits: source + target (state_machine)
fixes: state_map, action_map, sync
assumes: naturality (commuting diagram)
Scope: source → target pair
↓ extends
L3 symmetric_state_preservation
adds: state_map_inv, action_map_inv, sync_inv
assumes: roundtrip (sync_inv ∘ sync = id)
Scope: bidirectional pair
↓ extends
L4 multi_domain_preservation
fixes: domains, domain_state, sync_all
assumes: finite_domains, all_domains_are_state_machines
cross_domain_consistency, sync_isolation
Scope: N domains (the full cross-chain model)
Regulatory_Instance.thy (1053 lines) instantiates this hierarchy

Figure 1: Locale Hierarchy and Theorem Propagation in State_Preservation.thy

Regulatory_Instance.thy: Formalizing the Regulatory Domain

The regulatory instance file is 1,053 lines, roughly 2.7 times the generic locale’s line count, because concrete definitions, auxiliary lemmas, and proofs all reside in this file.

Formalizing States and Actions

The semi-formal specification’s states and actions translate directly into Isabelle’s datatype.

datatype reg_state = ACTIVE | FROZEN | SEIZED | CONFISCATED | RESTRICTED
datatype reg_action = FREEZE | SEIZE | CONFISCATE | RESTRICT
                    | UNFREEZE | UNRESTRICT | RELEASE

The transition table is defined using the fun keyword. In Isabelle, fun defines total functions via pattern matching, and the definition itself guarantees determinism. Of the 35 (state, action) combinations, 12 return Some (valid transition) and the remaining 23 return None (invalid).

fun reg_transition :: "reg_state ⇒ reg_action ⇒ reg_state option"
where
  "reg_transition ACTIVE FREEZE = Some FROZEN"
| "reg_transition ACTIVE SEIZE = Some SEIZED"
| "reg_transition ACTIVE CONFISCATE = Some CONFISCATED"
| "reg_transition ACTIVE RESTRICT = Some RESTRICTED"
| "reg_transition FROZEN UNFREEZE = Some ACTIVE"
| "reg_transition FROZEN SEIZE = Some SEIZED"
| "reg_transition FROZEN CONFISCATE = Some CONFISCATED"
| "reg_transition SEIZED RELEASE = Some ACTIVE"
| "reg_transition SEIZED CONFISCATE = Some CONFISCATED"
| "reg_transition RESTRICTED UNRESTRICT = Some ACTIVE"
| "reg_transition RESTRICTED FREEZE = Some FROZEN"
| "reg_transition RESTRICTED CONFISCATE = Some CONFISCATED"
| "reg_transition _ _ = None"

The final wildcard pattern (_ _) handles all 23 invalid combinations in one line. The fun exhaustiveness checker automatically confirms no cases are missing.

From this definition, an interpretation creates an instance of the state_machine locale.

interpretation reg_sm: state_machine
  reg_states reg_actions reg_transition reg_terminal

This single line means every theorem proved in the state_machine locale now automatically applies to the regulatory state machine. The practical benefit of the locale hierarchy manifests here.

The Global State Model

Global state is defined using Isabelle’s record and type_synonym.

type_synonym asset_id = nat
type_synonym chain_id = nat

record asset_state =
  as_reg_state :: reg_state
  as_owner :: nat
  as_locked :: bool

type_synonym chain_state = "asset_id ⇒ asset_state option"

record global_state =
  gs_chains :: "chain_id ⇒ chain_state"
  gs_locks  :: "asset_id ⇒ chain_id option"

Several abstraction decisions are embedded here. Defining asset_id and chain_id as nat (natural numbers) is a deliberate simplification. In practice, asset_id is a composite identifier and chain_id is an EIP-155 chain ID, but all the model needs is distinguishable identifiers. nat provides equality comparison and finiteness for free.

Note that gs_locks has type asset_id ⇒ chain_id option. Some cid means the asset is locked by an action originating on chain cid; None means no lock. Recording the source chain at lock time provides information needed for debugging and rollback, and in proofs it enables tracing the cause of a lock.

The oss_queue (message queue) present in the semi-formal specification was removed from the final model. Under the atomic sync assumption, modeling a message queue introduces unnecessary complexity. Messages are defined as oss_message records but are passed directly as parameters to the sync function rather than stored in a queue.

The Core Function: Structure of sync

The sync function encapsulates the entire synchronization workflow in a single function: lock acquisition → transition validation → all-chain update → lock release.

definition sync ::
  "global_state ⇒ chain_id ⇒ reg_action ⇒ asset_id
   ⇒ nat ⇒ nat ⇒ global_state option"
where
  "sync gs src act aid auth ts = (
    case get_reg_state gs src aid of
      None ⇒ None
    | Some current_st ⇒
      (case reg_transition current_st act of
        None ⇒ None
      | Some new_st ⇒
        let gs_locked = acquire_lock gs aid src in
        let targets = connected_chains gs aid in
        let gs_updated = update_all_chains gs_locked aid new_st targets in
        Some (release_lock gs_updated aid)))"

The most important sub-function in this definition is update_all_chains. It propagates a given asset’s regulatory state to all connected chains. How this function is defined turns out to determine the difficulty of every subsequent proof, a story we explore in detail in the next article.

The connected_chains function returns the set of chains on which a given asset exists.

definition connected_chains ::
  "global_state ⇒ asset_id ⇒ chain_id set"
where
  "connected_chains gs aid =
    {cid. ∃ast. gs_chains gs cid aid = Some ast}"

This set is dynamic. It changes as assets are registered on new chains or burned. In proofs, consistency is guaranteed relative to connected_chains at the moment sync executes.

The Valid State Invariant

We must define what it means for a global state to be “valid.” This serves both as a proof precondition and as a proof conclusion (does sync preserve valid state?).

definition valid_state :: "global_state ⇒ bool"
where
  "valid_state gs ⟷ consistent_state gs ∧ no_locked_without_reason gs"

Separating the two conditions produced a cleaner proof structure. consistent_state means the regulatory state of a given asset agrees across all chains where it exists; this is the definition of cross-domain state preservation itself. no_locked_without_reason is lock integrity: an asset flagged as locked must have a corresponding entry in gs_locks, and vice versa.

Keeping these two conditions separate rather than merging them was intentional. In proofs, preservation of each condition can be established independently. A merged invariant forces unnecessary condition-splitting work during proof.

What the Modeling Process Revealed

Writing code surfaced things invisible at the semi-formal specification stage.

The interpretation collision problem. After registering the reg_sm: state_machine interpretation, attempting to instantiate the multi_domain_preservation locale triggered name collisions with the existing interpretation. The solution was to separate the state_machine predicate into a standalone lemma and reference it directly during instantiation. This was a point where our understanding of Isabelle’s locale system deepened.

Natural reflection of the SEIZED→FROZEN exclusion. The transition we excluded in the semi-formal specification on legal and proof-complexity grounds was automatically handled as None by the wildcard pattern in reg_transition. A design decision reflected in code without additional work is a small piece of evidence that the model is consistent with the specification.

The inductive invariant nature of valid_state. Initially, we thought of valid_state as a simple precondition. But proving that sync preserves valid_state turns it into an inductive invariant: if the initial state is valid, then validity holds after arbitrarily many synchronizations. This property (valid_state_preservation) is essential for long-term system stability and is independently important from the core theorem regulatory_homomorphism. This was something we did not anticipate at the specification stage.

From Semi-Formal Specification to Formal Model Five abstraction decisions with color-coded types: direct encoding, simplified, reduced, and discovered SEMI-FORMAL SPECIFICATION → ISABELLE/HOL MODEL Five abstraction decisions during formal model construction SPECIFICATION DECISION ISABELLE/HOL Partially synchronous network (DLS model) eventual delivery, unknown Δ SIMPLIFIED inductive (relational) → fun (total) proof complexity: ≥ 1 order of magnitude definition sync Atomic, deterministic lock → validate → update → unlock 5 states, 7 actions 12 valid transitions transition table + 3 invariants DIRECT ENCODING 1:1 mapping, no loss fun guarantees determinism (I3) datatype reg_state fun reg_transition 35 patterns, wildcard default GlobalState chains + oss_queue + locks message queue modeled REDUCED oss_queue removed entirely atomic sync → queue is redundant record global_state gs_chains + gs_locks only messages as function params Composite asset IDs EIP-155 chain IDs complex real-world formats SIMPLIFIED → nat (natural numbers) only need distinguishability type_synonym asset_id = nat free equality + finiteness “All chains consistent” (natural language) implicit precondition DISCOVERED Split into 2 separable conditions inductive invariant property emerges definition valid_state consistent_state ∧ no_locked_without_reason Direct Simplified Reduced Discovered (not in original spec)
SPEC → MODEL DECISIONS
Five abstraction decisions
SIMPLIFIED
Partially synchronous network
→ Atomic sync function (deterministic)
inductive → fun; ≥ 1 order of magnitude
DIRECT ENCODING
5 states, 7 actions, 12 transitions
datatype + fun
1:1 mapping, determinism from fun
REDUCED
GlobalState with oss_queue
global_state without queue
Atomic sync removes need for queue
SIMPLIFIED
Composite / EIP-155 IDs
nat
Only need distinguishability
DISCOVERED
“All chains consistent”
valid_state = 2 separable conditions
Inductive invariant emerges from formalization

Figure 2: From Semi-Formal Specification to Formal Model — Abstraction Decisions

The Proof Target in Formal Terms

The core theorem to be proved on this model is regulatory_homomorphism.

theorem regulatory_homomorphism:
  assumes "valid_state gs"
    and "get_reg_state gs src aid = Some current_st"
    and "reg_transition current_st act = Some new_st"
    and "sync gs src act aid auth ts = Some gs'"
    and "c ∈ connected_chains gs aid"
    and "asset_exists gs c aid"
    and "connected_chains gs aid ≠ {}"
  shows "get_reg_state gs' c aid = Some new_st"

In natural language: given a valid global state, if a valid regulatory action is applied to an asset on the source chain and synchronization succeeds, then on every chain where that asset exists, the regulatory state equals the new state (new_st).

Compared to the semi-formal expression, the structure is the same but everything is concrete. “Across all connected domains” becomes ∀c ∈ connected_chains gs aid. “Identical state” becomes get_reg_state gs' c aid = Some new_st.

Additional subsidiary theorems are needed: valid_state_preservation (sync preserves valid state) and reg_multi_domain_instantiation (instantiation of the generic locale for the regulatory domain). The proof process for these is the subject of the next article.

From Model to Proof

383 lines of generic theory and 1,053 lines of regulatory instance. A total of 1,436 lines of Isabelle/HOL code constitutes the state preservation model for Property 1. This model encodes every decision from the semi-formal specification into a machine-checkable form.

But a model is not a proof. The existence of definitions does not imply that theorems hold. In fact, the first proof attempt for regulatory_homomorphism after completing the model failed. And the cause of that failure lay in the model definitions themselves.

The next article documents the concrete obstacles encountered during the proof process: how the definition of update_all_chains affected proof tractability, how the interplay between model and proof worked in resolving those obstacles, and how Isabelle’s handling of existential quantifiers differed from what we expected.


Conclusion

The transition from semi-formal specification to formal model was not a “translation” but a redesign.

The largest decision was simplifying the partially synchronous network model to atomic sync. This secured proof tractability but created an explicit gap between the model and reality. That gap was documented rather than hidden, and it points toward the direction of future extensions.

The 4-tier locale hierarchy was inspired by Lochbihler’s Merkle Functor [2] pattern, but the concrete structure was freshly designed for our problem domain. The hierarchy from state_machine to multi_domain_preservation is not specific to the regulatory domain; it applies to any cross-domain synchronization scenario with finite states and deterministic transitions.

The most meaningful discovery during modeling was the inductive invariant nature of valid_state. “Starting from a valid state, validity is maintained after sync” is a mathematical guarantee of long-term system stability. This was not anticipated at the specification stage. The process of constructing a formal model deepens understanding of the design itself.

Open question: how much does the definition of update_all_chains affect proof tractability? The answer to that question could only be found by attempting the proof.


References

[1] Dwork, C., Lynch, N. & Stockmeyer, L. (1988). “Consensus in the Presence of Partial Synchrony.” Journal of the ACM, 35(2), 288-323. https://doi.org/10.1145/42282.42283

[2] Lochbihler, A. & Marić, O. (2020). “Authenticated Data Structures As Functors in Isabelle/HOL.” In: FMBC 2020. OASIcs vol. 84, 6:1-6:15. https://doi.org/10.4230/OASIcs.FMBC.2020.6

[3] Nipkow, T. & Klein, G. (2014). Concrete Semantics: With Isabelle/HOL. Springer. http://concrete-semantics.org/

[4] Nipkow, T., Paulson, L.C. & Wenzel, M.T. (2002). Isabelle/HOL: A Proof Assistant for Higher-Order Logic. LNCS vol. 2283, Springer. https://isabelle.in.tum.de/doc/tutorial.pdf

[5] Lochbihler, A. & Marić, O. (2020). Authenticated Data Structures As Functors. Archive of Formal Proofs. https://isa-afp.org/entries/ADS_Functor.html

Learn More

Read more on Oraclizer Research:

Defining Cross-Domain State Preservation: What Are We Proving? — Semi-formal specification of Property 1, state space design, and threat model definition

Higher-Order Logic as a Design Language: Why Our Verification Requires HOL — Isabelle/HOL selection rationale and HOL fundamentals

Why We’re Starting Formal Verification: Applying Our Own Standard — Pattern 3 self-application motivation and verification target introduction

Read Next

Insurance and Recovery Economics: Preparing for Black Swan Events
Earlier designs cut node risk by 73%, but the unpredictable 27% needs different rules. This study fixes how a staking insurance pool is sized (15% of stake, not protected value), bootstrapped, and banded; why a reserve held in its own token collapses with it; and how session protection follows the sync-degree hierarchy when security breaks mid-session.
Oraclizer Core ⋅ May 29, 2026
Tokenized Securities Under the CLARITY Act: The Weight of Codification
The CLARITY Act tokenized securities clause settles a single proposition in statute: tokenization is a delivery method, not a new asset class. That one sentence codifies the regulatory status of tokenized securities in U.S. law for the first time and derives an entire infrastructure specification for boundaries the token crosses.
Oraclizer Core ⋅ May 23, 2026
Sync Degree Hierarchy: Classifying What Assets Demand from State Synchronization
Sync degree hierarchy turns sync requirement strength into a four-level classification axis for RWA assets. S₀ static through S₃ atomic state binding form a reduction relation where causal consistency separates S₁ from S₂. Existing oracles, structurally two independent channels, are capped at S₁ by definition. Regulatory action forces S₃.
Oraclizer Core ⋅ May 20, 2026
Why RWA Isn’t Real DeFi Collateral Yet: The Non-Atomicity of the Collateral Layer
Tokenized RWA-backed stablecoin supply reached $8.5B, yet only 12% operates inside DeFi. Aave Horizon's dual structure separates rather than solves regulatory state synchronization. Three conditions from cross-border securities trading transfer into the DeFi collateral layer, with a fourth condition added when the protocol becomes a regulatory subject.
Oraclizer Core ⋅ May 14, 2026