TL;DR
- In mechanizing the functor laws, the obstacle that ate the most time was not mathematics but tooling. The three functor laws passed on the first attempt, and then a single name,
extract, broke seven consecutive builds. - The place I feared most, the monotone decrease of the inconsistency measure (
sync_reduces_inconsistency) that underwrites convergence from an arbitrary initial state, closed on the first try without the fallback plan. The time went somewhere else entirely. - The moment I tried to formalize the degree hierarchy as a functor tower, I hit a design-level wall. The finiteness that
state_machinedemands collided with the infinity of the global state, and that collision forced the natural transformation to be redefined as a standalone predicate. - A further honest gain: an assumption I had believed correct at the design stage turned out to be wrong in front of the code. The belief that the glue predicate guarantees validity did not hold, and the real source of validity lay elsewhere.
The earlier post, From a Separate Entry to a Functor Tower, corrected the design and recorded the lineage, and it deferred one thing to the next article. Whether the functor laws close without sorry, where the naturality square stalls, and how it is discharged: that record belongs to a separate piece. This is that record.
The least honest way to keep a research journal is to smooth the obstacles in hindsight and write them up as “this is how it was designed, and this is how it passed.” Real proof work does not proceed that way. My prior intuition about what would be hard and what would be easy was wrong again and again, and that wrongness is itself the most valuable thing this round produced. So half of this post is about the hypotheses I got wrong.
The place I feared and the place that actually stalled were not the same
The first thing worth recording is that the spot I had marked as the point of no retreat at the design stage gave way without resistance.
In the earlier prelude post I had written down the most uncertain point of this composition work in advance. Does the inconsistency measure really decrease strictly for every regulatory action? Could a release action propagated to only some chains first (UNFREEZE, RELEASE, and the like) temporarily increase the count of inconsistent pairs in an intermediate state? In case Isabelle demanded a counterexample there, I had even prepared to retreat to a weaker conclusion, eventuality (Plan B-1).
That spot, sync_reduces_inconsistency, closed on the first try. The key was that the atomic sync abstraction fixed in the prelude worked exactly as intended. A single sync drives the entire set of connected chains to one regulatory state. The inconsistent pairs of that asset are then wiped out while the pairs of other assets remain. Formally, the inconsistency set shrinks under the subset relation, and since one witness pair that existed before the equalization disappears, it becomes a proper subset, so over the finiteness assumption psubset_card_mono delivers the strict decrease of the count.
The “intermediate state propagated to only some chains” that the prelude worried about is, under the atomic sync abstraction, outside the model. Synchronization is reflected atomically across all connected chains or it does not happen at all. That abstraction choice is precisely what closed the worry. At the same time, it honestly reveals what was not proven. The intermediate state of partial propagation is out of this model’s scope, and lifting that assumption is left to subsequent properties.
The time went somewhere else. And those places were almost entirely tooling, not mathematics.
extract.Figure 1: Anticipated risk versus actual effort across the three proof targets
The functor laws passed; a single name, extract, broke seven builds
Start with the three functor laws. preservation_id, preservation_compose, and preservation_assoc all passed on the first proof attempt.
theorem preservation_id:
assumes "state_machine states actions transition terminal"
shows "state_preservation states actions transition terminal
states actions transition terminal id id"
That the identity morphism is a preservation morphism (F(id) = id) closed in a single by simp, since id_apply discharges naturality immediately. That the composite of preservation morphisms is a preservation morphism (F(g ∘ f) = F(g) ∘ F(f)) was a reuse of a pattern already verified in the earlier proof: take the two morphisms with interpret and stack the second morphism’s naturality square on top of the first’s. And associativity was honestly one line.
theorem preservation_assoc:
assumes "state_preservation Sₐ Aₐ Tₐ Fₐ S₋ A₋ T₋ F₋ f f'"
and "state_preservation S₋ A₋ T₋ F₋ Sₐ Aₐ Tₐ Fₐ g g'"
and "state_preservation Sₐ Aₐ Tₐ Fₐ Sₐ Aₐ Tₐ Fₐ k k'"
shows "((k ∘ g) ∘ f = k ∘ (g ∘ f)) ∧ ((k' ∘ g') ∘ f' = k' ∘ (g' ∘ f'))"
by (simp add: comp_assoc)
Because state_map and action_map are functions, associativity of morphism composition follows directly from associativity of function composition. The three morphism hypotheses only fix the objects across which the equation is read (the objects of the category); they are not used in the proof body. The cost is low, but the standing rises sharply. Only once associativity is nailed down is the category axiom set, identity plus composition plus associativity, complete, and only then does the “Functor” in the title become a theorem rather than rhetoric. The moment these three closed together is the code-side reality of the work I have been calling functor completion.
And then, precisely after that, the build began to break.
*** Outer syntax error: name expected, but end-of-input.
At command <malformed> (line 222)
The symptom was consistent. The build always broke at the header of the authenticated_state locale, before it ever reached the proof body. And I spent five cycles suspecting the wrong things.
The first suspect was the cartouche, the balance of Isabelle’s \<open>...\<close> brackets. A grep counted 67 against 67, perfectly balanced. Not it. Next I suspected the placement of the fixes and for clauses and switched to the ADS entry’s pattern (for ... +). Same failure. I suspected the mk: qualifier and removed it. Same failure. I suspected the locale import itself and fell back to a plain fixes/assumes form. Still it failed. At least that ruled out the import.
What finally came into view was that every version failed at exactly the same place. and extract :: ..., that line. “Name expected” after and meant that extract was not being accepted as a parameter name. Running grep '"extract"' Pure.thy gave the answer. extract is a reserved keyword in Isabelle (a thy_decl command). It was a word that could not be used as a parameter name.
The fix was almost absurdly simple. I renamed extract to extract_map. A single token containing an underscore is not a keyword, so it was safe. It passed in one build.
At command <malformed> (line 222) — always at the locale header
thy_decl keyword.→ renamed
extract_map · passed in one build.Figure 2: Four structural hypotheses, each rejected, converging on a single reserved word
locale authenticated_state =
mk: merkle_interface h bo m
for h :: "('a_m, 'a_h) hash" and bo and m +
fixes extract_map :: "'a_m ⇒ global_state option"
assumes extract_respects_merging: ...
The lesson from this episode is about the posture of debugging itself. For five cycles I suspected structure: cartouche balance, clause placement, qualifier, import. Each was a plausible hypothesis, each was mechanically checked, and each was rejected. The real cause was not structure but vocabulary. An error reading “name expected, end-of-input at a header” is, nine times out of ten, a sign that a fixes name is a keyword. From now on, before suspecting structure, suspect the single adjacent token first. Curiously, the same word inside inner syntax, \<^term>\<open>extract\<close>, parses safely as a free variable. The keyword clash happens only in outer syntax.
Looking back at the functor-law part, the proofs themselves took zero retries, and effectively the entire time went to a single name, extract. This is the texture of proof work that a finished theorem statement never shows.
If I write it honestly, when the fifth build collapsed at the same spot, I even started to suspect that I had fundamentally misunderstood something. The functor laws, the part I had braced for as the heaviest in this work, closed at a stroke, and what blocked the way right after was a single line of a locale header. The feeling on learning the cause was not relief but something closer to deflation, that what had swallowed half a day was not a mathematical difficulty but one word. And yet that deflation was the first instance of a lesson that would recur throughout this work. The machine does not care where I expect difficulty. It waves through the spots where I tense up bracing for the crux, and it trips me where I have relaxed. The difficulty of a proof and my prediction of it were, from the start, separate things.
Does it actually use the ADS lemmas, or merely import them?
In the earlier post I had guarded against a particular trap in advance. If you import Lochbihler and Marić’s ADS_Functor and then never actually use its lemmas, the worry of “isn’t this just the word functor?” comes back in the opposite direction as “isn’t this just an imported locale signature?” So I designed the ADS lemmas to be load-bearing in the proof body, that is, actual steps that carry weight.
This was not a pledge but a fact to be checked in the code. I set the premises of authenticated_state‘s two assumptions, extract_respects_merging and extract_under_blinding, in the ADS-native form, namely hash equality (h a = h b, mergeability). So when proving the two soundness theorems, mk.merge_respects_hashes (which derives h a = h b in soundness) and mk.hash (blinding preserves the hash) actually discharge the premises. And mk.join (the merge result is the least upper bound of the two inputs) derives bo a ab and bo b ab, adding the structural conclusion that the extracted states refine one another.
The step that derives h x = h y from mk.hash had to be split off as a separate lemma (bo_hash_eq) to stay clean. It is a small step that pulls a pointwise fact out of the ADS lemma bo ≤ vimage2p h h (=) with predicate2D and finishes with vimage2p_def, but if left inline, the automation hides the ADS lemma call inside itself, and the claim of being load-bearing stops being visible in the code. Splitting it out explicitly was the honest choice.
The most honest discovery: the glue does not give validity
One of the most valuable things this work produced was the code correcting something I had written down wrongly at the design stage.
The design said this: “if state_join holds and the two partial states are valid, then the merged state is valid.” It sounded plausible intuitively. Yet in the code this step did not hold.
The reason lies in the definition of validity. The validity of a global state (valid_state) consists of two pieces: that the regulatory state is consistent (consistent_state), and that any locked asset has a reason to be locked (no_locked_without_reason). But the glue predicates state_join and state_refines constrain only regulatory-state consistency (get_reg_state, connected_chains); they touch gs_locks not at all. So there is no way to derive the lock half of validity from the glue alone.
Tracing where valid_state actually comes from in the code made the answer clear. Validity comes directly from the locale assumption extract_preserves_validity, the extraction invariant that “every extracted state is valid.” The glue predicates provide not validity but structural coherence: that the merged state is the join of the two partial states, and that one partial state is a partial view of the other. And what holds up that structural coherence is the ADS join and hash lemmas.
This discovery had to be reflected directly in the prose. A sentence like “the glue predicate guarantees validity” contradicts the code. The honest statement is this: extraction maintains the validity invariant (extract_preserves_validity), and the ADS merge/blinding lemmas guarantee the cross-domain structural relation of the extracted states. The roles are separated, and writing that separation precisely is what makes the prose match the formal proof. Half the value of formal verification comes exactly from this, the machine catching what the designer wrongly believed.
I paused here for a moment. On paper, the one line “state_join implies valid” would have passed without any resistance. It sounds plausible, it matches intuition, and no one would have raised a hand to object on the spot. What told me the sentence was wrong was neither a colleague’s review nor my own re-reading, but a prover that stalled, unable to derive the lock half. Doing formal verification means, in the end, having my intuition questioned without pause, and I felt that again. It is an uncomfortable thing, but I like this discomfort. Far better that the build stops and tells me “that step does not hold” than that I publish carrying a false belief. If I had to name the single most valuable line in this work, it would not be a theorem that closed but a spot like this that did not close, and so forced me to fix my design.

The design said the glue gives validity; the crossed-out arrow is where the prover refused to derive the lock half.
The natural transformation did not simply sit on top of the functor
Moving on to the degree hierarchy, I hit a design-level wall before the build even started.
The plan looked simple. Place a preservation functor F_k at each degree k, and formalize degree demotion as a natural transformation between them, η_k : F_(k+1) ⇒ F_k. But the moment I tried to nail the degree tower onto the existing state_preservation instance, it died instantly. The reason is that the state_machine locale on which state_preservation stands requires finiteness of the state set (finite states).
The problem was that the global state is infinite. global_state is a function from chain identifiers to asset identifiers to asset states, and the identifiers are all natural numbers. It was no accident that the existing instances all held over the regulatory state (a finite set of five elements). The instant the degree tower is nailed onto a state_preservation instance over the global state, it collapses at the finiteness obligation.
There were two branches here: abandon the degree tower (split the relevant property off into separate work), or define the natural transformation differently. I chose the latter. The safe choice might have been to abandon it. But I had a hunch that this wall was a limit of the tooling, not of the mathematics. state_machine requires finiteness because that abstraction was built for finite-state machines, not because the property of the natural transformation I was trying to prove needs finiteness. All that is needed is the single fact that the naturality square commutes, and that holds even when the state space is infinite. Once I judged that the blockage was the vessel and not the essence, the direction settled on changing the vessel.
The solution was to define the natural transformation as a standalone predicate mirroring the naturality square of state_preservation, directly over the global state.
The Wall Was the Vessel, Not the Mathematics
Two readings of the same obstacle: what actually blocked it, and what changing the vessel bought.
Reading 1 · What blocked the proof
state_machine requires finite states, but global_state is infinite (identifiers are naturals). The vessel blocked it, not the math.Verdict: change the vessel, keep the mathematics. Redefine the natural transformation as a standalone predicate.
Reading 2 · What fixing the hub bought
forget▸step ≠ step▸forget
Suc k, never 0.
forget▸step = step▸forget
Figure 3: The finiteness wall, and the standalone predicate plus fixed hub that clear it
definition natural_transformation ::
"gobj ⇒ gobj ⇒ (global_state ⇒ global_state) ⇒ bool" where
"natural_transformation Fhi Flo η ⟷
(∀gs. gs ∈ obj_states Fhi ⟶ η gs ∈ obj_states Flo)
∧ (∀opn gs gs'. gs ∈ obj_states Fhi ⟶ obj_step Fhi opn gs = Some gs'
⟶ obj_step Flo opn (η gs) = Some (η gs'))
∧ (∀opn gs. gs ∈ obj_states Fhi ⟶ obj_step Fhi opn gs = None
⟶ obj_step Flo opn (η gs) = None)"
With the action map taken implicitly as the identity and the terminal clause dropped, this corresponds exactly to a natural transformation between Set-valued functors, that is, to the definition of a simulation/homomorphism between transition systems. I packed the functor into a record called gobj (a state set and a step function) and defined the natural transformation over it. This bypassed the finiteness wall while the intended statement form type-checked unchanged.
What matters is that this is not a weakening. The closure of natural-transformation composition (nt_vertical_compose) was proven by reusing the very same pattern as the functor law preservation_compose, stacking the lower transformation’s naturality square on the upper one’s. The fact that the whole degree ladder coheres, that the S₃ ⇒ S₂ ⇒ S₁ ⇒ S₀ ladder is not a loose collection of point-by-point forgetful maps but a single structurally coherent hierarchy, comes out of this composition closure.
theorem nt_compose:
"natural_transformation (F (Suc (Suc k))) (F k)
(degree_forget (Suc k) ∘ degree_forget (Suc (Suc k)))"
Fixing the hub made the partiality wall disappear
The heart of a natural transformation is that the naturality square commutes: demote-then-step must equal step-then-demote (broadcast_forget_commute). And there is a trap here that generally does not close well. When the transition is a partial function, the naturality of a demotion that “drops a chain” usually breaks, because if the dropped chain had an enabled step, the square goes out of alignment.
I avoided this by defining the degree model as a hub-and-spoke coupling width. Degree k is the coupling of chains 0 through k, and the hub is always chain 0. The demotion degree_forget (Suc k) empties chain (Suc k), but the hub (chain 0) is never emptied, because 0 ≠ Suc k. Since the condition enabling a transition is the hub’s regulatory state, the demotion does not touch that condition. I cleared the partiality wall with the single choice of fixing the hub.
In actually closing the proof, the tooling wall appeared again. While proving the naturality-square equation, the record’s select-of-update rewrite did not fire under a freshly created lambda binder. The same rewrite was fine outside the lambda, but inside the body of the λaid'. ... emitted by broadcast, simp did not revisit the interior.

Fixing chain 0 as the permanent hub is what made the partiality wall in the naturality square disappear.
The solution was a double ext. If you resolve the function equality with a single ext, the broadcast lambda on the right-hand side stays unexpanded. Fixing both variables c and aid' instead turns the goal into a fully applied cell-level equality.
proof (intro ext) fix c aid' consider (le) "c ≤ k" | (top) "c = Suc k" | (gt) "Suc k < c" by linarith ...
This way the broadcast lambda immediately beta-reduces at aid', the interior is exposed, and the cell-level rewrite fires. Splitting into three branches (c ≤ k, c = Suc k, Suc k < c) and feeding the arithmetic facts with linarith closes it.
The same trap recurred three times: antiquotations inside text
While closing the three files, the exact same kind of obstacle appeared three times. And all three were in the documentation, not the proofs.
The first was in the functor file. Just before the final theorem, the build failed with *** Undefined fact: "fair_leader" (line 883). I thought line 883 was a using fair_leader inside an interpretation proof, and on the hypothesis “doesn’t a sublocale proof context expose the locale’s own assumptions under plain names?” I replaced the sublocale with context ... begin plus interpretation. It failed the same way. Hypothesis rejected.
The real cause was that line 883 was not a proof but the @{thm [source] fair_leader} antiquotation in the text block just before it. fair_leader is a locale-local fact, so it is not resolved in theory-level text. The fix was to change the two antiquotations to plain cartouches, ‹fair_leader›.
I picked up one rule here. Even when the error line looks like a proof, suspect the text antiquotation on the same line first. And this rule hit twice more right away. In the degree-hierarchy file, over_provisioning_guarantees came up Undefined after every proof had passed, because a text block forward-referenced a theorem defined later. And when I first built the new files all the way to the PDF document, the underscore in the identifier ADS_Functor inside text prose triggered LaTeX math mode and stopped the build. Wrapping it in the typewriter form (\<^verbatim>\<open>ADS_Functor\<close>) fixed it.
The lesson of these three obstacles converges to one point. What stops you in formal verification is often not the mathematical content of a theorem but the surface of the tooling around the proof: keywords, the scope of antiquotations, LaTeX escaping. The experience where a two-character fix was the final build blocker when proving the consensus liveness property recurred here in a different guise. The difference is that this time I carried the rules picked up in earlier files into the later ones. That accumulation blocked the same traps from the second occurrence onward.
Where this has narrowed to so far
Laying out the obstacles cleared so far in one place, the terrain of this work was almost the reverse of what I expected. The place I feared, the strict decrease of the measure, gave way on the first try. The place I thought easy, a single line of a locale header, broke seven builds. The place I assumed would simply sit on top, a degree tower over the functor, hit the finiteness wall and forced the natural transformation to be redefined. The design I believed correct, the assumption that glue gives validity, turned out wrong in front of the code.
What I hold in hand right now is not a finished result but a set of rules picked up each time an obstacle was cleared. Look at the vocabulary before suspecting a keyword. Hold up the ADS lemmas by calling them, not by importing them. Glue is structure, not validity. Clear the partiality wall by fixing the hub. Even when an error line looks like a proof, look at the antiquotation on the same line first. These rules carried from earlier files into later ones and cut down the same traps from the second time on. The work is still walking toward the next obstacle, carrying those rules.
What this work teaches is simple. How often the intuition about what is easy and what is hard turns out wrong, the machine points out without an inch of give. So rather than saying “this is closed,” I record only the spots that stalled and the path that narrowed them. There is a separate place to lay out, in order, what was proven and what it means, and which assumptions still remain. Until then, one question remains. Will the obstacles I have marked as the next hard ones really stop me there, or will yet another entirely unexpected line block the way? That answer, too, will come only in front of the code.
The design these proofs close against was locked in earlier, in Compositional Verification: Design Locked In, and the broader system context lives in the Oraclizer documentation.
References
- Lochbihler, A., & Marić, O. (2020). Authenticated Data Structures as Functors in Isabelle/HOL. FMBC 2020, OASIcs vol. 84, pp. 6:1-6:16. Archive of Formal Proofs entry: ADS_Functor. doi:10.4230/OASIcs.FMBC.2020.6
- Abadi, M., & Lamport, L. (1995). Conjoining Specifications. ACM Transactions on Programming Languages and Systems, 17(3), 507-534.
- Dijkstra, E. W. (1974). Self-stabilizing Systems in Spite of Distributed Control. Communications of the ACM, 17(11), 643-644.
- Mac Lane, S. (1998). Categories for the Working Mathematician (2nd ed.). Graduate Texts in Mathematics, vol. 5. Springer.
- Nipkow, T., Paulson, L. C., & Wenzel, M. (2002). Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283. Springer.
- Kim, J. (2026). Safety and Liveness of Cross-Domain State Preservation under Byzantine Faults: A Mechanized Proof in Isabelle/HOL. arXiv:2604.03844.





