Missing in 0023 was an application of the traced state monad. Here we play with a simple one: the left fold on lists, which in Haskell is defined by
foldl :: (b -> a -> b) -> b -> [a] -> b foldl f e  = e foldl f e (a:as) = foldl f (f e a) as
An important observation is that
foldl is really a stateful traversal of a colist, updating a state while deconstructing the colist — in Haskell we could write
foldlS :: MonadState b m => (b -> a -> b) -> [a] -> m () foldlS f  = return () foldlS f (a:as) = modify (flip f a) >> foldlS f as
foldl in terms of
foldl f e as = execState (foldlS f as) e
execState :: State s a -> s -> s, with
State s being an instance of
It is then no wonder that
foldl doesn’t produce anything for an infinite list, since the stateful computation never reaches the end.
But this is exactly the problem which the traced state monad can help to solve: we can mark all the intermediate states so that the computation is guaranteed to be productive even on infinite lists.
The resulting computation becomes the left scan:
scanlTS :: (MonadTrace m, MonadState b m) => (b -> a -> b) -> [a] -> m () scanlTS f  = return () scanlTS f (a:as) = mark >> modify (flip f a) >> scanlTS f as
where the type class
MonadTrace m extends
Monad m with the operation
mark :: m ().
(See Maciej and Jeremy’s MSFP ’12 paper for a full treatment in Haskell.)
Porting the left scan back to our Agda development in 0023, we need a definition of colists,
data ListF (A X : Type) : Type where  : ListF A X _∷_ : A → X → ListF A X record Colist (A : Type) : Type where coinductive field decon : ListF A (Colist A) open Colist
and a first attempt is
scanl₀ : (S → A → S) → Colist A → States S Unit scanl₀ f as with decon as ... |  = return tt ... | a ∷ as' = mark >> modify (flip f a) >> scanl₀ f as'
This definition doesn’t pass Agda’s productivity/termination check, however. By contrast, Agda would happily accept the manually expanded definition:
scanl₁ : (S → A → S) → Colist A → States S Unit force (scanl₁ f as s) with decon as ... |  = ret tt s ... | a ∷ as' = int s (scanl₁ f as' (f s a))
But this definition breaks abstraction: we are forced to abandon the traced state monad operations and work with the underlying representation. We should definitely look for an alternative.
But before we move on, let us take the opportunity to reconstruct what it takes to check the productivity of codata definitions, which is unified with termination checking when codata are defined by copattern matching.
We are more familiar with termination of functions defined on inductive data by pattern matching: such functions terminate when arguments of recursive calls are strictly smaller, i.e., have less constructors than the input patterns.
For example, a function
f on natural numbers terminates if its inductive clause looks like
f (suc n) = … f n …
If a recursive call is not syntactically directly performed on a smaller argument but on the result of another function applied to the argument, e.g.,
f (suc n) = … f (g n) …
then it could be the case that
n bigger (e.g., by re-applying
n), causing non-termination; less diligent termination checkers (like Agda’s) would simply refuse to accept dubious definitions like this.
In short, Agda’s termination checker needs to see syntactically that pattern size strictly decreases with recursive calls.
Turn this around and we get productivity checking as part of termination checking: the termination checker also needs to see syntactically that copattern size strictly decreases with corecursive calls.
Think of codata as black boxes that, upon deconstruction (which could be visualised as opening the box or pushing a button on the box), produce some results and possibly new black boxes, which can be further deconstructed.
Also think of applying a deconstructor as sending a deconstruction request to a black box, and defining codata by copattern matching as specifying how requests are handled.
scanl₁ f as s : Trace S Unit is like a black box which doesn’t move until a
force request comes in, at which point the black box (by the definition of
scanl₁) reacts by sending a
decon request to another black box
as : Colist A and deciding which
int) to produce depending on what
as produces in response to the
However a request is handled, it should be done in finite time, ending with just results and black boxes not handling requests, so that even though we may continue sending more requests indefinitely, we don’t have to wait indefinitely for the requests to be handled, and the whole process is under our control — this is the definition of productivity, which is just termination of deconstruction.
To ensure termination of deconstruction, requests/deconstructors should get strictly fewer in corecursive calls so that they are guaranteed to be handled and disappear eventually, which is analogous to the situation where constructors should get strictly fewer in recursive calls.
For example, we may abstract the definition of
force (scanl₁ f as s) = … scanl₁ f as' (f s a) …
There is no deconstructor outside the corecursive call on the right-hand side, which has fewer deconstructors than the left-hand side, so Agda deems the definition productive/terminating.
On the other hand, the definition of
scanl₀ may be abstracted as
scanl₀ f as = … h (scanl₀ f as') …
h standing for the monadic operations outside the corecursive call.
If taken as a computation rule, the definition would expand forever by itself, so it’s perfectly reasonable that the termination checker rejects the definition.
Even if the left-hand side were
force (scanl₀ f as), it would still be sensible to prohibit corecursive calls from being enclosed in arbitrary functions (like
h here), because those functions may apply deconstructors to the corecursive calls and cause non-termination — this is analogous to the reason for prohibiting recursive calls of the form
f (g n) from the discussion of inductive definitions.
Constructors outside corecursive calls (e.g., the
a ∷ as' case of
scanl₁) are fine, however, since it’s easy to see that they’re not deconstructors.
Back to our problem: can we somehow write a definition in the style of
scanl₀ that is accepted by Agda’s termination checker?
If we want to stay with copatterns rather than switch to a different formalism like guarded type theory, then it seems the only way out is to use Danielsson’s embedded language technique.
The idea is to define an embedded language whose (potentially infinite) programs consist of the operations that the definition uses, and rewrite the definition to produce, upon deconstruction, some program fragment of the language; Agda can then straightforwardly see that the new definition is productive, emitting program fragments upon deconstruction.
To actually compute what the original definition computes, we write an interpreter of the language, which amounts to a proof that the specific forms of program fragment emitted by the new definition do give rise to productive codata.
Concretely, for the left scan, the key to its productivity is the invocation of
mark before the corecursive call: forcing the left scan (applied to an initial state) should produce a
TraceStep, which has to be an
int for a corecursive call to be possible, which means that an intermediate state is needed as the first argument of
int, and the presence of
mark guarantees the existence of this intermediate state.
(By contrast, the left fold does not invoke
mark and indeed is not productive.)
The embedded language we use will be defined such that every program fragment containing a corecursive call also contains at least an invocation of
mark before the corecursive call, and then we will be able to write a productive interpreter for this language.
The actual definition of the language consists of two inner inductive types defining the forms of program fragment described above and an outer coinductive type allowing such program fragments to unfold indefinitely:
mutual record SynStates (S A : Type) : Type₁ where coinductive field force : Productive S A data Productive (S : Type) : Type → Type₁ where norec : States S A → Productive S A mark>>_ : Marked S A → Productive S A _>==_ : States S B → (B → Productive S A) → Productive S A data Marked (S : Type) : Type → Type₁ where corec : SynStates S A → Marked S A _>==_ : States S B → (B → Marked S A) → Marked S A
SynStates S A is a type of syntactic representations of
States S A — later we will define an interpreter of type
eval : SynStates S A → States S A
force deconstructor is applied to a
SynStates S A, the emitted program fragment has type
Productive S A, and can contain either no corecursive calls, in which case we can directly use any
States S A (wrapped in the
norec constructor), or a
mark followed by another program fragment of type
Marked S A (explained below); in the latter case, to allow
States operations to be performed before the invocation of
mark so as to make the language more general-purpose,
Productive S A also includes a bind operator/constructor
For a program fragment of type
Marked S A, since it is preceded by at least a
mark, it can freely make corecursive calls (wrapped in the
corec constructor) and perform arbitrary
States operations before the corecursive calls (using the
The left scan can then be syntactically expressed in this language,
synScanl : (S → A → S) → Colist A → SynStates S Unit force (synScanl f as) with decon as ... |  = norec (return tt) ... | a ∷ as' = mark>> modify (flip f a) >== λ _ → corec (synScanl f as')
and the actual left scan can be obtained by evaluating
scanl : (S → A → S) → Colist A → States S Unit scanl f as = eval (synScanl f as)
It remains to define
eval, which is one of several mutually recursive functions, each interpreting one layer of the definition of
mutual eval : SynStates S A → States S A eval sa = evalP (force sa) evalP : Productive S A → States S A force (evalP (norec ma) s) = force (ma s) force (evalP (mark>> ka) s) = int s (evalM ka s) force (evalP (mb >== f) s) = force (evalTP (mb s) f) evalTP : Trace S B → (B → Productive S A) → Trace S A force (evalTP tb f) with force tb ... | ret b s = force (evalP (f b) s) ... | int s tb' = int s (evalTP tb' f) evalM : Marked S A → States S A evalM (corec sa) = eval sa evalM (mb >== f) = λ s → evalTM (mb s) f evalTM : Trace S B → (B → Marked S A) → Trace S A force (evalTM tb f) with force tb ... | ret b s = force (evalM (f b) s) ... | int s tb' = int s (evalTM tb' f)
The termination argument for this set of definitions is slightly more involved: for example,
force (evalP (mb >== f) s) reduces to
force (evalTP (mb s) f), which still leaves the
force request unhandled; depending on the result of
force (evalTP tb f) may either loop back to
force (evalP (f b) s), where the
force request is still unhandled but the first argument
f b of
evalP is smaller than the original argument
mb >== f, or produce
int s (evalTP tb' f), where the
force request is handled and disappears.
I have to turn on the pragma
--termination-depth=2 for Agda to check more carefully and accept this set of definitions, probably because of the mutual recursion between
evalTP (and similarly between
Again it is interesting to make a comparison with the inductive setting, where the counterpart of Danielsson’s embedded language technique should be the Bove–Capretta method. If a function makes recursive calls whose arguments do not get syntactically smaller, Bove and Capretta tell us that we can build an accessibility relation relating the arguments before and after the recursive calls and do induction on the accessibility proof instead; separately from the definition of the function, we then prove that all possible inputs are accessible, that is, the argument changes are well-founded. The method is general but not scalable by itself because we have to craft an accessibility relation for every function, so it’ll be more useful if we can prepare a set of well-founded (accessibility) relations from which we can build the ones required by the Bove–Capretta method more easily. This is structurally the same as Danielsson’s technique: we first build a syntactic structure to satisfy the termination checker, and then separately prove that the decreasing of patterns/copatterns in the syntactic structure is well-founded. To make Danielsson’s technique more useful, we should not craft a new embedded language for every function, but should design a versatile language that provides enough operations for defining most functions or can be extended with operations as long as they satisfy certain properties.
Regarding versatility, the presented
SynStates language isn’t too satisfactory:
mark isn’t the only operation that guarantees productivity — we should allow any operation to guard corecursive calls as long as it necessarily emits an intermediate state, with
mark being a special case.
Marked should have allowed corecursive calls to appear anywhere — that is, we should have used this version:
data Marked' (S : Type) : Type → Type₁ where corec : SynStates S A → Marked' S A embed : States S A → Marked' S A _>==_ : Marked' S B → (B → Marked' S A) → Marked' S A
But for this version it seems much harder to write a productive interpreter acceptable to Agda.
Here is my speculation:
Corecursive functions can be regarded as corresponding to state machines, with the states being the arguments of the functions, and applying a deconstructor triggers a state transition.
Traversing a list-like structure corecursively (like
Marked) is easier, because when we are in the middle of the traversal, the state is still a list, whereas traversing a tree-like structure corecursively (like
Marked') requires a zipper-like state to record the rest of the tree to be traversed, which is of course more complicated.
Maybe this is one reason that we more often see lists rather than trees as examples of corecursive programming?
Agda file: TracedState.agda