Recently I reread Nils Anders Danielsson’s POPL 2008 paper Lightweight Semiformal Time Complexity Analysis for Purely Functional Data Structures, whose technique (implemented in Agda) has been adopted by a few groups of people like McCarthy et al (who develop a comparable library in Coq) and very recently Handley et al (who automate/mechanise the reasoning with Liquid Haskell and include relational cost analysis). To a first and very crude approximation, the idea is to insert ‘ticks’ into a program corresponding to the uses of the resources we want to track (numbers of reductions, recursive calls, space allocations, etc), and then prove that the number of ticks is bounded by a particular function of a certain aspect of the input (e.g., the length of an input list).
Running time is an intensional property of programs, so if we wanted to define a predicate that counts the ticks, the predicate would have to be defined on the structure of programs rather than the functions they compute; program execution would also have to be modelled and reasoned about explicitly. One way to avoid all this hassle is to modify the program a little and expose the ticks as part of the observable output: for example, rewriting the program into one that uses a writer monad to accumulate the ticks — this is what Handley et al (and presumably some others) do. We could then use whatever tools/techniques are available for reasoning about program behaviour to prove bounds on the number of ticks — Liquid Haskell in Handley et al’s case.
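For concreteness, here is a minimal sketch of this value-level approach in Agda (the language used throughout this post); the definitions and names below are mine, not Handley et al’s, who work in Liquid Haskell:

open import Data.Nat     using (ℕ; suc; _+_)
open import Data.Product using (_×_; _,_)

-- a writer-like monad over the monoid (ℕ, _+_, 0):
-- ticks accumulate in the observable output
Ticked : Set -> Set
Ticked X = ℕ × X

return : {X : Set} -> X -> Ticked X
return x = 0 , x

_>>=_ : {X Y : Set} -> Ticked X -> (X -> Ticked Y) -> Ticked Y
(m , x) >>= f with f x
... | (n , y) = m + n , y

-- each tick is recorded in the result, so unless the compiler
-- optimises the counter away, it survives to runtime
tick : {X : Set} -> Ticked X -> Ticked X
tick (m , x) = suc m , x

Any bound we prove about the first component of the result is then an ordinary extensional property of the rewritten program.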
Exposing the ticks at value level can be somewhat unsatisfactory though: unless optimised away, the ticks would remain at runtime, even though we only want to determine their number statically.
What Danielsson does is expose the ticks at type level instead, through the use of a graded monad (called an annotated monad in the paper).
A graded monad (which I learned about from Jeremy Gibbons) has an indexing structure that works like a (simplified version of a) writer monad at type level: over a monoid $(A, (+), 0)$, the two operations of a graded monad M : A -> Set -> Set have the types

return : X -> M 0 X
_>>=_  : M a X -> (X -> M b Y) -> M (a + b) Y

and the usual monad laws type-check because of the monoid laws.
Danielsson calls his ticking monad Thunk (graded over the monoid of natural numbers and addition), which features a ticking operation

√_ : Thunk n X -> Thunk (1 + n) X

(√_ is a low-precedence operator, so its scope extends as far as possible.) At runtime, Thunk is just the identity monad, and the uses of the Thunk operations should be easy to optimise away.
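The paper keeps Thunk abstract; as a hedged sketch of one possible model (my guess, in a fresh module so as not to clash with the value-level version above), we can take the identity monad with a phantom grade:

open import Data.Nat using (ℕ; _+_)

data Thunk (n : ℕ) (X : Set) : Set where
  ⟨_⟩ : X -> Thunk n X

-- force must not be exported in a real development,
-- since it lets clients discard ticks
force : {n : ℕ} {X : Set} -> Thunk n X -> X
force ⟨ x ⟩ = x

return : {X : Set} -> X -> Thunk 0 X
return x = ⟨ x ⟩

infixl 1 _>>=_

_>>=_ : {a b : ℕ} {X Y : Set} -> Thunk a X -> (X -> Thunk b Y) -> Thunk (a + b) Y
⟨ x ⟩ >>= f = ⟨ force (f x) ⟩

_=<<_ : {a b : ℕ} {X Y : Set} -> (X -> Thunk b Y) -> Thunk a X -> Thunk (a + b) Y
f =<< mx = mx >>= f

infixr 0 √_   -- low precedence: the scope of √ extends as far as possible

√_ : {n : ℕ} {X : Set} -> Thunk n X -> Thunk (1 + n) X
√ ⟨ x ⟩ = ⟨ x ⟩

Since the grade is a phantom parameter, it can be erased at runtime, which matches the claim that Thunk is just the identity monad.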
Remark. One could instead define the ticking operation as

tick : Thunk 1 ⊤

but the two definitions are obviously interchangeable: √_ = tick >>_ and tick = √ return tt. (End of remark.)
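In the sketch representation above, both directions are straightforward to write down (⊤ and tt come from Data.Unit):

open import Data.Unit using (⊤; tt)

_>>_ : {a b : ℕ} {X Y : Set} -> Thunk a X -> Thunk b Y -> Thunk (a + b) Y
mx >> my = mx >>= λ _ → my

tick : Thunk 1 ⊤
tick = √ return tt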
Danielsson’s main aim is to verify (amortised) bounds for operations of lazy data structures, which contain ‘thunks’ whose contents are suspended and only evaluated when required.
These thunks are explicitly represented as pieces of computation in the Thunk monad. For example, a lazy list is either empty or a pair of a head element and a suspended tail, whose type is wrapped inside Thunk. Thunk is indexed with running time (i.e., the number of ticks that will happen if a thunk is evaluated), and we can add a parameter (c below) to the datatype of lazy lists specifying how much time is required to evaluate each thunk in a list:
data SeqL (A : Set) (c : ℕ) : ℕ -> Set where
  []  : SeqL A c 0
  _∷_ : A -> Thunk c (SeqL A c n) -> SeqL A c (1 + n)
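As a toy illustration in terms of the sketch Thunk above (the example is mine, not the paper’s), here is a two-element lazy list in which forcing each tail costs exactly 3 ticks:

example : SeqL ℕ 3 2
example = 1 ∷ (√ √ √ return (2 ∷ (√ √ √ return [])))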
To state the number of ticks in terms of the length of the input list, we also make SeqL length-indexed. Here Danielsson’s example is insertion sort:
sort : Seq A n -> Thunk (1 + 5 * n) (SeqL A (4 * n) n)
sort []       = √ return []
sort (x ∷ xs) = √ insert x =<< sort xs
where Seq is the (usual) strict version of length-indexed lists (which does not contain Thunk).
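That is, Seq can be taken to be the standard length-indexed list type (Vec in Agda’s standard library):

data Seq (A : Set) : ℕ -> Set where
  []  : Seq A 0
  _∷_ : {n : ℕ} -> A -> Seq A n -> Seq A (1 + n)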
What the type of sort says is rather fine-grained: assuming the length of the input list xs is n, the first constructor of sort xs can be computed in 1 + 5 * n steps, and then each subsequent one takes 4 * n steps to compute. In the body of sort there are two ticks, which record reductions from the left-hand side to the right-hand side of the equations (e.g., reducing sort [] to []).
It’s easy to check that sort is type-correct given the type of insert

insert : A -> SeqL A c n -> Thunk 4 (SeqL A (4 + c) (1 + n))

and assuming that the type-checker is aware of the necessary numeric properties like 1 + ((1 + 5 * n) + 4) = 1 + 5 * (n + 1). (Agda’s type-checker isn’t though, and hence the code above is actually not complete.)
Danielsson then goes on to show that computing the minimum of the result of sort takes only linear time (because only the first thunk needs to be evaluated).
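Here is a hedged sketch of that observation (my code, not the paper’s): taking the head of the sorted list forces only the outermost thunk, so none of the 4 * n-cost tail thunks is ever evaluated, and the total cost stays linear in n. The grade is written out literally so that no arithmetic rewriting is needed:

minimum : Seq A (1 + n) -> Thunk ((1 + 5 * (1 + n)) + 1) A
minimum xs = sort xs >>= λ { (y ∷ _) → √ return y }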
The definition of insert is a lot messier. First we put ticks in the places where reductions happen in the usual definition of insert:
-- type-incorrect
insert : A -> SeqL A c n -> Thunk 4 (SeqL A (4 + c) (1 + n))
insert x []       = √ return (x ∷ return [])
insert x (y ∷ ys) = √ x ≤? y >>= λ b →
                    √ if b then return (x ∷ return (y ∷ ys))
                           else return (y ∷ (insert x =<< ys))
where _≤?_ : A -> A -> Thunk 1 Bool is a comparison function on A. (Note that a tick is present at the beginning of the lambda expression to account for the beta-reduction.)
This doesn’t type-check, because the type of insert requires that the two clauses have the same running time, but the nil clause has only one tick whereas the cons clause has more; also, the if_then_else_ expression is given the type

if_then_else_ : Bool -> X -> X -> Thunk 1 X

which requires its two branches to have the same type, so there is a similar problem for the two branches. Therefore we have to add various tick-adding functions to the definition to make all the numbers of ticks match — not just the top-level ticks, but also the ticks inside the thunks in the lists. The resulting code is
insert : A -> SeqL A c n -> Thunk 4 (SeqL A (4 + c) (1 + n))
insert x []       = √ returnW 2 (x ∷ returnW (3 + c) [])
insert x (y ∷ ys) = √ x ≤? y >>= λ b →
                    √ if b then x ∷ wait (2 + c) (waitL 2 (y ∷ ys))
                           else y ∷ (insert x =<< ys)
where the types of the tick-adding functions are

returnW : (n : ℕ) -> X -> Thunk (1 + n) X
wait    : (n : ℕ) -> Thunk m X -> Thunk (1 + n + m) X
waitL   : (c : ℕ) -> SeqL A c' n -> Thunk 1 (SeqL A (2 + c + c') n)
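In the identity-monad sketch these are easy to reconstruct (my hedged guesses, not the library’s definitions): returnW and wait just re-wrap the value under a larger grade, while waitL has to rebuild the list so that every inner thunk carries the extra cost:

returnW : (n : ℕ) {X : Set} -> X -> Thunk (1 + n) X
returnW n x = ⟨ x ⟩

wait : (n : ℕ) {m : ℕ} {X : Set} -> Thunk m X -> Thunk (1 + n + m) X
wait n ⟨ x ⟩ = ⟨ x ⟩

-- the re-wrapping trick below works only because our sketch Thunk is
-- the identity monad; with an abstract Thunk the real definition needs
-- arithmetic lemmas to re-index the inner thunks
waitL : (c : ℕ) {c' n : ℕ} {A : Set} -> SeqL A c' n -> Thunk 1 (SeqL A (2 + c + c') n)
waitL c []           = √ return []
waitL c (x ∷ ⟨ xs ⟩) = √ return (x ∷ ⟨ force (waitL c xs) ⟩)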
Comparison. Handley et al use inequalities in the types, so that the programmer only needs to show that the differing costs of the clauses/branches are all below a common bound, avoiding the tedious tick-adding above. (End of comparison.)
I feel that, to make this all work, it takes a lot of fiddling and going back and forth between programs and types (to figure out the right time function in the type and add the necessary tick-adding functions in the program), whereas the usual practice is to write a function and then estimate/infer its running time extrinsically. Or, to put it as a question: apart from simplifying the formalism, is there any benefit to putting running time information in the types and programs intrinsically? My current feeling is ‘not much’, unless we can find a methodology to help us design algorithms and data structures that meet given time bounds (about which I don’t have any idea).
The way it all fits together is quite neat though (and the writing is exceptionally crisp): rather than developing dedicated formalisms, we can just reuse ordinary induction (intrinsically) to count ticks and verify solutions of recurrence relations, all from first principles. Also, I regard the Thunk monad primarily as a way to record ticks, but it also works nicely for dealing with lazy evaluation — how the dual role works out is still somewhat mysterious to me.
The other obvious question to ask is: why not switch to asymptotic analysis to avoid all the trouble with constants (and lower-order terms)? But a counter-question immediately pops up: what’s so important about asymptotic statements anyway? The complexity classes are important in theory, but in practice an algorithm being in, say, polynomial time doesn’t necessarily make it useful. So we have a dilemma here: we don’t usually care about constants too much, but we don’t want astronomically large constants either. More generally, what kind of running time (or, in general, resource) statements do we want in real life?