A while ago Shin dug up a mysterious program left by Richard Bird in his paper ‘Zippy tabulations of recursive functions’ at MPC 2008 (where my very first paper appeared!), which was about implementing a generic top-down specification of divide-and-conquer computation using a bottom-up algorithm. There’s a lot to talk about the problem and its treatment, but I’ll focus on only the parts that I reformulated using dependent types and string diagrams, and make notes of some possible future work. The story here is therefore deliberately incomplete, and we’re hoping to tell a better story with a full-length paper.
The generic divide-and-conquer problem aims to compute a solution from a list recursively from its immediate sub-lists — for example, given a (non-empty) list "abcd"
, we’d compute its four immediate sub-lists "abc"
, "abd"
, "acd"
, and "bcd"
, recursively solve the problem on these four sub-lists, and combine the four sub-solutions into a solution for the whole list using a given function g
; if the input list is a singleton, that’s a base case and we use another given function f
to compute a solution directly.
Here’s a simply typed program expressing this divide-and-conquer strategy (vectors should count as simple types now, right?):
td₀ : {A S : Set} (f : A → S) (g : List S → S)
{n : ℕ} → Vec A (suc n) → S
td₀ f g {zero } = f ∘ head
td₀ f g {suc n} = g ∘ map (td₀ f g {n}) ∘ choose₀ (2 + n) (1 + n) 2+n≥1+n
where 2+n≥1+n = ≤′-step ≤′-refl
The function choose₀
computes, in general, all the length-k
sub-lists of a length-n
list:
choose₀ : (n k : ℕ) → n ≥′ k → Vec A n → List (Vec A k)
choose₀ _ zero _ _ = [ [] ]
choose₀ (suc k) (suc k) ≤′-refl xs = [ xs ]
choose₀ (suc n) (suc k) (≤′-step n≥1+k) (x ∷ xs) =
choose₀ n (suc k) n≥1+k xs ++ map (x ∷_) (choose₀ n k n≥k xs)
where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k
Richard introduced a data structure called binomial trees (not to be confused with the kind of binomial tree used in binomial heaps), which Shin observed can be used to store the results of choose₀
while retaining the recursive computation structure by replacing the list-constructing functions [_]
and _++_
in the three cases with different tree constructors, as done in choose₁
below.
data BT₁ (A : Set) : ℕ → ℕ → Set where
tipL : A → BT₁ A n 0
tipR : A → BT₁ A (suc n) (suc n)
bin : BT₁ A n (suc k) → BT₁ A n k → BT₁ A (suc n) (suc k)
choose₁ : (n k : ℕ) → n ≥′ k → Vec A n → BT₁ (Vec A k) n k
choose₁ _ zero _ _ = tipL []
choose₁ (suc k) (suc k) ≤′-refl xs = tipR xs
choose₁ (suc n) (suc k) (≤′-step n≥1+k) (x ∷ xs) =
bin (choose₁ n (suc k) n≥1+k xs) (map (x ∷_) (choose₁ n k n≥k xs))
where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k
The map
function used in choose₁
is now the version for binomial trees:
map : (A → B) → BT₁ A n k → BT₁ B n k
map f (tipL x) = tipL (f x)
map f (tipR x) = tipR (f x)
map f (bin t u) = bin (map f t) (map f u)
As we introduce more dependent types, we can make our programs more precise.
With BT₁
, for example, we can now indicate (slightly) more precisely that g
takes sub-solutions for the immediate sub-lists.
td₁ : {A S : Set} (f : A → S) (g : {k : ℕ} → BT₁ S (2 + k) (1 + k) → S)
{n : ℕ} → Vec A (suc n) → S
td₁ f g {zero } = f ∘ head
td₁ f g {suc n} = g ∘ map (td₁ f g {n}) ∘ choose₁ (2 + n) (1 + n) 2+n≥1+n
where 2+n≥1+n = ≤′-step ≤′-refl
That’s a step towards the right direction, but is only a very small step.
If we really want to say precisely what g
does, we should index the solutions with the problems they solve by refining S
into a type family {k : ℕ} → Vec A (suc k) → Set
, such that S xs
means the type of solutions for the list xs
.
We should also refine binomial trees so that they can be used to store solutions with types that have the right indices:
data BT : (n k : ℕ) (P : Vec A k → Set) → Vec A n → Set where
tipL : P [] → BT n 0 P xs
tipR : P xs → BT (suc n) (suc n) P xs
bin : BT n (suc k) P xs
→ BT n k (λ ys → P (x ∷ ys)) xs
→ BT (suc n) (suc k) P (x ∷ xs)
Extensionally, BT n k P xs
means that P
holds for all the length-k
sub-lists of xs
, or more precisely, a proof of BT n k P xs
consists of a collection of proofs P ys
where ys
ranges over all the length-k
sub-lists of xs
.
Intensionally, this is like fusing the computation of choose
into BT
in a continuation-passing style at type level (or in logic programming if we think of indexed datatypes as logic programs) with P
being the continuation, such that BT n k P xs
stores P ys
for every length-k
sub-list ys
of xs
:
In the tipL
and tipR
cases, the results are []
and xs
, and we ‘return’ the results by ‘calling’ P
on []
and xs
, that is, storing values of types P []
and P xs
.
In the bin
case, the input list is x ∷ xs
, and we should return its length-(1 + k
) sub-lists.
The left sub-tree calls P
on all the length-(1 + k
) sub-lists of xs
(that do not include x
).
The right sub-tree calls λ ys → P (x ∷ ys)
on all the length-k
sub-lists of xs
, or equivalently, calls P
on all the length-(1 + k
) sub-lists x ∷ ys
(that include x
) where ys
ranges over all the length-k
sub-lists of xs
.
This is a very interesting datatype design pattern, which should be generalised and studied in more detail — see the remark near the All
datatype below.
With BT
, we can precisely specify the behaviour of g
by giving it the type
g : {k : ℕ} {xs : Vec A (2 + k)} → BT (2 + k) (1 + k) S xs → S xs
where the premise is a BT
-tree that contains solutions of type S ys
for all the immediate sub-lists ys
of xs
, and from the premise we can compute a solution of type S xs
.
We also have a map
function for BT
:
map : (∀ {xs} → P xs → Q xs) → ∀ {xs} → BT n k P xs → BT n k Q xs
map f (tipL p) = tipL (f p)
map f (tipR p) = tipR (f p)
map f (bin t u) = bin (map f t) (map f u)
The indexing starts to get overwhelming, but fortunately we can introduce categorical abstractions to manage and suppress the details.
Rather than working in the category of types and functions, we can switch to the categories of families of types and functions with a fixed type of indices.
Such a category is parametrised by an index type A : Set
and has objects of type A → Set
and morphisms between P
and Q : A → Set
of type {x : A} → P x → Q x
.
To facilitate working at the level of abstraction offered by this kind of category, we can define some notations:
_⇉_ : (A → Set) → (A → Set) → (A → Set)
(P ⇉ Q) x = P x → Q x
∀[_] : (A → Set) → Set
∀[ P ] = ∀ {x} → P x
With these, we can rewrite the type of g
as
g : {k : ℕ} → ∀[ BT (2 + k) (1 + k) S ⇉ S ]
and the type of map
for BT
as
map : ∀[ P ⇉ Q ] → ∀[ BT n k P ⇉ BT n k Q ]
To fully understand the types, we probably still need to expand the notational definitions, but the notations encourage us to read the types also at a more abstract level as if they were simple types.
For example, g
simply takes a BT
-tree of S
’s to an S
, and map
applies a function from P
to Q
to a BT
-tree of P
’s and returns a BT
-tree of Q
’s, parametrically in P
and Q
.
We can perform a similar translation on choose₁
, which, abstractly speaking, takes a list to a tree of lists.
What’s a list in the categories of indexed types and functions?
One natural answer is the All
datatype, which is a predicate on lists that holds when a given predicate P
holds for all the elements of a list:
data All {A : Set} (P : A → Set) : ∀ {n} → Vec A n → Set where
[] : All P []
_∷_ : P x → All P xs → All P (x ∷ xs)
Remark.
BT
generalises All
in the sense that BT _ 1 (P ∘ head)
is isomorphic to All P
.
Moreover, observe that both BT
and All
say that a predicate P
holds for all the results of a non-deterministic function, which in the case of BT
is choose
, and in the case of All
is the function that non-deterministically chooses one element (which is a specialised version of choose
).
This suggests that we could construct similar datatypes from non-deterministic functions (or relations, or logic programs) in general, although it’s not immediately clear how useful that construction will be.
(End of remark.)
We can then give choose
a richer type, which strictly generalises the type of choose₁
, while keeping the same function definition:
choose : (n k : ℕ) → n ≥′ k → ∀[ All P ⇉ BT n k (All P) ]
choose _ zero _ _ = tipL []
choose (suc k) (suc k) ≤′-refl ps = tipR ps
choose (suc n) (suc k) (≤′-step n≥1+k) (p ∷ ps) =
bin (choose n (suc k) n≥1+k ps) (map (p ∷_) (choose n k n≥k ps))
where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k
And this works for td
too:
td : {A : Set} {S : ∀ {k} → Vec A (suc k) → Set}
→ (g : ∀ {k} → ∀[ BT (2 + k) (1 + k) S ⇉ S ])
→ ∀ {n} → ∀⟨ Vec A (suc n) ⟩[ All (S ∘ [_]) ⇉ S ]
td g {zero } = head
td g {suc n} = g ∘ map (td g {n}) ∘ choose (2 + n) (1 + n) 2+n≥1+n
where 2+n≥1+n = ≤′-step ≤′-refl
where head
is defined by
head : ∀⟨ Vec A 1 ⟩[ All (S ∘ [_]) ⇉ S ]
head (s ∷ []) = s
with the help of a version of the ∀[_]
notation for specifying the index type:
∀⟨_⟩[_] : (A : Set) → (A → Set) → Set
∀⟨ A ⟩[ P ] = {x : A} → P x
Note that f
is omitted in this final definition of td
because the solutions for the singleton sub-lists of an input list xs : Vec A (suc n)
are now given in the list of type All (S ∘ [_]) xs
.
I think the lesson here is that the programs you write are probably more general than you think — and this should probably be one of the slogans for selling category theory to the (functional) programmer. In general it seems that we should be able to mechanise category switching along the line of Conal Elliott’s ‘Compiling to categories’, although I don’t know (yet) how straightforward it is when the target category is indexed types and functions, and when indexed datatypes are involved.
When we generalised choose
, what we have also secretly accomplished is to make it a natural transformation from All
to BT n k ∘ All
.
We have thus ascended to the even more abstract realm of the 2-category of categories, functors, and natural transformations, and can enjoy the convenience of reasoning with string diagrams.
(Cue the modulation to D major in the fourth movement of Mahler’s first symphony.)
As a (bottom-up) string diagram, the type of choose
is
At this level of abstraction, we view and manipulate layers of type structures such as BT n k
and All
as if they are tuples of elements — for example, here we simply think of choose
as turning an All
into a pair of BT n k
and All
.
Similarly, we can draw the type of g
as
S
should in fact be lifted to a constant functor const S
to fit into the diagrams (that is, the functor categories).
Now we can draw a specialised case td g {3}
as a pleasing diagram:
From the All
at the bottom we spawn three layers of BT
to split the problem recursively and reach the base case, where we can extract an S
and then consume the layers (that is, solve the sub-problems) using g
.
Note that we can only apply head
and start consuming after spawning the three layers — it might be tempting to apply head
to an earlier All
in the diagram, but that’s actually prohibited:
The elements we are working with are functors, which can be composed only when they have matching types.
These types —or more precisely speaking, the categories between which the functors operate on— correspond to the planes.
For example, in the diagram of choose n k
,
the bottom All
is a functor from the category of families with index type A
to the category of families with index type Vec A n
; since we are dealing with categories of families exclusively, we put only the index types on the right and left planes, which correspond to the two categories.
More interestingly, the top-right All
goes from index type A
to index type Vec A k
, while the top-left BT n k
goes from Vec A k
to Vec A n
, so the two functors can be composed, and the composed functor has the same type as the bottom All
, making choose n k
a type-correct natural transformation.
Similarly, the full diagram of head
is
where the right plane can have any index type (indicated by an Agda hole ‘?
’) because S
and S ∘ [_]
actually denote constant functors, and the left plane is restricted to the index type Vec A 1
.
Now we can check that the innermost choose 2 1
and head
in the diagram of td g {3}
are indeed well typed,
whereas if we replace choose 2 1
with choose 4 3
, for example, then the diagram becomes type-incorrect because the U-shaped plane cannot have both Vec A 3
(required by choose 4 3
) and Vec A 1
(required by head
) as its index type.
Now the bottom-up algorithm is not too far away.
The spawning part of the td g {3}
diagram is a right-leaning tree, spawning BT (suc k) k
structures for the invocations of g
but in an order such that we cannot start applying g
before finishing the spawning.
If we introduce a natural transformation
upgrade : n > k → ∀[ BT n k P ⇉ BT n (suc k) (BT (suc k) k P) ]
or diagrammatically
which satisfies this co-associativity-like equation
we can rewrite the spawning part of td g {3}
into a left-leaning tree (where the unTip
will be explained below):
And this is a specialised case of the bottom-up algorithm!
First we initialise by creating a BT 4 1 S
using choose 4 1
and head
— that is, solutions for all the length-1 sub-lists.
Then we use upgrade
to create a BT 4 2
tree that will later store length-2 solutions, but for now we are storing only BT 2 1 S
’s, which collect exactly those length-1 solutions that will be used to compute a length-2 solution.
(The definition of upgrade
is the one left unexplained in Richard’s paper; it’s somewhat complex and omitted here.)
Apply g
to all these BT 2 1 S
’s and we get solutions for all the length-2 sub-lists.
Going upwards iteratively, we get a length-4 solution wrapped in a BT 4 4
tree, which we can rip away using unTip : ∀[ BT n n P ⇉ P ]
that satisfies the co-unitality-like equation:
In general, the bottom-up algorithm has the same type as td
and is defined as follows:
bu : {A : Set} {S : ∀ {k} → Vec A (suc k) → Set}
→ (g : ∀ {k} → ∀[ BT (2 + k) (1 + k) S ⇉ S ])
→ ∀ {n} → ∀⟨ Vec A (suc n) ⟩[ All (S ∘ [_]) ⇉ S ]
bu g = unTip ∘ loop (≤⇒≤‴ z≤n) ∘ map head ∘ choose _ 1 (≤⇒≤′ (s≤s z≤n))
where
loop : n ≥‴ k → ∀[ BT (suc n) (suc k) S ⇉ BT (suc n) (suc n) S ]
loop ≤‴-refl = id
loop (≤‴-step n≥1+k) = loop n≥1+k ∘ map g ∘ upgrade (s≤s (≤‴⇒≤ n≥1+k))
At the abstraction level of type structure we don’t really see why we should use the bottom-up algorithm though — to see that, we have to go back to the more concrete level of computation.
The bottom-up computation is preferred to the top-down one for the standard reason: td
solves overlapping sub-problems multiple times, whereas bu
solves each sub-problem just once.
(On the other hand, td
is more suitable as a specification.)
The starting point of Shin and I was actually upgrade
: Shin wanted to understand Richard’s definition, and eventually came up with a version of the co-associativity-like equation as its specification, from which Richard’s definition can be calculated.
When contemplating how dependent types can help to derive upgrade
from the specification, the BT
datatype came to me unexpectedly as a way of encoding the computation of choose
at type level.
I believe that the use of BT
in the type of upgrade
characterises its behaviour completely, and we should be able to derive the co-associativity- and co-unitality-like equations from the type, possibly using parametricity.
While the reformulation looks neat, now I think some part of it over-generalises and doesn’t work nicely from an algorithmic point of view:
the generalisation from lists to All
is not necessary, and sort of forces the base cases to be singleton lists rather than the empty list, which I think is more natural (putting less restrictions on the lengths and eliminating most of the (1 +_)
’s and (2 +_)
’s in the indices).
Well, I think the lesson here is that category theory, string diagrams, etc are not magic bullets; rather, they are tools that are indeed powerful but should be used wisely like any other tools.
On the other hand, if we decide to do a more complex mental exercise anyway, category theory and string diagrams do help to manage the complexity effectively.
Agda file: BT.agda