The trek goes on

0034

The power and joy of abstraction

Posted at 16:29 on 1 October 2023, and revised at 20:07 on 21 November 2023

A while ago Shin dug up a mysterious program left by Richard Bird in his paper ‘Zippy tabulations of recursive functions’ at MPC 2008 (where my very first paper appeared!), which was about implementing a generic top-down specification of divide-and-conquer computation using a bottom-up algorithm. There’s a lot to talk about the problem and its treatment, but I’ll focus on only the parts that I reformulated using dependent types and string diagrams, and make notes of some possible future work. The story here is therefore deliberately incomplete, and we’re hoping to tell a better story with a full-length paper.

The generic divide-and-conquer problem aims to compute a solution from a list recursively from its immediate sub-lists — for example, given a (non-empty) list "abcd", we’d compute its four immediate sub-lists "abc", "abd", "acd", and "bcd", recursively solve the problem on these four sub-lists, and combine the four sub-solutions into a solution for the whole list using a given function g; if the input list is a singleton, that’s a base case and we use another given function f to compute a solution directly. Here’s a simply typed program expressing this divide-and-conquer strategy (vectors should count as simple types now, right?):

td₀ : {A S : Set} (f : A → S) (g : List S → S)
      {n : ℕ} → Vec A (suc n) → S
td₀ f g {zero } = f ∘ head
td₀ f g {suc n} = g ∘ map (td₀ f g {n}) ∘ choose₀ (2 + n) (1 + n) 2+n≥1+n
  where 2+n≥1+n = ≤′-step ≤′-refl

The function choose₀ computes, in general, all the length-k sub-lists of a length-n list:

choose₀ : (n k : ℕ) → n ≥′ k → Vec A n → List (Vec A k)
choose₀ _        zero    _                   _   = [ [] ]
choose₀ (suc k) (suc k)  ≤′-refl             xs  = [ xs ]
choose₀ (suc n) (suc k) (≤′-step n≥1+k) (x ∷ xs) =
  choose₀ n (suc k) n≥1+k xs ++ map (x ∷_) (choose₀ n k n≥k xs)
  where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k

Richard introduced a data structure called binomial trees (not to be confused with the kind of binomial tree used in binomial heaps), which Shin observed can be used to store the results of choose₀ while retaining the recursive computation structure by replacing the list-constructing functions [_] and _++_ in the three cases with different tree constructors, as done in choose₁ below.

data BT₁ (A : Set) : ℕ → ℕ → Set where
  tipL : A                           → BT₁ A      n       0
  tipR : A                           → BT₁ A (suc n) (suc n)
  bin  : BT₁ A n (suc k) → BT₁ A n k → BT₁ A (suc n) (suc k)

choose₁ : (n k : ℕ) → n ≥′ k → Vec A n → BT₁ (Vec A k) n k
choose₁ _        zero    _                   _   = tipL []
choose₁ (suc k) (suc k)  ≤′-refl             xs  = tipR xs
choose₁ (suc n) (suc k) (≤′-step n≥1+k) (x ∷ xs) =
  bin (choose₁ n (suc k) n≥1+k xs) (map (x ∷_) (choose₁ n k n≥k xs))
  where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k

The map function used in choose₁ is now the version for binomial trees:

map : (A → B) → BT₁ A n k → BT₁ B n k
map f (tipL x)  = tipL (f x)
map f (tipR x)  = tipR (f x)
map f (bin t u) = bin (map f t) (map f u)

As we introduce more dependent types, we can make our programs more precise. With BT₁, for example, we can now indicate (slightly) more precisely that g takes sub-solutions for the immediate sub-lists.

td₁ : {A S : Set} (f : A → S) (g : {k : ℕ} → BT₁ S (2 + k) (1 + k) → S)
      {n : ℕ} → Vec A (suc n) → S
td₁ f g {zero } = f ∘ head
td₁ f g {suc n} = g ∘ map (td₁ f g {n}) ∘ choose₁ (2 + n) (1 + n) 2+n≥1+n
  where 2+n≥1+n = ≤′-step ≤′-refl

That’s a step towards the right direction, but is only a very small step. If we really want to say precisely what g does, we should index the solutions with the problems they solve by refining S into a type family {k : ℕ} → Vec A (suc k) → Set, such that S xs means the type of solutions for the list xs. We should also refine binomial trees so that they can be used to store solutions with types that have the right indices:

data BT : (n k : ℕ) (P : Vec A k → Set) → Vec A n → Set where

  tipL : P [] → BT      n       0  P xs
  tipR : P xs → BT (suc n) (suc n) P xs

  bin  : BT      n  (suc k)        P           xs
       → BT      n       k (λ ys → P (x ∷ ys)) xs
       → BT (suc n) (suc k)        P      (x ∷ xs)

Extensionally, BT n k P xs means that P holds for all the length-k sub-lists of xs, or more precisely, a proof of BT n k P xs consists of a collection of proofs P ys where ys ranges over all the length-k sub-lists of xs. Intensionally, this is like fusing the computation of choose into BT in a continuation-passing style at type level (or in logic programming if we think of indexed datatypes as logic programs) with P being the continuation, such that BT n k P xs stores P ys for every length-k sub-list ys of xs: In the tipL and tipR cases, the results are [] and xs, and we ‘return’ the results by ‘calling’ P on [] and xs, that is, storing values of types P [] and P xs. In the bin case, the input list is x ∷ xs, and we should return its length-(1 + k) sub-lists. The left sub-tree calls P on all the length-(1 + k) sub-lists of xs (that do not include x). The right sub-tree calls λ ys → P (x ∷ ys) on all the length-k sub-lists of xs, or equivalently, calls P on all the length-(1 + k) sub-lists x ∷ ys (that include x) where ys ranges over all the length-k sub-lists of xs. This is a very interesting datatype design pattern, which should be generalised and studied in more detail — see the remark near the All datatype below. With BT, we can precisely specify the behaviour of g by giving it the type

g : {k : ℕ} {xs : Vec A (2 + k)} → BT (2 + k) (1 + k) S xs → S xs

where the premise is a BT-tree that contains solutions of type S ys for all the immediate sub-lists ys of xs, and from the premise we can compute a solution of type S xs. We also have a map function for BT:

map : (∀ {xs} → P xs → Q xs) → ∀ {xs} → BT n k P xs → BT n k Q xs
map f (tipL p)  = tipL (f p)
map f (tipR p)  = tipR (f p)
map f (bin t u) = bin (map f t) (map f u)

The indexing starts to get overwhelming, but fortunately we can introduce categorical abstractions to manage and suppress the details. Rather than working in the category of types and functions, we can switch to the categories of families of types and functions with a fixed type of indices. Such a category is parametrised by an index type A : Set and has objects of type A → Set and morphisms between P and Q : A → Set of type {x : A} → P x → Q x. To facilitate working at the level of abstraction offered by this kind of category, we can define some notations:

_⇉_ : (A → Set) → (A → Set) → (A → Set)
(P ⇉ Q) x = P x → Q x

∀[_] : (A → Set) → Set
∀[ P ] = ∀ {x} → P x

With these, we can rewrite the type of g as

g : {k : ℕ} → ∀[ BT (2 + k) (1 + k) S ⇉ S ]

and the type of map for BT as

map : ∀[ P ⇉ Q ] → ∀[ BT n k P ⇉ BT n k Q ]

To fully understand the types, we probably still need to expand the notational definitions, but the notations encourage us to read the types also at a more abstract level as if they were simple types. For example, g simply takes a BT-tree of S’s to an S, and map applies a function from P to Q to a BT-tree of P’s and returns a BT-tree of Q’s, parametrically in P and Q.

We can perform a similar translation on choose₁, which, abstractly speaking, takes a list to a tree of lists. What’s a list in the categories of indexed types and functions? One natural answer is the All datatype, which is a predicate on lists that holds when a given predicate P holds for all the elements of a list:

data All {A : Set} (P : A → Set) : ∀ {n} → Vec A n → Set where
  []  :                  All P []
  _∷_ : P x → All P xs → All P (x ∷ xs)

Remark. BT generalises All in the sense that BT _ 1 (P ∘ head) is isomorphic to All P. Moreover, observe that both BT and All say that a predicate P holds for all the results of a non-deterministic function, which in the case of BT is choose, and in the case of All is the function that non-deterministically chooses one element (which is a specialised version of choose). This suggests that we could construct similar datatypes from non-deterministic functions (or relations, or logic programs) in general, although it’s not immediately clear how useful that construction will be. (End of remark.)

We can then give choose a richer type, which strictly generalises the type of choose₁, while keeping the same function definition:

choose : (n k : ℕ) → n ≥′ k → ∀[ All P ⇉ BT n k (All P) ]
choose _        zero    _                   _   = tipL []
choose (suc k) (suc k)  ≤′-refl             ps  = tipR ps
choose (suc n) (suc k) (≤′-step n≥1+k) (p ∷ ps) =
  bin (choose n (suc k) n≥1+k ps) (map (p ∷_) (choose n k n≥k ps))
  where n≥k = ≤′-trans (≤′-step ≤′-refl) n≥1+k

And this works for td too:

td : {A : Set} {S : ∀ {k} → Vec A (suc k) → Set}
   → (g : ∀ {k} → ∀[ BT (2 + k) (1 + k) S ⇉ S ])
   → ∀ {n} → ∀⟨ Vec A (suc n) ⟩[ All (S ∘ [_]) ⇉ S ]
td g {zero } = head
td g {suc n} = g ∘ map (td g {n}) ∘ choose (2 + n) (1 + n) 2+n≥1+n
  where 2+n≥1+n = ≤′-step ≤′-refl

where head is defined by

head : ∀⟨ Vec A 1 ⟩[ All (S ∘ [_]) ⇉ S ]
head (s ∷ []) = s

with the help of a version of the ∀[_] notation for specifying the index type:

∀⟨_⟩[_] : (A : Set) → (A → Set) → Set
∀⟨ A ⟩[ P ] = {x : A} → P x

Note that f is omitted in this final definition of td because the solutions for the singleton sub-lists of an input list xs : Vec A (suc n) are now given in the list of type All (S ∘ [_]) xs.

I think the lesson here is that the programs you write are probably more general than you think — and this should probably be one of the slogans for selling category theory to the (functional) programmer. In general it seems that we should be able to mechanise category switching along the line of Conal Elliott’s ‘Compiling to categories’, although I don’t know (yet) how straightforward it is when the target category is indexed types and functions, and when indexed datatypes are involved.

When we generalised choose, what we have also secretly accomplished is to make it a natural transformation from All to BT n k ∘ All. We have thus ascended to the even more abstract realm of the 2-category of categories, functors, and natural transformations, and can enjoy the convenience of reasoning with string diagrams. (Cue the modulation to D major in the fourth movement of Mahler’s first symphony.) As a (bottom-up) string diagram, the type of choose is

At this level of abstraction, we view and manipulate layers of type structures such as BT n k and All as if they are tuples of elements — for example, here we simply think of choose as turning an All into a pair of BT n k and All. Similarly, we can draw the type of g as

S should in fact be lifted to a constant functor const S to fit into the diagrams (that is, the functor categories). Now we can draw a specialised case td g {3} as a pleasing diagram:

Top-down specification as a diagram

From the All at the bottom we spawn three layers of BT to split the problem recursively and reach the base case, where we can extract an S and then consume the layers (that is, solve the sub-problems) using g. Note that we can only apply head and start consuming after spawning the three layers — it might be tempting to apply head to an earlier All in the diagram, but that’s actually prohibited: The elements we are working with are functors, which can be composed only when they have matching types. These types —or more precisely speaking, the categories between which the functors operate on— correspond to the planes. For example, in the diagram of choose n k,

the bottom All is a functor from the category of families with index type A to the category of families with index type Vec A n; since we are dealing with categories of families exclusively, we put only the index types on the right and left planes, which correspond to the two categories. More interestingly, the top-right All goes from index type A to index type Vec A k, while the top-left BT n k goes from Vec A k to Vec A n, so the two functors can be composed, and the composed functor has the same type as the bottom All, making choose n k a type-correct natural transformation. Similarly, the full diagram of head is

where the right plane can have any index type (indicated by an Agda hole ‘?’) because S and S ∘ [_] actually denote constant functors, and the left plane is restricted to the index type Vec A 1. Now we can check that the innermost choose 2 1 and head in the diagram of td g {3} are indeed well typed,

whereas if we replace choose 2 1 with choose 4 3, for example, then the diagram becomes type-incorrect because the U-shaped plane cannot have both Vec A 3 (required by choose 4 3) and Vec A 1 (required by head) as its index type.

Now the bottom-up algorithm is not too far away. The spawning part of the td g {3} diagram is a right-leaning tree, spawning BT (suc k) k structures for the invocations of g but in an order such that we cannot start applying g before finishing the spawning. If we introduce a natural transformation

upgrade : n > k → ∀[ BT n k P ⇉ BT n (suc k) (BT (suc k) k P) ]

or diagrammatically

which satisfies this co-associativity-like equation

Co-associativity-like equation

we can rewrite the spawning part of td g {3} into a left-leaning tree (where the unTip will be explained below):

And this is a specialised case of the bottom-up algorithm! First we initialise by creating a BT 4 1 S using choose 4 1 and head — that is, solutions for all the length-1 sub-lists. Then we use upgrade to create a BT 4 2 tree that will later store length-2 solutions, but for now we are storing only BT 2 1 S’s, which collect exactly those length-1 solutions that will be used to compute a length-2 solution. (The definition of upgrade is the one left unexplained in Richard’s paper; it’s somewhat complex and omitted here.) Apply g to all these BT 2 1 S’s and we get solutions for all the length-2 sub-lists. Going upwards iteratively, we get a length-4 solution wrapped in a BT 4 4 tree, which we can rip away using unTip : ∀[ BT n n P ⇉ P ] that satisfies the co-unitality-like equation:

Co-unitality-like equation

In general, the bottom-up algorithm has the same type as td and is defined as follows:

bu : {A : Set} {S : ∀ {k} → Vec A (suc k) → Set}
   → (g : ∀ {k} → ∀[ BT (2 + k) (1 + k) S ⇉ S ])
   → ∀ {n} → ∀⟨ Vec A (suc n) ⟩[ All (S ∘ [_]) ⇉ S ]
bu g = unTip ∘ loop (≤⇒≤‴ z≤n) ∘ map head ∘ choose _ 1 (≤⇒≤′ (s≤s z≤n))
  where
    loop : n ≥‴ k → ∀[ BT (suc n) (suc k) S ⇉ BT (suc n) (suc n) S ]
    loop  ≤‴-refl        = id
    loop (≤‴-step n≥1+k) = loop n≥1+k ∘ map g ∘ upgrade (s≤s (≤‴⇒≤ n≥1+k))

At the abstraction level of type structure we don’t really see why we should use the bottom-up algorithm though — to see that, we have to go back to the more concrete level of computation. The bottom-up computation is preferred to the top-down one for the standard reason: td solves overlapping sub-problems multiple times, whereas bu solves each sub-problem just once. (On the other hand, td is more suitable as a specification.)

The starting point of Shin and I was actually upgrade: Shin wanted to understand Richard’s definition, and eventually came up with a version of the co-associativity-like equation as its specification, from which Richard’s definition can be calculated. When contemplating how dependent types can help to derive upgrade from the specification, the BT datatype came to me unexpectedly as a way of encoding the computation of choose at type level. I believe that the use of BT in the type of upgrade characterises its behaviour completely, and we should be able to derive the co-associativity- and co-unitality-like equations from the type, possibly using parametricity. While the reformulation looks neat, now I think some part of it over-generalises and doesn’t work nicely from an algorithmic point of view: the generalisation from lists to All is not necessary, and sort of forces the base cases to be singleton lists rather than the empty list, which I think is more natural (putting less restrictions on the lengths and eliminating most of the (1 +_)’s and (2 +_)’s in the indices). Well, I think the lesson here is that category theory, string diagrams, etc are not magic bullets; rather, they are tools that are indeed powerful but should be used wisely like any other tools. On the other hand, if we decide to do a more complex mental exercise anyway, category theory and string diagrams do help to manage the complexity effectively.

Agda file: BT.agda