The trek goes on

0040

What is intrinsically typed programming?

Posted at 16:43 on 11 October 2024

Another summer passed, with another group of intern students to whom I needed to teach intrinsically typed programming. This has never been easy because there’s a serious lack of instructional material and even papers, and I have to get my students to read the few papers on the subject (mostly by Conor McBride) and highlight/explain the right patterns/idioms for them. One exercise they encountered was proving the associativity of vector append, which I frowned at. I told them that they shouldn’t need to prove this if they’re writing the right kind of intrinsically typed programs. But what’s ‘the right kind of intrinsically typed programs’? And what are the benefits of writing this kind of program? I don’t remember seeing satisfactory/systematic answers to these questions.

One vague answer I gave my students was that the computation happening at index level should correspond to the program, which is then free of type casts (e.g., subst in the Agda Standard Library). So, for example, vector append

append₀ : Vec A m → Vec A n → Vec A (m + n)
append₀ []       ys = ys
append₀ (x ∷ xs) ys = x ∷ append₀ xs ys

and the addition appearing in its type should have the same computation structure:

_+_ : ℕ → ℕ → ℕ
zero  + n = n
suc m + n = suc (m + n)

Similarly, if we are writing reversed append

revcat₀ : Vec A m → Vec A n → Vec A (m +' n)
revcat₀ []       ys = ys
revcat₀ (x ∷ xs) ys = revcat₀ xs (x ∷ ys)

then we should use a different addition in the type that matches the structure:

_+'_ : ℕ → ℕ → ℕ
zero  +' n = n
suc m +' n = m +' suc n

But I want to say something more precise to explain, for example, why it’s okay to use _+_ in the type of append but not revcat. The rest of this post gives a possible explanation.

The type checker performs computation automatically, and intuitively that’s the only thing we need for checking ‘good’ intrinsically typed programs. Conversely, if that’s not enough, we’ll need to insert type casts and prove equalities to help the type checker. So there are things that the type checker can do automatically and others that it cannot. What I want to do is port append and revcat to a more primitive setting where fewer things happen automatically, but then we’ll see that the computation corresponds to something special enough that it is reasonable to expect automatic treatment.

The porting is actually something familiar to the dependently typed programmer: replacing functions with relations. Instead of using the function _+_, we can define a relational version

data Add : ℕ → ℕ → ℕ → Set where
  zero :             Add  zero   n n
  suc  : Add m n r → Add (suc m) n (suc r)

and rewrite append₀ to produce a vector of length r such that Add m n r holds:

append₁ : Vec A m → Vec A n → Add m n r → Vec A r
append₁ []       ys  zero   = ys
append₁ (x ∷ xs) ys (suc a) = x ∷ append₁ xs ys a

And we can rewrite revcat₀ using Add too:

revcat₁ : Vec A m → Vec A n → Add m n r → Vec A r
revcat₁ []       ys  zero   = ys
revcat₁ (x ∷ xs) ys (suc a) =
  revcat₁ xs (x ∷ ys) {! Add m (suc n) (suc r)
                       ⊣ xs : Vec A m; ys : Vec A n; a : Add m n r !}

This time Agda readily accepts the definition revcat₁ xs (x ∷ ys) of the second clause, but we need to provide a proof of Add. The goal type and the relevant part of the context (separated by ‘⊣’) are shown in the hole. What I want to emphasise is that this ‘relationalisation’ technique is doing something more fundamental than I used to think: exposing a derivation that Agda needs to construct implicitly for type checking. That is, if we used _+_ in the index of revcat₀’s result type,

revcat₂ : Vec A m → Vec A n → Vec A (m + n)
revcat₂ []       ys = ys
revcat₂ (x ∷ xs) ys = revcat₂ xs (x ∷ ys)  -- doesn’t type-check

conceptually Agda would need to construct a derivation of judgemental equality implicitly but couldn’t, so type checking would fail. By doing the ‘relationalisation’, the derivation becomes a proof we construct explicitly and not Agda’s responsibility, and Agda no longer blocks the definition. The proof is easy but does require an induction (to be applied to a):

right-suc : Add m n r → Add m (suc n) (suc r)
right-suc  zero   = zero
right-suc (suc a) = suc (right-suc a)

And the hole left in revcat₁ can be filled with right-suc a.
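
For the record, the completed second clause then reads:

revcat₁ (x ∷ xs) ys (suc a) = revcat₁ xs (x ∷ ys) (right-suc a)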

We can now compare append₁ and revcat₁. Both type-check, but the amount of effort is visibly different. For append₁, the Add proof is deconstructed in the only way dictated by the shape of its indices, and in the inductive case the sub-proof is directly passed to the inductive call, whereas in the corresponding case of revcat₁, the sub-proof needs to be non-trivially transformed (by right-suc) before it can be passed to the inductive call. This non-trivial transformation can be thought of as something Agda cannot do automatically. I think this leads to a clearer explanation (maybe even a working definition) of what counts as good intrinsically typed programming: proofs are designed to be simple enough that the type checker can handle them automatically and implicitly.
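
Back in the functional setting, this manual transformation is exactly what a type cast packages up. As a point of comparison, here is a minimal sketch of the conventional repair of revcat₂ (the name revcat₃ is mine), assuming +-suc : ∀ m n → m + suc n ≡ suc (m + n) from Data.Nat.Properties and subst from Relation.Binary.PropositionalEquality:

revcat₃ : Vec A m → Vec A n → Vec A (m + n)
revcat₃                     []       ys = ys
revcat₃ {m = suc m} {n = n} (x ∷ xs) ys =
  subst (Vec _) (+-suc m n) (revcat₃ xs (x ∷ ys))  -- cast from m + suc n to suc (m + n)

The lemma +-suc plays the role of right-suc, and subst is the explicit cast that the relationalisation surfaces as right-suc a.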

Note that this working definition doesn’t imply that the computational structure of append follows that of _+_; I think that correspondence is additionally due to the fact that the shape of a vector follows that of its length index, so the answer I gave to my students was only a special case. (One should be able to come up with an example that requires only trivial proof manipulation but involves more complex inductive families whose constructors don’t just mirror their indices.) I hope all this can be made mathematically precise one day! (Together with Conor, Pierre Dagand did take a first step with their ‘coherent algebras’.)
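
For concreteness, the correspondence between _+_ and Add that this special case builds on can be written down directly; in the following sketch the name add is my own:

add : (m n : ℕ) → Add m n (m + n)
add  zero   n = zero            -- mirrors the first equation of _+_
add (suc m) n = suc (add m n)   -- mirrors the second equation of _+_

The function add has exactly the computation structure of _+_, and supplying its output to append₁ recovers append₀.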

Also note that the relationalisation can be done in another way that may be the first instinct of many dependently typed programmers: using the relation m + n ≡ r rather than Add m n r (see the sketch below). This doesn’t completely eliminate functions from indices though; I want to start from a simple setting where indices have only variables and constructors and unification is easy. This setting is close enough to logic programming; in fact, Peter Dybjer did mention in his 1994 paper introducing inductive families that ‘[a]n application of this theory is as foundation for logic programming as proposed by Hagiya and Sakurai’. In this setting, we can encode not only deterministic functions but also non-deterministic ones (a representation of which was explored in 0038), and it is tempting to think about whether it makes sense for the type checker to deal with non-deterministic functions automatically. More generally, we’ll see more clearly what kinds of proof are involved in checking dependently typed programs and start thinking about which kinds of proof can be automated (such as the idea of integrating SMT solving proposed in 0038) to make intrinsically typed programming more powerful.
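
For completeness, here is what the equality-based relationalisation looks like for append; in this sketch the name append₂ is my own, with _≡_ and refl as usual:

append₂ : Vec A m → Vec A n → m + n ≡ r → Vec A r
append₂ []       ys refl = ys
append₂ (x ∷ xs) ys refl = x ∷ append₂ xs ys refl

Checking this definition still makes Agda compute with _+_ during unification (reducing suc m + n to suc (m + n) before refl can be matched), which is precisely what the simpler setting of Add avoids.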