The trek goes on


In-place pointer programs from functional programs

Posted at 21:56 on 31 March 2022, and revised at 00:26 on 3 April 2022

In-place pointer algorithms, which transform linked data structures by adjusting the pointers and without moving the data or allocating new space in the heap, are intriguing because their manipulation of pointers is often obscure and yet performs the intended task correctly and efficiently. The classical example is list reversal: defining singly linked lists in C++ (which I haven’t written for years!) by

struct IntListNode {
  int          head;
  IntListNode* tail;

typedef IntListNode* IntList;

we can implement in-place list reversal by

IntList reverse(IntList xs) {
  IntList ys = 0;           // (0)
  while(xs) {               // (1)
    IntList zs = xs->tail;  // (2)
    xs->tail = ys;          // (3)
    ys = xs;                // (4)
    xs = zs;                // (5)
  return ys;

(At least for me) it is easier to understand the program as an ephemeral version of the standard functional list reversal with an accumulating parameter, which is initialised to [] in step (0) above. In Haskell:

revcat :: [a] -> [a] -> [a]
revcat []     ys = ys
revcat (z:zs) ys = revcat zs (z:ys)

The revcat function is tail-recursive, and thus corresponds to a loop (1) that operates on two local variables xs and ys corresponding to the two inputs of revcat. If xs is empty, the loop stops and ys is the result, corresponding to the first clause of revcat. Otherwise, we pattern-match on xs, calling its tail zs (2). To prepare for the next run of the loop, we should (roughly speaking) update xs and ys to zs and z:ys respectively, corresponding to the second clause of revcat. The update of xs is easy but has to be done last (5). The update of ys requires us to construct z:ys first. The key observation/assumption here is that the list node pointed to by xs and storing z is not needed elsewhere, and can be reused in the construction of z:ys simply by making the tail of the node point to ys (3). Now the list pointed to by xs is z:ys, so the update of ys is done by assigning xs to ys (4). The update of xs has to be done last because its value is needed for the update of ys; alternatively (and hypothetically for C++), we could write steps (2)–(5) as a parallel assignment

(xs, ys, xs->tail) = (xs->tail, xs, ys);

which rotates the values of the three variables.

The key to the conversion from the functional/recursive program revcat xs [] to the in-place imperative/‘loopy’ program reverse(xs) is the observation that the cons node (z:) can be safely reused because it is used exactly once on the right-hand side — this kind of resource constraint naturally points towards a linear type system. Indeed, back in 2000, Martin Hofmann has given a provably correct translation from linearly typed functional programs to malloc-free C programs (at ESOP as an extended abstract, and in a journal article); being malloc-free, the resulting C programs necessarily use no extra heap space and have to do their job in-place. The basic idea is to represent the space of a node abstractly as an inhabitant of a special type ‘’; inhabitants of are managed linearly, meaning that node space cannot be duplicated or discarded. (Hofmann mentioned that he used the word ‘linear’ in the sense of ‘affine linear’, meaning that -inhabitants could be discarded, but this looks bad because it implies that allocated space may be forgotten without being freed.) The (:) constructor is enriched to Cons :: ◇ -> a -> [a] -> [a] with an extra -typed argument, so that (i) pattern-matching a list with Cons d x xs releases the space of a node, which we can refer to by the variable d, and (ii) constructing a Cons node requires us to supply a -typed argument, in effect specifying the space where the Cons node will reside. Rewriting revcat with the new Cons constructor,

revcat :: [a] -> [a] -> [a]
revcat []            ys = ys
revcat (Cons d z zs) ys = revcat zs (Cons d z ys)

we see that d makes the reuse of the space of Cons nodes explicit — the space of the Cons node on the left-hand side is reused as the space of the Cons node on the right-hand side because the two nodes use the same space represented by d. In this way, we can give space instructions at the comfortable abstraction level of functional programs, which are then translated to C programs for efficient execution. Elegant!

Of course, in general we need to be able to allocate and deallocate space, and allow copying and sharing, which require extensions to Hofmann’s linear type system (some of which Hofmann discusses briefly at the end of the journal article). We also want to omit space instructions if possible, so that we can stay at an even more comfortable abstraction level — this requires some kind of linear-type inference. All these are achieved (to some extent — see the discussion at the end) by Reinking et al’s reference-counting system presented at PLDI 2021, which translates Koka, an effectful functional language, to C (in the same spirit as Hofmann’s work, although they fail to cite Hofmann, which I think is a somewhat serious oversight). In their system, every allocated node is paired with a reference count, which affects at runtime whether the node should be duplicated —when it is still referred to by other parts of the program and cannot be destructively updated— or reclaimed/reused —when the current part of the program owns the sole reference to the node. By making updates to reference counts atomic (when needed), the C programs even work in a concurrent setting. What convinces me that they are on the right track is their tree traversal example: their Figure 3 is a completely standard functional program which uses a stack to store the parts of a tree that are ‘out of focus’, i.e., not currently being processed (see, for example, my blog post 0025 or Conor McBride’s ‘clowns and jokers’); from the functional program, their translation produces the in-place traversal program (their Figure 2) that reuses the out-of-focus parts of the tree as the stack. There is a more difficult version of the traversal problem: the Schorr–Waite graph marking algorithm, which is notorious for the difficulty of its correctness proof — for example, both Hongseok Yang’s treatment using separation logic and Richard Bird’s treatment using purely functional equational reasoning are fairly complicated. The tree version of the problem is easier, but it is nice that, correspondingly, the solution is easy too: just write the functional program and invoke the compiler.

I found Hofmann’s work about two years ago, and the journal version was discovered and presented by Bo-Yu Cheng at the ‘Festival of Modality and Linearity’ seminar right after FLOLAC 2020. Recently Shin talked about a derivation exercise from Kaldewaij’s Programming: The Derivation of Algorithms (Exercise 1 of Section 10.2, on page 168): given an array of integers, reorder the elements (using only swaps) such that either all even-indexed elements are even or all odd-indexed elements are odd. Inspired by Hofmann, my instinct was to solve the problem functionally for lists (rather than arrays) first: separate the input list into two sub-lists of even and odd elements respectively, and then interleave the two sub-lists to get the output list. In Haskell, define

separate :: (a -> Bool) -> [a] -> ([a], [a])
separate p [] = ([], [])
separate p (x:xs) | p x       = ((x:) *** id) (separate p xs)
                  | otherwise = (id *** (x:)) (separate p xs)

interleave :: ([a], [a]) -> [a]
interleave ([]  , zs) = zs
interleave (y:ys, zs) = y : interleave (zs, ys)

where (f *** g) (x, y) = (f x, g y), and the function

interleave . separate even

should solve the problem. Then we calculate a tail-recursive version of the function and compile it to an in-place imperative program with loops. I find that I’m pretty bad at deriving tail-recursive programs though; the best I can do is a straightforward fusion producing

isep :: (a -> Bool) -> [a] -> [a]
isep p [] = []
isep p (x:xs) | p x       = x : isep (not . p) xs
isep p (x:xs) | otherwise = adjSwap (x : isep p xs)


adjSwap :: [a] -> [a]
adjSwap []       = []
adjSwap [x]      = [x]
adjSwap (x:y:xs) = y : x : adjSwap xs

The behaviour of isep is rather unexpected — for example,

isep even ([0,2,4] ++ [1,3..19]) = [0,1,2,3,4,5,19,7,17,9,15,11,13]

where the prefix [0,2,4,1,3,5] of the input list is correctly reordered, and the rest is somehow split into two halves and interleaved after the latter half is reversed. This example indicates that isep p is not actually equal to interleave . separate p, but it does solve the problem too. I’m not sure what reasoning should be done here — one (not so attractive) possibility is to prove that the two functions are equal up to some crazy list equality. But since filling in the technical details isn’t important at this point, I simply QuickChecked isep with

quickCheck (\xs -> let ys = isep even xs
                   in  everyOther even ys || everyOther odd (safeTail ys))


everyOther :: (a -> Bool) -> [a] -> Bool
everyOther p []     = True
everyOther p (x:xs) = p x && everyOther p (safeTail xs)

safeTail :: [a] -> [a]
safeTail []     = []
safeTail (_:xs) = xs

and the tests do succeed. I think what the unexpected behaviour of isep demonstrates is that recursion is far more expressive than looping, which corresponds to just a very restricted form of recursion, namely tail recursion — imagine what kind of program with loops is needed to replicate isep’s behaviour. The general message may be that a better way to write imperative programs (not limited to in-place ones) is to move to the more liberal domain of functional programs for better reasoning facilities, and transform functional programs with verified behaviour to tail-recursive programs, which are then translated to imperative programs by some mechanical and even cosmetic procedure.

Back to Kaldewaij’s problem, we can still attempt to write tail-recursive programs by hand. The first —not yet tail-recursive— version I write is

direct :: (a -> Bool) -> [a] -> [a]
direct p xs = produce p xs [] []

produce :: (a -> Bool) -> [a] -> [a] -> [a] -> [a]
produce p xs []     zs = consume p xs zs
produce p xs (y:ys) zs = y : produce (not . p) xs zs ys

consume :: (a -> Bool) -> [a] -> [a] -> [a]
consume p []     zs = zs
consume p (x:xs) zs | p x       = x : produce (not . p) xs zs []
                    | otherwise = consume p xs (x:zs)

The direct function is the entry point and initialises the ‘local variables’ ys and zs of produce to [], which are used to store elements drawn from xs satisfying p and not . p respectively. The produce function emits elements alternately from ys and zs, and switches to the consume function to replenish the lists when they are empty. Neither produce or consume is tail-recursive, but this is easy to fix: add an accumulating parameter as the destination of the emitted elements, and reverse the accumulated list at the end. Another potential problem is the switching from p to not . p in both functions, which we can ‘defunctionalise’ using a boolean to indicate whether we currently want elements satisfying or not satisfying p. Finally we merge produce and consume into a single loop function, which is tail-recursive, and initialise the local variables in another direct' function, which is successfully QuickChecked:

direct' :: (a -> Bool) -> [a] -> [a]
direct' p xs = reverse (loop p False xs [] [] [])

loop :: (a -> Bool) -> Bool -> [a] -> [a] -> [a] -> [a] -> [a]
loop p b []     []     []     acc = acc
loop p b []     []     (z:zs) acc = loop p b [] [] zs (z:acc)
loop p b (x:xs) []     zs     acc | p x /= b  = loop p (not b) xs zs [] (x:acc)
                                  | otherwise = loop p b xs [] (x:zs) acc
loop p b xs     (y:ys) zs     acc = loop p (not b) xs zs ys (y:acc)

The loop function can be straightforwardly translated into Koka

fun loop(b : bool, xs : list<int>, ys : list<int>, zs : list<int>, acc : list<int>) : <div> list<int>
  match ys
    Nil -> match xs
             Nil -> match zs
                      Nil -> acc
                      Cons(z, zs) -> loop(b, xs, ys, zs, Cons(z, acc))
             Cons(x, xs) -> if (x % 2 == 0) != b { loop(!b, xs, zs, ys, Cons(x, acc)) }
                                            else { loop( b, xs, ys, Cons(x, zs), acc) }
    Cons(y, ys) -> loop(!b, xs, zs, ys, Cons(y, acc))

and then compiled to C, although the resulting C program is still some distance away from a required one that operates on arrays using swaps only: the resulting program would rearrange the pointers so that the input list is gradually separated into two sub-lists and then interleaved back into a list, whereas with an array, the expected solution is probably similar to the solution to the Dutch national flag problem, representing and maintaining the sub-lists as segments of the array. It seems interesting to figure out the relationship between lists and arrays, and how we may convert between list programs and array programs. Also, the Koka compiler seems to handle parallel assignment incorrectly: loop often swaps its local variables ys and zs, but the C program emitted by the Koka compiler performs the swapping in the (x % 2 == 0) != b case by the assignments ys = zs; zs = ys;. I went back to the PLDI paper and discovered that they actually only formalised a small part of the compilation, namely the ‘Perceus’ algorithm that inserts reference-counting instructions (Section 3.4), but the optimisations that enable in-place updates remain informal (Sections 2.3–2.5). Apparently there is still a lot of work to do to establish the paradigm of writing (efficient) imperative programs functionally, from high-level reasoning techniques to low-level compilation mechanisms.

Recovering from the ICFP submission, more slowly than expected…
End of page