In-place pointer algorithms, which transform linked data structures by adjusting pointers rather than moving data or allocating new heap space, are intriguing because their pointer manipulation is often obscure and yet performs the intended task correctly and efficiently. The classical example is list reversal: defining singly linked lists in C++ (which I haven’t written for years!) by
struct IntListNode {
  int head;
  IntListNode* tail;
};

typedef IntListNode* IntList;
we can implement in-place list reversal by
IntList reverse(IntList xs) {
  IntList ys = 0;            // (0)
  while (xs) {               // (1)
    IntList zs = xs->tail;   // (2)
    xs->tail = ys;           // (3)
    ys = xs;                 // (4)
    xs = zs;                 // (5)
  }
  return ys;
}
(At least for me) it is easier to understand the program as an ephemeral version of the standard functional list reversal with an accumulating parameter, which is initialised to [] in step (0) above.
In Haskell:
revcat :: [a] -> [a] -> [a]
revcat [] ys = ys
revcat (z:zs) ys = revcat zs (z:ys)
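The reversal itself is revcat applied to the input list and an empty accumulator; a sample unfolding (my illustration, with the wrapper named reverse' to avoid the Prelude’s reverse) shows the accumulator growing in reverse order:

reverse' :: [a] -> [a]
reverse' xs = revcat xs []

-- revcat [1,2,3] []  =  revcat [2,3] [1]
--                    =  revcat [3]   [2,1]
--                    =  revcat []    [3,2,1]
--                    =  [3,2,1]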
The revcat function is tail-recursive, and thus corresponds to a loop (1) that operates on two local variables xs and ys corresponding to the two inputs of revcat. If xs is empty, the loop stops and ys is the result, corresponding to the first clause of revcat. Otherwise, we pattern-match on xs, calling its tail zs (2). To prepare for the next run of the loop, we should (roughly speaking) update xs and ys to zs and z:ys respectively, corresponding to the second clause of revcat. The update of xs is easy but has to be done last (5). The update of ys requires us to construct z:ys first. The key observation/assumption here is that the list node pointed to by xs and storing z is not needed elsewhere, and can be reused in the construction of z:ys simply by making the tail of the node point to ys (3). Now the list pointed to by xs is z:ys, so the update of ys is done by assigning xs to ys (4). The update of xs has to be done last because its value is needed for the update of ys; alternatively (and hypothetically for C++), we could write steps (2)–(5) as a parallel assignment

(xs, ys, xs->tail) = (xs->tail, xs, ys);

which rotates the values of the three variables.
The key to the conversion from the functional/recursive program revcat xs [] to the in-place imperative/‘loopy’ program reverse(xs) is the observation that the cons node (z:) can be safely reused because it is used exactly once on the right-hand side — this kind of resource constraint naturally points towards a linear type system. Indeed, back in 2000, Martin Hofmann gave a provably correct translation from linearly typed functional programs to malloc-free C programs (at ESOP as an extended abstract, and in a journal article); being malloc-free, the resulting C programs necessarily use no extra heap space and have to do their job in-place. The basic idea is to represent the space of a node abstractly as an inhabitant of a special type ‘◇’; inhabitants of ◇ are managed linearly, meaning that node space cannot be duplicated or discarded. (Hofmann mentioned that he used the word ‘linear’ in the sense of ‘affine linear’, meaning that ◇-inhabitants could be discarded, but this looks bad because it implies that allocated space may be forgotten without being freed.)
The (:) constructor is enriched to Cons :: ◇ -> a -> [a] -> [a] with an extra ◇-typed argument, so that (i) pattern-matching a list with Cons d x xs releases the space of a node, which we can refer to by the variable d, and (ii) constructing a Cons node requires us to supply a ◇-typed argument, in effect specifying the space where the Cons node will reside. Rewriting revcat with the new Cons constructor,

revcat :: [a] -> [a] -> [a]
revcat []            ys = ys
revcat (Cons d z zs) ys = revcat zs (Cons d z ys)

we see that d makes the reuse of the space of Cons nodes explicit — the space of the Cons node on the left-hand side is reused as the space of the Cons node on the right-hand side because the two nodes use the same space represented by d. In this way, we can give space instructions at the comfortable abstraction level of functional programs, which are then translated to C programs for efficient execution. Elegant!
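To make the idea concrete, here is a minimal Haskell rendering of the enriched lists (a sketch of my own, not Hofmann’s actual calculus; note that plain Haskell does not enforce the linearity of Diamond):

-- 'Diamond' is an opaque token standing for one node’s worth of space;
-- in Hofmann’s system such tokens are managed linearly, so they can be
-- neither duplicated nor discarded.
data Diamond = Diamond

data List a = Nil | Cons Diamond a (List a)

-- The space-explicit revcat: the token d released by pattern-matching
-- the input node is consumed to build the output node, so the function
-- never allocates.
revcatD :: List a -> List a -> List a
revcatD Nil           ys = ys
revcatD (Cons d z zs) ys = revcatD zs (Cons d z ys)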
Of course, in general we need to be able to allocate and deallocate space, and to allow copying and sharing, which require extensions to Hofmann’s linear type system (some of which Hofmann discusses briefly at the end of the journal article). We also want to omit space instructions where possible, so that we can stay at an even more comfortable abstraction level — this requires some kind of linear-type inference. All these are achieved (to some extent — see the discussion at the end) by Reinking et al.’s reference-counting system presented at PLDI 2021, which translates Koka, an effectful functional language, to C (in the same spirit as Hofmann’s work, although they fail to cite Hofmann, which I think is a somewhat serious oversight).

In their system, every allocated node is paired with a reference count, which determines at runtime whether the node should be duplicated — when it is still referred to by other parts of the program and cannot be destructively updated — or reclaimed/reused — when the current part of the program owns the sole reference to the node. By making updates to reference counts atomic (when needed), the C programs even work in a concurrent setting.

What convinces me that they are on the right track is their tree traversal example: their Figure 3 is a completely standard functional program which uses a stack to store the parts of a tree that are ‘out of focus’, i.e., not currently being processed (see, for example, my blog post 0025 or Conor McBride’s ‘clowns and jokers’); from the functional program, their translation produces the in-place traversal program (their Figure 2), which reuses the out-of-focus parts of the tree as the stack. There is a more difficult version of the traversal problem: the Schorr–Waite graph marking algorithm, which is notorious for the difficulty of its correctness proof — for example, both Hongseok Yang’s treatment using separation logic and Richard Bird’s treatment using purely functional equational reasoning are fairly complicated. The tree version of the problem is easier, but it is nice that, correspondingly, the solution is easy too: just write the functional program and invoke the compiler.
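To give the flavour of the traversal example without reproducing their figures, here is a Haskell sketch in the style I understand their Figure 3 to be (my own reconstruction, not their code): an in-order traversal whose stack holds the out-of-focus labels and right subtrees, which is roughly the data whose space a compilation scheme like theirs can take from the tree nodes already visited.

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- List the elements in order; the parts of the tree not currently in
-- focus are kept on an explicit stack of (label, right subtree) pairs.
inorder :: Tree a -> [a]
inorder t = go t []
  where
    go Leaf         []             = []
    go Leaf         ((x, r) : stk) = x : go r stk
    go (Node l x r) stk            = go l ((x, r) : stk)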
I found Hofmann’s work about two years ago, and the journal version was discovered and presented by Bo-Yu Cheng at the ‘Festival of Modality and Linearity’ seminar right after FLOLAC 2020.

Recently Shin talked about a derivation exercise from Kaldewaij’s Programming: The Derivation of Algorithms (Exercise 1 of Section 10.2, on page 168): given an array of integers, reorder the elements (using only swaps) such that either all even-indexed elements are even or all odd-indexed elements are odd. Inspired by Hofmann, my instinct was to solve the problem functionally for lists (rather than arrays) first: separate the input list into two sub-lists of even and odd elements respectively, and then interleave the two sub-lists to get the output list. In Haskell, define
separate :: (a -> Bool) -> [a] -> ([a], [a])
separate p []                 = ([], [])
separate p (x:xs) | p x       = ((x:) *** id) (separate p xs)
                  | otherwise = (id *** (x:)) (separate p xs)

interleave :: ([a], [a]) -> [a]
interleave ([]  , zs) = zs
interleave (y:ys, zs) = y : interleave (zs, ys)
where (f *** g) (x, y) = (f x, g y), and the function interleave . separate even should solve the problem (for example, interleave (separate even [3,1,4,1,5,9,2,6]) = [4,3,2,1,6,1,5,9], in which all odd-indexed elements are odd). Then we calculate a tail-recursive version of the function and compile it to an in-place imperative program with loops. I find that I’m pretty bad at deriving tail-recursive programs though; the best I can do is a straightforward fusion producing
isep :: (a -> Bool) -> [a] -> [a]
isep p [] = []
isep p (x:xs) | p x       = x : isep (not . p) xs
              | otherwise = adjSwap (x : isep p xs)
  where
    adjSwap :: [a] -> [a]
    adjSwap []       = []
    adjSwap [x]      = [x]
    adjSwap (x:y:xs) = y : x : adjSwap xs
The behaviour of isep is rather unexpected — for example,

isep even ([0,2,4] ++ [1,3..19]) = [0,1,2,3,4,5,19,7,17,9,15,11,13]

where the prefix [0,2,4,1,3,5] of the input list is correctly reordered, and the rest is somehow split into two halves and interleaved after the latter half is reversed. This example indicates that isep p is not actually equal to interleave . separate p, but it does solve the problem too. I’m not sure what reasoning should be done here — one (not so attractive) possibility is to prove that the two functions are equal up to some crazy list equality.
But since filling in the technical details isn’t important at this point, I simply QuickChecked isep with

quickCheck (\xs -> let ys = isep even xs
                   in  everyOther even ys || everyOther odd (safeTail ys))
where
everyOther :: (a -> Bool) -> [a] -> Bool
everyOther p []     = True
everyOther p (x:xs) = p x && everyOther p (safeTail xs)

safeTail :: [a] -> [a]
safeTail []     = []
safeTail (_:xs) = xs
and the tests do succeed.
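Since the array version is required to use swaps only, the output must also be a permutation of the input. That half of the specification can be QuickChecked separately; the following property is my addition, not part of the original test:

import Data.List (sort)

-- isep may only rearrange elements, so sorting the output and the
-- input should yield the same list.
prop_perm :: [Int] -> Bool
prop_perm xs = sort (isep even xs) == sort xs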
I think what the unexpected behaviour of isep demonstrates is that recursion is far more expressive than looping, which corresponds to just a very restricted form of recursion, namely tail recursion — imagine what kind of program with loops would be needed to replicate isep’s behaviour. The general message may be that a better way to write imperative programs (not limited to in-place ones) is to move to the more liberal domain of functional programs for better reasoning facilities, and transform functional programs with verified behaviour into tail-recursive programs, which are then translated to imperative programs by some mechanical and even cosmetic procedure.
Back to Kaldewaij’s problem: we can still attempt to write tail-recursive programs by hand. The first —not yet tail-recursive— version I write is
direct :: (a -> Bool) -> [a] -> [a]
direct p xs = produce p xs [] []

produce :: (a -> Bool) -> [a] -> [a] -> [a] -> [a]
produce p xs []     zs = consume p xs zs
produce p xs (y:ys) zs = y : produce (not . p) xs zs ys

consume :: (a -> Bool) -> [a] -> [a] -> [a]
consume p []     zs = zs
consume p (x:xs) zs | p x       = x : produce (not . p) xs zs []
                    | otherwise = consume p xs (x:zs)
The direct function is the entry point and initialises the ‘local variables’ ys and zs of produce to [], which are used to store elements drawn from xs satisfying p and not . p respectively. The produce function emits elements alternately from ys and zs, and switches to the consume function to replenish the lists when they are empty. Neither produce nor consume is tail-recursive, but this is easy to fix: add an accumulating parameter as the destination of the emitted elements, and reverse the accumulated list at the end. Another potential problem is the switching from p to not . p in both functions, which we can ‘defunctionalise’ using a boolean to indicate whether we currently want elements satisfying or not satisfying p. Finally we merge produce and consume into a single loop function, which is tail-recursive, and initialise the local variables in another direct' function, which is successfully QuickChecked:
direct' :: (a -> Bool) -> [a] -> [a]
direct' p xs = reverse (loop p False xs [] [] [])

loop :: (a -> Bool) -> Bool -> [a] -> [a] -> [a] -> [a] -> [a]
loop p b []     []     []     acc = acc
loop p b []     []     (z:zs) acc = loop p b [] [] zs (z:acc)
loop p b (x:xs) []     zs     acc | p x /= b  = loop p (not b) xs zs [] (x:acc)
                                  | otherwise = loop p b xs [] (x:zs) acc
loop p b xs     (y:ys) zs     acc = loop p (not b) xs zs ys (y:acc)
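The property is presumably the same everyOther test as before, now applied to direct' (I am reconstructing the invocation rather than quoting it):

quickCheck (\xs -> let ys = direct' even xs
                   in  everyOther even ys || everyOther odd (safeTail ys))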
The loop function can be straightforwardly translated into Koka:
fun loop(b : bool, xs : list<int>, ys : list<int>, zs : list<int>, acc : list<int>) : <div> list<int>
  match ys
    Nil -> match xs
      Nil -> match zs
        Nil         -> acc
        Cons(z, zs) -> loop(b, xs, ys, zs, Cons(z, acc))
      Cons(x, xs) -> if (x % 2 == 0) != b { loop(!b, xs, zs, ys, Cons(x, acc)) }
                     else                 { loop( b, xs, ys, Cons(x, zs), acc) }
    Cons(y, ys) -> loop(!b, xs, zs, ys, Cons(y, acc))
and then compiled to C, although the resulting C program is still some distance away from the required one, which operates on arrays using swaps only: the compiled program rearranges pointers so that the input list is gradually separated into two sub-lists and then interleaved back into a single list, whereas with an array, the expected solution is probably similar to the solution to the Dutch national flag problem, representing and maintaining the sub-lists as segments of the array. It seems interesting to figure out the relationship between lists and arrays, and how we may convert between list programs and array programs.
Also, the Koka compiler seems to handle parallel assignment incorrectly: loop often swaps its local variables ys and zs, but the C program emitted by the Koka compiler performs the swapping in the (x % 2 == 0) != b case by the assignments ys = zs; zs = ys;, which loses the original value of ys.
I went back to the PLDI paper and discovered that they actually only formalised a small part of the compilation, namely the ‘Perceus’ algorithm that inserts reference-counting instructions (Section 3.4), but the optimisations that enable in-place updates remain informal (Sections 2.3–2.5).
Apparently there is still a lot of work to do to establish the paradigm of writing (efficient) imperative programs functionally, from high-level reasoning techniques to low-level compilation mechanisms.