The trek goes on

0013

Rice’s theorem, modally typed

Posted at 17:15 on 30 October 2020

My colleagues Liang-Ting Chen and Tsung-Ju Chiang and I have been looking into modal type theory in search of a solid theoretic foundation for typed meta-programming. In particular, we have concentrated on (a variation of) Alex Kavvos’s intensional recursion (proposed here), a type-theoretic formulation of the Gödel-Löb (GL) axiom. Computationally, GL corresponds to Kleene’s second recursion theorem, which allows us to write programs that have access to their own code. This self-referencing ability leads to paradoxical constructions that lie at the heart of the classic results in computability theory, for example the undecidability of the halting problem and, more generally, Rice’s theorem. Below is our attempt to formulate the arguments somewhat informally with modal types, followed by some discussions of future directions.

We start with intensional recursion, which can be understood by comparing it with ‘extensional’ recursion, that is, the familiar fixed-point combinator that allows us to define recursive functions

Y : (α → α) → α

which satisfies the equation

Y f = f (Y f)

To define something of type α recursively with Y, we should supply a function of type α → α, in which we define something of type α in terms of the argument also of type α, which is a reference to the thing being defined. The type for intensional recursion has the same structure except that we throw in some modalities:

GL : □(□α → α) → □α

Think of □α as the type of ‘code representing something of type α’. Of the three □’s above, the second one is the most essential because it signifies what kind of self-reference is permitted here: to define (the code of) something of type α using GL, we supply (the code of) a function of type □α → α, in which we define something of type α in terms of the argument — this time of type □α — which is a reference to the code of the thing being defined. We can also write down a fixed-point equation for GL:

⟦ GL c ⟧ = ⟦ c ⟧ (GL c)

for all c : □(□α → α), where ⟦_⟧ : □α → α is the evaluator/interpreter function. This equation gives code a coinductive feel: GL c : □α by itself does not reduce, but if we ‘force’ it by applying ⟦_⟧, a copy of its body c will be forced/evaluated/expanded, and the result will refer to GL c itself, which will need to be forced again to spawn another copy of c, and so on.

An important fact has to be mentioned at this point: GL and ⟦_⟧ together give rise to Y, and thus non-termination. Here is the construction, which also serves as a first example of programming with GL:

Y : (α → α) → α
Y = ⟦ GL ⌜λself f. f (⟦ self ⟧ f)⌝ ⟧

Here ⌜_⌝ quotes (at meta-level) a term of type α and turns it into a piece of code that has type □α; it interacts with evaluation via the equation ⟦ ⌜t⌝ ⟧ = t. We’re defining the Y combinator using GL, so we can suppose that we’re given the code self : □((α → α) → α) of the combinator itself, and then define the combinator in terms of self. Onto the actual meat of the definition: the combinator should take a function f : α → α and compute the infinitely nested applications of f, so in the definition we receive a second argument f and make the outermost invocation of f, whose argument is computed by evaluating self — the code of the combinator — on f, producing the rest of the infinitely nested applications. It is straightforward to verify that this definition does give rise to the intended behaviour of extensional recursion:

  Y f
=  {- above definition of Y -}
  ⟦ GL ⌜λself f. f (⟦ self ⟧ f)⌝ ⟧ f
=  {- fixed-point equation of GL -}
  ⟦ ⌜λself f. f (⟦ self ⟧ f)⌝ ⟧ (GL ⌜λself f. f (⟦ self ⟧ f)⌝) f
=  {- evaluating a quotation; function application (twice) -}
  f (⟦ GL ⌜λself f. f (⟦ self ⟧ f)⌝ ⟧ f)
=  {- above definition of Y -}
  f (Y f)

Now we’re ready to establish the undecidability of the halting problem with GL. Suppose that we’re given a total decision procedure h : □A → Bool such that h c = true iff ⟦ c ⟧ ↓ (converges) for all c : □A. We can then construct a ‘disobedient’ program GL oppositeH : □A that does the opposite of what h says it does. Assuming the existence of a convergent value stop : A, it is tempting to define the body oppositeH : □(□A → A) as the quote of the function

λself. if h self then Y id else stop
  : □A → A

which says that if h says it’s convergent then it will become Y id and diverge, or otherwise it will become stop and converge. However, the above expression is an open one (whose free variables are the assumptions h and stop), and quotation acts on closed expressions only (unless we switch to a contextual modal type theory, which we avoid for now). So to be technically correct, we should instead construct a closed expression

λh stop self. if h self then Y id else stop
  : (□A → Bool) → A → □A → A

and define oppositeH by ‘applying’ the quote of the expression above to our two assumptions using the applicative combinator

_·_ : □(α → β) → □α → □β

which interacts with evaluation via the equation ⟦ cf · cx ⟧ = ⟦ cf ⟧ ⟦ cx ⟧. This means that the assumptions should also be □-typed — our real assumptions are thus

ch    : □(□A → Bool)
cstop : □A

such that ⟦ ch ⟧ is total, ⟦ ch ⟧ c = true iff ⟦ c ⟧ ↓, and ⟦ cstop ⟧ ↓. Assuming that _·_ is left-associative, the definition of oppositeH is then

oppositeH : □(□A → A)
oppositeH = ⌜λh stop self. if h self then Y id else stop⌝ · ch · cstop

Now we can check that GL oppositeH is indeed disobedient and leads to a paradox and eventually a contradiction:

  ⟦ ch ⟧ (GL oppositeH) = true
⇔  {- ⟦ ch ⟧ decides the halting problem -}
  ⟦ GL oppositeH ⟧  ↓
⇔  {- fixed-point equation of GL -}
  ⟦ oppositeH ⟧ (GL oppositeH)  ↓
⇔  {- definition of oppositeH; evaluating code application -}
  if ⟦ ch ⟧ (GL oppositeH) then Y id else ⟦ cstop ⟧  ↓
⇔  {- (only) the else branch converges -}
  ⟦ ch ⟧ (GL oppositeH) = false

Note that if ⟦ ch ⟧ was not total, in particular if ⟦ ch ⟧ was not defined on GL oppositeH, then this bi-implication wouldn’t be a contradiction, so the totality assumption about ⟦ ch ⟧ is important — given that ⟦ ch ⟧ (GL oppositeH) is defined, it is either true or false, but the argument above says that it has to be both, which is impossible. Also important is the assumption that cstop exists. Due to the presence of Y, we know that the type A necessarily has divergent values, but we can’t say the same thing for convergent values, and must assume their existence so that the construction can go through. This assumption can be phrased alternatively as ‘A has to contain both divergent and convergent values’, which is closer to the form of the ‘non-triviality’ assumption of Rice’s theorem, which we will see next.

In the above construction, whether we are dealing with convergence or some other property doesn’t seem crucial — the essence of the construction is to do the opposite of what a given decision procedure ⟦ cd ⟧ : □A → Bool (the evaluation of some cd : □(□A → Bool)) says, so we can carry out the construction as long as the property decided by ⟦ cd ⟧ is non-trivial, that is, there exist ca : □A and cb : □A such that ⟦ cd ⟧ ca = true and ⟦ cd ⟧ cb = false. The generalised construction is then

oppositeR : □(□A → A)
oppositeR = ⌜λd a b self. if d self then b else a⌝ · cd · ca · cb

For the paradoxical argument to go through, however, we need one more assumption about ⟦ cd ⟧: it should be extensional, that is, ⟦ cx ⟧ = ⟦ cy ⟧ implies ⟦ cd ⟧ cx = ⟦ cd ⟧ cy for all cx : □A and cy : □A — the decision procedure should say the same thing for extensionally equal programs, i.e., programs with the same behaviour, or contrapositively, if two programs are distinguished by the decision procedure, they must have different behaviour. Now we can largely replay the argument:

  ⟦ cd ⟧ (GL oppositeR) = true
⇔  {- (⇒) see below; (⇐) ⟦ cd ⟧ extensional and ⟦ cd ⟧ ca = true -}
  ⟦ GL oppositeR ⟧ = ⟦ ca ⟧
⇔  {- fixed-point equation of GL -}
  ⟦ oppositeR ⟧ (GL oppositeR) = ⟦ ca ⟧
⇔  {- definition of oppositeR; evaluating code application -}
  if ⟦ cd ⟧ (GL oppositeR) then ⟦ cb ⟧ else ⟦ ca ⟧ = ⟦ ca ⟧
⇔  {- (only) the else branch equals ⟦ ca ⟧ -}
  ⟦ cd ⟧ (GL oppositeR) = false

In the forward direction of the first step, by the fixed-point equation of GL and the totality of ⟦ cd ⟧, we see that ⟦ GL oppositeR ⟧ is either ⟦ ca ⟧ or ⟦ cb ⟧, but the second case is impossible since ⟦ GL oppositeR ⟧ = ⟦ cb ⟧ would imply ⟦ cd ⟧ (GL oppositeR) = ⟦ cd ⟧ cb = false by the extensionality of ⟦ cd ⟧, contradicting the antecedent. The extensionality of ⟦ cd ⟧ also plays a role in the forward direction of the last step, where we need ⟦ ca ⟧ ≠ ⟦ cb ⟧, which holds because ca and cb are distinguished by ⟦ cd ⟧ and thus cannot be extensionally equal. What we’ve proved is exactly Rice’s theorem: non-trivial and extensional decision procedures cannot exist — it is impossible to write a decision procedure that distinguishes programs in terms of their behaviour.

The proofs and even the statements of the undecidability of the halting problem and Rice’s theorem presented here are different from those in existing literature, though. Traditional treatments are untyped and thus are able to construct an untyped self-application that leads to the paradox; the relationship between the untyped arguments and the typed ones presented here is still mysterious to me. Another mysterious relationship to be investigated is the one between the presented arguments, which use intensional recursion, and a categorical version of Rice’s theorem presented by Kavvos (here), which uses extensional recursion.

And Rice’s theorem is still not intuitive enough to me, probably because it’s a negative theorem — what it constructs is a contradiction. In constructive mathematics we often prefer positive notions to negative ones — for example, instead of defining inequality as the negation of equality, which only says that it’s impossible for two things to be equal, it is preferable to use an apartness relation that positively provide evidence that separates two things. What would a positive version of Rice’s theorem look like?

The theoretical foundation of the development above is essentially Kavvos’s intensional PCF, which (like PCF) is not strongly normalising (since we can derive Y). If what we want is a strongly normalising calculus with GL, then we can’t allow the calculus to be able to express evaluation, which, however, seems to be essential to the whole development. We’re now working on an alternative calculus (which should be strongly normalising) with two code modalities □ and ⊠ (based on this paper), where □ marks code for self-reference but not for evaluation, and ⊠ marks code that can be evaluated. By ‘upgrading’ GL to the type ⊠(□α → α) → ⊠α and including a combinator of type ⊠α → □α in this calculus, we can type the construction for Rice’s theorem, which doesn’t evaluate the □-typed argument self, but not Y, which evaluates self. In this calculus, the halting problem ceases to be meaningful because all programs necessarily terminate. For Rice’s theorem to be meaningful, we need to make sure that the type □A → Bool is inhabited by non-constant functions so that the statement of Rice’s theorem properly quantifies over non-trivial decision procedures. That is, we should add to the calculus interesting operations on □-typed code (but not to the extent that enables its evaluation), or otherwise we can simply prove that anything of type □A → Bool is a constant function from the fact that we can’t do anything on □-typed stuff (which Tsung-Ju has done), without going through the argument for Rice’s theorem. But what operations can we add? For example, could we have the inverse of Rice’s theorem, which allows us to add non-trivial but non-extensional operations to the calculus without destroying strong normalisation?

We should of course ask whether a calculus with GL is interesting practically, that is, has some uses other than constructing paradoxes. I mentioned that the fixed-point equation of GL has a coinductive feel, and indeed, the resemblance between GL and guarded corecursion has been noted by quite some people. So another aspect worth looking into is the relationship between the dual-modality calculus and guarded type theory. But still, are there more practical uses of GL?

Finally, one reason to have a strongly normalising calculus is to use it as a proof system, which in this case should allow us to internalise Rice’s theorem, but how this can be done is still rather unclear. In particular, an equality assumption such as ⟦ cd ⟧ ca = true should probably be expressed using an internalised term equality (or just reduction) relation on □A, but this relation is closely related to evaluation and should interact with the latter in some way, about which I sense the danger of self-reference...

This post is harder to write than I thought... 😥