The trek goes on


From amplitude amplification to Jordan’s lemma

Posted at 10:40 on 20 August 2023, and revised at 15:04 on 12 September 2023

Since the last blog post, I joined Liang-Ting to write a generic bidirectional type synthesiser that’s correct by construction, and we have submitted a paper about it to POPL. 🤞 From mid-September to early December I’ll be busy reviewing papers for CPP, ESOP, and TFP, and organising APLAS’s student research competition and poster session (again 🤞 but from the other side), to be held at IIS, while AIM will also take place at IIS the week before APLAS. Until then, I have about two months to do a bit of my own research (as well as some less interesting tasks such as preparing reports to conclude my first project funded by the National Science and Technology Council).

For quantum algorithms, Kai-Min has mentioned a few times that Jordan’s lemma offers a unified view of various generalisations of amplitude amplification (the key idea underlying the famous Grover search algorithm) to higher dimensions. I’ve been stuck with quantum Fourier transform for quite a while, and eager to see some new quantum ideas. The view offered by Jordan’s lemma doesn’t disappoint, but there are quite a few conceptual and technical gaps that I’ll need to fill in myself. Before then, I need to make some rough (and possibly incorrect) notes to record what I saw and thought about.

In an attempt to present amplitude amplification as a more generally applicable technique, assume that we are given a unitary/circuit $A$ that maps an initial state, say $\ket{0^m}$, to a superposition of a good state $\ket g$ —which is the result we want to compute— and a bad state $\ket b$, but we don’t know what $\ket g$ and $\ket b$ are except that they include a single-bit flag for us to easily distinguish them: $\ket g \defeq \ket 1 \ket{g'}$ and $\ket b \defeq \ket 0 \ket{b'}$. Let the probability of getting $\ket g$ from a measurement be $p$, and give the result of $A$ a name:

$$ \ket r \defeq A \ket{0^m} = \sqrt{p}\, \ket{g} + \sqrt{1-p}\, \ket{b} = \sqrt{p}\, \ket 1 \ket{g'} + \sqrt{1-p}\, \ket 0 \ket{b'} $$

(Apparently there can be some relative phase too, but it’s not important for the presentation and omitted.) Our goal is to ‘amplify’ the probability of getting $\ket g$ to nearly $1$ (so that we can find out what $\ket g$ is with high probability).

The usual presentation proceeds by noting that $\ket g$ and $\ket b$ are orthogonal and $\ket r$ —being a linear combination of $\ket g$ and $\ket b$— lies on the two-dimensional plane spanned by $\ket g$ and $\ket b$. If we set $\ket b$ as the X-axis and $\ket g$ as the Y-axis, then we can rewrite $\ket r$ as

$$ \ket r = \sin\theta\, \ket g + \cos\theta\, \ket b $$

where $\sin\theta = \sqrt p$ and $\cos\theta = \sqrt{1-p}$, so that the angle $\theta$ is between $\ket r$ and $\ket b$. $\theta$ is proportional to $p$ (since both $\arcsin$ and $\sqrt\cdot$ are monotonic), so if the probability of getting $\ket g$ is small, $\theta$ is also small and $\ket r$ is close to $\ket b$, which is bad. The (ingenious) idea of amplitude amplification is that we can rotate a quantum state from $\ket r$ towards $\ket g$ by performing two reflections, the first through $\ket b$ and the second through $\ket r$. I imagine that this is like ‘catapulting’ the state along the direction from $\ket b$ to $\ket r$, with $\ket g$ being further down the direction, and we’ll get near $\ket g$ if we perform the rotation enough times. It is possible that we might catapult the state too far, so we should be careful with how many times we perform the rotation, but for now I’m more interested in reasoning about one such rotation.

To write things down, we should say a bit about the mathematical formulation of reflections. The reflection of $\ket u$ through $\ket v$ can be thought of as decomposing $\ket u = \lambda \ket v + \ket{v^\bot}$ for some $\ket{v^\bot}$ orthogonal to $\ket v$, and then negating the orthogonal component $\ket{v^\bot}$; mathematically this can be expressed as $2 \ket v \bra v - I$, where the $-I$ negates both components, and then the doubled projector $2 \ket v \bra v$ compensates and restores the component $\lambda \ket v$. Following this pattern, in general we can reflect through the image of a projector $\Pi$ by the operator $2\Pi - I$.

The second reflection $2 \ket r \bra r - I$ can be reduced to a reflection through $\ket{0^m}$ by a basis change using $A$:

$$ 2 \ket r \bra r - I = 2 A\ket{0^m}\bra{0^m}A^\dagger - AA^\dagger = A(2\ket{0^m}\bra{0^m} - I)A^\dagger $$

The reflection through $\ket{0^m}$ is independent from $\ket r$ and (apparently) easier to implement as a quantum circuit. The first reflection $2\ket{b}\bra{b} - I$ is more interesting. At first glance it’s not clear how to implement this reflection since $\ket{b}$ is unknown, but with the distinguishing flag that we assume to be in both $\ket g$ and $\ket b$ it becomes straightforward. Let $\ket v$ be a state in the plane spanned by $\ket g$ and $\ket b$, that is, $\ket v = x\,\ket 1 \ket{g'} + y\, \ket 0 \ket{b'}$. Then reflecting $\ket v$ through $\ket b$ is the same as reflecting it through the hyperplane mapped to by the projector $\ket 0 \bra 0 \otimes I$, i.e., those states whose flag is $\ket 0$: this hyperplane contains $\ket 0 \ket{b'}$ and is orthogonal to $\ket 1 \ket{g'}$, so reflecting $\ket v$ through this hyperplane negates the component $x\,\ket 1 \ket{g'}$ and leaves the component $y\, \ket 0 \ket{b'}$ unchanged. It is also easy to verify by calculation that

$$ (2\ket{b}\bra{b} - I) \ket v = (2(\ket 0 \bra 0 \otimes I) - I) \ket v $$

This alternative reflection through the hyperplane (on the right-hand side) is independent from $\ket b$ and again implementable.

Until this point it all looks comprehensible, but if we generalise it further, the situation can become bewildering. For example (in the ‘QMA’ setting), the input to $A$ may include some witness $\ket w$ in addition to some ancilla $\ket{0^m}$, and the second reflection will be reduced to one through $I \otimes \ket{0^m}\bra{0^m}$. We can still do the two reflections, which are now both high-dimensional however, and it’s not obvious what kind of rotation the two reflections perform, or even if the behaviour is still rotation. Here’s where Jordan’s lemma comes in: very roughly speaking, given any two projectors $\Pi_0$ and $\Pi_1$ (and there are no further conditions!), the whole state space can be decomposed into a direct sum of orthogonal $1$- and $2$-dimensional sub-spaces such that each of the sub-spaces is closed under $\Pi_i$ (so that we can reason about the behaviour of $\Pi_i$ independently in each of the sub-spaces), and is in fact trivial in the $1$-dimensional sub-spaces, so it’s basically just the $2$-dimensional sub-spaces to which we should pay attention. In the case of the composition of the two reflections $(2\Pi_1 - I)(2\Pi_0 - I)$, we can think of its behaviour as performing simultaneous and independent rotations in several ($2$-dimensional) planes with different speeds, reducing the understanding of a high-dimensional phenomenon to a $2$-dimensional analysis.

We didn’t have time to go into the technical details (and Kai-Min is less interested in working out the details anyway), so that’s what I got from the meeting so far. Instead of continuing with the technicalities, here are some extrapolations about what we should have in a high-level quantum-algorithmic language from what I’ve seen. Such a language needs constructs that are fundamental enough for expressing high-level concepts that may or may not yet exist — we should probably derive operations such as reflection, rotation, etc from more fundamental concepts (rather than make them built-in), and provide good abstraction mechanisms to make it natural to formulate and use derived concepts. Likewise, Jordan’s lemma should ideally be a library component (like datatype-generic programming in the dependently typed setting) or at least be a property that can be conveniently stated (like the fold fusion theorem in traditional Algebra of Programming). For now, it seems that the most fundamental concept is still the manipulation of superposition (linear combination), but not necessarily with respect to the computational basis or any basis — basis-dependence looks like a red herring to me now (although the expansion with respect to a basis should still be possible). Quantum gates are pure functions that can operate on superposition states, but the kind of superposition-manipulating primitive I have in mind should be closer to programmer intentions. For example, we should be able to support orthogonal decomposition (rewriting a state into a sum of orthogonal states but not necessarily with respect to a basis) and reflection as negating the orthogonal component (the $2\Pi - I$ formulation just isn’t intuitive enough and should be discouraged) as well as basis change (which is about rewriting states to sums of different sets of basis states), none of which is directly supported by quantum gates. (For an analogy, natural deduction works great for proving (intuitionistic) logical formulas, whereas the weakest precondition calculus can also be thought of as constructing proofs about logical formulas, but is much less convenient.) It turns out that the rphase construct of the Qunity language does natively support orthogonal decomposition followed by the inducing of different relative phases in the parallel and orthogonal components; this seems to be on the right direction (and proves that my reasoning isn’t too far-fetched), but are we restricted to inducing relative phases? Another thing that I’ll have to think about is a derivation from a spec that doesn’t say anything about rotation to an algorithm that uses rotation, and eventually to a program that uses the superposition-manipulating primitive (or some other more suitable primitive). A quantum-algorithmic language or reasoning framework should be rich enough for expressing the whole derivation, which reflects the thought process for (re-)inventing amplitude amplification.