The trek goes on

0027

Understanding the Deutsch–Jozsa algorithm

Posted at 22:12 on 24 September 2022, and revised at 09:45 on 12 October 2022

I started to write this post at the end of ICFP 2022 in Ljubljana, Slovenia, where I presented the paper ‘Datatype-generic programming meets elaborator reflection’, which is the result of a team effort with Liang-Ting and Tzu-Chi. Hopefully this will be the start of a series of papers that revitalise dependently typed datatype-generic programming (in particular ornamentation, of which I’m disappointed to see that there has been no satisfactory subsequent development while I was away and working on bidirectional programming), and eventually lead to a generic library that the practical Agda programmer will be happy to use. After all these years, I’m still uncomfortable with chatting with random strangers, and did so only a few times at the conference (incidentally, this may have helped to reduce the risk of getting covid), but I can now chat pretty comfortably with people who I know at least to some extent, which is probably enough.

While working on the paper, I stopped blogging, again — the last time I worked on datatype-generic programming, which was for my DPhil, my blogging had also ground to a halt. This is probably not a coincidence: datatype-generic programming is a research topic whose outcome is relatively predictable, so it’s easier to see what to do next and even make a realistic plan for a few years, which is perfect for a DPhil. Once I have a plan, I feel the pressure to finish it on time and try to avoid ‘distractions’. However, having been laid all out, the plan is not necessarily exciting, so I get trapped in something that I’m dedicated to finish but don’t feel the urge to write about in addition to the papers, and my blog dries up as a consequence.

But I’m not doing a DPhil anymore, and it’s healthier to get ‘distracted’ from time to time anyway, so here’s something unrelated to datatype-generic programming (and actually more exciting to me now): I studied PQP again with my summer intern student Yi-Hsiang Kuo along with Liang-Ting, and we made it to the chapter on quantum algorithms. I think I see some insight from the diagrammatic representation of the Deutsch–Jozsa algorithm that would probably be difficult to see from the traditional presentation. Based on the insight, the aim of this post is to give a different, and hopefully more straightforward, explanation of the algorithm.

One distinguishing feature of quantum computation is the ability to operate on states in superposition, which is often explained by analogy with parallel computation. Given a classical function $f : \mathbf{2}^2 \to \mathbf{2}$ (where $\mathbf{2} \defeq \{\mathtt 0, \mathtt 1\}$ and $\mathbf{2}^2 = \{\mathtt{00}, \mathtt{01}, \mathtt{10}, \mathtt{11}\}$; the exponent $2$ is arbitrarily chosen for presentation purposes, and can be generalised), we can extend it to a linear map $\bar f : \mathsf{F}(\mathbf{2}^2) \to \mathsf{F}\mathbf{2}$, where $\mathsf{F}(\mathbf{2}^n)$ denotes the free vector space over $\mathbb{C}$ generated from $\mathbf{2}^n$; in particular, we can apply $\bar f$ to the uniform superposition of all the possible inputs

$$ \bar f(|\mathtt{00}\rangle + |\mathtt{01}\rangle + |\mathtt{10}\rangle + |\mathtt{11}\rangle) $$

and (due to linearity) get the uniform superposition of all the outputs, as if we’re evaluating $f$ on all the possible inputs in parallel:

$$ |f(\mathtt{00})\rangle + |f(\mathtt{01})\rangle + |f(\mathtt{10})\rangle + |f(\mathtt{11})\rangle $$

Different from parallel computation, though, is that the evaluation results are summed together and not available separately — if we directly measure this state, we’ll get one of the results with equal probability, which is not too useful. A quantum algorithm has to make clever use of this sum somehow.

Technical remarks. As per standard notation, we wrap vectors in ‘kets’ $|\cdot\rangle$ and their adjoints in ‘bras’ $\langle\cdot|$ (think of them as column and row vectors if it helps), and $|\mathtt{00}\rangle$ etc coincides with the tensor product $|\mathtt 0\rangle \otimes |\mathtt 0\rangle$ etc. In this post we will only care about whether a state is possible or not (that is, whether its length is non-zero), and not about the exact probabilities, so we will not bother to normalise our states (that is, make their length $1$), and sometimes use equalities ‘$\approx$’ that are up to multiplication of a non-zero scalar to one side. (End of technical remarks.)

In fact, we have more forms of sums at our disposal. The above evaluation can be seen as part of the result of a change of basis. Originally $\bar f$ is defined by its behaviour on the ‘Z-basis’ (or ‘computational basis’) $\{|\mathtt{00}\rangle, |\mathtt{01}\rangle, |\mathtt{10}\rangle, |\mathtt{11}\rangle\}$. But this is not the only basis of $\mathsf F(\mathbf{2}^2)$ — another typically used ‘X-basis’ is

\begin{align*} |\texttt{++}\rangle &= |\mathtt{00}\rangle + |\mathtt{01}\rangle + |\mathtt{10}\rangle + |\mathtt{11}\rangle \\ |\texttt{+-}\rangle &= |\mathtt{00}\rangle - |\mathtt{01}\rangle + |\mathtt{10}\rangle - |\mathtt{11}\rangle \\ |\texttt{-+}\rangle &= |\mathtt{00}\rangle + |\mathtt{01}\rangle - |\mathtt{10}\rangle - |\mathtt{11}\rangle \\ |\texttt{--}\rangle &= |\mathtt{00}\rangle - |\mathtt{01}\rangle - |\mathtt{10}\rangle + |\mathtt{11}\rangle \end{align*}

where

\begin{align*} |\texttt{+}\rangle &\defeq |\mathtt 0\rangle + |\mathtt 1\rangle \\ |\texttt{-}\rangle &\defeq |\mathtt 0\rangle - |\mathtt 1\rangle \end{align*}

are the X-basis of $\mathsf F\mathbf{2}$. The uniform superposition of all possible inputs is exactly $|\texttt{++}\rangle$, which is distinct from all the other X-basis elements in the sense that all the others have exactly the same number of plus and minus signs. The behaviour of $\bar f$ can be investigated with respect to the X-basis as well.

Deutsch’s problem is (artificially) designed to make the sums useful: assuming that $f$ is either constant (always producing the same output) or balanced (producing two $\mathtt 0$ and two $\mathtt 1$ as the outputs — in general the same number of $\mathtt 0$ and $\mathtt 1$), the aim is to decide which is the case. It is immediately clear that a constant $f$ —let’s write this as $f_{\mathsf c}$— can be distinguished from a balanced $f$ —let’s write this as $f_{\mathsf b}$— by applying their linear version to $|\texttt{++}\rangle$ and inspecting the output:

\begin{align*} \bar f_{\mathsf c}\;|\texttt{++}\rangle &\approx |\mathtt 0\rangle \mathrel{\text{or}} |\mathtt 1\rangle \\ \bar f_{\mathsf b}\;|\texttt{++}\rangle &\approx |\texttt+\rangle \end{align*}

That is, when applied to $|\texttt{++}\rangle$, $\bar f_{\mathsf c}$ produces $|\mathtt 0\rangle$ or $|\mathtt 1\rangle$ (multiplied by $4$, which is omitted); for $\bar f_{\mathsf b}$, the output sum consists of exactly two $|\mathtt 0\rangle$ and two $|\mathtt 1\rangle$, and is essentially $|\texttt+\rangle$.

In fact, we can ask a very succinct question to decide whether $f$ is $f_{\mathsf c}$ or $f_{\mathsf b}$: if we apply $\bar f$ to $|\texttt{++}\rangle$, will the output ‘contain’ $|\texttt-\rangle$? Here a state $|\varphi\rangle$ ‘contains’ another state $|\psi\rangle$ exactly when their inner product $\langle\psi|\varphi\rangle$ is non-zero. (Think of $\langle\psi|\varphi\rangle$ as the result of some kind of test that detects the ‘amount’ of $|\psi\rangle$ in $|\varphi\rangle$ by projecting the latter onto the former, which is the geometric intuition of the inner product.) So the answer can be computed simply by checking whether $\langle\texttt-|\bar f|\texttt{++}\rangle$ is non-zero or not. If $f$ is $f_{\mathsf c}$, then the output —be it $|\mathtt 0\rangle$ or $|\mathtt 1\rangle$— will contain $|\texttt-\rangle$; otherwise, if $f$ is $f_{\mathsf b}$, then the output will be $|\texttt+\rangle$, which does not contain $|\texttt-\rangle$. It’s kind of expensive to compute $\langle\texttt-|\bar f|\texttt{++}\rangle$ classically though (especially when the domain of $f$ is generalised from $\mathbf 2^2$ to $\mathbf 2^n$), and we want a more efficient quantum solution.

One valuable intuition offered by PQP is the close relationships between quantum computation and various other theories, among which the closest one is linear algebra. In this case, the linear-algebraic expression $\langle\texttt-|\bar f|\texttt{++}\rangle$ could be ‘implemented’ in quantum theory by replacing $\langle\texttt-|$ with a measurement with respect to the X-basis, which produces either $|\texttt+\rangle$ or $|\texttt-\rangle$ probabilistically (with probabilities dictated by a postulate of quantum theory). However, even if this solution could ultimately be implemented as a quantum circuit, it is not satisfactory: we’ll get $|\texttt+\rangle$ or $|\texttt-\rangle$ with equal probability if $f$ is $f_{\mathsf c}$, or get $|\texttt+\rangle$ if $f$ is $f_{\mathsf b}$, and cannot perfectly distinguish the two cases. But surprisingly, if we transpose $\langle\texttt-|\bar f|\texttt{++}\rangle$ to $\langle\texttt{++}|\bar f^{\mathrm T}|\texttt-\rangle$ and ‘implement’ the latter as measuring the result of evaluating $\bar f$ backwards, we’ll be able to distinguish the two cases perfectly. The backward evaluation may be somewhat surprising because it may not seem very natural and easy to compute. Here the PQP intuition may help again: quantum maps are somewhat ‘directionless’ and reminiscent of relational programming, whose characteristic feature is that programs can be evaluated in any direction.

To see why backward evaluation works, we need a bit more analysis of $\bar f$ with respect to the whole X-basis, not just $|\texttt{++}\rangle$. For $\bar f_{\mathsf c}$, its behaviour is very different when applied to the X-basis elements other than $|\texttt{++}\rangle$ — because the numbers of plus and minus signs are equal and everything is cancelled out, the output simply vanishes! I cannot resist the temptation to express the behaviour of $\bar f_{\mathsf c}$ on the X-basis in a ‘pseudo-pattern-matching’ notation:

\begin{align*} & \bar f_{\mathsf c}\;|\texttt{++}\rangle \approx |\mathtt 0\rangle \mathrel{\text{or}} |\mathtt 1\rangle \\ & \awa{\bar f_{\mathsf c}\;|\texttt{++}\rangle}{\bar f_{\mathsf c}\;\_} = 0 \end{align*}

(Note that the length of the state $|\mathtt 0\rangle$ is $1$, whereas the length of the state $0$ is $0$; the former is a perfectly normal state corresponding to the bit $\mathtt 0$, whereas the latter is an impossible state.) For $\bar f_{\mathsf b}$, applying it to the other X-basis elements produces a $\pm 1$-weighted sum of two $|\mathtt 0\rangle$ and two $|\mathtt 1\rangle$. If the two $|\mathtt 0\rangle$ get the same sign, the two $|\mathtt 1\rangle$ will both get the opposite sign, so the sum will essentially be $|\texttt-\rangle$; otherwise, everything is cancelled out. So

\begin{align*} & \bar f_{\mathsf b}\;|\texttt{++}\rangle \approx |\texttt+\rangle \\ & \awa{\bar f_{\mathsf b}\;|\texttt{++}\rangle}{\bar f_{\mathsf b}\;\_} \approx |\texttt-\rangle \mathrel{\text{or}} 0 \end{align*}

That is, applying $\bar f_{\mathsf b}$ to any X-basis element other than $|\texttt{++}\rangle$ will produce either $|\texttt-\rangle$ or nothing.

Now we ask the backward question: among the X-basis elements, what is a possible input that allows $\bar f$ to produce an output that contains $|\texttt-\rangle$? If $f$ is $f_{\mathsf c}$, then the only possible input that produces any output is $|\texttt{++}\rangle$, and that output —be it $|\mathtt 0\rangle$ or $|\mathtt 1\rangle$— always contains $|\texttt-\rangle$, so the answer will be $|\texttt{++}\rangle$. On the other hand, if $f$ is $f_{\mathsf b}$, then the answer cannot be $|\texttt{++}\rangle$, since the corresponding output $|\texttt+\rangle$ does not contain $|\texttt-\rangle$ — the answer will be something other than $|\texttt{++}\rangle$. So by getting an answer to the question and checking whether it is $|\texttt{++}\rangle$ or not, we’ll be able to decide whether $f$ is $f_{\mathsf c}$ or $f_{\mathsf b}$. The target state $|\texttt-\rangle$ is chosen so as to exclude $|\texttt{++}\rangle$ from the domain of $\bar f_{\mathsf b}$, which then becomes disjoint from the domain of $\bar f_{\mathsf c}$.

We can then write a quantum map in ZX-calculus to answer the question. (The ZX-diagrams below are in the style of PQP, where the direction of computation is from bottom to top, and quantum systems coexist with classical ones through the doubling construction.) First, the quantum map $\hat f$ and its transpose are drawn as

For $\hat f$, its two inputs flow in from the bottom through the two (‘thick’) quantum wires, and its output goes out from the top wire; the transpose of $\hat f$ is the original diagram rotated by 180 degrees, with a twist to keep the outputs in the original order. (I’m skipping a lot of details about the ZX-calculus, and in fact I’ve invented my own graphical notations to make ZX-diagrams easier to draw, but hopefully the high-level intuition is still recognisable.) Applying the transpose to $|\texttt-\rangle$ (represented as a one-output Z-spider with phase $\pi$ below) gives us a quantum state representing the possible inputs that are mapped to $|\texttt-\rangle$ by $\bar f$. Then we measure this state with respect to the X-basis (performed by two bastard X-spiders below), so that we can check whether the result is $\texttt{++}$ or something else. As a ZX-diagram (which does not include the last check):

This is not something we can run on a quantum computer yet: the input state $|\texttt-\rangle$ and the measurement can be readily implemented as part of a quantum circuit, but the transposed $\hat f$ —and even just $\hat f$ itself— cannot. We have to retrofit a classical function such as $f$ as an invertible/reversible function and encode it as a quantum circuit, which serves as the input to the algorithm — traditionally this is called an oracle. The standard oracle construction gives us a circuit $U_f$ such that

$$ U_f(|x\rangle \otimes |y\rangle) = |x\rangle \otimes |f(x) \oplus y\rangle \qquad \forall x : \mathbf 2^2, y : \mathbf 2$$

where ‘$\oplus$’ is the exclusive-or operation on bits. This specification can be directly transcribed into ZX-calculus:

where some input, intermediate, and output states are annotated in green to show the correspondence with the specification. Note that this is merely a mathematical specification and not a quantum circuit, but the $U_f$ construction does give a legitimate quantum circuit with equivalent behaviour.

Remark. The way in which the backward question is phrased may have depended on this particular form of oracle; if a different form of oracle is used, we may have to ask a different question. (End of remark.)

Now the remaining task is extending the oracle diagram (that is, use the oracle as a black box) to express the application of the transpose. Here is, I believe, where ZX-calculus shines. We want the left part of the diagram to be just two wires connected to the outputs of the transpose of $\hat f$ —which are just the inputs of $\hat f$ because of the ‘directionlessness’— and it’s instinctive for a ZX-calculus user to plug in two one-output Z-spiders (which are in red below, and represent $|\texttt+\rangle$)

so as to trigger the spider fusion rule and get

We’re free to rearrange ZX-diagrams as long as connectivity doesn’t change, so we can directly rearrange the above diagram into a form that’s close to what we want:

Now we want the input to the transpose of $\hat f$ to be $|\texttt-\rangle$. This is again easy for a ZX-calculus user — the one-input, two-output X-spider copies X-basis elements, so we can directly plug in $|\texttt-\rangle$:

The state is then copied by the X-spider, and we discard the extra one.

So we have found out how we can use the oracle to compute the application of the transpose. To conform to the standard quantum circuit model, qubits should start with $|\mathtt 0\rangle$ and measurements should be performed with respect to the Z-basis. But this is easy to fix by inserting Hadamards (represented as boxes) to switch between Z- and X-spiders. In summary, the algorithm can be implemented as follows, by putting the red parts around the oracle:

I should confess that this derivation was worked out backwards from the solution given in PQP, although I think my derivation makes more sense than PQP’s. And I do prefer my derivation to the traditional presentation, which directly gives the final circuit and verifies what happens at each point of interest in the circuit. To see why Deutsch’s problem can be solved with just one superposition query, apart from a little bit of basic knowledge about quantum computing, all one needs are just the change to the X-basis and the revelation that backward evaluation is possible, and the rest is standard maths. After figuring out what to compute mathematically, we then use the ZX-calculus to derive, quite straightforwardly, how to use the oracle to carry out that computation, so there is a clean separation between the analysis of the problem and the implementation of the solution as a quantum circuit, which is just a twisted form of a simple computation. Most importantly, all the techniques including change of basis, backward evaluation, and the oracle encoding are general and intuitive enough that there is prospect to reuse them to understand old algorithms and invent new ones. On the other hand, the Deutsch–Jozsa algorithm is pretty basic, and I’ll have to work on more realistic problems —in particular ones that involve probabilities— to make sure that what I’m doing is scalable. But I’m happy that for a basic algorithm I have a correspondingly basic explanation now, which should be a good start.

(Thanks to ZX Yang for feedback on this post.)

Follow-up: 0028 (Understanding the Bernstein–Vazirani algorithm)