### New paper out now! "Hardness results for decoding the surface code with Pauli noise"

The other week I put out my first paper since starting grad school: Hardness results for decoding the surface code with Pauli noise, with my advisor Akimasa Miyake. I'm quite excited about this result and want to share a nontechnical summary of it and its implications.

First, some background. Quantum computers work by taking advantage of the ways that the laws of physics at the small scales of atoms and molecules (that is, quantum physics) work differently than the laws of physics at larger scales that we are all familiar with (that is, classical physics). We know we can use these different laws of physics to store and process information differently than how we do so on regular, "classical" computers. In classical computers, the basic unit of information storage is the bit, which can take the value 0 or 1, and we do computation by flipping bits between 0 and 1 based on the values of other bits. Quantum mechanics allows us to store information in "quantum bits", or **qubits**. A qubit can be any physical system following the laws of quantum physics that has 2 basic states: one analogous to a "0" and one analogous to a "1". Quantum physics tells us that the allowed states of such a system are not just the 0 state or the 1 state, but also any combination of them that contains some amount of 0 and some amount of 1. This is distinct from allowing the state to be any number between 0 and 1, like 0.5 or 0.25. Quantum mechanics actually gives us more power than that.

The fact that quantum states are continuous, rather than discrete like classical bits that can only be 0 or 1, means quantum computing gives us more computational power, but it also comes with some problems. Continuous states mean that small errors can add up over the course of a long computation and ruin it. This is not a problem in classical computation for 2 reasons:

- The discrete, 0 or 1 state of classical bits means that if we get small errors, we can remove them by simply rounding to the nearest 0 or 1, and the error goes away. For example, we could store the state of a classical bit as the voltage in a capacitor: 0 volts or 1 volt, say. We could continuously measure the voltage in the capacitor and round down to 0 volts or up to 1 volt, and we would never have any problems with sufficiently small errors. We can't do this with quantum states due to their continuous nature.
- Also, we humans have become incredibly good at manufacturing extremely accurate, error-free components for classical computers. One talking point that quantum error correction people like to mention is that the dominant source of errors in computers at the electronics level (I mean errors caused by the hardware acting up, not by humans using the device incorrectly) is stray cosmic rays from outer space striking the computer. That's how good we've gotten at manufacturing computers! Still, even if we got that good at manufacturing quantum computers (and we are very far away from that), we would still have to deal with errors due to point number 1 above.

The solution to both of these problems is **quantum error correction**. This involves encoding the information in our qubits using many more qubits for redundancy, in a way that allows us to detect and correct errors. This is analogous to error correction with classical bits, where you encode some information with many more bits for redundancy. The simplest classical error correcting code is called the "repetition code": if you want to send a bit over some noisy channel where the bit could be flipped between 0 and 1 accidentally, just send many copies of the bit: send 0000... or 1111..., however many times you want, instead of 0 or 1. Then whoever receives the bits at the other end can take a majority vote of the bits they received to recover the original bit. Of course, this is a very inefficient way to do error correction. There are many codes that use fewer bits while retaining the same amount of protection, but this is still an instructive example. We can do a somewhat similar song and dance with qubits, encoding some qubits with a larger number of qubits to protect against errors that occur during computation.
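To make the repetition code concrete, here's a minimal sketch in Python (the function names are my own, just for illustration):

```python
def encode(bit, copies=5):
    """Encode a single bit by repeating it (the repetition code)."""
    return [bit] * copies

def decode(received):
    """Recover the original bit by majority vote."""
    return 1 if sum(received) * 2 > len(received) else 0

# Send a 1 as five copies; suppose noise flips two of them.
sent = encode(1)              # [1, 1, 1, 1, 1]
corrupted = [1, 0, 1, 0, 1]   # two bit flips
print(decode(corrupted))      # 1: majority vote corrects both flips
```

As long as fewer than half the copies get flipped, the majority vote recovers the original bit.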

The quantum states we want to protect with quantum error correcting codes are kind of like probability distributions over the states of classical bits. Quantum states can be combinations of different states, with different weights that determine the probabilities of those different states. Except, unlike with classical probability distributions, where the "weights", or probabilities, have to be nonnegative real numbers, the weights in quantum states, or "amplitudes" as we call them, can be negative numbers, and even complex numbers! In quantum error correction, we don't only need to protect the values of the individual states that make up the whole quantum state (this would be called protecting against "bit flip noise"); we also need to protect the complex number weights of all the individual states that make up the whole quantum state ("phase flip noise" is one type of noise that affects those complex number weights without affecting the values of the individual states). So quantum error correction is really a more general, complicated, and interesting task than classical error correction.

One of the most promising quantum error correcting codes that we think we will use to build a quantum computer is called the **surface code**. The surface code has several properties that make it practically useful:

- All of the error corrections are geometrically local in 2D. This means that we can build our quantum computer on a grid on a 2D surface, where only neighboring qubits can talk to each other, and we can still run the surface code on such a quantum computer. This turns out to be exactly how we build some types of quantum computers, in particular superconducting quantum computers, which are what Google and IBM are investing in.
- We know not only how to encode information in the surface code to protect it against errors, but also how to **do computation on the encoded information**. It's one thing to encode some qubits using an error correcting code to protect them against errors, but that's not all we need: we also need to do our computations that change the state of the qubits, all while the qubits are encoded in this error correcting code.
- It has a very high tolerance for errors, more so than many other quantum codes.

With all these nice properties, the surface code has been a favorite of the quantum error correction community for many years. An active area of research about the surface code has been **decoding algorithms** for it. Decoding is the task of determining what error happened and how to correct it, given some measurement results extracted from the code that give some indirect information about the error that happened. Decoding is a task for regular old classical computers: an essential part of any large, useful quantum computer will be classical computers churning alongside them, processing information about what errors happened on the quantum computer and figuring out how to correct them.

OK, with all of that context out of the way, now for my results and how they fit into this picture. I have shown that 2 closely related but different decoding problems for the surface code are "hard" computational problems in a formal sense. This means that we can't have decoding algorithms that are both efficient and accurate, in a certain technical sense of both of those words. There are 2 different decoding problems associated with quantum error correction: what we call **Maximum Probability Error (MPE)** decoding, and **Maximum Likelihood (ML)** decoding. I have shown that MPE decoding is **NP-hard** and ML decoding is **#P-hard**. First I'll explain what it means to be NP-hard or #P-hard, then I'll explain the difference between those 2 decoding problems.

One of the most surprising, interesting results in computer science is that a large class of computational problems, the hardest problems in the class we call **NP**, are all equivalent in the sense that if even one of them has an efficient algorithm that solves it, then every one of them automatically does as well. This is surprising because many of these problems seem not at all related on the surface: they range from planning the shortest route through many locations (the travelling salesman problem), to coloring regions on a map so that no bordering regions have the same color (the graph coloring problem), to optimally packing items into bins (the aptly-named bin packing problem), to solving classic games like Minesweeper and many Nintendo games. Computer scientists have shown this incredible phenomenon by constructing algorithms to convert between any of these problems, so an algorithm that solves one of them can solve all of them. Computer scientists have searched for decades for efficient algorithms for these problems, many of which are very practically important, yet no such efficient algorithms have turned up. They view this as overwhelming evidence that these problems must not be efficiently solvable, for if there were efficient algorithms for them, then surely we would have found one by now! Formally proving this empirical observation is the famous P versus NP problem, which has a million dollar prize attached to it by the Clay Mathematics Institute.

A problem is **NP-hard** if it is one of those special problems that can be used to efficiently solve all problems in NP. A problem is **#P-hard** if it can be used not only to efficiently solve any problem in NP, but also to **count the number of solutions to** any problem in NP. Intuitively, being #P-hard is much stronger than merely being NP-hard, because counting solutions (as for #P) should be harder than merely determining whether a problem has 0 or ≥1 solutions (as for NP).

OK, now for the explanation of what those 2 different decoding problems are. In classical error correction, you send a message that is a bunch of bits, encoded using a larger number of bits with some redundancy so you can detect and correct errors. You detect the error by figuring out which bits in the message were flipped, and you correct the error by flipping those same bits back to their original values. Given a noisy, corrupted codeword, there may be several possible original messages consistent with it. Generally, you want to find the original message that requires flipping the smallest number of bits to get from the corrupted message to that original message: the idea is that the message requiring the fewest bit flips is the most likely one to have been sent.
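This minimum-bit-flip strategy can be sketched in a few lines of Python, here for the tiny 3-bit repetition code (the codeword list and function names are illustrative):

```python
def hamming(a, b):
    """Number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(received, codewords=("000", "111")):
    """Return the valid codeword requiring the fewest bit flips."""
    return min(codewords, key=lambda c: hamming(received, c))

print(decode("010"))  # "000": one flip away, versus two flips for "111"
```

For real codes with many codewords, you can't just enumerate them all like this, which is where the computational difficulty of decoding starts to show up.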

You can use a similar strategy with quantum error correcting codes. You have the results of some measurements that you've done on the error correcting code, which give you some indirect information about what error may have happened. These measurement results are called **syndromes**. In general there may be many possible errors consistent with the syndromes, that is, errors that would have caused those syndromes to occur. The task is to find an error consistent with the syndromes that has the maximum probability of occurring, among all the errors consistent with the syndromes. This is called **Maximum Probability Error (MPE)** decoding.

There is another decoding strategy associated with quantum codes, however. This strategy takes into account a special property of quantum codes: different errors can be corrected by the same correction operation. This special property is called **degeneracy**, and it really doesn't have a counterpart in classical error correction. I'll give a simple example to explain it. Say you've done your measurements on your quantum code and determined that there are 4 possible errors consistent with the measurement results: E_1, E_2, E_3, and E_4. Due to degeneracy, it could be the case that E_1 and E_2 are corrected by the same correction operation, and E_3 and E_4 are also corrected by the same correction operation (different than the one for E_1 and E_2). We would say that E_1 and E_2, as with E_3 and E_4, are **logically equivalent**. So, if the actual error was E_1, but your decoding algorithm returns E_2 instead, there's actually no problem at all: you will still apply the "correct" correction operation. So to determine which error correction operation to apply, you want to compare the sum of the probabilities of E_1 and E_2 with the sum of the probabilities of E_3 and E_4. Whichever sum is higher tells you which error correction operation to apply.

This is the **Maximum Likelihood (ML)** decoding task: you consider all the possible errors consistent with the syndromes, partition them into groups (or "cosets") of errors that are logically equivalent (meaning they share the same correction operation), and look at the sum of error probabilities for each group. We call this sum of error probabilities for a group of logically equivalent errors the **coset probability**. You output an error correction operation corresponding to a group of errors with maximum coset probability.
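Here is a toy numerical illustration of the difference, with made-up probabilities for four hypothetical errors. MPE decoding picks the single most probable error, while ML decoding picks the correction operation with the largest coset probability, and the two can disagree:

```python
# Hypothetical error probabilities; E1/E2 share correction A, E3/E4 share B.
probs = {"E1": 0.20, "E2": 0.18, "E3": 0.25, "E4": 0.05}
cosets = {"A": ["E1", "E2"], "B": ["E3", "E4"]}

# MPE decoding: pick the single most probable error.
mpe_error = max(probs, key=probs.get)  # "E3", which suggests correction B

# ML decoding: pick the correction with the largest coset probability.
coset_prob = {c: sum(probs[e] for e in errs) for c, errs in cosets.items()}
ml_correction = max(coset_prob, key=coset_prob.get)  # "A" (0.38 vs. 0.30)

print(mpe_error, ml_correction)
```

So with these made-up numbers, MPE decoding would lead you to apply correction B, while ML decoding would apply correction A, which here has the better chance of actually fixing the state.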

ML decoding doesn't maximize the probability that you found the original error, but it does maximize the probability that you found the right correction operation, which is what really matters at the end of the day. So ML decoding is really "better" than MPE decoding in that sense. Although, ML decoding can be more computationally intensive than MPE decoding, so MPE decoding is still a viable strategy to use for quantum computing. Intuitively, it seems like ML decoding may be much harder than MPE decoding, because you have to consider sums of probabilities of many errors, instead of probabilities of individual errors. My results confirm this intuition.

OK, now that I've explained what the results say, I'll say just a little about how I proved them. The classic NP-hard problem, the one that started this whole field of computer science about NP-hard problems, is the **SAT problem**. SAT, short for "satisfiability", is the problem of determining whether a given boolean formula has a TRUE/FALSE assignment of its variables that makes the formula output TRUE. OK, that's a lot of jargon, so let's break it down. A "boolean formula" is an expression involving some variables (which can take the values TRUE or FALSE) combined with AND, OR, and NOT operations. We want to know if we can assign TRUE/FALSE values to the variables in a way that makes the whole boolean formula output TRUE. For example, a boolean formula could be (x AND NOT y). In that formula, x and y are the input variables. We would say this formula is satisfiable because if we set x = TRUE and y = FALSE, the formula outputs TRUE. The boolean formula (x AND y AND NOT x) is not satisfiable, for example.

Likewise, the problem **#SAT** is the problem of, given a boolean formula, determining how many different assignments to the input variables make the formula output TRUE. This is a harder problem than determining whether there are zero or more than zero such assignments, as in the SAT problem.
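For small formulas, both SAT and #SAT can be answered by brute force, trying every assignment. This takes time exponential in the number of variables, which is exactly why it doesn't scale. A sketch for the example formula (x AND NOT y):

```python
from itertools import product

def formula(x, y):
    return x and not y   # the example formula (x AND NOT y)

# Try all 2^2 TRUE/FALSE assignments of the variables.
satisfying = [a for a in product([False, True], repeat=2) if formula(*a)]

print(len(satisfying) > 0)  # SAT answer: True, the formula is satisfiable
print(len(satisfying))      # #SAT answer: 1 (only x=TRUE, y=FALSE works)
```

With n variables there are 2^n assignments to check, so this approach becomes hopeless well before n = 100.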

The technical parts of my paper are about showing how to convert a SAT problem into a surface code MPE decoding problem, such that the answer to the decoding problem tells you the answer to the SAT problem. I take a boolean formula, which is something that can be laid down in a flat 2D plane, and carefully embed it in the flat 2D planar structure of the surface code. That establishes MPE decoding as NP-hard. Once we have that result, extending it to show that ML decoding is #P-hard is a much shorter exercise. Recall that ML decoding involves summing probabilities of many different errors, which should already seem related to counting solutions of an NP-hard problem (specifically, the NP-hard problem that is MPE decoding, but also SAT, since I showed that SAT and MPE decoding are in a certain sense equivalent problems).

One interesting part of my results is that we formulate the decoding problem slightly differently, slightly more generally in fact, than is standard. Specifically, we consider the input to the decoding problem to be not only the measurement results from the code, but also a description of the noise process that occurs on a per-qubit level. This is a very reasonable setting for the decoding problem, since quantum computers (in particular superconducting quantum computers, where the qubits are large-ish [on the order of a millimeter] and subject to manufacturing irregularities) will not have the exact same error processes occurring on each qubit, and we know we can do error correction more effectively if we take into account this information on the specific noise present when we decode. Although people often think about the performance of error correcting codes and decoding algorithms with only very simple noise that is the same for all qubits, we know that this is not actually how error correction will go in practice with real quantum computers.

Is this bad news for quantum computing? Does this mean we will have to abandon the surface code? No and no! While these results are certainly theoretically interesting, they don't actually spell bad news for using the surface code to build quantum computers. This is because these results are all about **worst-case** hardness: they say that these decoding problems are too hard to have an efficient algorithm that always gets the right answer for every possible input. But it turns out we don't need such strict standards to use the surface code in quantum computing. All we need in practice are decoding algorithms that are efficient and correct in the **average case**: they need to get the right answer for the errors that are likely to occur in practice. My results show that (assuming widely believed conjectures, like P ≠ NP) there are always edge cases that defeat any efficient decoding algorithm, although these edge cases are unlikely enough to occur that they won't interfere with using the surface code in practice.

So, what is the practical takeaway of these results? The impact of these results is that they will inform further research into decoding algorithms. Whenever you have some formal hardness results for practical problems of interest, you have several options:

- Give up ☹️. No way, we're not doing that! ❌
- Reformulate the problem. Are we even thinking about the problem the right way? Are we translating the real-world problem we want to solve into a formal problem definition we can feed into a computer correctly? For surface code decoding, yes, we are thinking about the problem the right way, so reformulating the problem isn't an option. ❌
- Try for approximation algorithms—algorithms that always return something close to the right answer, even if it's not the exact right answer. Well, I also proved some hardness of approximation results, showing that it's hard to output an error or correction operation that's at all close to the right answer. So this is not an option. ❌
- Look for special cases (restrictions of the problem) that you can solve exactly. Yes, this is an active area of research, and there are several interesting special cases that we can solve exactly. ✅
- Think about heuristic algorithms—algorithms that don't always work, but work often enough to be useful. Yes, much of the research on surface code decoding algorithms focuses on these types of algorithms. ✅
