Computer science

Google gets an error-corrected quantum bit to be stable for an hour


Using almost the entire chip for a logical qubit provides long-term stability.

Google’s new Willow chip is its first new generation of chips in about five years. Credit: Google

On Monday, Nature released a paper from Google’s quantum computing team that provides a key demonstration of the potential of quantum error correction. Thanks to an improved processor, Google’s team found that increasing the number of hardware qubits dedicated to an error-corrected logical qubit led to an exponential increase in performance. By the time the entire 105-qubit processor was dedicated to hosting a single error-corrected qubit, the system was stable for an average of an hour.

In fact, Google told Ars that errors on this single logical qubit were rare enough that it was difficult to study them. The work provides a significant validation that quantum error correction is likely to be capable of supporting the execution of complex algorithms that might require hours to execute.

A new fab

Google is making a number of announcements in association with the paper’s release (an earlier version of the paper has been up on the arXiv since August). One of those is that the company is committed enough to its quantum computing efforts that it has built its own fabrication facility for its superconducting processors.

“In the past, all the Sycamore devices that you’ve heard about were fabricated in a shared university clean room space next to graduate students and people doing kinds of crazy stuff,” Google’s Julian Kelly said. “And we’ve made this really significant investment in bringing this new facility online, hiring staff, filling it with tools, transferring their process over. And that enables us to have significantly more process control and dedicated tooling.”

That’s likely to be a critical step for the company, as the ability to fabricate smaller test devices can allow the exploration of lots of ideas on how to structure the hardware to limit the impact of noise. The first publicly announced product of this lab is the Willow processor, Google’s second design, which ups its qubit count to 105. Kelly said one of the changes that came with Willow actually involved making the individual pieces of the qubit larger, which makes them somewhat less susceptible to the influence of noise.

All of that led to a lower error rate, which was critical for the work done in the new paper. This was demonstrated by running Google’s favorite benchmark, one that it acknowledges is contrived in a way to make quantum computing look as good as possible. Still, people have figured out how to make algorithm improvements for classical computers that have kept them mostly competitive. But, with all the improvements, Google expects that the quantum hardware has moved firmly into the lead. “We think that the classical side will never outperform quantum in this benchmark because we’re now looking at something on our new chip that takes under five minutes, would take 10^25 years, which is way longer than the age of the Universe,” Kelly said.

Building logical qubits

The work focuses on the behavior of logical qubits, in which a collection of individual hardware qubits are grouped together in a way that enables errors to be detected and corrected. These are going to be essential for running any complex algorithms, since the hardware itself experiences errors often enough to make some inevitable during any complex calculations.

This naturally creates a key milestone. You can get better error correction by adding more hardware qubits to each logical qubit. If each of those hardware qubits produces errors at a sufficient rate, however, then you’ll experience errors faster than you can correct for them. You need to get hardware qubits of a sufficient quality before you start benefitting from larger logical qubits. Google’s earlier hardware had made it past that milestone, but only barely. Adding more hardware qubits to each logical qubit only made for a marginal improvement.

That’s no longer the case. Google’s processors have the hardware qubits laid out on a square grid, with each connected to its nearest neighbors (typically four except at the edges of the grid). And there’s a specific error correction code structure, called the surface code, that fits neatly into this grid. And you can use surface codes of different sizes by using progressively more of the grid. The size of the grid being used is measured by a term called distance, with larger distance meaning a bigger logical qubit, and thus better error correction.

(In addition to a standard surface code, Google includes a few qubits that handle a phenomenon called “leakage,” where a qubit ends up in a higher-energy state, instead of the two low-energy states defined as zero and one.)

The key result is that going from a distance of three to a distance of five more than doubled the ability of the system to catch and correct errors. Going from a distance of five to a distance of seven doubled it again. Which shows that the hardware qubits have reached a sufficient quality that putting more of them into a logical qubit has an exponential effect.

“As we increase the grid from three by three to five by five to seven by seven, the error rate is going down by a factor of two each time,” said Google’s Michael Newman. “And that’s that exponential error suppression that we want.”
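
As a rough illustration of what Newman describes, the sketch below assumes the error rate halves with every step up in distance. The function name and the 1 percent starting rate at distance three are placeholders chosen for the example, not figures from Google’s paper.

```python
def logical_error_rate(distance, base_rate=0.01, base_distance=3, suppression=2.0):
    """Error rate at an odd code distance, assuming each step of two in
    distance divides the rate by `suppression` (the quoted factor of two).
    `base_rate` is a made-up starting point, not a measured value."""
    if distance < base_distance or (distance - base_distance) % 2:
        raise ValueError("distance must be odd and at least base_distance")
    steps = (distance - base_distance) // 2
    return base_rate / suppression ** steps


# Grids of 3x3, 5x5, and 7x7, as in the quote above.
for d in (3, 5, 7):
    print(f"distance {d}: error rate {logical_error_rate(d):.4f}")
```

Because the rate is divided by a constant factor at every step, a fixed increase in distance always buys the same proportional improvement, which is what “exponential” means here.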

Going big

The second thing they demonstrated is that, if you make the largest logical qubit that the hardware can support, with a distance of 15, it’s possible to hang onto the quantum information for an average of an hour. This is striking because Google’s earlier work had found that its processors experience widespread simultaneous errors that the team ascribed to cosmic ray impacts. (IBM, however, has indicated it doesn’t see anything similar, so it’s not clear whether this diagnosis is correct.) Those happened every 10 seconds or so. But this work shows that a sufficiently large error code can correct for these events, whatever their cause.

That said, these qubits don’t survive indefinitely, and two kinds of problems still get past the error correction. One seems to be a localized, temporary increase in errors. The second, more difficult problem involves a widespread spike in error detection affecting an area that includes roughly 30 qubits. At this point, however, Google has only seen six of these events, so the company told Ars that it’s difficult to really characterize them. “It’s so rare it actually starts to become a bit challenging to study because you have to gain a lot of statistics to even see those events at all,” said Kelly.

Beyond the relative durability of these logical qubits, the paper notes another advantage to going with larger code distances: it enhances the impact of further hardware improvements. Google estimates that at a distance of 15, improving hardware performance by a factor of two would drop errors in the logical qubit by a factor of 250. At a distance of 27, the same hardware improvement would lead to an improvement of over 10,000 in the logical qubit’s performance.
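
Those two figures are consistent with the scaling usually quoted for surface codes, in which the logical error rate is proportional to the hardware error rate raised to the power (d + 1) / 2. The article doesn’t say which formula Google used, so treat the sketch below as an assumption that happens to reproduce the numbers; the function name is illustrative.

```python
def logical_gain(distance, hardware_gain=2.0):
    """Factor by which logical errors drop when hardware errors drop by
    `hardware_gain`, assuming logical error ~ (hardware error) ** ((d + 1) / 2)."""
    if distance % 2 == 0:
        raise ValueError("surface code distances are odd")
    exponent = (distance + 1) // 2
    return hardware_gain ** exponent


print(logical_gain(15))  # 2 ** 8, close to the quoted factor of 250
print(logical_gain(27))  # 2 ** 14, which is "over 10,000"
```

Under this assumption, the exponent grows with the distance, so the same hardware improvement is amplified more in a larger code.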

Note that none of this will ever get the error rate to zero. Instead, we just need to get the error rate to a level where an error is unlikely for a given calculation (more complex calculations will require a lower error rate). “It’s worth understanding that there’s always going to be some type of error floor and you just have to push it low enough to the point where it practically is irrelevant,” Kelly said. “So for example, we could get hit by an asteroid and the entire Earth could explode and that would be a correlated error that our quantum computer is not currently built to be robust to.”

Obviously, a lot of additional work will need to be done to both make logical qubits like this survive for even longer, and to ensure we have the hardware to host enough logical qubits to perform calculations. But the exponential improvements here, to Google, suggest that there’s nothing obvious standing in the way of that. “We woke up one morning and we kind of got these results and we were like, wow, this is going to work,” Newman said. “This is really it.”

Nature, 2024. DOI: 10.1038/s41586-024-08449-y

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

Google’s DeepMind tackles weather forecasting, with great performance

By some measures, AI systems are now competitive with traditional computing methods for generating weather forecasts. Because their training penalizes errors, however, the forecasts tend to get “blurry”—as you move further ahead in time, the models make fewer specific predictions since those are more likely to be wrong. As a result, you start to see things like storm tracks broadening and the storms themselves losing clearly defined edges.

But using AI is still extremely tempting because the alternative is a computational atmospheric circulation model, which is extremely compute-intensive. Still, it’s highly successful, with the ensemble model from the European Centre for Medium-Range Weather Forecasts considered the best in class.

In a paper being released today, Google’s DeepMind claims its new AI system manages to outperform the European model on forecasts out to at least a week and often beyond. DeepMind’s system, called GenCast, merges some computational approaches used by atmospheric scientists with a diffusion model, commonly used in generative AI. The result is a system that maintains high resolution while cutting the computational cost significantly.

Ensemble forecasting

Traditional computational methods have two main advantages over AI systems. The first is that they’re directly based on atmospheric physics, incorporating the rules we know govern the behavior of our actual weather, and they calculate some of the details in a way that’s directly informed by empirical data. They’re also run as ensembles, meaning that multiple instances of the model are run. Due to the chaotic nature of the weather, these different runs will gradually diverge, providing a measure of the uncertainty of the forecast.

At least one attempt has been made to merge some of the aspects of traditional weather models with AI systems. An internal Google project used a traditional atmospheric circulation model that divided the Earth’s surface into a grid of cells but used an AI to predict the behavior of each cell. This provided much better computational performance, but at the expense of relatively large grid cells, which resulted in relatively low resolution.

For its take on AI weather predictions, DeepMind decided to skip the physics and instead adopt the ability to run an ensemble.

GenCast is based on diffusion models, which have a key feature that’s useful here. In essence, these models are trained by starting them with a mixture of an original (image, text, weather pattern) and a variation where noise is injected. The system is supposed to create a variation of the noisy version that is closer to the original. Once trained, it can be fed pure noise and evolve the noise to be closer to whatever it’s targeting.

In this case, the target is realistic weather data, and the system takes an input of pure noise and evolves it based on the atmosphere’s current state and its recent history. For longer-range forecasts, the “history” includes both the actual data and the predicted data from earlier forecasts. The system moves forward in 12-hour steps, so the forecast for day three will incorporate the starting conditions, the earlier history, and the two forecasts from days one and two.

This is useful for creating an ensemble forecast because you can feed it different patterns of noise as input, and each will produce a slightly different output of weather data. This serves the same purpose it does in a traditional weather model: providing a measure of the uncertainty for the forecast.
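
The mechanics can be sketched with a toy stand-in. Nothing below resembles GenCast’s actual network: `toy_denoise` is a hypothetical placeholder that pulls a noise sample toward a crude extrapolation of the two most recent states, and the “weather” is a single number. It only shows how different noise inputs, rolled forward in 12-hour steps, produce an ensemble whose spread can be measured.

```python
import random
import statistics


def toy_denoise(noise, current, previous, steps=10):
    """Hypothetical stand-in for a trained diffusion model: nudges a noise
    sample toward a value extrapolated from the two most recent states.
    A little of the noise survives, so different inputs give different outputs."""
    target = current + 0.5 * (current - previous)
    sample = noise
    for _ in range(steps):
        sample += 0.3 * (target - sample)
    return sample


def forecast(current, previous, n_steps, rng):
    """Roll forward one step at a time; each step stands for 12 hours and
    uses earlier predictions as part of its history."""
    states = [previous, current]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        states.append(toy_denoise(noise, states[-1], states[-2]))
    return states[2:]


def ensemble(current, previous, n_steps, members, seed=0):
    """One forecast per member, each driven by its own noise stream."""
    return [forecast(current, previous, n_steps, random.Random(seed + m))
            for m in range(members)]


runs = ensemble(current=15.0, previous=14.5, n_steps=6, members=20)
for step in range(6):
    values = [run[step] for run in runs]
    print(f"+{12 * (step + 1)}h  mean={statistics.mean(values):.2f}  "
          f"spread={statistics.stdev(values):.3f}")
```

The standard deviation across members at each step plays the role of the forecast uncertainty described above.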

For each grid square, GenCast works with six weather measures at the surface, along with six that track the state of the atmosphere and 13 different altitudes at which it estimates the air pressure. Each of these grid squares is 0.2 degrees on a side, a higher resolution than the European model uses for its forecasts. Despite that resolution, DeepMind estimates that a single instance (meaning not a full ensemble) can be run out to 15 days on one of Google’s tensor processing systems in just eight minutes.

It’s possible to make an ensemble forecast by running multiple versions of this in parallel and then integrating the results. Given the amount of hardware Google has at its disposal, the whole process from start to finish is likely to take less than 20 minutes. The source and training data will be placed on the GitHub page for DeepMind’s GraphCast project. Given the relatively low computational requirements, we can probably expect individual academic research teams to start experimenting with it.

Measures of success

DeepMind reports that GenCast dramatically outperforms the best traditional forecasting model. Using a standard benchmark in the field, DeepMind found that GenCast was more accurate than the European model on 97 percent of the tests it used, which checked different output values at different times in the future. In addition, the confidence values, based on the uncertainty obtained from the ensemble, were generally reasonable.

Past AI weather forecasters, having been trained on real-world data, are generally not great at handling extreme weather since it shows up so rarely in the training set. But GenCast did quite well, often outperforming the European model in things like abnormally high and low temperatures and air pressure (one percent frequency or less, including at the 0.01 percentile).

DeepMind also went beyond standard tests to determine whether GenCast might be useful. This research included projecting the tracks of tropical cyclones, an important job for forecasting models. For the first four days, GenCast was significantly more accurate than the European model, and it maintained its lead out to about a week.

One of DeepMind’s most interesting tests was checking the global forecast of wind power output based on information from the Global Powerplant Database. This involved using it to forecast wind speeds at 10 meters above the surface (which is actually lower than where most turbines reside but is the best approximation possible) and then using that number to figure out how much power would be generated. The system beat the traditional weather model by 20 percent for the first two days and stayed in front with a declining lead out to a week.
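
To give a feel for the second half of that calculation, here is a minimal sketch that converts wind speeds into power using an idealized turbine power curve. The cut-in, rated, and cut-out speeds, the 2,000 kW rating, and the function names are all assumptions made for illustration; the article does not describe the curve DeepMind used.

```python
def turbine_power_kw(wind_speed_ms, cut_in=3.0, rated_speed=12.0,
                     cut_out=25.0, rated_kw=2000.0):
    """Idealized power curve: nothing below cut-in or at/above cut-out,
    full output at or above rated speed, cubic ramp in between.
    All default values are illustrative assumptions."""
    if wind_speed_ms < cut_in or wind_speed_ms >= cut_out:
        return 0.0
    if wind_speed_ms >= rated_speed:
        return rated_kw
    ramp = (wind_speed_ms ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)
    return rated_kw * ramp


def expected_power_kw(ensemble_speeds_ms):
    """Average the power over every ensemble member's wind speed."""
    powers = [turbine_power_kw(speed) for speed in ensemble_speeds_ms]
    return sum(powers) / len(powers)


# Hypothetical wind speeds (m/s) from four ensemble members at one site.
print(round(expected_power_kw([6.5, 7.0, 8.2, 9.1]), 1))
```

Because output rises roughly with the cube of wind speed in the ramp region, small errors in forecast wind speed turn into much larger errors in forecast power, which is why this makes a demanding test.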

The researchers don’t spend much time examining why performance seems to decline gradually for about a week. Ideally, more details about GenCast’s limitations would help inform further improvements, so the researchers are likely thinking about it. In any case, today’s paper marks the second case where taking something akin to a hybrid approach—mixing aspects of traditional forecast systems with AI—has been reported to improve forecasts. And both those cases took very different approaches, raising the prospect that it will be possible to combine some of their features.

Nature, 2024. DOI: 10.1038/s41586-024-08252-9

Flour, water, salt, GitHub: The Bread Code is a sourdough baking framework

One year ago, I didn’t know how to bake bread. I just knew how to follow a recipe.

If everything went perfectly, I could turn out something plain but palatable. But should anything change—temperature, timing, flour, Mercury being in Scorpio—I’d turn out a partly poofy pancake. I presented my partly poofy pancakes to people, and they were polite, but those platters were not particularly palatable.

During a group vacation last year, a friend made fresh sourdough loaves every day, and we devoured them. He gladly shared his knowledge, his starter, and his go-to recipe. I took it home, tried it out, and made a naturally leavened, artisanal pancake.

I took my confusion to YouTube, where I found Hendrik Kleinwächter’s “The Bread Code” channel and his video promising a course on “Your First Sourdough Bread.” I watched and learned a lot, but I couldn’t quite translate 30 minutes of intensive couch time to hours of mixing, raising, slicing, and baking. Pancakes, part three.

It felt like there had to be more to this. And there was—a whole GitHub repository more.

The Bread Code gave Kleinwächter a gratifying second career, and it’s given me bread I’m eager to serve people. This week alone, I’m making sourdough Parker House rolls, a rosemary olive loaf for Friendsgiving, and then a za’atar flatbread and standard wheat loaf for actual Thanksgiving. And each of us has learned more about perhaps the most important aspect of coding, bread, teaching, and lots of other things: patience.

Hendrik Kleinwächter on his Bread Code channel, explaining his book.

Resources, not recipes

The Bread Code is centered around a book, The Sourdough Framework. It’s an open source codebase that self-compiles into new LaTeX book editions and is free to read online. It has one real bread loaf recipe, if you can call a 68-page middle-section journey a recipe. It has 17 flowcharts, 15 tables, and dozens of timelines, process illustrations, and photos of sourdough going both well and terribly. Like any cookbook, there’s a bit about Kleinwächter’s history with this food, and some sourdough bread history. Then the reader is dropped straight into “How Sourdough Works,” which is in no way a summary.

“To understand the many enzymatic reactions that take place when flour and water are mixed, we must first understand seeds and their role in the lifecycle of wheat and other grains,” Kleinwächter writes. From there, we follow a seed through hibernation, germination, photosynthesis, and, through humans’ grinding of these seeds, exposure to amylase and protease enzymes.

I had arrived at this book with these specific loaf problems to address. But first, it asks me to consider, “What is wheat?” This sparked vivid memories of Computer Science 114, in which a professor, asked to troubleshoot misbehaving code, would instead tell students to “Think like a compiler,” or “Consider the recursive way to do it.”

And yet, “What is wheat” did help. Having a sense of what was happening inside my starter, and my dough (which is really just a big, slow starter), helped me diagnose what was going right or wrong with my breads. Extra-sticky dough and tightly arrayed holes in the bread meant I had let the bacteria win out over the yeast. I learned when to be rough with the dough to form gluten and when to gently guide it into shape to preserve its gas-filled form.

I could eat a slice of each loaf and get a sense of how things had gone. The inputs, outputs, and errors could be ascertained and analyzed more easily than in my prior stance, which was, roughly, “This starter is cursed and so am I.” Using hydration percentages, measurements relative to protein content, a few tests, and troubleshooting steps, I could move closer to fresh, delicious bread. Framework: accomplished.
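
For anyone who would rather see the hydration arithmetic as code, here is a small sketch. It assumes the common baker’s convention that hydration is total water divided by total flour, with the starter’s own flour and water counted in; the function name and the sample weights are mine, not taken from the book.

```python
def dough_hydration(flour_g, water_g, starter_g=0.0, starter_hydration=100.0):
    """Hydration percentage: total water / total flour * 100.
    The starter is split into flour and water using its own hydration
    (100 means equal weights of flour and water)."""
    starter_flour = starter_g / (1 + starter_hydration / 100)
    starter_water = starter_g - starter_flour
    total_flour = flour_g + starter_flour
    total_water = water_g + starter_water
    return 100 * total_water / total_flour


# 500 g flour, 350 g water, 100 g of starter fed at equal parts.
print(round(dough_hydration(500, 350, starter_g=100), 1))
```

Counting the starter matters: the same 500 g of flour and 350 g of water is a 70 percent dough on its own, but a somewhat wetter one once an equal-parts starter goes in.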

I have found myself very grateful lately that Kleinwächter did not find success with 30-minute YouTube tutorials. Strangely, so has he.

Sometimes weird scoring looks pretty neat. Credit: Kevin Purdy

The slow bread of childhood dreams

“I have had some successful startups; I have also had disastrous startups,” Kleinwächter said in an interview. “I have made some money, then I’ve been poor again. I’ve done so many things.”

Most of those things involve software. Kleinwächter is a German full-stack engineer, and he has founded firms and worked at companies related to blogging, e-commerce, food ordering, travel, and health. He tried to escape the boom-bust startup cycle by starting his own digital agency before one of his products was acquired by hotel booking firm Trivago. After that, he needed a break—and he could afford to take one.

“I went to Naples, worked there in a pizzeria for a week, and just figured out, ‘What do I want to do with my life?’ And I found my passion. My passion is to teach people how to make amazing bread and pizza at home,” Kleinwächter said.

Kleinwächter’s formative bread experiences—weekend loaves baked by his mother, awe-inspiring pizza from Italian ski towns, discovering all the extra ingredients in a supermarket’s version of the dark Schwarzbrot—made him want to bake his own. Like me, he started with recipes, and he wasted a lot of time and flour turning out stuff that produced both failures and a drive for knowledge. He dug in, learned as much as he could, and once he had his head around the how and why, he worked on a way to guide others along the path.

Bugs and syntax errors in baking

When using recipes, there’s a strong, societally reinforced idea that there is one best, tested, and timed way to arrive at a finished food. That’s why we have America’s Test Kitchen, The Food Lab, and all manner of blogs and videos promoting food “hacks.” I should know; I wrote up a whole bunch of them as a young Lifehacker writer. I’m still a fan of such things, from the standpoint of simply getting food done.

As such, the ultimate “hack” for making bread is to use commercial yeast, i.e., dried “active” or “instant” yeast. A manufacturer has done the work of selecting and isolating yeast at its prime state and preserving it for you. Get your liquids and dough to a yeast-friendly temperature and you’ve removed most of the variables; your success should be repeatable. If you just want bread, you can make the iconic no-knead bread with prepared yeast and very little intervention, and you’ll probably get bread that’s better than you can get at the grocery store.

Baking sourdough—or “naturally leavened,” or with “levain”—means a lot of intervention. You are cultivating and maintaining a small ecosystem of yeast and bacteria, unleashing them onto flour, water, and salt, and stepping in after they’ve produced enough flavor and lift—but before they eat all the stretchy gluten bonds. What that looks like depends on many things: your water, your flours, what you fed your starter, how active it was when you added it, the air in your home, and other variables. Most important is your ability to notice things over long periods of time.

When things go wrong, debugging can be tricky. I was able to personally ask Kleinwächter what was up with my bread, because I was interviewing him for this article. There were many potential answers, including:

  • I should recognize, first off, that I was trying to bake the hardest kind of bread: Freestanding wheat-based sourdough
  • You have to watch—and smell—your starter to make sure it has the right mix of yeast to bacteria before you use it
  • Using less starter (lower “inoculation”) would make it easier not to over-ferment
  • Eyeballing my dough rise in a bowl was hard; try measuring a sample in something like an aliquot tube
  • Winter and summer call for very different dough timings, even with modern indoor climate control

But I kept with it. I was particularly susceptible to wanting things to go quicker and demanding to see a huge rise in my dough before baking. This ironically leads to the flattest results, as the bacteria eat all the gluten bonds. When I slowed down, changed just one thing at a time, and looked deeper into my results, I got better.

The Bread Code YouTube page and the ways in which one must cater to algorithms.

Credit: The Bread Code

YouTube faces and TikTok sausage

Emailing and trading video responses with Kleinwächter, I got the sense that he, too, has learned to go the slow, steady route with his Bread Code project.

For a while, he was turning out YouTube videos, and he wanted them to work. “I’m very data-driven and very analytical. I always read the video metrics, and I try to optimize my videos,” Kleinwächter said. “Which means I have to use a clickbait title, and I have to use a clickbait-y thumbnail, plus I need to make sure that I catch people in the first 30 seconds of the video.” This, however, is “not good for us as humans because it leads to more and more extreme content.”

Kleinwächter also dabbled in TikTok, making videos in which, leaning into his German heritage, “the idea was to turn everything into a sausage.” The metrics and imperatives on TikTok were similar to those on YouTube but hyperscaled. He could put hours or days into a video, only for 1 percent of his 200,000 YouTube subscribers to see it unless he caught the algorithm wind.

The frustrations inspired him to slow down and focus on his site and his book. With his community’s help, The Bread Code has just finished its second Kickstarter-backed printing run of 2,000 copies. There’s a Discord full of bread heads eager to diagnose and correct each other’s loaves and occasional pull requests from inspired readers. Kleinwächter has seen people go from buying what he calls “Turbo bread” at the store to making their own, and that’s what keeps him going. He’s not gambling on an attention-getting hit, but he’s in better control of how his knowledge and message get out.

“I think homemade bread is something that’s super, super undervalued, and I see a lot of benefits to making it yourself,” Kleinwächter said. “Good bread just contains flour, water, and salt—nothing else.”

A test loaf of rosemary olive sourdough bread. An uneven amount of olive bits ended up on the top and bottom, because there is always more to learn.

Credit: Kevin Purdy

You gotta keep doing it—that’s the hard part

I can’t say it has been entirely smooth sailing ever since I self-certified with The Bread Code framework. I know what level of fermentation I’m aiming for, but I sometimes get home from an outing later than planned, arriving at dough that’s trying to escape its bucket. My starter can be very temperamental when my house gets dry and chilly in the winter. And my dough slicing (scoring), being the very last step before baking, can be rushed, resulting in some loaves with weird “ears,” not quite ready for the bakery window.

But that’s all part of it. Your sourdough starter is a collection of organisms that are best suited to what you’ve fed them, developed over time, shaped by their environment. There are some modern hacks that can help make good bread, like using a pH meter. But the big hack is just doing it, learning from it, and getting better at figuring out what’s going on. I’m thankful that folks like Kleinwächter are out there encouraging folks like me to slow down, hack less, and learn more.

Qubit that makes most errors obvious now available to customers


Can a small machine that makes error correction easier upend the market?

A graphic representation of the two resonance cavities that can hold photons, along with a channel that lets the photon move between them. Credit: Quantum Circuits

We’re nearing the end of the year, and there is typically a flood of announcements regarding quantum computers around now, in part because some companies want to live up to promised schedules. Most of these involve evolutionary improvements on previous generations of hardware. But this year, we have something new: the first company to market with a new qubit technology.

The technology is called a dual-rail qubit, and it is intended to make the most common form of error trivially easy to detect in hardware, thus making error correction far more efficient. And, while tech giant Amazon has been experimenting with them, a startup called Quantum Circuits is the first to give the public access to dual-rail qubits via a cloud service.

While the tech is interesting on its own, it also provides us with a window into how the field as a whole is thinking about getting error-corrected quantum computing to work.

What’s a dual-rail qubit?

Dual-rail qubits are variants of the hardware used in transmons, the qubits favored by companies like Google and IBM. The basic hardware unit links a loop of superconducting wire to a tiny cavity that allows microwave photons to resonate. This setup allows the presence of microwave photons in the resonator to influence the behavior of the current in the wire and vice versa. In a transmon, microwave photons are used to control the current. But there are other companies that have hardware that does the reverse, controlling the state of the photons by altering the current.

Dual-rail qubits use two of these systems linked together, allowing photons to move from one resonator to the other. Using the superconducting loops, it’s possible to control the probability that a photon will end up in the left or right resonator. The actual location of the photon will remain unknown until it’s measured, allowing the system as a whole to hold a single bit of quantum information: a qubit.

This has an obvious disadvantage: You have to build twice as much hardware for the same number of qubits. So why bother? Because the vast majority of errors involve the loss of the photon, and that’s easily detected. “It’s about 90 percent or more [of the errors],” said Quantum Circuits’ Andrei Petrenko. “So it’s a huge advantage that we have with photon loss over other errors. And that’s actually what makes the error correction a lot more efficient: The fact that photon losses are by far the dominant error.”

Petrenko said that, without doing a measurement that would disrupt the storage of the qubit, it’s possible to determine if there is an odd number of photons in the hardware. If that isn’t the case, you know an error has occurred—most likely a photon loss (gains of photons are rare but do occur). For simple algorithms, this would be a signal to simply start over.
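
The logic of that check can be mimicked with a purely classical toy. The sketch below is not Quantum Circuits’ software stack, and it ignores everything quantum about the device (real hardware checks the photon count without revealing which resonator holds the photon). The function names and the loss probability are illustrative assumptions.

```python
import random


def run_dual_rail(loss_probability, rng):
    """Return photon counts (left, right) after a noisy operation.
    A healthy dual-rail qubit holds exactly one photon across both rails."""
    left, right = (1, 0) if rng.random() < 0.5 else (0, 1)
    if rng.random() < loss_probability:
        left, right = 0, 0  # photon lost: neither resonator is occupied
    return left, right


def error_flagged(left, right):
    """Parity check: anything other than an odd photon count is an error."""
    return (left + right) % 2 == 0


def run_until_clean(loss_probability, rng, max_attempts=1000):
    """For a simple algorithm, a flagged error just means starting over."""
    for attempt in range(1, max_attempts + 1):
        rails = run_dual_rail(loss_probability, rng)
        if not error_flagged(*rails):
            return rails, attempt
    raise RuntimeError("no error-free run within max_attempts")


rails, attempts = run_until_clean(loss_probability=0.2, rng=random.Random(42))
print(f"clean result {rails} after {attempts} attempt(s)")
```

Restarting works only while a complete run has a reasonable chance of finishing without a loss, which is where full error correction comes in.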

But it does not eliminate the need for error correction if we want to do more complex computations that can’t make it to completion without encountering an error. There’s still the remaining 10 percent of errors, which are primarily something called a phase flip that is distinct to quantum systems. Bit flips are even more rare in dual-rail setups. Finally, simply knowing that a photon was lost doesn’t tell you everything you need to know to fix the problem; error-correction measurements of other parts of the logical qubit are still needed to fix any problems.

The layout of the new machine. Each qubit (gray square) involves a left and right resonance chamber (blue dots) that a photon can move between. Each of the qubits has connections that allow entanglement with its nearest neighbors. Credit: Quantum Circuits

In fact, the initial hardware that’s being made available is too small to even approach useful computations. Instead, Quantum Circuits chose to link eight qubits with nearest-neighbor connections in order to allow it to host a single logical qubit that enables error correction. Put differently: this machine is meant to enable people to learn how to use the unique features of dual-rail qubits to improve error correction.

One consequence of having this distinctive hardware is that the software stack that controls operations needs to take advantage of its error detection capabilities. None of the other hardware on the market can be directly queried to determine whether it has encountered an error. So, Quantum Circuits has had to develop its own software stack to allow users to actually benefit from dual-rail qubits. Petrenko said that the company also chose to provide access to its hardware via its own cloud service because it wanted to connect directly with the early adopters in order to better understand their needs and expectations.

Numbers or noise?

Given that a number of companies have already released multiple revisions of their quantum hardware and have scaled them into hundreds of individual qubits, it may seem a bit strange to see a company enter the market now with a machine that has just a handful of qubits. But amazingly, Quantum Circuits isn’t alone in planning a relatively late entry into the market with hardware that only hosts a few qubits.

Having talked with several of them, I can say there is a logic to what they’re doing. What follows is my attempt to convey that logic in a general form, without focusing on any single company’s case.

Everyone agrees that the future of quantum computation is error correction, which requires linking together multiple hardware qubits into a single unit termed a logical qubit. To get really robust, error-free performance, you have two choices. One is to devote lots of hardware qubits to the logical qubit, so you can handle multiple errors at once. Or you can lower the error rate of the hardware, so that you can get a logical qubit with equivalent performance while using fewer hardware qubits. (The two options aren’t mutually exclusive, and everyone will need to do a bit of both.)

The two options pose very different challenges. Improving the hardware error rate means diving into the physics of individual qubits and the hardware that controls them. In other words, getting lasers that have fewer of the inevitable fluctuations in frequency and energy. Or figuring out how to manufacture loops of superconducting wire with fewer defects or handle stray charges on the surface of electronics. These are relatively hard problems.

By contrast, scaling qubit count largely involves being able to consistently do something you already know how to do. So, if you already know how to make good superconducting wire, you simply need to make a few thousand instances of that wire instead of a few dozen. The electronics that will trap an atom can be made in a way that will make it easier to make them thousands of times. These are mostly engineering problems, and generally of similar complexity to problems we’ve already solved to make the electronics revolution happen.

In other words, within limits, scaling is a much easier problem to solve than errors. It’s still going to be extremely difficult to get the millions of hardware qubits we’d need to error correct complex algorithms on today’s hardware. But if we can get the error rate down a bit, we can use smaller logical qubits and might only need 10,000 hardware qubits, which will be more approachable.
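The trade-off can be made concrete with a widely used rule of thumb for surface-code logical qubits, in which the logical error rate falls as prefactor × (p / p_threshold)^((d + 1) / 2) for code distance d, and a distance-d logical qubit needs 2d² − 1 hardware qubits. The sketch below applies that heuristic; the 1 percent threshold, the 0.1 prefactor, and the target logical error rate are illustrative assumptions rather than figures from any of the companies discussed here.

```python
def distance_needed(p_phys, target_logical, p_threshold=0.01, prefactor=0.1):
    """Smallest odd code distance whose estimated logical error meets the target.

    Uses the heuristic p_logical ~ prefactor * (p_phys / p_threshold) ** ((d + 1) / 2).
    The threshold and prefactor defaults are illustrative assumptions.
    """
    if p_phys >= p_threshold:
        raise ValueError("above threshold: adding qubits would not help")
    d = 3
    while prefactor * (p_phys / p_threshold) ** ((d + 1) / 2) > target_logical:
        d += 2
    return d


def qubits_per_logical(d):
    """Hardware qubits in one distance-d surface-code patch (data plus measurement)."""
    return 2 * d * d - 1


for p_phys in (5e-3, 2e-3, 1e-4):
    d = distance_needed(p_phys, target_logical=1e-12)
    print(f"hardware error {p_phys:g}: distance {d}, {qubits_per_logical(d)} qubits per logical qubit")
```

With these assumed constants, lowering the hardware error rate from 0.5 percent to 0.01 percent shrinks each logical qubit from more than 10,000 hardware qubits to a few hundred.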

Errors first

And there’s evidence that even the early entries in quantum computing have reasoned the same way. Google has been working on iterations of the same chip design since its 2019 quantum supremacy announcement, focusing on understanding the errors that occur on improved versions of that chip. IBM made hitting the 1,000 qubit mark a major goal but has since been focused on reducing the error rate in smaller processors. Someone at a quantum computing startup once told us it would be trivial to trap more atoms in its hardware and boost the qubit count, but there wasn’t much point in doing so given the error rates of the qubits on the then-current generation machine.

The new companies entering this market now are making the argument that they have a technology that will either radically reduce the error rate or make handling the errors that do occur much easier. Quantum Circuits clearly falls into the latter category, as dual-rail qubits are entirely about making the most common form of error trivial to detect. The former category includes companies like Oxford Ionics, which has indicated it can perform single-qubit gates with a fidelity of over 99.9991 percent. Or Alice & Bob, which stores qubits in the behavior of multiple photons in a single resonance cavity, making them very robust to the loss of individual photons.

These companies are betting that they have distinct technology that will let them handle error rate issues more effectively than established players. That will lower the total scaling they need to do, and scaling will be an easier problem overall—and one that they may already have the pieces in place to handle. Quantum Circuits’ Petrenko, for example, told Ars, “I think that we’re at the point where we’ve gone through a number of iterations of this qubit architecture where we’ve de-risked a number of the engineering roadblocks.” And Oxford Ionics told us that if they could make the electronics they use to trap ions in their hardware once, it would be easy to mass manufacture them.

None of this should imply that these companies will have it easy compared to a startup that already has experience with both reducing errors and scaling, or a giant like Google or IBM that has the resources to do both. But it does explain why, even at this stage in quantum computing’s development, we’re still seeing startups enter the field.

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.



Microsoft and Atom Computing combine for quantum error correction demo


New work provides a good view of where the field currently stands.

The first-generation tech demo of Atom’s hardware. Things have progressed considerably since. Credit: Atom Computing

In September, Microsoft made an unusual combination of announcements. It demonstrated progress with quantum error correction, something that will be needed for the technology to move much beyond the interesting demo phase, using hardware from a quantum computing startup called Quantinuum. At the same time, however, the company also announced that it was forming a partnership with a different startup, Atom Computing, which uses a different technology to make qubits available for computations.

Given that, it was probably inevitable that the folks in Redmond, Washington, would want to show that similar error correction techniques would also work with Atom Computing’s hardware. It didn’t take long, as the two companies are releasing a draft manuscript describing their work on error correction today. The paper serves as both a good summary of where things currently stand in the world of error correction and a good look at some of the distinct features of computation using neutral atoms.

Atoms and errors

While we have various technologies that provide a way of storing and manipulating bits of quantum information, none of them can be operated error-free. At present, errors make it difficult to perform even the simplest computations that are clearly beyond the capabilities of classical computers. More sophisticated algorithms would inevitably encounter an error before they could be completed, a situation that would remain true even if we could somehow improve the hardware error rates of qubits by a factor of 1,000—something we’re unlikely to ever be able to do.

The solution to this is to use what are called logical qubits, which distribute quantum information across multiple hardware qubits and allow the detection and correction of errors when they occur. Since multiple qubits get linked together to operate as a single logical unit, the hardware error rate still matters. If it’s too high, then adding more hardware qubits just means that errors will pop up faster than they can possibly be corrected.

We’re now at the point where, for a number of technologies, hardware error rates have passed the break-even point, and adding more hardware qubits can lower the error rate of a logical qubit based on them. This was demonstrated using neutral atom qubits by an academic lab at Harvard University about a year ago. The new manuscript demonstrates that it also works on a commercial machine from Atom Computing.

Neutral atoms, which can be held in place using a lattice of laser light, have a number of distinct advantages when it comes to quantum computing. Every single atom will behave identically, meaning that you don’t have to manage the device-to-device variability that’s inevitable with fabricated electronic qubits. Atoms can also be moved around, allowing any atom to be entangled with any other. This any-to-any connectivity can enable more efficient algorithms and error-correction schemes. The quantum information is typically stored in the spin of the atom’s nucleus, which is shielded from environmental influences by the cloud of electrons that surround it, making them relatively long-lived qubits.

Operations, including gates and readout, are performed using lasers. The way the physics works, the spacing of the atoms determines how the laser affects them. If two atoms are a critical distance apart, the laser can perform a single operation, called a two-qubit gate, that affects both of their states. Anywhere outside this distance, and a laser only affects each atom individually. This allows a fine control over gate operations.

That said, operations are relatively slow compared to some electronic qubits, and atoms can occasionally be lost entirely. The optical traps that hold atoms in place are also contingent upon the atom being in its ground state; if any atom ends up stuck in a different state, it will be able to drift off and be lost. This is actually somewhat useful, in that it converts an unexpected state into a clear error.

Image of a grid of dots arranged in sets of parallel vertical rows. There is a red bar across the top, and a green bar near the bottom of the grid.

Atom Computing’s system. Rows of atoms are held far enough apart so that a single laser sent across them (green bar) only operates on individual atoms. If the atoms are moved to the interaction zone (red bar), a laser can perform gates on pairs of atoms. Spaces where atoms can be held can be left empty to avoid performing unneeded operations. Credit: Reichardt, et al.

The machine used in the new demonstration hosts 256 of these neutral atoms. Atom Computing has them arranged in sets of parallel rows, with space in between to let the atoms be shuffled around. For single-qubit gates, it’s possible to shine a laser across the rows, causing every atom it touches to undergo that operation. For two-qubit gates, pairs of atoms get moved to the end of the row and moved a specific distance apart, at which point a laser will cause the gate to be performed on every pair present.

Atom’s hardware also allows a constant supply of new atoms to be brought in to replace any that are lost. It’s also possible to image the atom array in between operations to determine whether any atoms have been lost and if any are in the wrong state.

It’s only logical

As a general rule, the more hardware qubits you dedicate to each logical qubit, the more simultaneous errors you can identify. This identification can enable two ways of handling the error. In the first, you simply discard any calculation with an error and start over. In the second, you can use information about the error to try to fix it, although the repair involves additional operations that can potentially trigger a separate error.
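The difference between detecting and correcting is easiest to see in a classical analogy. The sketch below uses a three-copy repetition code, which is a stand-in chosen for simplicity and not the quantum code used by the Microsoft/Atom team; all function names are hypothetical.

```python
from collections import Counter


def encode(bit, n=3):
    """Store one logical bit as n identical copies."""
    return [bit] * n


def apply_flips(codeword, positions):
    """Flip the copies at the given positions to mimic errors."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(codeword)]


def error_detected(codeword):
    """An error is visible whenever the copies disagree."""
    return len(set(codeword)) > 1


def correct(codeword):
    """Repair by majority vote (n is odd, so there is never a tie)."""
    return Counter(codeword).most_common(1)[0][0]


# The logical bit starts as 0; try zero, one, two, and three simultaneous flips.
for flips in ([], [0], [0, 1], [0, 1, 2]):
    received = apply_flips(encode(0), set(flips))
    print(len(flips), "flips -> detected:", error_detected(received),
          "| corrected value:", correct(received))
```

One flip is both caught and repaired. Two flips are caught, but the majority vote repairs them to the wrong value. Three flips change the value without leaving any sign that something went wrong, which is why larger codes are needed to handle more simultaneous errors.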

For this work, the Microsoft/Atom team used relatively small logical qubits (meaning they used very few hardware qubits), which meant they could fit more of them within the 256 total hardware qubits the machine made available. They also checked the error rate of both error detection with discard and error detection with correction.

The research team did two main demonstrations. One was placing 24 of these logical qubits into what’s called a cat state, named after Schrödinger’s hypothetical feline. This is when a quantum object simultaneously has non-zero probability of being in two mutually exclusive states. In this case, the researchers placed 24 logical qubits in an entangled cat state, the largest ensemble of this sort yet created. Separately, they implemented what’s called the Bernstein-Vazirani algorithm. The classical version of this algorithm requires individual queries to identify each bit in a string of them; the quantum version obtains the entire string with a single query, so is a notable case of something where a quantum speedup is possible.
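For readers who want to see the query counts side by side, here is a small sketch of Bernstein-Vazirani. The quantum half is an idealized, noise-free state-vector simulation in NumPy on unencoded qubits, not the logical-qubit implementation from the manuscript, and the function names are hypothetical.

```python
import numpy as np


def hadamard_all(state, n):
    """Apply a Hadamard gate to every one of the n qubits in a state vector."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    psi = state.reshape([2] * n)
    for q in range(n):
        # Contract the gate with qubit q, then put that axis back in place.
        psi = np.moveaxis(np.tensordot(h, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)


def classical_recover(secret):
    """Classical approach: one query per bit, each probing a single position."""
    n = len(secret)
    found, queries = [], 0
    for j in range(n):
        probe = [1 if i == j else 0 for i in range(n)]
        found.append(sum(s * x for s, x in zip(secret, probe)) % 2)
        queries += 1
    return found, queries


def quantum_recover(secret):
    """Simulated quantum approach: a single call to a phase oracle."""
    n = len(secret)
    s = int("".join(str(b) for b in secret), 2)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = hadamard_all(state, n)
    # The one oracle query: flip the sign of every x whose dot product with s is odd.
    signs = np.array([(-1.0) ** (bin(x & s).count("1") % 2) for x in range(2 ** n)])
    state = hadamard_all(state * signs, n)
    outcome = int(np.argmax(state ** 2))
    return [int(b) for b in format(outcome, "0{}b".format(n))], 1


secret = [1, 0, 1, 1, 0, 1]
print("classical:", classical_recover(secret))
print("quantum:  ", quantum_recover(secret))
```

Both approaches recover the same string; the classical one needs as many queries as there are bits, while the simulated quantum circuit uses one.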

Both of these showed a similar pattern. When done directly on the hardware, with each qubit being a single atom, there was an appreciable error rate. By detecting errors and discarding those calculations where they occurred, it was possible to significantly improve the error rate of the remaining calculations. Note that this doesn’t eliminate errors, as it’s possible for multiple errors to occur simultaneously, altering the value of the qubit without leaving an indication that can be spotted with these small logical qubits.

Discarding has its limits; as calculations become increasingly complex, involving more qubits or operations, it will inevitably mean every calculation will have an error, so you’d end up wanting to discard everything. Which is why we’ll ultimately need to correct the errors.
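A back-of-the-envelope sketch shows how quickly discarding stops being viable. It assumes every operation independently has the same chance of triggering a detected error; the 1 percent figure is an arbitrary illustration, not a measured rate for Atom’s hardware.

```python
def fraction_kept(p_detected_per_op, n_ops):
    """Share of runs that finish with no detected error.

    Assumption: each operation independently triggers a detected error
    with the same probability.
    """
    return (1.0 - p_detected_per_op) ** n_ops


def expected_attempts(p_detected_per_op, n_ops):
    """Average number of runs needed to get one that survives."""
    return 1.0 / fraction_kept(p_detected_per_op, n_ops)


for n_ops in (10, 100, 1_000, 10_000):
    kept = fraction_kept(0.01, n_ops)
    print(f"{n_ops:>6} operations: keep {kept:.3g} of runs, "
          f"about {expected_attempts(0.01, n_ops):.3g} attempts per usable result")
```

The kept fraction shrinks exponentially with the number of operations, so a calculation that is a thousand times longer is not a thousand times harder to get through, but astronomically harder.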

In these experiments, however, the process of correcting the error—taking an entirely new atom and setting it into the appropriate state—was also error-prone. So, while it could be done, it ended up having an overall error rate that was intermediate between the approach of catching and discarding errors and the rate when operations were done directly on the hardware.

In the end, the current hardware has an error rate that’s good enough that error correction actually improves the probability that a set of operations can be performed without producing an error. But not good enough that we can perform the sort of complex operations that would lead quantum computers to have an advantage in useful calculations. And that’s not just true for Atom’s hardware; similar things can be said for other error-correction demonstrations done on different machines.

There are two ways to go beyond these current limits. One is simply to improve the error rates of the hardware qubits further, as fewer total errors make it more likely that we can catch and correct them. The second is to increase the qubit counts so that we can host larger, more robust logical qubits. We’re obviously going to need to do both, and Atom’s partnership with Microsoft was formed in the hope that it will help both companies get there faster.




IBM boosts the amount of computation you can get done on quantum hardware

By making small adjustments to the frequency that the qubits are operating at, it’s possible to avoid these problems. This can be done when the Heron chip is being calibrated before it’s opened for general use.

Separately, the company has done a rewrite of the software that controls the system during operations. “After learning from the community, seeing how to run larger circuits, [we were able to] almost better define what it should be and rewrite the whole stack towards that,” Gambetta said. The result is a dramatic speed-up. “Something that took 122 hours now is down to a couple of hours,” he told Ars.

Since people are paying for time on this hardware, that’s good for customers now. However, it could also pay off in the longer run, as some errors can occur randomly, so less time spent on a calculation can mean fewer errors.

Deeper computations

Despite all those improvements, errors are still likely during any significant calculations. While it continues to work toward developing error-corrected qubits, IBM is focusing on what it calls error mitigation, which it first detailed last year. As we described it then:

“The researchers turned to a method where they intentionally amplified and then measured the processor’s noise at different levels. These measurements are used to estimate a function that produces similar output to the actual measurements. That function can then have its noise set to zero to produce an estimate of what the processor would do without any noise at all.”

The problem here is that using the function is computationally difficult, and the difficulty increases with the qubit count. So, while it’s still easier to do error mitigation calculations than simulate the quantum computer’s behavior on the same hardware, there’s still the risk of it becoming computationally intractable. But IBM has taken the time to optimize that, too. “They’ve got algorithmic improvements, and the method that uses tensor methods [now] uses the GPU,” Gambetta told Ars. “So I think it’s a combination of both.”
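The basic idea of amplifying noise and extrapolating back to zero can be sketched in a few lines. This is a deliberately simplified stand-in: it fits a plain polynomial to made-up data from a hypothetical exponential-decay noise model, whereas IBM’s method involves a far more elaborate function that is costly to evaluate.

```python
import numpy as np


def noisy_expectation(ideal_value, noise_scale, decay=0.15):
    """Hypothetical device model: the signal decays exponentially as noise grows."""
    return ideal_value * np.exp(-decay * noise_scale)


def zero_noise_estimate(noise_scales, measured, degree=2):
    """Fit a polynomial to the measurements and evaluate it at zero noise."""
    coeffs = np.polyfit(noise_scales, measured, degree)
    return float(np.polyval(coeffs, 0.0))


# A scale of 1.0 is the processor's native noise; larger values are amplified noise.
noise_scales = np.array([1.0, 1.5, 2.0, 3.0])
measured = noisy_expectation(1.0, noise_scales)

estimate = zero_noise_estimate(noise_scales, measured)
print("measured at native noise:", round(float(measured[0]), 3))
print("extrapolated to zero noise:", round(estimate, 3))
```

In this toy case the true noise-free value is 1.0, and the extrapolated estimate lands much closer to it than the raw measurement at native noise does.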



Google identifies low noise “phase transition” in its quantum processor


Noisy, but not that noisy

Benchmark may help us understand how quantum computers can operate with low error.

Image of a chip above iridescent wiring.

Google’s Sycamore processor. Credit: Google

Back in 2019, Google made waves by claiming it had achieved what has been called “quantum supremacy”—the ability of a quantum computer to perform operations that would take a wildly impractical amount of time to simulate on standard computing hardware. That claim proved to be controversial, in that the operations were little more than a benchmark that involved getting the quantum computer to behave like a quantum computer; separately, improved ideas about how to perform the simulation on a supercomputer cut the time required down significantly.

But Google is back with a new exploration of the benchmark, described in a paper published in Nature on Wednesday. It uses the benchmark to identify what it calls a phase transition in the performance of its quantum processor and uses it to identify conditions where the processor can operate with low noise. Taking advantage of that, they again show that, even giving classical hardware every potential advantage, it would take a supercomputer a dozen years to simulate things.

Cross entropy benchmarking

The benchmark in question involves the performance of what are called quantum random circuits, which involve performing a set of operations on qubits and letting the state of the system evolve over time, so that the output depends heavily on the stochastic nature of measurement outcomes in quantum mechanics. Each qubit will have a probability of producing one of two results, but unless that probability is one, there’s no way of knowing which of the results you’ll actually get. As a result, the output of the operations will be a string of truly random bits.

If enough qubits are involved in the operations, then it becomes increasingly difficult to simulate the performance of a quantum random circuit on classical hardware. That difficulty is what Google originally used to claim quantum supremacy.

The big challenge with running quantum random circuits on today’s hardware is the inevitability of errors. And there’s a specific approach, called cross-entropy benchmarking, that relates the performance of quantum random circuits to the overall fidelity of the hardware (meaning its ability to perform error-free operations).
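In its commonly used linear form, the cross-entropy estimate is F = 2^n × (average ideal probability of the bit strings the device actually produced) − 1, which comes out near 1 for a perfect device and near 0 for pure noise. The sketch below illustrates that under loud assumptions: a random state vector stands in for a simulated random circuit, and the "device" is a toy model that returns an ideal sample with some probability and a uniformly random bit string otherwise.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n_qubits = 16
dim = 2 ** n_qubits

# Stand-in for the ideal output of a random circuit: a random complex state.
amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
ideal_probs = np.abs(amplitudes) ** 2
ideal_probs /= ideal_probs.sum()


def linear_xeb(samples, ideal_probs):
    """Linear cross-entropy fidelity estimate from measured bit strings."""
    return len(ideal_probs) * ideal_probs[samples].mean() - 1.0


def sample_device(fidelity, shots):
    """Toy noisy device: ideal sample with probability `fidelity`, else uniform noise."""
    ideal = rng.choice(dim, size=shots, p=ideal_probs)
    noise = rng.integers(dim, size=shots)
    use_ideal = rng.random(shots) < fidelity
    return np.where(use_ideal, ideal, noise)


for fidelity in (1.0, 0.5, 0.0):
    estimate = linear_xeb(sample_device(fidelity, 100_000), ideal_probs)
    print(f"true fidelity {fidelity:.1f} -> estimated {estimate:.2f}")
```

The estimate tracks the fidelity of the toy device, which is the property that lets the benchmark serve as a measure of how error-free the hardware is.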

Google Principal Scientist Sergio Boixo likened performing quantum random circuits to a race between trying to build the circuit and errors that would destroy it. “In essence, this is a competition between quantum correlations spreading because you’re entangling, and random circuits entangle as fast as possible,” he told Ars. “We use two qubit gates that entangle as fast as possible. So it’s a competition between correlations or entanglement growing as fast as you want. On the other hand, noise is doing the opposite. Noise is killing correlations, it’s killing the growth of correlations. So these are the two tendencies.”

The focus of the paper is using the cross-entropy benchmark to explore the errors that occur on the company’s latest generation of Sycamore chip and use that to identify the transition point between situations where errors dominate and what the paper terms a “low noise regime,” where the probability of errors is minimized and entanglement wins the race. The researchers likened this to a phase transition between two states.

Low noise performance

The researchers used a number of methods to identify the location of this phase transition, including numerical estimates of the system’s behavior and experiments using the Sycamore processor. Boixo explained that the transition point is related to the errors per cycle, with each cycle involving performing an operation on all of the qubits involved. So, the total number of qubits being used influences the location of the transition, since more qubits means more operations to perform. But so does the overall error rate on the processor.

If you want to operate in the low noise regime, then you have to limit the number of qubits involved (which has the side effect of making things easier to simulate on classical hardware). The only way to add more qubits is to lower the error rate. While the Sycamore processor itself had a well-understood minimal error rate, Google could artificially increase that error rate and then gradually lower it to explore Sycamore’s behavior at the transition point.

The low noise regime wasn’t error free; each operation still has the potential for error, and qubits will sometimes lose their state even when sitting around doing nothing. But this error rate could be estimated using the cross-entropy benchmark to explore the system’s overall fidelity. That wasn’t the case beyond the transition point, where errors occurred quickly enough that they would interrupt the entanglement process.

When this occurs, the result is often two separate, smaller entangled systems, each of which was subject to the Sycamore chip’s base error rates. The researchers simulated this by creating two distinct clusters of entangled qubits that could be entangled with each other by a single operation, allowing them to turn entanglement on and off at will. They showed that this behavior allowed a classical computer to spoof the overall behavior by breaking the computation up into two manageable chunks.

Ultimately, they used their characterization of the phase transition to identify the maximum number of qubits they could keep in the low noise regime given the Sycamore processor’s base error rate and then performed a million random circuits on them. While this is relatively easy to do on quantum hardware, even assuming that we could build a supercomputer without bandwidth constraints, simulating it would take roughly 10,000 years on an existing supercomputer (the Frontier system). Allowing all of the system’s storage to operate as secondary memory cut the estimate down to 12 years.

What does this tell us?

Boixo emphasized that the value of the work isn’t really based on the value of performing random quantum circuits. Truly random bit strings might be useful in some contexts, but he emphasized that the real benefit here is a better understanding of the noise level that can be tolerated in quantum algorithms more generally. Since this benchmark is designed to make it as easy as possible to outperform classical computations, you would need to beat the best standard computers here to have any hope of beating them to the answer for more complicated problems.

“Before you can do any other application, you need to win on this benchmark,” Boixo said. “If you are not winning on this benchmark, then you’re not winning on any other benchmark. This is the easiest thing for a noisy quantum computer compared to a supercomputer.”

Knowing how to identify this phase transition, he suggested, will also be helpful for anyone trying to run useful computations on today’s processors. “As we define the phase, it opens the possibility for finding applications in that phase on noisy quantum computers, where they will outperform classical computers,” Boixo said.

Implicit in this argument is an indication of why Google has focused on iterating on a single processor design even as many of its competitors have been pushing to increase qubit counts rapidly. If this benchmark indicates that you can’t get all of Sycamore’s qubits involved in the simplest low-noise regime calculation, then it’s not clear whether there’s a lot of value in increasing the qubit count. And the only way to change that is to lower the base error rate of the processor, so that’s where the company’s focus has been.

All of that, however, assumes that you hope to run useful calculations on today’s noisy hardware qubits. The alternative is to use error-corrected logical qubits, which will require major increases in qubit count. But Google has been seeing similar limitations due to Sycamore’s base error rate in tests that used it to host an error-corrected logical qubit, something we hope to return to in future coverage.

Nature, 2024. DOI: 10.1038/s41586-024-07998-6




IBM opens its quantum-computing stack to third parties

Image of a large collection of copper-colored metal plates and wires, all surrounding a small, black piece of silicon.

The small quantum processor (center) surrounded by cables that carry microwave signals to it, and the refrigeration hardware.

As we described earlier this year, operating a quantum computer will require a significant investment in classical computing resources, given the number of measurements and control operations that need to be executed and interpreted. That means that operating a quantum computer will also require a software stack to control and interpret the flow of information from the quantum side.

But software also gets involved well before anything gets executed. While it’s possible to execute algorithms on quantum hardware by defining the full set of commands sent to the hardware, most users are going to want to focus on algorithm development, rather than the details of controlling any single piece of quantum hardware. “If everyone’s got to get down and know what the noise is, [use] performance management tools, they’ve got to know how to compile a quantum circuit through hardware, you’ve got to become an expert in too much to be able to do the algorithm discovery,” said IBM’s Jay Gambetta. So, part of the software stack that companies are developing to control their quantum hardware includes software that converts abstract representations of quantum algorithms into the series of commands needed to execute them.

IBM’s version of this software is called Qiskit (although it was made open source and has since been adopted by other companies). Recently, IBM made a couple of announcements regarding Qiskit, both benchmarking it in comparison to other software stacks and opening it up to third-party modules. We’ll take a look at what software stacks do before getting into the details of what’s new.

What’s the software stack do?

It’s tempting to view IBM’s Qiskit as the equivalent of a compiler. And at the most basic level, that’s a reasonable analogy, in that it takes algorithms defined by humans and converts them to things that can be executed by hardware. But there are significant differences in the details. A compiler for a classical computer produces code that the computer’s processor converts to internal instructions that are used to configure the processor hardware and execute operations.

Even when using what’s termed “machine language,” programmers don’t directly control the hardware; programmers have no control over where on the hardware things are executed (i.e., which processor or execution unit within that processor), or even the order instructions are executed in.

Things are very different for quantum computers, at least at present. For starters, everything that happens on the processor is controlled by external hardware, which typically acts by generating a series of laser or microwave pulses. So, software like IBM’s Qiskit or Microsoft’s Q# acts by converting the code it’s given into commands that are sent to hardware that’s external to the processor.

These “compilers” must also keep track of exactly which part of the processor things are happening on. Quantum computers act by performing specific operations (called gates) on individual qubits or pairs of qubits; to do that, you have to know exactly which qubit you’re addressing. And, for things like superconducting qubits, where there can be device-to-device variations, which hardware qubits you end up using can have a significant effect on the outcome of the calculations.

As a result, most things like Qiskit provide the option of directly addressing the hardware. If a programmer chooses not to, however, the software can transform generic instructions into a precise series of actions that will execute whatever algorithm has been encoded. That involves the software stack making choices about which physical qubits to use, what gates and measurements to execute, and what order to execute them in.
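To give a flavor of one such choice, here is a toy sketch of picking physical qubits for a tiny program. It is not Qiskit’s API or algorithm: the five-qubit device, its error figures, and the brute-force search are all hypothetical, and real software stacks use heuristics and can also reroute operations between qubits that aren’t directly connected.

```python
from itertools import permutations

# Hypothetical calibration data: two-qubit gate error for each connected pair
# on an imaginary five-qubit device (not measurements from real hardware).
two_qubit_error = {
    (0, 1): 0.012,
    (1, 2): 0.006,
    (2, 3): 0.009,
    (3, 4): 0.020,
    (1, 3): 0.007,
}


def edge_error(a, b):
    """Gate error between physical qubits a and b, or None if they aren't connected."""
    return two_qubit_error.get((a, b), two_qubit_error.get((b, a)))


def best_layout(logical_gates, n_logical, n_physical):
    """Try every assignment of logical to physical qubits; keep the lowest total error."""
    best = None
    for layout in permutations(range(n_physical), n_logical):
        total = 0.0
        for a, b in logical_gates:
            err = edge_error(layout[a], layout[b])
            if err is None:
                break  # this assignment needs a gate between unconnected qubits
            total += err
        else:
            if best is None or total < best[0]:
                best = (total, layout)
    return best


# An abstract three-qubit program: gates between logical qubits 0-1 and 1-2.
cost, layout = best_layout([(0, 1), (1, 2)], n_logical=3, n_physical=5)
print("logical -> physical:", dict(enumerate(layout)), "| summed gate error:", round(cost, 3))
```

Even in this tiny example, the best assignment avoids the device’s two noisiest connections entirely, which hints at why the choice of physical qubits matters for the final result.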

The role of the software stack, however, is likely to expand considerably over the next few years. A number of companies are experimenting with hardware qubit designs that can flag when one type of common error occurs, and there has been progress with developing logical qubits that enable error correction. Ultimately, any company providing access to quantum computers will want to modify its software stack so that these features are enabled without requiring effort on the part of the people designing the algorithms.



We can now watch Grace Hopper’s famed 1982 lecture on YouTube

Amazing Grace

The lecture featured Hopper discussing future challenges of protecting information.

Rear Admiral Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part One, 1982).

The late Rear Admiral Grace Hopper was a gifted mathematician and undisputed pioneer in computer programming, honored posthumously in 2016 with the Presidential Medal of Freedom. She was also very much in demand as a speaker in her later career. Hopper’s famous 1982 lecture on “Future Possibilities: Data, Hardware, Software, and People,” has long been publicly unavailable because of the obsolete media on which it was recorded. The National Archives and Records Administration (NARA) finally managed to retrieve the footage for the National Security Agency (NSA), which posted the lecture in two parts on YouTube (Part One embedded above, Part Two embedded below).

Hopper earned undergraduate degrees in math and physics from Vassar College and a PhD in math from Yale in 1934. She returned to Vassar as a professor, but when World War II broke out, she sought to enlist in the US Naval Reserve. She was initially denied on the basis of her age (34) and low weight-to-height ratio, and also because her expertise elsewhere made her particularly valuable to the war effort. Hopper got an exemption, and after graduating first in her class, she joined the Bureau of Ships Computation Project at Harvard University, where she served on the Mark I computer programming staff under Howard H. Aiken.

She stayed with the lab until 1949 and was next hired as a senior mathematician by Eckert-Mauchly Computer Corporation to develop the Universal Automatic Computer, or UNIVAC, the first commercial computer produced in the United States. Hopper championed the development of a new programming language based on English words. “It’s much easier for most people to write an English statement than it is to use symbols,” she reasoned. “So I decided data processors ought to be able to write their programs in English and the computers would translate them into machine code.”

Her superiors were skeptical, but Hopper persisted, publishing papers on what became known as compilers. When Remington Rand took over the company, she created her first A-0 compiler. This early achievement would one day lead to the development of COBOL, which is still widely used for data processing today.

“Grandma COBOL”

In November 1952, the UNIVAC was introduced to America by CBS news anchor Walter Cronkite as the presidential election results rolled in. Hopper and the rest of her team had worked tirelessly to input voting statistics from earlier elections and write the code that would allow the calculator to extrapolate the election results based on previous races. National pollsters predicted Adlai Stevenson II would win, while the UNIVAC group predicted a landslide for Dwight D. Eisenhower. UNIVAC’s prediction proved to be correct: Eisenhower won over 55 percent of the popular vote with an electoral margin of 442 to 89.  

Hopper retired at age 60 from the Naval Reserve in 1966 with the rank of commander but was subsequently recalled to active duty for many more years, thanks to congressional special approval allowing her to remain beyond the mandatory retirement age. She was promoted to commodore in 1983, a rank that was renamed “rear admiral” two years later, and Rear Admiral Grace Hopper finally retired permanently in 1986. But she didn’t stop working: She became a senior consultant to Digital Equipment Corporation and “goodwill ambassador,” giving public lectures at various computer-related events.

One of Hopper’s best-known lectures was delivered to NSA employees in August 1982. According to a National Security Agency press release, the footage had been preserved in a defunct media format—specifically, two 1-inch AMPEX tapes. The agency asked NARA to retrieve that footage and digitize it for public release, and NARA did so. The NSA described it as “one of the more unique public proactive transparency record releases… to date.”

Hopper was a very popular speaker not just because of her pioneering contributions to computing, but because she was a natural raconteur, telling entertaining and often irreverent war stories from her early days. And she spoke plainly, as evidenced in the 1982 lecture when she drew an analogy between using pairs of oxen to move large logs in the days before large tractors, and pairing computers to get more computer power rather than just getting a bigger computer, “which of course is what common sense would have told us to begin with.” For those who love the history of computers and computation, the full lecture is very much worth the time.

Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part Two, 1982).

Listing image by Lynn Gilbert/CC BY-SA 4.0



People game AIs via game theory

Games inside games —

They reject more of the AI’s offers, probably to get it to be more generous.

A judge's gavel near a pile of small change.

In the experiments, people had to judge what constituted a fair monetary offer.

In many cases, AIs are trained on material that’s either made or curated by humans. As a result, it can become a significant challenge to keep the AI from replicating the biases of those humans and the society they belong to. And the stakes are high, given we’re using AIs to make medical and financial decisions.

But some researchers at Washington University in St. Louis have found an additional wrinkle in these challenges: The people doing the training may potentially change their behavior when they know it can influence the future choices made by an AI. And, in at least some cases, they carry the changed behaviors into situations that don’t involve AI training.

Would you like to play a game?

The work involved getting volunteers to play the ultimatum game, a simple experiment drawn from game theory. Testers gave two participants a pot of money—$10, in this case. One of the two was then asked to offer some fraction of that money to the other, who could choose to accept or reject the offer. If the offer was rejected, nobody got any money.

From a purely rational economic perspective, people should accept anything they’re offered, since they’ll end up with more money than they would have otherwise. But in reality, people tend to reject offers that deviate too much from a 50/50 split, as they have a sense that a highly imbalanced split is unfair. Their rejection allows them to punish the person who made the unfair offer. While there are some cultural differences in terms of where the split becomes unfair, this effect has been replicated many times, including in the current work.
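The payoff structure of the game is simple enough to sketch in a few lines of Python. This is only an illustration, not the researchers' experimental code: the `Responder` class, its `min_acceptable_share` parameter, and the 40 percent threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass

POT = 10  # dollars in the pot, as in the experiments described above


@dataclass
class Responder:
    # Hypothetical parameter: the smallest share of the pot (0 to 1) this
    # responder will accept. 0.0 models the "purely rational" responder.
    min_acceptable_share: float

    def accepts(self, offer: float) -> bool:
        return offer / POT >= self.min_acceptable_share


def play_round(offer: float, responder: Responder) -> tuple[float, float]:
    """Return (proposer_payout, responder_payout) for a single round."""
    if not 0 <= offer <= POT:
        raise ValueError("offer must be between 0 and the pot")
    if responder.accepts(offer):
        return POT - offer, offer
    return 0.0, 0.0  # a rejection leaves both players with nothing


rational = Responder(min_acceptable_share=0.0)
fairness_minded = Responder(min_acceptable_share=0.4)  # assumed threshold

for offer in (5, 4, 3, 1):
    print(offer, play_round(offer, rational), play_round(offer, fairness_minded))
```

The rational responder walks away with something in every round, while the fairness-minded one gives up the $3 and $1 offers in order to leave the proposer with nothing as well.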

The twist with the new work, performed by Lauren Treiman, Chien-Ju Ho, and Wouter Kool, is that they told some of the participants that their partner was an AI, and the results of their interactions with it would be fed back into the system to train its future performance.

This takes something that’s implicit in a purely game-theory-focused setup (that rejecting offers can help partners figure out what sorts of offers are fair) and makes it highly explicit. Participants in the experimental group, who were told they were training an AI, could readily infer that their actions would influence the AI’s future offers.

The question the researchers were curious about was whether this would influence the behavior of the human participants. They compared this to the behavior of a control group who just participated in the standard game theory test.

Training fairness

Treiman, Ho, and Kool had pre-registered a number of multivariate analyses that they planned to perform with the data. But these didn’t always produce consistent results between experiments, possibly because there weren’t enough participants to tease out relatively subtle effects with any statistical confidence and possibly because the relatively large number of tests would mean that a few positive results would turn up by chance.

So, we’ll focus on the simplest question that was addressed: Did being told that you were training an AI alter someone’s behavior? This question was asked through a number of experiments that were very similar. (One of the key differences between them was whether the information regarding AI training was displayed with a camera icon, since people will sometimes change their behavior if they’re aware they’re being observed.)

The answer to the question is a clear yes: people will in fact change their behavior when they think they’re training an AI. Through a number of experiments, participants were more likely to reject unfair offers if they were told that their sessions would be used to train an AI. In a few of the experiments, they were also more likely to reject what were considered fair offers (in US populations, the rejection rate goes up dramatically once someone proposes a 70/30 split, meaning $7 goes to the person making the proposal in these experiments). The researchers suspect this is due to people being more likely to reject borderline “fair” offers such as a 60/40 split.

This happened even though rejecting any offer exacts an economic cost on the participants. And people persisted in this behavior even when they were told that they wouldn’t ever interact with the AI after training was complete, meaning they wouldn’t personally benefit from any changes in the AI’s behavior. So here, it appeared that people would make a financial sacrifice to train the AI in a way that would benefit others.

Strikingly, in two of the three experiments that did follow-up testing, participants continued to reject offers at a higher rate two days after their participation in the AI training, even when they were told that their actions were no longer being used to train the AI. So, to some extent, participating in AI training seems to have caused them to train themselves to behave differently.

Obviously, this won’t affect every sort of AI training, and a lot of the work that goes into producing material that’s used in training something like a Large Language Model won’t have been done with any awareness that it might be used to train an AI. Still, there are plenty of cases where humans do get more directly involved in training, so it’s worthwhile being aware that this is another route that can allow biases to creep in.

PNAS, 2024. DOI: 10.1073/pnas.2408731121



Lightening the load: AI helps exoskeleton work with different strides

One model to rule them all —

A model trained in a virtual environment does remarkably well in the real world.

Image of two people using powered exoskeletons to move heavy items around, as seen in the movie Aliens.

Right now, the software doesn’t do arms, so don’t go taking on any aliens with it.

20th Century Fox

Exoskeletons today look like something straight out of sci-fi. But the reality is they are nowhere near as robust as their fictional counterparts. They’re quite wobbly, and it takes long hours of handcrafting software policies, which regulate how they work—a process that has to be repeated for each individual user.

To bring the technology a bit closer to Avatar’s Skel Suits or Warhammer 40k power armor, a team at North Carolina State University’s Lab of Biomechatronics and Intelligent Robotics used AI to build the first one-size-fits-all exoskeleton that supports walking, running, and stair-climbing. Critically, its software adapts itself to new users with no need for any user-specific adjustments. “You just wear it and it works,” says Hao Su, an associate professor and co-author of the study.

Tailor-made robots

An exoskeleton is a robot you wear to aid your movements—it makes walking, running, and other activities less taxing, the same way an e-bike adds extra watts on top of those you generate yourself, making pedaling easier. “The problem is, exoskeletons have a hard time understanding human intentions, whether you want to run or walk or climb stairs. It’s solved with locomotion recognition: systems that recognize human locomotion intentions,” says Su.

Building those locomotion recognition systems currently relies on elaborate policies that define what actuators in an exoskeleton need to do in each possible scenario. “Let’s take walking. The current state of the art is we put the exoskeleton on you and you walk on a treadmill for an hour. Based on that, we try to adjust its operation to your individual set of movements,” Su explains.

Building handcrafted control policies and doing long human trials for each user makes exoskeletons super expensive, with prices reaching $200,000 or more. So, Su’s team used AI to automatically generate control policies and eliminate human training. “I think within two or three years, exoskeletons priced between $2,000 and $5,000 will be absolutely doable,” Su claims.

His team hopes these savings will come from developing the exoskeleton control policy using a digital model, rather than living, breathing humans.

Digitizing robo-aided humans

Su’s team started by building digital models of a human musculoskeletal system and an exoskeleton robot. Then they used multiple neural networks that operated each component. One was running the digitized model of a human skeleton, moved by simplified muscles. The second neural network was running the exoskeleton model. Finally, the third neural net was responsible for imitating motion—basically predicting how a human model would move wearing the exoskeleton and how the two would interact with each other. “We trained all three neural networks simultaneously to minimize muscle activity,” says Su.

One problem the team faced is that exoskeleton studies typically use a performance metric based on metabolic rate reduction. “Humans, though, are incredibly complex, and it is very hard to build a model with enough fidelity to accurately simulate metabolism,” Su explains. Luckily, according to the team, reducing muscle activations is rather tightly correlated with metabolic rate reduction, so it kept the digital model’s complexity within reasonable limits. The training of the entire human-exoskeleton system with all three neural networks took roughly eight hours on a single RTX 3090 GPU. And the results were record-breaking.

Bridging the sim-to-real gap

After the neural networks developed the controllers for the digital exoskeleton model in simulation, Su’s team simply copy-pasted the control policy to a real controller running a real exoskeleton. Then, they tested how an exoskeleton trained this way would work with 20 different participants. The average metabolic rate reduction was over 24 percent in walking, over 13 percent in running, and 15.4 percent in stair climbing. All were record numbers, meaning their exoskeleton beat every other exoskeleton ever made in each category.

This was achieved without needing any tweaks to fit it to individual gaits. But the neural networks’ magic didn’t end there.

“The problem with traditional, handcrafted policies was that it was just telling it ‘if walking is detected do one thing; if walking faster is detected do another thing.’ These were [a mix of] finite state machines and switch controllers. We introduced end-to-end continuous control,” says Su. What this continuous control meant was that the exoskeleton could follow the human body as it made smooth transitions between different activities—from walking to running, from running to climbing stairs, etc. There was no abrupt mode switching.
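The difference between the two approaches can be sketched with a toy example. Nothing below comes from the team's software: the assist levels, the 2 m/s threshold, and the use of walking speed as the only input are all assumptions, and a simple logistic blend stands in for the trained neural network.

```python
import math

# Hypothetical assist levels (fraction of peak torque) for each discrete mode.
MODE_ASSIST = {"walk": 0.3, "run": 0.6}
RUN_THRESHOLD = 2.0  # m/s, assumed boundary between walking and running


def switch_controller(speed: float) -> float:
    """Handcrafted style: classify the activity, then look up a fixed output."""
    mode = "run" if speed >= RUN_THRESHOLD else "walk"
    return MODE_ASSIST[mode]


def continuous_controller(speed: float) -> float:
    """Continuous style: the output varies smoothly with the input.

    A logistic blend stands in for the trained neural network here.
    """
    blend = 1.0 / (1.0 + math.exp(-4.0 * (speed - RUN_THRESHOLD)))
    return MODE_ASSIST["walk"] + blend * (MODE_ASSIST["run"] - MODE_ASSIST["walk"])


def largest_jump(controller, speeds) -> float:
    """Largest change in output between two neighboring speeds in the sweep."""
    outputs = [controller(s) for s in speeds]
    return max(abs(b - a) for a, b in zip(outputs, outputs[1:]))


speeds = [1.0 + 0.01 * i for i in range(201)]  # sweep from 1.0 to 3.0 m/s
print("switch controller:    ", largest_jump(switch_controller, speeds))
print("continuous controller:", largest_jump(continuous_controller, speeds))
```

In this sketch, the switch controller's output changes by the full gap between the two modes in a single step at the threshold, while the continuous controller only ever changes by a small amount from one speed to the next.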

“In terms of software, I think everyone will be using this neural network-based approach soon,” Su claims. To improve the exoskeletons in the future, his team wants to make them quieter, lighter, and more comfortable.

But the plan is also to make them work for people who need them the most. “The limitation now is that we tested these exoskeletons with able-bodied participants, not people with gait impairments. So, what we want to do is something they did in another exoskeleton study at Stanford University. We would take a one-minute video of you walking, and based on that, we would build a model to individualize our general model. This should work well for people with impairments like knee arthritis,” Su claims.

Nature, 2024. DOI: 10.1038/s41586-024-07382-4



Researchers describe how to tell if ChatGPT is confabulating


Aurich Lawson | Getty Images

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLMs synthesize a plausible-sounding answer that is likely incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This takes all the statistically likely answers generated by the LLM and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
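A rough sketch of the idea, using the Eiffel Tower example, might look like the following. It is not the researchers' implementation: the `normalize` function is a deliberately crude stand-in for a real test of whether two answers mean the same thing, and the sampled answers are invented for the illustration.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Toy stand-in for a meaning check: keep the final word, lowercased.

    This rule is only an assumption that happens to work for the short
    answers below; it is not how the researchers judge equivalence.
    """
    words = answer.lower().strip(" .!?").split()
    return words[-1] if words else ""


def entropy(labels) -> float:
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def naive_entropy(samples) -> float:
    # Every distinct string counts as a different outcome.
    return entropy(samples)


def semantic_entropy(samples) -> float:
    # Group answers that share a meaning before measuring uncertainty.
    return entropy(normalize(s) for s in samples)


# Invented samples: three phrasings of one answer vs. three different answers.
confident = ["Paris", "It's in Paris", "France's capital, Paris"]
confabulating = ["Paris", "Rome", "It's in Berlin"]

for name, samples in (("confident", confident), ("confabulating", confabulating)):
    print(name, round(naive_entropy(samples), 3), round(semantic_entropy(samples), 3))
```

Measured string by string, both sets of answers look equally uncertain. Once answers are grouped by meaning, the three phrasings of "Paris" collapse into a single cluster with no entropy, while the three different cities remain spread out.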
