Computer science


Google identifies low noise “phase transition” in its quantum processor


Noisy, but not that noisy

Benchmark may help us understand how quantum computers can operate with low error.

Google’s Sycamore processor. Credit: Google

Back in 2019, Google made waves by claiming it had achieved what has been called “quantum supremacy”—the ability of a quantum computer to perform operations that would take a wildly impractical amount of time to simulate on standard computing hardware. That claim proved to be controversial, in that the operations were little more than a benchmark that involved getting the quantum computer to behave like a quantum computer; separately, improved ideas about how to perform the simulation on a supercomputer cut the time required down significantly.

But Google is back with a new exploration of the benchmark, described in a paper published in Nature on Wednesday. The company uses the benchmark to identify what it calls a phase transition in the performance of its quantum processor and to pin down the conditions under which the processor can operate with low noise. Taking advantage of that, the researchers again show that, even when classical hardware is given every potential advantage, simulating the quantum computer’s output would take a supercomputer roughly a dozen years.

Cross entropy benchmarking

The benchmark in question involves the performance of what are called quantum random circuits, which involve performing a set of operations on qubits and letting the state of the system evolve over time, so that the output depends heavily on the stochastic nature of measurement outcomes in quantum mechanics. Each qubit has a probability of producing one of two results, but unless that probability is one, there’s no way of knowing which result you’ll actually get. As a result, the output of the operations will be a string of truly random bits.

If enough qubits are involved in the operations, then it becomes increasingly difficult to simulate the performance of a quantum random circuit on classical hardware. That difficulty is what Google originally used to claim quantum supremacy.

The big challenge with running quantum random circuits on today’s hardware is the inevitability of errors. And there’s a specific approach, called cross-entropy benchmarking, that relates the performance of quantum random circuits to the overall fidelity of the hardware (meaning its ability to perform error-free operations).
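To make the benchmark concrete, here is a minimal Python sketch (using Qiskit purely for illustration; this is not Google's benchmark code) that builds a small random circuit, samples its output, and estimates the linear cross-entropy fidelity. The qubit count, circuit depth, and shot count are arbitrary assumptions, and a real experiment would compare samples from noisy hardware against ideal probabilities computed classically.

```python
# A toy linear-XEB estimate on a small random circuit (illustrative values
# throughout; a real benchmark compares noisy hardware samples against
# classically computed ideal probabilities).
from qiskit.circuit.random import random_circuit
from qiskit.quantum_info import Statevector

n_qubits, depth, shots = 5, 8, 2000

# Build a random circuit small enough that its ideal output distribution can
# still be computed exactly.
circuit = random_circuit(n_qubits, depth, measure=False, seed=42)
ideal = Statevector(circuit)
ideal_probs = ideal.probabilities_dict()      # bitstring -> ideal probability

# Sample bitstrings the way a (here, noiseless) processor would emit them.
counts = ideal.sample_counts(shots)

# Linear cross-entropy fidelity: F = 2^n * <P_ideal(sampled bitstring)> - 1.
# Noiseless samples give a value near 1; uniformly random (fully decohered)
# output drives it toward 0.
mean_p = sum(ideal_probs.get(bits, 0.0) * count for bits, count in counts.items()) / shots
f_xeb = 2**n_qubits * mean_p - 1
print(f"Estimated linear XEB fidelity: {f_xeb:.3f}")
```

The same estimate computed on samples from a noisy processor lands somewhere between 0 and 1, which is why it can stand in for the hardware's overall fidelity.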

Google Principal Scientist Sergio Boixo likened performing quantum random circuits to a race between trying to build the circuit and errors that would destroy it. “In essence, this is a competition between quantum correlations spreading because you’re entangling, and random circuits entangle as fast as possible,” he told Ars. “We use two qubit gates that entangle as fast as possible. So it’s a competition between correlations or entanglement growing as fast as you want. On the other hand, noise is doing the opposite. Noise is killing correlations, it’s killing the growth of correlations. So these are the two tendencies.”

The focus of the paper is using the cross-entropy benchmark to explore the errors that occur on the latest generation of the company’s Sycamore chip and to identify the transition point between situations where errors dominate and what the paper terms a “low noise regime,” where the probability of errors is minimized—where entanglement wins the race. The researchers likened this to a phase transition between two states.

Low noise performance

The researchers used a number of methods to identify the location of this phase transition, including numerical estimates of the system’s behavior and experiments using the Sycamore processor. Boixo explained that the transition point is related to the number of errors per cycle, with each cycle consisting of an operation performed on all of the qubits in use. So the total number of qubits being used influences the location of the transition, since more qubits means more operations to perform. But so does the overall error rate on the processor.

If you want to operate in the low noise regime, then you have to limit the number of qubits involved (which has the side effect of making things easier to simulate on classical hardware). The only way to add more qubits is to lower the error rate. While the Sycamore processor itself had a well-understood minimal error rate, Google could artificially increase that error rate and then gradually lower it to explore Sycamore’s behavior at the transition point.

The low noise regime wasn’t error free; each operation still has the potential for error, and qubits will sometimes lose their state even when sitting around doing nothing. But this error rate could be estimated using the cross-entropy benchmark to explore the system’s overall fidelity. That wasn’t the case beyond the transition point, where errors occurred quickly enough that they would interrupt the entanglement process.

When this occurs, the result is often two separate, smaller entangled systems, each of which was subject to the Sycamore chip’s base error rates. The researchers simulated this by creating two distinct clusters of entangled qubits that could be entangled with each other by a single operation, allowing them to turn entanglement on and off at will. They showed that this behavior allowed a classical computer to spoof the overall behavior by breaking the computation up into two manageable chunks.

Ultimately, they used their characterization of the phase transition to identify the maximum number of qubits they could keep in the low noise regime given the Sycamore processor’s base error rate, and then ran a million random circuits on them. While this is relatively easy to do on quantum hardware, simulating it on an existing supercomputer (the Frontier system) would take roughly 10,000 years, even assuming a machine without bandwidth constraints. Allowing all of the system’s storage to operate as secondary memory cut the estimate down to 12 years.

What does this tell us?

Boixo emphasized that the value of the work isn’t really based on the value of performing random quantum circuits. Truly random bit strings might be useful in some contexts, but he emphasized that the real benefit here is a better understanding of the noise level that can be tolerated in quantum algorithms more generally. Since this benchmark is designed to make it as easy as possible for a quantum computer to outperform classical computation, a processor that can’t win here has no hope of beating classical hardware on more complicated problems.

“Before you can do any other application, you need to win on this benchmark,” Boixo said. “If you are not winning on this benchmark, then you’re not winning on any other benchmark. This is the easiest thing for a noisy quantum computer compared to a supercomputer.”

Knowing how to identify this phase transition, he suggested, will also be helpful for anyone trying to run useful computations on today’s processors. “As we define the phase, it opens the possibility for finding applications in that phase on noisy quantum computers, where they will outperform classical computers,” Boixo said.

Implicit in this argument is an indication of why Google has focused on iterating on a single processor design even as many of its competitors have been pushing to increase qubit counts rapidly. If this benchmark indicates that you can’t get all of Sycamore’s qubits involved in the simplest low-noise regime calculation, then it’s not clear whether there’s a lot of value in increasing the qubit count. And the only way to change that is to lower the base error rate of the processor, so that’s where the company’s focus has been.

All of that, however, assumes that you hope to run useful calculations on today’s noisy hardware qubits. The alternative is to use error-corrected logical qubits, which will require major increases in qubit count. But Google has been seeing similar limitations due to Sycamore’s base error rate in tests that used it to host an error-corrected logical qubit, something we hope to return to in future coverage.

Nature, 2024. DOI: 10.1038/s41586-024-07998-6  (About DOIs).




IBM opens its quantum-computing stack to third parties

The small quantum processor (center) surrounded by cables that carry microwave signals to it, and the refrigeration hardware.

As we described earlier this year, operating a quantum computer will require a significant investment in classical computing resources, given the number of measurement and control operations that need to be executed and interpreted. That means that operating a quantum computer will also require a software stack to control and interpret the flow of information from the quantum side.

But software also gets involved well before anything gets executed. While it’s possible to execute algorithms on quantum hardware by defining the full set of commands sent to the hardware, most users are going to want to focus on algorithm development, rather than the details of controlling any single piece of quantum hardware. “If everyone’s got to get down and know what the noise is, [use] performance management tools, they’ve got to know how to compile a quantum circuit through hardware, you’ve got to become an expert in too much to be able to do the algorithm discovery,” said IBM’s Jay Gambetta. So, part of the software stack that companies are developing to control their quantum hardware includes software that converts abstract representations of quantum algorithms into the series of commands needed to execute them.

IBM’s version of this software is called Qiskit (although it was made open source and has since been adopted by other companies). Recently, IBM made a couple of announcements regarding Qiskit, both benchmarking it in comparison to other software stacks and opening it up to third-party modules. We’ll take a look at what software stacks do before getting into the details of what’s new.

What’s the software stack do?

It’s tempting to view IBM’s Qiskit as the equivalent of a compiler. And at the most basic level, that’s a reasonable analogy, in that it takes algorithms defined by humans and converts them to things that can be executed by hardware. But there are significant differences in the details. A compiler for a classical computer produces code that the computer’s processor converts to internal instructions that are used to configure the processor hardware and execute operations.

Even when using what’s termed “machine language,” programmers don’t directly control the hardware; programmers have no control over where on the hardware things are executed (i.e., which processor or execution unit within that processor), or even the order instructions are executed in.

Things are very different for quantum computers, at least at present. For starters, everything that happens on the processor is controlled by external hardware, which typically acts by generating a series of laser or microwave pulses. So, software like IBM’s Qiskit or Microsoft’s Q# acts by converting the code it’s given into commands that are sent to hardware that’s external to the processor.

These “compilers” must also keep track of exactly which part of the processor things are happening on. Quantum computers act by performing specific operations (called gates) on individual or pairs of qubits; to do that, you have to know exactly which qubit you’re addressing. And, for things like superconducting qubits, where there can be device-to-device variations, which hardware qubits you end up using can have a significant effect on the outcome of the calculations.

As a result, tools like Qiskit provide the option of directly addressing the hardware. If a programmer chooses not to, however, the software can transform generic instructions into a precise series of actions that will execute whatever algorithm has been encoded. That involves the software stack making choices about which physical qubits to use, what gates and measurements to execute, and what order to execute them in.
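As a rough illustration of that hand-off, here is a short Qiskit sketch (not IBM's production tooling) that transpiles an abstract two-qubit circuit for a hypothetical device; the basis gates and coupling map below are made-up stand-ins for a real backend's constraints.

```python
# A rough sketch of the hand-off described above, using Qiskit (not IBM's
# production tooling). The basis gates and coupling map are made-up stand-ins
# for a real backend's constraints.
from qiskit import QuantumCircuit, transpile

# Abstract algorithm: prepare a Bell state and measure both qubits.
circuit = QuantumCircuit(2, 2)
circuit.h(0)
circuit.cx(0, 1)
circuit.measure([0, 1], [0, 1])

# Hypothetical hardware constraints: the gates the device runs natively and
# which pairs of physical qubits can interact directly.
basis_gates = ["rz", "sx", "x", "cx"]
coupling_map = [[0, 1], [1, 2], [2, 3]]

# The transpiler picks physical qubits, rewrites the gates into the native
# set, and fixes an execution order that respects the connectivity.
compiled = transpile(
    circuit,
    basis_gates=basis_gates,
    coupling_map=coupling_map,
    optimization_level=3,
)
print(compiled)
```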

The role of the software stack, however, is likely to expand considerably over the next few years. A number of companies are experimenting with hardware qubit designs that can flag when one type of common error occurs, and there has been progress with developing logical qubits that enable error correction. Ultimately, any company providing access to quantum computers will want to modify its software stack so that these features are enabled without requiring effort on the part of the people designing the algorithms.



We can now watch Grace Hopper’s famed 1982 lecture on YouTube

Amazing Grace —

The lecture featured Hopper discussing future challenges of protecting information.

Rear Admiral Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part One, 1982).

The late Rear Admiral Grace Hopper was a gifted mathematician and undisputed pioneer in computer programming, honored posthumously in 2016 with the Presidential Medal of Freedom. She was also very much in demand as a speaker in her later career. Hopper’s famous 1982 lecture on “Future Possibilities: Data, Hardware, Software, and People” has long been publicly unavailable because of the obsolete media on which it was recorded. The National Archives and Records Administration (NARA) finally managed to retrieve the footage for the National Security Agency (NSA), which posted the lecture in two parts on YouTube (Part One embedded above, Part Two embedded below).

Hopper earned undergraduate degrees in math and physics from Vassar College and a PhD in math from Yale in 1930. She returned to Vassar as a professor, but when World War II broke out, she sought to enlist in the US Naval Reserve. She was initially denied on the basis of her age (34) and low weight-to-height ratio, and also because her expertise elsewhere made her particularly valuable to the war effort. Hopper got an exemption, and after graduating first in her class, she joined the Bureau of Ships Computation Project at Harvard University, where she served on the Mark I computer programming staff under Howard H. Aiken.

She stayed with the lab until 1949 and was next hired as a senior mathematician by Eckert-Mauchly Computer Corporation to develop the Universal Automatic Computer, or UNIVAC, the first commercially produced electronic computer in the United States. Hopper championed the development of a new programming language based on English words. “It’s much easier for most people to write an English statement than it is to use symbols,” she reasoned. “So I decided data processors ought to be able to write their programs in English and the computers would translate them into machine code.”

Her superiors were skeptical, but Hopper persisted, publishing papers on what became known as compilers. When Remington Rand took over the company, she created her first compiler, A-0. This early achievement would eventually lead to the development of COBOL for data processing, a language that remains in widespread use today.

“Grandma COBOL”

In November 1952, the UNIVAC was introduced to America by CBS news anchor Walter Cronkite as the presidential election results rolled in. Hopper and the rest of her team had worked tirelessly to input voting statistics from earlier elections and write the code that would allow the computer to extrapolate the election results based on previous races. National pollsters predicted Adlai Stevenson II would win, while the UNIVAC group predicted a landslide for Dwight D. Eisenhower. UNIVAC’s prediction proved to be correct: Eisenhower won over 55 percent of the popular vote with an electoral margin of 442 to 89.

Hopper retired at age 60 from the Naval Reserve in 1966 with the rank of commander but was subsequently recalled to active duty for many more years, thanks to congressional special approval allowing her to remain beyond the mandatory retirement age. She was promoted to commodore in 1983, a rank that was renamed “rear admiral” two years later, and Rear Admiral Grace Hopper finally retired permanently in 1986. But she didn’t stop working: She became a senior consultant to Digital Equipment Corporation and “goodwill ambassador,” giving public lectures at various computer-related events.

One of Hopper’s best-known lectures was delivered to NSA employees in August 1982. According to a National Security Agency press release, the footage had been preserved in a defunct media format—specifically, two 1-inch AMPEX tapes. The agency asked NARA to retrieve that footage and digitize it for public release, and NARA did so. The NSA described it as “one of the more unique public proactive transparency record releases… to date.”

Hopper was a very popular speaker not just because of her pioneering contributions to computing, but because she was a natural raconteur, telling entertaining and often irreverent war stories from her early days. And she spoke plainly, as evidenced in the 1982 lecture when she drew an analogy between using pairs of oxen to move large logs in the days before large tractors, and pairing computers to get more computer power rather than just getting a bigger computer—”which of course is what common sense would have told us to begin with.” For those who love the history of computers and computation, the full lecture is very much worth the time.

Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part Two, 1982).

Listing image by Lynn Gilbert/CC BY-SA 4.0



People game AIs via game theory

Games inside games —

They reject more of the AI’s offers, probably to get it to be more generous.

In the experiments, people had to judge what constituted a fair monetary offer.

In many cases, AIs are trained on material that’s either made or curated by humans. As a result, it can become a significant challenge to keep the AI from replicating the biases of those humans and the society they belong to. And the stakes are high, given we’re using AIs to make medical and financial decisions.

But some researchers at Washington University in St. Louis have found an additional wrinkle in these challenges: The people doing the training may potentially change their behavior when they know it can influence the future choices made by an AI. And, in at least some cases, they carry the changed behaviors into situations that don’t involve AI training.

Would you like to play a game?

The work involved getting volunteers to participate in a simple game-theory experiment known as the ultimatum game. Testers gave two participants a pot of money—$10, in this case. One of the two was then asked to offer some fraction of that money to the other, who could choose to accept or reject the offer. If the offer was rejected, nobody got any money.

From a purely rational economic perspective, people should accept anything they’re offered, since they’ll end up with more money than they would have otherwise. But in reality, people tend to reject offers that deviate too much from a 50/50 split, as they have a sense that a highly imbalanced split is unfair. Their rejection allows them to punish the person who made the unfair offer. While there are some cultural differences in terms of where the split becomes unfair, this effect has been replicated many times, including in the current work.
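For a concrete sense of the payoff structure, here is a toy simulation of the game (not the researchers' experimental code); the offer values and the responder's fairness threshold are invented for illustration.

```python
# Toy ultimatum-game payoff simulation (not the researchers' code). A responder
# accepts an offer only if it clears a personal fairness threshold; otherwise
# both players get nothing. Offer values and thresholds are invented.
import random

POT = 10.0

def play_round(offer: float, threshold: float) -> tuple[float, float]:
    """Return (proposer payout, responder payout) for one round."""
    if offer >= threshold:
        return POT - offer, offer   # accepted: split the pot as proposed
    return 0.0, 0.0                 # rejected: nobody gets anything

random.seed(0)
offers = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# A "purely rational" responder accepts anything; a fairness-minded responder
# rejects lopsided splits (here, anything under $3 of the $10 pot), which
# costs money in the short run but punishes unfair proposers.
for label, threshold in [("accept anything", 0.0), ("reject unfair offers", 3.0)]:
    earned = sum(play_round(offer, threshold)[1] for offer in offers)
    print(f"{label}: responder earned ${earned:,.0f} over {len(offers)} rounds")
```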

The twist with the new work, performed by Lauren Treiman, Chien-Ju Ho, and Wouter Kool, is that they told some of the participants that their partner was an AI, and the results of their interactions with it would be fed back into the system to train its future performance.

This takes something that’s implicit in a purely game-theory-focused setup—that rejecting offers can help partners figure out what sorts of offers are fair—and makes it highly explicit. Participants, or at least the subset in the experimental group who were told they were training an AI, could readily infer that their actions would influence the AI’s future offers.

The question the researchers were curious about was whether this would influence the behavior of the human participants. They compared this to the behavior of a control group who just participated in the standard game theory test.

Training fairness

Treiman, Ho, and Kool had pre-registered a number of multivariate analyses that they planned to perform with the data. But these didn’t always produce consistent results between experiments, possibly because there weren’t enough participants to tease out relatively subtle effects with any statistical confidence and possibly because the relatively large number of tests would mean that a few positive results would turn up by chance.

So, we’ll focus on the simplest question that was addressed: Did being told that you were training an AI alter someone’s behavior? This question was asked through a number of experiments that were very similar. (One of the key differences between them was whether the information regarding AI training was displayed with a camera icon, since people will sometimes change their behavior if they’re aware they’re being observed.)

The answer to the question is a clear yes: people will in fact change their behavior when they think they’re training an AI. Through a number of experiments, participants were more likely to reject unfair offers if they were told that their sessions would be used to train an AI. In a few of the experiments, they were also more likely to reject what were considered fair offers (in US populations, the rejection rate goes up dramatically once someone proposes a 70/30 split, meaning $7 goes to the person making the proposal in these experiments). The researchers suspect this is due to people being more likely to reject borderline “fair” offers such as a 60/40 split.

This happened even though rejecting any offer exacts an economic cost on the participants. And people persisted in this behavior even when they were told that they wouldn’t ever interact with the AI after training was complete, meaning they wouldn’t personally benefit from any changes in the AI’s behavior. So here, it appeared that people would make a financial sacrifice to train the AI in a way that would benefit others.

Strikingly, in two of the three experiments that did follow-up testing, participants continued to reject offers at a higher rate two days after their participation in the AI training, even when they were told that their actions were no longer being used to train the AI. So, to some extent, participating in AI training seems to have caused them to train themselves to behave differently.

Obviously, this won’t affect every sort of AI training, and a lot of the work that goes into producing material that’s used in training something like a large language model won’t have been done with any awareness that it might be used to train an AI. Still, there are plenty of cases where humans do get more directly involved in training, so it’s worthwhile being aware that this is another route that can allow biases to creep in.

PNAS, 2024. DOI: 10.1073/pnas.2408731121  (About DOIs).



Lightening the load: AI helps exoskeleton work with different strides

One model to rule them all —

A model trained in a virtual environment does remarkably well in the real world.

Right now, the software doesn’t do arms, so don’t go taking on any aliens with it. Credit: 20th Century Fox

Exoskeletons today look like something straight out of sci-fi. But the reality is they are nowhere near as robust as their fictional counterparts. They’re quite wobbly, and getting them to work takes long hours of handcrafting the software policies that regulate how they move—a process that has to be repeated for each individual user.

To bring the technology a bit closer to Avatar’s Skel Suits or Warhammer 40k power armor, a team at North Carolina State University’s Lab of Biomechatronics and Intelligent Robotics used AI to build the first one-size-fits-all exoskeleton that supports walking, running, and stair-climbing. Critically, its software adapts itself to new users with no need for any user-specific adjustments. “You just wear it and it works,” says Hao Su, an associate professor and co-author of the study.

Tailor-made robots

An exoskeleton is a robot you wear to aid your movements—it makes walking, running, and other activities less taxing, the same way an e-bike adds extra watts on top of those you generate yourself, making pedaling easier. “The problem is, exoskeletons have a hard time understanding human intentions, whether you want to run or walk or climb stairs. It’s solved with locomotion recognition: systems that recognize human locomotion intentions,” says Su.

Building those locomotion recognition systems currently relies on elaborate policies that define what actuators in an exoskeleton need to do in each possible scenario. “Let’s take walking. The current state of the art is we put the exoskeleton on you and you walk on a treadmill for an hour. Based on that, we try to adjust its operation to your individual set of movements,” Su explains.

Building handcrafted control policies and doing long human trials for each user makes exoskeletons super expensive, with prices reaching $200,000 or more. So, Su’s team used AI to automatically generate control policies and eliminate human training. “I think within two or three years, exoskeletons priced between $2,000 and $5,000 will be absolutely doable,” Su claims.

His team hopes these savings will come from developing the exoskeleton control policy using a digital model, rather than living, breathing humans.

Digitizing robo-aided humans

Su’s team started by building digital models of a human musculoskeletal system and an exoskeleton robot. Then they used multiple neural networks to operate each component. One ran the digitized model of a human skeleton, moved by simplified muscles. The second neural network ran the exoskeleton model. Finally, the third neural net was responsible for imitating motion—basically predicting how a human model would move wearing the exoskeleton and how the two would interact with each other. “We trained all three neural networks simultaneously to minimize muscle activity,” says Su.
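The exact architecture isn't spelled out here, so the following is only a loose sketch of the joint-training idea (three networks optimized together against a shared loss that penalizes muscle activity), written in PyTorch with invented dimensions, placeholder data, and an arbitrary loss weighting rather than the authors' physics simulation.

```python
# A loose sketch of the joint-training idea (not the authors' architecture or
# code): three networks optimized together against one loss that penalizes
# muscle activity. Dimensions, data, and the loss weighting are invented
# placeholders; the real study trains inside a physics simulation.
import torch
import torch.nn as nn

STATE_DIM, MUSCLE_DIM, TORQUE_DIM = 32, 8, 4

human_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, MUSCLE_DIM))
exo_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, TORQUE_DIM))
# The imitation net predicts the next body state from the current state plus
# the muscle activations and exoskeleton torques acting on it.
imitation_net = nn.Sequential(
    nn.Linear(STATE_DIM + MUSCLE_DIM + TORQUE_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM)
)

params = (
    list(human_net.parameters()) + list(exo_net.parameters()) + list(imitation_net.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-3)

for step in range(1000):
    state = torch.randn(128, STATE_DIM)           # placeholder gait states
    reference_next = torch.randn(128, STATE_DIM)  # placeholder reference motion

    muscles = torch.sigmoid(human_net(state))     # activations in [0, 1]
    torques = exo_net(state)
    predicted_next = imitation_net(torch.cat([state, muscles, torques], dim=-1))

    # Track the reference motion while keeping muscle activity (a proxy for
    # metabolic cost) as low as possible.
    loss = nn.functional.mse_loss(predicted_next, reference_next) + 0.1 * muscles.pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```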

One problem the team faced is that exoskeleton studies typically use a performance metric based on metabolic rate reduction. “Humans, though, are incredibly complex, and it is very hard to build a model with enough fidelity to accurately simulate metabolism,” Su explains. Luckily, according to the team, reducing muscle activations is rather tightly correlated with metabolic rate reduction, so using them as the training target kept the digital model’s complexity within reasonable limits. Training the entire human-exoskeleton system with all three neural networks took roughly eight hours on a single RTX 3090 GPU. And the results were record-breaking.

Bridging the sim-to-real gap

After the neural networks developed the controllers for the digital exoskeleton model in simulation, Su’s team simply copied the control policy over to a real controller running a real exoskeleton. Then, they tested how an exoskeleton trained this way would work with 20 different participants. The average metabolic rate reduction was over 24 percent in walking, over 13 percent in running, and 15.4 percent in stair climbing—all record numbers, meaning their exoskeleton beat every other exoskeleton ever made in each category.

This was achieved without needing any tweaks to fit it to individual gaits. But the neural networks’ magic didn’t end there.

“The problem with traditional, handcrafted policies was that it was just telling it ‘if walking is detected do one thing; if walking faster is detected do another thing.’ These were [a mix of] finite state machines and switch controllers. We introduced end-to-end continuous control,” says Su. What this continuous control meant was that the exoskeleton could follow the human body as it made smooth transitions between different activities—from walking to running, from running to climbing stairs, etc. There was no abrupt mode switching.

“In terms of software, I think everyone will be using this neural network-based approach soon,” Su claims. To improve the exoskeletons in the future, his team wants to make them quieter, lighter, and more comfortable.

But the plan is also to make them work for people who need them the most. “The limitation now is that we tested these exoskeletons with able-bodied participants, not people with gait impairments. So, what we want to do is something they did in another exoskeleton study at Stanford University. We would take a one-minute video of you walking, and based on that, we would build a model to individualize our general model. This should work well for people with impairments like knee arthritis,” Su claims.

Nature, 2024.  DOI: 10.1038/s41586-024-07382-4



Researchers describe how to tell if ChatGPT is confabulating


Aurich Lawson | Getty Images

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require extrapolating from facts in a way that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLM synthesizes a plausible-sounding answer that is likely to be incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This evaluates all of the statistically likely answers generated by the LLM and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
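Here is a rough sketch of that calculation in Python (not the Oxford group's code): sample several answers, group them into meaning clusters, and compute the entropy over those clusters. The `means_the_same` check below is a hypothetical stand-in; the paper uses a language model to test bidirectional entailment between answers.

```python
# A rough sketch of semantic entropy (not the Oxford group's code): sample
# several answers, group them into clusters that share a meaning, and compute
# the entropy over those clusters. The means_the_same() check is a hypothetical
# stand-in; the paper uses a language model to test bidirectional entailment.
import math

def means_the_same(a: str, b: str) -> bool:
    """Hypothetical semantic-equivalence test (placeholder for entailment)."""
    return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")

def semantic_entropy(answers: list[str]) -> float:
    # Greedily cluster answers with equivalent meanings.
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if means_the_same(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    # Entropy over the probability mass assigned to each meaning cluster.
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)

# Many phrasings of one meaning -> low entropy (uncertain wording, same answer).
print(semantic_entropy(["Paris", "paris.", "Paris"]))            # 0.0
# Several incompatible meanings -> high entropy (likely confabulating).
print(semantic_entropy(["Paris", "Lyon", "Marseille", "Nice"]))  # ~1.39
```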



Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm. Credit: boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta, who led the development of Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. The independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset—when you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that the probability of seeing any specific outcome is the same. In the coin-flipping example, the probability of getting heads is the same as getting tails: 50 percent for each.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots to be as randomly adventurous as possible to get the widest set of experiences to learn from.

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse a set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition—disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park in your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of first tests in simulated environments were quite surprising.

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they followed a standard AI learning process divided into a training phase, where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase, where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressed to the point where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.



High-speed imaging and AI help us understand how insect wings work

A time-lapse showing how an insect’s wing adopts very specific positions during flight. Credit: Florian Muijres, Dickinson Lab

About 350 million years ago, our planet witnessed the evolution of the first flying creatures. They are still around, and some of them continue to annoy us with their buzzing. While scientists have classified these creatures as pterygotes, the rest of the world simply calls them winged insects.

There are many aspects of insect biology, especially their flight, that remain a mystery for scientists. One is simply how they move their wings. The insect wing hinge is a specialized joint that connects an insect’s wings with its body. It’s composed of five interconnected plate-like structures called sclerites. When these plates are shifted by the underlying muscles, they make the insect’s wings flap.

Until now, it has been tricky for scientists to understand the biomechanics that govern the motion of the sclerites even using advanced imaging technologies. “The sclerites within the wing hinge are so small and move so rapidly that their mechanical operation during flight has not been accurately captured despite efforts using stroboscopic photography, high-speed videography, and X-ray tomography,” Michael Dickinson, Zarem professor of biology and bioengineering at the California Institute of Technology (Caltech), told Ars Technica.

As a result, scientists have been unable to visualize exactly what’s going on at the micro-scale within the wing hinge during flight, preventing them from studying insect flight in detail. However, a new study by Dickinson and his team has finally revealed the workings of the sclerites and the insect wing hinge. They captured the wing motion of fruit flies (Drosophila melanogaster), analyzing 72,000 recorded wing beats with a neural network to decode the role individual sclerites played in shaping insect wing motion.

Understanding the insect wing hinge

The biomechanics that govern insect flight are quite different from those of birds and bats. This is because wings in insects didn’t evolve from limbs. “In the case of birds, bats, and pterosaurs we know exactly where the wings came from evolutionarily because all these animals fly with their forelimbs. They’re basically using their arms to fly. In insects, it’s a completely different story. They evolved from six-legged organisms and they kept all six legs. However, they added flapping appendages to the dorsal side of their body, and it is a mystery as to where those wings came from,” Dickinson explained.

Some researchers suggest that insect wings came from gill-like appendages present in ancient aquatic arthropods. Others argue that wings originated from “lobes,” special outgrowths found on the legs of ancient crustaceans, which were ancestors of insects. This debate is still ongoing, so its evolution can’t tell us much about how the hinge and the sclerites operate.

Understanding the hinge mechanics is crucial because this is what makes insects efficient flying creatures. It enables them to fly at impressive speeds relative to their body sizes (some insects can fly at 33 mph) and to demonstrate great maneuverability and stability while in flight.

“The insect wing hinge is arguably among the most sophisticated and evolutionarily important skeletal structures in the natural world,” according to the study authors.

However, imaging the activity of four of the five sclerites that form the hinge has been impossible due to their size and the speeds at which they move. Dickinson and his team employed a multidisciplinary approach to overcome this challenge. They designed an apparatus equipped with three high-speed cameras that recorded the activity of tethered fruit flies at 15,000 frames per second using infrared light.

They also used a calcium-sensitive protein to track changes in the activity of the steering muscles of the insects as they flew (calcium helps trigger muscle contractions). “We recorded a total of 485 flight sequences from 82 flies. After excluding a subset of wingbeats from sequences when the fly either stopped flying or flew at an abnormally low wingbeat frequency, we obtained a final dataset of 72,219 wingbeats,” the researchers note.

Next, they trained a machine-learning-based convolutional neural network (CNN) using 85 percent of the dataset. “We used the CNN model to investigate the transformation between muscle activity and wing motion by performing a set of virtual manipulations, exploiting the network to execute experiments that would be difficult to perform on actual flies,” they explained.

In addition to the neural network, they also developed an encoder-decoder neural network (an architecture used in machine learning) and fed it data related to steering muscle activity. While the CNN model could predict wing motion, the encoder/decoder could predict the action of individual sclerite muscles during the movement of the wings. Now, it was time to check whether the data they predicted was accurate.



Quantum computing progress: Higher temps, better error correction


There’s a strong consensus that tackling most useful problems with a quantum computer will require that the computer be capable of error correction. There is absolutely no consensus, however, about what technology will allow us to get there. A large number of companies, including major players like Microsoft, Intel, Amazon, and IBM, have committed to different technologies, while a collection of startups is exploring an even wider range of potential solutions.

We probably won’t have a clearer picture of what’s likely to work for a few years. But there’s going to be lots of interesting research and development work between now and then, some of which may ultimately represent key milestones in the development of quantum computing. To give you a sense of that work, we’re going to look at three papers that were published within the last couple of weeks, each of which tackles a different aspect of quantum computing technology.

Hot stuff

Error correction will require connecting multiple hardware qubits to act as a single unit termed a logical qubit. This spreads a single bit of quantum information across multiple hardware qubits, making it more robust. Additional qubits are used to monitor the behavior of the ones holding the data and perform corrections as needed. Some error correction schemes require over a hundred hardware qubits for each logical qubit, meaning we’d need tens of thousands of hardware qubits before we could do anything practical.
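As a toy illustration of why redundancy helps, here is a purely classical sketch of the three-bit repetition code in Python; real schemes such as the surface code are far more elaborate (and must also handle phase errors), so this only conveys the basic intuition that spreading one bit across several noisy bits suppresses the logical error rate.

```python
# A toy, purely classical analogy for that redundancy: the three-bit repetition
# code. One logical bit is copied onto three noisy bits, each flips
# independently with probability p, and a majority vote recovers the value.
# Real schemes (like the surface code) are far more elaborate and must also
# handle phase errors; this only shows why redundancy suppresses errors.
import random

def logical_error_rate(p: float, trials: int = 100_000) -> float:
    errors = 0
    for _ in range(trials):
        bits = [1, 1, 1]                                  # encode logical "1"
        bits = [b ^ (random.random() < p) for b in bits]  # independent bit flips
        decoded = 1 if sum(bits) >= 2 else 0              # majority vote
        errors += decoded != 1
    return errors / trials

for p in (0.05, 0.10, 0.20):
    print(f"physical error rate {p:.2f} -> logical error rate {logical_error_rate(p):.4f}")
```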

A number of companies have looked at that problem and decided we already know how to create hardware on that scale—just look at any silicon chip. So, if we could etch useful qubits through the same processes we use to make current processors, then scaling wouldn’t be an issue. Typically, this has meant fabricating quantum dots on the surface of silicon chips and using these to store single electrons that can hold a qubit in their spin. The rest of the chip holds more traditional circuitry that performs the initiation, control, and readout of the qubit.

This creates a notable problem. Like many other qubit technologies, quantum dots need to be kept below one Kelvin in order to keep the environment from interfering with the qubit. And, as anyone who’s ever owned an x86-based laptop knows, all the other circuitry on the silicon generates heat. So, there’s the very real prospect that trying to control the qubits will raise the temperature to the point that the qubits can’t hold onto their state.

That might not be the problem that we thought, according to some work published in Wednesday’s Nature. A large international team that includes people from the startup Diraq has shown that a silicon quantum dot processor can work well at the relatively toasty temperature of 1 Kelvin, up from the usual millikelvin temperatures at which these processors normally operate.

The work was done on a two-qubit prototype made with materials that were specifically chosen to improve noise tolerance; the experimental procedure was also optimized to limit errors. The team then performed normal operations starting at 0.1 K, and gradually ramped up the temperatures to 1.5 K, checking performance as they did so. They found that a major source of errors, state preparation and measurement (SPAM), didn’t change dramatically in this temperature range: “SPAM around 1 K is comparable to that at millikelvin temperatures and remains workable at least until 1.4 K.”

The error rates they did see depended on the state they were preparing. One particular state (both spin-up) had a fidelity of over 99 percent, while the rest were less constrained, at somewhere above 95 percent. States had a lifetime of over a millisecond, which qualifies as long-lived in the quantum world.

All of which is pretty good, and suggests that the chips can tolerate reasonable operating temperatures, meaning on-chip control circuitry can be used without causing problems. The error rates of the hardware qubits are still well above those that would be needed for error correction to work. However, the researchers suggest that they’ve identified error processes that can potentially be compensated for. They expect that the ability to do industrial-scale manufacturing will ultimately lead to working hardware.



Alternate qubit design does error correction in hardware

We can fix that —

Early-stage technology has the potential to cut qubits needed for useful computers.

A complicated set of wires and cables hooked up to copper-colored metal hardware. Credit: Nord Quantique

There’s a general consensus that performing any sort of complex algorithm on quantum hardware will have to wait for the arrival of error-corrected qubits. Individual qubits are too error-prone to be trusted for complex calculations, so quantum information will need to be distributed across multiple qubits, allowing monitoring for errors and intervention when they occur.

But most ways of making these “logical qubits” needed for error correction require anywhere from dozens to over a hundred individual hardware qubits. This means we’ll need anywhere from tens of thousands to millions of hardware qubits to do calculations. Existing hardware has only cleared the 1,000-qubit mark within the last month, so that future appears to be several years off at best.

But on Thursday, a company called Nord Quantique announced that it had demonstrated error correction using a single qubit with a distinct hardware design. While this has the potential to greatly reduce the number of hardware qubits needed for useful error correction, the demonstration involved a single qubit—the company doesn’t even expect to demonstrate operations on pairs of qubits until later this year.

Meet the bosonic qubit

The technology underlying this work is termed a bosonic qubit, and bosonic qubits aren’t anything new; an optical instrument company even has a product listing for them that notes their potential for use in error correction. But while the concepts behind using them in this manner were well established, demonstrations were lagging. Nord Quantique has now posted a paper on the arXiv that details a demonstration of them actually lowering error rates.

The devices are structured much like a transmon, the form of qubit favored by tech heavyweights like IBM and Google. There, the quantum information is stored in a loop of superconducting wire and is controlled by what’s called a microwave resonator—a small bit of material where microwave photons will reflect back and forth for a while before being lost.

A bosonic qubit turns that situation on its head. In this hardware, the quantum information is held in the photons, while the superconducting wire and resonator control the system. These are both hooked up to a coaxial cavity (think of a structure that, while microscopic, looks a bit like the end of a cable connector).

Massively simplified, the quantum information is stored in the manner in which the photons in the cavity interact. The state of the photons can be monitored by the linked resonator/superconducting wire. If something appears to be off, the resonator/superconducting wire allows interventions to be made to restore the original state. Additional qubits are not needed. “A very simple and basic idea behind quantum error correction is redundancy,” co-founder and CTO Julien Camirand Lemyre told Ars. “One thing about resonators and oscillators in superconducting circuits is that you can put a lot of photons inside the resonators. And for us, the redundancy comes from there.”

This process doesn’t correct all possible errors, so it doesn’t eliminate the need for logical qubits made from multiple underlying hardware qubits. In theory, though, you can catch the two most common forms of errors that qubits are prone to (bit flips and changes in phase).

In the arXiv preprint, the team at Nord Quantique demonstrated that the system works. Using a single qubit and simply measuring whether it holds onto its original state, the error correction system can reduce problems by 14 percent. Unfortunately, overall fidelity is also low, starting at about 85 percent, which is significantly below what’s seen in other systems that have been through years of development work. Some qubits have been demonstrated with a fidelity of over 99 percent.

Getting competitive

So there’s no question that Nord Quantique is well behind a number of the leaders in quantum computing that can perform (error-prone) calculations with dozens of qubits and have far lower error rates. Again, Nord Quantique’s work was done using a single qubit—and without doing any of the operations needed to perform a calculation.

Lemyre told Ars that while the company is small, it benefits from being a spin-out of the Institut Quantique at the Université de Sherbrooke, one of Canada’s leading quantum research centers. In addition to having access to the expertise there, Nord Quantique uses a fabrication facility at Sherbrooke to make its hardware.

Over the next year, the company expects to demonstrate that the error correction scheme can function while pairs of qubits are used to perform gate operations, the fundamental units of calculations. Another high priority is to combine this hardware-based error correction with more traditional logical qubit schemes, which would allow additional types of errors to be caught and corrected. This would involve operations with a dozen or more of these bosonic qubits at a time.

But the real challenge will be in the longer term. The company is counting on its hardware’s ability to handle error correction to reduce the number of qubits needed for useful calculations. But if its competitors can scale up the number of qubits fast enough while maintaining the control and error rates needed, that may not ultimately matter. Put differently, if Nord Quantique is still in the hundreds of qubit range by the time other companies are in the hundreds of thousands, its technology might not succeed even if it has some inherent advantages.

But that’s the fun part about the field as things stand: We don’t really know. A handful of very different technologies are already well into development and show some promise. And there are other sets that are still early in the development process but are thought to have a smoother path to scaling to useful numbers of qubits. All of them will have to scale to a minimum of tens of thousands of qubits while enabling the ability to perform quantum manipulations that were cutting-edge science just a few decades ago.

Looming in the background is the simple fact that we’ve never tried to scale anything like this to the extent that will be needed. Unforeseen technical hurdles might limit progress at some point in the future.

Despite all this, there are people backing each of these technologies who know far more about quantum mechanics than I ever will. It’s a fun time.



What do Threads, Mastodon, and hospital records have in common?


It’s taken a while, but social media platforms now know that people prefer their information kept away from corporate eyes and malevolent algorithms. That’s why the newest generation of social media sites like Threads, Mastodon, and Bluesky boast of being part of the “fediverse.” Here, user data is hosted on independent servers rather than one corporate silo. Platforms then use common standards to share information when needed. If one server starts to host too many harmful accounts, other servers can choose to block it.

They’re not the only ones embracing this approach. Medical researchers think a similar strategy could help them train machine-learning models to spot disease trends in patients. Putting their AI algorithms on special servers within hospitals for “federated learning” could keep privacy standards high while letting researchers uncover new ways to detect and treat diseases.

“The use of AI is just exploding in all facets of life,” said Ronald M. Summers of the National Institutes of Health Clinical Center in Maryland, who uses the method in his radiology research. “There’s a lot of people interested in using federated learning for a variety of different data analysis applications.”

How does it work?

Until now, medical researchers have refined their AI algorithms using a few carefully curated databases, usually containing anonymized medical information from patients taking part in clinical studies.

However, improving these models further means they need a larger dataset with real-world patient information. Researchers could pool data from several hospitals into one database, but that means asking hospitals to hand over sensitive and highly regulated information. Sending patient information outside a hospital’s firewall is a big risk, so getting permission can be a long and legally complicated process. National privacy laws, including the EU’s GDPR, set strict rules on sharing a patient’s personal information.

So instead, medical researchers are sending their AI model to hospitals so it can analyze a dataset while staying within the hospital’s firewall.

Typically, doctors first identify eligible patients for a study, select any clinical data they need for training, confirm its accuracy, and then organize it on a local database. The database is then placed onto a server at the hospital that is linked to the federated learning AI software. Once the software receives instructions from the researchers, it can work its AI magic, training itself with the hospital’s local data to find specific disease trends.
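To make that flow concrete, here is a minimal sketch of what the hospital-side step might look like in Python. It uses a toy logistic-regression model as a stand-in for the real diagnostic model, and the names (`train_locally` and so on) are illustrative rather than taken from any particular federated learning framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_locally(global_weights, X, y, lr=0.1, epochs=5):
    """Train a toy logistic-regression stand-in on the hospital's local data.

    X (n_samples x n_features) and y (0/1 labels) never leave the hospital;
    only the updated weights and the local sample count are sent back.
    """
    w = global_weights.copy()
    for _ in range(epochs):
        preds = sigmoid(X @ w)             # current model predictions
        grad = X.T @ (preds - y) / len(y)  # gradient of the logistic loss
        w = w - lr * grad                  # one gradient-descent step
    return w, len(y)
```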

Every so often, this trained model is then sent back to a central server, where it joins models from other hospitals. An aggregation method processes these trained models to update the original model. For example, Google’s popular FedAvg aggregation algorithm averages each element of the trained models’ parameters, with each hospital’s contribution weighted in proportion to the size of its training dataset.

In other words, how these models change gets aggregated in the central server to create an updated “consensus model.” This consensus model is then sent back to each hospital’s local database to be trained once again. The cycle continues until researchers judge the final consensus model to be accurate enough. (There’s a review of this process available.)
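Continuing the illustration above, the central server’s role can be sketched as a weighted average plus a loop over training rounds. This is a simplified, hypothetical sketch of the FedAvg idea rather than Google’s actual implementation, and it reuses the `train_locally` function from the previous snippet.

```python
def fed_avg(updates):
    """FedAvg-style aggregation.

    updates: list of (weights, n_samples) pairs, one per hospital, as
    returned by train_locally() in the sketch above.
    """
    total = sum(n for _, n in updates)
    # Average each parameter, weighting hospitals by their share of the data.
    return sum(w * (n / total) for w, n in updates)

def run_rounds(global_weights, hospital_datasets, n_rounds=10):
    """One federated round: send the model out, train locally, aggregate."""
    for _ in range(n_rounds):
        updates = [train_locally(global_weights, X, y)
                   for X, y in hospital_datasets]
        global_weights = fed_avg(updates)  # the new consensus model
    return global_weights
```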

This keeps both sides happy. For hospitals, it helps preserve privacy, since the information sent back to the central server is anonymous; personal information never crosses the hospital’s firewall. It also means machine learning can reach its full potential by training on real-world data, so researchers get less-biased results that are more likely to be sensitive to niche diseases.

Over the past few years, there has been a boom in research using this method. For example, in 2021, Summers and others used federated learning to see whether they could predict diabetes from CT scans of abdomens.

“We found that there were signatures of diabetes on the CT scanner [for] the pancreas that preceded the diagnosis of diabetes by as much as seven years,” said Summers. “That got us very excited that we might be able to help patients that are at risk.”

What do Threads, Mastodon, and hospital records have in common? Read More »

if-ai-is-making-the-turing-test-obsolete,-what-might-be-better?

If AI is making the Turing test obsolete, what might be better?

A white android sitting at a table in a depressed manner with an alcoholic drink. Very high resolution 3D render.

If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning—our ability to apply logic and think rationally before making decisions? How could we even identify whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.

“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.

They suggest the standard methods of evaluating a machine’s intelligence, such as the Turing Test, can only tell you if the machine is good at processing information and mimicking human responses. The current generation of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, has come close to passing the Turing Test, yet the test results don’t imply these programs can think and reason like humans.

This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing Test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?” the study authors argue.

What’s wrong with the Turing Test?

During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they are texting with a human or a chatbot. If the AI programs are successful in generating human-like responses—to the extent that evaluators struggle to distinguish between the human and the AI program—the AI is considered to have passed. However, since the Turing Test is based on subjective interpretation, these results are also subjective.

The researchers suggest that there are several limitations associated with the Turing Test. For instance, all of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of the messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So the test clearly doesn’t evaluate a machine’s reasoning and logical ability.

The results of the Turing Test also can’t tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI, according to a study from Stanford University, which suggests that machines that can self-reflect are more practical for human use.

“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.

In addition to this, the Turing Test fails to analyze an AI program’s ability to think. In a recent Turing Test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?

Alan Turing, the famous British scientist who created the Turing Test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test only covers one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that a machine can never achieve true human intelligence through imitation alone.

“It’s unclear whether passing the Turing Test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an AI expert and co-founder of DeepMind, told Bloomberg.

If AI is making the Turing test obsolete, what might be better? Read More »