Computer science

lightening-the-load:-ai-helps-exoskeleton-work-with-different-strides

Lightening the load: AI helps exoskeleton work with different strides

One model to rule them all —

A model trained in a virtual environment does remarkably well in the real world.

Image of two people using powered exoskeletons to move heavy items around, as seen in the movie Aliens.

Enlarge / Right now, the software doesn’t do arms, so don’t go taking on any aliens with it.

20th Century Fox

Exoskeletons today look like something straight out of sci-fi. But the reality is they are nowhere near as robust as their fictional counterparts. They’re quite wobbly, and it takes long hours of handcrafting software policies, which regulate how they work—a process that has to be repeated for each individual user.

To bring the technology a bit closer to Avatar’s Skel Suits or Warhammer 40k power armor, a team at North Carolina University’s Lab of Biomechatronics and Intelligent Robotics used AI to build the first one-size-fits-all exoskeleton that supports walking, running, and stair-climbing. Critically, its software adapts itself to new users with no need for any user-specific adjustments. “You just wear it and it works,” says Hao Su, an associate professor and co-author of the study.

Tailor-made robots

An exoskeleton is a robot you wear to aid your movements—it makes walking, running, and other activities less taxing, the same way an e-bike adds extra watts on top of those you generate yourself, making pedaling easier. “The problem is, exoskeletons have a hard time understanding human intentions, whether you want to run or walk or climb stairs. It’s solved with locomotion recognition: systems that recognize human locomotion intentions,” says Su.

Building those locomotion recognition systems currently relies on elaborate policies that define what actuators in an exoskeleton need to do in each possible scenario. “Let’s take walking. The current state of the art is we put the exoskeleton on you and you walk on a treadmill for an hour. Based on that, we try to adjust its operation to your individual set of movements,” Su explains.

Building handcrafted control policies and doing long human trials for each user makes exoskeletons super expensive, with prices reaching $200,000 or more. So, Su’s team used AI to automatically generate control policies and eliminate human training. “I think within two or three years, exoskeletons priced between $2,000 and $5,000 will be absolutely doable,” Su claims.

His team hopes these savings will come from developing the exoskeleton control policy using a digital model, rather than living, breathing humans.

Digitizing robo-aided humans

Su’s team started by building digital models of a human musculoskeletal system and an exoskeleton robot. Then they used multiple neural networks that operated each component. One was running the digitized model of a human skeleton, moved by simplified muscles. The second neural network was running the exoskeleton model. Finally, the third neural net was responsible for imitating motion—basically predicting how a human model would move wearing the exoskeleton and how the two would interact with each other. “We trained all three neural networks simultaneously to minimize muscle activity,” says Su.

One problem the team faced is that exoskeleton studies typically use a performance metric based on metabolic rate reduction. “Humans, though, are incredibly complex, and it is very hard to build a model with enough fidelity to accurately simulate metabolism,” Su explains. Luckily, according to the team, reducing muscle activations is rather tightly correlated with metabolic rate reduction, so it kept the digital model’s complexity within reasonable limits. The training of the entire human-exoskeleton system with all three neural networks took roughly eight hours on a single RTX 3090 GPU. And the results were record-breaking.

Bridging the sim-to-real gap

After developing the controllers for the digital exoskeleton model, which were developed by the neural networks in simulation, Su’s team simply copy-pasted the control policy to a real controller running a real exoskeleton. Then, they tested how an exoskeleton trained this way would work with 20 different participants. The averaged metabolic rate reduction in walking was over 24 percent, over 13 percent in running, and 15.4 percent in stair climbing—all record numbers, meaning their exoskeleton beat every other exoskeleton ever made in each category.

This was achieved without needing any tweaks to fit it to individual gaits. But the neural networks’ magic didn’t end there.

“The problem with traditional, handcrafted policies was that it was just telling it ‘if walking is detected do one thing; if walking faster is detected do another thing.’ These were [a mix of] finite state machines and switch controllers. We introduced end-to-end continuous control,” says Su. What this continuous control meant was that the exoskeleton could follow the human body as it made smooth transitions between different activities—from walking to running, from running to climbing stairs, etc. There was no abrupt mode switching.

“In terms of software, I think everyone will be using this neural network-based approach soon,” Su claims. To improve the exoskeletons in the future, his team wants to make them quieter, lighter, and more comfortable.

But the plan is also to make them work for people who need them the most. “The limitation now is that we tested these exoskeletons with able-bodied participants, not people with gait impairments. So, what we want to do is something they did in another exoskeleton study at Stanford University. We would take a one-minute video of you walking, and based on that, we would build a model to individualize our general model. This should work well for people with impairments like knee arthritis,” Su claims.

Nature, 2024.  DOI: 10.1038/s41586-024-07382-4

Lightening the load: AI helps exoskeleton work with different strides Read More »

researchers-describe-how-to-tell-if-chatgpt-is-confabulating

Researchers describe how to tell if ChatGPT is confabulating

Researchers describe how to tell if ChatGPT is confabulating

Aurich Lawson | Getty Images

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLMs synthesize a plausible-sounding answer that is likely incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This evaluates all the statistically likely answers evaluated by the LLM and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.

Researchers describe how to tell if ChatGPT is confabulating Read More »

exploration-focused-training-lets-robotics-ai-immediately-handle-new-tasks

Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta who led the development of the Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. The independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset—when you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that the probability of seeing any specific outcome is the same. In the coin-flipping example, the probability of getting heads is the same as getting tails: 50 percent for each.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots be as randomly adventurous as possible to get the widest set of experiences to learn from.

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition—disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park into your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of first tests in simulated environments were quite surprising.

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they were following a standard AI learning process divided down into a training phase where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.

Exploration-focused training lets robotics AI immediately handle new tasks Read More »

high-speed-imaging-and-ai-help-us-understand-how-insect-wings-work

High-speed imaging and AI help us understand how insect wings work

Black and white images of a fly with its wings in a variety of positions, showing the details of a wing beat.

Enlarge / A time-lapse showing how an insect’s wing adopts very specific positions during flight.

Florian Muijres, Dickinson Lab

About 350 million years ago, our planet witnessed the evolution of the first flying creatures. They are still around, and some of them continue to annoy us with their buzzing. While scientists have classified these creatures as pterygotes, the rest of the world simply calls them winged insects.

There are many aspects of insect biology, especially their flight, that remain a mystery for scientists. One is simply how they move their wings. The insect wing hinge is a specialized joint that connects an insect’s wings with its body. It’s composed of five interconnected plate-like structures called sclerites. When these plates are shifted by the underlying muscles, it makes the insect wings flap.

Until now, it has been tricky for scientists to understand the biomechanics that govern the motion of the sclerites even using advanced imaging technologies. “The sclerites within the wing hinge are so small and move so rapidly that their mechanical operation during flight has not been accurately captured despite efforts using stroboscopic photography, high-speed videography, and X-ray tomography,” Michael Dickinson, Zarem professor of biology and bioengineering at the California Institute of Technology (Caltech), told Ars Technica.

As a result, scientists are unable to visualize exactly what’s going on at the micro-scale within the wing hinge as they fly, preventing them from studying insect flight in detail. However, a new study by Dickinson and his team finally revealed the working of sclerites and the insect wing hinge. They captured the wing motion of fruit flies (Drosophila melanogaster) analyzing 72,000 recorded wing beats using a neural network to decode the role individual sclerites played in shaping insect wing motion.

Understanding the insect wing hinge

The biomechanics that govern insect flight are quite different from those of birds and bats. This is because wings in insects didn’t evolve from limbs. “In the case of birds, bats, and pterosaurs we know exactly where the wings came from evolutionarily because all these animals fly with their forelimbs. They’re basically using their arms to fly. In insects, it’s a completely different story. They evolved from six-legged organisms and they kept all six legs. However, they added flapping appendages to the dorsal side of their body, and it is a mystery as to where those wings came from,” Dickinson explained.

Some researchers suggest that insect wings came from gill-like appendages present in ancient aquatic arthropods. Others argue that wings originated from “lobes,” special outgrowths found on the legs of ancient crustaceans, which were ancestors of insects. This debate is still ongoing, so its evolution can’t tell us much about how the hinge and the sclerites operate.

Understanding the hinge mechanics is crucial because this is what makes insects efficient flying creatures. It enables them to fly at impressive speeds relative to their body sizes (some insects can fly at 33 mph) and to demonstrate great maneuverability and stability while in flight.

“The insect wing hinge is arguably among the most sophisticated and evolutionarily important skeletal structures in the natural world,” according to the study authors.

However, imaging the activity of four of the five sclerites that form the hinge has been impossible due to their size and the speeds at which they move. Dickinson and his team employed a multidisciplinary approach to overcome this challenge. They designed an apparatus equipped with three high-speed cameras that recorded the activity of tethered fruit flies at 15,000 frames per second using infrared light.

They also used a calcium-sensitive protein to track changes in the activity of the steering muscles of the insects as they flew (calcium helps trigger muscle contractions). “We recorded a total of 485 flight sequences from 82 flies. After excluding a subset of wingbeats from sequences when the fly either stopped flying or flew at an abnormally low wingbeat frequency, we obtained a final dataset of 72,219 wingbeats,” the researchers note.

Next, they trained a machine-learning-based convolutional neural network (CNN) using 85 percent of the dataset. “We used the CNN model to investigate the transformation between muscle activity and wing motion by performing a set of virtual manipulations, exploiting the network to execute experiments that would be difficult to perform on actual flies,” they explained.

In addition to the neural network, they also developed an encoder-decoder neural network (an architecture used in machine learning) and fed it data related to steering muscle activity. While the CNN model could predict wing motion, the encoder/decoder could predict the action of individual sclerite muscles during the movement of the wings. Now, it was time to check whether the data they predicted was accurate.

High-speed imaging and AI help us understand how insect wings work Read More »

quantum-computing-progress:-higher-temps,-better-error-correction

Quantum computing progress: Higher temps, better error correction

conceptual graphic of symbols representing quantum states floating above a stylized computer chip.

There’s a strong consensus that tackling most useful problems with a quantum computer will require that the computer be capable of error correction. There is absolutely no consensus, however, about what technology will allow us to get there. A large number of companies, including major players like Microsoft, Intel, Amazon, and IBM, have all committed to different technologies to get there, while a collection of startups are exploring an even wider range of potential solutions.

We probably won’t have a clearer picture of what’s likely to work for a few years. But there’s going to be lots of interesting research and development work between now and then, some of which may ultimately represent key milestones in the development of quantum computing. To give you a sense of that work, we’re going to look at three papers that were published within the last couple of weeks, each of which tackles a different aspect of quantum computing technology.

Hot stuff

Error correction will require connecting multiple hardware qubits to act as a single unit termed a logical qubit. This spreads a single bit of quantum information across multiple hardware qubits, making it more robust. Additional qubits are used to monitor the behavior of the ones holding the data and perform corrections as needed. Some error correction schemes require over a hundred hardware qubits for each logical qubit, meaning we’d need tens of thousands of hardware qubits before we could do anything practical.

A number of companies have looked at that problem and decided we already know how to create hardware on that scale—just look at any silicon chip. So, if we could etch useful qubits through the same processes we use to make current processors, then scaling wouldn’t be an issue. Typically, this has meant fabricating quantum dots on the surface of silicon chips and using these to store single electrons that can hold a qubit in their spin. The rest of the chip holds more traditional circuitry that performs the initiation, control, and readout of the qubit.

This creates a notable problem. Like many other qubit technologies, quantum dots need to be kept below one Kelvin in order to keep the environment from interfering with the qubit. And, as anyone who’s ever owned an x86-based laptop knows, all the other circuitry on the silicon generates heat. So, there’s the very real prospect that trying to control the qubits will raise the temperature to the point that the qubits can’t hold onto their state.

That might not be the problem that we thought, according to some work published in Wednesday’s Nature. A large international team that includes people from the startup Diraq have shown that a silicon quantum dot processor can work well at the relatively toasty temperature of 1 Kelvin, up from the usual milliKelvin that these processors normally operate at.

The work was done on a two-qubit prototype made with materials that were specifically chosen to improve noise tolerance; the experimental procedure was also optimized to limit errors. The team then performed normal operations starting at 0.1 K, and gradually ramped up the temperatures to 1.5 K, checking performance as they did so. They found that a major source of errors, state preparation and measurement (SPAM), didn’t change dramatically in this temperature range: “SPAM around 1 K is comparable to that at millikelvin temperatures and remains workable at least until 1.4 K.”

The error rates they did see depended on the state they were preparing. One particular state (both spin-up) had a fidelity of over 99 percent, while the rest were less constrained, at somewhere above 95 percent. States had a lifetime of over a millisecond, which qualifies as long-lived int he quantum world.

All of which is pretty good, and suggests that the chips can tolerate reasonable operating temperatures, meaning on-chip control circuitry can be used without causing problems. The error rates of the hardware qubits are still well above those that would be needed for error correction to work. However, the researchers suggest that they’ve identified error processes that can potentially be compensated for. They expect that the ability to do industrial-scale manufacturing will ultimately lead to working hardware.

Quantum computing progress: Higher temps, better error correction Read More »

alternate-qubit-design-does-error-correction-in-hardware

Alternate qubit design does error correction in hardware

We can fix that —

Early-stage technology has the potential to cut qubits needed for useful computers.

Image of a complicated set of wires and cables hooked up to copper colored metal hardware.

Nord Quantique

There’s a general consensus that performing any sort of complex algorithm on quantum hardware will have to wait for the arrival of error-corrected qubits. Individual qubits are too error-prone to be trusted for complex calculations, so quantum information will need to be distributed across multiple qubits, allowing monitoring for errors and intervention when they occur.

But most ways of making these “logical qubits” needed for error correction require anywhere from dozens to over a hundred individual hardware qubits. This means we’ll need anywhere from tens of thousands to millions of hardware qubits to do calculations. Existing hardware has only cleared the 1,000-qubit mark within the last month, so that future appears to be several years off at best.

But on Thursday, a company called Nord Quantique announced that it had demonstrated error correction using a single qubit with a distinct hardware design. While this has the potential to greatly reduce the number of hardware qubits needed for useful error correction, the demonstration involved a single qubit—the company doesn’t even expect to demonstrate operations on pairs of qubits until later this year.

Meet the bosonic qubit

The technology underlying this work is termed a bosonic qubit, and they’re not anything new; an optical instrument company even has a product listing for them that notes their potential for use in error correction. But while the concepts behind using them in this manner were well established, demonstrations were lagging. Nord Quantique has now posted a paper in the arXiv that details a demonstration of them actually lowering error rates.

The devices are structured much like a transmon, the form of qubit favored by tech heavyweights like IBM and Google. There, the quantum information is stored in a loop of superconducting wire and is controlled by what’s called a microwave resonator—a small bit of material where microwave photons will reflect back and forth for a while before being lost.

A bosonic qubit turns that situation on its head. In this hardware, the quantum information is held in the photons, while the superconducting wire and resonator control the system. These are both hooked up to a coaxial cavity (think of a structure that, while microscopic, looks a bit like the end of a cable connector).

Massively simplified, the quantum information is stored in the manner in which the photons in the cavity interact. The state of the photons can be monitored by the linked resonator/superconducting wire. If something appears to be off, the resonator/superconducting wire allows interventions to be made to restore the original state. Additional qubits are not needed. “A very simple and basic idea behind quantum error correction is redundancy,” co-founder and CTO Julien Camirand Lemyre told Ars. “One thing about resonators and oscillators in superconducting circuits is that you can put a lot of photons inside the resonators. And for us, the redundancy comes from there.”

This process doesn’t correct all possible errors, so it doesn’t eliminate the need for logical qubits made from multiple underlying hardware qubits. In theory, though, you can catch the two most common forms of errors that qubits are prone to (bit flips and changes in phase).

In the arXiv preprint, the team at Nord Quantique demonstrated that the system works. Using a single qubit and simply measuring whether it holds onto its original state, the error correction system can reduce problems by 14 percent. Unfortunately, overall fidelity is also low, starting at about 85 percent, which is significantly below what’s seen in other systems that have been through years of development work. Some qubits have been demonstrated with a fidelity of over 99 percent.

Getting competitive

So there’s no question that Nord Quantique is well behind a number of the leaders in quantum computing that can perform (error-prone) calculations with dozens of qubits and have far lower error rates. Again, Nord Quantique’s work was done using a single qubit—and without doing any of the operations needed to perform a calculation.

Lemyre told Ars that while the company is small, it benefits from being a spin-out of the Institut Quantique at Canada’s Sherbrooke University, one of Canada’s leading quantum research centers. In addition to having access to the expertise there, Nord Quantique uses a fabrication facility at Sherbrooke to make its hardware.

Over the next year, the company expects to demonstrate that the error correction scheme can function while pairs of qubits are used to perform gate operations, the fundamental units of calculations. Another high priority is to combine this hardware-based error correction with more traditional logical qubit schemes, which would allow additional types of errors to be caught and corrected. This would involve operations with a dozen or more of these bosonic qubits at a time.

But the real challenge will be in the longer term. The company is counting on its hardware’s ability to handle error correction to reduce the number of qubits needed for useful calculations. But if its competitors can scale up the number of qubits fast enough while maintaining the control and error rates needed, that may not ultimately matter. Put differently, if Nord Quantique is still in the hundreds of qubit range by the time other companies are in the hundreds of thousands, its technology might not succeed even if it has some inherent advantages.

But that’s the fun part about the field as things stand: We don’t really know. A handful of very different technologies are already well into development and show some promise. And there are other sets that are still early in the development process but are thought to have a smoother path to scaling to useful numbers of qubits. All of them will have to scale to a minimum of tens of thousands of qubits while enabling the ability to perform quantum manipulations that were cutting-edge science just a few decades ago.

Looming in the background is the simple fact that we’ve never tried to scale anything like this to the extent that will be needed. Unforeseen technical hurdles might limit progress at some point in the future.

Despite all this, there are people backing each of these technologies who know far more about quantum mechanics than I ever will. It’s a fun time.

Alternate qubit design does error correction in hardware Read More »

what-do-threads,-mastodon,-and-hospital-records-have-in-common?

What do Threads, Mastodon, and hospital records have in common?

A medical technician looks at a scan on a computer monitor.

It’s taken a while, but social media platforms now know that people prefer their information kept away from corporate eyes and malevolent algorithms. That’s why the newest generation of social media sites like Threads, Mastodon, and Bluesky boast of being part of the “fediverse.” Here, user data is hosted on independent servers rather than one corporate silo. Platforms then use common standards to share information when needed. If one server starts to host too many harmful accounts, other servers can choose to block it.

They’re not the only ones embracing this approach. Medical researchers think a similar strategy could help them train machine learning to spot disease trends in patients. Putting their AI algorithms on special servers within hospitals for “federated learning” could keep privacy standards high while letting researchers unravel new ways to detect and treat diseases.

“The use of AI is just exploding in all facets of life,” said Ronald M. Summers of the National Institutes of Health Clinical Center in Maryland, who uses the method in his radiology research. “There’s a lot of people interested in using federated learning for a variety of different data analysis applications.”

How does it work?

Until now, medical researchers refined their AI algorithms using a few carefully curated databases, usually anonymized medical information from patients taking part in clinical studies.

However, improving these models further means they need a larger dataset with real-world patient information. Researchers could pool data from several hospitals into one database, but that means asking them to hand over sensitive and highly regulated information. Sending patient information outside a hospital’s firewall is a big risk, so getting permission can be a long and legally complicated process. National privacy laws and the EU’s GDPR law set strict rules on sharing a patient’s personal information.

So instead, medical researchers are sending their AI model to hospitals so it can analyze a dataset while staying within the hospital’s firewall.

Typically, doctors first identify eligible patients for a study, select any clinical data they need for training, confirm its accuracy, and then organize it on a local database. The database is then placed onto a server at the hospital that is linked to the federated learning AI software. Once the software receives instructions from the researchers, it can work its AI magic, training itself with the hospital’s local data to find specific disease trends.

Every so often, this trained model is then sent back to a central server, where it joins models from other hospitals. An aggregation method processes these trained models to update the original model. For example, Google’s popular FedAvg aggregation algorithm takes each element of the trained models’ parameters and creates an average. Each average becomes part of the model update, with their input to the aggregate model weighted proportionally to the size of their training dataset.

In other words, how these models change gets aggregated in the central server to create an updated “consensus model.” This consensus model is then sent back to each hospital’s local database to be trained once again. The cycle continues until researchers judge the final consensus model to be accurate enough. (There’s a review of this process available.)

This keeps both sides happy. For hospitals, it helps preserve privacy since information sent back to the central server is anonymous; personal information never crosses the hospital’s firewall. It also means machine/AI learning can reach its full potential by training on real-world data so researchers get less biased results that are more likely to be sensitive to niche diseases.

Over the past few years, there has been a boom in research using this method. For example, in 2021, Summers and others used federated learning to see whether they could predict diabetes from CT scans of abdomens.

“We found that there were signatures of diabetes on the CT scanner [for] the pancreas that preceded the diagnosis of diabetes by as much as seven years,” said Summers. “That got us very excited that we might be able to help patients that are at risk.”

What do Threads, Mastodon, and hospital records have in common? Read More »

if-ai-is-making-the-turing-test-obsolete,-what-might-be-better?

If AI is making the Turing test obsolete, what might be better?

A white android sitting at a table in a depressed manner with an alchoholic drink. Very high resolution 3D render.

If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning—our ability to apply logic and think rationally before making decisions? How could we even identify whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.

“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.

They suggest the standard methods of evaluating a machine’s intelligence, such as the Turing Test, can only tell you if the machine is good at processing information and mimicking human responses. The current generations of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, for example, have come close to passing the Turing Test, yet the test results don’t imply these programs can think and reason like humans.

This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing Test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?” the study authors argue.

What’s wrong with the Turing Test?

During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they are texting with a human or a chatbot. If the AI programs are successful in generating human-like responses—to the extent that evaluators struggle to distinguish between the human and the AI program—the AI is considered to have passed. However, since the Turing Test is based on subjective interpretation, these results are also subjective.

The researchers suggest that there are several limitations associated with the Turing Test. For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So, the test clearly doesn’t evaluate a machine’s reasoning and logical ability.

The results of the Turing Test also can’t tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI as well, according to a study from Stanford University which suggests that machines that could self-reflect are more practical for human use.

“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.

In addition to this, the Turing Test fails to analyze an AI program’s ability to think. In a recent Turing Test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?

Alan Turing, the famous British scientist who created the Turing Test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test only covers one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that a machine can never achieve true human intelligence without including those other aspects.

“It’s unclear whether passing the Turing Test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an AI expert and founder of DeepAI, told Bloomberg.

If AI is making the Turing test obsolete, what might be better? Read More »