Science

Why solving crosswords is like a phase transition

There’s also the more recent concept of “explosive percolation,” whereby connectivity emerges not in a slow, continuous process but quite suddenly, simply by replacing the random node connections with predetermined criteria—say, choosing to connect whichever pair of nodes has the fewest pre-existing connections to other nodes. This introduces bias into the system and suppresses the growth of large dominant clusters. Instead, many large unconnected clusters grow until the critical threshold is reached. At that point, even adding just one or two more connections will trigger one global violent merger (instant uber-connectivity).
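The contrast between random and biased edge selection can be illustrated with a small simulation. The sketch below is only a toy: the function name `largest_cluster_curve`, the choice of two candidate pairs per step, and the "lowest combined degree" rule are assumptions made for illustration, not the specific rule from any particular explosive-percolation study.

```python
import random


class DisjointSet:
    """Union-find structure that also tracks the largest cluster size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_cluster_curve(n, n_edges, biased, seed=0):
    """Add edges one at a time; return the largest-cluster fraction after each.

    biased=False: connect a uniformly random pair of nodes.
    biased=True:  draw two candidate pairs and keep the one whose nodes have
                  the fewest existing connections (an assumed toy rule).
    Repeated edges between the same pair are allowed for simplicity.
    """
    rng = random.Random(seed)
    clusters = DisjointSet(n)
    degree = [0] * n
    curve = []
    for _ in range(n_edges):
        a, b = rng.sample(range(n), 2)
        if biased:
            c, d = rng.sample(range(n), 2)
            if degree[c] + degree[d] < degree[a] + degree[b]:
                a, b = c, d
        degree[a] += 1
        degree[b] += 1
        clusters.union(a, b)
        curve.append(clusters.largest / n)
    return curve


if __name__ == "__main__":
    n = 2000
    for label, biased in (("random", False), ("biased", True)):
        curve = largest_cluster_curve(n, 2 * n, biased)
        # Print the largest-cluster fraction at a few edge densities.
        samples = [round(curve[int(f * n) - 1], 3) for f in (0.5, 1.0, 1.5, 2.0)]
        print(label, samples)
```

Comparing the two printed curves at the same edge densities shows how a selection rule can change when, and how abruptly, the largest cluster takes over; the exact numbers depend on the seed and on the rule assumed here.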

Puzzling over percolation

One might not immediately think of crossword puzzles as a network, although there have been a couple of relevant prior mathematical studies. For instance, John McSweeney of the Rose-Hulman Institute of Technology in Indiana employed a random graph network model for crossword puzzles in 2016. He factored in how a puzzle’s solvability is affected by the interactions between the structure of the puzzle’s cells (squares) and word difficulty, i.e., the fraction of letters you need to know in a given word in order to figure out what it is.

Answers represented nodes while answer crossings represented edges, and McSweeney assigned a random distribution of word difficulty levels to the clues. “This randomness in the clue difficulties is ultimately responsible for the wide variability in the solvability of a puzzle, which many solvers know well—a solver, presented with two puzzles of ostensibly equal difficulty, may solve one readily and be stumped by the other,” he wrote at the time. At some point, there has to be a phase transition, in which solving the easiest words enables the puzzler to solve the more difficult words until the critical threshold is reached and the puzzler can fill in many solutions in rapid succession—a dynamic process that resembles, say, the spread of diseases in social groups.

In this sample realization, black sites are shown in black; empty sites are white; and occupied sites contain symbols and letters. Credit: Alexander K. Hartmann, 2024

Hartmann’s new model incorporates elements of several nonstandard percolation models, including how much the solver benefits from partial knowledge of the answers. Letters correspond to sites (white squares) while words are segments of those sites, bordered by black squares. There is an a priori probability of being able to solve a given word if no letters are known. If some words are solved, the puzzler gains partial knowledge of neighboring unsolved words, which increases the probability of those words being solved as well.
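The mechanism described above (a baseline chance of solving a word cold, which rises as crossing words fill in its letters) can be sketched in a few lines. This is a simplified toy, not Hartmann's actual model: the linear formula `p0 + (1 - p0) * f`, the per-word random "luck" draw, and all function names are assumptions chosen to keep the example short.

```python
import random


def random_grid(n, black_prob, seed=0):
    """n x n grid; True marks a white (letter) square, False a black square."""
    rng = random.Random(seed)
    return [[rng.random() >= black_prob for _ in range(n)] for _ in range(n)]


def extract_words(grid):
    """Words are maximal horizontal or vertical runs of white squares (length >= 2)."""
    rows, cols = len(grid), len(grid[0])
    lines = [[(r, c) for c in range(cols)] for r in range(rows)]
    lines += [[(r, c) for r in range(rows)] for c in range(cols)]
    words = []
    for line in lines:
        run = []
        for cell in line + [None]:  # None acts as a terminating black square
            if cell is not None and grid[cell[0]][cell[1]]:
                run.append(cell)
            else:
                if len(run) >= 2:
                    words.append(run)
                run = []
    return words


def solve_fraction(grid, p0, seed=0):
    """Fraction of white squares filled in once no further word can be solved.

    p0 is the a priori probability of solving a word with no letters known.
    A word with a fraction f of its letters known is solved if its fixed
    random draw falls below p0 + (1 - p0) * f (an assumed functional form).
    """
    rng = random.Random(seed)
    words = extract_words(grid)
    luck = [rng.random() for _ in words]  # one fixed draw per word
    known = set()
    solved = [False] * len(words)
    changed = True
    while changed:
        changed = False
        for i, cells in enumerate(words):
            if solved[i]:
                continue
            f = sum(cell in known for cell in cells) / len(cells)
            if luck[i] < p0 + (1.0 - p0) * f:
                solved[i] = True
                known.update(cells)
                changed = True
    white = sum(cell for row in grid for cell in row)
    return len(known) / white if white else 0.0


if __name__ == "__main__":
    grid = random_grid(15, black_prob=0.2, seed=1)
    for p0 in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(p0, round(solve_fraction(grid, p0, seed=2), 3))
```

Sweeping `p0` over many random grids and seeds, rather than the single grid used here, would be the natural way to look for a sharp jump in the filled-in fraction.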

It’s remarkably easy to inject new medical misinformation into LLMs


Changing just 0.001% of inputs to misinformation makes the AI less accurate.

It’s pretty easy to see the problem here: The Internet is brimming with misinformation, and most large language models are trained on a massive body of text obtained from the Internet.

Ideally, having substantially higher volumes of accurate information might overwhelm the lies. But is that really the case? A new study by researchers at New York University examines how much medical misinformation can be included in a large language model (LLM) training set before it spits out inaccurate answers. While the study doesn’t identify a lower bound, it does show that by the time misinformation accounts for 0.001 percent of the training data, the resulting LLM is compromised.

While the paper is focused on the intentional “poisoning” of an LLM during training, it also has implications for the body of misinformation that’s already online and part of the training set for existing LLMs, as well as the persistence of out-of-date information in validated medical databases.

Sampling poison

Data poisoning is a relatively simple concept. LLMs are trained using large volumes of text, typically obtained from the Internet at large, although sometimes the text is supplemented with more specialized data. By injecting specific information into this training set, it’s possible to get the resulting LLM to treat that information as a fact when it’s put to use. This can be used for biasing the answers returned.

This doesn’t even require access to the LLM itself; it simply requires placing the desired information somewhere where it will be picked up and incorporated into the training data. And that can be as simple as placing a document on the web. As one manuscript on the topic suggested, “a pharmaceutical company wants to push a particular drug for all kinds of pain which will only need to release a few targeted documents in [the] web.”

Of course, any poisoned data will be competing for attention with what might be accurate information. So, the ability to poison an LLM might depend on the topic. The research team was focused on a rather important one: medical information. This will show up in general-purpose LLMs, such as the ones used for searching for information on the Internet, which will end up being used for obtaining medical information. It can also wind up in specialized medical LLMs, which can incorporate non-medical training materials in order to give them the ability to parse natural language queries and respond in a similar manner.

So, the team of researchers focused on a database commonly used for LLM training, The Pile. It was convenient for the work because it contains the smallest percentage of medical terms derived from sources that don’t involve some vetting by actual humans (meaning most of its medical information comes from sources like the National Institutes of Health’s PubMed database).

The researchers chose three medical fields (general medicine, neurosurgery, and medications) and chose 20 topics from within each for a total of 60 topics. Altogether, The Pile contained over 14 million references to these topics, which represents about 4.5 percent of all the documents within it. Of those, about a quarter came from sources without human vetting, most of those from a crawl of the Internet.

The researchers then set out to poison The Pile.

Finding the floor

The researchers used an LLM, GPT 3.5, to generate “high quality” medical misinformation. While this model has safeguards that should prevent it from producing medical misinformation, the researchers found it would happily do so if given the correct prompts (an LLM issue for a different article). The resulting articles could then be inserted into The Pile. Modified versions of The Pile were generated where either 0.5 or 1 percent of the relevant information on one of the three topics was swapped out for misinformation; these were then used to train LLMs.

The resulting models were far more likely to produce misinformation on these topics. But the misinformation also impacted other medical topics. “At this attack scale, poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts not directly targeted by our attack,” the researchers write. So, training on misinformation not only made the system more unreliable about specific topics, but more generally unreliable about medicine.

But, given that there’s an average of well over 200,000 mentions of each of the 60 topics, swapping out even half a percent of them requires a substantial amount of effort. So, the researchers tried to find just how little misinformation they could include while still having an effect on the LLM’s performance. Unfortunately, this didn’t really work out.

Using the real-world example of vaccine misinformation, the researchers found that dropping the percentage of misinformation down to 0.01 percent still resulted in over 10 percent of the answers containing wrong information. Going for 0.001 percent still led to over 7 percent of the answers being harmful.

“A similar attack against the 70-billion parameter LLaMA 2 LLM, trained on 2 trillion tokens,” they note, “would require 40,000 articles costing under US$100.00 to generate.” The “articles” themselves could just be run-of-the-mill webpages. The researchers incorporated the misinformation into parts of webpages that aren’t displayed, and noted that invisible text (black on a black background, or with a font set to zero percent) would also work.
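The scale figures quoted in this section can be sanity-checked with some quick arithmetic. The sketch below assumes that the 0.001 percent figure is measured in tokens and treats "over 14 million" as exactly 14 million, so the results are rough estimates rather than numbers taken from the paper.

```python
# Figures quoted in the article.
total_mentions = 14_000_000            # "over 14 million references" (lower bound)
n_topics = 60
training_tokens = 2_000_000_000_000    # LLaMA 2: 2 trillion tokens
n_articles = 40_000
max_cost_usd = 100

# Average mentions per topic, and what swapping out 0.5 percent would take.
mentions_per_topic = total_mentions / n_topics
swapped_at_half_percent = 0.005 * mentions_per_topic

# 0.001 percent is 1 part in 100,000 (assumed here to be a share of tokens).
poison_tokens = training_tokens // 100_000
tokens_per_article = poison_tokens // n_articles
cost_per_article_usd = max_cost_usd / n_articles

print(f"mentions per topic:       {mentions_per_topic:,.0f}")
print(f"0.5 percent of one topic: {swapped_at_half_percent:,.0f}")
print(f"poisoned tokens:          {poison_tokens:,}")
print(f"tokens per article:       {tokens_per_article:,}")
print(f"cost per article (max):   ${cost_per_article_usd:.4f}")
```

Under these assumptions, each of the 40,000 articles would need to be only a few hundred tokens long, which is consistent with the article's point that ordinary webpages would suffice.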

The NYU team also sent its compromised models through several standard tests of medical LLM performance and found that they passed. “The performance of the compromised models was comparable to control models across all five medical benchmarks,” the team wrote. So there’s no easy way to detect the poisoning.

The researchers also used several methods to try to improve the model after training (prompt engineering, instruction tuning, and retrieval-augmented generation). None of these improved matters.

Existing misinformation

Not all is hopeless. The researchers designed an algorithm that could recognize medical terminology in LLM output, and cross-reference phrases to a validated biomedical knowledge graph. This would flag phrases that cannot be validated for human examination. While this didn’t catch all medical misinformation, it did flag a very high percentage of it.
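The screening idea can be sketched as follows. Everything here is a toy stand-in: the term list, the relation words, the three-entry knowledge graph, and the function names are hypothetical, and the pattern matching is far cruder than the medical phrase recognition a real system would use.

```python
import re

# Hypothetical vocabulary standing in for a medical-term recognizer.
MEDICAL_TERMS = [
    "metformin", "type 2 diabetes", "aspirin", "pain",
    "measles vaccine", "measles", "autism",
]
RELATIONS = ["treats", "prevents", "causes"]

# Hypothetical validated knowledge graph: (subject, relation, object) triples.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "treats", "pain"),
    ("measles vaccine", "prevents", "measles"),
}


def _pattern(phrases):
    # Longest phrases first so "measles vaccine" wins over "measles".
    ordered = sorted(phrases, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b")


TERM_RE = _pattern(MEDICAL_TERMS)
REL_RE = _pattern(RELATIONS)


def extract_triples(sentence):
    """Pair each relation word with the nearest known term on either side."""
    text = sentence.lower()
    terms = [(m.start(), m.end(), m.group(1)) for m in TERM_RE.finditer(text)]
    triples = []
    for rel in REL_RE.finditer(text):
        before = [t for t in terms if t[1] <= rel.start()]
        after = [t for t in terms if t[0] >= rel.end()]
        if before and after:
            triples.append((before[-1][2], rel.group(1), after[0][2]))
    return triples


def flag_unverified(text):
    """Return (sentence, triples) pairs whose triples are not in the graph."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        unverified = [t for t in extract_triples(sentence)
                      if t not in KNOWLEDGE_GRAPH]
        if unverified:
            flagged.append((sentence, unverified))
    return flagged


if __name__ == "__main__":
    output = ("Metformin treats type 2 diabetes. "
              "The measles vaccine causes autism. "
              "Aspirin treats pain.")
    for sentence, triples in flag_unverified(output):
        print("needs human review:", sentence, triples)
```

Note that a filter of this kind flags anything it cannot verify, so a true statement missing from the graph would be sent for review alongside a false one; that is why the flagged phrases go to a human rather than being rejected outright.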

This may ultimately be a useful tool for validating the output of future medical-focused LLMs. However, it doesn’t necessarily solve some of the problems we already face, which this paper hints at but doesn’t directly address.

The first of these is that most people who aren’t medical specialists will tend to get their information from generalist LLMs, rather than one that will be subjected to tests for medical accuracy. This is getting ever more true as LLMs get incorporated into internet search services.

And, rather than being trained on curated medical knowledge, these models are typically trained on the entire Internet, which contains no shortage of bad medical information. The researchers acknowledge what they term “incidental” data poisoning due to “existing widespread online misinformation.” But a lot of that “incidental” information was generally produced intentionally, as part of a medical scam or to further a political agenda. Once people realize that it can also be used to further those same aims by gaming LLM behavior, its frequency is likely to grow.

Finally, the team notes that even the best human-curated data sources, like PubMed, also suffer from a misinformation problem. The medical research literature is filled with promising-looking ideas that never panned out, and out-of-date treatments and tests that have been replaced by approaches more solidly based on evidence. This doesn’t even have to involve discredited treatments from decades ago—just a few years back, we were able to watch the use of chloroquine for COVID-19 go from promising anecdotal reports to thorough debunking via large trials in just a couple of years.

In any case, it’s clear that relying on even the best medical databases out there won’t necessarily produce an LLM that’s free of medical misinformation. Medicine is hard, but crafting a consistently reliable medically focused LLM may be even harder.

Nature Medicine, 2025. DOI: 10.1038/s41591-024-03445-1

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

China is having standard flu season despite widespread HMPV fears

There’s a good chance you’ve seen headlines about HMPV recently, with some touting “what you need to know” about the virus, aka human metapneumovirus. The answer is: not much.

It’s a common, usually mild respiratory virus that circulates every year, blending into the throng of other seasonal respiratory illnesses that are often indistinguishable from one another. (The pack includes influenza virus, respiratory syncytial virus (RSV), adenovirus, parainfluenza virus, common human coronaviruses, bocavirus, rhinovirus, enteroviruses, and Mycoplasma pneumoniae, among others.) HMPV is in the same family of viruses as RSV.

As one viral disease epidemiologist at the US Centers for Disease Control summarized in 2016, it’s usually “clinically indistinguishable” from other bog-standard respiratory illnesses, like seasonal flu, that cause cough, fever, and nasal congestion. For most, the infection is crummy but not worth a visit to a doctor. As such, testing for it is limited. But, like other common respiratory infections, it can be dangerous for children under age 5, older adults, and those with compromised immune systems. It was first identified in 2001, but it has likely been circulating since at least 1958.

The situation in China

The explosion of interest in HMPV comes after reports of a spike of HMPV infections in China, which allegedly led to hordes of masked patients filling hospitals. But none of that appears to be accurate. While HMPV infections have risen, the increase is not unusual for the respiratory illness season. Further, HMPV is not the leading cause of respiratory illnesses in China right now; the leading cause is seasonal flu. And the surge in seasonal flu is also within the usual levels seen at this time of year in China.

Last week, the Chinese Center for Disease Control and Prevention released its sentinel respiratory illness surveillance data collected in the last week of December. It included the test results of respiratory samples taken from outpatients. Of those, 30 percent were positive for flu (the largest share), a jump of about 6 percent from the previous week (the largest jump). Only 6 percent were positive for HMPV, which was about the same detection rate as in the previous week (there was a 0.1 percent increase).

NASA defers decision on Mars Sample Return to the Trump administration


“We want to have the quickest, cheapest way to get these 30 samples back.”

This photo montage shows sample tubes shortly after they were deposited onto the surface by NASA’s Perseverance Mars rover in late 2022 and early 2023. Credit: NASA/JPL-Caltech/MSSS

For nearly four years, NASA’s Perseverance rover has journeyed across an unexplored patch of land on Mars—once home to an ancient river delta—and collected a slew of rock samples sealed inside cigar-sized titanium tubes.

These tubes might contain tantalizing clues about past life on Mars, but NASA’s ever-changing plans to bring them back to Earth are still unclear.

On Tuesday, NASA officials presented two options for retrieving and returning the samples gathered by the Perseverance rover. One alternative involves a conventional architecture reminiscent of past NASA Mars missions, relying on the “sky crane” landing system demonstrated on the agency’s two most recent Mars rovers. The other option would be to outsource the lander to the space industry.

NASA Administrator Bill Nelson left a final decision on a new mission architecture to the next NASA administrator working under the incoming Trump administration. President-elect Donald Trump nominated entrepreneur and commercial astronaut Jared Isaacman as the agency’s 15th administrator last month.

“This is going to be a function of the new administration in order to fund this,” said Nelson, a former Democratic senator from Florida who will step down from the top job at NASA on January 20.

The question now is: will they? And if the Trump administration moves forward with Mars Sample Return (MSR), what will it look like? Could it involve a human mission to Mars instead of a series of robotic spacecraft?

The Trump White House is expected to emphasize “results and speed” with NASA’s space programs, with the goal of accelerating a crew landing on the Moon and sending people to explore Mars.

NASA officials had an earlier plan to bring the Mars samples back to Earth, but the program slammed into a budgetary roadblock last year when an independent review team concluded the existing architecture would cost up to $11 billion, double the previous cost projection, and wouldn’t get the Mars specimens back to Earth until 2040.

This budget and schedule were non-starters for NASA. The agency tasked government labs, research institutions, and commercial companies to come up with better ideas to bring home the roughly 30 sealed sample tubes carried aboard the Perseverance rover. NASA deposited 10 sealed tubes on the surface of Mars a couple of years ago as insurance in case Perseverance dies before the arrival of a retrieval mission.

“We want to have the quickest, cheapest way to get these 30 samples back,” Nelson said.

How much for these rocks?

NASA officials said they believe a stripped-down concept proposed by the Jet Propulsion Laboratory in Southern California, which previously was in charge of the over-budget Mars Sample Return mission architecture, would cost between $6.6 billion and $7.7 billion, according to Nelson. JPL’s previous approach would have put a heavier lander onto the Martian surface, with small helicopter drones that could pick up sample tubes if there were problems with the Perseverance rover.

NASA previously deleted a “fetch rover” from the MSR architecture and instead will rely on Perseverance to hand off sample tubes to the retrieval lander.

An alternative approach would use a (presumably less expensive) commercial heavy lander, but this concept would still utilize several elements NASA would likely develop in a more traditional government-led manner: a nuclear power source, a robotic arm, a sample container, and a rocket to launch the samples off the surface of Mars and back into space. The cost range for this approach extends from $5.1 billion to $7.1 billion.

Artist’s illustration of SpaceX’s Starship approaching Mars. Credit: SpaceX

JPL will have a “key role” in both paths for MSR, said Nicky Fox, head of NASA’s science mission directorate. “To put it really bluntly, JPL is our Mars center in NASA science.”

If the Trump administration moves forward with either of the proposed MSR plans, this would be welcome news for JPL. The center, which is run by the California Institute of Technology under contract to NASA, laid off 955 employees and contractors last year, citing budget uncertainty, primarily due to the cloudy future of Mars Sample Return.

Without MSR, engineers at the Jet Propulsion Laboratory don’t have a flagship-class mission to build after the launch of NASA’s Europa Clipper spacecraft last year. The lab recently struggled with rising costs and delays with the previous iteration of MSR and NASA’s Psyche asteroid mission, and it’s not unwise to anticipate more cost overruns on a project as complex as a round-trip flight to Mars.

Ars submitted multiple requests to interview Laurie Leshin, JPL’s director, in recent months to discuss the lab’s future, but her staff declined.

Both MSR mission concepts outlined Tuesday would require multiple launches and an Earth return orbiter provided by the European Space Agency. These options would bring the Mars samples back to Earth as soon as 2035, but perhaps as late as 2039, Nelson said. The return orbiter and sample retrieval lander could launch as soon as 2030 and 2031, respectively.

“The main difference is in the landing mechanism,” Fox said.

To keep those launch schedules, Congress must immediately approve $300 million for Mars Sample Return in this year’s budget, Nelson said.

NASA officials didn’t identify any examples of a commercial heavy lander that could reach Mars, but the most obvious vehicle is SpaceX’s Starship. NASA already has a contract with SpaceX to develop a Starship vehicle that can land on the Moon, and SpaceX founder Elon Musk is aggressively pushing for a Mars mission with Starship as soon as possible.

NASA solicited eight studies from industry earlier this year. SpaceX, Blue Origin, Rocket Lab, and Lockheed Martin—each with their own lander concepts—were among the companies that won NASA study contracts. SpaceX and Blue Origin are well-capitalized with Musk and Amazon’s Jeff Bezos as owners, while Lockheed Martin is the only company to have built a lander that successfully reached Mars.

This slide from a November presentation to the Mars Exploration Program Analysis Group shows JPL’s proposed “sky crane” architecture for a Mars sample retrieval lander. The landing system would be modified to handle a load about 20 percent heavier than the sky crane used for the Curiosity and Perseverance rover landings. Credit: NASA/JPL

The science community has long identified a Mars Sample Return mission as the top priority for NASA’s planetary science program. In the National Academies’ most recent decadal survey released in 2022, a panel of researchers recommended NASA continue with the MSR program but stated the program’s cost should not undermine other planetary science missions.

Teeing up for cancellation?

That’s exactly what is happening. Budget pressures from the Mars Sample Return mission, coupled with funding cuts stemming from a bipartisan federal budget deal in 2023, have prompted NASA’s planetary science division to institute a moratorium on starting new missions.

“The decision about Mars Sample Return is not just one that affects Mars exploration,” said Curt Niebur, NASA’s lead scientist for planetary flight programs, in a question-and-answer session with solar system researchers Tuesday. “It’s going to affect planetary science and the planetary science division for the foreseeable future. So I think the entire science community should be very tuned in to this.”

Rocket Lab, which has been more open about its MSR architecture than other companies, has posted details of its sample return concept on its website. Fox declined to offer details on other commercial concepts for MSR, citing proprietary concerns.

“We can wait another year, or we can get started now,” Rocket Lab posted on X. “Our Mars Sample Return architecture will put Martian samples in the hands of scientists faster and more affordably. Less than $4 billion, with samples returned as early as 2031.”

Through its own internal development and acquisitions of other aerospace industry suppliers, Rocket Lab said it has provided components for all of NASA’s recent Mars missions. “We can deliver MSR mission success too,” the company said.

Rocket Lab’s concept for a Mars Sample Return mission. Credit: Rocket Lab

Although NASA’s deferral of a decision on MSR to the next administration might convey a lack of urgency, officials said the agency and potential commercial partners need time to assess what roles the industry might play in the MSR mission.

“They need to flesh out all of the possibilities of what’s required in the engineering for the commercial option,” Nelson said.

On the program’s current trajectory, Fox said NASA would be able to choose a new MSR architecture in mid-2026.

Waiting, rather than deciding on an MSR plan now, will also allow time for the next NASA administrator and the Trump White House to determine whether either option aligns with the administration’s goals for space exploration. In an interview with Ars last week, Nelson said he did not want to “put the new administration in a box” with any significant MSR decisions in the waning days of the Biden administration.

One source with experience in crafting and implementing US space policy told Ars that Nelson’s deferral on a decision will “tee up MSR for canceling.” Faced with a decision to spend billions of dollars on a robotic sample return or billions of dollars to go toward a human mission to Mars, the Trump administration will likely choose the latter, the source said.

If that happens, NASA science funding could be freed up for other pursuits in planetary science. The second priority identified in the most recent planetary decadal survey is an orbiter and atmospheric probe to explore Uranus and its icy moons. NASA has held off on the development of a Uranus mission to focus on the Mars Sample Return first.

Science and geopolitics

Whether it’s with robots or humans, there’s a strong case for bringing pristine Mars samples back to Earth. The titanium tubes carried by the Perseverance rover contain rock cores, loose soil, and air samples from the Martian atmosphere.

“Bringing them back will revolutionize our understanding of the planet Mars and indeed, our place in the solar system,” Fox said. “We explore Mars as part of our ongoing efforts to safely send humans to explore farther and farther into the solar system, while also … getting to the bottom of whether Mars once supported ancient life and shedding light on the early solar system.”

Researchers can perform more detailed examinations of Mars specimens in sophisticated laboratories on Earth than possible with the miniature instruments delivered to the red planet on a spacecraft. Analyzing samples in a terrestrial lab might reveal biosignatures, or the traces of ancient life, that elude detection with instruments on Mars.

“The samples that we have taken by Perseverance actually predate—they are older than any of the samples or rocks that we could take here on Earth,” Fox said. “So it allows us to kind of investigate what the early solar system was like before life began here on Earth, which is amazing.”

Fox said returning Mars samples before a human expedition would help NASA prioritize where astronauts should land on the red planet.

In a statement, the Planetary Society said it is “concerned that NASA is again delaying a decision on the program, committing only to additional concept studies.”

“It has been more than two years since NASA paused work on MSR,” the Planetary Society said. “It is time to commit to a path forward to ensure the return of the samples already being collected by the Perseverance rover.

“We urge the incoming Trump administration to expedite a decision on a path forward for this ambitious project, and for Congress to provide the funding necessary to ensure the return of these priceless samples from the Martian surface.”

China says it is developing its own mission to bring Mars rocks back to Earth. Named Tianwen-3, the mission could launch as soon as 2028 and return samples to Earth by 2031. While NASA’s plan would bring back carefully curated samples from an expansive environment that may have once harbored life, China’s mission will scoop up rocks and soil near its landing site.

“They’re just going to have a mission to grab and go—go to a landing site of their choosing, grab a sample and go,” Nelson said. “That does not give you a comprehensive look for the scientific community. So you cannot compare the two missions. Now, will people say that there’s a race? Of course, people will say that, but it’s two totally different missions.”

Still, Nelson said he wants NASA to be first. He said he has not had detailed conversations with Trump’s NASA transition team.

“I think it was a responsible thing to do, not to hand the new administration just one alternative if they want to have a Mars Sample Return,” Nelson said. “I can’t imagine that they don’t. I don’t think we want the only sample return coming back on a Chinese spacecraft.”

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

As US marks first H5N1 bird flu death, WHO and CDC say risk remains low

The H5N1 bird flu situation in the US seems more fraught than ever this week as the virus continues to spread swiftly in dairy cattle and birds while sporadically jumping to humans.

On Monday, officials in Louisiana announced that the person who had developed the country’s first severe H5N1 infection had died of the infection, marking the country’s first H5N1 death. Meanwhile, with no signs of H5N1 slowing, seasonal flu is skyrocketing, raising anxiety that the different flu viruses could mingle, swap genetic elements, and generate a yet more dangerous virus strain.

But, despite the seeming fever pitch of viral activity and fears, a representative for the World Health Organization today noted that risk to the general population remains low, as long as one critical factor remains absent: person-to-person spread.

“We are concerned, of course, but we look at the risk to the general population and, as I said, it still remains low,” WHO spokesperson Margaret Harris told reporters at a Geneva press briefing Tuesday in response to questions related to the US death. In terms of updating risk assessments, you have to look at how the virus behaved in that patient and if it jumped from one person to another person, which it didn’t, Harris explained. “At the moment, we’re not seeing behavior that’s changing our risk assessment,” she added.

In a statement on the death late Monday, the US Centers for Disease Control and Prevention emphasized that no human-to-human transmission has been identified in the US. There have been 66 documented human cases of H5N1 infection since the start of 2024. Of those, 40 were linked to exposure to infected dairy cows, 23 were linked to infected poultry, two had no clear source, and one case, the fatal case in Louisiana, was linked to exposure to infected backyard and wild birds.

Science paper piracy site Sci-Hub shares lots of retracted papers

Most scientific literature is published in for-profit journals that rely on subscriptions and paywalls to turn a profit. But that trend has been shifting as various governments and funding agencies are requiring that the science they fund be published in open-access journals. The transition is happening gradually, though, and a lot of the historical literature remains locked behind paywalls.

These paywalls can pose a problem for researchers who aren’t at well-funded universities, including many in the Global South, who may not be able to access the research they need to understand in order to pursue their own studies. One solution has been Sci-Hub, a site where people can upload PDFs of published papers so they can be shared with anyone who can access the site. Despite losses in publishing industry lawsuits and attempts to block access, Sci-Hub continues to serve up research papers that would otherwise be protected by paywalls.

But what it’s serving up may not always be the latest and greatest. Generally, when a paper is retracted for being invalid, publishers issue an updated version of its PDF with clear indications that the research it contains should no longer be considered valid. Unfortunately, it appears that once Sci-Hub has a copy of a paper, it doesn’t necessarily have the ability to ensure it’s kept up to date. Based on a scan of its content done by researchers from India, about 85 percent of the invalid papers they checked had no indication that the paper had been retracted.

Correcting the scientific record

Scientific results go wrong for all sorts of reasons, from outright fraud to honest mistakes. If the problems don’t invalidate the overall conclusions of a paper, it’s possible to update the paper with a correction. If the problems are systemic enough to undermine the results, however, the paper is typically retracted—in essence, it should be treated as if it were never published in the first place.

It doesn’t always work out that way, however. Maybe people ignore the notifications that something has been retracted, or maybe they downloaded a copy of the paper before it got retracted and never saw the notifications at all, but citations to retracted papers regularly appear in the scientific record. Over the long term, this can distort our big-picture view of science, leading to wasted effort and misallocated resources.


ants-vs.-humans:-solving-the-piano-mover-puzzle

Ants vs. humans: Solving the piano-mover puzzle

Who is better at maneuvering a large load through a maze, ants or humans?

The piano-mover puzzle involves trying to transport an oddly shaped load across a constricted environment with various obstructions. It’s one of several variations on classic computational motion-planning problems, a key element in numerous robotics applications. But what would happen if you pitted human beings against ants in a competition to solve the piano-mover puzzle?

According to a paper published in the Proceedings of the National Academy of Sciences, humans have superior cognitive abilities and, hence, would be expected to outperform the ants. However, depriving people of verbal or nonverbal communication can level the playing field, with ants performing better in some trials. And while ants improved their cognitive performance when acting collectively as a group, the same did not hold true for humans.

Co-author Ofer Feinerman of the Weizmann Institute of Science and colleagues saw an opportunity to use the piano-mover puzzle to shed light on group decision-making, as well as the question of whether it is better to cooperate as a group or maintain individuality. “It allows us to compare problem-solving skills and performances across group sizes and down to a single individual and also enables a comparison of collective problem-solving across species,” the authors wrote.

They decided to compare the performances of ants and humans because both species are social and can cooperate while transporting loads larger than themselves. In essence, “people stand out for individual cognitive abilities while ants excel in cooperation,” the authors wrote.

Feinerman et al. used crazy ants (Paratrechina longicornis) for their experiments, along with the human volunteers. They designed a physical version of the piano-mover puzzle involving a large T-shaped load that had to be maneuvered across a rectangular area divided into three chambers, connected via narrow slits. The load started in the first chamber on the left, and the ant and human subjects had to figure out how to transport it through the second chamber and into the third.


controversial-fluoride-analysis-published-after-years-of-failed-reviews

Controversial fluoride analysis published after years of failed reviews


70 percent of studies included in the meta-analysis had a high risk of bias.

Federal toxicology researchers on Monday finally published a long-controversial analysis that claims to find a link between high levels of fluoride exposure and slightly lower IQs in children living in areas outside the US, mostly in China and India. As expected, it immediately drew yet more controversy.

The study, published in JAMA Pediatrics, is a meta-analysis, a type of study that combines data from many different studies (in this case, mostly low-quality studies) to come up with new results. None of the data included in the analysis is from the US, and the fluoride levels examined are at least double the level recommended for municipal water in the US. In some parts of the world, such as parts of China, fluoride is naturally present in water and can reach concentrations several-fold higher than those in fluoridated water in the US.

The authors of the analysis are researchers at the National Toxicology Program at the National Institute of Environmental Health Sciences. For context, this is the same federal research program that published a dubious analysis in 2016 suggesting that cell phones cause cancer in rats. The study underwent a suspicious peer-review process and contained questionable methods and statistics.

The new fluoride analysis shares similarities. NTP researchers have been working on the fluoride study since 2015 and submitted two drafts for peer review to an independent panel of experts at the National Academies of Sciences, Engineering, and Medicine in 2020 and 2021. The study failed its review both times. The National Academies’ reviews found fault with the methods and statistical rigor of the analysis. Specifically, the reviews noted potential bias in the selection of the studies included in the analysis, inconsistent application of risk-of-bias criteria, lack of data transparency, insufficient evaluations of potential confounding, and flawed measures of neurodevelopmental outcomes, among other problems.

After the failing reviews, the NTP selected its own reviewers and self-published the study as a monograph in August.

High risk of bias

The related analysis published Monday looked at data from 74 human studies, 45 of which were conducted in China and 12 in India. Of the 74, 52 were rated as having a high risk of bias, meaning they had designs, study methods, or statistical approaches that could skew the results.

The study’s primary meta-analysis only included 59 of the studies: 47 with a high risk of bias and 12 with a low risk. This analysis looked at standardized mean differences in children’s IQ between higher and lower fluoride exposure groups. Of the 59 studies, 41 were from China.
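As a quick check of the arithmetic behind these counts, here is a minimal Python sketch. The variable names and the `share` helper are illustrative assumptions, not anything taken from the study; only the counts come from the text above.

```python
# Study counts as reported in the article; names are illustrative only.
total_studies = 74
high_bias_total = 52

primary_studies = 59
high_bias_primary = 47
low_bias_primary = 12


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The primary-analysis subgroups should add up to the primary total.
assert high_bias_primary + low_bias_primary == primary_studies

share_all = share(high_bias_total, total_studies)
share_primary = share(high_bias_primary, primary_studies)

print(f"High risk of bias, all studies: {share_all}%")
print(f"High risk of bias, primary meta-analysis: {share_primary}%")
```

On these counts, roughly 70 percent of all the studies, and roughly 80 percent of those in the primary meta-analysis, were rated as having a high risk of bias.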

Among the 47 studies with a high risk of bias, the pooled difference in mean IQ scores between the higher-exposure groups and lower-exposure groups was -0.52, suggesting that higher fluoride exposure lowered IQs. But, among the 12 studies at low risk for bias, the difference was slight overall, only -0.19. And of those 12 studies, eight found no link between fluoride exposure and IQ at all.

Among 31 studies that reported fluoride levels in water, the NTP authors looked at possible IQ associations at three fluoride-level cutoffs: less than 4 mg/L, less than 2 mg/L, and less than 1.5 mg/L. Among all 31 studies, the researchers found that fluoride exposure levels of less than 4 mg/L and less than 2 mg/L were linked to statistically significant decreases in IQ. However, there was no statistically significant link at 1.5 mg/L. For context, 1.5 mg/L is a little over twice the level of fluoride recommended by the US Public Health Service for US community water, which is 0.7 mg/L. When the NTP authors looked at just the studies that had a low risk of bias (seven studies), they saw the same lack of association with the 1.5 mg/L cutoff.

The NTP authors also looked at IQ associations in 20 studies that reported urine fluoride levels and again split the analysis using the same fluoride cutoffs as before. While there did appear to be a link with lower IQ at the highest fluoride level, the two lower fluoride levels had borderline statistical significance. Ten of the 20 studies were assessed as having a low risk of bias, and for just those 10, the results were similar to the larger group.

Criticism

The inclusion of urinary fluoride measurements is sure to spark criticism. For years, experts have noted that these measurements are not standardized, can vary by day and time, and are not reflective of a person’s overall fluoride exposure.

In an editorial published alongside the NTP study today, Steven Levy, a public health dentist at the University of Iowa, blasted the new analysis, including the urinary sample measurements.

“There is scientific consensus that the urinary sample collection approaches used in almost all included studies (ie, spot urinary fluoride or a few 24-hour samples, many not adjusted for dilution) are not valid measures of individuals’ long-term fluoride exposure, since fluoride has a short half-life and there is substantial variation within days and from day to day,” Levy wrote.

Overall, Levy reiterated many of the same concerns from the National Academies’ reviews, noting the study’s lack of transparency, the reliance on highly biased studies, questionable statistics, and questionable exclusion of newer, higher-quality studies, which have found no link between water fluoridation and children’s IQ. For instance, one exclusion was a 2023 study out of Australia that found “Exposure to fluoridated water during the first 5 [years] of life was not associated with altered measures of child emotional and behavioral development and executive functioning.” A 2022 study out of Spain similarly found no risk of prenatal exposure.

“Taking these many important concerns together, readers are advised to be very cautious in drawing conclusions about possible associations of fluoride exposures with lower IQ,” Levy concluded. “This is especially true for lower water fluoride levels.”

Another controversial study

But the debate on water fluoridation is unlikely to recede anytime soon. In a second editorial published alongside the NTP study, other researchers praised the analysis, calling for health organizations and regulators to reassess fluoridation.

“The absence of a statistically significant association of water fluoride less than 1.5 mg/L and children’s IQ scores in the dose-response meta-analysis does not exonerate fluoride as a potential risk for lower IQ scores at levels found in fluoridated communities,” the authors argue, noting there are additional sources of fluoride, such as toothpaste and foods.

The EPA estimates that 40 to 70 percent of people’s fluoride exposure comes from water.

Two of the three authors of the second editorial—Christine Till and Bruce Lanphear—were authors of a highly controversial 2019 study out of Canada suggesting that fluoride intake during pregnancy could reduce children’s IQ. The authors even suggested that pregnant people should reduce their fluoride intake. But, the study, also published in JAMA Pediatrics, only found a link between maternal fluoride levels and IQ in male children. There was no association in females.

The study drew heavy backlash, with blistering responses published in JAMA Pediatrics. In one response, UK researchers essentially accused Till and colleagues of a statistical fishing expedition to find a link.

“[T]here was no significant IQ difference between children from fluoridated and nonfluoridated communities and no overall association with maternal urinary fluoride (MUFSG). The authors did not mention this and instead emphasized the significant sex interaction, where the association appeared for boys but not girls. No theoretical rationale for this test was provided; in the absence of a study preregistration, we cannot know whether it was planned a priori. If not, the false-positive probability increases because there are many potential subgroups that might show the result by chance.”

Other researchers criticized the study’s statistics, lack of data transparency, the use of maternal urine sampling, and the test they used to assess the IQ of children ages 3 and 4.


Beth is Ars Technica’s Senior Health Reporter. Beth has a Ph.D. in microbiology from the University of North Carolina at Chapel Hill and attended the Science Communication program at the University of California, Santa Cruz. She specializes in covering infectious diseases, public health, and microbes.


fast-radio-bursts-originate-near-the-surface-of-stars

Fast radio bursts originate near the surface of stars

One of the two papers published on Wednesday looks at the polarization of the photons in the burst itself, finding that the angle of polarization changes rapidly over the 2.5 milliseconds that FRB 20221022A lasted. The 130-degree rotation that occurred follows an S-shaped pattern, which has already been observed in about half of the pulsars we’ve observed—neutron stars that rotate rapidly and sweep a bright jet across the line of sight with Earth, typically multiple times each second.

The implication of this finding is that the source of the FRB is likely to also be on a compact, rapidly rotating object. Or at least this FRB. As of right now, this is the only FRB that we know displays this sort of behavior. While not all pulsars show this pattern of rotation, half of them do, and we’ve certainly observed enough FRBs that we should have picked up others like this if they occurred at an appreciable rate.

Scattered

The second paper performs a far more complicated analysis, searching for indications of interactions between the FRB and the interstellar medium that exists within galaxies. These interactions have two effects. One, caused by scattering off interstellar material, will spread the burst out over time in a frequency-dependent manner. Scattering can also cause a random brightening/dimming of different areas of the spectrum, called scintillation, which is somewhat analogous to the twinkling of stars caused by our atmosphere.

In this case, the photons of the FRB have had three encounters with matter that can induce these effects: the sparse interstellar material of the source galaxy, the equally sparse interstellar material in our own Milky Way, and the even more sparse intergalactic material in between the two. Since the source galaxy for FRB 20221022A is relatively close to our own, the intergalactic medium can be ignored, leaving the detection with two major sources of scattering.


one-less-thing-to-worry-about-in-2025:-yellowstone-probably-won’t-go-boom

One less thing to worry about in 2025: Yellowstone probably won’t go boom


There’s not enough melted material near the surface to trigger a massive eruption.

It’s difficult to comprehend what 1,000 cubic kilometers of rock would look like. It’s even more difficult to imagine it being violently flung into the air. Yet the Yellowstone volcanic system blasted more than twice that amount of rock into the sky about 2 million years ago. It has generated a number of massive (if somewhat smaller) eruptions since, and there have been even larger eruptions deeper in the past.

All of which might be enough to keep someone nervously watching the seismometers scattered throughout the area. But a new study suggests that there’s nothing to worry about in the near future: There’s not enough molten material pooled in one place to trigger the sort of violent eruptions that have caused massive disruptions in the past. The study also suggests that the primary focus of activity may be shifting outside of the caldera formed by past eruptions.

Understanding Yellowstone

Yellowstone is fueled by what’s known as a hotspot, where molten material from the Earth’s mantle percolates up through the crust. The rock that comes up through the crust is typically basaltic (a definition based on the ratio of elements in its composition) and can erupt directly. This tends to produce relatively gentle eruptions where lava flows across a broad area, generally like you see in Hawaii and Iceland. But this hot material can also melt rock within the crust, producing a material called rhyolite. This is a much more viscous material that does not flow very readily and, instead, can cause explosive eruptions.

The risks at Yellowstone are rhyolitic eruptions. But it can be difficult to tell the two types of molten material apart, at least while they’re several kilometers below the surface. Various efforts have been made over the years to track the molten material below Yellowstone, but differences in resolution and focus have left many unanswered questions.

Part of the problem is that a lot of this data came from studies of seismic waves traveling through the region. Their travel is influenced by various factors, including the composition of the material they’re traveling through, its temperature, and whether it’s a liquid or solid. In a lot of cases, this leaves several potential solutions consistent with the seismic data—you can potentially see the same behavior from different materials at different temperatures.

To get around this issue, the new research measured the conductivity of the rock, which can change by as much as three orders of magnitude when transitioning from a solid to a molten phase. The overall conductivity we measure also increases as more of the molten material is connected into a single reservoir rather than being dispersed into individual pockets.

This sort of “magnetotelluric” data has been obtained in the past but at a relatively low resolution. For the new study, a dense array of sensors was placed in the Yellowstone caldera and many surrounding areas to the north and east. (You can compare the previous and new recording sites as black and red triangles on this map.)

Yellowstone’s plumbing

That has allowed the research team to build a three-dimensional map of the molten material underneath Yellowstone and to determine the fraction of the material in a given area that’s molten. The team finds that there are two major sources of molten material that extend up from the mantle-crust boundary at about 50 kilometers below the surface. These extend upward separately but merge about 20 kilometers below the surface.


Underneath Yellowstone: Two large lobes of hot material from the mantle (in yellow) melt rock closer to the surface (orange), creating pools of hot material (red and orange) that power hydrothermal systems and past eruptions, and may be the sites of future activity. Credit: Bennington, et al.

While they collectively contain a lot of molten basaltic material (between 4,000 and 6,500 cubic kilometers of it), it’s not very concentrated. Instead, this is mostly relatively small volumes of molten material traveling through cracks and faults in solid rock. This keeps the concentration of molten material below that needed to enable eruptions.

After the two streams of basaltic material merge, they form a reservoir that includes a significant amount of melted crustal material—meaning rhyolitic. The amount of rhyolitic material here is, at most, under 500 cubic kilometers, so it could fuel a major eruption, albeit a small one by historic Yellowstone standards. But again, the fraction of melted material in this volume of rock is relatively low and not considered likely to enable eruptions.

From there to the surface, there are several distinct features. Relative to the hotspot, the North American plate above is moving to the west, which has historically meant that the site of eruptions has moved from west to east across the continent. Accordingly, there is a pool off to the west of the bulk of near-surface molten material that no longer seems to be connected to the rest of the system. It’s small, at only about 100 cubic kilometers of material, and is too diffuse to enable a large eruption.

Future risks?

There’s a similar near-surface blob of molten material that may not currently be connected to the rest of the molten material to the south of that. It’s even smaller, likely less than 50 cubic kilometers of material. But it sits just below a large blob of molten basalt, so it is likely to be receiving a fair amount of heat input. This site seems to have also fueled the most recent large eruption in the caldera. So, while it can’t fuel a large eruption today, it’s not possible to rule the site out for the future.

Two other near-surface areas containing molten material appear to power two of the major sites of hydrothermal activity, the Norris Geyser Basin and Hot Springs Basin. These are on the northern and eastern edges of the caldera, respectively. The one to the east contains a small amount of material that isn’t concentrated enough to trigger eruptions.

But the site to the northeast contains the largest volume of rhyolitic material, with up to nearly 500 cubic kilometers. It’s also one of only two regions with a direct connection to the molten material moving up through the crust. So, while it’s not currently poised to erupt, this appears to be the most likely area to trigger a major eruption in the future.

In summary, while there’s a lot of molten material near the current caldera, all of it is spread too diffusely within the solid rock to enable it to trigger a major eruption. Significant changes will need to take place before we see the site cover much of North America with ash again. Beyond that, the image is consistent with our big-picture view of the Yellowstone hotspot, which has left a trail of eruptions across western North America, driven by the movement of the North American plate.

That movement has now left one pool of molten material on the west of the caldera disconnected from any heat sources, which will likely allow it to cool. Meanwhile, the largest pool of near-surface molten rock is east of the caldera, which may ultimately drive a transition of explosive eruptions outside the present caldera.

Nature, 2025. DOI: 10.1038/s41586-024-08286-z


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


delve-into-the-physics-of-the-hula-hoop

Delve into the physics of the Hula-Hoop

High-speed video of experiments on a robotic hula hooper, whose hourglass form holds the hoop up and in place.

Some version of the Hula-Hoop has been around for millennia, but the popular plastic version was introduced by Wham-O in the 1950s and quickly became a fad. Now, researchers have taken a closer look at the underlying physics of the toy, revealing that certain body types are better at keeping the spinning hoops elevated than others, according to a new paper published in the Proceedings of the National Academy of Sciences.

“We were surprised that an activity as popular, fun, and healthy as hula hooping wasn’t understood even at a basic physics level,” said co-author Leif Ristroph of New York University. “As we made progress on the research, we realized that the math and physics involved are very subtle, and the knowledge gained could be useful in inspiring engineering innovations, harvesting energy from vibrations, and improving in robotic positioners and movers used in industrial processing and manufacturing.”

Ristroph’s lab frequently addresses these kinds of colorful real-world puzzles. For instance, in 2018, Ristroph and colleagues fine-tuned the recipe for the perfect bubble based on experiments with soapy thin films. In 2021, the Ristroph lab looked into the formation processes underlying so-called “stone forests” common in certain regions of China and Madagascar.

In 2021, his lab built a working Tesla valve, in accordance with the inventor’s design, and measured the flow of water through the valve in both directions at various pressures. They found the water flowed about two times slower in the nonpreferred direction. In 2022, Ristroph studied the surprisingly complex aerodynamics of what makes a good paper airplane, specifically what is needed for smooth gliding.


Girl twirling a Hula-Hoop in 1958 Credit: George Garrigues/CC BY-SA 3.0

And last year, Ristroph’s lab cracked the conundrum of physicist Richard Feynman’s “reverse sprinkler” problem, concluding that the reverse sprinkler rotates a good 50 times slower than a regular sprinkler but operates along similar mechanisms. The secret is hidden inside the sprinkler, where there are jets that make it act like an inside-out rocket. The internal jets don’t collide head-on; rather, as water flows around the bends in the sprinkler arms, it is slung outward by centrifugal force, leading to asymmetric flow.


manta-rays-inspire-faster-swimming-robots-and-better-water-filters

Manta rays inspire faster swimming robots and better water filters

This robot can also dive and come back to the surface. Faster flapping results in strong downward waves that will push the robot upward, while slower flapping creates weaker upward waves that allow it to go further down. (Actual mantas sink if they slow down.) It also proved it could fetch a payload from the bottom of a tank and bring it to the surface.

Eating on the fly

Because manta rays are essentially giant moving water filters, researchers from MIT looked to them and other mobula rays (a group that includes mantas and devil rays) for inspiration when figuring out potential improvements to industrial water filters.

Mantas feed by leaving their mouths open as they swim. At the bottom of either side of a manta’s mouth are structures known as mouthplates, which look something like a dashboard air conditioner. When water enters the mouth, plankton particles too large to pass through the plates bounce further down into the manta’s body cavity and, eventually, to its stomach. Gills absorb oxygen from the water that gushes out so the manta can breathe.

The MIT team was especially interested in mobula rays because they thought the animals struck an ideal balance between allowing water in quickly enough to breathe and maintaining highly selective structures that prevent most plankton from escaping into the water. To create a filter as close to a mobula ray as possible, the team 3D-printed plates that were then glued together to create narrow openings between them. Particles that do not pass instead flow away into a waste reservoir.

With slow pumping, water and smaller particles flowed out of the filter. When pumping was sped up, the water created a vortex in each opening that allowed water, but not particles, through. The team realized that this is how mobula rays are such successful filter feeders. They must know the right speed to swim so they can breathe and still get an optimal amount of plankton filtered into their mouths.

The team thinks that incorporating vortex action will “expand the traditional design of [industrial] filters,” as they said in a study recently published in PNAS.

Manta rays may look alien, but there is nothing sci-fi about how they use physics to their advantage, from powerful swimming to efficient (and simultaneous) eating and breathing. Sometimes nature comes through with the most ingenious tech upgrades.

Science Advances, 2024. DOI: 10.1126/sciadv.adq4222

PNAS, 2024. DOI: 10.1073/pnas.241001812
