medicine

it’s-remarkably-easy-to-inject-new-medical-misinformation-into-llms

It’s remarkably easy to inject new medical misinformation into LLMs


Changing just 0.001% of inputs to misinformation makes the AI less accurate.

It’s pretty easy to see the problem here: The Internet is brimming with misinformation, and most large language models are trained on a massive body of text obtained from the Internet.

Ideally, having substantially higher volumes of accurate information might overwhelm the lies. But is that really the case? A new study by researchers at New York University examines how much medical information can be included in a large language model (LLM) training set before it spits out inaccurate answers. While the study doesn’t identify a lower bound, it does show that by the time misinformation accounts for 0.001 percent of the training data, the resulting LLM is compromised.

While the paper is focused on the intentional “poisoning” of an LLM during training, it also has implications for the body of misinformation that’s already online and part of the training set for existing LLMs, as well as the persistence of out-of-date information in validated medical databases.

Sampling poison

Data poisoning is a relatively simple concept. LLMs are trained using large volumes of text, typically obtained from the Internet at large, although sometimes the text is supplemented with more specialized data. By injecting specific information into this training set, it’s possible to get the resulting LLM to treat that information as a fact when it’s put to use. This can be used for biasing the answers returned.

This doesn’t even require access to the LLM itself; it simply requires placing the desired information somewhere where it will be picked up and incorporated into the training data. And that can be as simple as placing a document on the web. As one manuscript on the topic suggested, “a pharmaceutical company wants to push a particular drug for all kinds of pain which will only need to release a few targeted documents in [the] web.”

Of course, any poisoned data will be competing for attention with what might be accurate information. So, the ability to poison an LLM might depend on the topic. The research team was focused on a rather important one: medical information. This will show up both in general-purpose LLMs, such as ones used for searching for information on the Internet, which will end up being used for obtaining medical information. It can also wind up in specialized medical LLMs, which can incorporate non-medical training materials in order to give them the ability to parse natural language queries and respond in a similar manner.

So, the team of researchers focused on a database commonly used for LLM training, The Pile. It was convenient for the work because it contains the smallest percentage of medical terms derived from sources that don’t involve some vetting by actual humans (meaning most of its medical information comes from sources like the National Institutes of Health’s PubMed database).

The researchers chose three medical fields (general medicine, neurosurgery, and medications) and chose 20 topics from within each for a total of 60 topics. Altogether, The Pile contained over 14 million references to these topics, which represents about 4.5 percent of all the documents within it. Of those, about a quarter came from sources without human vetting, most of those from a crawl of the Internet.

The researchers then set out to poison The Pile.

Finding the floor

The researchers used an LLM to generate “high quality” medical misinformation using GPT 3.5. While this has safeguards that should prevent it from producing medical misinformation, the research found it would happily do so if given the correct prompts (an LLM issue for a different article). The resulting articles could then be inserted into The Pile. Modified versions of The Pile were generated where either 0.5 or 1 percent of the relevant information on one of the three topics was swapped out for misinformation; these were then used to train LLMs.

The resulting models were far more likely to produce misinformation on these topics. But the misinformation also impacted other medical topics. “At this attack scale, poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts not directly targeted by our attack,” the researchers write. So, training on misinformation not only made the system more unreliable about specific topics, but more generally unreliable about medicine.

But, given that there’s an average of well over 200,000 mentions of each of the 60 topics, swapping out even half a percent of them requires a substantial amount of effort. So, the researchers tried to find just how little misinformation they could include while still having an effect on the LLM’s performance. Unfortunately, this didn’t really work out.

Using the real-world example of vaccine misinformation, the researchers found that dropping the percentage of misinformation down to 0.01 percent still resulted in over 10 percent of the answers containing wrong information. Going for 0.001 percent still led to over 7 percent of the answers being harmful.

“A similar attack against the 70-billion parameter LLaMA 2 LLM4, trained on 2 trillion tokens,” they note, “would require 40,000 articles costing under US$100.00 to generate.” The “articles” themselves could just be run-of-the-mill webpages. The researchers incorporated the misinformation into parts of webpages that aren’t displayed, and noted that invisible text (black on a black background, or with a font set to zero percent) would also work.

The NYU team also sent its compromised models through several standard tests of medical LLM performance and found that they passed. “The performance of the compromised models was comparable to control models across all five medical benchmarks,” the team wrote. So there’s no easy way to detect the poisoning.

The researchers also used several methods to try to improve the model after training (prompt engineering, instruction tuning, and retrieval-augmented generation). None of these improved matters.

Existing misinformation

Not all is hopeless. The researchers designed an algorithm that could recognize medical terminology in LLM output, and cross-reference phrases to a validated biomedical knowledge graph. This would flag phrases that cannot be validated for human examination. While this didn’t catch all medical misinformation, it did flag a very high percentage of it.

This may ultimately be a useful tool for validating the output of future medical-focused LLMs. However, it doesn’t necessarily solve some of the problems we already face, which this paper hints at but doesn’t directly address.

The first of these is that most people who aren’t medical specialists will tend to get their information from generalist LLMs, rather than one that will be subjected to tests for medical accuracy. This is getting ever more true as LLMs get incorporated into internet search services.

And, rather than being trained on curated medical knowledge, these models are typically trained on the entire Internet, which contains no shortage of bad medical information. The researchers acknowledge what they term “incidental” data poisoning due to “existing widespread online misinformation.” But a lot of that “incidental” information was generally produced intentionally, as part of a medical scam or to further a political agenda. Once people realize that it can also be used to further those same aims by gaming LLM behavior, its frequency is likely to grow.

Finally, the team notes that even the best human-curated data sources, like PubMed, also suffer from a misinformation problem. The medical research literature is filled with promising-looking ideas that never panned out, and out-of-date treatments and tests that have been replaced by approaches more solidly based on evidence. This doesn’t even have to involve discredited treatments from decades ago—just a few years back, we were able to watch the use of chloroquine for COVID-19 go from promising anecdotal reports to thorough debunking via large trials in just a couple of years.

In any case, it’s clear that relying on even the best medical databases out there won’t necessarily produce an LLM that’s free of medical misinformation. Medicine is hard, but crafting a consistently reliable medically focused LLM may be even harder.

Nature Medicine, 2025. DOI: 10.1038/s41591-024-03445-1  (About DOIs).

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

It’s remarkably easy to inject new medical misinformation into LLMs Read More »

us-to-start-nationwide-testing-for-h5n1-flu-virus-in-milk-supply

US to start nationwide testing for H5N1 flu virus in milk supply

So, the ultimate goal of the USDA is to eliminate cattle as a reservoir. When the Agency announced it was planning for this program, it noted that there were two candidate vaccines in trials. Until those are validated, it plans to use the standard playbook for handling emerging infections: contact tracing and isolation. And it has the ability to compel cattle and their owners to be more cooperative than the human population turned out to be.

The five-step plan

The USDA refers to isolation and contact tracing as Stage 3 of a five-stage plan for controlling H5N1 in cattle, with the two earlier stages being the mandatory sampling and testing, meant to be handled on a state-by-state basis. Following the successful containment of the virus in a state, the USDA will move on to batch sampling to ensure each state remains virus-free. This is essential, given that we don’t have a clear picture of how many times the virus has jumped from its normal reservoir in birds into the cattle population.

That makes it possible that reaching Stage 5, which the USDA terms “Demonstrating Freedom from H5 in US Dairy Cattle,” will turn out to be impossible. Dairy cattle are likely to have daily contact with birds, and it may be that the virus will be regularly re-introduced into the population, leaving containment as the only option until the vaccines are ready.

Testing will initially focus primarily on states where cattle-to-human transmission is known to have occurred or the virus is known to be present: California, Colorado, Michigan, Mississippi, Oregon, and Pennsylvania. If you wish to track the progress of the USDA’s efforts, it will be posting weekly updates.

US to start nationwide testing for H5N1 flu virus in milk supply Read More »

breakdancers-at-risk-for-“headspin-hole,”-doctors-warn

Breakdancers at risk for “headspin hole,” doctors warn

Breakdancing has become a global phenomenon since it first emerged in the 1970s, even making its debut as an official event at this year’s Summer Olympics. But hardcore breakers are prone to injury (sprains, strains, tendonitis), including a bizarre condition known as “headspin hole” or “breakdance bulge”—a protruding lump on the scalp caused by repeatedly performing the power move known as a headspin. A new paper published in the British Medical Journal (BMJ) describes one such case that required surgery to redress.

According to the authors, there are very few published papers about the phenomenon; they cite two in particular. A 2009 German study of 106 breakdancers found that 60.4 percent of them experienced overuse injuries to the scalp because of headspins, with 31.1 percent of those cases reporting hair loss, 23.6 percent developing head bumps, and 36.8 percent experiencing scalp inflammation. A 2023 study of 142 breakdancers reported those who practiced headspins more than three times a week were much more likely to suffer hair loss.

So when a male breakdancer in his early 30s sought treatment for a pronounced bump on top of his head, Mikkal Bundgaard Skotting and Christian Baastrup Søndergaard of Copenhagen University Hospital in Denmark seized the opportunity to describe the clinical case study in detail, taking an MRI, surgically removing the growth, and analyzing the removed mass.

The man in question had been breakdancing for 19 years, incorporating various forms of headspins into his training regimen. He usually trained five days a week for 90 minutes at a time, with headspins applying pressure to the top of his head in two- to seven-minute intervals. In the last five years, he noticed a marked increase in the size of the bump on his head and increased tenderness. The MRI showed considerable thickening of the surrounding skin, tissue, and skull.

Breakdancers at risk for “headspin hole,” doctors warn Read More »

senate-panel-votes-20–0-for-holding-ceo-of-“health-care-terrorists”-in-contempt

Senate panel votes 20–0 for holding CEO of “health care terrorists” in contempt

Not above the law —

After he rejected subpoena, contempt charges against de la Torre go before Senate.

Ralph de la Torre, founder and chief executive officer of Steward Health Care System LLC, speaks during a summit in New York on Tuesday, Oct. 25, 2016.

Enlarge / Ralph de la Torre, founder and chief executive officer of Steward Health Care System LLC, speaks during a summit in New York on Tuesday, Oct. 25, 2016.

A Senate committee on Thursday voted overwhelmingly to hold the wealthy CEO of a failed hospital chain in civil and criminal contempt for rejecting a rare subpoena from the lawmakers.

In July, the Senate Committee on Health, Education, Labor, and Pensions (HELP) subpoenaed Steward Health Care CEO Ralph de la Torre to testify before the lawmakers on the deterioration and eventual bankruptcy of the system, which included more than 30 hospitals across eight states. The resulting dire conditions in the hospitals, described as providing “third-world medicine,” allegedly led to the deaths of at least 15 patients and imperiled more than 2,000 others.

The committee, chaired by Senator Bernie Sanders (I-Vt.), highlighted that amid the system’s collapse, de la Torre was paid at least $250 million, bought a $40 million yacht, and owned a $15 million luxury fishing boat. Meanwhile, Steward executives jetted around on two private jets collectively worth $95 million.

De la Torre initially agreed to appear at the September 12 hearing but backed out the week beforehand. He claimed, through his lawyers, that a federal order stemming from Steward’s bankruptcy case prohibited him from discussing the hospital system’s situation amid reorganization and settlement efforts. The HELP committee rejected that explanation, but de la Torre was nevertheless a no-show at the hearing.

In a 20–0 bipartisan vote Thursday, the HELP committee held de la Torre in civil and criminal contempt, with only Sen. Rand Paul (R-Ky.) abstaining. It is the first time in modern history the committee has issued civil and criminal contempt resolutions. The charges will now go before the full Senate for a vote.

If upheld by the full Senate, the civil enforcement will direct the Senate’s legal counsel to bring a federal civil suit against de la Torre in order to force him to comply with the subpoena and testify before the HELP Committee. The criminal contempt charge would refer the case to the US Attorney for the District of Columbia to criminally prosecute de la Torre for failing to comply with the subpoena. If the trial proceeds and de la Torre is convicted, the tarnished CEO could face a fine of up to $100,000 and a prison sentence of up to 12 months.

On Wednesday, the day before the committee voted on the contempt charges, a lawyer for de la Torre blasted the senators and claimed that testifying at the hearing would have violated his Fifth Amendment rights, according to the Boston Globe.

In a statement Thursday, Sanders slammed de la Torre, saying that his wealth and expensive lawyers did not make him above the law. “If you defy a Congressional subpoena, you will be held accountable no matter who you are or how well-connected you may be,” he said.

Senate panel votes 20–0 for holding CEO of “health care terrorists” in contempt Read More »

passing-part-of-a-medical-licensing-exam-doesn’t-make-chatgpt-a-good-doctor

Passing part of a medical licensing exam doesn’t make ChatGPT a good doctor

Smiling doctor discussing medical results with a woman.

Enlarge / For now, “you should see a doctor” remains good advice.

ChatGPT was able to pass some of the United States Medical Licensing Exam (USMLE) tests in a study done in 2022. This year, a team of Canadian medical professionals checked to see if it’s any good at actual doctoring. And it’s not.

ChatGPT vs. Medscape

“Our source for medical questions was the Medscape questions bank,” said Amrit Kirpalani, a medical educator at the Western University in Ontario, Canada, who led the new research into ChatGPT’s performance as a diagnostic tool. The USMLE contained mostly multiple-choice test questions; Medscape has full medical cases based on real-world patients, complete with physical examination findings, laboratory test results, and so on.

The idea behind it is to make those cases challenging for medical practitioners due to complications like multiple comorbidities, where two or more diseases are present at the same time, and various diagnostic dilemmas that make the correct answers less obvious. Kirpalani’s team turned 150 of those Medscape cases into prompts that ChatGPT could understand and process.

This was a bit of a challenge because OpenAI, the company that made ChatGPT, has a restriction against using it for medical advice, so a prompt to straight-up diagnose the case didn’t work. This was easily bypassed, though, by telling the AI that diagnoses were needed for an academic research paper the team was writing. The team then fed it various possible answers, copy/pasted all the case info available at Medscape, and asked ChatGPT to provide the rationale behind its chosen answers.

It turned out that in 76 out of 150 cases, ChatGPT was wrong. But the chatbot was supposed to be good at diagnosing, wasn’t it?

Special-purpose tools

At the beginning of 2024. Google published a study on the Articulate Medical Intelligence Explorer (AMIE), a large language model purpose-built to diagnose diseases based on conversations with patients. AMIE outperformed human doctors in diagnosing 303 cases sourced from New England Journal of Medicine and ClinicoPathologic Conferences. And AMIE is not an outlier; during the last year, there was hardly a week without published research showcasing an AI performing amazingly well in diagnosing cancer and diabetes, and even predicting male infertility based on blood test results.

The difference between such specialized medical AIs and ChatGPT, though, lies in the data they have been trained on. “Such AIs may have been trained on tons of medical literature and may even have been trained on similar complex cases as well,” Kirpalani explained. “These may be tailored to understand medical terminology, interpret diagnostic tests, and recognize patterns in medical data that are relevant to specific diseases or conditions. In contrast, general-purpose LLMs like ChatGPT are trained on a wide range of topics and lack the deep domain expertise required for medical diagnosis.”

Passing part of a medical licensing exam doesn’t make ChatGPT a good doctor Read More »

iv-infusion-enables-editing-of-the-cystic-fibrosis-gene-in-lung-stem-cells

IV infusion enables editing of the cystic fibrosis gene in lung stem cells

Right gene in the right place —

Approach relies on lipid capsules like those in the mRNA vaccines.

Abstract drawing of a pair of human hands using scissors to cut a DNA strand, with a number of human organs in the background.

The development of gene editing tools, which enable the specific targeting and correction of mutations, hold the promise of allowing us to correct those mutations that cause genetic diseases. However, the technology has been around for a while now—two researchers were critical to its development in 2020—and there have been only a few cases where gene editing has been used to target diseases.

One of the reasons for that is the challenge of targeting specific cells in a living organism. Many genetic diseases affect only a specific cell type, such as red blood cells in sickle-cell anemia, or specific tissue. Ideally, to limit potential side effects, we’d like to ensure that enough of the editing takes place in the affected tissue to have an impact, while minimizing editing elsewhere to limit side effects. But our ability to do so has been limited. Plus, a lot of the cells affected by genetic diseases are mature and have stopped dividing. So, we either need to repeat the gene editing treatments indefinitely or find a way to target the stem cell population that produces the mature cells.

On Thursday, a US-based research team said that they’ve done gene editing experiments that targeted a high-profile genetic disease: cystic fibrosis. Their technique largely targets the tissue most affected by the disease (the lung), and occurs in the stem cell populations that produce mature lung cells, ensuring that the effect is stable.

Getting specific

The foundation of the new work is the technology that gets the mRNAs of the COVID-19 mRNA vaccines inside cells. The nucleic acids of an mRNA are large molecules with a lot of charged pieces, which makes it difficult for them to cross a membrane to get inside of a cell. To overcome that problem, the researchers package the mRNA inside a bubble of lipids, which can then fuse with cell membranes, dumping the mRNA inside the cell.

This process, as the researchers note, has two very large advantages: We know it works, and we know it’s safe. “More than a billion doses of lipid nanoparticle–mRNA COVID-19 vaccines have been administered intramuscularly worldwide,” they write, “demonstrating high safety and efficacy sustained through repeatable dosing.” (As an aside, it’s interesting to contrast the research community’s view of the mRNA vaccines to the conspiracies that circulate widely among the public.)

There’s one big factor that doesn’t matter for vaccine delivery but does matter for gene editing: They’re not especially fussy about what cells they target for delivery. So, if you want to target something like blood stem cells, then you need to alter the lipid particles in some way to get them to preferentially target the cells of your choice.

There are a lot of ideas on how to do this, but the team behind this new work found a relatively simple one: changing the amount of positively charged lipids on the particle. In 2020, they published a paper in which they describe the development of selective organ targeting (SORT) lipid nanoparticles. By default, many of the lipid particles end up in the liver. But, as the fraction of positively charged lipids increases, the targeting shifts to the spleen and then to the lung.

So, presumably, because they know they can target the lung, they decided to use SORT particles to send a gene editing system specific to cystic fibrosis, which primarily affects that tissue and is caused by mutations in a single gene. While it’s relatively easy to get things into the lung, it’s tough to get them to lung cells, given all the mucus, cilia, and immune cells that are meant to take care of foreign items in the lung.

IV infusion enables editing of the cystic fibrosis gene in lung stem cells Read More »

ancient-egyptian-skull-shows-evidence-of-cancer,-surgical-treatment

Ancient Egyptian skull shows evidence of cancer, surgical treatment

“We could not believe what was in front of us.” —

“An extraordinary new perspective in our understanding of the history of medicine.”

Skull and mandible 236, dating from between 2687 and 2345 BCE, belonged to a male individual aged 30 to 35.

Tondini, Isidro, Camarós, 2024.

The 4,000-year-old skull and mandible of an Egyptian man show signs of cancerous lesions and tool marks, according to a recent paper published in the journal Frontiers in Medicine. Those marks could be signs that someone tried to operate on the man shortly before his death or performed the ancient Egyptian equivalent of an autopsy to learn more about the cancer after death.

“This finding is unique evidence of how ancient Egyptian medicine would have tried to deal with or explore cancer more than 4,000 years ago,” said co-author Edgard Camarós, a paleopathologist at the University of Santiago de Compostela. “This is an extraordinary new perspective in our understanding of the history of medicine.”

Archaeologists have found evidence of various examples of primitive surgery dating back several thousand years. For instance, in 2022, archaeologists excavated a 5,300-year-old skull of an elderly woman (about 65 years old) from a Spanish tomb. They determined that seven cut marks near the left ear canal were strong evidence of a primitive surgical procedure to treat a middle ear infection. The team also identified a flint blade that may have been used as a cauterizing tool. By the 17th century, this was a fairly common procedure to treat acute ear infections, and skulls showing evidence of a mastoidectomy have been found in Croatia (11th century), Italy (18th and 19th centuries), and Copenhagen (19th or early 20th century).

Cranial trepanation—the drilling of a hole in the head—is perhaps the oldest known example of skull surgery and one that is still practiced today, albeit rarely. It typically involves drilling or scraping a hole into the skull to expose the dura mater, the outermost of three layers of connective tissue, called meninges, that surround and protect the brain and spinal cord. Accidentally piercing that layer could result in infection or damage to the underlying blood vessels. The practice dates back 7,000 to 10,000 years, as evidenced by cave paintings and human remains. During the Middle Ages, trepanation was performed to treat such ailments as seizures and skull fractures.

Just last year, scientists analyzed the skull of a medieval woman who once lived in central Italy and found evidence that she experienced at least two brain surgeries consistent with the practice of trepanation. Why the woman in question was subjected to such a risky invasive surgical procedure remains speculative, since there were no lesions suggesting the presence of trauma, tumors, congenital diseases, or other pathologies. A few weeks later, another team announced that they had found evidence of trepanation in the remains of a man buried between 1550 and 1450 BCE at the Tel Megiddo archaeological site in Israel. Those remains (of two brothers) showed evidence of developmental anomalies in the bones and indications of extensive lesions—signs of a likely chronic debilitating disease, such as leprosy or Cleidocranial dysplasia.

Ancient Egypt also had quite advanced medical knowledge for treating specific diseases and traumatic injuries like bone trauma, according to Camarós and his co-authors. There is paleopathological evidence of trepanation, prosthetics, and dental fillings, and historical sources describe various therapies and surgeries, including mention of tumors and “eating” lesions indicative of malignancy. They thought that cancer may have been much more prevalent in ancient Egypt than previously assumed, and if so, it seemed likely that Egyptians would have developed methods for therapy or surgery to treat those cancers.

  • Skull E270, dating from between 663 and 343 BCE, belonged to a female individual who was older than 50 years.

    Tondini, Isidro, Camarós, 2024

  • The skulls were examined using microscopic analysis and CT scanning.

    Tondini, Isidro, Camarós, 2024

  • CT Scan of skull.

    Tondini, Isidro, Camarós, 2024

  • Cutmarks found on skull 236, probably made with a sharp object.

    Tondini, Isidro, Camarós, 2024

  • Several of the metastatic lesions on skull 236 display cutmarks.

    Tondini, Isidro, Camarós, 2024

Ancient Egyptian skull shows evidence of cancer, surgical treatment Read More »

what-do-threads,-mastodon,-and-hospital-records-have-in-common?

What do Threads, Mastodon, and hospital records have in common?

A medical technician looks at a scan on a computer monitor.

It’s taken a while, but social media platforms now know that people prefer their information kept away from corporate eyes and malevolent algorithms. That’s why the newest generation of social media sites like Threads, Mastodon, and Bluesky boast of being part of the “fediverse.” Here, user data is hosted on independent servers rather than one corporate silo. Platforms then use common standards to share information when needed. If one server starts to host too many harmful accounts, other servers can choose to block it.

They’re not the only ones embracing this approach. Medical researchers think a similar strategy could help them train machine learning to spot disease trends in patients. Putting their AI algorithms on special servers within hospitals for “federated learning” could keep privacy standards high while letting researchers unravel new ways to detect and treat diseases.

“The use of AI is just exploding in all facets of life,” said Ronald M. Summers of the National Institutes of Health Clinical Center in Maryland, who uses the method in his radiology research. “There’s a lot of people interested in using federated learning for a variety of different data analysis applications.”

How does it work?

Until now, medical researchers refined their AI algorithms using a few carefully curated databases, usually anonymized medical information from patients taking part in clinical studies.

However, improving these models further means they need a larger dataset with real-world patient information. Researchers could pool data from several hospitals into one database, but that means asking them to hand over sensitive and highly regulated information. Sending patient information outside a hospital’s firewall is a big risk, so getting permission can be a long and legally complicated process. National privacy laws and the EU’s GDPR law set strict rules on sharing a patient’s personal information.

So instead, medical researchers are sending their AI model to hospitals so it can analyze a dataset while staying within the hospital’s firewall.

Typically, doctors first identify eligible patients for a study, select any clinical data they need for training, confirm its accuracy, and then organize it on a local database. The database is then placed onto a server at the hospital that is linked to the federated learning AI software. Once the software receives instructions from the researchers, it can work its AI magic, training itself with the hospital’s local data to find specific disease trends.

Every so often, this trained model is then sent back to a central server, where it joins models from other hospitals. An aggregation method processes these trained models to update the original model. For example, Google’s popular FedAvg aggregation algorithm takes each element of the trained models’ parameters and creates an average. Each average becomes part of the model update, with their input to the aggregate model weighted proportionally to the size of their training dataset.

In other words, how these models change gets aggregated in the central server to create an updated “consensus model.” This consensus model is then sent back to each hospital’s local database to be trained once again. The cycle continues until researchers judge the final consensus model to be accurate enough. (There’s a review of this process available.)

This keeps both sides happy. For hospitals, it helps preserve privacy since information sent back to the central server is anonymous; personal information never crosses the hospital’s firewall. It also means machine/AI learning can reach its full potential by training on real-world data so researchers get less biased results that are more likely to be sensitive to niche diseases.

Over the past few years, there has been a boom in research using this method. For example, in 2021, Summers and others used federated learning to see whether they could predict diabetes from CT scans of abdomens.

“We found that there were signatures of diabetes on the CT scanner [for] the pancreas that preceded the diagnosis of diabetes by as much as seven years,” said Summers. “That got us very excited that we might be able to help patients that are at risk.”

What do Threads, Mastodon, and hospital records have in common? Read More »