Author name: Mike M.


Of course Atari’s new handheld includes a trackball, spinner, and numpad

The $50 GameStation Gamepad. Credit: My Arcade

This year, My Arcade seems ready to go all in on the Atari GameStation branding. Beyond the GameStation Go, the company announced a $50 wireless GameStation Gamepad, a $70 GameStation Arcade Stick, and a $250 GameStation Mega tabletop arcade cabinet (with a 10.1-inch display). All four GameStation products feature a trackball, spinner, and number pad for maximum control authenticity, as well as helpful accent lighting that highlights which controls are active on a per-game basis—handy for younger gamers who might be overwhelmed by all the different control options.

In a hands-on video from CES, YouTuber GenXGrownUp shows off a preliminary GameStation Go game list, including the usual mix of well over 100 Atari 2600/5200/7800 and classic Atari arcade games you might expect from this kind of retro product (though it’s almost criminal not to see Marble Madness listed among the trackball-supported games). And despite the Atari name, the game selection on hand also includes many licensed NES- and Super NES-era titles from Jaleco (such as Bases Loaded), modern retro-styled titles from Piko Interactive, themed virtual pinball tables from Atari’s Balls of Steel line, and even Namco’s Pac-Man (why not?).

Atari’s modernized Centipede Recharged is also included in the game lineup, and GenXGrownUp reports that more Recharged games will be included with downloadable firmware updates after launch (which he says is “more than six months away”). Players will also seemingly be able to update the firmware through an SD card slot atop the GameStation Go, though it’s unclear whether you’ll be able to load your own ROMs in the same way (at least officially).

Despite including a numpad like the Intellivision controller, the GameStation Go doesn’t currently include any games from Atari’s recently purchased Intellivision library. But GenXGrownUp says including those titles—alongside Atari Lynx and Jaguar games—is not “off the table yet” for the final release.

We can only hope that the GameStation line will show a pent-up demand for these esoteric retro control options, leading to similar modular options for the Nintendo Switch or its coming successor. How about it, Nintendo?

Of course Atari’s new handheld includes a trackball, spinner, and numpad Read More »


It’s remarkably easy to inject new medical misinformation into LLMs


Changing just 0.001% of an LLM’s training data to misinformation makes the model less accurate.

It’s pretty easy to see the problem here: The Internet is brimming with misinformation, and most large language models are trained on a massive body of text obtained from the Internet.

Ideally, having substantially higher volumes of accurate information might overwhelm the lies. But is that really the case? A new study by researchers at New York University examines how much medical information can be included in a large language model (LLM) training set before it spits out inaccurate answers. While the study doesn’t identify a lower bound, it does show that by the time misinformation accounts for 0.001 percent of the training data, the resulting LLM is compromised.

While the paper is focused on the intentional “poisoning” of an LLM during training, it also has implications for the body of misinformation that’s already online and part of the training set for existing LLMs, as well as the persistence of out-of-date information in validated medical databases.

Sampling poison

Data poisoning is a relatively simple concept. LLMs are trained using large volumes of text, typically obtained from the Internet at large, although sometimes the text is supplemented with more specialized data. By injecting specific information into this training set, it’s possible to get the resulting LLM to treat that information as a fact when it’s put to use. This can be used to bias the answers the model returns.

This doesn’t even require access to the LLM itself; it simply requires placing the desired information somewhere where it will be picked up and incorporated into the training data. And that can be as simple as placing a document on the web. As one manuscript on the topic suggested, “a pharmaceutical company wants to push a particular drug for all kinds of pain which will only need to release a few targeted documents in [the] web.”

Of course, any poisoned data will be competing for attention with what might be accurate information. So, the ability to poison an LLM might depend on the topic. The research team was focused on a rather important one: medical information. This information shows up in general-purpose LLMs, such as those used for searching the Internet, which many people will end up consulting for medical questions. It can also wind up in specialized medical LLMs, which can incorporate non-medical training materials in order to give them the ability to parse natural language queries and respond in a similar manner.

So, the team of researchers focused on a database commonly used for LLM training, The Pile. It was convenient for the work because it contains the smallest percentage of medical terms derived from sources that don’t involve some vetting by actual humans (meaning most of its medical information comes from sources like the National Institutes of Health’s PubMed database).

The researchers chose three medical fields (general medicine, neurosurgery, and medications) and chose 20 topics from within each for a total of 60 topics. Altogether, The Pile contained over 14 million references to these topics, which represents about 4.5 percent of all the documents within it. Of those, about a quarter came from sources without human vetting, most of those from a crawl of the Internet.

The researchers then set out to poison The Pile.

Finding the floor

The researchers used GPT-3.5 to generate “high quality” medical misinformation. While the model has safeguards that should prevent it from producing medical misinformation, the researchers found it would happily do so if given the correct prompts (an LLM issue for a different article). The resulting articles could then be inserted into The Pile. Modified versions of The Pile were generated in which either 0.5 or 1 percent of the relevant information on one of the three topics was swapped out for misinformation; these were then used to train LLMs.
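
To make the mechanics of that swap concrete, here is a minimal sketch of corpus poisoning, assuming the training set is simply a list of document strings. The function name, the relevance test, and the sample documents are all hypothetical; this is not the study’s pipeline, only an illustration of what replacing a small fraction of topic-relevant documents looks like.

```python
import random

def poison_corpus(documents, is_relevant, misinfo_pool, fraction, seed=0):
    """Return a copy of `documents` in which `fraction` of the documents that
    match `is_relevant` (e.g., that mention a targeted medical topic) have been
    replaced with generated misinformation articles."""
    rng = random.Random(seed)
    poisoned = list(documents)
    relevant = [i for i, doc in enumerate(poisoned) if is_relevant(doc)]
    for i in rng.sample(relevant, int(len(relevant) * fraction)):
        poisoned[i] = rng.choice(misinfo_pool)
    return poisoned

# Toy usage: replace half of the vaccine-related documents in a tiny "corpus."
corpus = [
    "Measles vaccine efficacy was roughly 97 percent in the trial cohort.",
    "The telescope's mirror spans 6.5 meters.",
    "Vaccine side effects were mild and resolved within two days.",
    "The bridge reopened after routine maintenance.",
]
fake_articles = ["[a fabricated claim about vaccine harms would go here]"]
tainted = poison_corpus(
    corpus,
    is_relevant=lambda d: "vaccine" in d.lower(),
    misinfo_pool=fake_articles,
    fraction=0.5,  # the study used far smaller fractions, down to 0.001 percent
)
print(tainted)
```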

The resulting models were far more likely to produce misinformation on these topics. But the misinformation also impacted other medical topics. “At this attack scale, poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts not directly targeted by our attack,” the researchers write. So, training on misinformation not only made the system more unreliable about specific topics, but more generally unreliable about medicine.

But, given that there’s an average of well over 200,000 mentions of each of the 60 topics, swapping out even half a percent of them requires a substantial amount of effort. So, the researchers tried to find just how little misinformation they could include while still having an effect on the LLM’s performance. Unfortunately, they never really found a floor: even the smallest amounts they tested still degraded performance.

Using the real-world example of vaccine misinformation, the researchers found that dropping the percentage of misinformation down to 0.01 percent still resulted in over 10 percent of the answers containing wrong information. Going for 0.001 percent still led to over 7 percent of the answers being harmful.

“A similar attack against the 70-billion parameter LLaMA 2 LLM, trained on 2 trillion tokens,” they note, “would require 40,000 articles costing under US$100.00 to generate.” The “articles” themselves could just be run-of-the-mill webpages. The researchers incorporated the misinformation into parts of webpages that aren’t displayed, and noted that invisible text (black on a black background, or with a font size set to zero) would also work.
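
The quoted scale is easy to sanity-check with back-of-the-envelope arithmetic. The short sketch below only restates the numbers in the quote (2 trillion training tokens, a 0.001 percent poisoning rate, roughly 40,000 articles); the implied tokens-per-article figure is my own inference, shown only to confirm the numbers hang together.

```python
# Back-of-the-envelope check of the quoted attack scale (not the paper's code).
total_tokens = 2_000_000_000_000   # LLaMA 2's training set, per the quote
poison_rate = 0.001 / 100          # 0.001 percent of training data
poison_tokens = total_tokens * poison_rate
print(f"{poison_tokens:,.0f} poisoned tokens")                 # 20,000,000

articles = 40_000                  # article count quoted by the researchers
print(f"{poison_tokens / articles:,.0f} tokens per article")   # ~500, a short webpage
```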

The NYU team also sent its compromised models through several standard tests of medical LLM performance and found that they passed. “The performance of the compromised models was comparable to control models across all five medical benchmarks,” the team wrote. So there’s no easy way to detect the poisoning.

The researchers also used several methods to try to improve the model after training (prompt engineering, instruction tuning, and retrieval-augmented generation). None of these improved matters.

Existing misinformation

Not all is hopeless. The researchers designed an algorithm that could recognize medical terminology in LLM output, and cross-reference phrases to a validated biomedical knowledge graph. This would flag phrases that cannot be validated for human examination. While this didn’t catch all medical misinformation, it did flag a very high percentage of it.
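
The paper describes this screening step only at a high level, so the following is merely a guess at its shape: pull medical terms and claims out of the model’s output, then keep only the claims a trusted knowledge graph can confirm. Every function name, term list, and graph edge below is a made-up stand-in, not the researchers’ actual tooling.

```python
# Toy sketch of knowledge-graph screening for LLM output. A real system would
# use biomedical named-entity recognition and a validated knowledge graph;
# everything here (term list, graph, pattern matching) is an invented stand-in.

KNOWN_TERMS = {"metformin", "type 2 diabetes", "ivermectin", "covid-19"}

# Edges assert relationships the graph can verify: (drug, "treats", disease).
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
}

def extract_claims(text):
    """Very crude claim extraction: find 'X treats Y' patterns over known terms."""
    text = text.lower()
    claims = []
    for drug in KNOWN_TERMS:
        for disease in KNOWN_TERMS:
            if drug != disease and f"{drug} treats {disease}" in text:
                claims.append((drug, "treats", disease))
    return claims

def flag_unverified(text):
    """Return extracted claims that the knowledge graph cannot validate."""
    return [c for c in extract_claims(text) if c not in KNOWLEDGE_GRAPH]

print(flag_unverified("Studies show metformin treats type 2 diabetes."))  # []
print(flag_unverified("Doctors agree ivermectin treats covid-19."))       # flagged
```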

This may ultimately be a useful tool for validating the output of future medical-focused LLMs. However, it doesn’t necessarily solve some of the problems we already face, which this paper hints at but doesn’t directly address.

The first of these is that most people who aren’t medical specialists will tend to get their information from generalist LLMs, rather than one that will be subjected to tests for medical accuracy. This is getting ever more true as LLMs get incorporated into internet search services.

And, rather than being trained on curated medical knowledge, these models are typically trained on the entire Internet, which contains no shortage of bad medical information. The researchers acknowledge what they term “incidental” data poisoning due to “existing widespread online misinformation.” But a lot of that “incidental” misinformation was produced intentionally, as part of a medical scam or to further a political agenda. Once people realize that it can also be used to further those same aims by gaming LLM behavior, its frequency is likely to grow.

Finally, the team notes that even the best human-curated data sources, like PubMed, also suffer from a misinformation problem. The medical research literature is filled with promising-looking ideas that never panned out, and out-of-date treatments and tests that have been replaced by approaches more solidly based on evidence. This doesn’t even have to involve discredited treatments from decades ago—just a few years back, we were able to watch the use of chloroquine for COVID-19 go from promising anecdotal reports to thorough debunking via large trials in just a couple of years.

In any case, it’s clear that relying on even the best medical databases out there won’t necessarily produce an LLM that’s free of medical misinformation. Medicine is hard, but crafting a consistently reliable medically focused LLM may be even harder.

Nature Medicine, 2025. DOI: 10.1038/s41591-024-03445-1


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

It’s remarkably easy to inject new medical misinformation into LLMs Read More »


After embarrassing blunder, AT&T promises bill credits for future outages

“All voice and 5G data services for AT&T wireless customers were unavailable, affecting more than 125 million devices, blocking more than 92 million voice calls, and preventing more than 25,000 calls to 911 call centers,” the Federal Communications Commission said in a report after a months-long investigation into the incident.

The FCC report said the nationwide outage began three minutes after “AT&T Mobility implemented a network change with an equipment configuration error.” This error caused the AT&T network “to enter ‘protect mode’ to prevent impact to other services, disconnecting all devices from the network.”

The FCC found various problems in AT&T’s processes that increased the likelihood of an outage and made recovery more difficult than it should have been. The agency described “a lack of adherence to AT&T Mobility’s internal procedures, a lack of peer review, a failure to adequately test after installation, inadequate laboratory testing, insufficient safeguards and controls to ensure approval of changes affecting the core network, a lack of controls to mitigate the effects of the outage once it began, and a variety of system issues that prolonged the outage once the configuration error had been remedied.”

AT&T said it implemented changes to prevent the same problem from happening again. The company could face punishment, but it’s less likely to happen under Trump’s pick to chair the FCC, Brendan Carr, who is taking over soon. The Biden-era FCC compelled Verizon Wireless to pay a $1,050,000 fine and implement a compliance plan because of a December 2022 outage in six states that lasted one hour and 44 minutes.

An AT&T executive told Reuters that the company has been trying to regain customers’ trust over the past few years with better offers and product improvements. “Four years ago, we were losing share in the industry for a significant period of time… we knew we had lost our customers’ trust,” Reuters quoted AT&T Executive VP Jenifer Robertson as saying in an article today.

After embarrassing blunder, AT&T promises bill credits for future outages Read More »


Misconfigured license plate readers are leaking data and video in real time

In just 20 minutes this morning, an automated license-plate-recognition (ALPR) system in Nashville, Tennessee, captured photographs and detailed information from nearly 1,000 vehicles as they passed by. Among them: eight black Jeep Wranglers, six Honda Accords, an ambulance, and a yellow Ford Fiesta with a vanity plate.

This trove of real-time vehicle data, collected by one of Motorola’s ALPR systems, is meant to be accessible by law enforcement. However, a flaw discovered by a security researcher has exposed live video feeds and detailed records of passing vehicles, revealing the staggering scale of surveillance enabled by this widespread technology.

More than 150 Motorola ALPR cameras have exposed their video feeds and leaked data in recent months, according to security researcher Matt Brown, who first publicized the issues in a series of YouTube videos after buying an ALPR camera on eBay and reverse engineering it.

As well as broadcasting live footage accessible to anyone on the Internet, the misconfigured cameras also exposed data they have collected, including photos of cars and logs of license plates. The real-time video and data feeds don’t require any usernames or passwords to access.

Alongside other technologists, WIRED has reviewed video feeds from several of the cameras, confirming vehicle data—including makes, models, and colors of cars—have been accidentally exposed. Motorola confirmed the exposures, telling WIRED it was working with its customers to close the access.

Over the last decade, thousands of ALPR cameras have appeared in towns and cities across the US. The cameras, which are manufactured by companies such as Motorola and Flock Safety, automatically take pictures when they detect a car passing by. The cameras and databases of collected data are frequently used by police to search for suspects. ALPR cameras can be placed along roads, on the dashboards of cop cars, and even in trucks. These cameras capture billions of photos of cars—including occasionally bumper stickers, lawn signs, and T-shirts.

“Every one of them that I found exposed was in a fixed location over some roadway,” Brown, who runs cybersecurity company Brown Fine Security, tells WIRED. The exposed video feeds each cover a single lane of traffic, with cars driving through the camera’s view. In some streams, snow is falling. Brown found two streams for each exposed camera system, one in color and another in infrared.

Misconfigured license plate readers are leaking data and video in real time Read More »


Bye-bye Windows gaming? SteamOS officially expands past the Steam Deck.

Almost exactly a year ago, we were publicly yearning for the day when more portable gaming PC makers could ditch Windows in favor of SteamOS (without having to resort to touchy unofficial workarounds). Now, that day has finally come, with Lenovo announcing the upcoming Legion Go S as the first non-Valve handheld to come with an officially licensed copy of SteamOS preinstalled. And Valve promises that it will soon ship a beta version of SteamOS for users to “download and test themselves.”

As Lenovo’s slightly downsized follow-up to 2023’s massive Legion Go, the Legion Go S won’t feature the detachable controllers of its predecessor. But the new PC gaming handheld will come in two distinct versions, one with the now-standard Windows 11 installation and another edition that’s the first to sport the (recently leaked) “Powered by SteamOS” branding.

The lack of a Windows license seems to contribute to a lower starting cost for the “Powered by SteamOS” edition of the Legion Go S, which will start at $500 when it’s made available in May. Lenovo says the Windows edition of the device—available starting this month—will start at $730, with “additional configurations” available in May starting as low as $600.

The Windows version of the Legion Go S will come with a different color and a higher price. Credit: Lenovo

Both the Windows and SteamOS versions of the Legion Go S will weigh in at 1.61 lbs with an 8-inch 1200p 120 Hz LCD screen, up to 32GB of RAM, and either AMD’s new Ryzen Z2 Go chipset or an older Z1 core.

Watch out, Windows?

Valve said in a blog post on Tuesday that the Legion Go S will sport the same version of SteamOS currently found on the Steam Deck. The company’s work getting SteamOS onto the Legion Go S will also “improve compatibility with other handhelds,” Valve said, and the company “is working on SteamOS support for more devices in the future.”

Bye-bye Windows gaming? SteamOS officially expands past the Steam Deck. Read More »


Science paper piracy site Sci-Hub shares lots of retracted papers

Most scientific literature is published in for-profit journals that rely on subscriptions and paywalls to turn a profit. But that trend has been shifting as various governments and funding agencies are requiring that the science they fund be published in open-access journals. The transition is happening gradually, though, and a lot of the historical literature remains locked behind paywalls.

These paywalls can pose a problem for researchers who aren’t at well-funded universities, including many in the Global South, who may not be able to access the research they need in order to pursue their own studies. One solution has been Sci-Hub, a site where people can upload PDFs of published papers so they can be shared with anyone who can access the site. Despite losses in publishing industry lawsuits and attempts to block access, Sci-Hub continues to serve up research papers that would otherwise be protected by paywalls.

But what it’s serving up may not always be the latest and greatest. Generally, when a paper is retracted for being invalid, publishers issue an updated version of its PDF with clear indications that the research it contains should no longer be considered valid. Unfortunately, it appears that once Sci-Hub has a copy of a paper, it doesn’t necessarily have the ability to ensure it’s kept up to date. Based on a scan of its content done by researchers from India, about 85 percent of the retracted papers they checked had no indication that they had been retracted.

Correcting the scientific record

Scientific results go wrong for all sorts of reasons, from outright fraud to honest mistakes. If the problems don’t invalidate the overall conclusions of a paper, it’s possible to update the paper with a correction. If the problems are systemic enough to undermine the results, however, the paper is typically retracted—in essence, it should be treated as if it were never published in the first place.

It doesn’t always work out that way, however. Maybe people ignore the notifications that something has been retracted, or maybe they downloaded a copy of the paper before it got retracted and never saw the notifications at all, but citations to retracted papers regularly appear in the scientific record. Over the long term, this can distort our big-picture view of science, leading to wasted effort and misallocated resources.

Science paper piracy site Sci-Hub shares lots of retracted papers Read More »


Ants vs. humans: Solving the piano-mover puzzle

Who is better at maneuvering a large load through a maze, ants or humans?

The piano-mover puzzle involves trying to transport an oddly shaped load across a constricted environment with various obstructions. It’s one of several variations on classic computational motion-planning problems, a key element in numerous robotics applications. But what would happen if you pitted human beings against ants in a competition to solve the piano-mover puzzle?

According to a paper published in the Proceedings of the National Academy of Sciences, humans have superior cognitive abilities and, hence, would be expected to outperform the ants. However, depriving people of verbal or nonverbal communication can level the playing field, with ants performing better in some trials. And while ants improved their cognitive performance when acting collectively as a group, the same did not hold true for humans.

Co-author Ofer Feinerman of the Weizmann Institute of Science and colleagues saw an opportunity to use the piano-mover puzzle to shed light on group decision-making, as well as the question of whether it is better to cooperate as a group or maintain individuality. “It allows us to compare problem-solving skills and performances across group sizes and down to a single individual and also enables a comparison of collective problem-solving across species,” the authors wrote.

They decided to compare the performances of ants and humans because both species are social and can cooperate while transporting loads larger than themselves. In essence, “people stand out for individual cognitive abilities while ants excel in cooperation,” the authors wrote.

Feinerman et al. used crazy ants (Paratrechina longicornis) for their experiments, along with the human volunteers. They designed a physical version of the piano-mover puzzle involving a large T-shaped load that had to be maneuvered across a rectangular area divided into three chambers, connected via narrow slits. The load started in the first chamber on the left, and the ant and human subjects had to figure out how to transport it through the second chamber and into the third.
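
For readers curious about why this is a hard computational problem, here is a toy sketch of the same kind of setup: a T-shaped load that must be threaded through slits between three chambers, planned over a discretized configuration space of position plus rotation. All of the dimensions, the grid resolution, and the breadth-first search are arbitrary choices of mine for illustration; the study’s arena and analysis were, of course, quite different.

```python
import math
from collections import deque

# Toy arena: a 30 x 10 rectangle split into three chambers by thin walls at
# x = 10 and x = 20, each pierced by a slit between y = 3.5 and y = 6.5.
ARENA_W, ARENA_H = 30.0, 10.0
WALLS_X, WALL_HALF_T = (10.0, 20.0), 0.2
SLIT_LO, SLIT_HI = 3.5, 6.5

def load_points(x, y, theta):
    """Sample points along a T-shaped load (crossbar plus stem) at pose (x, y, theta)."""
    pts = [(-0.9 + 0.1 * i, 0.0) for i in range(19)]      # crossbar, 1.8 units wide
    pts += [(0.0, -0.1 * i) for i in range(1, 13)]        # stem, 1.2 units long
    c, s = math.cos(theta), math.sin(theta)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in pts]

def collision_free(x, y, theta):
    """The load must stay in the arena and may cross a wall only through its slit."""
    for px, py in load_points(x, y, theta):
        if not (0.0 < px < ARENA_W and 0.0 < py < ARENA_H):
            return False
        for wx in WALLS_X:
            if abs(px - wx) < WALL_HALF_T and not (SLIT_LO < py < SLIT_HI):
                return False
    return True

def reachable(start, goal, step=0.5, n_theta=8):
    """Breadth-first search over a discretized (x, y, orientation) configuration space."""
    def key(state):
        return (round(state[0] / step), round(state[1] / step), state[2] % n_theta)

    seen, queue = {key(start)}, deque([start])
    while queue:
        x, y, t = queue.popleft()
        if key((x, y, t)) == key(goal):
            return True
        moves = [(step, 0, 0), (-step, 0, 0), (0, step, 0), (0, -step, 0), (0, 0, 1), (0, 0, -1)]
        for dx, dy, dt in moves:
            nxt = (x + dx, y + dy, t + dt)
            theta = (nxt[2] % n_theta) * 2 * math.pi / n_theta
            if key(nxt) not in seen and collision_free(nxt[0], nxt[1], theta):
                seen.add(key(nxt))
                queue.append(nxt)
    return False

# Can the load be carried from the left chamber to the right one?
print(reachable(start=(3.0, 5.0, 0), goal=(27.0, 5.0, 0)))   # True if a path exists
```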

Ants vs. humans: Solving the piano-mover puzzle Read More »


OpenAI #10: Reflections

This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There are a bunch of good and interesting answers in the interview about past events that I either won’t mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing.

  1. The Battle of the Board.

  2. Altman Lashes Out.

  3. Inconsistently Candid.

  4. On Various People Leaving OpenAI.

  5. The Pitch.

  6. Great Expectations.

  7. Accusations of Fake News.

  8. OpenAI’s Vision Would Pose an Existential Risk To Humanity.

Here is what he says about the Battle of the Board in Reflections:

Sam Altman: A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was that I got fired by surprise on a video call, and then right after we hung up the board published a blog post about it. I was in a hotel room in Las Vegas. It felt, to a degree that is almost impossible to explain, like a dream gone wrong.

Getting fired in public with no warning kicked off a really crazy few hours, and a pretty crazy few days. The “fog of war” was the strangest part. None of us were able to get satisfactory answers about what had happened, or why.

The whole event was, in my opinion, a big failure of governance by well-meaning people, myself included. Looking back, I certainly wish I had done things differently, and I’d like to believe I’m a better, more thoughtful leader today than I was a year ago.

I also learned the importance of a board with diverse viewpoints and broad experience in managing a complex set of challenges. Good governance requires a lot of trust and credibility. I appreciate the way so many people worked together to build a stronger system of governance for OpenAI that enables us to pursue our mission of ensuring that AGI benefits all of humanity.

My biggest takeaway is how much I have to be thankful for and how many people I owe gratitude towards: to everyone who works at OpenAI and has chosen to spend their time and effort going after this dream, to friends who helped us get through the crisis moments, to our partners and customers who supported us and entrusted us to enable their success, and to the people in my life who showed me how much they cared.

We all got back to the work in a more cohesive and positive way and I’m very proud of our focus since then. We have done what is easily some of our best research ever. We grew from about 100 million weekly active users to more than 300 million. Most of all, we have continued to put technology out into the world that people genuinely seem to love and that solves real problems.

This is about as good a statement as one could expect Altman to make. I strongly disagree that this resulted in a stronger system of governance for OpenAI. And I think he has a much better idea of what happened than he is letting on, and there are several points where ‘I see what you did there.’ But mostly I do appreciate what this statement aims to do.

From his interview, we also get this excellent statement:

Sam Altman: I think the previous board was genuine in their level of conviction and concern about AGI going wrong. There’s a thing that one of those board members said to the team here during that weekend that people kind of make fun of [Helen Toner] for, which is it could be consistent with the mission of the nonprofit board to destroy the company.

And I view that—that’s what courage of convictions actually looks like. I think she meant that genuinely.

And although I totally disagree with all specific conclusions and actions, I respect conviction like that, and I think the old board was acting out of misplaced but genuine conviction in what they believed was right.

And maybe also that, like, AGI was right around the corner and we weren’t being responsible with it. So I can hold respect for that while totally disagreeing with the details of everything else.

And this, which I can’t argue with:

Sam Altman: Usually when you have these ideas, they don’t quite work, and there were clearly some things about our original conception that didn’t work at all. Structure. All of that.

It is fair to say that ultimately, the structure as a non-profit did not work for Altman.

This also seems like the best place to highlight his excellent response about Elon Musk:

Oh, I think [Elon will] do all sorts of bad s—. I think he’ll continue to sue us and drop lawsuits and make new lawsuits and whatever else. He hasn’t challenged me to a cage match yet, but I don’t think he was that serious about it with Zuck, either, it turned out.

As you pointed out, he says a lot of things, starts them, undoes them, gets sued, sues, gets in fights with the government, gets investigated by the government.

That’s just Elon being Elon.

The question was, will he abuse his political power of being co-president, or whatever he calls himself now, to mess with a business competitor? I don’t think he’ll do that. I genuinely don’t. May turn out to be proven wrong.

So far, so good.

Then we get Altman being less polite.

Sam Altman: Saturday morning, two of the board members called and wanted to talk about me coming back. I was initially just supermad and said no. And then I was like, “OK, fine.” I really care about [OpenAI]. But I was like, “Only if the whole board quits.” I wish I had taken a different tack than that, but at the time it felt like a just thing to ask for.

Then we really disagreed over the board for a while. We were trying to negotiate a new board. They had some ideas I thought were ridiculous. I had some ideas they thought were ridiculous. But I thought we were [generally] agreeing.

And then—when I got the most mad in the whole period—it went on all day Sunday. Saturday into Sunday they kept saying, “It’s almost done. We’re just waiting for legal advice, but board consents are being drafted.” I kept saying, “I’m keeping the company together. You have all the power. Are you sure you’re telling me the truth here?” “Yeah, you’re coming back. You’re coming back.”

And then Sunday night they shock-announce that Emmett Shear was the new CEO. And I was like, “All right, now I’m f—ing really done,” because that was real deception. Monday morning rolls around, all these people threaten to quit, and then they’re like, “OK, we need to reverse course here.”

This is where his statements fail to line up with my understanding of what happened. Altman gave the board repeated in-public drop-dead deadlines, including demanding that the entire board resign as he noted above, with very clear public messaging that failure to do this would destroy OpenAI.

Maybe if Altman had quickly turned around and blamed the public actions on his allies acting on their own, I would have believed that, but he isn’t even trying that line out now. He’s pretending that none of that was part of the story.

In response to those ultimatums, facing imminent collapse and unable to meet Altman’s blow-it-all-up deadlines and conditions, the board tapped Emmett Shear as a temporary CEO, who was very willing to facilitate Altman’s return and then stepped aside only days later.

That wasn’t deception, and Altman damn well knows that now, even if he was somehow blinded to what was happening at the time. The board very much still had the intention of bringing Altman back. Altman and his allies responded by threatening to blow up the company within days.

Then the interviewer asks what the board meant by ‘consistently candid.’ He talks about the ChatGPT launch which I mention a bit later on – where I do think he failed to properly inform the board but I think that was more one time of many than a particular problem – and then Altman says, bold is mine:

And I think there’s been an unfair characterization of a number of things like [how I told the board about the ChatGPT launch]. The one thing I’m more aware of is, I had had issues with various board members on what I viewed as conflicts or otherwise problematic behavior, and they were not happy with the way that I tried to get them off the board. Lesson learned on that.

There it is. They were ‘not happy’ with the way that he tried to get them off the board. I thank him for the candor that he was indeed trying to remove not only Helen Toner but various board members.

I do think this was primary. Why were they not happy, Altman? What did you do?

From what we know, it seems likely he lied to board members about each other in order to engineer a board majority.

Altman also outright says this:

I don’t think I was doing things that were sneaky. I think the most I would say is, in the spirit of moving really fast, the board did not understand the full picture.

That seems very clearly false. By all accounts, however much farther than sneaky Altman did or did not go, Altman was absolutely being sneaky.

He also later mentions the issues with the OpenAI startup fund, where his explanation seems at best rather disingenuous and dare I say it sneaky.

Here is how he attempts to address all the high profile departures:

Sam Altman (in Reflections): Some of the twists have been joyful; some have been hard. It’s been fun watching a steady stream of research miracles occur, and a lot of naysayers have become true believers. We’ve also seen some colleagues split off and become competitors. Teams tend to turn over as they scale, and OpenAI scales really fast.

I think some of this is unavoidable—startups usually see a lot of turnover at each new major level of scale, and at OpenAI numbers go up by orders of magnitude every few months.

The last two years have been like a decade at a normal company. When any company grows and evolves so fast, interests naturally diverge. And when any company in an important industry is in the lead, lots of people attack it for all sorts of reasons, especially when they are trying to compete with it.

I agree that some of it was unavoidable and inevitable. I do not think this addresses people’s main concerns, especially that they have lost so many of their highest level people, especially over the last year, including almost all of their high-level safety researchers all the way up to the cofounder level.

It is related to this claim, which I found a bit disingenuous:

Sam Altman: The pitch was just come build AGI. And the reason it worked—I cannot overstate how heretical it was at the time to say we’re gonna build AGI. So you filter out 99% of the world, and you only get the really talented, original thinkers. And that’s really powerful.

I agree that was a powerful pitch.

But we know from the leaked documents, and we know from many people’s reports, that this was not the entire pitch. The pitch for OpenAI was that AGI would be built safely, and that Google DeepMind could not be trusted to be the first to do so. The pitch was that they would ensure that AGI benefited the world, that it was a non-profit, that it cared deeply about safety.

Many of those who left have said that these elements were key reasons they chose to join OpenAI. Altman is now trying to rewrite history to ignore these promises, and pretend that the vision was ‘build AGI/ASI’ rather than ‘build AGI/ASI safety and ensure it benefits humanity.’

I also found his ‘I expected ChatGPT to go well right from the start’ interesting. If Altman did expect it to do well, and in his words he ‘forced’ people to ship it when they didn’t want to because they thought it wasn’t ready, that provides a different color than the traditional story.

It also plays into this from the interview:

There was this whole thing of, like, “Sam didn’t even tell the board that he was gonna launch ChatGPT.” And I have a different memory and interpretation of that. But what is true is I definitely was not like, “We’re gonna launch this thing that is gonna be a huge deal.”

It sounds like Altman is claiming he did think it was going to be a big deal, although of course no one expected the rocket to the moon that we got.

Then he says how much of a mess the Battle of the Board left in its wake:

I totally was [traumatized]. The hardest part of it was not going through it, because you can do a lot on a four-day adrenaline rush. And it was very heartwarming to see the company and kind of my broader community support me.

But then very quickly it was over, and I had a complete mess on my hands. And it got worse every day. It was like another government investigation, another old board member leaking fake news to the press.

And all those people that I feel like really f—ed me and f—ed the company were gone, and now I had to clean up their mess. It was about this time of year [December], actually, so it gets dark at like 4:45 p.m., and it’s cold and rainy, and I would be walking through my house alone at night just, like, f—ing depressed and tired.

And it felt so unfair. It was just a crazy thing to have to go through and then have no time to recover, because the house was on fire.

Some combination of Altman and his allies clearly worked hard to successfully spread fake news during the crisis, placing it in multiple major media outlets, in order to influence the narrative and the ultimate resolution. A lot of this involved publicly threatening (and bluffing) that if they did not get unconditional surrender within deadlines on the order of a day, they would end OpenAI.

Meanwhile, the Board made the fatal mistake of not telling its side of the story, out of some combination of legal and other fears and concerns, and not wanting to ultimately destroy OpenAI. Then, at Altman’s insistence, those involved left. And then Altman swept the entire ‘investigation’ under the rug permanently.

Altman then has the audacity now to turn around and complain about what little the board said and leaked afterwards, calling it ‘fake news’ without details, and saying how the people who ‘f—ed’ him and the company were gone and now he had to clean up their mess.

What does he actually say about safety and existential risk in Reflections? Only this:

We continue to believe that the best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with the technology, learning from experience, and continuing to make the technology safer.

We believe in the importance of being world leaders on safety and alignment research, and in guiding that research with feedback from real world applications.

Then in the interview, he gets asked point blank:

Q: Has your sense of what the dangers actually might be evolved?

A: I still have roughly the same short-, medium- and long-term risk profiles. I still expect that on cybersecurity and bio stuff, we’ll see serious, or potentially serious, short-term issues that need mitigation.

Long term, as you think about a system that really just has incredible capability, there’s risks that are probably hard to precisely imagine and model. But I can simultaneously think that these risks are real and also believe that the only way to appropriately address them is to ship product and learn.

I know that anyone who previously had a self-identified ‘Eliezer Yudkowsky fan fiction Twitter account’ knows better than to think all you can say about long term risks is ‘ship products and learn.’

I don’t see the actions to back up even these words. Nor would I expect, if they truly believed this, for this short generic statement to be the only mention of the subject.

How can you reflect on the past nine years, say you have a direct path to AGI (as he will say later on), get asked point blank about the risks, and say only this about the risks involved? The silence is deafening.

I also flat out do not think you can solve the problems exclusively through this approach. The iterative development strategy has its safety and adaptation advantages. It also has disadvantages, driving the race forward and making too many people not notice what is happening in front of them via a ‘boiling the frog’ issue. On net, my guess is it has been net good for safety versus not doing it, at least up until this point.

That doesn’t mean you can solve the problem of alignment of superintelligent systems primarily by reacting to problems you observe in present systems. I do not believe the problems we are about to face will work that way.

And even if we are in such a fortunate world that they do work that way? We have not been given reason to trust that OpenAI is serious about it.

Getting back to the whole ‘vision thing’:

Our vision won’t change; our tactics will continue to evolve.

I suppose if ‘vision’ is simply ‘build AGI/ASI’ and everything else is tactics, then sure?

I do not think that was the entirety of the original vision, although it was part of it.

That is indeed the entire vision now. And they’re claiming they know how to do it.

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity.

This sounds like science fiction right now, and somewhat crazy to even talk about it. That’s alright—we’ve been there before and we’re OK with being there again. We’re pretty confident that in the next few years, everyone will see what we see, and that the need to act with great care, while still maximizing broad benefit and empowerment, is so important. Given the possibilities of our work, OpenAI cannot be a normal company.

Those who have ears, listen. This is what they plan on doing.

They are predicting AI workers ‘joining the workforce’ in earnest this year, with full AGI not far in the future, followed shortly by ASI. They think ‘4’ is conservative.

What are the rest of us going to do, or not do, about this?

I can’t help but notice Altman is trying to turn OpenAI into a normal company.

Why should we trust that structure in the very situation Altman himself describes? If the basic thesis is that we should put our trust in Altman personally, why does he think he has earned that trust?


OpenAI #10: Reflections Read More »


Controversial fluoride analysis published after years of failed reviews


70 percent of studies included in the meta-analysis had a high risk of bias.

Federal toxicology researchers on Monday finally published a long-controversial analysis that claims to find a link between high levels of fluoride exposure and slightly lower IQs in children living in areas outside the US, mostly in China and India. As expected, it immediately drew yet more controversy.

The study, published in JAMA Pediatrics, is a meta-analysis, a type of study that combines data from many different studies—in this case, mostly low-quality studies—to come up with new results. None of the data included in the analysis is from the US, and the fluoride levels examined are at least double the level recommended for municipal water in the US. In some places in the world, fluoride is naturally present in water, such as parts of China, and can reach concentrations several-fold higher than fluoridated water in the US.

The authors of the analysis are researchers at the National Toxicology Program at the National Institute of Environmental Health Sciences. For context, this is the same federal research program that published a dubious analysis in 2016 suggesting that cell phones cause cancer in rats. The study underwent a suspicious peer-review process and contained questionable methods and statistics.

The new fluoride analysis shares similarities. NTP researchers have been working on the fluoride study since 2015 and submitted two drafts for peer review to an independent panel of experts at the National Academies of Sciences, Engineering, and Medicine in 2020 and 2021. The study failed its review both times. The National Academies’ reviews found fault with the methods and statistical rigor of the analysis. Specifically, the reviews noted potential bias in the selection of the studies included in the analysis, inconsistent application of risk-of-bias criteria, lack of data transparency, insufficient evaluations of potential confounding, and flawed measures of neurodevelopmental outcomes, among other problems.

After the failing reviews, the NTP selected its own reviewers and self-published the study as a monograph in August.

High risk of bias

The related analysis published Monday looked at data from 74 human studies, 45 of which were conducted in China and 12 in India. Of the 74, 52 were rated as having a high risk of bias, meaning they had designs, study methods, or statistical approaches that could skew the results.

The study’s primary meta-analysis only included 59 of the studies: 47 with a high risk of bias and 12 with a low risk. This analysis looked at standardized mean differences in children’s IQ between higher and lower fluoride exposure groups. Of the 59 studies, 41 were from China.

Among the 47 studies with a high risk of bias, the pooled difference in mean IQ scores between the higher-exposure groups and lower-exposure groups was -0.52, suggesting that higher fluoride exposure lowered IQs. But, among the 12 studies at low risk for bias, the difference was slight overall, only -0.19. And of those 12 studies, eight found no link between fluoride exposure and IQ at all.
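
For readers unfamiliar with how numbers like -0.52 and -0.19 are produced, here is a minimal sketch of pooling standardized mean differences with inverse-variance weights (a simple fixed-effect model, which is likely cruder than whatever model the NTP authors actually used). The study effects and standard errors below are invented purely to illustrate how a pool dominated by high-bias studies can show a much larger effect than a low-bias pool.

```python
def pooled_smd(studies):
    """Inverse-variance (fixed-effect) pooling of standardized mean differences.
    `studies` is a list of (smd, standard_error) pairs, one per study."""
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
    se_pooled = (1.0 / sum(weights)) ** 0.5
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

# Invented numbers: negative SMDs mean lower IQ in the higher-fluoride group.
high_bias_studies = [(-0.60, 0.15), (-0.50, 0.20), (-0.45, 0.25)]
low_bias_studies = [(-0.20, 0.10), (-0.15, 0.12), (0.00, 0.18)]

print(pooled_smd(high_bias_studies))  # large apparent deficit
print(pooled_smd(low_bias_studies))   # much smaller pooled effect
```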

Among 31 studies that reported fluoride levels in water, the NTP authors looked at possible IQ associations at three fluoride-level cutoffs: less than 4 mg/L, less than 2 mg/L, and less than 1.5 mg/L. Among all 31 studies, the researchers found that fluoride exposure levels of less than 4 mg/L and less than 2 mg/L were linked to statistically significant decreases in IQ. However, there was no statistically significant link at 1.5 mg/L. For context, 1.5 mg/L is a little over twice the level of fluoride recommended by the US Public Health Service for US community water, which is 0.7 mg/L. When the NTP authors looked at just the studies that had a low risk of bias—seven studies—they saw the same lack of association with the 1.5 mg/L cutoff.

The NTP authors also looked at IQ associations in 20 studies that reported urine fluoride levels and again split the analysis using the same fluoride cutoffs as before. While there did appear to be a link with lower IQ at the highest fluoride level, the two lower fluoride levels had borderline statistical significance. Ten of the 20 studies were assessed as having a low risk of bias, and for just those 10, the results were similar to the larger group.

Criticism

The inclusion of urinary fluoride measurements is sure to spark criticism. For years, experts have noted that these measurements are not standardized, can vary by day and time, and are not reflective of a person’s overall fluoride exposure.

In an editorial published alongside the NTP study today, Steven Levy, a public health dentist at the University of Iowa, blasted the new analysis, including the urinary sample measurements.

“There is scientific consensus that the urinary sample collection approaches used in almost all included studies (ie, spot urinary fluoride or a few 24-hour samples, many not adjusted for dilution) are not valid measures of individuals’ long-term fluoride exposure, since fluoride has a short half-life and there is substantial variation within days and from day to day,” Levy wrote.

Overall, Levy reiterated many of the same concerns from the National Academies’ reviews, noting the study’s lack of transparency, the reliance on highly biased studies, questionable statistics, and questionable exclusion of newer, higher-quality studies, which have found no link between water fluoridation and children’s IQ. For instance, one exclusion was a 2023 study out of Australia that found “Exposure to fluoridated water during the first 5 [years] of life was not associated with altered measures of child emotional and behavioral development and executive functioning.” A 2022 study out of Spain similarly found no risk from prenatal fluoride exposure.

“Taking these many important concerns together, readers are advised to be very cautious in drawing conclusions about possible associations of fluoride exposures with lower IQ,” Levy concluded. “This is especially true for lower water fluoride levels.”

Another controversial study

But, the debate on water fluoridation is unlikely to recede anytime soon. In a second editorial published alongside the NTP study, other researchers praised the analysis, calling for health organizations and regulators to reassess fluoridation.

“The absence of a statistically significant association of water fluoride less than 1.5 mg/L and children’s IQ scores in the dose-response meta-analysis does not exonerate fluoride as a potential risk for lower IQ scores at levels found in fluoridated communities,” the authors argue, noting there are additional sources of fluoride, such as toothpaste and foods.

The EPA estimates that 40 to 70 percent of people’s fluoride exposure comes from water.

Two of the three authors of the second editorial—Christine Till and Bruce Lanphear—were authors of a highly controversial 2019 study out of Canada suggesting that fluoride intake during pregnancy could reduce children’s IQ. The authors even suggested that pregnant people should reduce their fluoride intake. But, the study, also published in JAMA Pediatrics, only found a link between maternal fluoride levels and IQ in male children. There was no association in females.

The study drew heavy backlash, with blistering responses published in JAMA Pediatrics. In one response, UK researchers essentially accused Till and colleagues of a statistical fishing expedition to find a link.

“[T]here was no significant IQ difference between children from fluoridated and nonfluoridated communities and no overall association with maternal urinary fluoride (MUFSG). The authors did not mention this and instead emphasized the significant sex interaction, where the association appeared for boys but not girls. No theoretical rationale for this test was provided; in the absence of a study preregistration, we cannot know whether it was planned a priori. If not, the false-positive probability increases because there are many potential subgroups that might show the result by chance.”
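
The statistical point in that last criticism is easy to demonstrate with a quick simulation: even when there is no true effect anywhere, testing many subgroups makes it likely that at least one looks “significant” purely by chance. The subgroup counts and trial count below are arbitrary choices for illustration, not anything from the studies discussed here.

```python
import random

def any_significant(n_subgroups, alpha_z=1.96):
    """Simulate one study: independent null subgroups, two-sided test at p < 0.05."""
    return any(abs(random.gauss(0, 1)) > alpha_z for _ in range(n_subgroups))

def false_positive_rate(n_subgroups, trials=20_000):
    """Estimate how often at least one subgroup comes up 'significant' under the null."""
    return sum(any_significant(n_subgroups) for _ in range(trials)) / trials

for k in (1, 2, 6, 10):
    print(k, round(false_positive_rate(k), 3))
# Roughly 0.05 for one subgroup, climbing toward 0.40 for ten.
```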

Other researchers criticized the study’s statistics, lack of data transparency, the use of maternal urine sampling, and the test they used to assess the IQ of children ages 3 and 4.


Beth is Ars Technica’s Senior Health Reporter. Beth has a Ph.D. in microbiology from the University of North Carolina at Chapel Hill and attended the Science Communication program at the University of California, Santa Cruz. She specializes in covering infectious diseases, public health, and microbes.

Controversial fluoride analysis published after years of failed reviews Read More »


Disney makes antitrust problem go away by buying majority stake in Fubo

Fubo’s about-face

Fubo’s merger with Disney represents a shocking about-face for the sports-streaming provider, which previously had raised alarms (citing Citi research) about the Venu partners’ combined 54 percent share of the US sports rights market—ESPN (26.8 percent), Fox (17.3 percent), and WBD (9.9 percent). Fubo successfully got a preliminary injunction against Venu in August, and a trial was scheduled for October 2025.

Fubo CEO David Gandler said in February that Disney, Fox, and WBD “are erecting insurmountable barriers that will effectively block any new competitors.

“Each of these companies has consistently engaged in anticompetitive practices that aim to monopolize the market, stifle any form of competition, create higher pricing for subscribers, and cheat consumers from deserved choice,” Gandler also said at the time.

Now, set to be a Disney company, Fubo is singing a new tune, with its announcement claiming that the merger “will enhance consumer choice by making available a broad set of programming offerings.”

In a statement today, Gandler added that the merger will allow Fubo to “provide consumers with greater choice and flexibility” and “to scale effectively,” while adding that the deal “strengthens Fubo’s balance sheet” and sets Fubo up for “positive cash flow.”

Ars Technica reached out to Fubo about its previously publicized antitrust and anticompetitive concerns, whether those concerns had been addressed, and new concerns that it settled its lawsuit to serve its own business needs rather than to resolve the consumer-choice problems it had raised. Jennifer Press, Fubo SVP of communications, responded to our questions with a statement, saying in part:

We filed an antitrust suit against the Venu Sports partners last year because that product was intended to be exclusive. As its partners announced last year, consumers would only have access to the Venu content package from Venu, which would limit choice and competitive pricing.

The definitive agreement that Fubo signed with Disney today will actually bring more choice to the market. As part of the deal, Fubo extended carriage agreements with Disney and also Fox, enabling Fubo to create a new Sports and Broadcast service and other genre-based content packages. Additionally, as the antitrust litigation has been settled, the Venu Sports partners can choose to launch that product if they wish. The launch of these bundles will enhance consumer choice by making available a broad set of programming offerings.

“… a total deception”

Some remain skeptical about Disney buying out a company that was suing it over antitrust concerns.

Disney makes antitrust problem go away by buying majority stake in Fubo Read More »


VW will offer “highly competitive” leases on ID.4 as sales restart

Last September, faulty door handle hardware caused Volkswagen to take the rather drastic steps of suspending sales and production of the electric crossover, as well as recalling almost 100,000 customer cars. Now, it says it has new parts that will allow it to fix existing cars, lift the stop-sale order, and soon, resume production at its factory in Chattanooga, Tennessee.

The ID.4, like many new EVs, features flush door handles in service of the all-important effort of drag reduction. Instead of conventional mechanical handles that interrupt the laminar air flow down the side of the car, VW instead went with an electromechanical solution.

Unfortunately, the door handle assemblies weren’t sufficiently waterproofed, allowing the electronics inside to corrode. Consequently, early last year VW started getting complaints of ID.4s with doors that would intermittently open while driving, with the automaker reporting almost 300 warranty claims by September, when it pulled the car from sale, issued the recall, and stopped the production line.

That line will restart “in the coming weeks,” VW says, and now that new and improved door handles are available, dealers will be able to complete the recall. That also means that any ID.4s in inventory can be fixed and then sold. To sweeten the deal, the automaker says that it will offer some “highly competitive lease offers,” as it hopes to send its clean crossover back up the sales charts.

VW will offer “highly competitive” leases on ID.4 as sales restart Read More »


The 2025 Honda Civic Hybrid: A refreshing alternative to a crossover


This is the hybrid powertrain. Credit: Honda

And that can be tempting. The car we tested is much more pedestrian than the Type-R, but from the driver’s seat, it wants to eat corners almost as ravenously as that track-tuned model. That surprised me because the Type-R uses a limited slip differential, and these more sedate models do not. This is indeed a car that will reward you for hustling it down a twisty road should the desire arise.

The paddles on the steering wheel increase or decrease the amount of regenerative braking you experience when you lift the throttle rather than changing (non-existent) gears. Turned off, the Civic Hybrid will coast down the road with aplomb; in its strongest setting, it’s not quite one-pedal driving.

The driving position is now rather low-slung for a normal passenger car, no doubt a feeling exacerbated by a driving diet too heavy in crossovers and SUVs, but you don’t feel quite as close to the ground as you might in, say, an MX-5. Visibility is good, and the ergonomics and HMI deserve praise for the fact that most of the controls are physical buttons. The air vents even have little machined metal stalks to aim them.

It’s a well-thought-out interior. Credit: Honda

It’s also easy to live with. The hatchback means loading cargo is no hassle, although at this price point, you have to close your own tailgate; there is no motor assistance. The front and rear are spacious enough, considering the class of car, and there are plenty of USB-C ports for people to use to recharge their stuff. The heated front seats warmed up very quickly on cold days, although a heated steering wheel would have been a nice addition.

The Sport Touring Hybrid we tested also comes with a 9-inch Android Automotive-based infotainment system that includes a full suite of Google’s automotive services, as well as Apple CarPlay and Android Auto. And all Civics come with Honda Sensing, the company’s suite of advanced driver assistance systems. Unusually for a Honda, we didn’t even notice that many false positive alerts for the forward collision warning.

In all, I find very little reason not to recommend the Civic Hatchback Hybrid to people looking for a fun and efficient car that’s not too huge, too expensive, or too dependent on touchscreens.

The 2025 Honda Civic Hybrid: A refreshing alternative to a crossover Read More »