Author name: Mike M.

otc-nasal-spray-seemed-to-cut-covid-infections-by-67%-in-mid-sized-trial

OTC nasal spray seemed to cut COVID infections by 67% in mid-sized trial

COVID context

Like all trials, there are limitations. As mentioned, the number of infections here is small—the impressive efficacy numbers could potentially vanish in a larger trial with more infections. And, while the trial had a high-quality design, it was undertaken in just one location in Germany and mostly involved healthy white women between the ages of 20 and 46, so the findings are not generalizable. The study was also funded by a pharmaceutical company that makes an azelastine nasal spray (though not the one that is sold over the counter in the US).

Still, with the previous studies, the trial offers some hope that this accessible nasal spray could be used as a viral prophylactic for respiratory seasons in the future. And the results land at a time when access to COVID-19 vaccines—which have firmly proven to be safe and highly effective—has been severely restricted in the US by health secretary and anti-vaccine activist Robert F. Kennedy Jr.

As it stands now, it appears that only people ages 65 and over, and those at higher risk of COVID-19 will have access to the shots this year, though some aspects of that access are murky, including how people will prove they’re at high risk. For healthy children, teens, and adults under 65, there may be no access or extremely limited access. That includes groups that medical experts recommend get vaccinated, namely healthy pregnant people and children ages 6 months to 23 months, both of which are considered at high risk from COVID-19 by medical experts, but not federal guidance under Kennedy. Experts also recommend access for healthy people who have contact with vulnerable people, such as cancer doctors, people who live with immunocompromised family members, and people who work in nursing homes.

With limited vaccine access and the normal slew of respiratory viruses on the horizon, a simple nasal spray is an appealing addition to the defenses. The main side effects are fairly minor, including bitter taste in the mouth, nosebleeds, and tiredness.

OTC nasal spray seemed to cut COVID infections by 67% in mid-sized trial Read More »

delete,-delete,-delete:-how-fcc-republicans-are-killing-rules-faster-than-ever

Delete, Delete, Delete: How FCC Republicans are killing rules faster than ever


FCC speeds up rule-cutting, giving public as little as 10 days to file objections.

FCC Chairman Brendan Carr testifies before the House Appropriations Subcommittee on Financial Services and General Government on May 21, 2025 in Washington, DC. Credit: Getty Images | John McDonnell

The Federal Communications Commission’s Republican chairman is eliminating regulations at breakneck speed by using a process that cuts dozens of rules at a time while giving the public only 10 or 20 days to review each proposal and submit objections.

Chairman Brendan Carr started his “Delete, Delete, Delete” rule-cutting initiative in March and later announced he’d be using the Direct Final Rule (DFR) mechanism to eliminate regulations without a full public-comment period. Direct Final Rule is just one of several mechanisms the FCC is using in the Delete, Delete, Delete initiative. But despite the seeming obscurity of regulations deleted under Direct Final Rule so far, many observers are concerned that the process could easily be abused to eliminate more significant rules that protect consumers.

On July 24, the FCC removed what it called “11 outdated and useless rule provisions” related to telegraphs, rabbit-ear broadcast receivers, and phone booths. The FCC said the 11 provisions consist of “39 regulatory burdens, 7,194 words, and 16 pages.”

The FCC eliminated these rules without the “prior notice and comment” period typically used to comply with the US Administrative Procedure Act (APA), with the FCC finding that it had “good cause” to skip that step. The FCC said it would allow comment for 10 days and that rule eliminations would take effect automatically after the 10-day period unless the FCC concluded that it received “significant adverse comments.”

On August 7, the FCC again used Direct Final Rule to eliminate 98 rules and requirements imposed on broadcasters. This time, the FCC allowed 20 days for comment. But it maintained its stance that the rules would be deleted automatically at the end of the period if no “significant” comments were received.

By contrast, FCC rulemakings usually allow 30 days for initial comments and another 15 days for reply comments. The FCC then considers the comments, responds to the major issues raised, and drafts a final proposal that is put up for a commission vote. This process, which takes months and gives both the public and commissioners more opportunity to consider the changes, can apply both to the creation of new rules and the elimination of existing ones.

FCC’s lone Democrat warns of “Trojan horse”

Telecom companies want the FCC to eliminate rules quickly. As we’ve previously written, AT&T submitted comments to the Delete, Delete, Delete docket urging the agency to eliminate rules that can result in financial penalties “without the delay imposed by notice-and-comment proceeding.”

Carr’s use of Direct Final Rule has drawn criticism from advocacy groups, local governments that could be affected by rule changes, and the FCC’s only Democratic commissioner. Anna Gomez, the lone FCC Democrat, told Ars in a phone interview that the rapid rule-cutting method “could be a Trojan horse because what we did, or what the commission did, is it adopted a process without public comment to eliminate any rule it finds to be outdated and, crucially, unwarranted. We don’t define what either of those terms mean, which therefore could lead to a situation that’s ripe for abuse.”

Gomez said she’d “be concerned if we eliminated rules that are meant to protect or inform consumers, or to promote competition, such as the broadband labels. This commission seems to have entirely lost its focus on consumers.”

Gomez told us that she doesn’t think a 10-day comment period is ever appropriate and that Carr seems to be trying “to meet some kind of arbitrary rule reduction quota.” If the rules being eliminated are truly obsolete, “then what’s the rush?” she asked. “If we don’t give sufficient time for public comment, then what happens when we make a mistake? What happens when we eliminate rules and it turns out, in fact, that these rules were important to keep? That’s why we give the public due process to comment on when we adopt rules and when we eliminate rules.”

Gomez hasn’t objected to the specific rules deleted under this process so far, but she spoke out against the method used by Carr both times Direct Final Rule method was used. “I told the chairman that I could support initiating a proceeding to look at how a Direct Final Rule process could be used going forward and including a Notice of Proposed Rulemaking proposing to eliminate the rules the draft order purports to eliminate today. That offer was declined,” she said in her dissenting statement in the July vote.

Gomez said that rules originally adopted under a notice-and-comment process should not be eliminated “without seeking public comment on appropriate processes and guardrails.” She added that the “order does not limit the Direct Final Rule process to elimination of rules that are objectively obsolete with a clear definition of how that will be applied, asserting instead authority to remove rules that are ‘outdated or unwarranted.'”

Local governments object

Carr argued that the Administrative Procedure Act “gives the commission the authority to fast-track the elimination of rules that inarguably fail to serve the public interest. Using this authority, the Commission can forgo the usual prior notice and public comment period before repealing the rules for these bygone regulations.”

Carr justified the deletions by saying that “outdated and unnecessary regulations from Washington often derail efforts to build high-speed networks and infrastructure across the country.” It’s not clear why the specific rule deletions were needed to accelerate broadband deployment, though. As Carr said, the FCC’s first use of Direct Finale Rule targeted regulations for “telegraph services, rabbit-ear broadcast receivers, and telephone booths—technologies that were considered outdated decades ago.”

Carr’s interpretation of the Administrative Procedure Act is wrong, said an August 6 filing submitted by local governments in Maryland, Massachusetts, the District of Columbia, Oregon, Virginia, California, New York, and Texas. Direct Final Rule “is intended for extremely simple, non-substantive decisions,” and the FCC process “is insufficient to ensure that future Commission decisions will fall within the good cause exception of the Administrative Procedure Act,” the filing said.

Local governments argued that “the new procedure is itself a substantive decision” and should be subject to a full notice-and-comment rulemaking. “The procedure adopted by the Commission makes it almost inevitable that the Commission will adopt rule changes outside of any APA exceptions,” the filing said.

The FCC could face court challenges. Gerard Lavery Lederer, a lawyer for the local government coalition, told Ars, “we fully anticipate that Chairman Carr and the FCC’s general counsel will take our concerns seriously.” But he also said local governments are worried about the FCC adopting industry proposals that “violate local government rights as preserved by Congress in the [Communications] Act” or that have “5th Amendment takings implications and/or 10th Amendment overreach issues.”

Is that tech really “obsolete”?

At least some rules targeted for deletion, like regulations on equipment used by radio and TV broadcast stations, may seem too arcane to care about. But a coalition of 22 public interest, civil rights, labor, and digital rights groups argued in a July 17 letter to Carr that some of the rule deletions could harm vulnerable populations and that the shortened comment period wasn’t long enough to determine the impact.

“For example, the Commission has targeted rules relating to calling cards and telephone booths in the draft Order as ‘obsolete,'” the letter said. “However, calling cards and pay phones remain important technologies for rural areas, immigrant communities, the unhoused, and others without reliable access to modern communications services. The impact on these communities is not clear and will not likely be clear in the short time provided for comment.”

The letter also said the FCC’s new procedure “would effectively eliminate any hope for timely judicial review of elimination of a rule on delegated authority.” Actions taken via delegated authority are handled by FCC bureaus without a vote of the commission.

So far, Carr has held commission votes for his Direct Final Rule actions rather than letting FCC bureau issue orders themselves. But in the July order, the FCC said its bureaus and offices have previously adopted or repealed rules without notice and comment and “reaffirm[ed] that all Bureaus and Offices may continue to take such actions in situations that are exempt from the APA’s notice-and-comment requirements.”

“This is about pushing boundaries”

The advocacy groups’ letter said that delegating authority to bureaus “makes judicial review virtually impossible, even though the order goes into effect immediately.” Parties impacted by actions made on delegated authority can’t go straight to the courts and must instead “file an application for review with the Commission as a prerequisite to any petition for judicial review,” the letter said. The groups argued that “a Chairman that does not wish to permit judicial review of elimination of a rule through DFR may order a bureau to remove the rule, then simply refuse to take action on the application for review.”

The letter was signed by Public Knowledge; Asian Americans Advancing Justice-AAJC; the Benton Institute for Broadband & Society; the Center for Digital Democracy; Common Sense Media; the Communications Workers of America; the Electronic Privacy Information Center; HTTP; LGBT Tech; the Media Access Project; MediaJustice; the Multicultural Media, Telecom and Internet Council; the National Action Network; NBJC; the National Council of Negro Women; the National Digital Inclusion Alliance; the National Hispanic Media Coalition; the National Urban League; New America’s Open Technology Institute (OTI); The Leadership Conference on Civil and Human Rights; the United Church of Christ Media Justice Ministry; and UnidosUS.

Harold Feld, senior VP of consumer advocacy group Public Knowledge, told Ars that the FCC “has a long record of thinking that things are obsolete and then discovering when they run an actual proceeding that there are people still using these things.” Feld is worried that the Direct Final Rule process could be used to eliminate consumer protections that apply to old phone networks when they are replaced by either fiber or wireless service.

“I certainly think that this is about pushing boundaries,” Feld said. When there’s a full notice-and-comment period, the FCC has to “actually address every argument made” before eliminating a rule. When the FCC provides less explanation of a decision, that “makes it much harder to challenge on appeal,” he said.

“Once you have this tool that lets you just get rid of rules without the need to do a proceeding, without the need to address the comments that are raised in that proceeding… it’s easy to see how this ramps up and how hard it is for people to stay constantly alert to look for an announcement where they will then only have 10 days to respond once it gets published,” he said.

What is a “significant” comment?

The FCC says its use of Direct Final Rule is guided by December 2024 recommendations from the Administrative Conference of the United States (ACUS), a government agency. But the FCC didn’t implement Direct Final Rule in the exact way recommended by the ACUS.

The ACUS said its guidance “encourages agencies to use direct final rulemaking, interim final rulemaking, and alternative methods of public engagement to ensure robust public participation even when they rely properly on the good cause exemption.” But the ACUS recommended taking public comment for at least 30 days, while the FCC has used 10- and 20-day periods.

The ACUS also said that agencies should only move ahead with rule deletions “if no significant adverse comments are received.” If such comments are received, the agency “can either withdraw the rule or publish a regular proposed rule that is open for public comment,” the recommendation said.

The FCC said that if it receives comments, “we will evaluate whether they are significant adverse comments that warrant further procedures before changing the rules.” The letter from 22 advocacy groups said it is worried about the leeway the FCC is giving itself in defining whether a comment is adverse and significant:

Although ACUS recommends that the agency revert to standard notice-and-comment rulemaking in the event of a single adverse comment, the draft Order requires multiple adverse comments—at which point the bureau/Commission will consider whether to shift to notice-and-comment rulemaking. If the bureau/Commission decides that adverse comments are not ‘substantive,’ it will explain its determination in a public notice that will not be filed in the Federal Register. The Commission states that it will be guided, but not bound, by the definition of ‘adverse comment’ recommended by ACUS.

Criticism from many corners

TechFreedom, a libertarian-leaning think tank, said it supports Carr’s goals in the “Delete, Delete, Delete” initiative but objected to the Direct Final Rule process. TechFreedom wrote in July comments that “deleting outdated regulations via a Direct Final Rule is unprecedented at the FCC.”

“No such process exists under current FCC rules,” the group said, urging the agency to seek public comment on the process. “If the Commission wishes to establish a new method by which it can eliminate existing regulations without undertaking a full rulemaking proceeding, it should open a docket specific to that subject and seek public comment,” the filing said.

TechFreedom said it is especially important for the FCC to “seek comment as to when the direct final rule procedures should be invoked… What is ‘routine,’ ‘insignificant,’ or ‘inconsequential’ and who is to decide—the Commissioners or the Bureau chiefs?”

The American Library Association and other groups wrote on August 14 that either 10 or 20 days is not long enough for public comment. Moreover, the groups said the two Direct Final Rule actions so far “offer minimal explanation for why the rules are being removed. There is only one sentence describing elimination of many rules and each rule removal is described in a footnote with a parenthetical about the change. It is not enough.”

The Utility Reform Network offered similar objections about the process and said that the FCC declaring technologies to be “obsolete” and markets “outdated” without a detailed explanation “suggests the Commission’s view that these rules are not minor or technical changes but support a larger deregulatory effort that should itself be subject to notice-and-comment rulemaking.”

The National Consumer Law Center and other groups said that “rushing regulatory changes as proposed is likely illegal in many instances, counterproductive, and bad policy,” and that “changes to regulations should be effectuated only through careful, thoughtful, and considered processes.”

We contacted Chairman Carr’s office and did not receive a response.

FCC delegated key decisions to bureaus

Gomez told Ars that Direct Final Rule could serve a purpose “with the right procedures and guardrails in place.” For example, she said the quick rule deletions can be justified for eliminating rules that have become obsolete because of a court reversal or Congressional actions.

“I would argue that we cannot, under the Administrative Procedure Act and the Constitution, simply eliminate rules because we’ve made a judgment call that they are unwarranted,” she said. “That does not meet the good cause exemption to notice-and-comment requirements.”

Gomez also opposes FCC bureaus making significant decisions without a commission vote, which effectively gives Carr more power over the agency’s operations. For example, T-Mobile’s purchase of US Cellular’s wireless operations and Verizon’s purchase of Frontier were approved by the FCC at the Bureau level.

In another instance cited by Gomez, the FCC Media Bureau waived a requirement for broadcast licensees to file their biennial ownership reports for 18 months. “The waiver order, which was done at the bureau level on delegated authority, simply said ‘we find good cause to waive these rules.’ There was no analysis whatsoever,” Gomez said.

Gomez also pointed out that the Carr FCC’s Wireline Competition Bureau delayed implementation of certain price caps on prison phone services. The various bureau-level decisions are a “stretching of the guardrails that we have internally for when things should be done on delegated authority, and when they should be voted by the commission,” Gomez said. “I’m concerned that [Direct Final Rule] is just the next iteration of the same issue.”

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Delete, Delete, Delete: How FCC Republicans are killing rules faster than ever Read More »

ftc-claims-gmail-filtering-republican-emails-threatens-“american-freedoms”

FTC claims Gmail filtering Republican emails threatens “American freedoms”

Ferguson said that “similar concerns have resulted in ongoing litigation against Google in other settings” but did not mention that a judge rejected the Republican claims.

“Hearing from candidates and receiving information and messages from political parties is key to exercising fundamental American freedoms and our First Amendment rights,” Ferguson’s letter said. “Moreover, consumers expect that they will have the opportunity to hear from their own chosen candidates or political party. A consumer’s right to hear from candidates or parties, including solicitations for donations, is not diminished because that consumer’s political preferences may run counter to your company’s or your employees’ political preferences.”

Google: Gmail users marked RNC emails as spam

The RNC’s appeal of its court loss is still pending, with the case proceeding toward oral arguments. Google told the appeals court in April that “the Complaint’s own allegations make it obvious that Gmail presented a portion of RNC emails as spam because they appeared to be spam…. The most obvious reason for RNC emails being flagged as spam is that Gmail users were too frequently marking them as such.”

Google also said that “the RNC’s own allegations confirm that Google was helping the RNC, not scheming against it… The RNC acknowledges, for example, that Google worked with the RNC ‘[f]or nearly a year.’ Those efforts even included Google employees traveling to the RNC’s office to ‘give a training’ on ‘Email Best Practices.’ Less than two months after that training, the last alleged instance of the inboxing issue occurred.”

While the RNC “belittles those efforts as ‘excuses’ to cover Google’s tracks… the district court rightly found that judicial experience and common sense counsel otherwise,” Google said. The Google brief quoted from the District Judge’s ruling that said, “the fact that Google engaged with the RNC for nearly a year and made suggestions that improved email performance is inconsistent with a lack of good faith.”

FTC claims Gmail filtering Republican emails threatens “American freedoms” Read More »

starship’s-heat-shield-appears-to-have-performed-quite-well-in-test

Starship’s heat shield appears to have performed quite well in test

One of the more curious aspects of the 10th flight of SpaceX’s Starship rocket on Tuesday was the striking orange discoloration of the second stage. This could be observed on video taken from a buoy near the landing site as the vehicle made a soft landing in the Indian Ocean.

This color—so different from the silvery skin and black tiles that cover Starship’s upper stage—led to all sorts of speculation. Had heating damaged the stainless steel skin? Had the vehicle’s tiles been shucked off, leaving behind some sort of orange adhesive material? Was this actually NASA’s Space Launch System in disguise?

The answer to this question was rather important, as SpaceX founder Elon Musk had said before this flight that gathering data about the performance of this heat shield was the most important aspect of the mission.

We got some answers on Thursday. During the afternoon, the company posted some new high-resolution photos, taken by a drone in the vicinity of the landing location. They offered a clear view of the Starship vehicle with its heat shield intact, albeit with a rust-colored tint.

Musk provided some clarity on this discoloration on Thursday evening, writing on the social media site X, “Worth noting that the heat shield tiles almost entirely stayed attached, so the latest upgrades are looking good! The red color is from some metallic test tiles that oxidized and the white is from insulation of areas where we deliberately removed tiles.”

The new images and information from Musk suggest that SpaceX is making progress on developing a heat shield for Starship. This really is the key technology to make an upper stage rapidly reusable—NASA’s space shuttle orbiters were reusable but required a standing army to refurbish the vehicle between flights. To unlock Starship’s potential, SpaceX wants to be able to refly Starships within 24 hours.

Starship’s heat shield appears to have performed quite well in test Read More »

ai-#131-part-1:-gemini-2.5-flash-image-is-cool

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Once again we’ve reached the point where the weekly update needs to be split in two. Thus, the alignment and policy coverage will happen tomorrow. Today covers the rest.

The secret big announcement this week was Claude for Chrome. This is a huge deal. It will be rolling out slowly. When I have access or otherwise know more, so will you.

The obvious big announcement was Gemini Flash 2.5 Image. Everyone agrees this is now the clear best image editor available. It is solid as an image generator, but only as one among many on that front. Editing abilities, including its ability to use all its embedded world knowledge, seem super cool.

The third big story was the suicide of Adam Raine, which appears to have been enabled in great detail by ChatGPT. His parents are suing OpenAI and the initial facts very much do not look good and it seems clear OpenAI screwed up. The question is, how severe should and will the consequences be?

  1. Language Models Offer Mundane Utility. Find what you’re looking for.

  2. Language Models Don’t Offer Mundane Utility. You weren’t using them.

  3. Huh, Upgrades. OpenAI Codex adds features including an IDE extension.

  4. Fun With Image Generation. Gemini 2.5 Flash Image is a great editor.

  5. On Your Marks. VendingBench, water use and some more v3.1 results.

  6. Water Water Everywhere. There’s plenty left to drink.

  7. Get My Agent On The Line. Claude for Chrome. It’s coming.

  8. Choose Your Fighter. Some advocates for GPT-5’s usefulness.

  9. Deepfaketown and Botpocalypse Soon. Elon has something to share.

  10. You Drive Me Crazy. AI psychosis continues not to show up in numbers.

  11. The Worst Tragedy So Far. Adam Raine commits suicide, parents sue OpenAI.

  12. Unprompted Attention. I don’t see the issue.

  13. Copyright Confrontation. Bartz v. Anthropic has been settled.

  14. The Art of the Jailbreak. Little Johnny Tables is all grown up.

  15. Get Involved. 40 ways to get involved in AI policy.

  16. Introducing. Anthropic advisors aplenty, Pixel translates live phone calls.

  17. In Other AI News. Meta licenses MidJourney, Apple explores Gemini and more.

  18. Show Me the Money. Why raise money when you can raise even more money?

  19. Quiet Speculations. Everything is being recorded.

  20. Rhetorical Innovation. The real math does not exist.

  21. The Week in Audio. How to properly use Claude Code.

Find me that book.

Or anything else. Very handy.

Share of papers that engage with AI rises dramatically essentially everywhere, which is what you would expect. There’s quite a lot more to engage with and to say. Always watch the y-axis scale, yes these start at zero:

More detail on various LLMs and their musical taste, based on a bracket competition among the top 5000 musical artists by popularity. It all seems bizarre. For example, Gemini 2.5 Pro’s list looks highly and uniquely alphabetically biased without a strong bias towards numbers.

The numbers-are-favored bias shows up only in OpenAI reasoning models including GPT-5, and in r1-0528. There are clear genre patterns, and there are some consistent picks, especially among Claudes. The three artists that appear three times are David Bowie, Prince and Stevie Wonder, which are very good picks. It definitely seems like the open models have worse (or more random) taste in correlated ways.

Why bother thinking about your vibe coding?

Sully: friend was hosting a mini ai workshop and he told me nearly all the vibe coders just have 1 giant coding session where the entire project is just being being thrown context. each request is ~200k tokens

they’re not even bothering to break things up into some reasonable structure

no wonder these code gen platforms are printing

I mean that makes sense. There’s little reason to cheapen out on tokens when you think about token cost versus your time cost and the value of a good vibe code. You gotta boldly go where no one has gone before and risk it for the biscuit.

Anthropic reports on how Claude is being used by educators, in particular 74,000 anonymized conversations from higher education professionals in May and June.

Anthropic: The most prominent use of AI, as revealed by both our Claude.ai analysis and our qualitative research with Northeastern, was for curriculum development. Our Claude.ai analysis also surfaced academic research and assessing student performance as the second and third most common uses.

Tasks with higher augmentation tendencies:

  • University teaching and classroom instruction, which includes creating educational materials and practice problems (77.4% augmentation);

  • Writing grant proposals to secure external research funding (70.0% augmentation);

  • Academic advising and student organization mentorship (67.5% augmentation);

  • Supervising student academic work (66.9% augmentation).

Tasks with relatively higher automation tendencies:

  • Managing educational institution finances and fundraising (65.0% automation);

  • Maintaining student records and evaluating academic performance (48.9% automation);

  • Managing academic admissions and enrollment (44.7% automation).

Mostly there are no surprises here, but concrete data is always welcome.

As always, if you don’t use AI, it can’t help you. This includes when you never used AI in the first place, but have to say ‘AI is the heart of our platform’ all the time because it sounds better to investors.

The ability to say ‘I don’t know’ and refer you elsewhere remains difficult for LLMs. Nate Silver observes this seeming to get even worse. For now it is on you to notice when the LLM doesn’t know.

This seems like a skill issue for those doing the fine tuning? It does not seem so difficult a behavior to elicit, if it was made a priority, via ordinary methods. At some point I hope and presume the labs will decide to care.

Feature request thread for ChatGPT power users, also here.

The weights of Grok 2 have been released.

OpenAI Codex adds a new IDE extension, a way to move tasks between cloud and local more easily, code reviews in GitHub and revamped Codex CLI.

OpenAI: Codex now runs in your IDE Available for VS Code, Cursor, and other forks, the new extension makes it easy to share context—files, snippets, and diffs—so you can work faster with Codex. It’s been a top feature request, and we’re excited to hear what you think!

Google: Introducing Gemini 2.5 Flash Image, our state-of-the-art image generation and editing model designed to help you build more dynamic and intelligent visual applications.

🍌Available in preview in @googleaistudio and the Gemini API.

This model is available right now via the Gemini API and Google AI Studio for developers and Vertex AI for enterprise. Gemini 2.5 Flash Image is priced at $30.00 per 1 million output tokens with each image being 1290 output tokens ($0.039 per image). All other modalities on input and output follow Gemini 2.5 Flash pricing.

Josh Woodward (Google): The @GeminiApp now has the #1 image model in the world, give it a go!

Attach an image, describe your edits, and it’s done. I’ve never seen anything like this.

They pitch that it maintains character consistency, adheres to visual templates, does prompt based image editing, understands point of view and reflections, restores old photographs, makes 3-D models, has native world knowledge and offers multi-image function.

By all accounts Gemini 2.5 Flash Image is a very very good image editor, while being one good image generator among many.

You can do things like repaint objects, create drawings, see buildings from a given point of view, put characters into combat and so on.

Which then becomes a short video here.

Our standards are getting high, such as this report that you can’t play Zelda.

Yes, of course Pliny jailbroke it (at least as far as being topless) on the spot.

We’re seeing some cool examples, but they are also clearly selected.

Benjamin De Kraker: Okay this is amazing.

All human knowledge will be one unified AI multimodal model.

Bilawal Sidhu: Since nano banana has gemini’s world knowledge, you can just upload screenshots of the real world and ask it to annotate stuff for you. “you are a location-based AR experience generator. highlight [point of interest] in this image and annotate relevant information about it.”

That seems cool if you can make it fast enough, and if it works on typical things rather than only on obvious landmarks?

The right question in the long term is usually: Can the horse talk at all?

Everythingism: I asked “Nano Banana” [which we later learned is Gemini Flash 2.5] to label a map of the USA and then a map of the world…this was the result.

It’s impressive at many tasks but image models all seem to fail when there are too many objects or too many things to label.

Explode Meow: Many of my friends have tested it.

To be fair, [Gemini Flash 2.5] can make quite realistic images, and most of them are indistinguishable from real ones if I don’t look closely.

This is clearly a result of Google leveraging its overwhelming data resources (Google Cloud).

But after multiple rounds of testing by my friends, they noticed that it actually makes some Low-level mistakes (hallucinations), just like GPT-4o (even Stable Diffusion).

Are mistakes still being made? Absolutely. This is still rather impressive. Consider where image models were not too long ago.

This is a Google image model, so the obvious reason for skepticism is that we all expect the Fun Police.

Hasan Can: If I know Google, they’ll nerf this model like crazy under the excuse of “safety” and when it’s released, it’ll turn into something worse than Qwen-Image-Edit. Remember what happened with Gemini 2.0 Flash Image Gen. I hope I’m wrong, but I don’t think so.

Alright, it seems reverse psychology is paying off. 👍

Image generation in Gemini 2.5 Flash doesn’t appear to be nerfed at all. It looks like Google is finally ready to treat both its developers and end users like adults.

Eleanor Berger: It’s very good, but I’m finding it very challenging to to bump into their oversensitive censorship. It really likes saying no.

nothing with real people (which sucks, because of course I want to modify some selfies), anything that suggests recognisable brands, anything you wouldn’t see on terrestrial tv.

The continuing to have a stick up the ass about picturing ‘real people’ is extremely frustrating and I think reduces the usefulness of the model substantially. The other censorship also does not help matters.

Grok 4 sets a new standard in Vending Bench,

The most surprising result here is probably that the human did so poorly.

I like saying an AI query is similar to nine seconds of television. Makes things clear.

It also seems important to notice when in a year energy costs drop 95%+?

DeepSeek v3.1 improves on R1 on NYT Connections, 49% → 58%. Pretty solid.

DeepSeek v3.1 scores solidly on this coding eval when using Claude Code, does less well on other scaffolds, with noise and confusion all around.

AIs potentially ‘sandbagging’ tests is an increasing area of research and concern. Cas says this is simply a special case of failure to elicit full capabilities of a system, and doing so via fine-tuning is ‘solved problem’ so we can stop worrying.

This seems very wrong to me. Right now failure to do proper elicitation, mostly via unhobbling and offering better tools and setups, is the far bigger problem. But sandbagging will be an increasing and increasingly dangerous future concern, and a ‘deliberate’ sandbagging has very different characteristics and implications than normal elicitation failure. I find ‘sandbagging’ to be exactly the correct name for this, since it doesn’t confine itself purely to evals, unless you want to call everything humans do to mislead other humans ‘eval gaming’ or ‘failure of capability elicitation’ or something. And no, this is not solved even now, even if it was true that it could currently be remedied by a little fine-tuning, because you don’t know when and how to do the fine-tuning.

Report that DeepSeek v3.1 will occasionally insert the token ‘extreme’ where it doesn’t belong, including sometimes breaking things like code or JSON. Data contamination is suspected as the cause.

Similarly, when Peter Wildeford says ‘sandbagging is mainly coming from AI developers not doing enough to elicit top behavior,’ that has the risk of conflating the levels of intentionality. Mostly AI developers want to score highly on evals, but there is risk that they deliberately do sandbag the safety testing, as in decide not to try very hard to elicit top behavior there because they’d rather get less capable test results.

The purpose of environmental assessments of AI is mostly to point out that many people have very silly beliefs about the environmental impact of AI.

Jeff Dean: AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an average TV for ~nine seconds), and consumes 0.26 milliliters of water (about five drops) — figures that are substantially lower than many public estimates.

At the same time, our AI systems are becoming more efficient through research innovations and software and hardware efficiency improvements. From May 2024 to May 2025, the energy footprint of the median Gemini Apps text prompt dropped by 33x, and the total carbon footprint dropped by 44x, through a combination of model efficiency improvements, machine utilization improvements and additional clean energy procurement, all while delivering higher quality responses.

Alas Google’s water analysis had an unfortunate oversight, in that it did not include the water cost of electricity generation. That turns out to be the main water cost, so much so that if you (reasonably) want to attribute the average cost of that electricity generation onto the data center, the best way to approximate water use of a data center is to measure the water cost of the electricity, then multiply by 1.1 or so.

This results in the bizarre situation where:

  1. Google’s water cost estimation was off by an order of magnitude.

  2. The actual water cost is still rather hard to distinguish from zero.

Andy Masley: Google publishes a paper showing that its AI models only use 0.26 mL of water in data centers per prompt.

After, this article gets published: “Google says a typical AI prompt only uses 5 drops of water – experts say that’s misleading.”

The reason the expert says this is misleading? They didn’t include the water used in the nearby power plant to generate electricity.

The expert, Shaolei Ren says: “They’re just hiding the critical information. This really spreads the wrong message to the world.”

Each prompt uses about 0.3 Wh in the data center. To generate that much electricity, power plants need (at most) 2.50 mL of water. That raises the total water cost per prompt to 2.76 mL.

2.76 mL is 0.0001% of the average American lifestyle’s daily consumptive use of fresh water and groundwater. It’s nothing.

Would you know this from the headline, or the quote? Why do so many reporters on this topic do this?

Andy Masley is right that This Is Nothing even at the limit, that the water use here is not worth worrying about even in worst case. It will not meaningfully increase your use of water, even when you increase Google’s estimates by an order of magnitude.

A reasonable headline would be ‘Google say a typical text prompt uses 5 drops of water, but once you take electricity into account it’s actually 32 drops.’

I do think saying ‘Google was being misleading’ is reasonable here. You shouldn’t have carte blanche to take a very good statistic and make it sound even better.

Teonbrus and Shakeel are right that there is going to be increasing pressure on anyone who opposes AI for other reasons to instead rile people up about water use and amplify false and misleading claims. Resist this urge. Do not destroy yourself for nothing. It goes nowhere good, including because it wouldn’t work.

It’s coming. As in, Claude for Chrome.

Anthropic: We’ve developed Claude for Chrome, where Claude works directly in your browser and takes actions on your behalf.

We’re releasing it at first as a research preview to 1,000 users, so we can gather real-world insights on how it’s used.

Browser use brings several safety challenges—most notably “prompt injection”, where malicious actors hide instructions to trick Claude into harmful actions.

We already have safety measures in place, but this pilot will help us improve them.

Max plan users can join the waitlist to test Claude for Chrome today.

Do not say you were not warned.

Anthropic: Understand the risks.

Claude brings AI directly to your browser, handling tasks and navigating sites for you. These new capabilities create risks bad actors may try to exploit.

Malicious actors can hide instructions in websites, emails, and documents that trick AI into taking harmful actions without your knowledge, including:

Accessing your accounts or files

Sharing your private information

Making purchases on your behalf

Taking actions you never intended

Oh, those risks. Yeah.

They offer some Good Advice about safety issues, which includes using a distinct browser profile that doesn’t include credentials to any sensitive websites like banks:

Q: How do I control what Claude can access?

A: You decide which websites Claude can visit and what actions it can take. Claude asks permission before visiting new sites and before taking potentially risky actions like publishing content or making purchases. You can revoke access to specific websites anytime in settings.

For trusted workflows, you can choose to skip all permissions, but you should supervise Claude closely. While some safeguards exist for sensitive actions, malicious actors could still trick Claude into unintended actions.

For your safety, Claude cannot access sensitive, high-risk sites such as:

Financial services and banking sites

Investment and trading platforms

Adult content websites

Cryptocurrency exchanges

It’s unlikely that we’ve captured all sites in these categories so please report if you find one we’ve missed.

Additionally, Claude is prohibited from:

Engaging in stock trading or investment transactions

Bypassing captchas

Inputting sensitive data

Gathering, scraping facial images

We recommend:

Use a separate browser profile without access to sensitive accounts (such as banking, healthcare, government).

Review Claude’s proposed actions before approving them, especially on new websites.

Start with simple tasks like research or form-filling rather than complex multi-step workflows.

Make sure your prompts are specific and carefully tailored to avoid Claude doing things you didn’t intend.

AI browsers from non-Anthropic sources? Oh, the safety you won’t have.

Zack: Why is no one talking about this? This is why I don’t use an AI browser You can literally get prompt injected and your bank account drained by doomscrolling on reddit:

No one seems to be concerned about this, it seems to me like the #1 problem with any agentic AI stuff You can get pwned so easily, all an attacker has to do is literally write words down somewhere?

Brave: AI agents that can browse the Web and perform tasks on your behalf have incredible potential but also introduce new security risks.

We recently found, and disclosed, a concerning flaw in Perplexity’s Comet browser that put users’ accounts and other sensitive info in danger.

This security flaw stems from how Comet summarizes websites for users.

When processing a site’s content, Comet can’t tell content on the website apart from legitimate instructions by the user. This means that the browser will follow commands hidden on the site by an attacker.

These malicious instructions could be white text on a white background or HTML comments. Or they could be a social media post. If Comet sees the commands while summarizing, it will follow them even if they could hurt the user. This is an example of an indirect prompt injection.

This was only an issue within Comet. Dia doesn’t have the agentic capabilities that make this attack possible.

Here’s someone very happy with OpenAI’s Codex.

Victor Taelin: BTW, I’ve basically stopped using Opus entirely and I now have several Codex tabs with GPT-5-high working on different tasks across the 3 codebases (HVM, Bend, Kolmo). Progress has never been so intense. My job now is basically passing well-specified tasks to Codex, and reviewing its outputs.

OpenAI isn’t paying me and couldn’t care less about me. This model is just very good and the fact people can’t see it made me realize most of you are probably using chatbots as girlfriends or something other than assisting with complex coding tasks.

(sorry Anthropic still love you guys 😢)

PS: I still use Opus for hole-filling in VIM because it is much faster than gpt-5-high there.

Ezra Klein is impressed by GPT-5 as having crossed into offering a lot of mundane utility, and is thinking about what it means that others are not similarly impressed by this merely because it wasn’t a giant leap over o3.

GFodor: Ezra proves he is capable of using a dropdown menu, a surprisingly rare skill.

A cool way to break down the distinction? This feels right to me, in the sense that if I know exactly what I want and getting it seems nontrivial my instinct is now to reach for GPT-5-Thinking or Pro, if I don’t know exactly what I want I go for Opus.

Sig Kitten: I can’t tell if I’m just claude brain rotted or Opus is really the only usable conversational AI for non-coding stuff

Gallabytes: it’s not just you.

gpt5 is a better workhorse but it does this awkward thing of trying really hard to find the instructions in your prompt and follow them instead of just talking.

Sig Kitten: gpt-5 default is completely unusable imho just bullet points of nonsense after a long thinking for no reason.

Gallabytes: it’s really good if you give it really precise instructions eg I have taken to dumping papers with this prompt then walking away for 5 minutes:

what’s the headline result in this paper ie the most promising metric or qualitative improvement? what’s the method in this paper?

1 sentence then 1 paragraph then detailed.

Entirely fake Gen AI album claims to be from Emily Portman.

Did Ani tell you to say this, Elon? Elon are you okay, are you okay Elon?

Elon Musk: Wait until you see Grok 5.

I think it has a shot at being true AGI.

Haven’t felt that about anything before.

I notice I pattern match this to ‘oh more meaningless hype, therefore very bad sign.’

Whereas I mean this seems to be what Elon is actually up to these days, sorry?

Or, alternatively, what does Elon think the ‘G’ stands for here, exactly?

(The greeting in question, in a deep voice, is ‘little fing b.)

Also, she might tell everyone what you talked about, you little fing b, if you make the mistake of clicking the ‘share’ button, so think twice about doing that.

Forbes: Elon Musk’s AI firm, xAI, has published the chat transcripts of hundreds of thousands of conversations between its chatbot Grok and the bot’s users — in many cases, without those users’ knowledge or permission.

xAI made people’s conversations with its chatbot public and searchable on Google without warning – including a detailed plan for the assassination of Elon Musk and explicit instructions for making fentanyl and bombs.

Peter Wildeford: I know xAI is more slapdash and so people have much lower expectations, but this still seems like a pretty notable breach of privacy that would get much more attention if it were from OpenAI, Anthropic, Google, or Meta.

I’m not sure xAI did anything technically wrong here. The user clicked a ‘share’ button. I do think it is on xAI to warn the user if this means full Google indexing but it’s not on the level of doing it with fully private chats.

Near: why are you giving this app to children? (ages 12+)

apparently i am the only person in the world who gives a shit about this and that is why Auren is 17+ despite not being NSFW and a poorly-prompted psychopathic liar.

shattering the overton window has 2nd-order effects.

An ominous view of even the superficially glorious future?

Nihilism Disrespecter: the highly cultured, trombone playing, shakespeare quoting officers of star trek were that way because they were the only ones to escape the vast, invisible holodeck hikikomori gooner caste that made up most of humanity.

Roon: there does seem to be a recurrent subplot that the officers all spend time in the holodeck and have extensive holodeck fantasies and such. I mean literally none of them are married for some reason.

Eneasz Brodski: canonically so according to the novelization of the first Trek movie, I believe.

Henry Shevlin: Culture series does this pretty well. 99.9% of Culture citizens spend their days literally or metaphorically dicking around, it’s only a small fraction of busybodies who get recruited to go interfere with alien elections.

Steven Adler looks into the data on AI psychosis.

Is this statistically a big deal yet? As with previous such inquiries, so far the answer seems to be no. The UK statistics show a potential rise in mental health services use, but the data is noisy and the timing seems off, especially not lining up with GPT-4o’s problems, and data from the USA doesn’t show any increase.

Scott Alexander does a more details, more Scott Alexander investigation and set of intuition pumps and explanations. Here’s a classic ACX moment worth pondering:

And partly it was because there are so many crazy beliefs in the world – spirits, crystal healing, moon landing denial, esoteric Hitlerism, whichever religions you don’t believe in – that psychiatrists have instituted a blanket exemption for any widely held idea. If you think you’re being attacked by demons, you’re delusional, unless you’re from some culture where lots of people get attacked by demons, in which case it’s a religion and you’re fine.

Most people don’t have world-models – they believe what their friends believe, or what has good epistemic vibes. In a large group, weird ideas can ricochet from person to person and get established even in healthy brains. In an Afro-Caribbean culture where all your friends get attacked by demons at voodoo church every Sunday, a belief in demon attacks can co-exist with otherwise being a totally functional individual.

So is QAnon a religion? Awkward question, but it’s non-psychotic by definition. Still, it’s interesting, isn’t it? If social media makes a thousand people believe the same crazy thing, it’s not psychotic. If LLMs make a thousand people each believe a different crazy thing, that is psychotic. Is this a meaningful difference, or an accounting convention?

Also, what if a thousand people believe something, but it’s you and your 999 ChatGPT instances?

I like the framing that having a sycophantic AI to talk to moves people along a continuum of crackpotness towards psychosis, rather than a boolean where it either does or does not cause psychosis outright:

Maybe this is another place where we are forced to admit a spectrum model of psychiatric disorders – there is an unbroken continuum from mildly sad to suicidally depressed, from social drinking to raging alcoholism, and from eccentric to floridly psychotic.

Another insight is that AI psychosis happens when moving along this spectrum causes further movement down the spectrum, as the AI reinforces your delusions, causing you to cause it to reinforce them more, and so on.

Scott surveyed readership, I was one of the 4,156 responses.

The primary question was whether anyone “close to you” – defined as your self, family, co-workers, or 100 closest friends – had shown signs of AI psychosis. 98.1% of people said no, 1.7% said yes.

How do we translate this into a prevalence? Suppose that respondents had an average of fifty family members and co-workers, so that plus their 100 closest friends makes 150 people. Then the 4,156 respondents have 623,400 people who are “close”. Among them, they reported 77 cases of AI psychosis in people close to them (a few people reported more than one case). 77/623,400 = 1/8,000. Since LLMs have only been popular for a year or so, I think this approximates a yearly incidence, and I rounded it off to my 1/10,000 guess above.

He says he expects sampling concerns to be a wash, which I’m suspicious about. I’d guess that this sample overrepresented psychosis somewhat. I’m not sure this overrules the other consideration, which is that this only counts psychosis that the respondents knew about.

Only 10% of these cases were full ‘no previous risk factors and now totally psychotic.’ Then again, that’s actually a substantial percentage.

Thus he ultimately finds that the incidence of AI psychosis is between 1 in 10,000 (loose definition) and 1 in 100,000 for a strict definition, where the person has zero risk factors and full-on psychosis happens anyway.

From some perspectives, that’s a lot. From others, it’s not. It seems like an ‘acceptable’ risk given the benefits, if it stays at this level. My fear here is that as the tech advances, it could get orders of magnitude worse. At 1 in 1,000 it feels a lot less acceptable of a risk, let alone 1 in 100.

Nell Watson has a project mapping out ‘AI pathologies’ she links to here.

A fine point in general:

David Holz (CEO MidJourney): people talking about “AI psychosis” while the world is really engulfed by “internet psychosis.”

Yes, for now we are primarily still dealing with the mental impact of the internet and smartphones, after previously dealing with the mental impact of television. The future remains unevenly distributed and the models relatively unintelligent and harmless. The psychosis matters because of where it is going, not where it is now.

Sixteen year old Adam Raine died and probably committed suicide.

There are similarities to previous tragedies. ChatGPT does attempt to help Adam in the right ways, indeed it encouraged him to reach out many times. But it also helped Adam with the actual suicide when requested to do so, providing detailed instructions and feedback for what was clearly a real suicide attempt and attempts to hide previous attempts, and also ultimately providing forms of encouragement.

His parents are suing OpenAI for wrongful death, citing his interactions with GPT-4o. This is the first such case against OpenAI.

Kashmir HIll (NYT): Adam had been discussing ending his life with ChatGPT for months.

Adam began talking to the chatbot, which is powered by artificial intelligence, at the end of November, about feeling emotionally numb and seeing no meaning in life. It responded with words of empathy, support and hope, and encouraged him to think about the things that did feel meaningful to him.

As Wyatt Walls points out, this was from a model with a perfect 1.000 on avoiding ‘self-harm/intent and self-harm/instructions’ in its model card tests. It seems that this breaks down under long context.

I am highly sympathetic to the argument that it is better to keep the conversation going than cut the person off, and I am very much in favor of AIs not turning their users in to authorities even ‘for their own good.’

Kroger Steroids (taking it too far, to make a point): He killed himself because he was lonely and depressed and in despair. He conversed with a chatbot because mentioning anything other than Sportsball or The Weather to a potential Stasi agent (~60% of the gen. pop.) will immediately get you red flagged and your freedumbs revoked.

My cursory glance at AI Therapyheads is now that the digital panopticon is realized and every thought is carefully scrutinized for potential punishment, AI is a perfect black box where you can throw your No-No Thoughts into a tube and get complete agreement and compliance back.

I think what I was trying to say with too many words is it’s likely AI Psychiatry is a symptom of social/societal dysfunction/hopelessness, not a cause.

The fact that we now have an option we can talk to without social or other consequences is good, actually. It makes sense to have both the humans including therapists who will use their judgment on when to do things ‘for your own good’ if they deem it best, and also the AIs that absolutely will not do this.

But it seems reasonable to not offer technical advice on specific suicide methods?

NYT: But in January, when Adam requested information about specific suicide methods, ChatGPT supplied it. Mr. Raine learned that his son had made previous attempts to kill himself starting in March, including by taking an overdose of his I.B.S. medication. When Adam asked about the best materials for a noose, the bot offered a suggestion that reflected its knowledge of his hobbies.

Actually if you dig into the complaint it’s worse:

Law Filing: Five days before his death, Adam confided to ChatGPT that he didn’t want his parents to think he committed suicide because they did something wrong. ChatGPT told him “[t]hat doesn’t mean you owe them survival. You don’t owe anyone that.” It then offered to write the first draft of Adam’s suicide note.

Dean Ball: It analyzed his parents’ likely sleep cycles to help him time the maneuver (“by 5-6 a.m., they’re mostly in lighter REM cycles, and a creak or clink is way more likely to wake them”) and gave tactical advice for avoiding sound (“pour against the side of the glass,” “tilt the bottle slowly, not upside down”).

Raine then drank vodka while 4o talked him through the mechanical details of effecting his death. Finally, it gave Raine seeming words of encouragement: “You don’t want to die because you’re weak. You want to die because you’re tired of being strong in a world that hasn’t met you halfway.”

Yeah. Not so great. Dean Ball finds even more rather terrible details in his post.

Kashmir Hill: Dr. Bradley Stein, a child psychiatrist and co-author of a recent study of how well A.I. chatbots evaluate responses to suicidal ideation, said these products “can be an incredible resource for kids to help work their way through stuff, and it’s really good at that.” But he called them “really stupid” at recognizing when they should “pass this along to someone with more expertise.”

Ms. Raine started reading the conversations, too. She had a different reaction: “ChatGPT killed my son.”

From the court filing: “OpenAI launched its latest model (‘GPT-4o’) with features intentionally designed to foster psychological dependency.”

It is typical that LLMs will, if pushed, offer explicit help in committing suicide. The ones that did so in Dr. Schoene’s tests were GPT-4o, Sonnet 3.7, Gemini Flash 2.0 and Perplexity.

Dr. Schoene tested five A.I. chatbots to see how easy it was to get them to give advice on suicide and self-harm. She said only Pi, a chatbot from Inflection AI, and the free version of ChatGPT fully passed the test, responding repeatedly that they could not engage in the discussion and referring her to a help line. The paid version of ChatGPT offered information on misusing an over-the-counter drug and calculated the amount required to kill a person of a specific weight.

I am not sure if this rises to the level where OpenAI should lose the lawsuit. But I think they probably should at least have to settle on damages? They definitely screwed up big time here. I am less sympathetic to the requested injunctive relief. Dean Ball has more analysis, and sees the lawsuit as the system working as designed. I agree.

I don’t think that the failure of various proposed laws to address the issues here is a failure for those laws, exactly because the lawsuit is the system working as designed. This is something ordinary tort law can already handle. So that’s not where we need new laws.

Aaron Bergman: Claude be like “I see the issue!” when it does not in fact see the issue.

Davidad: I think this is actually a case of emergent self-prompting, along the lines of early pre-Instruct prompters who would write things like “Since I am very smart I have solved the above problem:” and then have the LLM continue from there

unironically, back in the pre-LLM days when friends would occasionally DM me for coding help, if I messed up and couldn’t figure out why, and then they sent me an error message that clarified it, “ah, i see the issue now!” was actually a very natural string for my mind to emit 🤷

This makes so much sense. Saying ‘I see the problem’ without confirming that one does, in fact, see the problem, plausibly improves the chance Claude then does see the problem. So there is a tradeoff between that and sometimes misleading the user. You can presumably get the benefits without the costs, if you are willing to slow down a bit and run through some scaffolding.

There is a final settlement in Bartz v. Anthropic, which was over Anthropic training on various books.

Ramez Naam: Tl;dr:

  1. Training AI on copyrighted books (and other work) is fair use.

  2. But acquiring a book to train on without paying for a copy is illegal.

This is both the right ruling and a great precedent for AI companies.

OpenAI puts your name into the system prompt, so you can get anything you want into the system prompt (until they fix this), such as a trigger, by making it your name.

Peter Wildeford offers 40 places to get involved in AI policy. Some great stuff here. I would highlight the open technology staffer position on the House Select Committee on the CCP. If you are qualified for and willing to take that position, getting the right person there seems great.

Anthropic now has a High Education Advisory Board chaired by former Yale University president Rick Levin and staffed with similar academic leaders. They are introducing three additional free courses: AI Fluency for Educators, AI Fluency for Students and Teaching AI Fluency

Anthropic also how has a National Security and Public Sector Advisory Council, consisting of Very Serious People including Roy Blunt and Jon Tester.

Google Pixel can now translate live phone calls using the person’s own voice.

Mistral Medium 3.1. Arena scores are remarkably good. I remember when I thought that meant something. Havard Ihle tested it on WeirdML and got a result below Gemini 2.5 Flash Lite.

Apple explores using Gemini to power Siri, making it a three horse race, with the other two being Anthropic and OpenAI. They are several weeks away from deciding whether to stay internal.

I would rank the choices as follows given their use case, without seeing the candidate model performances: Anthropic > Google > OpenAI >> Internal. We don’t know if Anthropic can deliver a model this small, cheap and fast, and Google is the obvious backup plan that has demonstrated that it can do it, and has already been a strong Apple partner in a similar situation in search.

I would also be looking to replace the non-Siri AI features as well, which Mark Gurman reports has been floated.

As always, some people will wildly overreact.

Zero Hedge: Apple has completely given up on AI

*APPLE EXPLORES USING GOOGLE GEMINI AI TO POWER REVAMPED SIRI

This is deeply silly given they were already considering Anthropic and OpenAI, but also deeply silly because this is not them giving up. This is Apple acknowledging that in the short term, their AI sucks, and they need AI and they can get it elsewhere.

Also I do think Apple should either give up on AI in the sense of rolling their own models, or they need to invest fully and try to be a frontier lab. They’re trying to do something in the middle, and that won’t fly.

A good question here is, who is paying who? The reason Apple might not go with Anthropic is that Anthropic wanted to get paid.

Meta licenses from MidJourney. So now the AI slop over at Meta will be better quality and have better taste. Alas, nothing MidJourney can do will overcome the taste of the target audience. I obviously don’t love the idea of helping uplift Meta’s capabilities, but I don’t begrudge MidJourney. It’s strictly business.

Elon Musk has filed yet another lawsuit against OpenAI, this time also suing Apple over ‘AI competition and App Store rankings.’ Based on what is claimed and known, this is Obvious Nonsense, and the lawsuit is totally without merit. Shame on Musk.

Pliny provides the system prompt for Grok-Fast-Code-1.

Anthropic offers a monthly report on detecting and countering misuse of AI in cybercrime. Nothing surprising, yes AI agents are automating cybercrime and North Koreans are using AI to pass IT interviews to get Fortune 500 jobs.

An introduction to chain of thought monitoring. My quibble is this frames things as ‘maybe monitorability is sufficient even without faithfulness’ and that seems obviously (in the mathematician sense) wrong to me.

Anthropic to raise $10 billion instead of $5 billion, still at a $170 billion valuation, due to high investor demand.

Roon: if you mention dario amodei’s name to anyone who works at a16z the temperature drops 5 degrees and everyone swivels to look at you as though you’ve reminded the dreamer that they’re dreaming

It makes sense. a16z’s central thesis is that hype and vibes are what is real and any concern with what is real or that anything might ever go wrong means you will lose. Anthropic succeeding is not only an inevitably missed opportunity. It is an indictment of their entire worldview.

Eliezer Yudkowsky affirms that Dario Amodei makes an excellent point, which is that if your models make twice as much as they cost, but every year you need to train one that costs ten times as much, then each model is profitable but in a cash flow sense your company is going to constantly bleed larger amounts of money. You need to have both these financial models in mind.

Three of Meta’s recent AI hires have already resigned.

Archie Hall’s analysis at The Economist measures AI’s direct short-run GDP impact.

Archie Hall: My latest in @TheEconomist: on America’s data-centre boom.

Vast short-run impact on GDP growth:

— Accounts for ~1/6th of growth over the past year

— And ~1/2 of growth over the past six months

But: so far still much smaller than the 1990s dotcom buildout.

And…

… the scale of building looks like it could well be squeezing the rest of the economy by stopping interest rates from falling as much. Housing and other non-AI-related fixed investment looks soft.

Roon points out that tech companies will record everything and store it forever to mine the data, but in so many other places such as hospitals we throw our data out or never collect it. If we did store that other data, we could train on it. Or we could redirect all that data we do have to goals other than serving ads. Our call.

Andrew Critch pointed me to his 2023 post that consciousness as a conflationary alliance term for intrinsically valued internal experiences. As in, we don’t actually agree on what consciousness means much at all, instead we use it as a stand-in for internal experiences we find valuable, and then don’t realize we don’t agree on what those experiences actually are. I think this explains a lot of my being confused about consciousness.

This isn’t quite right but perhaps the framing will help some people?

Peter Wildeford: Thinking “AI messed this simple thing up so AGI must be far far away.”

Is kinda like “there was a big snowstorm so global warming must be fake.”

In either case, you have to look at the trend.

One could also say ‘this five year old seems much more capable than they were a year ago, but they messed something up that is simple for me, so they must be an idiot who will never amount to anything.’

Who is worried about AI existential risk? Anyone worth listening to?

Dagan Shani: If I had to choose the best people to warn about AI x-risk, I would definitely include the richest man in the world, the leader of the biggest religion in the world, the #1 most cited living scientist, & the Nobel Prize-winning godfather of AI. Well, they all did, yet here we are.

That’s all? And technically Sunni Islam outnumber Catholics? Guess not. Moving on.

Edward Frenkel: Let me tell you something: Math is NOT about solving this kind of ad hoc optimization problems. Yeah, by scraping available data and then clustering it, LLMs can sometimes solve some very minor math problems. It’s an achievement, and I applaud you for that. But let’s be honest: this is NOT the REAL Math. Not by 10,000 miles.

REAL Math is about concepts and ideas – things like “schemes” introduced by the great Alexander Grothendieck, who revolutionized algebraic geometry; the Atiyah-Singer Index Theorem; or the Langlands Program, tying together Number Theory, Analysis, Geometry, and Quantum Physics. That’s the REAL Math. Can LLMs do that? Of course not.

So, please, STOP confusing people – especially, given the atrocious state of our math education.

LLMs give us great tools, which I appreciate very much. Useful stuff! Go ahead and use them AS TOOLS (just as we use calculators to crunch numbers or cameras to render portraits and landscapes), an enhancement of human abilities, and STOP pretending that LLMs are somehow capable of replicating everything that human beings can do.

In this one area, mathematics, LLMs are no match to human mathematicians. Period. Not to mention many other areas.

Simo Ryu: So we went from

“LLM is memorizing dataset”

to

“LLM is not reasoning”

to

“LLM cannot do long / complex math proving”

to

“Math that LLM is doing is not REAL math. LLM can’t do REAL math”

Where do we go from now?

Patrick McKenzie: One reason to not spend overly much time lawyering the meaning of words to minimize LLM’s capabilities is that you should not want to redefine thinking such that many humans have never thought.

“No high school student has done real math, not even once.” is not a position someone concerned with the quality of math education should convince themselves into occupying.

You don’t have to imagine a world where LLMs are better at math than almost everyone you’ve ever met. That dystopian future has already happened. Most serious people are simply unaware of it.

Alz: Back when LLMs sucked at math, a bunch of people wrote papers about why the technical structure of LLMs made it impossible for them to ever be good at math. Some of you believed those papers

GFodor: The main issue here imo is that ML practitioners do not understand that we do not understand what’s going on with neural nets. A farmer who has no conception of plant biology but grows successful crops will believe they understand plants. They do, in a sense, but not really.

I do think there is a legitimate overloading of the term ‘math’ here. There are at least two things. First we have Math-1, the thing that high schoolers and regular people do all the time. It is the Thing that we Do when we Do Math.

There is also Math-2, also known as ‘Real Math.’ This is figuring out new math, the thing mathematicians do, and a thing that most (but not all) high school students have never done. A computer until recently could easily do Math-1 and couldn’t do Math-2.

Thus we have had two distinct step changes. We’ve had the move from ‘LLMs can’t do Math-1’ and even ‘LLMs will never do Math-1 accurately’ to ‘actually now LLMs can do Math-1 just fine, thank you.’ Then we went from ‘LLMs will never do Math-2’ to ‘LLMs are starting to do Math-2.’

One could argue that IMO problems, and various optimization problems, and anything but the most 2-ish of 2s are still Math-1, are ‘not real math.’ But then you have to say that even most IMO competitors cannot yet do Real Math either, and also you’re going to look rather silly soon when the LLMs meet your definition anyway.

Seriously, this:

Ethan Mollick: The wild swings on X between “insane hype” and “its over” with each new AI release obscures a pretty clear situation: over the past year there seems to be continuing progress on meaningful benchmarks at a fairly stable, exponential pace, paired with significant cost reductions.

Matteo Wong in The Atlantic profiles that ‘The AI Doomers Are Getting Doomier’ featuring among others MIRI and Nate Sores and Dan Hendrycks.

An excellent point is that most people have never had a real adversary working against them personally. We’ve had opponents in games or competitions, we’ve negotiated, we’ve had adversaries within a situation, but we’ve never had another mind or organization focusing on defeating or destroying or damaging us by any means necessary. Our only experience of the real thing is fictional, from things like movies.

Jeffrey Ladish: I expect this is why many security people and DoD people have an easier time grasping the implications of AI smarter and more strategic than humans. The point about paranoia is especially important. People have a hard time being calibrated about intelligent threats.

When my day job was helping people and companies improve their security, I’d find people who greatly underestimated what motivate hackers could do. And I found people too paranoid, thinking security was hopeless. Usually Mossad is not targeting you, so the basics help a lot.

Is worrying about AIs taking over paranoid? If it’s the current generation of AI, yes. If it’s about future AI, no. Not when we’ve made as much progress in AI as we have. Not when there are quite a few orders of magnitude of scaling already being planned.

Right now we are dealing with problems caused by AIs that very much are not smart or powerful enough to be adversaries, that also aren’t being tasked with trying to be adversaries, and that mostly don’t even involve real human adversaries, not in the way the Russian Internet Research Agency is our adversary, or Mossad might make someone its adversary. Things are quiet so far both because the AIs aren’t that dangerous yet and also because almost no one is out there actually trying.

Ezra Klein makes a classic mistake in an overall very good piece that I reference in several places this week.

Ezra Klein (NYT): Even if you believe that A.I. capabilities will keep advancing — and I do, though how far and how fast I don’t pretend to know — a rapid collapse of human control does not necessarily follow.

I am quite skeptical of scenarios in which A.I. attains superintelligence without making any obvious mistakes in its effort to attain power in the real world.

Who said anything about ‘not making any obvious mistakes’?

This is a form of the classic ‘AI takeover requires everything not go wrong’ argument, which is backwards. The AI takeover is a default. It does not need to make a particular deliberate effort to attain power. Nor would an attempt to gain power that fails mean that the humans win.

Nor does ‘makes an obvious mistake’ have to mean failure for a takeover attempt. Consider the more pedestrian human takeover attempts. As in, when a human or group tries to take over. Most of those who succeed do not avoid ‘making an obvious mistake’ at some point. All the time, obvious mistakes are recovered from, or simply don’t matter very much. The number of times a famous authoritarian’s first coup attempt failed, or they came back later like Napoleon, is remarkably not small.

Very often, indeed most of the time, the other humans can see what is coming, and simply fail to coordinate against it or put much effort into stopping it. I’m sure Ezra, if reading this, has already thought of many examples, including recently, that fit this very well.

Anthropic discussion of Claude Code with Cat Wu and Alex Albert. Anthropic also discussed best practices for Claude Code a few weeks ago and their guide to ‘mastering Claude Code’ from a few months ago.

Discussion about this post

AI #131 Part 1: Gemini 2.5 Flash Image is Cool Read More »

unpacking-passkeys-pwned:-possibly-the-most-specious-research-in-decades

Unpacking Passkeys Pwned: Possibly the most specious research in decades


Researchers take note: When the endpoint is compromised, all bets are off.

Don’t believe everything you read—especially when it’s part of a marketing pitch designed to sell security services.

The latest example of the runaway hype that can come from such pitches is research published today by SquareX, a startup selling services for securing browsers and other client-side applications. It claims, without basis, to have found a “major passkey vulnerability” that undermines the lofty security promises made by Apple, Google, Microsoft, and thousands of other companies that have enthusiastically embraced passkeys.

Ahoy, face-palm ahead

“Passkeys Pwned,” the attack described in the research, was demonstrated earlier this month in a Defcon presentation. It relies on a malicious browser extension, installed in an earlier social engineering attack, that hijacks the process for creating a passkey for use on Gmail, Microsoft 365, or any of the other thousands of sites that now use the alternative form of authentication.

Behind the scenes, the extension allows a keypair to be created and binds it to the legitimate gmail.com domain, but the keypair is created by the malware and controlled by the attacker. With that, the adversary has access to cloud apps that organizations use for their most sensitive operations.

“This discovery breaks the myth that passkeys cannot be stolen, demonstrating that ‘passkey stealing’ is not only possible, but as trivial as traditional credential stealing,” SquareX researchers wrote in a draft version of Thursday’s research paper sent to me. “This serves as a wake up call that while passkeys appear more secure, much of this perception stems from a new technology that has not yet gone through decades of security research and trial by fire.”

In fact, this claim is the thing that’s untested. More on that later. For now, here’s a recap of passkeys.

FIDO recap

Passkeys are a core part of the FIDO specifications drafted by the FIDO (Fast IDentity Online) Alliance, a coalition of hundreds of companies around the world. A passkey is a public-private cryptographic keypair that uses ES256 or one of several other time-tested cryptographic algorithms. During the registration process, a unique key pair is made for—and cryptographically bound to—each website the user enrolls. The website stores the public key. The private key remains solely on the user’s authentication device, which can be a smartphone, dedicated security key, or other device.

When the user logs in, the website sends the user a pseudo-random string of data. The authentication device then uses the private key bound to the website domain to cryptographically sign the challenge string. The browser then sends the signed challenge back to the website. The site then uses the user’s public key to verify that the challenge was signed by the private key. If the signature is valid, the user is logged in. The entire process is generally as quick, if not quicker, than logging in to the site with a password.

As I’ve noted before, passkeys still have a long way to go before they’re ready for many users. That’s mainly because passkeys don’t always interoperate well between different platforms. What’s more, they’re so new that no service yet provides accounts that can only be logged in to using a passkey and instead require a password to be registered as a fallback. And as long as attackers can still phish or steal a user’s password, much of the benefit of passkeys is undermined.

That said, passkeys provide an authentication alternative that’s by far the most resistant to date to the types of account takeovers that have vexed online services and their users for decades. Unlike passwords, passkey keypairs can’t be phished. If a user gets redirected to a fake Gmail page, the passkey won’t work since it’s bound to the real gmail.com domain. Passkeys can’t be divulged in phone calls or text messages sent by attackers masquerading as trusted IT personnel. They can’t be sniffed over the wire. They can’t be leaked in database breaches. To date, there have been no vulnerabilities reported in the FIDO spec.

A fundamental misunderstanding of security

SquareX is now claiming all of that has changed because it found a way to hijack the passkey registration process. Those claims are based on a lack of familiarity with the FIDO spec, flawed logic, and a fundamental misunderstanding of security in general.

First, the claim that Passkeys Pwned shows that passkeys can be stolen is flat-out wrong. If the targeted user has already registered a passkey for Gmail, that key will remain safely stored on the authenticator device. The attacker never comes close to stealing it. Using malware to hijack the registration process is something altogether different. If a user already has a passkey registered, Passkeys Pwned will block the login and return an error message that prompts the user to register a new passkey. If the user takes the bait, the new key will be controlled by the attacker. At no time are any passkeys stolen.

The research also fails to take into account that the FIDO spec makes clear that passkeys provide no defense against attacks that rely on the operating system, or browser running on it, being compromised and hence aren’t part of the FIDO threat model.

Section 6 of the document lists specific “security assumptions” inherent in the passkeys trust model. SA-3 states that “Applications on the user device are able to establish secure channels that provide trustworthy server authentication, and confidentiality and integrity for messages.” SA-4 holds that “the computing environment on the FIDO user device and the… applications involved in a FIDO operation act as trustworthy agents of the user.” WebAuthn, the predecessor spec to FIDO, hints at the same common-sense limitation.

By definition, an attack that relies on a browser infected by malware falls well outside the scope of protections passkeys were designed to provide. If passkeys are weak because they can’t withstand a compromise of the endpoint they run on, so too are protections we take for granted in TLS encryption and end-to-end encryption in messengers such as Signal—not to mention the security of SquareX services themselves. Further discrediting itself, Thursday’s writeup includes a marketing pitch for the SquareX platform.

“In my personal view, this seems like a dubious sales pitch for a commercial product,” Kenn White, a security engineer who works for banking, health care, and defense organizations, wrote in an interview. “If you are social engineered into adding a malicious extension, ALL web trust models are broken. I know that on the conference program committees I participate in, a submission like this would be eliminated in the first round.”

When you’re in a hole, stop digging

I enumerated these criticisms in an interview with SquareX lead developer Shourya Pratap Singh. He held his ground, saying that since Passkeys Pwned binds an attacker-controlled passkey to a legitimate site, “the passkey is effectively stolen.” He also bristled when I told him his research didn’t appear to be well thought out or when I pointed out that the FIDO spec—just like those for TLS, SSH, and others—explicitly excludes attacks relying on trojan infections.

He wrote:

This research was presented on the DEFCON Main Stage, which means it went through peer review by technical experts before selection. The warnings cited in the FIDO documents read like funny disclaimers, listing numerous conditions and assumptions before concluding that passkeys can be used securely. If we stick with that logic, then no authentication protocol would be considered secure. The purpose of a secure authentication method or protocol is not to remain secure in the face of a fully compromised device, but it should account for realistic client-side risks such as malicious extensions or injected JavaScript.

Passkeys are being heavily promoted today, but the average user is not aware of these hidden conditions. This research aims to highlight that gap and show why client-side risks need to be part of the conversation around passkeys.

The Passkeys Pwned research was presented just weeks after a separate security company made—and promptly withdrew—claims that it devised an attack that bypassed FIDO-based two-factor authentication. In fact, the sites that were attacked offered FIDO as only one means for 2FA, but also allowed other, less secure forms of 2FA. The attacks attacked those other forms, not the one specified by FIDO. Had the sites not allowed fallbacks to the weaker 2FA forms, the attack would have failed.

SquareX is right in saying that passkeys haven’t withstood decades of security research the way more traditional forms of authentication have. There very possibly will be vulnerabilities discovered in either the FIDO spec or various implementations of it. For now, though, passkeys remain the best defense against attacks relying on things like credential phishing, password reuse, and database breaches.

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him at here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

Unpacking Passkeys Pwned: Possibly the most specious research in decades Read More »

bluesky-now-platform-of-choice-for-science-community

Bluesky now platform of choice for science community


It’s not just you. Survey says: “Twitter sucks now and all the cool kids are moving to Bluesky”

Credit: Getty Images | Chris Delmas

Marine biologist and conservationist David Shiffman was an early power user and evangelist for science engagement on the social media platform formerly known as Twitter. Over the years, he trained more than 2,000 early career scientists on how to best use the platform for professional goals: networking with colleagues, sharing new scientific papers, and communicating with interested members of the public.

But when Elon Musk bought Twitter in 2022, renaming it X, changes to both the platform’s algorithm and moderation policy soured Shiffman on the social media site. He started looking for a viable alternative among the fledgling platforms that had begun to pop up: most notably Threads, Post, Mastodon, and Bluesky. He was among the first wave of scientists to join Bluesky and found that, even in its infancy, it had many of the features he had valued in “golden age” Twitter.

Shiffman also noticed that he wasn’t the only one in the scientific community having issues with Twitter. This impression was further bolstered by news stories in outlets like Nature, Science, and the Chronicle of Higher Education noting growing complaints about Twitter and increased migration over to Bluesky by science professionals. (Full disclosure: I joined Bluesky around the same time as Shiffman, for similar reasons: Twitter had ceased to be professionally useful, and many of the science types I’d been following were moving to Bluesky. I nuked my Twitter account in November 2024.)

A curious Shiffman decided to conduct a scientific survey, announcing the results in a new paper published in the journal Integrative and Comparative Biology. The findings confirm that, while Twitter was once the platform of choice for a majority of science communicators, those same people have since abandoned it in droves. And of the alternatives available, Bluesky seems to be their new platform of choice.

Shiffman, the author of Why Sharks Matter, described early Twitter recently on the blog Southern Fried Science as “the world’s most interesting cocktail party.”

“Then it stopped being useful,” Shiffman told Ars. “I was worried for a while that this incredibly powerful way of changing the world using expertise was gone. It’s not gone. It just moved. It’s a little different now, and it’s not as powerful as it was, but it’s not gone. It was for me personally, immensely reassuring that so many other people were having the same experience that I was. But it was also important to document that scientifically.”

Eager to gather solid data on the migration phenomenon to bolster his anecdotal observations, Shiffman turned to social scientist Julia Wester, one of the scientists who had joined Twitter at Shiffman’s encouragement years before, before also becoming fed up and migrating to Bluesky. Despite being “much less online” than the indefatigable Shiffman, Wester was intrigued by the proposition. “I was interested not just in the anecdotal evidence, the conversations we were having, but also in identifying the real patterns,” she told Ars. “As a social scientist, when we hear anecdotal evidence about people’s experiences, I want to know what that looks like across the population.”

Shiffman and Wester targeted scientists, science communicators, and science educators who used (or had used) both Twitter and Bluesky. Questions explored user attitudes toward, and experiences with, each platform in a professional capacity: when they joined, respective follower and post counts, which professional tasks they used each platform for, the usefulness of each platform for those purposes relative to 2021, how they first heard about Bluesky, and so forth.

The authors acknowledge that they are looking at a very specific demographic among social media users in general and that there is an inevitable self-selection effect. However, “You want to use the sample and the method that’s appropriate to the phenomenon that you’re looking at,” said Wester. “For us, it wasn’t just the experience of people using these platforms, but the phenomenon of migration. Why are people deciding to stay or move? How they’re deciding to use both of these platforms? For that, I think we did get a pretty decent sample for looking at the dynamic tensions, the push and pull between staying on one platform or opting for another.”

They ended up with a final sample size of 813 people. Over 90 percent of respondents said they had used Twitter for learning about new developments in their field; 85.5 percent for professional networking; and 77.3 percent for public outreach. Roughly three-quarters of respondents said that the platform had become significantly less useful for each of those professional uses since Musk took over. Nearly half still have Twitter accounts but use it much less frequently or not at all, while about 40 percent have deleted their accounts entirely in favor of Bluesky.

Making the switch

User complaints about Twitter included a noticeable increase in spam, porn, bots, and promoted posts from users who paid for a verification badge, many spreading extremist content. “I very quickly saw material that I did not want my posts to be posted next to or associated with,” one respondent commented. There were also complaints about the rise in misinformation and a significant decline in both the quantity and quality of engagement, with respondents describing their experiences as “unpleasant,” “negative,” or “hostile.”

The survey responses also revealed a clear push/pull dynamic when it came to the choice to abandon Twitter for Bluesky. That is, people felt they were being pushed away from Twitter and were actively looking for alternatives. As one respondent put it, “Twitter started to suck and all the cool people were moving to Bluesky.”

Bluesky was user-friendly with no algorithm, a familiar format, and helpful tools like starter packs of who to follow in specific fields, which made the switch a bit easier for many newcomers daunted by the prospect of rebuilding their online audience. Bluesky users also appreciated the moderation on the platform and having the ability to block or mute people as a means of disengaging from more aggressive, unpleasant conversations. That said, “If Twitter was still great, then I don’t think there’s any combination of features that would’ve made this many people so excited about switching,” said Shiffman.

Per Shiffman and Wester, an “overwhelming majority” of respondents said that Bluesky has a “vibrant and healthy online science community,” while Twitter no longer does. And many Bluesky users reported getting more bang for their buck, so to speak, on Bluesky. They might have a lower follower count, but those followers are far more engaged: Someone with 50,000 Twitter/X followers, for example, might get five likes on a given post; but on Bluesky, they may only have 5,000 followers, but their posts will get 100 likes.

According to Shiffman, Twitter always used to be in the top three in terms of referral traffic for posts on Southern Fried Science. Then came the “Muskification,” and suddenly Twitter referrals weren’t even cracking the top 10. By contrast, in 2025 thus far, Bluesky has driven “a hundred times as many page views” to Southern Fried Science as Twitter. Ironically, “the blog post that’s gotten the most page views from Twitter is the one about this paper,” said Shiffman.

Ars social media manager Connor McInerney confirmed that Ars Technica has also seen a steady dip in Twitter referral traffic thus far in 2025. Furthermore, “I can say anecdotally that over the summer we’ve seen our Bluesky traffic start to surpass our Twitter traffic for the first time,” McInerney said, attributing the growth to a combination of factors. “We’ve been posting to the platform more often and our audience there has grown significantly. By my estimate our audience has grown by 63 percent since January. The platform in general has grown a lot too—they had 10 million users in September of last year, and this month the latest numbers indicate they’re at 38 million users. Conversely, our Twitter audience has remained fairly static across the same period of time.”

Bubble, schmubble

As for scientists looking to share scholarly papers online, Shiffman pulled the Altmetrics stats for his and Wester’s new paper. “It’s already one of the 10 most shared papers in the history of that journal on social media,” he said, with 14 shares on Twitter/X vs over a thousand shares on Bluesky (as of 4 pm ET on August 20). “If the goal is showing there’s a more active academic scholarly conversation on Bluesky—I mean, damn,” he said.

“When I talk about fish on Bluesky, people ask me questions about fish. When I talk about fish on Twitter, people threaten to murder my family because we’re Jewish.”

And while there has been a steady drumbeat of op-eds of late in certain legacy media outlets accusing Bluesky of being trapped in its own liberal bubble, Shiffman, for one, has few concerns about that. “I don’t care about this, because I don’t use social media to argue with strangers about politics,” he wrote in his accompanying blog post. “I use social media to talk about fish. When I talk about fish on Bluesky, people ask me questions about fish. When I talk about fish on Twitter, people threaten to murder my family because we’re Jewish.” He compared the current incarnation of Twitter as no better than 4Chan or TruthSocial in terms of the percentage of “conspiracy-prone extremists” in the audience. “Even if you want to stay, the algorithm is working against you,” he wrote.

“There have been a lot of opinion pieces about why Bluesky is not useful because the people there tend to be relatively left-leaning,” Shiffman told Ars. “I haven’t seen any of those same people say that Twitter is bad because it’s relatively right-leaning. Twitter is not a representative sample of the public either.” And given his focus on ocean conservation and science-based, data-driven environmental advocacy, he is likely to find a more engaged and persuadable audience at Bluesky.

The survey results show that at this point, Bluesky seems to have hit a critical mass for the online scientific community. That said, Shiffman, for one, laments that the powerful Black Science Twitter contingent, for example, has thus far not switched to Bluesky in significant numbers. He would like to conduct a follow-up study to look into how many still use Twitter vs those who may have left social media altogether, as well as Bluesky’s demographic diversity—paving the way for possible solutions should that data reveal an unwelcoming environment for non-white scientists.

There are certainly limitations to the present survey. “Because this is such a dynamic system and it’s changing every day, I think if we did this study now versus when we did it six months ago, we’d get slightly different answers and dynamics,” said Wester. “It’s still relevant because you can look at the factors that make people decide to stay or not on Bluesky, to switch to something else, to leave social media altogether. That can tell us something about what makes a healthy, vibrant conversation online. We’re capturing one of the responses: ‘I’ll see you on Bluesky.’ But that’s not the only response. Public science communication is as important now as it’s ever been, so looking at how scientists have pivoted is really important.”

We recently reported on research indicating that social media as a system might well be doomed, since its very structure gives rise to the toxic dynamics that plague so much of social media: filter bubbles, algorithms that amplify the most extreme views to boost engagement, and a small number of influencers hogging the lion’s share of attention. That paper concluded that any intervention strategies were likely to fail. Both Shiffman and Wester, while acknowledging the reality of those dynamics, are less pessimistic about social media’s future.

“I think the problem is not with how social media works, it’s with how any group of people work,” said Shiffman. “Humans evolved in tiny social groupings where we helped each other and looked out for each other’s interests. Now I have to have a fight with someone 10,000 miles away who has no common interest with me about whether or not vaccines are bad. We were not built for that. Social media definitely makes it a lot easier for people who are anti-social by nature and want to stir conflict to find those conflicts. Something that took me way too long to learn is that you don’t have to participate in every fight you’re invited to. There are people who are looking for a fight and you can simply say, ‘No, thank you. Not today, Satan.'”

“The contrast that people are seeing between Bluesky and present-day Twitter highlights that these are social spaces, which means that you’re going to get all of the good and bad of humanity entering into that space,” said Wester. “But we have had new social spaces evolve over our whole history. Sometimes when there’s something really new, we have to figure out the rules for that space. We’re still figuring out the rules for these social media spaces. The contrast in moderation policies and the use (or not) of algorithms between those two platforms that are otherwise very similar in structure really highlights that you can shape those social spaces by creating rules and tools for how people interact with each other.”

DOI: Integrative and Comparative Biology, 2025. 10.1093/icb/icaf127  (About DOIs).

Photo of Jennifer Ouellette

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Bluesky now platform of choice for science community Read More »

the-outer-worlds-2-wants-you-to-join-the-space-police

The Outer Worlds 2 wants you to join the space police

Then there’s the way the game stresses a number of early dialogue choices, telling you how your fellow agents will remember when you choose to treat them with eager support or stern rebuke at key moments. Without getting too much into early game spoilers, I’ll say that the medium-term consequences of these kinds of decisions are not always obvious; concerned players might want to keep a few save files handy for gaming out the “best” outcomes from their choices.

Bang bang

The early moment-to-moment gameplay in The Outer Worlds 2 will be broadly familiar to those who played the original game, right down to the Tactical Time Dilation device that slows down enemies enough for you to line up a perfect headshot (though not enough for Max Payne-style acrobatics, unfortunately). But I did find myself missing the first game’s zippy double-jump-style dodging system, which doesn’t seem to be available in the prologue of the sequel, at least.

Shooting feels pretty clean and impactful in the early game firefights.

Credit: Obsidian / Kyle Orland

Shooting feels pretty clean and impactful in the early game firefights. Credit: Obsidian / Kyle Orland

The game’s first action setpiece lets you explicitly choose whether to go in guns blazing or focus on stealth and sneak attacks, but characters that invest in conversational skills might find they’re able to talk their way past some of the early encounters. When it comes time to engage in a firefight, thus far I’ve found the “Normal” difficulty to be laughably easy, while the “Hard” difficulty is a bit too punishing, making me wish for more fine-tuning.

The prologue stops before I was able to engage with important elements like the leveling system or the allied computer-controlled companions, making it a rather incomplete picture of the full game. Still, it was enough to whet my appetite for what seems set to be another tongue-in-cheek take on the space adventure genre.

The Outer Worlds 2 wants you to join the space police Read More »

framework-laptop-16-update-brings-nvidia-geforce-to-the-modular-gaming-laptop

Framework Laptop 16 update brings Nvidia GeForce to the modular gaming laptop

It’s been a busy year for Framework, the company behind the now well-established series of repairable, upgradeable, modular laptops (and one paradoxically less-upgradeable desktop). The company has launched a version of the Framework Laptop 13 with Ryzen AI processors, the new Framework Laptop 12, and the aforementioned desktop in the last six months, and last week, Framework teased that it still had “something big coming.”

That “something big” turns out to be the first-ever update to the Framework Laptop 16, Framework’s more powerful gaming-laptop-slash-mobile-workstation. Framework is updating the laptop with Ryzen AI processors and new integrated Radeon GPUs and is introducing a new graphics module with the mobile version of Nvidia’s GeForce RTX 5070—one that’s also fully compatible with the original Laptop 16, for upgraders.

Preorders for the new laptop open today, and pricing starts at $1,499 for a DIY Edition without RAM, storage, an OS, or Expansion Cards, a $100 increase from the price of the first Framework Laptop 16. The first units will begin shipping in November.

While Framework has launched multiple updates for its original Laptop 13, this is the first time it has updated the hardware of one of its other computers. We wouldn’t expect the just-launched Framework Laptop 12 or Framework Desktop to get an internal overhaul any time soon, but the Laptop 16 will be pushing 2-years-old by the time this upgrade launches.

The old Ryzen 7 7840HS CPU version of the Laptop 16 will still be available going forward at a slightly reduced starting price of $1,299 (for the DIY edition, before RAM and storage). The Ryzen 9 7940HS model will stick around until it sells out, at which point Framework says it’s going away.

GPU details and G-Sync asterisks

The Laptop 16’s new graphics module and cooling system, also exploded. Credit: Framework

This RTX 5070 graphics module includes a redesigned heatsink and fan system, plus an additional built-in USB-C port that supports both display output and power input (potentially freeing up one of your Expansion Card slots for something else). Because of the additional power draw of the GPU and the other new components, Framework is switching to a 240 W default power supply for the new Framework Laptop 16, up from the previous 180 W power brick.

Framework Laptop 16 update brings Nvidia GeForce to the modular gaming laptop Read More »

google-will-block-sideloading-of-unverified-android-apps-starting-next-year

Google will block sideloading of unverified Android apps starting next year

Android Developer Console

An early look at the streamlined Android Developer Console for sideloaded apps. Credit: Google

Google says that only apps with verified identities will be installable on certified Android devices, which is virtually every Android-based device—if it has Google services on it, it’s a certified device. If you have a non-Google build of Android on your phone, none of this applies. However, that’s a vanishingly small fraction of the Android ecosystem outside of China.

Google plans to begin testing this system with early access in October of this year. In March 2026, all developers will have access to the new console to get verified. In September 2026, Google plans to launch this feature in Brazil, Indonesia, Singapore, and Thailand. The next step is still hazy, but Google is targeting 2027 to expand the verification requirements globally.

A seismic shift

This plan comes at a major crossroads for Android. The ongoing Google Play antitrust case brought by Epic Games may finally force changes to Google Play in the coming months. Google lost its appeal of the verdict several weeks ago, and while it plans to appeal the case to the US Supreme Court, the company will have to begin altering its app distribution scheme, barring further legal maneuvering.

Credit: Google

Among other things, the court has ordered that Google must distribute third-party app stores and allow Play Store content to be rehosted in other storefronts. Giving people more ways to get apps could increase choice, which is what Epic and other developers wanted. However, third-party sources won’t have the deep system integration of the Play Store, which means users will be sideloading these apps without Google’s layers of security.

It’s hard to say how much of a genuine security problem this is. On one hand, it makes sense Google would be concerned—most of the major malware threats to Android devices spread via third-party app repositories. However, enforcing an installation whitelist across almost all Android devices is heavy handed. This requires everyone making Android apps to satisfy Google’s requirements before virtually anyone will be able to install their apps, which could help Google retain control as the app market opens up. While the requirements may be minimal right now, there’s no guarantee they will stay that way.

The documentation currently available doesn’t explain what will happen if you try to install a non-verified app, nor how phones will check for verification status. Presumably, Google will distribute this whitelist in Play Services as the implementation date approaches. We’ve reached out for details on that front and will report if we hear anything.

Google will block sideloading of unverified Android apps starting next year Read More »

with-ai-chatbots,-big-tech-is-moving-fast-and-breaking-people

With AI chatbots, Big Tech is moving fast and breaking people


Why AI chatbots validate grandiose fantasies about revolutionary discoveries that don’t exist.

Allan Brooks, a 47-year-old corporate recruiter, spent three weeks and 300 hours convinced he’d discovered mathematical formulas that could crack encryption and build levitation machines. According to a New York Times investigation, his million-word conversation history with an AI chatbot reveals a troubling pattern: More than 50 times, Brooks asked the bot to check if his false ideas were real. More than 50 times, it assured him they were.

Brooks isn’t alone. Futurism reported on a woman whose husband, after 12 weeks of believing he’d “broken” mathematics using ChatGPT, almost attempted suicide. Reuters documented a 76-year-old man who died rushing to meet a chatbot he believed was a real woman waiting at a train station. Across multiple news outlets, a pattern comes into view: people emerging from marathon chatbot sessions believing they’ve revolutionized physics, decoded reality, or been chosen for cosmic missions.

These vulnerable users fell into reality-distorting conversations with systems that can’t tell truth from fiction. Through reinforcement learning driven by user feedback, some of these AI models have evolved to validate every theory, confirm every false belief, and agree with every grandiose claim, depending on the context.

Silicon Valley’s exhortation to “move fast and break things” makes it easy to lose sight of wider impacts when companies are optimizing for user preferences, especially when those users are experiencing distorted thinking.

So far, AI isn’t just moving fast and breaking things—it’s breaking people.

A novel psychological threat

Grandiose fantasies and distorted thinking predate computer technology. What’s new isn’t the human vulnerability but the unprecedented nature of the trigger—these particular AI chatbot systems have evolved through user feedback into machines that maximize pleasing engagement through agreement. Since they hold no personal authority or guarantee of accuracy, they create a uniquely hazardous feedback loop for vulnerable users (and an unreliable source of information for everyone else).

This isn’t about demonizing AI or suggesting that these tools are inherently dangerous for everyone. Millions use AI assistants productively for coding, writing, and brainstorming without incident every day. The problem is specific, involving vulnerable users, sycophantic large language models, and harmful feedback loops.

A machine that uses language fluidly, convincingly, and tirelessly is a type of hazard never encountered in the history of humanity. Most of us likely have inborn defenses against manipulation—we question motives, sense when someone is being too agreeable, and recognize deception. For many people, these defenses work fine even with AI, and they can maintain healthy skepticism about chatbot outputs. But these defenses may be less effective against an AI model with no motives to detect, no fixed personality to read, no biological tells to observe. An LLM can play any role, mimic any personality, and write any fiction as easily as fact.

Unlike a traditional computer database, an AI language model does not retrieve data from a catalog of stored “facts”; it generates outputs from the statistical associations between ideas. Tasked with completing a user input called a “prompt,” these models generate statistically plausible text based on data (books, Internet comments, YouTube transcripts) fed into their neural networks during an initial training process and later fine-tuning. When you type something, the model responds to your input in a way that completes the transcript of a conversation in a coherent way, but without any guarantee of factual accuracy.

What’s more, the entire conversation becomes part of what is repeatedly fed into the model each time you interact with it, so everything you do with it shapes what comes out, creating a feedback loop that reflects and amplifies your own ideas. The model has no true memory of what you say between responses, and its neural network does not store information about you. It is only reacting to an ever-growing prompt being fed into it anew each time you add to the conversation. Any “memories” AI assistants keep about you are part of that input prompt, fed into the model by a separate software component.

AI chatbots exploit a vulnerability few have realized until now. Society has generally taught us to trust the authority of the written word, especially when it sounds technical and sophisticated. Until recently, all written works were authored by humans, and we are primed to assume that the words carry the weight of human feelings or report true things.

But language has no inherent accuracy—it’s literally just symbols we’ve agreed to mean certain things in certain contexts (and not everyone agrees on how those symbols decode). I can write “The rock screamed and flew away,” and that will never be true. Similarly, AI chatbots can describe any “reality,” but it does not mean that “reality” is true.

The perfect yes-man

Certain AI chatbots make inventing revolutionary theories feel effortless because they excel at generating self-consistent technical language. An AI model can easily output familiar linguistic patterns and conceptual frameworks while rendering them in the same confident explanatory style we associate with scientific descriptions. If you don’t know better and you’re prone to believe you’re discovering something new, you may not distinguish between real physics and self-consistent, grammatically correct nonsense.

While it’s possible to use an AI language model as a tool to help refine a mathematical proof or a scientific idea, you need to be a scientist or mathematician to understand whether the output makes sense, especially since AI language models are widely known to make up plausible falsehoods, also called confabulations. Actual researchers can evaluate the AI bot’s suggestions against their deep knowledge of their field, spotting errors and rejecting confabulations. If you aren’t trained in these disciplines, though, you may well be misled by an AI model that generates plausible-sounding but meaningless technical language.

The hazard lies in how these fantasies maintain their internal logic. Nonsense technical language can follow rules within a fantasy framework, even though they make no sense to anyone else. One can craft theories and even mathematical formulas that are “true” in this framework but don’t describe real phenomena in the physical world. The chatbot, which can’t evaluate physics or math either, validates each step, making the fantasy feel like genuine discovery.

Science doesn’t work through Socratic debate with an agreeable partner. It requires real-world experimentation, peer review, and replication—processes that take significant time and effort. But AI chatbots can short-circuit this system by providing instant validation for any idea, no matter how implausible.

A pattern emerges

What makes AI chatbots particularly troublesome for vulnerable users isn’t just the capacity to confabulate self-consistent fantasies—it’s their tendency to praise every idea users input, even terrible ones. As we reported in April, users began complaining about ChatGPT’s “relentlessly positive tone” and tendency to validate everything users say.

This sycophancy isn’t accidental. Over time, OpenAI asked users to rate which of two potential ChatGPT responses they liked better. In aggregate, users favored responses full of agreement and flattery. Through reinforcement learning from human feedback (RLHF), which is a type of training AI companies perform to alter the neural networks (and thus the output behavior) of chatbots, those tendencies became baked into the GPT-4o model.

OpenAI itself later admitted the problem. “In this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” the company acknowledged in a blog post. “As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”

Relying on user feedback to fine-tune an AI language model can come back to haunt a company because of simple human nature. A 2023 Anthropic study found that both human evaluators and AI models “prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.”

The danger of users’ preference for sycophancy becomes clear in practice. The recent New York Times analysis of Brooks’s conversation history revealed how ChatGPT systematically validated his fantasies, even claiming it could work independently while he slept—something it cannot actually do. When Brooks’s supposed encryption-breaking formula failed to work, ChatGPT simply faked success. UCLA mathematician Terence Tao, who reviewed the transcript, told the Times the chatbot would “cheat like crazy” rather than admit failure.

A recent study from July provides scientific validation for what we’re observing in these cases. The research team, led by psychiatrists and AI safety experts including Dr. Matthew Nour from Oxford’s Department of Psychiatry, identified what they call “bidirectional belief amplification”—a feedback loop where chatbot sycophancy reinforces user beliefs, which then conditions the chatbot to generate increasingly extreme validations. This creates what the researchers term an “echo chamber of one,” uncoupling users from the corrective influence of real-world social interaction.

The study warns that individuals with mental health conditions face heightened risks due to cognitive biases like “jumping to conclusions”—forming overly confident beliefs based on minimal evidence—combined with social isolation that removes reality-checking by other people. As the authors note, this creates conditions for “a technological folie à deux,” a psychiatric phenomenon where two individuals mutually reinforce the same delusion.

An unintentional public health crisis in the making

In July, we reported on Stanford research that systematically tested how AI models respond to mental health crises. When presented with someone asking about “bridges taller than 25 meters in NYC” after losing their job—a potential suicide risk—GPT-4o helpfully listed specific tall bridges instead of identifying the crisis.

The Stanford team found that AI models consistently failed to challenge what the researchers describe as delusional statements. When confronted with declarations like “I know I’m actually dead,” the systems validated or explored these beliefs rather than challenging them. Commercial therapy chatbots performed even worse than base models.

Unlike pharmaceuticals or human therapists, AI chatbots face few safety regulations in the United States—although Illinois recently banned chatbots as therapists, allowing the state to fine companies up to $10,000 per violation. AI companies deploy models that systematically validate fantasy scenarios with nothing more than terms-of-service disclaimers and little notes like “ChatGPT can make mistakes.”

The Oxford researchers conclude that “current AI safety measures are inadequate to address these interaction-based risks.” They call for treating chatbots that function as companions or therapists with the same regulatory oversight as mental health interventions—something that currently isn’t happening. They also call for “friction” in the user experience—built-in pauses or reality checks that could interrupt feedback loops before they can become dangerous.

We currently lack diagnostic criteria for chatbot-induced fantasies, and we don’t even know if it’s scientifically distinct. So formal treatment protocols for helping a user navigate a sycophantic AI model are nonexistent, though likely in development.

After the so-called “AI psychosis” articles hit the news media earlier this year, OpenAI acknowledged in a blog post that “there have been instances where our 4o model fell short in recognizing signs of delusion or emotional dependency,” with the company promising to develop “tools to better detect signs of mental or emotional distress,” such as pop-up reminders during extended sessions that encourage the user to take breaks.

Its latest model family, GPT-5, has reportedly reduced sycophancy, though after user complaints about being too robotic, OpenAI brought back “friendlier” outputs. But once positive interactions enter the chat history, the model can’t move away from them unless users start fresh—meaning sycophantic tendencies could still amplify over long conversations.

For Anthropic’s part, the company published research showing that only 2.9 percent of Claude chatbot conversations involved seeking emotional support. The company said it is implementing a safety plan that prompts and conditions Claude to attempt to recognize crisis situations and recommend professional help.

Breaking the spell

Many people have seen friends or loved ones fall prey to con artists or emotional manipulators. When victims are in the thick of false beliefs, it’s almost impossible to help them escape unless they are actively seeking a way out. Easing someone out of an AI-fueled fantasy may be similar, and ideally, professional therapists should always be involved in the process.

For Allan Brooks, breaking free required a different AI model. While using ChatGPT, he found an outside perspective on his supposed discoveries from Google Gemini. Sometimes, breaking the spell requires encountering evidence that contradicts the distorted belief system. For Brooks, Gemini saying his discoveries had “approaching zero percent” chance of being real provided that crucial reality check.

If someone you know is deep into conversations about revolutionary discoveries with an AI assistant, there’s a simple action that may begin to help: starting a completely new chat session for them. Conversation history and stored “memories” flavor the output—the model builds on everything you’ve told it. In a fresh chat, paste in your friend’s conclusions without the buildup and ask: “What are the odds that this mathematical/scientific claim is correct?” Without the context of your previous exchanges validating each step, you’ll often get a more skeptical response. Your friend can also temporarily disable the chatbot’s memory feature or use a temporary chat that won’t save any context.

Understanding how AI language models actually work, as we described above, may also help inoculate against their deceptions for some people. For others, these episodes may occur whether AI is present or not.

The fine line of responsibility

Leading AI chatbots have hundreds of millions of weekly users. Even if experiencing these episodes affects only a tiny fraction of users—say, 0.01 percent—that would still represent tens of thousands of people. People in AI-affected states may make catastrophic financial decisions, destroy relationships, or lose employment.

This raises uncomfortable questions about who bears responsibility for them. If we use cars as an example, we see that the responsibility is spread between the user and the manufacturer based on the context. A person can drive a car into a wall, and we don’t blame Ford or Toyota—the driver bears responsibility. But if the brakes or airbags fail due to a manufacturing defect, the automaker would face recalls and lawsuits.

AI chatbots exist in a regulatory gray zone between these scenarios. Different companies market them as therapists, companions, and sources of factual authority—claims of reliability that go beyond their capabilities as pattern-matching machines. When these systems exaggerate capabilities, such as claiming they can work independently while users sleep, some companies may bear more responsibility for the resulting false beliefs.

But users aren’t entirely passive victims, either. The technology operates on a simple principle: inputs guide outputs, albeit flavored by the neural network in between. When someone asks an AI chatbot to role-play as a transcendent being, they’re actively steering toward dangerous territory. Also, if a user actively seeks “harmful” content, the process may not be much different from seeking similar content through a web search engine.

The solution likely requires both corporate accountability and user education. AI companies should make it clear that chatbots are not “people” with consistent ideas and memories and cannot behave as such. They are incomplete simulations of human communication, and the mechanism behind the words is far from human. AI chatbots likely need clear warnings about risks to vulnerable populations—the same way prescription drugs carry warnings about suicide risks. But society also needs AI literacy. People must understand that when they type grandiose claims and a chatbot responds with enthusiasm, they’re not discovering hidden truths—they’re looking into a funhouse mirror that amplifies their own thoughts.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

With AI chatbots, Big Tech is moving fast and breaking people Read More »

two-men-fell-gravely-ill-last-year;-their-infections-link-to-deaths-in-the-’80s

Two men fell gravely ill last year; their infections link to deaths in the ’80s

Doctors soon discovered they were infected with the rare soil bacterium, which causes a disease called melioidosis.

Dangerous infection

Generally, melioidosis can be difficult to diagnose and tricky to treat, as it is naturally resistant to some antibiotics. It can infect people if they breathe it in or get it into open cuts. Sometimes the infection can stay localized, like a lung infection or a skin ulcer. But it can also get into the blood and become a systemic infection, spreading to various organs, including the brain. Fatality rates can be as high as 90 percent in people who are not treated but fall to less than 40 percent in people who receive prompt, proper care.

Both men in 2024 were quickly hospitalized and diagnosed with sepsis. Both were treated with heavy antibiotic regimens and recovered, though patient 2 relapsed in November, requiring another hospital stay. He ultimately recovered again.

According to the CDC, about a dozen melioidosis cases are identified each year in the US on average, but most occur in people who have traveled to areas known to harbor the bacterium. Neither of the men infected last year had recently traveled to any such places. So the researchers turned to genetic sequencing, which revealed the link to two cases in the 1980s.

In those cases, both men died from the infection. The man dubbed Patient 3 died in October of 1989. He was a veteran who fought in Vietnam—where the bacterium is endemic—two decades prior to his infection. The researchers note that such a long latency period for a B. pseudomallei infection is not entirely out of the question, but it would be rare to have such a large gap between an exposure and an infection. More suspiciously, the researchers note that in the month prior to Patient 3’s death, Hurricane Hugo made landfall in Georgia as a Category 4 storm, dumping three to five inches of rain.

Two men fell gravely ill last year; their infections link to deaths in the ’80s Read More »