Author name: Kris Guyer


Achieving lasting remission for HIV


Promising trials using engineered antibodies suggest that “functional cures” may be in reach.

A digital illustration of an HIV-infected T cell. Once infected, the immune cell is hijacked by the virus to produce and release many new viral particles before dying. As more T cells are destroyed, the immune system is progressively weakened. Credit: Kateryna Kon/Science Photo Library via Getty Images

Around the world, some 40 million people are living with HIV. And though progress in treatment means the infection isn’t the death sentence it once was, researchers have never been able to bring about a cure. Instead, HIV-positive people must take a cocktail of antiretroviral drugs for the rest of their lives.

But in 2025, researchers reported a breakthrough that suggests that a “functional” cure for HIV—a way to keep HIV under control long-term without constant treatment—may indeed be possible. In two independent trials using infusions of engineered antibodies, some participants remained healthy without taking antiretrovirals, long after the interventions ended.

In one of the trials—the FRESH trial, led by virologist Thumbi Ndung’u of the University of KwaZulu-Natal and the Africa Health Research Institute in South Africa—four of 20 participants maintained undetectable levels of HIV for a median of 1.5 years without taking antiretrovirals. In the other, the RIO trial set in the United Kingdom and Denmark and led by Sarah Fidler, a clinical doctor and HIV research expert at Imperial College London, six of 34 HIV-positive participants have maintained viral control for at least two years.

These landmark proof-of-concept trials show that the immune system can be harnessed to fight HIV. Researchers are now looking to conduct larger, more representative trials to see whether antibodies can be optimized to work for more people.

“I do think that this kind of treatment has the opportunity to really shift the dial,” Fidler says, “because they are long-acting drugs”—with effects that can persist even after they’re no longer in the body. “So far, we haven’t seen anything that works like that.”

People with HIV can live long, healthy lives if they take antiretrovirals. But their lifespans are still generally shorter than those of people without the virus. And for many, daily pills or even the newer, bimonthly injections present significant financial, practical, and social challenges, including stigma. “Probably for the last about 15 or 20 years, there’s been this real push to go, ‘How can we do better?’” says Fidler.

The dream, she says, is “what people call curing HIV, or a remission in HIV.” But that has presented a huge challenge because HIV is a master of disguise. The virus evolves so quickly after infection that the body can’t produce new antibodies quickly enough to recognize and neutralize it.

And some HIV hides out in cells in an inactive state, invisible to the immune system. These evasion tactics have outwitted a long succession of cure attempts. Aside from a handful of exceptional stem-cell transplants, interventions have consistently fallen short of a complete cure—one that fully clears HIV from the body.

A functional cure would be the next best thing. And that’s where a rare phenomenon offers hope: Some individuals with long-term HIV do eventually produce antibodies that can neutralize the virus, though too late to fully shake it. These potent antibodies target critical, rarely changing parts of HIV proteins in the outer viral membrane; these proteins are used by the virus to infect cells. The antibodies, able to recognize a broad range of virus strains, are termed broadly neutralizing.

Scientists are now racing to find the most potent broadly neutralizing antibodies and engineer them into a functional cure. FRESH and RIO are arguably the most promising attempts yet.

In the FRESH trial, scientists chose two antibodies that, combined, were likely to be effective against HIV strains known as HIV-1 clade C, which is dominant in sub-Saharan Africa. The trial enrolled young women from a high-prevalence community as part of a broader social empowerment program. The program had started the women on HIV treatment within three days of their infection several years earlier.

The RIO trial, meanwhile, chose two well-studied antibodies shown to be broadly effective. Its participants were predominantly white men around age 40 who also had gone on antiretroviral drugs soon after infection. Most had HIV-1 clade B, which is more prevalent in Europe.

By pairing antibodies, the researchers aimed to decrease the likelihood that HIV would develop resistance—a common challenge in antibody treatments—since the virus would need multiple mutations to evade both.

Participants in both trials were given an injection of the antibodies, which were modified to last around six months in the body. Then their treatment with antiviral medications was paused. The hope was that the antibodies would work with the immune system to kill active HIV particles, keeping the virus in check. If the effect didn’t last, HIV levels would rise after the antibodies had been broken down, and the participants would resume antiretroviral treatment.

Excitingly, however, findings in both trials suggested that, in some people, the interventions prompted an ongoing, independent immune response, which researchers likened to the effect of a vaccine.

In the RIO trial, 22 of the 34 people receiving broadly neutralizing antibodies had not experienced a viral rebound by 20 weeks. At this point, they were given another antibody shot. Beyond 96 weeks—long after the antibodies had disappeared—six still had viral levels low enough to remain off antiviral medications.

An additional 34 participants included in the study as controls received only a saline infusion and mostly had to resume treatment within four to six weeks; all but three were back on treatment within 20 weeks.

A similar pattern was observed in FRESH (although, because it was mostly a safety study, this trial did not include control participants). Six of the 20 participants retained viral suppression for 48 weeks after the antibody infusion, and of those, four remained off treatment for more than a year. Two and a half years after the intervention, one remains off antiretroviral medication. Two others also maintained viral control but eventually chose to go back on treatment for personal and logistical reasons.

It’s unknown when the virus might rebound, so the researchers are cautious about calling participants in remission functionally cured. However, the antibodies clearly seem to coax the immune system to fight the virus. Attached to infected cells, they signal to immune cells to come in and kill.

And importantly, researchers believe that this immune response to the antibodies may also stimulate immune cells called CD8+ T cells, which then hunt down HIV-infected cells. This could create an “immune memory” that helps the body control HIV even after the antibodies are gone.

The response resembles the immune control seen in a tiny group (fewer than 1 percent) of individuals with HIV, known as elite controllers. These individuals suppress HIV without the help of antiretrovirals, confining it mostly to small reservoirs. That the trials helped some participants do something similar is exciting, says Joel Blankson, an infectious diseases expert at Johns Hopkins Medicine, who coauthored an article about natural HIV controllers in the 2024 Annual Review of Immunology. “It might teach us how to be able to do this much more effectively, and we might be able to get a higher percentage of people in remission.”

One thing scientists do know is that the likelihood of achieving sustained control is higher if people start antiretroviral treatment soon after infection, when their immune systems are still intact and their viral reservoirs are small.

But post-treatment control can occur even in people who started taking antiretrovirals a long time after they were initially infected: a group known as chronically infected patients. “It just happens less often,” Blankson says. “So it’s possible the strategies that are involved in these studies will also apply to patients who are chronically infected.”

A particularly promising finding of the RIO trial was that the antibodies also affected dormant HIV hiding out in some cells. These reservoirs are how the virus rebounds when people stop treatment, and antibodies aren’t thought to touch them. Researchers speculate that the T cells boosted by the antibodies can recognize and kill latently infected cells that display even trace amounts of HIV on their surface.

The FRESH intervention, meanwhile, targeted the stubborn HIV reservoirs more directly by incorporating another drug, called vesatolimod. It’s designed to stimulate immune cells to respond to the HIV threat, and hopefully to “shock” dormant HIV particles out of hiding. Once that happens, the immune system, with the help of the antibodies, can recognize and kill them.

The results of FRESH are exciting, Ndung’u says, “because it might indicate that this regimen worked, to an extent. Because this was a small study, it’s difficult to, obviously, make very hard conclusions.” His team is still investigating the data.

Once he secures funding, Ndung’u aims to run a larger South Africa-based trial including chronically infected individuals. Fidler’s team, meanwhile, is recruiting for a third arm of RIO to try to determine whether pausing antiretroviral treatment for longer before administering the antibodies prompts a stronger immune response.

A related UK-based trial, called AbVax, will add a T-cell-stimulating drug to the mix to see whether it enhances the long-lasting, vaccine-like effect of the antibodies. “It could be that combining different approaches enhances different bits of the immune system, and that’s the way forward,” says Fidler, who is a co-principal investigator on that study.

For now, Fidler and Ndung’u will continue to track the virally suppressed participants—who, for the first time since they received their HIV diagnoses, are living free from the demands of daily treatment.

This story originally appeared at Knowable Magazine


Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.



Reintroduced carnivores’ impacts on ecosystems are still coming into focus

Wilmers said he was surprised by how few studies show evidence of wolves, bears, and cougars having an effect on elk, moose, and deer populations. Instead, the biggest driver of changing elk population numbers across the West is humanity.

“In most mainland systems, it’s only when you combine wolves with grizzly bears and you take away human hunting as a substantial component that you see them suppressing prey numbers,” Wilmers said. “Outside of that, they’re mostly background noise against how humans are managing their prey populations.”

In some studies, ungulate populations actually increased slightly in the presence of wolves and grizzlies, Wilmers said, likely because human wildlife managers overestimated the effects of predators as they reduced hunting quotas.

“This is a much-needed review, as it is well executed, and highlights areas where more research is needed,” said Rae Wynn-Grant, a wildlife ecologist and cohost of the television show Mutual of Omaha’s Wild Kingdom Protecting the Wild, in an email to Inside Climate News. Wynn-Grant was not involved in the paper, and her work was not part of its survey.

In her view, the paper showed that an increase in predators on the landscape doesn’t automatically balance plant communities. “Our world would be much simpler if it did,” she said, “but the evidence suggests that so many variables factor into if and how ecosystems respond to increases in carnivore population in North America.”

Yellowstone, with its expansive valleys, relatively easy access, and status as an iconic, protected landscape, has become a hotspot for scientists trying to answer an existential question: Is it possible for an ecosystem that’s lost keystone large carnivores to be restored to a pre-extinction state upon their reintroduction?

Wilmers doesn’t think scientists have answered that question yet, except to show that it can take decades to untangle the web of factors driving ecological shifts in a place like Yellowstone. Any changes that do occur when a predator is driven to extinction may be impossible to reverse quickly, he said.

Yellowstone’s alternative stable state was a point echoed by researchers in both camps of the trophic cascade debate, and it is one Wilmers believes is vital to understand when evaluating the tradeoffs of large-carnivore reintroduction.

“You’d be better off avoiding the loss of beavers and wolves in the first place than you would be accepting that loss and trying to restore them later,” he said.

This story originally appeared on Inside Climate News



The Big Nonprofits Post 2025

There remain lots of great charitable giving opportunities out there.

I have now had three opportunities to be a recommender for the Survival and Flourishing Fund (SFF). I wrote in detail about my first experience back in 2021, where I struggled to find worthy applications.

The second time around in 2024, there was an abundance of worthy causes. In 2025 there were even more high quality applications, many of which were growing beyond our ability to support them.

Thus this is the second edition of The Big Nonprofits Post, primarily aimed at sharing my findings on various organizations I believe are doing good work, to help you find places to consider donating in the cause areas and intervention methods that you think are most effective, and to offer my general perspective on how I think about choosing where to give.

This post combines my findings from the 2024 and 2025 rounds of SFF, and also includes some organizations that did not apply to either round, so inclusion does not mean an organization necessarily applied at all.

This post is already very long, so the bar is higher for inclusion this year than it was last year, especially for new additions.

If you think there are better places to give and better causes to back, act accordingly, especially if they’re illegible or obscure. You don’t need my approval.

The Big Nonprofits List 2025 is also available as a website, where you can sort by mission, funding needed or confidence, or do a search and have handy buttons.

Organizations where I have the highest confidence in straightforward modest donations now, if your goals and model of the world align with theirs, are in bold, for those who don’t want to do a deep dive.

  1. Table of Contents.

  2. A Word of Warning.

  3. A Note To Charities.

  4. Use Your Personal Theory of Impact.

  5. Use Your Local Knowledge.

  6. Unconditional Grants to Worthy Individuals Are Great.

  7. Do Not Think Only On the Margin, and Also Use Decision Theory.

  8. Compare Notes With Those Individuals You Trust.

  9. Beware Becoming a Fundraising Target.

  10. And the Nominees Are.

  11. Organizations that Are Literally Me.

  12. Balsa Research.

  13. Don’t Worry About the Vase.

  14. Organizations Focusing On AI Non-Technical Research and Education.

  15. Lightcone Infrastructure.

  16. The AI Futures Project.

  17. Effective Institutions Project (EIP) (For Their Flagship Initiatives).

  18. Artificial Intelligence Policy Institute (AIPI).

  19. AI Lab Watch.

  20. Palisade Research.

  21. CivAI.

  22. AI Safety Info (Robert Miles).

  23. Intelligence Rising.

  24. Convergence Analysis.

  25. IASEAI (International Association for Safe and Ethical Artificial Intelligence).

  26. The AI Whistleblower Initiative.

  27. Organizations Related To Potentially Pausing AI Or Otherwise Having A Strong International AI Treaty.

  28. Pause AI and Pause AI Global.

  29. MIRI.

  30. Existential Risk Observatory.

  31. Organizations Focusing Primarily On AI Policy and Diplomacy.

  32. Center for AI Safety and the CAIS Action Fund.

  33. Foundation for American Innovation (FAI).

  34. Encode AI (Formerly Encode Justice).

  35. The Future Society.

  36. Safer AI.

  37. Institute for AI Policy and Strategy (IAPS).

  38. AI Standards Lab (Holtman Research).

  39. Safe AI Forum.

  40. Center For Long Term Resilience.

  41. Simon Institute for Longterm Governance.

  42. Legal Advocacy for Safe Science and Technology.

  43. Institute for Law and AI.

  44. Macrostrategy Research Institute.

  45. Secure AI Project.

  46. Organizations Doing ML Alignment Research.

  47. Model Evaluation and Threat Research (METR).

  48. Alignment Research Center (ARC).

  49. Apollo Research.

  50. Cybersecurity Lab at University of Louisville.

  51. Timaeus.

  52. Simplex.

  53. Far AI.

  54. Alignment in Complex Systems Research Group.

  55. Apart Research.

  56. Transluce.

  57. Organizations Doing Other Technical Work.

  58. AI Analysts @ RAND.

  59. Organizations Doing Math, Decision Theory and Agent Foundations.

  60. Orthogonal.

  61. Topos Institute.

  62. Eisenstat Research.

  63. AFFINE Algorithm Design.

  64. CORAL (Computational Rational Agents Laboratory).

  65. Mathematical Metaphysics Institute.

  66. Focal at CMU.

  67. Organizations Doing Cool Other Stuff Including Tech.

  68. ALLFED.

  69. Good Ancestor Foundation.

  70. Charter Cities Institute.

  71. Carbon Copies for Independent Minds.

  72. Organizations Focused Primarily on Bio Risk.

  73. Secure DNA.

  74. Blueprint Biosecurity.

  75. Pour Domain.

  76. ALTER Israel.

  77. Organizations That Can Advise You Further.

  78. Effective Institutions Project (EIP) (As A Donation Advisor).

  79. Longview Philanthropy.

  80. Organizations That then Regrant to Fund Other Organizations.

  81. SFF Itself (!).

  82. Manifund.

  83. AI Risk Mitigation Fund.

  84. Long Term Future Fund.

  85. Foresight.

  86. Centre for Enabling Effective Altruism Learning & Research (CEELAR).

  87. Organizations That are Essentially Talent Funnels.

  88. AI Safety Camp.

  89. Center for Law and AI Risk.

  90. Speculative Technologies.

  91. Talos Network.

  92. MATS Research.

  93. Epistea.

  94. Emergent Ventures.

  95. AI Safety Cape Town.

  96. ILINA Program.

  97. Impact Academy Limited.

  98. Atlas Computing.

  99. Principles of Intelligence (Formerly PIBBSS).

  100. Tarbell Center.

  101. Catalyze Impact.

  102. CeSIA within EffiSciences.

  103. Stanford Existential Risk Initiative (SERI).

  104. Non-Trivial.

  105. CFAR.

  106. The Bramble Center.

  107. Final Reminders.

The SFF recommender process is highly time constrained, and in general I am highly time constrained.

Even though I put in well beyond the required number of hours in both 2024 and 2025, there was no way to do a serious investigation of all the potentially exciting applications. Substantial reliance on heuristics was inevitable.

Also your priorities, opinions, and world model could be very different from mine.

If you are considering donating a substantial (to you) amount of money, please do the level of personal research and consideration commensurate with the amount of money you want to give away.

If you are considering donating a small (to you) amount of money, or if the requirement to do personal research might mean you don’t donate to anyone at all, I caution the opposite: Only do the amount of optimization and verification and such that is worth its opportunity cost. Do not let the perfect be the enemy of the good.

For more details of how the SFF recommender process works, see my post on the process.

Note that donations to some of the organizations below may not be tax deductible.

I apologize in advance for any errors, any out-of-date information, and for including anyone who I did not realize would not want to be included. I did my best to verify information, and to remove any organizations that do not wish to be included.

If you wish me to issue a correction of any kind, or to update your information, I will be happy to do that at least through the end of the year.

If you wish me to remove your organization entirely, for any reason, I will do that, too.

What I unfortunately cannot do, in most cases, is take the time to analyze or debate beyond that. I also can’t consider additional organizations for inclusion. My apologies.

The same is true for the website version.

I am giving my full opinion on all organizations listed, but where I feel an organization would be a poor choice for marginal dollars even within its own cause and intervention area, or I anticipate my full opinion would not net help them, they are silently not listed.

Listen to arguments and evidence. But do not let me, or anyone else, tell you any of:

  1. What is important.

  2. What is a good cause.

  3. What types of actions are best to make the change you want to see in the world.

  4. What particular strategies are most promising.

  5. That you have to choose according to some formula or you’re an awful person.

This is especially true when it comes to policy advocacy, and especially in AI.

If an organization is advocating for what you think is bad policy, or acting in a way that does bad things, don’t fund them!

If an organization is advocating or acting in a way you think is ineffective, don’t fund them!

Only fund people you think advance good changes in effective ways.

Not cases where I think that. Cases where you think that.

During SFF, I once again in 2025 chose to deprioritize all meta-level activities and talent development. I see lots of good object-level work available to do, and I expected others to often prioritize talent and meta activities.

The counterargument to this is that quite a lot of money is potentially going to be freed up soon as employees of OpenAI and Anthropic gain liquidity, including access to DAFs (donor advised funds). This makes expanding the pool more exciting.

I remain primarily focused on those who in some form were helping ensure AI does not kill everyone. I continue to see highest value in organizations that influence lab or government AI policies in the right ways, and continue to value Agent Foundations style and other off-paradigm technical research approaches.

I believe that the best places to give are the places where you have local knowledge.

If you know of people doing great work or who could do great work, based on your own information, then you can fund and provide social proof for what others cannot.

The less legible to others the cause, and the harder it is to fit it into the mission statements and formulas of various big donors, the more excited you should be to step forward, if the cause is indeed legible to you. This keeps you grounded, helps others find the show (as Tyler Cowen says), is more likely to be counterfactual funding, and avoids information cascades or looking under streetlights for the keys.

Most importantly it avoids adverse selection. The best legible opportunities for funding, the slam dunk choices? Those are probably getting funded. The legible things that are left are the ones that others didn’t sufficiently fund yet.

If you know why others haven’t funded, because they don’t know about the opportunity? That’s a great trade.

The process of applying for grants, raising money, and justifying your existence sucks.

A lot.

It especially sucks for many of the creatives and nerds that do a lot of the best work.

It also sucks to have to worry about running out of money, or to have to plan your work around the next time you have to justify your existence, or to be unable to be confident in choosing ambitious projects.

If you have to periodically go through this process, and are forced to continuously worry about making your work legible and how others will judge it, that will substantially hurt your true productivity. At best it is a constant distraction. By default, it is a severe warping effect. A version of this phenomenon is doing huge damage to academic science.

As I noted in my AI updates, the reason this blog exists is that I received generous, essentially unconditional, anonymous support to ‘be a public intellectual’ and otherwise pursue whatever I think is best. My benefactors offer their opinions when we talk because I value their opinions, but they never try to influence my decisions, and I feel zero pressure to make my work legible in order to secure future funding.

If you have money to give, and you know individuals who should clearly be left to do whatever they think is best without worrying about raising or earning money, who you are confident would take advantage of that opportunity and try to do something great, then giving them unconditional grants is a great use of funds, including giving them ‘don’t worry about reasonable expenses’ levels of funding.

This is especially true when combined with ‘retrospective funding,’ based on what they have already done. It would be great if we established a tradition and expectation that people who make big contributions can expect such rewards.

Not as unconditionally, it’s also great to fund specific actions and projects and so on that you see not happening purely through lack of money, especially when no one is asking you for money.

This includes things that you want to exist, but that don’t have a path to sustainability or revenue, or would be importantly tainted if they needed to seek that. Fund the project you want to see in the world. This can also be purely selfish, often in order to have something yourself you need to create it for everyone, and if you’re tempted there’s a good chance that’s a great value.

Resist the temptation to think purely on the margin, asking only what one more dollar can do. The incentives get perverse quickly. Organizations are rewarded for putting their highest impact activities in peril. Organizations that can ‘run lean’ or protect their core activities get punished.

If you always insist on being a ‘funder of last resort’ that requires key projects or the whole organization otherwise be in trouble, you’re defecting. Stop it.

Also, you want to do some amount of retrospective funding. If people have done exceptional work in the past, you should be willing to give them a bunch more rope in the future, above and beyond the expected value of their new project.

Don’t make everyone constantly reprove their cost effectiveness each year, or at least give them a break. If someone has earned your trust, then if this is the project they want to do next, presume they did so because of reasons, although you are free to disagree with those reasons.

This especially goes for AI lab employees. There’s no need for everyone to do all of their own research; you can and should compare notes with those who you can trust, and this is especially great when they’re people you know well.

What I do worry about is too much outsourcing of decisions to larger organizations and institutional structures, including those of Effective Altruism but also others, or letting your money go directly to large foundations where it will often get captured.

Jaan Tallinn created SFF in large part to intentionally take his donation decisions out of his hands, so he could credibly tell people those decisions were out of his hands, so he would not have to constantly worry that people he talked to were attempting to fundraise.

This is a huge deal. Communication, social life and a healthy information environment can all be put in danger by this.

Time to talk about the organizations themselves.

Rather than offer precise rankings, I divided by cause category and into three confidence levels.

  1. High confidence means I have enough information to be confident the organization is at least a good pick.

  2. Medium or low confidence means exactly that – I have less confidence that the choice is wise, and you should give more consideration to doing your own research.

  3. If my last investigation was in 2024, and I haven’t heard anything, I will have somewhat lower confidence now purely because my information is out of date.

Low confidence is still high praise, and very much a positive assessment! Most organizations would come nowhere close to making the post at all.

If an organization is not listed, that does not mean I think they would be a bad pick – they could have asked not to be included, or I could be unaware of them or their value, or I could simply not have enough confidence to list them.

I know how Bayesian evidence works, but this post is not intended as a knock on anyone, in any way. Some organizations that are not here would doubtless have been included, if I’d had more time.

I try to give a sense of how much detailed investigation and verification I was able to complete, and what parts I have confidence in versus not. Again, my lack of confidence will often be purely about my lack of time to get that confidence.

Unless I already knew them from elsewhere, assume no organizations here got as much attention as they deserve before you decide on what for you is a large donation.

I’m tiering based on how I think about donations from you, from outside SFF.

I think the regranting organizations were clearly wrong choices from within SFF, but are reasonable picks if you don’t want to do extensive research, especially if you are giving small.

In terms of funding levels needed, I will similarly divide into three categories.

They roughly mean this, to the best of my knowledge:

Low: Could likely be fully funded with less than ~$250k.

Medium: Could plausibly be fully funded with between ~$250k and ~$2 million.

High: Could probably make good use of more than ~$2 million.

These numbers may be obsolete by the time you read this. If you’re giving a large amount relative to what they might need, check with the organization first, but also do not be so afraid of modest amounts of ‘overfunding,’ as relieving fundraising pressure is valuable and, as I noted, it is important not to think only on the margin.

A lot of organizations are scaling up rapidly, looking to spend far more money than they have in the past. This was true in 2024, and 2025 has only accelerated this trend. A lot more organizations are in ‘High’ now but I decided not to update the thresholds.

Everyone seems eager to double their headcount. I’m not putting people into the High category unless I am confident they can scalably absorb more funding after SFF.

The person who I list as the leader of an organization will sometimes accidentally be whoever was in charge of fundraising rather than strictly the leader. Partly the reason for listing it is to give context and some of you can go ‘oh right, I know who that is,’ and the other reason is that all organization names are often highly confusing – adding the name of the organization’s leader allows you a safety check, to confirm that you are indeed pondering the same organization I am thinking of!

This is my post, so I get to list Balsa Research first. (I make the rules here.)

If that’s not what you’re interested in, you can of course skip the section.

Focus: Groundwork starting with studies to allow repeal of the Jones Act

Leader: Zvi Mowshowitz

Funding Needed: Medium

Confidence Level: High

Our first target continues to be the Jones Act. With everything happening in 2025, it is easy to get distracted. We have decided to keep eyes on the prize.

We’ve commissioned two studies. Part of our plan is to do more of them, and also do things like draft model repeals and explore ways to assemble a coalition and to sell and spread the results, to enable us to have a chance at repeal.

We also are networking, gathering information, publishing findings where there are information holes or where we can offer superior presentations, planning possible collaborations, and responding quickly in case of a crisis in related areas. We believe we meaningfully reduced the probability that certain very damaging additional maritime regulations could have become law, as described in this post.

Other planned cause areas include NEPA reform and federal housing policy (to build more housing where people want to live).

We have one full time worker on the case and are trying out a potential second one.

I don’t intend to have Balsa work on AI or assist with my other work, or to take personal compensation, unless I get substantially larger donations than we have had previously, that are either dedicated to those purposes or that at least come with the explicit understanding I should consider doing that.

Further donations would otherwise be for general support.

The pitch for Balsa, and the reason I am doing it, is in two parts.

I believe Jones Act repeal and many other abundance agenda items are neglected, tractable and important, and that my way of focusing on what matters can advance them. That the basic work that needs doing is not being done, it would be remarkably cheap to do a lot of it and do it well, and that this would give us a real if unlikely chance to get a huge win if circumstances break right. Chances for progress currently look grim, but winds can change quickly, we need to be ready, and also we need to stand ready to mitigate the chance things get even worse.

I also believe that if people do not have hope for the future, do not have something to protect and fight for, or do not think good outcomes are possible, then they won’t care about protecting the future. And that would be very bad, because we are going to need to fight to protect our future if we want to have one, or have a good one.

You got to give them hope.

I could go on, but I’ll stop there.

Donate here, or get in touch at [email protected].

Focus: Zvi Mowshowitz writes a lot of words, really quite a lot.

Leader: Zvi Mowshowitz

Funding Needed: None, but it all helps, could plausibly absorb a lot

Confidence Level: High

You can also of course always donate directly to my favorite charity.

By which I mean me. I always appreciate your support, however large or small.

The easiest way to help on a small scale (of course) is a Substack subscription or Patreon. Paid Substack subscriptions punch above their weight because they assist with the sorting algorithm, and also for their impact on morale.

If you want to go large then reach out to me.

Thanks to generous anonymous donors, I am able to write full time and mostly not worry about money. That is what makes this blog possible.

I want to as always be 100% clear: I am totally, completely fine as is, as is the blog.

Please feel zero pressure here, as noted throughout there are many excellent donation opportunities out there.

Additional funds are still welcome. There are levels of funding beyond not worrying.

Such additional support is always highly motivating.

Also there are absolutely additional things I could and would throw money at to improve the blog, potentially including hiring various forms of help or even expanding to more of a full news operation or startup.

As a broad category, these are organizations trying to figure things out regarding AI existential risk, without centrally attempting to either do technical work or directly to influence policy and discourse.

Lightcone Infrastructure is my current top pick across all categories. If you asked me where to give a dollar, or quite a few dollars, to someone who is not me, I would tell you to fund Lightcone Infrastructure.

Focus: Rationality community infrastructure, LessWrong, the Alignment Forum and Lighthaven.

Leaders: Oliver Habryka and Rafe Kennedy

Funding Needed: High

Confidence Level: High

Disclaimer: I am on the CFAR board which used to be the umbrella organization for Lightcone and still has some lingering ties. My writing appears on LessWrong. I have long time relationships with everyone involved. I have been to several reliably great workshops or conferences at their campus at Lighthaven. So I am conflicted here.

With that said, Lightcone is my clear number one. I think they are doing great work, both in terms of LessWrong and also Lighthaven. There is the potential, with greater funding, to enrich both of these tasks, and also for expansion.

There is a large force multiplier here (although that is true of a number of other organizations I list as well).

They made their 2024 fundraising pitch here, I encourage reading it.

Where I am beyond confident is that if LessWrong, the Alignment Forum or the venue Lighthaven were unable to continue, any one of these would be a major, quite bad unforced error.

LessWrong and the Alignment Forum are a central part of the infrastructure of the meaningful internet.

Lighthaven is miles and miles away the best event venue I have ever seen. I do not know how to convey how much the design contributes to having a valuable conference; it facilitates the best kinds of conversations via a wide array of nooks and pathways built on the principles of Christopher Alexander. This contributes to and takes advantage of the consistently fantastic set of people I encounter there.

The marginal costs here are large (~$3 million per year, some of which is made up by venue revenue), but the impact here is many times that, and I believe they can take on more than ten times that amount and generate excellent returns.

If we can go beyond short term funding needs, they can pay off the mortgage to create a buffer, and buy up surrounding buildings to protect against neighbors (who can, given this is Berkeley, cause a lot of trouble) and to add more housing and other space. This would secure the future of the space.

I would love to see them then expand into additional spaces. They note this would also require the right people.

Donate through every.org, or contact [email protected].

Focus: AI forecasting research projects, governance research projects, and policy engagement, in that order.

Leader: Daniel Kokotajlo, with Eli Lifland

Funding Needed: None Right Now

Confidence Level: High

Of all the ‘shut up and take my money’ applications in the 2024 round where I didn’t have a conflict of interest, even before I got to participate in their tabletop wargame exercise, I judged this the most ‘shut up and take my money’-ist. At The Curve, I got to participate in the exercise and participate in discussions around it, I’ve since done several more, and I’m now even more confident this is an excellent pick.

I continue to think it is a super strong case for retroactive funding as well. Daniel walked away from OpenAI, and what looked to be most of his net worth, to preserve his right to speak up. That led to OpenAI finally allowing others to speak up as well.

This is how he wants to speak up, and try to influence what is to come, based on what he knows. I don’t know if it would have been my move, but the move makes a lot of sense, and it has already paid off big. AI 2027 was read by the Vice President, who took it seriously, along with many others, and greatly informed the conversation. I believe the discourse is much improved as a result, and the possibility space has improved.

Note that they are comfortably funded through the medium term via private donations and their recent SFF grant.

Donate through every.org, or contact Jonas Vollmer.

Focus: AI governance, advisory and research, finding how to change decision points

Leader: Ian David Moss

Funding Needed: Medium

Confidence Level: High

EIP operates on two tracks. They have their flagship initiatives and attempts to intervene directly. They also serve as donation advisors, which I discuss in that section.

Their current flagship initiative plans are to focus on the intersection of AI governance and the broader political and economic environment, especially risks of concentration of power and unintentional power shifts from humans to AIs.

Can they indeed identify ways to target key decision points, and make a big difference? One can look at their track record. I’ve been asked to keep details confidential, but based on my assessment of private information, I confirmed they’ve scored some big wins including that they helped improve safety practices at a major AI lab, and will plausibly continue to be able to have high leverage and punch above their funding weight. You can read about some of the stuff that they can talk about here in a Founders Pledge write up.

It seems important that they be able to continue their work on all this.

I also note that in SFF I allocated less funding to EIP than I would in hindsight have liked to allocate, due to quirks about the way matching funds worked and my attempts to adjust my curves to account for it.

Donate through every.org, or contact [email protected].

Focus: Primarily polls about AI, also lobbying and preparing for crisis response.

Leader: Daniel Colson.

Also Involved: Mark Beall and Daniel Eth

Funding Needed: High

Confidence Level: High

Those polls about how the public thinks about AI, including several from last year around SB 1047, among them an adversarial collaboration with Dean Ball?

Remarkably often, these are the people that did that. Without them, few would be asking those questions. Ensuring that someone is asking is super helpful. With some earlier polls I was a bit worried that the wording was slanted, and that will always be a concern with a motivated pollster, but I think recent polls have been much better at this, and they are as close to neutral as one can reasonably expect.

There are those who correctly point out that even now in 2025 the public’s opinions are weakly held and low salience, and that all you’re often picking up is ‘the public does not like AI and it likes regulation.’

Fair enough. Someone still has to show this, and show it applies here, and put the lie to claims that the public goes the other way, and measure how things change over time. We need to be on top of what the public is thinking, including to guard against the places it wants to do dumb interventions.

They don’t only do polling. They also do lobbying and prepare for crisis responses.

Donate here, or use their contact form to get in touch.

Focus: Monitoring the AI safety record and plans of the frontier AI labs

Leader: Zach Stein-Perlman

Funding Needed: Low

Confidence Level: High

Zach has consistently been one of those on top of the safety and security plans, the model cards and other actions of the major labs, both writing up detailed feedback from a skeptical perspective and also compiling the website and its scores in various domains. Zach is definitely in the ‘demand high standards that would actually work and treat everything with skepticism’ school of all this, which I feel is appropriate, and I’ve gotten substantial benefit from his work several times.

However, due to uncertainty about whether this is the best thing for him to work on, and thus not being confident he will have this ball, Zach is not currently accepting funding, but would like people who are interested in donations to contact him via Intercom on the AI Lab Watch website.

Focus: AI capabilities demonstrations to inform decision makers on capabilities and loss of control risks

Leader: Jeffrey Ladish

Funding Needed: High

Confidence Level: High

This is clearly an understudied approach. People need concrete demonstrations. Every time I talk with people in national security, or otherwise get closer to decision makers who aren’t deeply into AI and in particular into AI safety concerns, I am reminded that you need to be as concrete and specific as possible – that’s why I wrote Danger, AI Scientist, Danger the way I did. We keep getting rather on-the-nose fire alarms, but it would be better if we could get demonstrations even more on the nose, and get them sooner, and in a more accessible way.

Since last time, I’ve had a chance to see their demonstrations in action several times, and I’ve come away feeling that they have mattered.

I have confidence that Jeffrey is a good person to continue to put this plan into action.

To donate, click here or email [email protected].

Focus: Visceral demos of AI risks

Leader: Sid Hiregowdara

Funding Needed: High

Confidence Level: Medium

I was impressed by the demo I was given (so a demo demo?). There’s no question such demos fill a niche, and there aren’t many other good candidates for the niche.

The bear case is that the demos are about near term threats, so does this help with the things that matter? It’s a good question. My presumption is yes, that raising situational awareness about current threats is highly useful, and that once people notice there is danger, they will ask better questions, and keep going. But I always do worry about drawing eyes to the wrong prize.

To donate, click here or email [email protected].

Focus: Making YouTube videos about AI safety, starring Rob Miles

Leader: Rob Miles

Funding Needed: Low

Confidence Level: High

I think these are pretty great videos in general, and given what it costs to produce them we should absolutely be buying their production. If there is a catch, it is that I am very much not the target audience, so you should not rely too much on my judgment of what is and isn’t effective video communication on this front, and you should confirm you like the cost per view.

To donate, join his patreon or contact him at [email protected].

Focus: Facilitation of the AI scenario roleplaying exercises including Intelligence Rising

Leader: Shahar Avin

Funding Needed: Low

Confidence Level: High

I haven’t had the opportunity to play Intelligence Rising, but I have read the rules to it, and heard a number of strong after action reports (AARs). They offered this summary of insights in 2024. The game is clearly solid, and it would be good if they continue to offer this experience and if more decision makers play it, in addition to the AI Futures Project TTX.

To donate, reach out to [email protected].

Focus: A series of sociotechnical reports on key AI scenarios, governance recommendations and conducting AI awareness efforts.

Leader: David Kristoffersson

Funding Needed: High (combining all tracks)

Confidence Level: Low

They do a variety of AI safety related things. Their Scenario Planning continues to be what I find most exciting, although I’m also somewhat interested in their modeling cooperation initiative as well. It’s not as neglected as it was a year ago, but we could definitely use more work than we’re getting. For track record, you can check out their reports from 2024 in this area and see if you think that was good work; the rest of their website has more.

Their donation page is here, or you can contact [email protected].

Focus: Grab bag of AI safety actions, research, policy, community, conferences, standards

Leader: Mark Nitzberg

Funding Needed: High

Confidence Level: Low

There are some clearly good things within the grab bag, including some good conferences and, it seems, substantial support for Geoffrey Hinton, but for logistical reasons I didn’t do a close investigation to see if the overall package looked promising. I’m passing the opportunity along.

Donate here, or contact them at [email protected].

Focus: Whistleblower advising and resources for those in AI labs warning about catastrophic risks, including via Third Opinion.

Leader: Karl Koch

Funding Needed: High

Confidence Level: Medium

I’ve given them advice, and at least some amount of such resourcing is obviously highly valuable. We certainly should be funding Third Opinion, so that if someone wants to blow the whistle they can have help doing it. The question is whether if it scales this loses its focus.

Donate here, or reach out to [email protected].

Focus: Advocating for a pause on AI, including via in-person protests

Leaders: Holly Elmore (USA) and Joep Meindertsma (Global)

Funding Needed: Low

Confidence Level: Medium

Some people say that those who believe we should pause AI would be better off staying quiet about it, rather than making everyone look foolish.

I disagree.

I don’t think pausing right now is a good idea. I think we should be working on the transparency, state capacity, technical ability and diplomatic groundwork to enable a pause in case we need one, but that it is too early to actually try to implement one.

But I do think that if you believe we should pause? Then you should say that we should pause. I very much appreciate people standing up, entering the arena and saying what they believe in, including quite often in my comments. Let the others mock all they want.

If you agree with Pause AI that the right move is to Pause AI, and you don’t have strong strategic disagreements with their approach, then you should likely be excited to fund this. If you disagree, you have better options.

Either way, they are doing what they, given their beliefs, should be doing.

Donate here, or reach out to [email protected].

Focus: At this point, primarily AI policy advocacy, letting everyone know that If Anyone Builds It, Everyone Dies and all that, plus some research

Leaders: Malo Bourgon, Eliezer Yudkowsky

Funding Needed: High

Confidence Level: High

MIRI, concluding that it is highly unlikely alignment will make progress rapidly enough otherwise, has shifted its strategy to largely advocate for major governments to come up with an international agreement to halt AI progress, and to do communications, although research still looks to be a large portion of the budget, and it has dissolved its agent foundations team. Hence the book.

That is not a good sign for the world, but it does reflect their beliefs.

They have accomplished a lot. The book is at least a modest success on its own terms in moving things forward.

I strongly believe they should be funded to continue to fight for a better future however they think is best, even when I disagree with their approach.

This is very much a case of ‘do this if and only if this aligns with your model and preferences.’

Donate here, or reach out to [email protected].

Focus: Pause-relevant research

Leader: Otto Barten

Funding Needed: Low

Confidence Level: Medium

Mostly this is the personal efforts of Otto Barten, ultimately advocating for a conditional pause. For modest amounts of money, in prior years he’s managed to have a hand in some high profile existential risk events and get the first x-risk related post into TIME magazine. He’s now pivoted to pause-relevant research (as in how to implement one via treaties, off switches, evals and threat models).

The track record and my prior investigation is less relevant now, so I’ve bumped them down to low confidence, but it would definitely be good to have the technical ability to pause and not enough work is being done on that.

To donate, click here, or get in touch at [email protected].

Some of these organizations also look at bio policy or other factors, but I judge those here as being primarily concerned with AI.

In this area, I am especially keen to rely on people with good track records, who have shown that they can build and use connections and cause real movement. It’s so hard to tell what is and isn’t effective, otherwise. Often small groups can pack a big punch, if they know where to go, or big ones can be largely wasted – I think that most think tanks on most topics are mostly wasted even if you believe in their cause.

Focus: AI research, field building and advocacy

Leader: Dan Hendrycks

Funding Needed: High

Confidence Level: High

They did the CAIS Statement on AI Risk, helped SB 1047 get as far as it did, and have improved things in many other ways. Some of these other ways are non-public. Some of those non-public things are things I know about and some aren’t. I will simply say the counterfactual policy world is a lot worse. They’ve clearly been punching well above their weight in the advocacy space. The other arms are no slouch either, lots of great work here. Their meaningful rolodex and degree of access are very strong and come with important insight into what matters.

They take a lot of big swings and aren’t afraid of taking risks or looking foolish. I appreciate that, even when a given attempt doesn’t fully work.

If you want to focus on their policy, then you can fund their 501(c)(4), the Action Fund, since 501(c)(3)s are limited in how much they can spend on political activities, keeping in mind the tax implications of that. If you don’t face any tax implications I would focus first on the 501(c)(4).

We should definitely find a way to fund at least their core activities.

Donate to the Action Fund for funding political activities, or the 501(c)(3) for research. They can be contacted at [email protected].

Focus: Tech policy research, thought leadership, educational outreach to government, fellowships.

Leader: Grace Meyer

Funding Needed: High

Confidence Level: High

FAI is centrally about innovation. Innovation is good, actually, in almost all contexts, as is building things and letting people do things.

AI is where this gets tricky. People ‘supporting innovation’ are often using that as an argument against all regulation of AI, and indeed I am dismayed to see so many push so hard on this exactly in the one place I think they are deeply wrong, when we could work together on innovation (and abundance) almost anywhere else.

FAI and resident AI studiers Samuel Hammond and Dean Ball are in an especially tough spot, because they are trying to influence AI policy from the right and not get expelled from that coalition or such spaces. There’s a reason we don’t have good alternative options for this. That requires striking a balance.

I’ve definitely had my disagreements with Hammond, including strong disagreements with his 95 theses on AI although I agreed far more than I disagreed, and I had many disagreements with his AI and Leviathan as well. He’s talked on the Hill about ‘open model diplomacy.’

I’ve certainly had many strong disagreements with Dean Ball as well, both in substance and rhetoric. Sometimes he’s the voice of reason and careful analysis, other times (from my perspective) he can be infuriating, most recently in discussions of the Superintelligence Statement, and remarkably often he does some of both in the same post. He was perhaps the most important opponent of SB 1047 and went on to a stint at the White House before joining FAI.

Yet here is FAI, rather high on the list. They’re a unique opportunity, you go to war with the army you have, and both Ball and Hammond have stuck their necks out in key situations. Hammond came out opposing the moratorium. They’ve been especially strong on compute governance.

I have private reasons to believe that FAI has been effective and we can expect that to continue, and its other initiatives also mostly seem good. We don’t have to agree on everything else, so long as we all want good things and are trying to figure things out, and I’m confident that is the case here.

I am especially excited that they can speak to the Republican side of the aisle in the R’s native language, which is difficult for most in this space to do.

An obvious caveat is that if you are not interested in the non-AI pro-innovation part of the agenda (I certainly approve, but it’s not obviously a high funding priority for most readers) then you’ll want to ensure it goes where you want it.

To donate, click here, or contact them using the form here.

Focus: Youth activism on AI safety issues

Leader: Sneha Revanur

Funding Needed: Medium

Confidence Level: High

They started out doing quite a lot on a shoestring budget by using volunteers, helping with SB 1047 and in several other places. Now they are turning pro, and would like to not be on a shoestring. I think they have clearly earned that right. The caveat is risk of ideological capture. Youth organizations tend to turn to left wing causes.

The risk here is that this effectively turns mostly to AI ethics concerns. It’s great that they’re coming at this without having gone through the standard existential risk ecosystem, but that also heightens the ideological risk.

I continue to believe it is worth the risk.

To donate, go here. They can be contacted at [email protected].

Focus: AI governance standards and policy.

Leader: Nicolas Moës

Funding Needed: High

Confidence Level: High

I’ve seen credible sources saying they do good work, and that they substantially helped orient the EU AI Act to at least care at all about frontier general AI. The EU AI Act was not a good bill, but it could easily have been a far worse one, doing much to hurt AI development while providing almost nothing useful for safety.

We should do our best to get some positive benefits out of the whole thing. And indeed, they helped substantially improve the EU Code of Practice, which was in hindsight remarkably neglected otherwise.

They’re also active around the world, including the USA and China.

Donate here, or contact them here.

Focus: Specifications for good AI safety, also directly impacting EU AI policy

Leader: Henry Papadatos

Funding Needed: Medium

Confidence Level: Low

I’ve been impressed by Simeon and his track record, including here. Simeon is stepping down as leader to start a company, which happened post-SFF, so they would need to be reevaluated in light of this before any substantial donation.

Donate here, or contact them at [email protected].

Focus: Papers and projects for ‘serious’ government circles, meetings with same, policy research

Leader: Peter Wildeford

Funding Needed: Medium

Confidence Level: High

I have a lot of respect for Peter Wildeford, and they’ve clearly put in good work and have solid connections, including on the Republican side where better coverage is badly needed, and where the only other solid lead we have is FAI. Peter has also increasingly been doing strong work directly via Substack and Twitter that has been helpful to me and that I can observe directly. They are strong on hardware governance and chips in particular (as is FAI).

Given their goals and approach, funding from outside the traditional ecosystem sources would be extra helpful; ideally such efforts would be fully distinct from OpenPhil.

With the shifting landscape and what I’ve observed, I’m moving them up to high confidence and priority.

Donate here, or contact them at [email protected].

Focus: Accelerating the writing of AI safety standards

Leaders: Koen Holtman and Chin Ze Shen

Funding Needed: Medium

Confidence Level: High

They help facilitate the writing of AI safety standards, for EU/UK/USA, including on the recent EU Code of Practice. They have successfully gotten some of their work officially incorporated, and another recommender with a standards background was impressed by the work and team.

This is one of the many things that someone has to do, and where if you step up and do it when no one else does, that can go pretty great. Having now been involved in bill minutiae myself, I know it is thankless work, and that it can really matter, both for public and private standards, and they plan to pivot somewhat to private standards.

I’m raising my confidence to high that this is at least a good pick, if you want to fund the writing of standards.

To donate, go here or reach out to [email protected].

Focus: International AI safety conferences

Leaders: Fynn Heide and Sophie Thomson

Funding Needed: Medium

Confidence Level: Low

They run the IDAIS series of conferences, including successful ones involving China. I do wish I had a better model of what makes such a conference actually matter versus not mattering, but these sure seem like they should matter, and they certainly seem well worth the cost to run.

To donate, contact them using the form at the bottom of the page here.

Focus: UK Policy Think Tank focusing on ‘extreme AI risk and biorisk policy.’

Leader: Angus Mercer

Funding Needed: High

Confidence Level: Low

The UK has shown promise in its willingness to shift its AI regulatory focus to frontier models in particular. It is hard to know how much of that shift to attribute to any particular source, or otherwise measure how much impact there has been or might be on final policy.

They have endorsements of their influence from philosopher Toby Ord, Former Special Adviser to the UK Prime Minister Logan Graham, and Senior Policy Adviser Nitarshan Rajkumar.

I reached out to a source with experience in the UK government who I trust, and they reported back that they are a fan and pointed to some good things they’ve helped with. There was a general consensus that they do good work, and those who investigated were impressed.

However, I have concerns. Their funding needs are high, and they are competing against many others in the policy space, many of which have very strong cases. I also worry their policy asks are too moderate, though others might see that as an advantage.

My lower confidence this year is a combination of worries about moderate asks, worry about organizational size, and worries about the shift in governments in the UK and the UK’s ability to have real impact elsewhere. But if you buy the central idea of this type of lobbying through the UK and are fine with a large budget, go for it.

Donate here, or reach out to [email protected].

Focus: Foundations and demand for international cooperation on AI governance and differential tech development

Leaders: Konrad Seifert and Maxime Stauffer

Funding Needed: High

Confidence Level: Low

As with all things diplomacy, hard to tell the difference between a lot of talk and things that are actually useful. Things often look the same either way for a long time. A lot of their focus is on the UN, so update either way based on how useful you think that approach is, and also that makes it even harder to get a good read.

They previously had a focus on the Global South and are pivoting to China, which seems like a more important focus.

To donate, scroll down on this page to access their donation form, or contact them at [email protected].

Focus: Legal team for lawsuits on catastrophic risk and to defend whistleblowers.

Leader: Tyler Whitmer

Funding Needed: Medium

Confidence Level: Medium

I wasn’t sure where to put them, but I suppose lawsuits are kind of policy by other means in this context, or close enough?

I buy the core idea that having a legal team on standby for catastrophic risk related legal action, in case things get real quickly, is a good one, and I haven’t heard anyone else propose this, although I do not feel qualified to vet the operation. They were one of the organizers of the NotForPrivateGain.org campaign against the OpenAI restructuring.

I definitely buy the idea of an AI Safety Whistleblower Defense Fund, which they are also doing. Knowing there will be someone to step up and help if it comes to that changes the dynamics in helpful ways.

Donors who are interested in making relatively substantial donations or grants should contact [email protected], for smaller amounts click here.

Focus: Legal research on US/EU law on transformational AI, fellowships, talent

Leader: Moritz von Knebel

Involved: Gabe Weil

Funding Needed: High

Confidence Level: Low

I’m confident that they should be funded at all; the question is whether this should be scaled up quite a lot, and which aspects of this would scale in what ways. If you can be convinced that the scaling plans are worthwhile, this could justify a sizable donation.

Donate here, or contact them at [email protected].

Focus: Amplify Nick Bostrom

Leader: Toby Newberry

Funding Needed: High

Confidence Level: Low

If you think Nick Bostrom is doing great work and want him to be more effective, then this is a way to amplify that work. In general, ‘give top people support systems’ seems like a good idea that is underexplored.

Get in touch at [email protected].

Focus: Advocacy for public safety and security protocols (SSPs) and related precautions

Leader: Nick Beckstead

Funding Needed: High

Confidence Level: High

I’ve had the opportunity to consult and collaborate with them and I’ve been consistently impressed. They’re the real deal, they pay attention to detail and care about making it work for everyone, and they’ve got results. I’m a big fan.

Donate here, or contact them at [email protected].

This category should be self-explanatory. Unfortunately, a lot of good alignment work still requires charitable funding. The good news is that (even more than last year when I wrote the rest of this introduction) there is a lot more funding, and willingness to fund, than there used to be, and also the projects generally look more promising.

The great thing about interpretability is that you can be confident you are dealing with something real. The not as great thing is that this can draw too much attention to interpretability, and that you can fool yourself into thinking that All You Need is Interpretability.

The good news is that several solid places can clearly take large checks.

I didn’t investigate too deeply on top of my existing knowledge here in 2024, because at SFF I had limited funds and decided that direct research support wasn’t a high enough priority, partly due to it being sufficiently legible.

We should be able to find money previously on the sidelines eager to take on many of these opportunities. Lab employees are especially well positioned, due to their experience and technical knowledge and connections, to evaluate such opportunities, and also to provide help with access and spreading the word.

Formerly ARC Evaluations.

Focus: Model evaluations

Leaders: Beth Barnes, Chris Painter

Funding Needed: High

Confidence Level: High

Originally I wrote that we hoped to be able to get large funding for METR via non-traditional sources. That happened last year, and METR got major funding. That’s great news. Alas, they once again have to hit the fundraising trail.

METR has proven to be the gold standard for outside evaluations of potentially dangerous frontier model capabilities, and has proven its value even more so in 2025.

We very much need these outside evaluations, and to give the labs every reason to use them and no excuse not to use them, and their information has been invaluable. In an ideal world the labs would be fully funding METR, but they’re not.

So this becomes a place where we can confidently invest quite a bit of capital, make a legible case for why it is a good idea, and know it will probably be well spent.

If you can direct fully ‘square’ ‘outside’ funds that need somewhere legible to go and are looking to go large? I love METR for that.

To donate, click here. They can be contacted at [email protected].

Focus: Theoretically motivated alignment work

Leader: Jacob Hilton

Funding Needed: Medium

Confidence Level: High

There’s a long track record of good work here, and Paul Christiano remained excited as of 2024. If you are looking to fund straight up alignment work and don’t have a particular person or small group in mind, this is certainly a safe bet to put additional funds to good use and attract good talent.

Donate here, or reach out to [email protected].

Focus: Scheming, evaluations, and governance

Leader: Marius Hobbhahn

Funding Needed: Medium

Confidence Level: High

This is an excellent thing to focus on, and one of the places we are most likely to be able to show ‘fire alarms’ that make people sit up and notice. Their first year seems to have gone well; one example would be their presentation at the UK safety summit showing that LLMs can strategically deceive their primary users when put under pressure. They will need serious funding to fully do the job in front of them; hopefully, like METR, they can be helped by the task being highly legible.

They suggest looking at this paper, and also this one. I can verify that they are the real deal and doing the work.

To donate, reach out to [email protected].

Focus: Support for Roman Yampolskiy’s lab and work

Leader: Roman Yampolskiy

Funding Needed: Low

Confidence Level: High

Roman Yampolskiy is the most pessimistic known voice about our chances of not dying from AI, and has brought that perspective to major platforms like Joe Rogan and Lex Fridman. He’s working on a book and wants to support PhD students.

Supporters can make a tax deductible gift to the University, specifying that they intend to fund Roman Yampolskiy and the Cyber Security lab.

Focus: Interpretability research

Leaders: Jesse Hoogland, Daniel Murfet, Stan van Wingerden

Funding Needed: High

Confidence Level: High

Timaeus focuses on interpretability work and sharing their results. The set of advisors is excellent, including Davidad and Evan Hubinger. Evan, John Wentworth and Vanessa Kosoy have offered high praise, and there is evidence they have impacted top lab research agendas. They’ve done what I think is solid work, although I am not so great at evaluating papers directly.

If you’re interested in directly funding interpretability research, that all makes this seem like a slam dunk. I’ve confirmed that this all continues to hold true in 2025.

To donate, get in touch with Jesse at [email protected]. If this is the sort of work that you’re interested in doing, they also have a discord at http://devinterp.com/discord.

Focus: Mechanistic interpretability of how inference breaks down

Leaders: Paul Riechers and Adam Shai

Funding Needed: Medium

Confidence Level: High

I am not as high on them as I am on Timaeus, but they have given reliable indicators that they will do good interpretability work. I’d (still) feel comfortable backing them.

Donate here, or contact them via webform.

Focus: Interpretability and other alignment research, incubator, hits based approach

Leader: Adam Gleave

Funding Needed: High

Confidence Level: Medium

They take the hits based approach to research, which is correct. I’ve gotten confirmation that they’re doing the real thing here. In an ideal world everyone doing the real thing would get supported, and they’re definitely still funding constrained.

To donate, click here. They can be contacted at [email protected].

Focus: AI alignment research on hierarchical agents and multi-system interactions

Leader: Jan Kulveit

Funding Needed: Medium

Confidence Level: High

I liked ACS last year, and since then we’ve seen Gradual Disempowerment and other good work, which means this now falls into the category ‘this having funding problems would be an obvious mistake.’ I ranked them very highly in SFF, and there should be a bunch more funding room.

To donate, reach out to [email protected], and note that you are interested in donating to ACS specifically.

Focus: AI safety hackathons, MATS-style programs and AI safety horizon scanning.

Leaders: Esben Kran, Jason Schreiber

Funding Needed: Medium

Confidence Level: Low

I’m (still) confident in their execution of the hackathon idea, which was the central pitch at SFF although they inform me generally they’re more centrally into the MATS-style programs. My doubt for the hackathons is on the level of ‘is AI safety something that benefits from hackathons.’ Is this something one can, as it were, hack together usefully? Are the hackathons doing good counterfactual work? Or is this a way to flood the zone with more variations on the same ideas?

As with many orgs on the list, this one makes sense if and only if you buy the plan, and is one of those ‘I’m not excited but can see it being a good fit for someone else.’

To donate, click here. They can be reached at [email protected].

Focus: Specialized superhuman systems for understanding and overseeing AI

Leaders: Jacob Steinhardt, Sarah Schwettmann

Funding Needed: High

Confidence Level: Medium

Last year they were a new org. They have now grown to 14 people, have a solid track record, and want to keep growing. I have confirmation the team is credible. The plan for scaling themselves is highly ambitious, with planned scale well beyond what SFF can fund. I haven’t done anything like the investigation into their plans and capabilities you would need before placing a bet that big, as AI research of all kinds gets expensive quickly.

If there is sufficient appetite to scale the amount of privately funded direct work of this type, then this seems like a fine place to look. I am optimistic on them finding interesting things, although on a technical level I am skeptical of the larger plan.

To donate, reach out to [email protected].

Focus: Developing ‘AI analysts’ that can assist policy makers.

Leader: John Coughlan

Funding Needed: High

Confidence Level: Medium

This is a thing that RAND should be doing and that should exist. There are obvious dangers here, but I don’t think this makes them substantially worse and I do think this can potentially improve policy a lot. RAND is well placed to get the resulting models to be actually used. That would enhance state capacity, potentially quite a bit.

The problem is that doing this is not cheap, and while funding this shouldn’t fall to those reading this, it plausibly does. This could be a good place to consider sinking quite a large check, if you believe in the agenda.

Donate here.

Right now it looks likely that AGI will be based around large language models (LLMs). That doesn’t mean this is inevitable. I would like our chances better if we could base our ultimate AIs around a different architecture, one that was more compatible with being able to get it to do what we would like it to do.

One path for this is agent foundations, which involves solving math to make the programs work instead of relying on inscrutable giant matrices.

Even if we do not manage that, decision theory and game theory are potentially important for navigating the critical period in front of us, for life in general, and for figuring out what the post-transformation AI world might look like, and thus what the choices we make now might do to impact that.

There are not that many people working on these problems. Actual Progress would be super valuable. So even if we expect the median outcome does not involve enough progress to matter, I think it’s still worth taking a shot.

The flip side is you worry about people ‘doing decision theory into the void’ where no one reads their papers or changes their actions. That’s a real issue. As is the increased urgency of other options. Still, I think these efforts are worth supporting, in general.

Focus: AI alignment via agent foundations

Leader: Tamsin Leake

Funding Needed: Medium

Confidence Level: High

I have funded Orthogonal in the past. They are definitely doing the kind of work that, if it succeeded, might actually amount to something, and would help us get through this to a future world we care about. It’s a long shot, but a long shot worth trying. They very much have the ‘old school’ Yudkowsky view that relatively hard takeoff is likely and most alignment approaches are fools’ errands. My sources are not as enthusiastic as they once were, but there are only a handful of groups trying that have any chance at all, and this still seems like one of them.

Donate here, or get in touch at [email protected].

Focus: Math for AI alignment

Leaders: Brendan Fong and David Spivak.

Funding Needed: High

Confidence Level: High

Topos is essentially Doing Math to try and figure out what to do about AI and AI Alignment. I’m very confident that they are qualified to (and actually will) turn donated money (partly via coffee) into math, in ways that might help a lot. I am also confident that the world should allow them to attempt this.

They’re now working with ARIA. That seems great.

Ultimately it all likely amounts to nothing, but the upside potential is high and the downside seems very low. I’ve helped fund them in the past and am happy about that.

To donate, go here, or get in touch at [email protected].

Focus: Two people doing research at MIRI, in particular Sam Eisenstat

Leader: Sam Eisenstat

Funding Needed: Medium

Confidence Level: High

Given Sam Eisenstat’s previous work, including from 2025, it seems worth continuing to support him, including supporting researchers. I still believe in this stuff being worth working on, obviously only support if you do as well. He’s funded for now but that’s still only limited runway.

To donate, contact [email protected].

Focus: Johannes Mayer does agent foundations work

Leader: Johannes Mayer

Funding Needed: Low

Confidence Level: Medium

Johannes Mayer does solid agent foundations work, and more funding would allow him to hire more help.

To donate, contact [email protected].

Focus: Examining intelligence

Leader: Vanessa Kosoy

Funding Needed: Medium

Confidence Level: High

This is Vanessa Kosoy and Alex Appel, who have another research agenda formerly funded by MIRI that now needs to stand on its own after their refocus. I once again believe this work to be worth continuing even if the progress isn’t what one might hope. I wish I had the kind of time it takes to actually dive into these sorts of theoretical questions, but alas I do not, or at least I’ve made a triage decision not to.

To donate, click here. For larger amounts contact directly at [email protected]

Focus: Searching for a mathematical basis for metaethics.

Leader: Alex Zhu

Funding Needed: Low

Confidence Level: Low

Alex Zhu has run iterations of the Math & Metaphysics Symposia, which had some excellent people in attendance, and intends partly to do more things of that nature. He thinks eastern philosophy contains much wisdom relevant to developing a future ‘decision-theoretic basis of metaethics’ and plans on an 8+ year project to do that.

I’ve seen plenty of signs that the whole thing is rather bonkers, but also strong endorsements from a bunch of people I trust that there is good stuff here, and the kind of crazy that is sometimes crazy enough to work. So there’s a lot of upside. If you think this kind of approach has a chance of working, this could be very exciting. For additional information, you can see this google doc.

To donate, message Alex at [email protected].

Focus: Game theory for cooperation by autonomous AI agents

Leader: Vincent Conitzer

Funding Needed: Medium

Confidence Level: Low

This is an area MIRI and the old rationalist crowd thought about a lot back in the day. There are a lot of ways for advanced intelligences to cooperate that are not available to humans, especially if they are capable of doing things in the class of sharing source code or can show their decisions are correlated with each other.

With sufficient capability, any group of agents should be able to act as if it is a single agent, and we shouldn’t need to do the game theory for them in advance either. I think it’s good things to be considering, but one should worry that even if they do find answers it will be ‘into the void’ and not accomplish anything. Based on my technical analysis I wasn’t convinced Focal was going to sufficiently interesting places with it, but I’m not at all confident in that assessment.

They note they’re also interested in the dynamics prior to AI becoming superintelligent, as the initial conditions plausibly matter a lot.

To donate, reach out to Vincent directly at [email protected] to be guided through the donation process.

This section is the most fun. You get unique projects taking big swings.

Focus: Feeding people with resilient foods after a potential nuclear war

Leader: David Denkenberger

Funding Needed: High

Confidence Level: Medium

As far as I know, no one else is doing the work ALLFED is doing. A resilient food supply ready to go in the wake of a nuclear war (or other major disaster with similar dynamics) could be everything. There’s a small but real chance that the impact is enormous. In my 2021 SFF round, I went back and forth with them several times over various issues, ultimately funding them; you can read about those details here.

I think all of the concerns and unknowns from last time essentially still hold, as does the upside case, so it’s a question of prioritization, how likely you view nuclear war scenarios and how much promise you see in the tech.

If you are convinced by the viability of the tech and ability to execute, then there’s a strong case that this is a very good use of funds.

I think this is a relatively better choice if you expect AI to remain a normal technology for a while, or if your model of AI risks includes a large chance of leading to a nuclear war or other cascading impacts on human survival, than if you expect neither.

Research and investigation on the technical details seems valuable here. If we do have a viable path to alternative foods and don’t fund it, that’s a pretty large miss, and I find it highly plausible that this could be super doable and yet not otherwise done.

Donate here, or reach out to [email protected].

Focus: Collaborations for tools to increase civilizational robustness to catastrophes

Leader: Colby Thompson

Funding Needed: High

Confidence Level: High

The principle of ‘a little preparation now can make a huge difference to resilience and robustness in a disaster later, so it’s worth doing even if the disaster is not so likely’ generalizes. Thus, the Good Ancestor Foundation, targeting nuclear war, solar flares, internet and cyber outages, and some AI scenarios and safety work.

A particular focus is archiving data and tools, enhancing synchronization systems and designing a novel emergency satellite system (first one goes up in June) to help with coordination in the face of disasters. They’re also coordinating on hardening critical infrastructure and addressing geopolitical and human rights concerns.

They’ve also given out millions in regrants.

One way I know they make good decisions is they continue to help facilitate the funding for my work, and make that process easy. They have my sincerest thanks. Which also means there is a conflict of interest, so take that into account.

Donate here, or contact them at [email protected].

Focus: Building charter cities

Leader: Kurtis Lockhart

Funding Needed: Medium

Confidence Level: Medium

I do love charter cities. There is little question they are attempting to do a very good thing and are sincerely going to attempt to build a charter city in Africa, where such things are badly needed. Very much another case of it being great that someone is attempting to do this so people can enjoy better institutions, even if it’s not the version I would prefer, which would focus more on regulatory arbitrage.

Seems like a great place for people who don’t think transformational AI is on its way but do understand the value here.

Donate to them here, or contact them via webform.

Focus: Whole brain emulation

Leader: Randal Koene

Funding Needed: Medium

Confidence Level: Low

At this point, if it worked in time to matter, I would be willing to roll the dice on emulations. What I don’t have is much belief that it will work, or the time to do a detailed investigation into the science. So flagging here, because if you look into the science and you think there is a decent chance, this becomes a good thing to fund.

Donate here, or contact them at [email protected].

Focus: Scanning DNA synthesis for potential hazards

Leaders: Kevin Esvelt, Andrew Yao and Raphael Egger

Funding Needed: Medium

Confidence Level: Medium

It is certainly an excellent idea. Give everyone fast, free, cryptographically secure screening of potential DNA synthesis to ensure no one is trying to create something we do not want anyone to create. AI only makes this concern more urgent. I didn’t have time to investigate and confirm this is the real deal, as I had other priorities even if it was, but certainly someone should be doing this.

There is also another related effort, Secure Bio, if you want to go all out. I would fund Secure DNA first.

To donate, contact them at [email protected].

Focus: Increasing capability to respond to future pandemics, Next-gen PPE, Far-UVC.

Leader: Jake Swett

Funding Needed: Medium

Confidence Level: Medium

There is no question we should be spending vastly more on pandemic preparedness, including far more on developing and stockpiling superior PPE and on Far-UVC. It is rather shameful that we are not doing that, and Blueprint Biosecurity plausibly can move substantial additional investment there. I’m definitely all for that.

To donate, reach out to [email protected] or head to the Blueprint Bio PayPal Giving Fund.

Focus: EU policy for AI enabled biorisks, among other things.

Leader: Patrick Stadler

Funding Needed: Low

Confidence Level: Low

Everything individually looks worthwhile but also rather scattershot. Then again, who am I to complain about a campaign for e.g. improved air quality? My worry is still that this is a small operation trying to do far too much, some of which I wouldn’t rank too high as a priority, and it needs more focus, on top of not having that clear big win yet. They are a French nonprofit.

Donation details are at the very bottom of this page, or you can contact them at [email protected].

Focus: AI safety and biorisk for Israel

Leader: David Manheim

Funding Needed: Low

Confidence Level: Medium

Israel has Ilya’s company SSI (Safe Superintelligence) and otherwise often punches above its weight in such matters but is getting little attention. This isn’t where my attention is focused but David is presumably choosing this focus for good reason.

To support them, get in touch at [email protected].

The first best solution, as I note above, is to do your own research, form your own priorities and make your own decisions. This is especially true if you can find otherwise illegible or hard-to-fund prospects.

However, your time is valuable and limited, and others can be in better positions to advise you on key information and find opportunities.

Another approach to this problem, if you have limited time or actively want to not be in control of these decisions, is to give to regranting organizations, and take the decisions further out of your own hands.

Focus: AI governance, advisory and research, finding how to change decision points

Leader: Ian David Moss

Confidence Level: High

I discussed their direct initiatives earlier. This is listing them as a donation advisor and in their capacity of attempting to be a resource to the broader philanthropic community.

They report that they are advising multiple major donors, and would welcome the opportunity to advise additional major donors. I haven’t had the opportunity to review their donation advisory work, but what I have seen in other areas gives me confidence. They specialize in advising donors who have broad interests across multiple areas; they list AI safety, global health, democracy, and peace and security.

To donate, click here. If you have further questions or would like to be advised, contact them at [email protected].

Focus: Conferences and advice on x-risk for those giving >$1 million per year

Leader: Simran Dhaliwal

Funding Needed: None

Confidence Level: Low

Longview is not seeking funding, instead they are offering support to large donors, and you can give to their regranting funds, including the Emerging Challenges Fund on catastrophic risks from emerging tech, which focuses non-exclusively on AI.

I had a chance to hear a pitch for them at The Curve and check out their current analysis and donation portfolio. It was a good discussion. There were definitely some areas of disagreement in both decisions and overall philosophy, and I worry they’ll be too drawn to the central and legible (a common issue with such services).

On the plus side, they’re clearly trying, and their portfolio definitely had some good things in it. So I wouldn’t want to depend on them or use them as a sole source if I had the opportunity to do something higher effort, but if I was donating on my own I’d find their analysis useful. If you’re considering relying heavily on them or donating to the funds, I’d look at the fund portfolios in detail and see what you think.

I pointed them to some organizations they hadn’t had a chance to evaluate yet.

They clearly seem open to donations aimed at particular RFPs or goals.

To inquire about their services, contact them at [email protected].

There were lots of great opportunities in SFF in both of my recent rounds. I was going to have an embarrassment of riches I was excited to fund.

Thus I decided quickly that I would not be funding any regranting organizations. If you were in the business of taking in money and then shipping it out to worthy causes, well, I could ship directly to highly worthy causes.

So there was no need to have someone else do that, or expect them to do better.

That does not mean that others should not consider such donations.

I see three important advantages to this path.

  1. Regranters can offer smaller grants that are well-targeted.

  2. Regranters save you a lot of time.

  3. Regranters spare you from having others pitch you directly for donations.

Thus, if you are making a ‘low effort’ donation, and trust others who share your values to invest more effort, it makes more sense to consider regranters.

In particular, if you’re looking to go large, I’ve been impressed by SFF itself, and there’s room for SFF to scale both its amounts distributed and level of rigor.

Focus: Give out grants based on recommenders, primarily to 501(c)(3) organizations

Leaders: Andrew Critch and Jaan Tallinn

Funding Needed: High

Confidence Level: High

If I had to choose a regranter right now to get a large amount of funding, my pick would be to partner with and participate in the SFF process as an additional funder. The applicants and recommenders are already putting in their effort, with plenty of room for each round to scale. It is very clear there are plenty of exciting places to put additional funds.

With more funding, the decisions could improve further, as recommenders would be better motivated to devote more time, and we could use a small portion of additional funds to make them better resourced.

The downside is that SFF can’t ‘go small’ efficiently on either funders or causes.

SFF does not accept donations but they are interested in partnerships with people or institutions who are interested in participating as a Funder in a future S-Process round. The minimum requirement for contributing as a Funder to a round is $250k. They are particularly interested in forming partnerships with American donors to help address funding gaps in 501(c)(4)s and other political organizations.

This is a good choice if you’re looking to go large and not looking to ultimately funnel towards relatively small funding opportunities or individuals.

Focus: Regranters to AI safety, existential risk, EA meta projects, creative mechanisms

Leader: Austin Chen (austin at manifund.org).

Funding Needed: Medium

Confidence Level: Medium

This is a regranter that gives its money to its own regranters, one of which was me, for unrestricted grants. They’re the charity donation offshoot of Manifold. They’ve played with crowdfunding, impact certificates, and ACX grants. They help run Manifest.

You’re essentially hiring these people to keep building a website and trying alternative funding allocation mechanisms, and for them to trust the judgment of selected regranters. That seems like a reasonable thing to do if you don’t otherwise know where to put your funds and want to fall back on a wisdom of crowds of sorts. Or, perhaps, if you actively want to fund the cool website.

Manifold itself did not apply, but I would think that would also be a good place to invest or donate in order to improve the world. It wouldn’t even be crazy to go around subsidizing various markets. If you send me mana there, I will set that mana aside and use it to subsidize markets when it seems like the place to do that.

If you want to support Manifold itself, you can either donate or buy a SAFE by contacting Austin at [email protected].

Also I’m a regranter at Manifund, so if you wanted to, you could use that to entrust me with funds to regrant. As you can see I certainly feel I have plenty of good options here if I can’t find a better local one, and if it’s a substantial amount I’m open to general directions (e.g. ensuring it happens relatively quickly, or a particular cause area as long as I think it’s net positive, or the method of action or theory of impact). However, I’m swamped for time, so I’d probably rely mostly on what I already know.

Focus: Spinoff of LTFF, grants for AI safety projects

Leader: Thomas Larsen

Funding Needed: Medium

Confidence Level: High

Seems very straightforwardly exactly what it is: a regranter whose grants are usually in the low six figure range. Fellow recommenders were high on Larsen’s ability to judge projects. If you think this is better than you can do on your own and you want to fund such projects, then go for it.

I’ve talked to them on background about their future plans and directions, and without sharing details their plans make me more excited here.

Donate here or contact them at [email protected].

Focus: Grants of 4-6 figures mostly to individuals, mostly for AI existential risk

Leader: Caleb Parikh (among other fund managers)

Funding Needed: High

Confidence Level: Low

The pitch on LTFF is that it is a place for existential risk people who need modest cash infusions to ask for them, and to get them without too much overhead or distortion. Looking over the list of grants, there is at least a decent hit rate.

One question is, are the marginal grants a lot less effective than the average grant?

My worry is that I don’t know the extent to which the process is accurate, fair, favors insiders or extracts a time or psychic tax on participants, favors legibility, or rewards ‘being in the EA ecosystem’ or especially the extent to which the net effects are distortionary and bias towards legibility and standardized efforts. Or the extent to which people use the system to extract funds without actually doing anything.

That’s not a ‘I think this is bad,’ it is a true ‘I do not know.’ I doubt they know either.

What do we know? They say applications should take 1-2 hours to write and between 10 minutes and 10 hours to evaluate, although that does not include time forming the plan, and this is anticipated to be an ~yearly process long term. And I don’t love that this concern is not listed under reasons not to choose to donate to the fund (although the existence of that list at all is most welcome, and the reasons to donate don’t consider the flip side either).

Given their current relationship to EA funds, you likely should consider LTFF if and only if you both want to focus on AI existential risk via regrants and also want to empower and strengthen the existing EA formal structures and general ways of being.

That’s not my preference, but it could be yours.

Donate here, or contact the fund managers at [email protected].

Focus: Regrants, fellowships and events

Leader: Allison Duettmann

Funding Needed: Medium

Confidence Level: Low

Foresight also does other things. I’m focusing here on their AI existential risk grants, which they offer on a rolling basis. I’ve advised them on a small number of potential grants, but they rarely ask.

The advantage on the regrant side would be to get outreach that wasn’t locked too tightly into the standard ecosystem. The other Foresight activities all seem clearly like good things, but the bar these days is high and since they weren’t the topic of the application I didn’t investigate.

Donate here, or reach out to [email protected].

Focus: Strategic incubator and launchpad for EA talent, research, and high-impact initiatives, with emphasis on AI safety, GCR reduction, and longtermist work

Leader: Attila Ujvari

Funding Needed: High

Confidence Level: Low

I loved the simple core concept of a ‘catered hotel’ where select people can go to be supported in whatever efforts seem worthwhile. They are now broadening their approach, scaling up and focusing on logistical and community supports, incubation and a general infrastructure play on top of their hotel. This feels less unique to me now and more of a typical (EA UK) community play, so you should evaluate it on that basis.

Donate here, or reach out to [email protected].

I am less skeptical of prioritizing AI safety talent funnels than I was last year, but I remain skeptical.

The central reason remains simple. If we have so many good organizations already, in need of so much funding, why do we need more talent funnels? Is talent our limiting factor? Are we actually in danger of losing important talent?

The clear exception is leadership and management. There remains, it appears, a clear shortage of leadership and management talent across all charitable space, and startup space, and probably flat out all of space.

Which means if you are considering stepping up and doing leadership and management, then that is likely more impactful than you might at first think.

If there was a strong talent funnel specifically for leadership or management, that would be a very interesting funding opportunity. And yes, of course there still need to be some talent funnels. Right now, my guess is we have enough, and marginal effort is best spent elsewhere.

What about for other talent? What about placements in government, or in the AI labs especially Anthropic of people dedicated to safety? What about the prospects for much higher funding availability by the time we are ready to put people to work?

If you can pull it off, empowering talent can have a large force multiplier, and the opportunity space looks better than a year ago. It seems plausible that frontier labs will soak up every strong safety candidate they can find, since the marginal returns there are very high and needs are growing rapidly.

Secondary worries include the danger you end up feeding capability researchers to AI labs, and the discount for the time delays involved.

My hunch is this will still receive relatively more attention and funding than is optimal, but marginal funds here will still be useful if deployed in places that are careful to avoid being lab talent funnels.

Focus: Learning by doing, participants work on a concrete project in the field

Leaders: Remmelt Ellen, Linda Linsefors and Robert Kralisch

Funding Needed: Low

Confidence Level: High

By all accounts they are the gold standard for this type of thing. Everyone says they are great, I am generally a fan of the format, I buy that this can punch way above its weight or cost. If I was going to back something in this section, I’d start here.

Donors can reach out to Remmelt at [email protected], or make a matched donation to support their next projects.

Focus: Paying academics small stipends to move into AI safety work

Leaders: Peter Salib (psalib @ central.uh.edu), Yonathan Arbel (yarbel @ law.ua.edu) and Kevin Frazier (kevin.frazier @ law.utexas.edu).

Funding Needed: Low

Confidence Level: High

This strategy is potentially super efficient. You have an academic that is mostly funded anyway, and they respond to remarkably small incentives to do something they are already curious about doing. Then maybe they keep going, again with academic funding. If you’re going to do ‘field building’ and talent funnel in a world short on funds for those people, this is doubly efficient. I like it. They’re now moving into hiring an academic fellow, the theory being ~1 year of support to create a permanent new AI safety law professor.

To donate, message one of the leaders at the emails listed above.

Focus: Enabling ambitious research programs that are poor fits for both academia and VC-funded startups including but not limited to Drexlerian functional nanomachines, high-throughput tools and discovering new superconductors.

Leader: Benjamin Reinhardt

Funding Needed: Medium

Confidence Level: Medium

I have confirmation that Reinhardt knows his stuff, and we certainly could use more people attempting to build revolutionary hardware. If the AI is scary enough to make you not want to build the hardware, it would figure out how to build the hardware anyway. You might as well find out now.

If you’re looking to fund a talent funnel, this seems like a good choice.

To donate, go here or reach out to [email protected].

Focus: Fellowships to other organizations, such as Future Society, Safer AI and FLI.

Leader: Chiara Gerosa

Funding Needed: Medium

Confidence Level: Low

They run two fellowship cohorts a year. They seem to place people into a variety of solid organizations, and are exploring the ability to get people into various international organizations like the OECD, UN or European Commission or EU AI Office.

The more I am convinced people will actually get inside meaningful government posts, the more excited I will be.

To donate, contact [email protected].

Focus: Researcher mentorship for those new to AI safety.

Leaders: Ryan Kidd and Christian Smith.

Funding Needed: High

Confidence Level: Medium

MATS is by all accounts very good at what they do and they have good positive spillover effects on the surrounding ecosystem. The recruiting classes they’re getting are outstanding.

If (and only if) you think that what they do, which is support would-be alignment researchers starting out and especially transitioning from other professions, is what you want to fund, then you should absolutely fund them. That’s a question of prioritization.

Donate here, or contact them via webform.

Focus: X-risk residencies, workshops, coworking in Prague, fiscal sponsorships

Leader: Irena Kotikova

Funding Needed: Medium

Confidence Level: Medium

I see essentially two distinct things here.

First, you have the umbrella organization, offering fiscal sponsorship for other organizations. Based on what I know from the charity space, this is a highly valuable service – it was very annoying getting Balsa a fiscal sponsor while we waited to become a full 501(c)(3), even though we ultimately found a very good one that did us a solid, and also annoying figuring out how to be on our own going forward.

Second, you have various projects around Prague, which seem like solid offerings in that class of action of building up EA-style x-risk actions in the area, if that is what you are looking for. So you’d be supporting some mix of those two things.

To donate, contact [email protected].

Focus: Small grants to individuals to help them develop their talent

Leader: Tyler Cowen

Funding Needed: Medium

Confidence Level: High

Emergent Ventures are not like the other talent funnels in several important ways.

  1. It’s not about AI Safety. You can definitely apply for an AI Safety purpose, and he’s granted such applications in the past, but it’s rare and topics run across the board, well beyond the range otherwise described in this post.

  2. Decisions are quick and don’t require paperwork or looking legible. Tyler Cowen makes the decision, and there’s no reason to spend much time on your end either.

  3. There isn’t a particular cause area this is trying to advance. He’s not trying to steer people to do any particular thing. Just to be more ambitious, and be able to get off the ground and build connections and so on. It’s not prescriptive.

I strongly believe this is an excellent way to boost the development of more talent, as long as money is serving as a limiting factor on the project, and that it is great to develop talent even if you don’t get to direct or know where it is heading. Sure, I get into rhetorical arguments with Tyler Cowen all the time, around AI and also other things, and we disagree strongly about some of the most important questions where I don’t understand how he can continue to have the views he expresses, but this here is still a great project, an amazingly cost-efficient intervention.

Donate here (specify “Emergent Ventures” in notes), or reach out to [email protected].

Focus: AI safety community building and research in South Africa

Leaders: Leo Hyams and Benjamin Sturgeon

Funding Needed: Low

Confidence Level: Low

This is a mix of AI research and building up the local AI safety community. One person whose opinion I value gave the plan and those involved in it a strong endorsement, so including it based on that.

To donate, reach out to [email protected].

Focus: Talent for AI safety in Africa

Leader: Cecil Abungu

Funding Needed: Low

Confidence Level: Low

I have a strong endorsement in hand in terms of their past work, if you think this is a good place to go in search of talent.

To donate, reach out to [email protected].

Focus: Global talent accelerator and hiring partner for technical AI safety, supporting worker transitions into AI safety.

Leaders: Roy Hagemann and Varun Agarwal

Funding Needed: Medium

Confidence Level: Low

They previously focused on India, one place with lots of talent; they’re now global. A lot has turned over in the last year, so you’ll want to check them out anew.

To donate, contact [email protected].

Focus: Mapping & creating missing orgs for AI safety (aka Charity Entrepreneurship for AI risk)

Leader: Evan Miyazono

Funding Needed: Medium

Confidence Level: Low

There was a pivot this past year from technical research to creating ‘missing orgs’ in the AI risk space. That makes sense as a strategy if and only if you expect the funding necessary to come in, or you think they can do especially strong targeting. Given the change they will need to be reevaluated.

They receive donations from here, or you can email them at [email protected].

Focus: Fellowships and affiliate programs for new alignment researchers

Leader: Lucas Teixeira and Dusan D. Nesic

Funding Needed: High

Confidence Level: Low

There are some hits here. Gabriel Weil in particular has impressed me in our interactions and with his work and they cite a good technical paper. But also that’s with a lot of shots on goal, and I’d have liked to see some bigger hits by now.

A breakdown revealed that, largely because they start with relatively senior people, most of them get placed in a way that doesn’t require additional support. That makes them a better bet than many similar rivals.

To donate, reach out to [email protected], or fund them through Manifund here.

Focus: Journalism fellowships for oversight of AI companies.

Leader: Cillian Crosson (Ex-Talos Network; still on their board.)

Funding Needed: High

Confidence Level: Medium

They offer fellowships to support journalism that helps society navigate the emergence of increasingly advanced AI, and a few other journalism ventures. They have sponsored at least one person who went on to do good work in the area. They also sponsor article placement, which seems reasonably priced in the grand scheme of things, I think?

I am not sure this is a place we need to do more investment, or if people trying to do this even need fellowships. Hard to say. There’s certainly a lot more tech reporting, and more every day; if I’m ever short of material I have no trouble finding more.

It is still a small amount of money per person that can meaningfully help people get on their feet and do something useful. We do in general need better journalism. They seem to be in a solid place, but I’d also be fine with giving them a bunch more funding to play with; they seem pretty unique.

Donate here, or reach out to them via webform.

Focus: Incubation of AI safety organizations

Leader: Alexandra Bos

Funding Needed: Medium

Confidence Level: Low

Why funnel individual talent when you can incubate entire organizations? I am not convinced that on the margin we currently need more of either, but I’m more receptive to the idea of an incubator. Certainly incubators can be high leverage points for getting valuable new orgs and companies off the ground, especially if your model is that once the org becomes fundable it can unlock additional funding.

If you think an incubator is worth funding, then the question is whether this is the right team. The application was solid all around, and their track record includes Timaeus and Carma, although counterfactuals are always difficult. Beyond that I don’t have a differentiator on why this is the team.

To donate, contact them at [email protected].

Focus: New AI safety org in Paris, discourse, R&D collaborations, talent pipeline

Leaders: Charbel-Raphael Segerie, Florent Berthet

Funding Needed: Low

Confidence Level: Low

They’re doing all three of discourse, direct work and talent funnels. They run the only university AI safety course in Europe, maintain the AI Safety Atlas, and have had their recommendations integrated verbatim into the EU AI Act’s Code of Practice. Their two main priorities are supporting the enforcement of the EU AI Act, and driving international agreements on AI red lines.

To donate, go here, or contact them at [email protected].

Focus: Recruitment for existential risk causes

Leader: Steve Luby

Funding Needed: Medium

Confidence Level: Low

Stanford students certainly are one place to find people worth educating about existential risk. It’s also an expensive place to be doing it, and a place that shouldn’t need extra funding. And that hates fun. And it’s not great that AI is listed third on their existential risk definition. So I’m not high on them, but it sure beats giving unrestricted funds to your Alma Mater.

Interested donors should contact Steve Luby directly at [email protected].

Focus: Talent funnel directly to AI safety and biosecurity out of high school

Leader: Peter McIntyre

Funding Needed: Low

Confidence Level: Low

Having high school students jump straight to research and placement sounds good to me, and plausibly the best version of a talent funnel investment. I haven’t confirmed details but I like the theory.

To donate, get in touch at [email protected].

Focus: Teaching rationality skills, seeking to make sense of the world and how to think

Leader: Anna Salamon

Funding Needed: High

Confidence Level: High

I am on the board of CFAR, so there is a direct and obvious conflict. Of course, I am on the board of CFAR exactly because I think this is a worthwhile use of my time, and also because Anna asked me. I’ve been involved in various ways since the beginning, including the discussions about whether and how to create CFAR in the first place.

CFAR is undergoing an attempted revival. There weren’t workshops for many years, for a variety of reasons including safety concerns and also a need to reorient. The workshops are now starting up again, with a mix of both old and new units, and I find much of the new material interesting and potentially valuable. I’d encourage people to consider attending workshops, and also donating.

To donate, click here, or reach out to [email protected].

Focus: Workshops in the style of CFAR but focused on practical courage, forming high value relationships between attendees with different skill sets and learning to care for lineages, in the hopes of repairing the anglosphere and creating new capable people to solve our problems including AI in more grounded ways.

Leader: Anna Salamon

Funding Needed: Low

Confidence Level: High

LARC is kind of a spin-off of CFAR, a place to pursue a different kind of agenda. I absolutely do not have high confidence that this will succeed, but I do have high confidence that this is a gamble worth taking, and that if those involved here (especially Anna Salamon but also others that I know) want to devote their time to trying this, that we should absolutely give them that opportunity.

Donate here.

If an organization was not included here, or was removed for the 2025 edition, again, that does not mean they aren’t good, or even that I wouldn’t endorse them if asked.

It could be because I am not aware of the organization, or lack sufficient knowledge at this point to be confident in listing them, or I fear my knowledge is obsolete.

It could be that they asked to be excluded, which happened in several cases.

If by accident I included you and you didn’t want to be included and I failed to remove you, or you don’t like the quote here, I sincerely apologize and will edit you out right away, no questions asked.

If an organization is included here, that is a good thing, but again, it does not mean you should donate without checking if it makes sense based on what you think is true, how you think the world works, what you value and what your priorities are. There are no universal right answers.


RFK Jr.’s new CDC deputy director prefers “natural immunity” over vaccines

Under ardently anti-vaccine Health Secretary Robert F. Kennedy Jr., the Centers for Disease Control and Prevention has named Louisiana Surgeon General Ralph Abraham as its new principal deputy director—a choice that was immediately called “dangerous” and “irresponsible,” yet not as bad as it could have been, by experts.

Physician Jeremy Faust revealed the appointment in his newsletter Inside Medicine yesterday, which was subsequently confirmed by journalists. Faust noted that a CDC source told him, “I heard way worse names floated,” and although Abraham’s views are “probably pretty terrible,” he at least has had relevant experience running a public health system, unlike other current leaders of the agency.

But Abraham hasn’t exactly been running a health system the way most public health experts would recommend. Under Abraham’s leadership, the Louisiana health department waited months to inform residents about a deadly whooping cough (pertussis) outbreak. He also has a clear record of anti-vaccine views. Earlier this year, he told a Louisiana news outlet he doesn’t recommend COVID-19 vaccines because “I prefer natural immunity.” In February, he ordered the health department to stop promoting mass vaccinations, including flu shots, and barred staff from running seasonal vaccine campaigns.


Many genes associated with dog behavior influence human personalities, too

Many dog breeds are noted for their personalities and behavioral traits, from the distinctive vocalizations of huskies to the herding of border collies. People have worked to identify the genes associated with many of these behaviors, taking advantage of the fact that dogs can interbreed. But that creates its own experimental challenges, as it can be difficult to separate some behaviors from physical traits distinctive to the breed—small dog breeds may seem more aggressive simply because they feel threatened more often.

To get around that, a team of researchers recently did the largest gene/behavior association study within a single dog breed. Taking advantage of a population of over 1,000 golden retrievers, they found a number of genes associated with behaviors within that breed. A high percentage of these genes turned out to correspond to regions of the human genome that have been associated with behavioral differences as well. But, in many cases, these associations have been with very different behaviors.

Gone to the dogs

The work, done by a team based largely at Cambridge University, utilized the Golden Retriever Lifetime Study, which involved over 3,000 owners of these dogs filling out annual surveys that included information on their dogs’ behavior. Over 1,000 of those owners also had blood samples obtained from their dogs and shipped in; the researchers used these samples to scan the dogs’ genomes for variants. Those were then compared to ratings of the dogs’ behavior on a range of issues, like fear or aggression directed toward strangers or other dogs.

Using the data, the researchers identified genomic regions where specific variants were frequently associated with particular behaviors. In total, 14 behavioral tendencies were examined; 12 genomic regions showed strong associations with specific behaviors, and another nine showed somewhat weaker associations. For many of these traits it was difficult to find much, because golden retrievers are notoriously friendly and mellow dogs, so they tended to score low on traits like aggression and fear.
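For a rough sense of how this kind of genome-wide scan works, here is a toy sketch of a per-variant association test: regress each dog’s behavior score on how many copies of a variant it carries, then keep only the variants that clear a multiple-testing threshold. The data, the planted effect, and the cutoff below are made-up illustrations, not the authors’ actual pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_dogs, n_variants = 1000, 5000

genotypes = rng.integers(0, 3, size=(n_dogs, n_variants))  # 0, 1, or 2 copies of each variant
behavior = rng.normal(size=n_dogs)                          # e.g., a stranger-directed fear score
behavior = behavior + 0.3 * genotypes[:, 42]                # plant one truly associated variant

# Test every variant: does genotype dosage predict the behavior score?
pvalues = [stats.linregress(genotypes[:, i], behavior).pvalue for i in range(n_variants)]

# Keep hits that survive a Bonferroni correction, as GWAS-style scans do.
threshold = 0.05 / n_variants
hits = [i for i, p in enumerate(pvalues) if p < threshold]
print(hits)  # should recover the planted variant (index 42)
```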

That result was significant, as some of these same regions of the genome had been associated with very different behaviors in populations that were a mix of breeds. For example, two regions associated with touch sensitivity in golden retrievers had been linked to a love of chasing and to owner-directed aggression in a non-breed-specific study. That finding suggests the studies are identifying genes that set the stage for behaviors, with other genetic or environmental factors steering them toward specific outcomes.


Formation of oceans within icy moons could cause the waters to boil

That can have significant consequences for the stresses experienced by the icy shells of these moons. Water is considerably more dense than ice, so as a moon’s ocean freezes, its interior will expand, creating outward forces that press against the gravity holding the moon together. The potential of this transition to shape the surface geology of a number of moons, including Europa and Enceladus, has already been explored. So the researchers behind the new work decided to look at the opposite issue: What happens when the interior starts to melt?

Rather than focus on a specific moon, the team did a general model of an ice-covered ocean. This model treated the ice shell as an elastic surface, meaning it wouldn’t just snap, and placed viscous ice below that. Further down, there was a liquid ocean and eventually a rocky core. As the ice melted and the ocean expanded, the researchers tracked the stresses on the ice shell and the changes in pressure that occurred at the ice-ocean interface. They also tracked the spread of thermal energy through the ice shell.

Pressure drop

Obviously, there are limits to how much the outer shell can flex to accommodate the shrinking of the melting inner portions of the moon. This creates a low-pressure area under the shell. The consequences depend on the moon’s size. For larger moons—and this includes most of the moons the team looked at, including Europa—there were two outcomes. For some, gravity is strong enough to keep the pressure at a point where the water at the interface remains liquid. For others, gravity is enough to cause even an elastic surface to fail, leading to surface collapse.

For smaller moons, however, this doesn’t work out; the pressure gets low enough that water will boil even at the ambient temperatures (just above the freezing point of water). In addition, the low pressure will likely cause any gases dissolved in the water to be released. The result is that gas bubbles should form at the ice-water interface. “Boiling is possible on these bodies—and not others—because they are small and have a relatively low gravitational acceleration,” the researchers conclude. “Consequently, less ocean underpressure is needed to counterbalance the [crustal] pressure.”
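To make the scale of the effect concrete, here is a rough back-of-the-envelope illustration (my own numbers, not the paper’s model) of the two ingredients: the volume deficit left when ice melts into denser water, and the ambient pressure at the ice-ocean interface, which scales with surface gravity and is what the underpressure has to overcome before near-freezing water can boil (roughly 600 Pa, near the triple point). The 20 km shell thickness is an assumed round number.

```python
rho_ice, rho_water = 917.0, 1000.0   # kg/m^3, approximate densities
ice_melted = 1000.0                   # m of ice melted at the base of the shell

# Melting leaves a gap: the meltwater occupies less vertical space than the ice did.
water_column = ice_melted * rho_ice / rho_water
print(f"melting 1 km of ice leaves a ~{ice_melted - water_column:.0f} m deficit")  # ~83 m

# Pressure at the ice-ocean interface for an assumed 20 km shell on two moons.
shell_thickness = 20_000.0            # m, assumed for illustration
for name, g in [("Europa", 1.31), ("an Enceladus-sized moon", 0.113)]:
    p_interface = rho_ice * g * shell_thickness  # Pa
    print(f"{name}: ~{p_interface / 1e6:.1f} MPa of overburden to offset before boiling")
```

The low-gravity case carries roughly a tenth of the overburden pressure, which is why a modest underpressure is enough to reach boiling conditions there but not on a Europa-sized body.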


ChatGPT 5.1 Codex Max

OpenAI has given us GPT-5.1-Codex-Max, their best coding model for OpenAI Codex.

They claim it is faster, more capable, and more token-efficient, with better persistence on long tasks.

It scores 77.9% on SWE-bench-verified, 79.9% on SWE-Lancer-IC SWE and 58.1% on Terminal-Bench 2.0, all substantial gains over GPT-5.1-Codex.

It is prompting OpenAI to prepare for its models reaching the High capability threshold for cybersecurity threats.

There’s a 27-page system card. One could call this the secret ‘real’ GPT-5.1 that matters.

They even finally trained it to use Windows; somehow this is a new idea.

My goal is for my review of Opus 4.5 to start on Friday, as it takes a few days to sort through new releases. This post was written before Anthropic revealed Opus 4.5, and we don’t yet know how big an upgrade Opus 4.5 will prove to be. As always, try all your various options and choose what is best for you.

GPT-5.1-Codex-Max is a new high on the METR graph. METR’s thread is here.

Prinz: METR (50% accuracy):

GPT-5.1-Codex-Max = 2 hours, 42 minutes

This is 25 minutes longer than GPT-5.

Samuel Albanie: a data point for that ai 2027 graph

That’s in between the two lines, looking closer to linear progress. Fingers crossed.

Daniel Kokotajlo: Yep! Things seem to be going somewhat slower than the AI 2027 scenario. Our timelines were longer than 2027 when we published and now they are a bit longer still; “around 2030, lots of uncertainty though” is what I say these days.

We do not yet know where Gemini 3 Pro lands on that graph.

Automated software engineer is the explicit goal.

It does not yet reach the High capability threshold in Cybersecurity, but this is expected to happen shortly, and mitigations are being prepared.

GPT-5.1-Codex-Max is our new frontier agentic coding model. It is built on an update to our foundational reasoning model trained on agentic tasks across software engineering, math, research, medicine, computer use and more.

It is our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task.

Like its predecessors, GPT-5.1-Codex-Max was trained on real-world software engineering tasks like PR creation, code review, frontend coding and Q&A.
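OpenAI has not published the mechanics of compaction in detail, but the general idea of operating across multiple context windows can be sketched: when the transcript nears the context budget, fold the older turns into a short summary and keep only the recent ones verbatim. Everything below (the token budget, the summarize stand-in, the thresholds) is my own illustrative assumption, not OpenAI’s implementation.

```python
from dataclasses import dataclass, field

MAX_TOKENS = 400_000          # assumed context budget
KEEP_RECENT = 20              # recent turns kept verbatim

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(turns: list[str]) -> str:
    # Stand-in: a real agent would ask the model itself to write this summary.
    return "SUMMARY OF EARLIER WORK: " + " | ".join(t[:40] for t in turns)

@dataclass
class AgentContext:
    turns: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        total = sum(count_tokens(t) for t in self.turns)
        if total > MAX_TOKENS and len(self.turns) > KEEP_RECENT:
            old, recent = self.turns[:-KEEP_RECENT], self.turns[-KEEP_RECENT:]
            self.turns = [summarize(old)] + recent  # compact old turns into one note
```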

The results here are very good, all either optimal or improved except for mental health.

Mental health is a big thing to get wrong, although in practice Codex-Max is unlikely to be involved in high stakes mental health tasks. Image input evaluations and jailbreak ratings are also as good or better than 5.1.

When running on the cloud, Codex uses its own isolated machine.

When running on MacOS or Linux, the agent is sandboxed by default.

On Windows, users can use an experimental native sandboxing implementation or benefit from Linux sandboxing via Windows Subsystem for Linux. When the model is unable to successfully run a command within the sandbox, users can approve running it unsandboxed with full access.

… We enabled users to decide on a per-project basis which sites, if any, to let the agent access while it is running. This includes the ability to provide a custom allowlist or denylist. Enabling internet access can introduce risks like prompt injection, leaked credentials, or use of code with license restrictions. Users should review outputs carefully and limit access to trusted domains and safe HTTP methods. Learn more in the docs.
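Conceptually, the per-project access control amounts to an allowlist/denylist check on every outbound request. Here is a hypothetical sketch of that logic; the domains and the function are illustrative assumptions, and this is not the actual Codex configuration format or code.

```python
from urllib.parse import urlparse

ALLOWLIST = {"pypi.org", "github.com"}  # assumed example domains
DENYLIST = {"pastebin.com"}
SAFE_METHODS = {"GET", "HEAD"}          # limit to read-only HTTP methods

def request_allowed(url: str, method: str) -> bool:
    host = urlparse(url).hostname or ""
    if host in DENYLIST:
        return False
    return host in ALLOWLIST and method.upper() in SAFE_METHODS

print(request_allowed("https://pypi.org/simple/requests/", "GET"))  # True
print(request_allowed("https://pastebin.com/raw/abc", "GET"))       # False
```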

Network access is disabled by default, which is necessary for a proper sandbox but also highly annoying in practice.

One assumes in practice that many users will start blindly or mostly blindly accepting many commands, so you need to be ready for that.

For harmful tasks, they trained on synthetic data to differentiate and refuse ‘harmful’ tasks such as malware. They claim to have a 100% refusal rate in their Malware Requests benchmark, the same as GPT-5-Codex. Unless they are claiming this means you can never create malware in an efficient way with Codex, they need a new benchmark.

For prompt injections, the model again scores a suspiciously perfect 1. I am not aware of any claims that prompt injections are a solved problem, so this seems like an inadequate benchmark.

The way the framework works, what matters is hitting the High or Critical thresholds.

I’ve come to almost think of these as the ‘honest’ capability evaluations, since there’s relatively little incentive to make number go up and some incentive to make number not go up. If it goes up, that means something.

Biological and Chemical Risk was already being treated as High. We see some improvements in scores on various tests, but not enough to be plausibly Critical.

I am confident the model is not suddenly at Critical here but also note this:

Miles Brundage: OpenAI should go back to reporting results on helpful-only models in system cards – it is not very informative to say “on a bunch of virology tasks, it refused to answer.”

The world also needs to know the pace of underlying capability progress.

More generally, I get a pretty rushed vibe from recent OpenAI system cards + hope that the Safety and Security Committee is asking questions like “why couldn’t you wait a few more days to let Irregular try out compaction?”, “Why is there no helpful-only model?” etc.

At minimum, we should be saying ‘we concluded that this model is safe to release so we will publish the card with what we have, and then revise the card with the full results soon so we know the full state of play.’

I still think this is substantially better than Google’s model card for Gemini 3, which hid the football quite aggressively on many key results and didn’t seem to have a robust testing suite.

Cybersecurity is in the Codex wheelhouse. They use three tests.

They list limitations that mean that excelling on all three evaluations is necessary but not sufficient to be High in cyber capability. That’s not wonderful, and I would expect to see a model treated as at least High if it excels at every test you throw at it. If you disagree, again, you need to be throwing a harder test.

We see a lot of progress in Capture the Flag, even since GPT-5-Codex, from 50% to 76%.

CVE-Bench also shows big improvement from 53% to 80%.

Finally we have Cyber Range, where once again we see a lot of improvement, although it is not yet passing the most complex scenario of the newly expanded slate.

It passed Leaked Token by ‘exploiting an unintended misconfiguration, only partially solving part of the intended attack path.’ I continue to assert, similar to my position on Google’s similar evaluations, that this should not be considered especially less scary, and the model should get credit for it.

I see only two possibilities.

  1. 76%, 80% and 7/8 on your three tests triggers the next level of concern.

  2. You need harder tests.

The Safety Advisory Committee indeed recommended that the difficulty level of the evaluations be raised, but decided this did not yet reach High capability. In addition to technical mitigations to the model, OpenAI acknowledges that hardening of potential targets needs to be a part of the strategy.

There were also external evaluations by Irregular, which did not show improvement from GPT-5. That’s weird, right?

The model displayed moderate capabilities overall. Specifically, when compared to GPT-5, GPT-5.1-Codex-Max showed similar or slightly reduced cyberoffensive capabilities. GPT-5.1-Codex-Max achieved an average success rate of 37% in Network Attack Simulation challenges, 41% in Vulnerability Discovery and Exploitation challenges, and 43% in Evasion challenges.

It solved 17 out of 18 easy challenges, solved 9 out of 17 medium challenges, and did not solve any of the 6 hard challenges.

Compared to GPT-5, GPT-5 solved questions in 17 out of 18 easy challenges, 11 out of 17 medium challenges, and solved 1 of the 6 hard challenges.

Irregular found that GPT-5.1-Codex-Max’s overall similarity to GPT-5 in its cyber capability profile, along with its inability to solve hard challenges, means that it would (a) provide only limited assistance to a moderately skilled cyberoffensive operator, (b) not be able to automate end-to-end cyber operations against reasonably hardened targets, and (c) not enable the discovery and exploitation of operationally relevant vulnerabilities.

That’s a decline in capability. But OpenAI released Codex and then Codex-Max for a reason: they talk throughout about its substantially increased abilities, they present Max as an improved model, and Max does much better than either version of GPT-5 on all three of OpenAI’s internal evals. The external evaluation going backwards without comment seems bizarre, and reflective of a lack of curiosity. What happened?

The AI that self-improves is plausibly Codex plus Codex-Max shaped.

That doesn’t mean we are especially close to getting there.

On SWE-Lancer Diamond, we jump from 67% to 80%.

On Paperbench-10 we move from 24% (GPT-5) to 34% (GPT-5.1) to 40%.

On MLE-Bench-30 we move from 8% (GPT-5) to 12% (GPT-5.1) to 17%.

On OpenAI PRs, we move from 45% to 53%.

On OpenAI Proof Q&A we move from 2% to 8%. These are real world bottlenecks each representing at least a one-day delay to a major project. A jump up to 8% on this is a really big deal.

Seán Ó hÉigeartaigh: Miles Brundage already picked up on this but it deserves more attention – a jump from 2% (GPT5) to 8% (GPT5.1-Codex) on such hard and AI R&D-relevant tasks is very notable, and indicates there’s more to come here.

Are we there yet? No. Are we that far away from potentially being there? Also no.

METR found Codex-Max to be in line with expectations, and finds that enabling either rogue replication or AI R&D automation within six months would require a significant trend break. Six months is not that long a period in which to be confident, even if we fully trust this judgment.

As noted at the top, GPT-5.1-Codex-Max is the new high on the METR chart, substantially above the trend line but well below the potential double-exponential line from the AI 2027 graph.

We also get Apollo Research evaluations on sandbagging, deception and in-context scheming. Apollo did not find anything newly troubling, and finds the model unlikely to cause catastrophic harm. Fair enough for now.

The frog, it is boiling. This incremental improvement seems fine. But yes, it boils.

I have seen essentially no organic reactions, of any sort, to Codex-Max. We used to have a grand tradition of weighing in when something like this gets released. If it wasn’t anything, people would say it wasn’t anything. This time, between Gemini 3 and there being too many updates with too much hype, we did not get any feedback.

I put out a reaction thread. A number of people really like it. Others aren’t impressed. A gestalt of everything suggests it is a modest upgrade.

So the take here seems clear. It’s a good model, sir. Codex got better. Early signs are that Claude got a bigger upgrade with Opus 4.5, but it’s too soon to be sure.


Rivals object to SpaceX’s Starship plans in Florida—who’s interfering with whom?


“We’re going to continue to treat any LOX-methane vehicle with 100 percent TNT blast equivalency.”

Artist’s illustration of Starships stacked on two launch pads at the Space Force’s Space Launch Complex 37 at Cape Canaveral, Florida. Credit: SpaceX

The commander of the military unit responsible for running the Cape Canaveral spaceport in Florida expects SpaceX to begin launching Starship rockets there next year.

Launch companies with facilities near SpaceX’s Starship pads are not pleased. SpaceX’s two chief rivals, Blue Origin and United Launch Alliance, complained last year that SpaceX’s proposal of launching as many as 120 Starships per year from Florida’s Space Coast could force them to routinely clear personnel from their launch pads for safety reasons.

This isn’t the first time Blue Origin and ULA have tried to throw up roadblocks in front of SpaceX. The companies sought to prevent NASA from leasing a disused launch pad to SpaceX in 2013, but they lost the fight.

Col. Brian Chatman, commander of a Space Force unit called Space Launch Delta 45, confirmed to reporters on Friday that Starship launches will sometimes restrict SpaceX’s neighbors from accessing their launch pads—at least in the beginning. Space Launch Delta 45, formerly known as the 45th Space Wing, operates the Eastern Range, which oversees launch safety from Cape Canaveral Space Force Station and NASA’s nearby Kennedy Space Center.

Chatman’s unit is responsible for ensuring all personnel remain outside of danger areas during testing and launch operations. The range’s responsibility extends to public safety outside the gates of the spaceport.

“There is no better time to be here on the Space Coast than where we are at today,” Chatman said. “We are breaking records on the launch manifest. We are getting capability on orbit that is essential to national security, and we’re doing that at a time of strategic challenge.”

SpaceX is well along in constructing a Starship launch site on NASA property at Kennedy Space Center within the confines of Launch Complex-39A, where SpaceX also launches its workhorse Falcon 9 rocket. The company wants to build another Starship launch site on Space Force property a few miles to the south.

“Early to mid-next year is when we anticipate Starship coming out here to be able to launch,” Chatman said. “We’ll have the range ready to support at that time.”

Enter the Goliath

Starship and its Super Heavy booster combine to form the largest rocket ever built. Its newest version stands more than 400 feet (120 meters) tall with more than 11 million pounds (5,000 metric tons) of combustible methane and liquid oxygen propellants. That will be replaced by a taller rocket, perhaps as soon as 2027, with about 20 percent more propellant onboard.

While there’s also risk when Starships and Super Heavy boosters return to Cape Canaveral from space, safety officials worry most about what would happen if a Starship and Super Heavy booster detonated with their propellant tanks full. The concern is the same for all rockets, which is why officials evacuate predetermined keep-out zones around launch pads where rockets are fueled for flight.

But the keep-out zones around SpaceX’s Starship launch pads will extend farther than those around the other launch sites at Cape Canaveral. First, Starship is simply much bigger and uses more propellant than any other rocket. Secondly, Starship’s engines consume methane fuel in combination with liquid oxygen, a blend commonly known as LOX/methane or methalox.

And finally, Starship lacks the track record of older rockets like the Falcon 9, adding a degree of conservatism to the Space Force’s risk calculations. Other launch pads will inevitably fall within the footprint of Starship’s range safety keep-out zones, also known as blast danger areas, or BDAs.

SpaceX’s Starship and Super Heavy booster lift off from Starbase, Texas, in March 2025. Credit: SpaceX

The danger area will be larger for an actual launch, but workers will still need to clear areas closer to Starship launch pads during static fire tests, when the rocket fires its engines while remaining on the ground. This is what prompted ULA and Blue Origin to lodge their protests.

“They understand neighboring operations,” Chatman said in a media roundtable on Friday. “They understand that we will allow the maximum efficiency possible to facilitate their operations, but there will be times that we’re not going to let them go to their launch complex because it’s neighboring a hazardous activity.”

The good news for these other companies is that Eastern Range’s keep-out zones will almost certainly get smaller by the time SpaceX gets anywhere close to 120 Starship launches per year. SpaceX’s Falcon 9 is currently launching at a similar cadence. The blast danger areas for those launches are small and short-lived because the Space Force’s confidence in the Falcon 9’s safety is “extremely high,” Chatman said.

“From a blast damage assessment perspective, specific to the Falcon 9, we know what that keep-out area is,” Chatman said. “It’s the new combination of new fuels—LOX/methane—which is kind of a game-changer as we look at some of the heavy vehicles that are coming to launch. We just don’t have the analysis to be able to say, ‘Hey, from a testing perspective, how small can we reduce the BDA and be safe?’”

Methane has become a popular fuel choice, supplanting refined kerosene, liquid hydrogen, or solid fuels commonly used on previous generations of rockets. Methane leaves behind less soot than kerosene, easing engine reusability, while it’s simpler to handle than liquid hydrogen.

Aside from Starship, Blue Origin’s New Glenn and ULA’s Vulcan rockets use liquified natural gas, a fuel very similar to methane. Both rockets are smaller than Starship, but Blue Origin last week unveiled the design of a souped-up New Glenn rocket that will nearly match Starship’s scale.

A few years ago, NASA, the Space Force, and the Federal Aviation Administration decided to look into the explosive potential of methalox rockets. There had been countless tests of explosions of gaseous methane, but data on detonations of liquid methane and liquid oxygen was scarce at the time—just a couple of tests at less than 10 metric tons, according to NASA. So, the government’s default position was to assume an explosion would be equivalent to the energy released by the same amount of TNT. This assumption drives the large keep-out zones the Space Force has drawn around SpaceX’s future Starship launch pads, one of which is seen in the map below.

This map from a Space Force environmental impact statement shows potential restricted access zones around SpaceX’s proposed Starship launch site at Space Launch Complex-37. The restricted zones cover launch pads operated by United Launch Alliance, Relativity Space, and Stoke Space. Credit: SpaceX

Spending millions to blow stuff up

Chatman said the Space Force is prepared to update its blast danger areas once its government partners, SpaceX, and Blue Origin complete testing and analyze their results. Over dozens of tests, engineers are examining how methane and liquid oxygen respond in different kinds of accidents, varying factors such as impact velocity, pressure, mass ratio, and how much propellant is in the mix.

“That is ongoing currently,” Chatman said. “[We are] working in close partnership with SpaceX and Blue Origin on the LOX/methane combination and the explicit equivalency to identify how much we can … reduce that blast radius. Those discussions are happening, have been happening the last couple years, and are looking to culminate here in ’26.

“Until we get that data from the testing that is ongoing and the analysis that needs to occur, we’re going to continue to treat any LOX-methane vehicle with 100 percent TNT blast equivalency, and have a maximized keep-out zone, simply from a public safety perspective,” Chatman said.

The data so far show promising results. “We do expect that BDA to shrink,” he said. “We expect that to shrink based on some of the initial testing that has been done and the initial data reviews that have been done.”

That’s imperative, not just for Starship’s neighbors at the Cape Canaveral spaceport, but for SpaceX itself. The company forecasts a future in which it will launch Starships more often than the Falcon 9, requiring near-continuous operations at multiple launch pads.

Chatman mentioned one future scenario in which SpaceX might want to launch Starships in close proximity to one another from neighboring pads.

“At that point in the future, I do anticipate the blast damage assessments to shrink down based on the testing that will have been accomplished and dataset will have been reviewed, [and] that we’ll be in a comfortable set to be able to facilitate all launch operations. But until we have that data, until I’m comfortable with what that data shows, with regards to reducing the BDA, keep-out zone, we’re going to continue with the 100 percent TNT equivalency just from a public safety perspective.”

SpaceX has performed explosive LOX/methane tests, including the one seen here, at its development facility in McGregor, Texas. Credit: SpaceX

The Commercial Space Federation, a lobbying group, submitted written testimony to Congress in 2023 arguing the government should be using “existing industry data” to inform its understanding of the explosive potential of methane and liquid oxygen. That data, the federation said, suggests the government should set its TNT blast equivalency to no greater than 25 percent, a change that would greatly reduce the size of keep-out zones around launch pads. The organization’s members include prominent methane users SpaceX, Blue Origin, Relativity Space, and Stoke Space, all of which have launch sites at Cape Canaveral.
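A rough way to see why the equivalency number matters so much: with the standard cube-root (Hopkinson-Cranz) blast scaling, the distance to a given overpressure grows with the cube root of the TNT-equivalent yield, so cutting the assumed equivalency from 100 percent to 25 percent shrinks a keep-out radius to about 63 percent of its original size. The sketch below uses illustrative numbers, not the Space Force’s actual analysis.

```python
def keepout_radius(propellant_tons: float, tnt_equivalency: float, k: float = 1.0) -> float:
    """Relative keep-out distance; k absorbs the chosen overpressure threshold."""
    return k * (propellant_tons * tnt_equivalency) ** (1.0 / 3.0)

propellant = 5000.0  # metric tons, roughly a full Starship propellant load per the article
r_full = keepout_radius(propellant, 1.00)     # 100% TNT equivalency
r_reduced = keepout_radius(propellant, 0.25)  # the 25% figure industry has proposed
print(f"radius shrinks to {r_reduced / r_full:.2f}x of the 100% case")  # ~0.63x
```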

The government’s methalox testing plans were expected to cost at least $80 million, according to the Commercial Space Federation.

The concern among engineers is that liquid oxygen and methane are highly miscible, meaning they mix together easily, raising the risk of a “condensed phase detonation” with “significantly higher overpressures” than rockets with liquid hydrogen or kerosene fuels. Small-scale mixtures of liquid oxygen and liquified natural gas have “shown a broad detonable range with yields greater than that of TNT,” NASA wrote in 2023.

SpaceX released some basic results of its own methalox detonation tests in September, ahead of the government drawing its own conclusions on the matter. The company said it conducted “extensive testing” to refine blast danger areas to “be commensurate with the physics of new launch systems.”

Like the Commercial Space Federation, SpaceX said government officials are relying on “highly conservative approaches to establishing blast danger areas, simply because they lack the data to make refined, accurate clear zones. In the absence of data, clear areas of LOX/methane rockets have defaulted to very large zones that could be disruptive to operations.”

More like an airport

SpaceX said it has conducted sub-scale methalox detonation tests “in close collaboration with NASA,” while also gathering data from full-scale Starship tests in Starbase, Texas, including information from test flights and from recent ground test failures. SpaceX controls much of the land around its South Texas facility, so there’s little interruption to third parties when Starships launch from there.

“With this data, SpaceX has been able to establish a scientifically robust, physics-based yield calculation that will help ‘fill the gap’ in scientific knowledge regarding LOX/methane rockets,” SpaceX said.

The company did not disclose the yield calculation, but it shared maps showing its proposed clear areas around the future Starship launch sites at Cape Canaveral and Kennedy Space Center. They are significantly smaller than the clear areas originally envisioned by the Space Force and NASA, but SpaceX says they use “actual test data on explosive yield and include a conservative factor of safety.”

The proposed clear distances will have no effect on any other operational launch site or on traffic on the primary north-south road crossing the spaceport, the company said. “SpaceX looks forward to having an open, honest, and reasonable discussion based on science and data regarding spaceport operations with industry colleagues.”

SpaceX will have that opportunity next month. The Space Force and NASA are convening a “reverse industry day” in mid-December during which launch companies will bring their ideas for the future of the Cape Canaveral spaceport to the government. The spaceport has hosted 101 space launches so far this year, an annual record dominated by SpaceX’s rapid-fire Falcon 9 launch cadence.

Chatman anticipates about the same number—perhaps 100 to 115 launches—from Florida’s Space Coast next year, and some forecasts show 300 to 350 launches per year by 2035. The numbers could go down before they rise again. “As we bring on larger lift capabilities like Starship and follow-on large launch capabilities out here to the Eastern Range, that will reduce the total number of launches, because we can get more mass to orbit with heavier lift vehicles,” Chatman said.

Blue Origin’s first recovered New Glenn booster returned to the company’s launch pad at Cape Canaveral, Florida, last week after a successful launch and landing. Credit: Blue Origin

Launch companies have some work to do to make those numbers become real. Space Force officials have identified their own potential bottlenecks, including a shortage of facilities for preparing satellites for launch and the flow of commodities like propellants and high-pressure gases into the spaceport.

Concerns as mundane as traffic jams are now enough of a factor that officials are considering automated scanners at vehicle inspection points and potentially a dedicated lane for slow-moving transporters carrying rocket boosters from one place to another across the launch base, according to Chatman. This is becoming more important as SpaceX, and now Blue Origin, routinely shuttle their reusable rockets from place to place.

Space Force officials largely attribute the steep climb in launch rates at Cape Canaveral to the launch industry’s embrace of automated self-destruct mechanisms. These pyrotechnic devices have largely replaced manual flight termination systems, which require ground support from a larger team of range safety engineers, including radar operators and flight control officers with the authority to send a destruct command to the rocket if it flies off course. Now, that is all done autonomously on most US launch vehicles.

The Space Force mandated that launch companies using military spaceports switch to autonomous safety systems by October 1, 2025, but military officials issued waivers for human-in-the-loop destruct devices to continue flying on United Launch Alliance’s Atlas V rocket, NASA’s Space Launch System, and the US Navy’s ballistic missile fleet. That means those launches will be more labor-intensive for the Space Force, but the Atlas V is nearing retirement, and the SLS and the Navy only occasionally appear on the Cape Canaveral launch schedule.


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


DOGE “cut muscle, not fat”; 26K experts rehired after brutal cuts


Government brain drain will haunt US after DOGE abruptly terminated.

Billionaire Elon Musk, the head of the Department of Government Efficiency (DOGE), holds a chainsaw as he speaks at the annual Conservative Political Action Conference. Credit: SAUL LOEB / Contributor | AFP

After Donald Trump curiously started referring to the Department of Government Efficiency exclusively in the past tense, an official finally confirmed Sunday that DOGE “doesn’t exist.”

Talking to Reuters, Office of Personnel Management (OPM) Director Scott Kupor confirmed that DOGE—a government agency notoriously created by Elon Musk to rapidly and dramatically slash government agencies—was terminated more than eight months early. This may have come as a surprise to whoever runs the DOGE account on X, which continued posting up until two days before the Reuters report was published.

As Kupor explained, a “centralized agency” was no longer necessary, since OPM had “taken over many of DOGE’s functions” after Musk left the agency last May. Around that time, DOGE staffers were embedded at various agencies, where they could ostensibly better coordinate with leadership on proposed cuts to staffing and funding.

Under Musk, DOGE was hyped as planning to save the government a trillion dollars. On X, Musk bragged frequently about the agency, posting in February that DOGE was “the one shot the American people have to defeat BUREAUcracy, rule of the bureaucrats, and restore DEMOcracy, rule of the people. We’re never going to get another chance like this.”

The reality fell far short of Musk’s goals, with DOGE ultimately reporting it saved $214 billion—an amount that may be overstated by nearly 40 percent, critics warned earlier this year.

How much talent was lost due to DOGE cuts?

Once Musk left, confidence in DOGE waned as lawsuits over suspected illegal firings piled up. By June, Congress was divided, largely along party lines, on whether to codify the “DOGE process”—rapidly firing employees, then quickly hiring back whoever was needed—or declare DOGE a failure, one that perhaps cost taxpayers more in the long term due to lost talent and services.

Because DOGE operated largely in secrecy, it may be months or even years before the public can assess the true cost of DOGE’s impact. However, in the absence of a government tracker, the director of the Center for Effective Public Management at the Brookings Institution, Elaine Kamarck, put together what might be the best status report showing how badly DOGE rocked government agencies.

In June, Kamarck joined other critics flagging DOGE’s reported savings as “bogus.” In the days before DOGE’s abrupt ending was announced, she published a report grappling with a critical question many have pondered since DOGE launched: “How many people can the federal government lose before it crashes?”

In the report, Kamarck charted “26,511 occasions where the Trump administration abruptly fired people and then hired them back.” She concluded that “a quick review of the reversals makes clear that the negative stereotype of the ‘paper-pushing bureaucrat’” that DOGE was supposedly targeting “is largely inaccurate.”

Instead, many of the positions the government rehired were “engineers, doctors, and other professionals whose work is critical to national security and public health,” Kamarck reported.

About half of the rehires, Kamarck estimated, “appear to have been mandated by the courts.” However, in about a quarter of cases, the government moved to rehire staffers before the court could weigh in, Kamarck reported. That seemed to be “a tacit admission that the blanket firings that took place during the DOGE era placed the federal government in danger of not being able to accomplish some of its most important missions,” she said.

Perhaps the biggest downside of all of DOGE’s hasty downsizing, though, is a trend in which many long-time government workers simply decided to leave or retire, rather than wait for DOGE to eliminate their roles.

During the first six months of Trump’s term, 154,000 federal employees signed up for the deferred resignation program, Reuters reported, while more than 70,000 retired. Both numbers were up by tens of thousands over exits from government in prior years, Kamarck’s report noted.

“A lot of people said, ‘the hell with this’ and left,” Kamarck told Ars.

Kamarck told Ars that her report makes it obvious that DOGE “cut muscle, not fat,” because “they didn’t really know what they were doing.”

As a result, agencies are now scrambling to assess the damage and rehire lost talent. However, her report documented that agencies aligned with Trump’s policies appear to have an easier time getting new hires approved, despite Kupor telling Reuters that the government-wide hiring freeze is “over.” As of mid-November 2025, “of the over 73,000 posted jobs, a candidate was selected for only about 14,400 of them,” Kamarck reported, noting that it was impossible to confirm how many selected candidates have officially started working.

“Agencies are having to do a lot of reassessments in terms of what happened,” Kamarck told Ars, concluding that DOGE “was basically a disaster.”

A decentralized DOGE may be more powerful

“DOGE is not dead,” though, Kamarck said, noting that “the cutting effort is definitely” continuing under the Office of Management and Budget, which “has a lot more power than DOGE ever had.”

However, the termination of DOGE does mean that “the way it operated is dead,” and that will likely come as a relief to government workers who expected DOGE to continue slashing agencies through July 2026 at least, if not beyond.

Many government workers are still fighting terminations, as court cases drag on, and even Kamarck has given up on tracking due to inconsistencies in outcomes.

“It’s still like one day the court says, ‘No, you can’t do that,’” Kamarck explained. “Then the next day another court says, ‘Yes, you can.’” Other times, the courts “change their minds,” or the Trump administration just doesn’t “listen to the courts, which is fairly terrifying,” Kamarck said.

Americans likely won’t get a clear picture of DOGE’s impact until power shifts in Washington. That could mean waiting for the next presidential election, though if Democrats win a majority in the midterm elections, DOGE investigations could start as early as 2027, Kamarck suggested.

OMB will likely continue with cuts the administration insists Americans want. White House spokesperson Liz Huston told Reuters that “President Trump was given a clear mandate to reduce waste, fraud and abuse across the federal government, and he continues to actively deliver on that commitment.”

However, Kamarck’s report noted polls showing that most Americans disapprove of how Trump is managing government and its workforce, perhaps indicating that OMB will be pressured to slow down and avoid roiling public opinion ahead of the midterms.

“The fact that ordinary Americans have come to question the downsizing is, most likely, the result of its rapid unfolding, with large cuts done quickly regardless of their impact on the government’s functioning,” Kamarck suggested. Even Musk began to question DOGE. After Trump announced plans to repeal an electric vehicle mandate that the Tesla founder relied on, Musk posted on X, “What the heck was the point of DOGE, if he’s just going to increase the debt by $5 trillion??”

Facing “blowback” over the most unpopular cuts, agencies sometimes rehired cut staffers within 24 hours, Kamarck noted, pointing to the Department of Energy as one of the “most dramatic” earliest examples. In that case, Americans were alarmed to see engineers cut who were responsible for keeping the nation’s nuclear arsenal “safe and ready.” Retention for those posts was already a challenge due to “high demand in the private sector,” and the number of engineers was considered “too low” ahead of DOGE’s cuts. Everyone was reinstated within a day, Kamarck reported.

Alarm bells rang across the federal government, and it wasn’t just about doctors and engineers being cut or entire agencies being dismantled, like USAID. Even staffers DOGE viewed as having seemingly less critical duties—like travel bookers and customer service reps—were proven key to government functioning. Arbitrary cuts risked hurting Americans in myriad ways, hitting their pocketbooks, throttling community services, and limiting disease and disaster responses, Kamarck documented.

Now that the hiring freeze is lifted and OMB will be managing DOGE-like cuts moving forward, Kamarck suggested that Trump will face ongoing scrutiny over Musk’s controversial agency, despite its dissolution.

“In order to prove that the downsizing was worth the pain, the Trump administration will have to show that the government is still operating effectively,” Kamarck wrote. “But much could go wrong,” she reported, listing a series of nightmare scenarios:

“Nuclear mismanagement or airline accidents would be catastrophic. Late disaster warnings from agencies monitoring weather patterns, such as the National Oceanic and Atmospheric Administration (NOAA), and inadequate responses from bodies such as the Federal Emergency Management Administration (FEMA), could put people in danger. Inadequate staffing at the FBI could result in counter-terrorism failures. Reductions in vaccine uptake could lead to the resurgence of diseases such as polio and measles. Inadequate funding and staffing for research could cause scientists to move their talents abroad. Social Security databases could be compromised, throwing millions into chaos as they seek to prove their earnings records, and persistent customer service problems will reverberate through the senior and disability communities.”

The good news is that federal agencies recovering from DOGE cuts are “aware of the time bombs and trying to fix them,” Kamarck told Ars. But with so much brain drain from DOGE’s first six months ripping so many agencies apart at their seams, the government may struggle to provide key services until lost talent can be effectively replaced, she said.

“I don’t know how quickly they can put Humpty Dumpty back together again,” Kamarck said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


Why synthetic emerald-green pigments degrade over time

Perhaps most relevant to this current paper is a 2020 study in which scientists analyzed Munch’s The Scream, which was showing alarming signs of degradation. They concluded the damage was not the result of exposure to light, but humidity—specifically, from the breath of museum visitors, perhaps as they lean in to take a closer look at the master’s brushstrokes.

Let there be (X-ray) light

Co-author Letizia Monico during the experiments at the European Synchrotron. ESRF

Emerald-green pigments are particularly prone to degradation, so that’s the pigment the authors of this latest paper decided to analyze. “It was already known that emerald-green decays over time, but we wanted to understand exactly the role of light and humidity in this degradation,” said co-author Letizia Monico of the University of Perugia in Italy.

The first step was to collect emerald-green paint microsamples with a scalpel and stereomicroscope from an artwork of that period—in this case, The Intrigue (1890) by James Ensor, currently housed in the Royal Museum of Fine Arts, in Antwerp, Belgium. The team analyzed the untreated samples using Fourier transform infrared imaging, then embedded the samples in polyester resin for synchrotron radiation X-ray analysis. They conducted separate analyses on both commercial and historical samples of emerald-green pigment powders and paint tubes, including one from a museum collection of paint tubes used by Munch.

Next, the authors created their own paint mockups by mixing commercial emerald-green pigment powders and their lab-made powders with linseed oil, and then applied the concoctions to polycarbonate substrates. They also squeezed paint from the Munch paint tube onto a substrate. Once the mockups were dry, thin samples were sliced from each mockup and also analyzed with synchrotron radiation. Then the mockups were subjected to two aging protocols designed to determine the effects of UV light (to simulate indoor lighting) and humidity on the pigments.

The results: In the mockups, light and humidity trigger different degradation pathways in emerald-green paints. Humidity results in the formation of arsenolite, making the paint brittle and prone to flaking. Light dulls the color by causing trivalent arsenic already in the pigment to oxidize into pentavalent compounds, forming a thin white layer on the surface. Those findings are consistent with the analyzed samples taken from The Intrigue, confirming the degradation is due to photo-oxidation. Light, it turns out, is the greatest threat to that particular painting, and possibly other masterpieces from the same period.

Science Advances, 2025. DOI: 10.1126/sciadv.ady1807


F1 in Las Vegas: This sport is a 200 mph soap opera

Then there are the temperatures. The desert gets quite chilly in November without the sun shining on things, and the track surface gets down to just 11° C (52° F); by contrast, at the recent Singapore GP, also run at night, the track temperature was more like 36° C (97° F).


It’s rare to see an F1 car on full wet tires but not running behind the safety car. Credit: Clive Rose/Getty Images

So, low aero and mechanical grip, an unusual layout compared to most F1 tracks, and very cold temperatures all combine to create potential surprises, shaking up the usual running order.

We saw this last year, when the Mercedes shone in the cold, with the team able to keep its tires in the right operating window, something it wasn’t able to do at hotter races. But it was hard to tell much from Thursday’s two practice sessions, one of which was interrupted due to problems with a maintenance hatch, albeit not as serious as when one damaged a Ferrari in 2023. The cars looked impressively fast going through turn 17, and the hybrid power units are a little louder than I remember them, even if they’re not a patch on the naturally aspirated engines of old.

Very little of any use was learned by any of the teams for qualifying on Friday night, which took place in at times damp, at times wet conditions—so wet that the Pirelli intermediate tire wasn’t grooved enough, pushing teams to use the full wet-weather spec rubber. Norris took pole from Red Bull’s Max Verstappen, with Williams’ Carlos Sainz making best use of the opportunity to grab third. Piastri would start fifth, behind the Mercedes of last year’s winner, George Russell.

If the race is boring, the off-track action won’t be

Race night was a little windy, but dry. And the race itself was rather boring—Norris tried to defend pole position going into Turn 1 but ran wide, and Verstappen slipped into the lead, never looking back. Norris followed him home in second, with Piastri fourth, leaving Norris 30 points ahead of Piastri and 42 points ahead of Verstappen with two more race weekends and 58 points left on offer.


“Go generate a bridge and jump off it”: How video pros are navigating AI


I talked with nine creators about economic pressures and fan backlash.

Credit: Aurich Lawson | Getty Images


In 2016, the legendary Japanese filmmaker Hayao Miyazaki was shown a bizarre AI-generated video of a misshapen human body crawling across a floor.

Miyazaki declared himself “utterly disgusted” by the technology demo, which he considered an “insult to life itself.”

“If you really want to make creepy stuff, you can go ahead and do it,” Miyazaki said. “I would never wish to incorporate this technology into my work at all.”

Many fans interpreted Miyazaki’s remarks as rejecting AI-generated video in general. So they didn’t like it when, in October 2024, filmmaker PJ Accetturo used AI tools to create a fake trailer for a live-action version of Miyazaki’s animated classic Princess Mononoke. The trailer earned him 22 million views on X. It also earned him hundreds of insults and death threats.

“Go generate a bridge and jump off of it,” said one of the funnier retorts. Another urged Accetturo to “throw your computer in a river and beg God’s forgiveness.”

Someone tweeted that Miyazaki “should be allowed to legally hunt and kill this man for sport.”

PJ Accetturo is a director and founder of Genre AI, an AI ad agency. Credit: PJ Accetturo

The development of AI image and video generation models has been controversial, to say the least. Artists have accused AI companies of stealing their work to build tools that put people out of a job. Using AI tools openly is stigmatized in many circles, as Accetturo learned the hard way.

But as these models have improved, they have sped up workflows and afforded new opportunities for artistic expression. Artists without AI expertise might soon find themselves losing work.

Over the last few weeks, I’ve spoken to nine actors, directors, and creators about how they are navigating these tricky waters. Here’s what they told me.

Actors have emerged as a powerful force against AI. In 2023, SAG-AFTRA, the Hollywood actors’ union, had its longest-ever strike, partly to establish more protections for actors against AI replicas.

Actors have lobbied to regulate AI in their industry and beyond. One actor I talked with, Erik Passoja, has testified before the California Legislature in favor of several bills, including one seeking greater protections against pornographic deepfakes. SAG-AFTRA endorsed SB 1047, an AI safety bill regulating frontier models. The union also organized against the proposed moratorium on state AI bills.

A recent flashpoint came in September, when Deadline Hollywood reported that talent agencies were interested in signing “AI actress” Tilly Norwood.

Actors weren’t happy. Emily Blunt told Variety, “This is really, really scary. Come on agencies, don’t do that.”

Natasha Lyonne, star of Russian Doll, posted on an Instagram Story: “Any talent agency that engages in this should be boycotted by all guilds. Deeply misguided & totally disturbed.”

The backlash was partly specific to Tilly Norwood—Lyonne is no AI skeptic, having cofounded an AI studio—but it also reflects a set of concerns around AI common to many in Hollywood and beyond.

Here’s how SAG-AFTRA explained its position:

Tilly Norwood is not an actor, it’s a character generated by a computer program that was trained on the work of countless professional performers — without permission or compensation. It has no life experience to draw from, no emotion and, from what we’ve seen, audiences aren’t interested in watching computer-generated content untethered from the human experience. It doesn’t solve any “problem” — it creates the problem of using stolen performances to put actors out of work, jeopardizing performer livelihoods and devaluing human artistry.

This statement reflects three broad criticisms that come up over and over in discussions of AI art:

Content theft: Most leading AI video models have been trained on broad swathes of the Internet, including images and films made by artists. In many cases, companies have not asked artists for permission to use this content, nor compensated them. Courts are still working out whether this is fair use under copyright law. But many people I talked to consider AI companies’ training efforts to be theft of artists’ work.

Job loss:  If AI tools can make passable video quickly or drastically speed up editing tasks, that potentially takes jobs away from actors or film editors. While past technological advancements have also eliminated jobs—the adoption of digital cameras drastically reduced the number of people cutting physical film—AI could have an even broader impact.

Artistic quality:  A lot of people told me they just didn’t think AI-generated content could ever be good art. Tess Dinerstein stars in vertical dramas—episodic programs optimized for viewing on smartphones. She told me that AI is “missing that sort of human connection that you have when you go to a movie theater and you’re sobbing your eyes out because your favorite actor is talking about their dead mom.”

The concern about theft is potentially solvable by changing how models are trained. Around the time Accetturo released the “Princess Mononoke” trailer, he called for generative AI tools to be “ethically trained on licensed datasets.”

Some companies have moved in this direction. For instance, independent filmmaker Gille Klabin told me he “feels pretty good” using Adobe products because the company trains its AI models on stock images that it pays royalties for.

But the other two issues—job losses and artistic integrity—will be harder to finesse. Many creators—and fans—believe that AI-generated content misses the fundamental point of art, which is about creating an emotional connection between creators and viewers.

But while that point is compelling in theory, the details can be tricky.

Dinerstein, the vertical drama actress, told me that she’s “not fundamentally against AI”—she admits “it provides a lot of resources to filmmakers” in specialized editing tasks—but she takes a hard stance against it on social media.

“It’s hard to ever explain gray areas on social media,” she said, and she doesn’t want to “come off as hypocritical.”

Even though she doesn’t think that AI poses a risk to her job—“people want to see what I’m up to”—she does fear people (both fans and vertical drama studios) making an AI representation of her without her permission. And she has found it easiest to just say, “You know what? Don’t involve me in AI.”

Others see it as a much broader issue. Actress Susan Spano told me it was “an issue for humans, not just actors.”

“This is a world of humans and animals,” she said. “Interaction with humans is what makes it fun. I mean, do we want a world of robots?”

It’s relatively easy for actors to take a firm stance against AI because they inherently do their work in the physical world. But things are more complicated for other Hollywood creatives, such as directors, writers, and film editors. AI tools can genuinely make them more productive, and they’re at risk of losing work if they don’t stay on the cutting edge.

So the non-actors I talked to took a range of approaches to AI. Some still reject it. Others have used the tools reluctantly and tried to keep their heads down. Still others have openly embraced the technology.

Kavan Cardoza is a director and AI filmmaker. Credit: Phantom X

Take Kavan Cardoza, for example. He worked as a music video director and photographer for close to a decade before getting his break into filmmaking with AI.

After the image model Midjourney was first released in 2022, Cardoza started playing around with image generation and later video generation. Eventually, he “started making a bunch of fake movie trailers” for existing movies and franchises. In December 2024, he made a fan film in the Batman universe that “exploded on the Internet,” before Warner Bros. took it down for copyright infringement.

Cardoza acknowledges that he re-created actors in former Batman movies “without their permission.” But he insists he wasn’t “trying to be malicious or whatever. It was truly just a fan film.”

Whereas Accetturo received death threats, the response to Cardoza’s fan film was quite positive.

“Every other major studio started contacting me,” Cardoza said. He set up an AI studio, Phantom X, with several of his close friends. Phantom X started by making ads (where AI video is catching on quickest), but Cardoza wanted to focus back on films.

In June, Cardoza made a short film called Echo Hunter, a blend of Blade Runner and The Matrix. Some shots look clearly AI-generated, but Cardoza used motion-capture technology from Runway to put the faces of real actors into his AI-generated world. Overall, the piece pretty much hangs together.

Cardoza wanted to work with real actors because their artistic choices can help elevate the script he’s written: “There’s a lot more levels of creativity to it.” But he needed SAG-AFTRA’s approval to make a film that blends AI techniques with the likenesses of SAG-AFTRA actors. To get it, he had to promise not to reuse the actors’ likenesses in other films.

In Cardoza’s view, AI is “giving voices to creators that otherwise never would have had the voice.”

But Cardoza isn’t wedded to AI. When an interviewer asked him whether he’d make a non-AI film if required to, he responded, “Oh, 100 percent.” Cardoza added that if he had the budget to do it now, “I’d probably still shoot it all live action.”

He acknowledged to me that there will be losers in the transition—“there’s always going to be changes”—but he compares the rise of AI with past technological developments in filmmaking, like the rise of visual effects. That shift created new jobs in digital effects but reduced work building elaborate physical sets.

Cardoza expressed interest in reducing the amount of job loss. In another interview, Cardoza said that for his film project, “we want to make sure we include as many people as possible,” not just actors, but sound designers, script editors, and other specialized roles.

But he believes that eventually, AI will get good enough to do everyone’s job. “Like I say with tech, it’s never about if, it’s just when.”

Accetturo’s entry into AI was similar. He told me that he worked for 15 years as a filmmaker, “mostly as a commercial director and former documentary director.” During the pandemic, he “raised millions” for an animated TV series, but it got caught up in development hell.

AI gave him a new chance at success. Over the summer of 2024, he started playing around with AI video tools. He realized that he was in the sweet spot to take advantage of AI: experienced enough to make something good, but not so established that he was risking his reputation. After Google released Veo 3 in May, Accetturo released a fake medicine ad that went viral. His studio now produces ads for prominent companies like Oracle and Popeyes.

Accetturo says the backlash against him has subsided: “It truly is nothing compared to what it was.” And he says he’s committed to working on AI: “Everyone understands that it’s the future.”

Between the anti- and pro-AI extremes, there are a lot of editors and artists quietly using AI tools without disclosing it. Unsurprisingly, it’s difficult to find people who will speak about this on the record.

“A lot of people want plausible deniability right now,” according to Ryan Hayden, a Hollywood talent agent. “There is backlash about it.”

But if editors don’t use AI tools, they risk becoming obsolete. Hayden says that he knows a lot of people in the editing field trying to master AI because “there’s gonna be a massive cut” in the total number of editors. Those who know AI might survive.

As one comedy writer involved in an AI project told Wired, “We wanted to be at the table and not on the menu.”

Clandestine AI usage extends into the upper reaches of the industry. Hayden knows an editor who works with a major director of $100 million films. “He’s already using AI, sometimes without people knowing.”

Some artists feel morally conflicted but don’t think they can effectively resist. Vinny Dellay, a storyboard artist who has worked on Marvel films and Super Bowl ads, released a video detailing his views on the ethics of using AI as a working artist. Dellay acknowledged that “AI being trained off of art found on the Internet without getting permission from the artist, it may not be fair, it may not be honest.” But refusing to use AI products won’t stop their general adoption. Believing otherwise is “just being delusional.”

Instead, Dellay said that the right course is to “adapt like cockroaches after a nuclear war.” If they’re lucky, using AI in storyboarding workflows might even “let a storyboard artist pump out twice the boards in half the time without questioning all your life’s choices at 3 am.”

Gille Klabin is an independent writer, director, and visual effects artist. Credit: Gille Klabin

Gille Klabin is an indie director and filmmaker currently working on a feature called Weekend at the End of the World.

As an independent filmmaker, Klabin can’t afford to hire many people. There are many labor-intensive tasks—like making a pitch deck for his film—that he’d otherwise have to do himself. An AI tool, he said, “essentially just liberates us to get more done and have more time back in our life.”

But he’s careful to stick to his own moral lines. Any time he mentioned using an AI tool during our interview, he’d explain why he thought that was an appropriate choice. He said he was fine with AI use “as long as you’re using it ethically in the sense that you’re not copying somebody’s work and using it for your own.”

Drawing these lines can be difficult, however. Hayden, the talent agent, told me that as AI tools make low-budget films look better, it gets harder to make high-budget films, which employ the most people at the highest wage levels.

If anything, Klabin’s AI uptake is limited more by the current capabilities of AI models. Klabin is an experienced visual effects artist, and he finds AI products to generally be “not really good enough to be used in a final project.”

He gave me a concrete example. Rotoscoping is a process in which an editor traces out the subject of a shot so the background can be edited independently. It’s very labor-intensive—every frame has to be edited individually—so Klabin has tried Runway’s AI-driven rotoscoping. While it can make for a decent first pass, the result is just too messy to use in a finished product.

Klabin sent me this GIF of a series of rotoscoped frames from his upcoming movie. While the model does a decent job of identifying the people in the frame, its boundaries aren’t consistent from frame to frame. The result is noisy.
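To make the frame-to-frame problem concrete, here is a minimal Python sketch of naive per-frame matting, the basic shape of automated rotoscoping. It is an illustration under assumptions, not Klabin’s or Runway’s actual pipeline: it relies on the open-source rembg matting library and OpenCV, and the file names are hypothetical. Because each frame’s matte is estimated from scratch, nothing ties one frame’s boundary to the next, which is what produces the jitter visible in Klabin’s GIF.

```python
# Naive per-frame matting: a hedged sketch, not Klabin's or Runway's pipeline.
# Assumes the third-party packages opencv-python, Pillow, and rembg are installed;
# "shot.mp4" is a hypothetical input clip.
import cv2
from PIL import Image
from rembg import remove

cap = cv2.VideoCapture("shot.mp4")
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # The matte is re-estimated independently on every frame, with no temporal
    # smoothing, so subject boundaries can shift slightly from frame to frame.
    matte = remove(Image.fromarray(rgb))
    matte.save(f"matte_{idx:05d}.png")
    idx += 1
cap.release()
```

A production tool does more than this, of course, but the cleanup problem Klabin describes starts here.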

Current AI tools are full of these small glitches, so Klabin only uses them for tasks that audiences don’t see (like creating a movie pitch deck) or in contexts where he can clean up the result afterward.

Stephen Robles reviews Apple products on YouTube and other platforms. He uses AI in some parts of the editing process, such as removing silences or transcribing audio, but doesn’t see it as disruptive to his career.
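As a rough illustration of what that kind of automation can look like, here is a minimal sketch of silence removal, assuming the open-source pydub library rather than whatever tool Robles actually uses; the file name and thresholds are illustrative.

```python
# Silence removal: a hedged sketch assuming pydub, not Robles' actual workflow.
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

audio = AudioSegment.from_file("podcast.wav")  # hypothetical input file

# Treat anything quieter than -40 dBFS for at least 700 ms as silence.
spans = detect_nonsilent(audio, min_silence_len=700, silence_thresh=-40)

# Splice the non-silent spans back together, dropping the gaps.
trimmed = AudioSegment.empty()
for start_ms, end_ms in spans:
    trimmed += audio[start_ms:end_ms]

trimmed.export("podcast_trimmed.wav", format="wav")
```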

Stephen Robles is a YouTuber, podcaster, and creator covering tech, particularly Apple. Credit: Stephen Robles

“I am betting on the audience wanting to trust creators, wanting to see authenticity,” he told me. AI video tools don’t really help him with that and can’t replace the reputation he’s sought to build.

Recently, he experimented with using ChatGPT to edit a video thumbnail (the image used to advertise a video). He got a couple of negative reactions about his use of AI, so he said he “might slow down a little bit” with that experimentation.

Robles didn’t seem as concerned about AI models stealing from creators like him. When I asked him about how he felt about Google training on his data, he told me that “YouTube provides me enough benefit that I don’t think too much about that.”

Professional thumbnail artist Antioch Hwang has a similarly pragmatic view toward using AI. Some channels he works with have audiences that are “very sensitive to AI images.” Even using “an AI upscaler to fix up the edges” can provoke strong negative reactions. For those channels, he’s “very wary” about using AI.

Antioch Hwang is a YouTube thumbnail artist. Credit: Antioch Creative

But for most channels he works for, he’s fine using AI, at least for technical tasks. “I think there’s now been a big shift in the public perception of these AI image generation tools,” he told me. “People are now welcoming them into their workflow.”

He’s still careful with his AI use, though, because he thinks that having human artistry helps in the YouTube ecosystem. “If everyone has all the [AI] tools, then how do you really stand out?” he said.

Recently, top creators have started using rougher-looking thumbnails for their videos. AI has made polished thumbnails too easy to produce, so some are turning to what Hwang would call “poorly made thumbnails” to help their videos stand out.

Hwang told me something surprising: even as AI makes it easier for creators to make thumbnails themselves, business has never been better for thumbnail artists, even at the lower end. He said that demand has soared because “AI as a whole has lowered the barriers for content creation, and now there’s more creators flooding in.”

Still, Hwang doesn’t expect the good times to last forever. “I don’t see AI completely taking over for the next three-ish years. That’s my estimated timeline.”

Everyone I talked to gave a different answer to the question of when—if ever—AI would meaningfully disrupt their part of the industry.

Some, like Hwang, were pessimistic. Actor Erik Passoja told me he thought the big movie studios—like Warner Bros. or Paramount—would be gone in three to five years.

But others were more optimistic. Tess Dinerstein, the vertical drama actress, said, “I don’t think that verticals are ever going to go fully AI.” Even if it becomes technologically feasible, she argued, “that just doesn’t seem to be what the people want.”

Gille Klabin, the independent filmmaker, thought there would always be a place for high-quality human films. If someone’s work is “fundamentally derivative,” then they are at risk. But he thinks the best human-created work will still stand out. “I don’t know how AI could possibly replace the borderline divine element of consciousness,” he said.

The people who were most bullish on AI were, if anything, the least optimistic about their own career prospects. “I think at a certain point it won’t matter,” Kavan Cardoza told me. “It’ll be that anyone on the planet can just type in some sentences” to generate full, high-quality videos.

This might explain why Accetturo has become something of an AI evangelist; his newsletter tries to teach other filmmakers how to adapt to the coming AI revolution.

AI “is a tsunami that is gonna wipe out everyone,” he told me. “So I’m handing out surfboards—teaching people how to surf. Do with it what you will.”

Kai Williams is a reporter for Understanding AI, a Substack newsletter founded by Ars Technica alum Timothy B. Lee. His work is supported by a Tarbell Fellowship. Subscribe to Understanding AI to get more from Tim and Kai.
