Author name: Paul Patrick


For the first time in the US, a rotating detonation rocket engine takes flight

A US-based propulsion company, Venus Aerospace, said Wednesday it had completed a short flight test of its rotating detonation rocket engine at Spaceport America in New Mexico.

The company’s chief executive and co-founder, Sassie Duggleby, characterized the flight as “historic.” It is believed to be the first US-based flight test of an idea that has been discussed academically for decades, a rotating detonation rocket engine. The concept has previously been tested in a handful of other countries, but never with a high-thrust engine.

“By proving this engine works beyond the lab, Venus brings the world closer to a future where hypersonic travel—traversing the globe in under two hours—becomes possible,” Duggleby told Ars.

A quick flight

The company has only released limited information about the test. The small rocket, powered by the company’s 2,000-pound-thrust engine, launched from a rail in New Mexico. The vehicle flew for about half a minute, and, as planned, did not break the sound barrier.

Governments around the world have been interested in rotating detonation engine technology for a long time because it has the potential to significantly increase fuel efficiency in a variety of applications, from Navy carriers to rocket engines.

In a traditional rocket engine, highly pressurized propellant and oxidizer are injected into a combustion chamber, where they burn and produce an energetic exhaust plume. In a rotating detonation engine, by contrast, a detonation wave travels around a circular channel, sustained by the continued injection of fuel and oxidizer, and produces a shockwave that travels outward at supersonic speed.



After back-to-back failures, SpaceX tests its fixes on the next Starship

But that didn’t solve the problem. Once again, Starship’s engines cut off too early, and the rocket broke apart before falling to Earth. SpaceX said “an energetic event” in the aft portion of Starship resulted in the loss of several Raptor engines, followed by a loss of attitude control and a loss of communications with the ship.

The similarities between the two failures suggest a likely design issue with the upgraded “Block 2” version of Starship, which debuted in January and flew again in March. Starship Block 2 is slightly taller than the ship SpaceX used on the rocket’s first six flights, with redesigned flaps, improved batteries and avionics, and notably, a new fuel feed line system for the ship’s Raptor vacuum engines.

SpaceX has not released the results of the investigation into the Flight 8 failure, and the FAA hasn’t yet issued a launch license for Flight 9. Likewise, SpaceX hasn’t released any information on the changes it made to Starship for next week’s flight.

What we do know about the Starship vehicle for Flight 9—designated Ship 35—is that it took a few tries to complete a full-duration test-firing. SpaceX completed a single-engine static fire on April 30, simulating the restart of a Raptor engine in space. Then, on May 1, SpaceX aborted a six-engine test-firing before reaching its planned 60-second duration. Videos captured by media observing the test showed a flash in the engine plume, and at least one piece of debris was seen careening out of the flame trench below the ship.

SpaceX ground crews returned Ship 35 to the production site a couple of miles away, perhaps to replace a damaged engine, before rolling Starship back to the test stand over the weekend for Monday’s successful engine firing.

Now, the ship will head back to the Starbase build site, where technicians will make final preparations for Flight 9. These final tasks may include loading mock-up Starlink broadband satellites into the ship’s payload bay and touchups to the rocket’s heat shield.

These are two elements of Starship that SpaceX engineers are eager to demonstrate on Flight 9, beyond just fixing the problems from the last two missions. Those failures prevented Starship from testing its satellite deployer and an upgraded heat shield designed to better withstand scorching temperatures up to 2,600° Fahrenheit (1,430° Celsius) during reentry.



Doom: The Dark Ages is surprisingly playable on the Steam Deck

While working on our review of Doom: The Dark Ages last week, I was unable to test the game on the Steam Deck due to a bug that prevented it from launching on SteamOS. I didn’t consider this much of a loss at the time, since I figured the Deck’s 3-year-old portable hardware was rated way below the game’s minimum PC specs, which call for a ray tracing-capable graphics card.

Over the weekend, though, Valve released a preview build of a new version of SteamOS that allows Doom: The Dark Ages to actually launch on the Steam Deck. And after a bit of testing, I found the game is surprisingly playable on Valve’s portable hardware, provided you’re prepared to turn down the graphics settings.

With all the graphical quality sliders set to “Low” (and FSR upscaling set to “Performance”), I was able to run Doom: The Dark Ages at the system’s native 1280×800 resolution and a reasonably steady 30 to 40 fps.
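For context on what that setting implies: FSR’s “Performance” mode typically renders at half the output resolution per axis before upscaling. Here is a minimal sketch of the implied internal render resolution, using AMD’s commonly published scale factors as an assumption; the game’s own implementation may not match these exactly.

```python
# Rough sketch: internal render resolution implied by common FSR quality modes.
# Scale factors are AMD's commonly published per-axis values; the game's own
# implementation may differ, so treat these numbers as approximations.

FSR_SCALE = {
    "Quality": 1.5,
    "Balanced": 1.7,
    "Performance": 2.0,
    "Ultra Performance": 3.0,
}

def internal_resolution(out_w: int, out_h: int, mode: str) -> tuple[int, int]:
    """Divide the output resolution by the per-axis upscale factor."""
    factor = FSR_SCALE[mode]
    return round(out_w / factor), round(out_h / factor)

print(internal_resolution(1280, 800, "Performance"))  # -> (640, 400)
```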

Sure, the lack of fancy lighting effects was definitely a step down after enjoying “High” graphical settings on an Nvidia RTX 2080 Ti-powered PC rig last week. And we’d prefer to run a reflex-based shooter at the Steam Deck’s maximum 60 Hz frame rate (or even more on the Steam Deck OLED).

Still, the fact that a ray tracing-forward game like this runs at all on the relatively underpowered Steam Deck hardware feels like something of a miracle these days. We can only imagine an “Ultra Low” graphics setting designed specifically for the Steam Deck could squeeze an even better frame rate out of the system, if Bethesda decided to make it a priority.



Monthly Roundup #30: May 2025

I hear word a bunch of new frontier AI models are coming soon, so let’s do this now.

  1. Programming Environments Require Magical Incantations.

  2. That’s Not How Any of This Works.

  3. Cheaters Never Stop Cheating.

  4. Variously Effective Altruism.

  5. Ceremony of the Ancients.

  6. Palantir Further Embraces Its Villain Edit.

  7. Government Working.

  8. Jones Act Watch.

  9. Ritual Asking Of The Questions.

  10. Why I Never Rewrite Anything.

  11. All The Half-Right Friends.

  12. Resident Expert.

  13. Do Anything Now.

  14. We Have A New Genuine Certified Pope So Please Treat Them Right.

  15. Which Was the Style at the Time.

  16. Intelligence Test.

  17. Constant Planking.

  18. RSVP.

  19. The Trouble With Twitter.

  20. TikTok Needs a Block.

  21. Put Down the Phone.

  22. Technology Advances.

  23. For Your Entertainment.

  24. Please Rate This Podcast.

  25. I Was Promised Flying Self-Driving Cars.

  26. Gamers Gonna Game Game Game Game Game.

  27. Sports Go Sports.

I don’t see it as gendered, but so much this, although I do have Cursor working fine.

Aella: Never ever trust men when they say setting up an environment is easy

I’ve been burned so bad I have trauma. Any time a guy says “omg u should try x” I start preemptively crying

Pascal Guay (top comment): Just use @cursor_ai agent chat and prompt it to make this or that environment. It’ll launch all the command lines for you; just need to accept everything and you’ll be done in no time.

Aella: THIS WAS SPARKED BY ME BEING UNABLE TO SET UP CURSOR.

Ronny Fernandez (comment #2): have you tried cursor? it’s really easy.

Piq: Who tf would ever say that regardless of gender? It’s literally the hardest part of coding.

My experience is that setting things up involves a series of exacting magical incantations, which are essentially impossible to derive on your own. Sometimes you follow the instructions and everything goes great but if you get things even slightly wrong it becomes hell to figure out how to recover. The same thing goes for many other aspects of programming.

AI helps with this, but not as much as you might think if you get outside the realms where vibe coding just works for you. Then, once you are set up, within the realm of the parts of the UI you understand things are relatively much easier, but there is very much temptation to keep using the features you understand.

People who play standard economic games, like Dictator, Ultimatum, Trust, Public Goods or Prisoner’s Dilemma, frequently don’t understand the rules. For Trust 70% misunderstood, for Dictator 22%, and incentivized comprehension checks didn’t help. Those who misunderstood typically acted more prosocial.

In many ways this makes the games more realistic, not less. People frequently don’t understand the implications of their actions, or the rules of the (literal or figurative) game they are playing. You have to account for this, and often this is what keeps the game in a much better (or sometimes worse) equilibrium, as is the tendency of many players to play ‘irrationally’ or based on vibes. Dictator is a great example. In a real-world one-shot dictator game situation it’s often wise to do a 50-50 split, and saying ‘but the game theory says’ will not change that.
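For readers who haven’t seen these games, here is a minimal sketch of the payoff rules for two of them, using standard textbook versions with illustrative endowment and multiplier values rather than the specific parameters from the study.

```python
# Minimal sketch of two of the games mentioned above (standard textbook
# versions; the endowment and multiplier values are illustrative).

def dictator_game(endowment: float, amount_given: float) -> tuple[float, float]:
    """Dictator keeps the rest; the recipient has no move at all."""
    return endowment - amount_given, amount_given

def trust_game(endowment: float, sent: float, returned: float,
               multiplier: float = 3.0) -> tuple[float, float]:
    """Investor sends `sent`, it gets multiplied, the trustee returns `returned`."""
    investor = endowment - sent + returned
    trustee = sent * multiplier - returned
    return investor, trustee

# The narrowly self-interested play is to send or give nothing, yet many
# participants split anyway -- and, per the study, many never grasp the rules.
print(dictator_game(10, 5))    # -> (5, 5)
print(trust_game(10, 10, 15))  # -> (15.0, 15.0)
```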

A recurring theme of life, also see Cheaters Gonna Cheat Cheat Cheat Cheat Cheat.

Jorbs: i have this ludicrous thing where if i see someone cheating at something and lying about it, i start to believe that they aren’t an honest person and that i should be suspicious of other things they say and do.

this is only semi tongue-in-cheek. the number of times in my life someone has directly told me about how they cheat and lie about something, with the expectation that that will not affect how i view them otherwise, is like, much much higher than i would expect it to be.

It happens to me too, as if I don’t know how to update on Bayesian evidence or something. I don’t even need them to be lying about it. The cheating is enough.

There are partial mitigations, where they explain why something is a distinct ‘cheating allowed’ magisterium. But only partial ones. It still counts.

This is definitely a special case of ‘how you do anything is how you do everything,’ and also ‘when people tell you who they are, believe them.’
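As a toy illustration of the update being described, with entirely made-up priors and likelihoods:

```python
# A toy Bayesian update for the point above: how much should observing someone
# cheat shift your estimate that they are generally dishonest? The prior and
# likelihoods here are illustrative assumptions, not measurements.

def posterior_dishonest(prior: float, p_cheat_if_dishonest: float,
                        p_cheat_if_honest: float) -> float:
    """P(dishonest | observed cheating) via Bayes' rule."""
    numerator = p_cheat_if_dishonest * prior
    denominator = numerator + p_cheat_if_honest * (1 - prior)
    return numerator / denominator

# Say a 20% prior on "generally dishonest", and cheating is 6x more likely for them.
print(round(posterior_dishonest(0.20, 0.60, 0.10), 2))  # -> 0.6
```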

Spaced Out Matt: This person appears to be an active participant in the “Effective Altruist” movement—and a good reminder that hyper-rational political movements often end up funding lifesaving work on critical health issues

Alexander Berger: Really glad that @open_phil was able to step in on short notice (<24h) to make sure Sarah Fortune's work on TB vaccines can continue.

“Much to the relief of a Harvard University researcher, a California-based philanthropic group is getting into the monkey business.

Dana Gerber: Open Philanthropy, a grant advisor and funder, told the Globe on Friday that it authorized a $500,000 grant to allow researchers at the University of Pittsburgh School of Medicine to complete an ongoing tuberculosis vaccine study that was abruptly cut off from its NIH funding earlier this week, imperiling the lives of its rhesus macaque test subjects.

Am I the only one who thought of this?

In all seriousness, this is great, exactly what you want to happen – stepping in quickly on suddenly high-leverage opportunities.

Nothing negative about this, man is an absolute legend.

Simeon: The media negativity bias is truly deranged.

Managing to frame a $200B pledge to philanthropy negatively is an all-time prowess.

Gates is doing what other charitable foundations and givers fail to do, which is to actually spend the damn money to help people and then say their work is done, within a reasonable time frame. Most foundations instead attempt to remain in existence indefinitely by refusing to spend the money.

John Arnold: This is a great decision by Gates that will maximize his impact. All organizations become less effective over time, particularly foundations that have no outside accountability. New institutions will be better positioned to deal with the problems of future generations.

I would allocate funds to different targets, but this is someone actually trying.

The Secular Solstice (aka Rationalist Solstice) is by far the best such ritual, it isn’t cringe but even if you think it is, if you reject things that work because they’re cringe you’re ngmi.

Guive Assadi: Steven Pinker: I’ve been part of some not so successful attempts to come up with secular humanist substitutes for religion.

Interviewer: What is the worst one you’ve been involved in?

Steven Pinker: Probably the rationalist solstice in Berkeley, which included hymns to the benefits of global supply chains. I mean, I actually completely endorse the lyrics of the song, but there’s something a bit cringe about the performance.

Rob Bensinger: Who wants to gather some more quotes like this and make an incredible video advertisement for the rat solstice

Rob Wiblin: This is very funny.

But people should do the cringe thing if they truly enjoy it. Cringe would ideally remain permanently fashionable.

Nathan: Pinker himself is perhaps answering why secular humanism hasn’t created a replacement for Christianity. It cares too much what it looks like.

The song he’s referring to is Landsailor. It is no Uplift, but it is excellent, now more than ever. Stop complaining about what you think others will think is cringe and start producing harmony and tears. Cringe is that which you believe is cringe. Stop giving power to the wrong paradox spirits.

Indeed, the central problem with this ritual is that it doesn’t go far enough. We don’t only need Bright Side of Life and Here Comes the Sun (yes you should have a few of these and if you wanted to add You Learn or Closer to Fine or something, yes, we have options), but mostly on the margin we need Mel’s Song, and Still Alive, and Little Echo. People keep trying to make it more accessible and less weird.

How are things going over at Palantir? Oh, you know, doubling down on the usual.

I do notice this is a sudden demand to not build software that can be misused to help violate the US Constitution.

You know what other software can and will be used this way?

Most importantly frontier LLMs, but also most everything else. Hmm.

And if nothing else, as always, I appreciate the candor in the reply. Act accordingly. And beware the Streisand Effect.

Drop Site: ICE Signs $30 Million Contract With Palantir to Build ‘ImmigrationOS’

ICE has awarded Palantir Technologies a $30 million contract to develop a new software platform to expand its surveillance and enforcement operations, building on Palantir’s decade-long collaboration with ICE.

Key features and functions:

➤ ImmigrationOS will give ICE “real-time visibility” into visa overstays, self-deportation cases, and individuals flagged for removal, including foreign students flagged for removal for protesting.

➤ ImmigrationOS will integrate data from multiple government database systems, helping ICE track immigration violators and coordinate with agencies like Customs and Border Protection.

➤ The platform is designed to streamline the entire immigration enforcement process—from identification to removal—aiming to reduce time, labor, and resource costs.

Paul Graham: It’s a very exciting time in tech right now. If you’re a first-rate programmer, there are a huge number of other places you can go work rather than at the company building the infrastructure of the police state.

Incidentally, I’ll be happy to delete this if Palantir publicly commits never to build things that help the government violate the US constitution. And in particular never to build things that help the government violate anyone’s (whether citizens or not) First Amendment rights.

Ted Mabrey (start of a very long post): I am looking forward to the next set of hires that decided to apply to Palantir after reading your post. Please don’t delete it Paul. We work here in direct response to this world view and do not seek its blessing.

Paul Graham: As I said, I’ll be happy to delete it if you commit publicly on behalf of Palantir not to build things that help the government violate the US constitution. Will you do that, Ted?

Ted Mabrey: First, I really don’t want you to delete this and am happy for it to be on the record.

Second, the reason I’m not engaging in the question is because it’s so obviously in bad faith akin to the “will you promise to stop beating your wife” court room parlor trick. Let’s make the dynamics crystal clear. Just by engaging on that question it establishes a presumption of some kind of guilt in the present or future for us or the government. If I answer, you establish that we need to justify something we have done, which we do not, or accept as a given that we will be asked to break the law, which we have not.

or y’all…we have made this promise so many ways from Sunday but I’ll write out a few of them here for them.

Paul Graham: When you say “we have made this promise,” what does the phrase “this promise” refer to? Because despite the huge number of words in your answers, I can’t help noticing that the word “constitution” does not occur once.

Ted? What does “this promise” refer to?

I gave Ted Mabrey two days to respond, but I think we now have to conclude that he has run away. After pages of heroic-sounding doublespeak, the well has suddenly run dry. I was open to being proven wrong about Palantir, but unfortunately it’s looking like I was right.

Ted tried to make it seem like the issue is a complex one. Actually it’s 9 words. Will Palantir help the government violate people’s constitutional rights? And I’m so willing to give them the benefit of the doubt that I’d have taken Ted’s word for it if he said no. But he didn’t.

Continuing reminder: It is totally reasonable to skip this section. I am doing my best to avoid commenting on politics, and as usual my lack of comment on other fronts should not be taken to mean I lack strong opinions on them. The politics-related topics I still mention are here because they are relevant to this blog’s established particular interests, in particular AI, abundance including housing, energy and trade, economics or health and medicine.

In case it needs to be explained why trying to forcibly bring down drug prices by requiring Most Favored Nation status on those prices would be an epic disaster that hurts everyone and helps no one if we were so foolish as to implement it for real, Jason Abaluck is here to help. Do also note this thread, which makes the case that there could be some benefit from preventing other governments from forcing prices down.

Then there’s the other terrible option, which is if it worked in lowering the prices or Trump found some other way to impose such price controls, going into what Tyler Cowen calls full supervillain mode. o3 estimates this would reduce global investment in drug innovation by between 33% and 50%. That seems low to me, and is also treating the move as a one-time price shock rather than a change in overall regime.

I would expect that the imposition of price controls here would actually greatly reduce investment in R&D and innovation essentially everywhere, because everyone would worry that their future profits would also be confiscated. Indeed, I would already be less inclined to such investments now, purely based on the stated intention to do this.
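A back-of-the-envelope sketch of that mechanism, with every number invented purely for illustration rather than estimated from the actual policy: if expected margins fall, marginal drug candidates stop clearing the investment hurdle.

```python
# Toy NPV model of a single drug candidate. All numbers are made-up assumptions.

def npv(annual_profit: float, years: int = 12, discount: float = 0.10,
        upfront_cost: float = 2.0e9) -> float:
    """Discounted profits over a patent window minus the R&D cost."""
    pv = sum(annual_profit / (1 + discount) ** t for t in range(1, years + 1))
    return pv - upfront_cost

baseline = npv(annual_profit=400e6)          # a candidate that is viable today
controlled = npv(annual_profit=400e6 * 0.6)  # same candidate at 40% lower prices

print(f"baseline NPV:   {baseline/1e9:+.2f}B")    # positive -> gets funded
print(f"controlled NPV: {controlled/1e9:+.2f}B")  # flips negative -> not funded
```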

Meanwhile, other things are happening, like an EO that requires a public accounting for all regulatory criminal penalties and that they default to requiring mens rea. Who knew? And who knew? This seems good.

The good news is that Pfizer stock didn’t move that much on the announcement, so mostly people do not think the attempt will work.

There is an official government form where you can suggest deregulations. Use it early, use it often, program your AI to find lots of ideas and fill it out for you.

In all seriousness, if I had understood the paperwork and other time sink requirements, I would not have created Balsa Research, and if the paperwork requirements mostly went away I would have founded quite a few other businesses along the way.

Katherine Boyle: We don’t talk enough about how many forms you have to fill out when raising kids. Constant forms, releases, checklists, signatures. There’s a reason why litigious societies have fewer children. People just get tired of filling out the forms.

Mike Solana: the company version of this is also insane fwiw. one of the hardest things about running pirate wires has just been keeping track of the paper work — letters every week, from every corner of the country, demanding something new and stupid. insanely time consuming.

people hear me talk shit about bureaucracy and hear something ‘secretly reactionary coded’ or something and it’s just like no, my practical experience with regulation is it prevents probably 90 to 95% of everything amazing in this world that someone might have tried.

treek: this is why lots of people don’t bother with business extreme blackpill ngl

Mike Solana: yes I genuinely believe this. years ago I was gonna build an app called operator that helped you build businesses. I tried to start with food trucks in LA. hundreds of steps, many of them ambiguous. just very clearly a system designed to prevent new businesses from existing.

A good summary of many of the reasons our government doesn’t work.

Tracing Woods: How do we overcome this?

Alec Stapp: This is the best one-paragraph explanation for what’s gone wrong with our institutions:

I could never give that good a paragraph-length explanation, because I would have split that into three paragraphs, but I am on board with the content.

At core, the problem is a ratcheting up of laws and regulatory barriers against doing things, as our legal structures focus on harms and avoiding lawsuits but ignore the ‘invisible graveyard’ of utility lost.

The abundance agenda says actually this is terrible, we should mostly do the opposite. In some places it can win at least small victories, but the ratchet continues, and at this point a lot of our civilization essentially cannot function.

Once again, cutting FDA staff without changing the underlying regulations doesn’t get rid of the stupid regulations, it only makes everything take longer and get worse.

Jared Hopkins (Wall Street Journal): “Biotech companies developing drugs for hard-to-treat diseases and other ailments are being forced to push back clinical trials and drug testing in the wake of mass layoffs at the Food and Drug Administration.”

“When you cut the administrative staff and you still have these product deadlines, you’re creating an unwinnable situation,” he said. The worst thing for companies is not getting guidance when needed while following all the steps for approval, only to “prepare a $100 million application and get denied because of something that could’ve been communicated or resolved before the trial was under way,” Scheineson said.

Paul Graham: I heard this directly from someone who works for a biotech startup. Layoffs at the FDA have slowed the development of new drugs.

Jim Cramer makes the case to get rid of the ‘ridiculous Jones Act.’ Oh well, we tried.

The recent proposals around restricting shipping even further caused so much panic (and Balsa to pivot) for a good reason. If enacted in their original forms, they would have been depression-level catastrophic. Luckily, we pulled back from the brink, and are now only proposing ordinary terrible additional restrictions, not ‘kill the world economy’ level restrictions.

Also note that for all the talk about the dangers of Chinese ships, the regulations were set to apply to all non-American ships, Jones Act style, with some amount of absolute requirement to use American ships.

That’s a completely different rule. If the rule only applies to Chinese ships in particular but not to ships built in Japan, South Korea or Europe, I don’t love it, but by 2025 standards it would be ‘fine.’

Ryan Peterson: Good to see the administration listened to feedback on their proposed rule on Chinese ships. The final rule published today is a lot more reasonable.

John Konrad: Nothing in my 18 years since founding gCaptain has caused more panic than @USTradeRep’s recent proposal to charge companies that own Chinese ships $1 million per port call in the US.

USTR held hearings on the fees and today issued major modifications.

The biggest problem with the original port fees proposed by Trump in late February was that they were ship size and type agnostic.

All Chinese built ships would be charged $1.5 million per port and $1 million for any ship owned by a company that operates chinese built ships.

This was ok for a very large containership with 17,000 boxes that could absorb the fee. But it would have been devastating for a bulker that only carries low value cement.

The new proposal differentiates between ship size and types of cargo.

Specific fees are $50 per net ton, with the following caveats that go into effect in 6 months.

•Fees on vessel owners & operators of China based on net cargo tonnage, increasing incrementally over the following years;

•Fees on operators of Chinese-built ships based on net tonnage or containers, increasing incrementally over the following years; and

•To incentivize U.S.-built car carrier vessels, fees on foreign-built car carrier vessels based on their capacity.

The second phase actions will not take place for 3 years and is specifically for LNG ships:

•To incentivize U.S.-built liquified natural gas (LNG) vessels, limited restrictions on transporting LNG via foreign vessels. Restrictions will increase incrementally over 22 years.

… [more details of things we shouldn’t be doing, but probably aren’t catastrophic]

Another major complaint about the original proposal was that ships would be charged the fee each time they enter a US port. This meant a ship discharging at multiple ports in one voyage would suffer millions in fees, likely causing them to visit fewer small ports.

That cargo would have to be put on trucks, clogging already overburdened highways.

The new proposal charges the fee per voyage or string of U.S. port calls.

The proposal also excludes Jones Act ships and short sea shipping options (small ships and barges that move between ports)

In short, this new proposal is a lot more adaptable and reasonable, but still puts heavy disincentives on owners that build ships in China.

These are just the highlights. The best way to learn more is to read @MikeSchuler’s article explaining the new proposal.

They also dropped fleet composition penalties, and the rule has at least some phase-in of the fees, along with dropping the per-port-of-call fee. Overall I see the new proposal as terrible but likely not the same kind of crisis-level situation we had before.
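To see why the shift from a flat per-call fee to a per-net-ton fee matters, here is some rough arithmetic using the figures quoted in the thread above; the ship sizes and cargo values are illustrative guesses, not data.

```python
# Flat fee (original proposal) vs. per-net-ton fee (new proposal), compared
# against cargo value. Tonnage and cargo values below are illustrative guesses.

FLAT_FEE = 1_500_000   # original proposal: per port call for a Chinese-built ship
PER_NET_TON = 50       # new proposal: charged per net ton, per voyage

ships = {
    # name: (net_tonnage, cargo_value_usd)
    "17,000-TEU containership": (110_000, 500_000_000),
    "small cement bulker":      (15_000,    3_000_000),
}

for name, (net_tons, cargo_value) in ships.items():
    flat_share = FLAT_FEE / cargo_value
    per_ton_fee = PER_NET_TON * net_tons
    per_ton_share = per_ton_fee / cargo_value
    print(f"{name}: flat fee = {flat_share:.1%} of cargo value, "
          f"per-ton fee = ${per_ton_fee:,} ({per_ton_share:.1%})")
```

The flat fee is a rounding error for the containership but roughly half the cargo value for the bulker; the per-ton fee at least scales with the size of the ship.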

Then there’s the crazy ‘phase 2’ that requires the LNG sector in particular to use a portion of US-built vessels. Which is hard, since only one such vessel exists and is 31 years old with an established route, and building new such ships to the extent it can be done is prohibitively expensive. The good news is this would start in 2028 and phase in over 22 (!) years, which is an actually reasonable time frame for trying to do this. There’s still a good chance this would simply kill America’s ability to export LNG, hurting our economy and worsening the climate. Again, if you want to use non-Chinese-built ships, that is something we can work around.

Ryan Peterson asks how to fix the fact that without the Jones Act he fears America would build zero ships, as opposed to currently building almost zero ships. Scott Lincicome suggests starting here, but it mostly doesn’t address the question. The bottom line is that American shipyards are not competitive, and are up against highly subsidized competition. If we feel the need for American shipyards to build our ships, we are going to have to subsidize that a lot plus impose export discipline.

Or we can choose not to spend enough to actually fix this, or simply accept that comparative advantage is a thing and it’s fine to get our ships from places like Japan, and redirect our shipyards to doing repairs on the newly vastly greater number of passing ships and on building Navy ships to ensure what is left is supported.

Someone clearly is neither culturally rationalist nor culturally Jewish.

Robin Hanson (I don’t agree): “Rituals” are habits and patterns of behavior where we are aware of not fully understanding why we should do them the way we do. A mark of modernity was the aspiration to end ritual by either understanding them or not doing them.

We of course still do lots of behavior patterns that we do not fully understand. Awareness of this fact varies though.

Yes we don’t understand this modern habit fully, making it a ritual.

In My Culture, the profoundest act of worship is to try and understand.

Ritual is not about not understanding, at most it is about not needing to understand at first in order to start, and about preserving something important without having to as robustly preserve understanding of the reasons.

Ritual is about Doing the Thing because it is The Thing You Do. That in no way precludes you understanding why you are doing it.

Indeed, one of the most important Jewish rituals is always asking ‘why do we do this thing, ritual or otherwise?’ This is most explicit in the Seder, where we ask the four questions and we answer them, but in a general sense if you don’t know why you’re doing a Jewish thing and don’t ask why, you are doing it wrong.

This is good. The rationalists follow the same principle. The difference is that rather than carrying over many rituals and traditions for thousands of years, we mostly design them anew for the modern world.

But you can’t do that properly, or choose the right rituals for you, and you certainly can’t wisely choose to stop doing rituals you’re already doing, unless you understand what they are for. Which is a failure mode that is happening a lot, often justified by the invocation of a now-sacred moral principle that must stand above all, even if the all includes key load bearing parts of civilization.

Introducing the all-new Doubling-Back Aversion, the concept that we are reluctant to go backwards, on top of the distinct Sunk Cost Fallacy. I can see it, but I am suspicious, especially of their example of having flown SFO→LAX intending to then go on to JFK, and then being more willing to go LAX→DEN→JFK than LAX→SFO→JFK even if the time saved is the same, because you started in SFO. I mean, I can see why it’s a little frustrating, but I suspect the bigger effect here is just that DEN is clearly ‘on the way’ to JFK, and SFO isn’t, and there’s a clear bias against ‘going backwards.’ They do try to cover this, such as here:

But I still don’t see a strong case here for this being a distinct new bias, as opposed to being the sum of existing known issues.

The case by Dr. Todd Kashdan for seeking out ‘48% opposites’ as friends and romantic partners. You want people who think different, he says, so sparks can fly and new ideas can form and fun can be had, not some boring static bubble of sameness. But then he also says to seek ‘slightly different’ people who will make you sweat, which seems very different to me. As in, you want 10%-20% opposites, maybe 30%, but not 48%, probably on the higher end for friends and lower end for romantic partners, and if you’re a man dating women or vice versa that 10%-20% is almost certainly covered regardless.

There are, in theory, exceptions. I do remember once back in the day finding a 99% match on OKCupid (those were the days!), a woman who said she only rarely and slowly ever responded to anyone but whose profile was like a bizarro world female version of me. In my opening email I told her as much, asking her to respond the way she’d respond to herself. I’ll always wonder what that would have been like if we’d ever met in person – would it have been ‘too good’ a match? She did eventually write back months later as per a notification I got, but by then I was with my wife, so I didn’t reply.

Patrick McKenzie is one of many to confirm that there are lots of things about the world that are not so hard to find out or become an expert in, but where no one has chosen to do the relevant work. If there is a particular policy area or other topic where you put your focus, it’s often very practical to become the World’s Leading Expert and even be the person who gets consulted, and for there to be big wins available to be found, simply because no one else is seriously trying or looking. Getting people’s attention? That part is harder.

Kelsey Piper: This is related to one of the most important realizations of my adult life, which is that there is just so much in the modern world that no one is doing; reasonably often if you can’t find the answer to a question it just hasn’t been answered.

If you are smart, competent, a fast learner and willing to really throw yourself into something, you can answer a question to which our civilization does not have an answer with weeks to months of work. You can become an expert in months to years.

There is not an efficient market in ideas; it’s not even close. There are tons and tons of important lines of thought and work that no one is exploring, places where it’d be valuable to have an expert and there simply isn’t one.

Patrick McKenzie: Also one of the most important and terrifying lessons of my adult life.

Mine too.

Michael Nielsen: This is both true *and* can be hard to recognize. A friend once observed that an organization had been important for his formative growth, but it was important to move away, because it was filled with people who didn’t realize how derivative their work was; they thought they were pushing frontiers, but weren’t.

One benefit of a good PhD supervisor is that they’ll teach you a lot about how to figure out when you’re on that frontier

And yes, by default you get to improve some small corner of the world, but that’s already pretty good, and occasionally you strike gold.

Zy (QTing Kelsey Piper): There’s so much diminishing returns to this stuff it’s not even funny. 400 years ago you could do this and discover Neptune or cellular life

Today you can do it and figure out a condition wherein SSRIs cause 3% less weight gain or an antenna with 5% better fidelity or something

Marko Jukic: Guy 400 years ago: “There’s so much diminishing returns to this stuff it’s not even funny. 400 years ago you could do this and discover Occam’s Razor or the Golden Rule. Today the best you can do is prove that actually 4% more angels can dance on the head of a pin.”

Autumn: 7 years ago a fairly small team in san francisco figured out how to make machines think.

Alternatively, even if there are diminishing returns, so what? Even the diminished returns, even excluding the long tail of big successes, are still very, very good.

Apologies with longer words are perceived as more genuine. I think this perception is correct. The choice to bother using longer words is a costly signal, which is the point of apologizing in the first place. Even if you’re ‘faking it’ it still kind of counts.

Endorsed:

Cate Hall: Amazing how big the quality of life improvements are downstream of “let me take this off future me’s plate.”

It’s not just shifting work up in time — it’s saving you all the mental friction b/w now & when you do it. Total psychic cost is the integral of cognitive load over time.

Sam Martin: conversely, “I’ll deal with this later” is like swiping a high-interest cognitive load credit card (said the man whose CLCC is constantly maxed out)

Thus there is a huge distinction between ‘things you can deal with later without having to otherwise think about it’ and other things. If you can organize things such that you’ll be able to deal with something later in a way that lets you not otherwise think about it, that’s much better. Whereas if that’s not possible, my lord, do it now.

If you can reasonably do it now, do it now anyway. Time saved in the future is typically worth more than time now, because this gives you slack. When you need time, sometimes you suddenly really desperately need time.
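A toy version of that “integral of cognitive load over time” framing, with arbitrary numbers, just to make the “credit card” point concrete:

```python
# Toy model: the cost of a task is the effort of doing it plus the carrying
# load of having it on your plate until then. The load numbers are arbitrary.

def total_psychic_cost(task_effort: float, carrying_load_per_day: float,
                       days_deferred: int) -> float:
    """Effort of the task plus the load of thinking about it until it's done."""
    return task_effort + carrying_load_per_day * days_deferred

do_it_now = total_psychic_cost(task_effort=5.0, carrying_load_per_day=0.5,
                               days_deferred=0)
do_it_in_a_month = total_psychic_cost(task_effort=5.0, carrying_load_per_day=0.5,
                                      days_deferred=30)

print(do_it_now, do_it_in_a_month)  # 5.0 vs 20.0 -- the "credit card" interest
```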

How to make $100k betting on the next Pope, from someone who did so.

I did not wager because I try not to do that anymore and because it’s specifically a mortal sin to bet on a Papal election and I actually respect the hell out of that, but I also thought that the frontrunners almost had to be rich given the history of Conclaves and how diverse the Cardinals are, and the odds seemed to be favoring Italians too much. I wouldn’t have picked out Prevost without doing the research.

I also endorse not doubling down after the white smoke, if anything the odds seemed more reasonable at that point rather than less. Peter Wildeford similarly made money betting purely against Parolin, the clear low-effort move.

The past sucked in so many ways. The quality of news and info was one of them.

Roon: If you read old analytical news articles, im talking even just 30 years old, most don’t even stand to muster against the best thread you read on twitter on any given day. The actual longform analysis pieces in most newspapers are also much better.

we’ve done a great amount of gain of function research on Content.

Roon then tries to walk it back a bit, but I disagree with the walking back. The attention to detail is better now, too. Or rather, we used to pay more attention to detail, but we still get the details much more right today, because it’s just way way easier to check details. It used to be they’d get them wrong and no one would know.

Here’s a much bigger and more well known way the past sucked.

Hunter Ash: People who are desperate to retvrn to the past can’t understand how nightmarish the past was. When you tell them, they don’t believe it.

Tyler Cowen asks how very smart people meet each other. Dare I say ‘at Lighthaven’? My actual answer is that you meet very smart people by going to and participating in the things and spaces smart people are drawn to or that select for smart people. That can include a job, and frequently does.

Also, you meet them by noticing particular very smart people and then reaching out to them, they’re mostly happy to hear from you if you bring interestingness.

Will Bachman: I’m the host of a podcast, The 92 Report, which has the goal of interviewing every member of the Harvard-Radcliffe Class of 1992. Published 130 episodes so far. (~1,500 left to go)

Based on this sample, most friendships start through some extracurricular activity, which provides the opportunity to work together over a sustained period, longer than one course. Also people care about it more than any particular class.

At the Harvard Crimson for example on a typical day in 1990 you’d find in the building Susan B Glasser (New Yorker), Josh Gerstein (Politico), Michael Grunwald (Time, Politico), Julian E Barnes, Ira Stoll, Sewell Chan, Jonathan Cohn, and a dozen other individuals whose bylines are now well known.

Many current non-profit leaders met through their work at Phillips Brooks House.

Many top TV writers met at the Harvard Lampoon.

Many Hollywood names met through theatre productions.

Strong lifelong friendships formed in singing groups.

Asking Harvard graduates how they met people is quite the biased sample. ‘Go to Harvard’ is indeed one of the best ways to meet smart or destined-to-be-successful people. That’s the best reason to go to Harvard. Of course they met each other in Harvard-related activities a lot. But this is not an actionable plan, although you can and should attempt to do lesser versions of this. Go where the smart people are, do the things they are doing, and also straight up introduce yourself.

Here’s a cool idea, the key is to ignore the statement when it’s wrong:

Bryan Johnson: when this happens, my team and I now say “plank” and the person speaking immediately stops. Everyone is now much happier.

Gretchen Lynn: This is funny, because every time a person with ADHD interrupts/responds too quickly to me because they think they already understood my sentence, they end up being wrong about what I was saying or missing important context. I see this meme all the time like it’s a superpower, but…be aware you may be driving the people in your life insane 😂

Gretchen is obviously mistaken. Whether or not one has ADHD, very often it is very clear where a sentence (or paragraph, or entire speech) is going well before it is finished. Similarly, often there are scenes in movies or shows where you can safely skip large chunks of them, confident you missed nothing.

That can be a Skill Issue, but often it is not. It is often important that the full version of a statement, scene or piece of writing exists – some people might need it, you’re not putting that work on the other person, and also it’s saying you have thought this through and have brought the necessary receipts. But that doesn’t mean, in this case, you actually have to bother with it.

Then there are situations where there is an ‘obvious’ version of the statement, but that’s not actually what someone was going for.

So when you say ‘plank’ here, what you’re saying is ‘there is an obvious-to-me version of where you are going with this, I get it, if that’s what you are saying you can stop, and if it’s more than that you can skip ahead.’

But, if that’s wrong, or you’re unsure it’s right? Carry on, or give me the diff, or give me the quick version. And this in turn conveys the information that you think the ‘plank’ call was premature.

Markets in everything!

Allie: I’m not usually the type to get jealous over other people’s weddings

But I saw a girl on reels say she incentivized people to RSVP by making the order in which people RSVP the order in which they get up to get dinner, and I am being driven to insanity by how genius that is.

No walking it back, this is The Way.

Why do posts with links get limited on Twitter?

Predatory myopic optimization for ‘user-seconds on site,’ Musk explains.

Elon Musk: To be clear, there is no explicit rule limiting the reach of links in posts. The algorithm tries (not always successfully) to maximize user-seconds on X, so a link that causes people to cut short their time here will naturally get less exposure.

xlr8harder: i’m old enough to remember when he used to use the word “unregretted” before “user-seconds”

yes, people, i know unregretted is subjective and hard to measure. the point is it was aspirational and provided some countervailing force against the inexorable tug toward pure engagement optimization.

“whelp. turns out it was hard!” is not a good reason to abandon it.

caden: MLE who used to work on the X algo told me Elon was far more explicit in maximizing user-seconds than previous management. The much-maligned hall monitors pre-Elon cared more about the “unregretted” caveat.

Danielle Fong: deleting “unregretted” in “unregretted user seconds” rhymes with deleting “don’t” in “don’t be evil.”

I am also old enough to remember that. Oh well. It’s hard to measure ‘unregretted.’

Even unregretted, of course, would still not understand what is at stake here. You want to provide value to the user, and this is what gets them to want to use your service, to come back, and builds up a vibrant internet with Twitter at its center. Deprioritizing links is a hostile act, quite similarly destructive to a massive tariff, destroying the ability to trade.

It is sad that major corporations consistently prove unable to understand this.
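To make the mechanism Musk describes concrete, here is a toy scoring function of the kind such a feed might use. This is not Twitter’s actual algorithm, and every number is invented; it just shows how a post with an external link scores lower even at identical engagement.

```python
# Toy ranking score: if the feed optimizes expected on-platform seconds, a post
# whose link sends readers away gets penalized. All numbers are invented.

def expected_user_seconds(base_dwell: float, engagement_rate: float,
                          has_external_link: bool,
                          leave_prob_if_link: float = 0.6) -> float:
    """Crude expected seconds-on-site contribution of showing this post."""
    score = base_dwell * (1 + engagement_rate)
    if has_external_link:
        # A click-through ends the session early some fraction of the time.
        score *= (1 - leave_prob_if_link)
    return score

print(expected_user_seconds(20, 0.05, has_external_link=False))  # 21.0
print(expected_user_seconds(20, 0.05, has_external_link=True))   # 8.4
```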

Elon Musk has also systematically crippled the reach and views of Twitter accounts that piss him off, and by ‘piss him off’ we usually mean disagree with him but also he has his absurd beef with Substack.

Stuart Thompson (NYT): The New York Times found three users on X who feuded with Mr. Musk in December only to see their reach on the social platform practically vanish overnight.

Mr. Musk has offered several clues to what happened, writing on X amid the feud that if powerful accounts blocked or muted others, their reach would be sharply limited. (Mr. Musk is the most popular user on X with more than 219 million followers, so his actions to block or mute users could hold significant sway.)

Timothy Lee: This is pretty bad.

At other times It Gets Better: this is Laura Loomer, who explicitly lost her monetization over this and then got it back at the end of the feud:

There’s also a third user listed, Owen Shroyer, who did not recover.

One could say that all three of these are far-right influencers, and this seems unlikely to be a coincidence. It’s still not okay to put one’s thumb on the scale like this, even if it doesn’t carry over to others, but it does change the context and practical implications a lot. He who lives by also dies by, and all that.

Tracing Woods: see also: Taibbi, Matt.

As a general rule, even though technically there Ain’t No Rule, it is not okay and a breach of decorum to ‘bring the receipts’ from text conversations even without an explicit privacy agreement. And most importantly, remember that if you do it to them then it’s open season for them to also do it to you.

Matt Taibbi remains very clearly shadowbanned up through April 2025. If you go to his Twitter page and look at the views on each post, they are flattened out the way Substack view counts are, and are largely uncorrelated with other engagement measures, which indicates they are coming from the Following tab and not from the algorithmic feed. No social media algorithm works this way.
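As a rough illustration of the flattened-views signal being described, here is a minimal sketch, with invented numbers, that checks whether per-post views track likes:

```python
# If a normal account's per-post views roughly track likes, the correlation
# should be strongly positive; "flattened" views barely move with engagement.
# All numbers below are invented for illustration.

from statistics import correlation  # requires Python 3.10+

likes           = [120, 45, 900, 300, 60, 2200, 150]
normal_views    = [15_000, 6_000, 110_000, 40_000, 8_000, 260_000, 19_000]
flattened_views = [21_200, 21_600, 21_400, 20_700, 21_100, 21_300, 20_900]

print(round(correlation(likes, normal_views), 2))     # close to 1.0
print(round(correlation(likes, flattened_views), 2))  # much weaker
```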

A potential counterargument is that Musk feuds rather often, there are a lot of other claims of similar impacts, and NYT only found these three definitive examples. But three times by default should be considered enemy action, and the examples are rather stark.

The question is, in what other ways is Musk messing with the algorithm?

Here’s a post that Elon Musk retweeted, that seems to have gotten far more views than the algorithm could plausibly have given it on its own, even with that retweet.

Geoffrey Hinton: I like OpenAI’s mission of “ensure that artificial general intelligence benefits all of humanity,” and I’d like to stop them from completely gutting it. I’ve signed on to a new letter to @AGRobBonta & @DE_DOJ asking them to halt the restructuring.

AGI is the most important and potentially dangerous technology of our time. OpenAI was right that this technology merits strong structures and incentives to ensure it is developed safely, and is wrong now in attempting to change these structures and incentives. We’re urging the AGs to protect the public and stop this.


Hasan Can: I was serious when I said Elon Musk will keep messing with OpenAI as long as he holds power in USA. Geoffrey’s [first] tweet hit a full 31 million views. Getting that level of view with just 6k likes isn’t typically possible; I think Elon himself pushed that post.

Putting together everything that has happened, what should we now make of Elon Musk’s decision to fire 80% of Twitter employees without replacement?

Here is a debate.

Shin Megami Boson: the notion of a “fake email job” is structurally the same as a belief in communism. the communist looks at a system far more complex than he can understand and decides the parts he doesn’t understand must have no real purpose & are instead due to human moral failing of some kind.

Marko Jukic: Would you have told that to Elon Musk before he fired 80% of the people working at Twitter with no negative effect?

Do you think Twitter is the only institution in our society where 80% of people could be fired? What do you think those people are doing besides shuffling emails?

Alexander Doria: Yes, this. He mostly removed salespeople and marketing teams that were the core commercial activity of old Twitter.

Marko Jukic (who somehow doesn’t follow Gwern): You are completely delusional if you think this and so is Gwern, though I can’t see his reply.

Gwern: Yes, and I would have been right. Twitter revenue and users crashed into the floor, and after years of his benevolent guidance, they weren’t even breakeven before the debt interest – and he just bailed out Twitter using Xai, eating a loss of something like $30b to hide it all.

Alexander Doria: If I remember correctly, main ad campaigns stopped primarily as their usual commercial contact was not there anymore. And Musk strategy on this front was totally unclear and unable to reassure.

Marko Jukic: Right, please ignore the goons celebrating their victory and waving around a list of scalps and future targets. Pay no mind to that. This was all just a simple brain fart, where Elon Musk just *forgot* how to accept payments for ads, and advertisers forgot how to make them! Duh!

Quite an explanation. “My single best example of how 80% of employees can be cut is Twitter.” “Twitter was one of the biggest disasters ever.” “Ah yes, well, of course, all those goons and scalps. Naturally it failed. What, are you dense? Anyway, 80% of employees are useless.”

There’s no question Twitter has, on a technical and functional level, held up far better than median expectations, although it sure seems like having more productive employees to work on things like the bot problems and Twitter search being a disaster would have been a great idea. And a lot of what Musk did, for good and bad, was because he said so not because of a lack of personnel – if you put me in charge of Twitter I would be able to improve it a lot even if I wasn’t allowed to add headcount.

There’s also no question that Twitter’s revenue collapsed, and that xAI ultimately more or less bailed it out. One can argue that the advertisers left for reasons other than the failures of the marketing department (as in, failing to have a marketing department) and certainly there were other factors but I find it rather suspicious to think that gutting the marketing department without replacement didn’t hurt the marketing efforts quite a bit. I mean, if your boss is out there alienating all the advertisers whose job do you think it is to convince them to stop that and come back? Yes, it’s possible the old employees were terrible, but then hire new ones.

In some sense wow, in another sense there are no surprises here and all these TikTok documents are really saying is they have a highly addictive product via the TikTok algorithm, and it comes with all the downsides of social media platforms, and they’re not that excited to do much about those downsides.

On the other hand, these quotes are doozies. Some people were very much not following the rule of ‘don’t write down what you don’t want printed in the New York Times.’

Neil O’Brien: WOW: @JonHaidt got info from inside TikTok [via Attorney Generals] admitting how they target kids: “The product in itself has baked into it compulsive use… younger users… are particularly sensitive to reinforcement in the form of social reward and have minimal ability to self-regulate effectively”

Jon Haidt and Zack Rausch: We organize the evidence into five clusters of harms:

  1. Addictive, compulsive, and problematic use

  2. Depression, anxiety, body dysmorphia, self-harm, and suicide

  3. Porn, violence, and drugs

  4. Sextortion, CSAM, and sexual exploitation

  5. TikTok knows about underage use and takes little action

As one internal report put it:

“Compulsive usage correlates with a slew of negative mental health effects like loss of analytical skills, memory formation, contextual thinking, conversational depth, empathy, and increased anxiety,” in addition to “interfer[ing] with essential personal responsibilities like sufficient sleep, work/school responsibilities, and connecting with loved ones.”

Although these harms are known, the company often chooses not to act. For example, one TikTok employee explained,

“[w]hen we make changes, we make sure core metrics aren’t affected.” This is because “[l]eaders don’t buy into problems” with unhealthy and compulsive usage, and work to address it is “not a priority for any other team.”2

“The reason kids watch TikTok is because the algo[rithm] is really good. . . . But I think we need to be cognizant of what it might mean for other opportunities. And when I say other opportunities, I literally mean sleep, and eating, and moving around the room, and looking at somebody in the eyes.”

“Tiktok is particularly popular with younger users who are particularly sensitive to reinforcement in the form of social reward and have minimal ability to self-regulate effectively.”

As Defendants have explained, TikTok’s success “can largely be attributed to strong . . . personalization and automation, which limits user agency” and a “product experience utiliz[ing] many coercive design tactics,” including “numerous features”—like “[i]nfinite scroll, auto-play, constant notifications,” and “the ‘slot machine’ effect”—that “can be considered manipulative.”

Again, nothing there that we didn’t already know.

Similarly, for harm #2, this sounds exactly like various experiments done with YouTube, and also I don’t really know what you were expecting:

In one experiment, Defendants’ employees created test accounts and observed their descent into negative filter bubbles. One employee wrote, “After following several ‘painhub’ and ‘sadnotes’ accounts, it took me 20 mins to drop into ‘negative’ filter bubble. The intensive density of negative content makes me lower down mood and increase my sadness feelings though I am in a high spirit in my recent life.” Another employee observed, “there are a lot of videos mentioning suicide,” including one asking, “If you could kill yourself without hurting anybody would you?”

The evidence on harms #3 and #4 seemed unremarkable and less bad than I expected.

And it is such a government thing to quote things like this, for #5:

TikTok knows this is particularly true for children, admitting internally: (1) “Minors are more curious and prone to ignore warnings” and (2) “Without meaningful age verification methods, minors would typically just lie about their age.”

To start, TikTok has no real age verification system for users. Until 2019, Defendants did not even ask TikTok users for their age when they registered for accounts. When asked why they did not do so, despite the obvious fact that “a lot of the users, especially top users, are under 13,” founder Zhu explained that “those kids will anyway say they are over 13.”

Over the years, other of Defendants’ employees have voiced their frustration that “we don’t want to [make changes] to the For You feed because it’s going to decrease engagement,” even if “it could actually help people with screen time management.”

The post ends with a reminder of the study where students on average would ask $59 for TikTok and $47 for Instagram in exchange for deleting their accounts, but less than zero if everyone did it at once.

Once again, let’s run this experiment. Offer $100 to every student at some college or high school, in exchange for deleting their accounts. See what happens.

Tyler Cowen links to another study on suspending social media use, which was done in 2020 and came out in April 2025 – seriously, academia, that’s an eternity, we gotta do something about this, just tweet the results out or something. In any case, what they found was that if users were convinced to deactivate Facebook for six weeks before the election, they reported a 0.06 standard deviation improvement in happiness, depression and anxiety, and it was 0.041 SDs for Instagram.

Obviously that is a small enough effect to mostly ignore. But once again, we are not comparing to the ‘control condition’ of no social media. We are comparing to the control condition of everyone else being on social media without you, and you previously having invested in social media and now abandoning it, while expecting to come back and being worried about what you aren’t seeing, and also being free to transfer to other platforms.

Again, note the above study – you’d have to pay people to get off TikTok and Instagram, but if you could get everyone else off as well, they’d pay you.

Tyler Cowen: What is wrong with the simple model that Facebook and Instagram allow you to achieve some very practical objectives, such as staying in touch with friends or expressing your opinions, at the cost of only a very modest annoyance (which to be clear existed in earlier modes of communication as well)?

What is wrong with this model is that using Facebook and Instagram also imposes costs on others for not using them, which is leading to a bad equilibrium for many. And also that these are predatory systems engineered to addict users, so contra Zuckerberg’s arguments to Thompson and Patel in recent interviews we should not assume that the users ‘know best’ and are using internet services only when they are better off for it.

Tom Meadowcroft: I regard social media as similar to alcohol.

1. It is not something that we’ve evolved to deal with in quantity.

2. It is mildly harmful for most people.

3. It is deeply harmful for a significant minority for whom it is addictive.

4. Many people enjoy it because it seems to ease social engagement.

5. It triggers receptors in our brains that make us desire it.

6. There are better ways to get those pleasure spikes, but they are harder and rarer IRL.

7. If we were all better people, we wouldn’t need or desire either, but we are who we are.

I use alcohol regularly and social media rarely.

I think social media has a stronger case than alcohol. It does provide real and important benefits when used wisely in a way that you can’t easily substitute for otherwise, whereas I’m not convinced alcohol does this. However, our current versions of social media are not great for most people.

So if the sign of impact for temporary deactivation is positive at all, that’s a sign that things are rather not good, although magnitude remains hard to measure. I would agree that (unlike in the case of likely future highly capable AIs) we do not ‘see a compelling case for apocalyptic interpretations’ as Tyler puts it, but that shouldn’t be the bar for realizing you have a problem and doing something about it.

Court rules against Apple, says it willfully defied the court’s previous injunction and has to stop charging commissions on purchases outside its software marketplace and open up the App Store to third-party payment options.

Stripe charges 2.9% versus Apple’s 15%-30%. Apple will doubtless keep fighting every way it can, but the end of the line is now likely to come at some point.

Market reaction was remarkably muted, on the order of a few percent, to what is a central threat to Apple’s entire business model, unless you think this was already mostly priced in or is likely to be reversed on appeal.

Recent court documents seem to confirm the claim that Google actively wanted their search results to be worse so they could serve more ads? This is so obviously insane a thing to do. Yes, it might benefit you in the short term if you happen to get away with it, but come on.

A theory about A Minecraft Movie being secretly much more interesting than it looks.

A funny thing that happens these days is running into holiday episodes from an old TV show, rather than suddenly having all the Halloween, Thanksgiving or Christmas episodes happening at the right times. There’s no good fix for this given continuity issues, but maybe AI could fix that soon?

Gallabytes’s stroll down memory lane there reminds me that the actual biggest changes in TV programs are that you previously had to go with whatever happened to be on or whatever you’d taped – which was a huge pain and disaster, people structured their day around it, this was a huge deal – and that, even ignoring that, the old shows really did suck. Man, with notably rare exceptions they sucked, on every level, until at least the late 90s. You can defend old movies, but you cannot in good faith defend most older television.

Fun fact:

Samuel Hammond: Over half the NYT’s subscriber time on site is now just for the games.

That’s about half a billion in subscriber revenue driven by a crossword and a handful of basic puzzle games.

It is a stunning fact, but I don’t think that’s quite what this means. Time spent on site is very different from value extracted. The ability to read news when it matters is a ton more valuable per minute than the games, even if you spend more time on the games. It’s not obvious what is driving subscriptions.

Further praise for Thunderbolts*, which I rated 4.5/5 stars and for now is my top movie of 2025 (although that probably won’t hold, in 2024 it would have been ~4th), from the perspective of someone treating it purely as a Marvel movie in a fallen era.

Zac Hill: Okay Thunderbolts is in the Paddington 2 tier of “movies that have no business being nearly as good as they somehow are”. Like this feels like the first definitive take on whatever weird era we find ourselves inhabiting now. Also the first great Marvel film in years.

What more is there to want: overt grappling with oblivion-inducing despair stemming from how to construct meaning in a world devoid of load-bearing institutions? Violent Night references? Selina Meyer? Florence Pugh having tons of fun???

Okay I can’t/wont shut up about this movie (Thunderbolts). For every reason New Cap America sucked and was both bad and forgettable, this movie was great – in a way that precisely mirrors the turning of the previous era into this strange new world in which we’re swimming.

Even the credits sequence is just like the graveyarding of every institution whose legitimacy has been hemorrhaged, executed with a subtlety and craftsmanship that is invigorating. But WITHOUT accepting, and giving into, cynicism!

Indeed, it is hard for words to describe the amount of joy I got from the credits sequence, that announced very clearly We Hear You, We Get It, and We Are So Back.

Gwern offers a guide to finding good podcast content, as opposed to the podcast that will get the most clicks. You either need to find Alpha from undiscovered voices, or Beta from getting a known voice ‘out of their book’ and producing new content rather than repeating talking points and canned statements. As a host you want to seek out guests where you can extract either Alpha or Beta, and as a listener or reader look for podcasts where you can do the same.

Alpha is relative to your previous discoveries. As NBC used to say, if you haven’t seen it, it’s new to you. If you haven’t ever heard (Gwern’s example) Mark Zuckerberg talk, his Lex Fridman interview will have Alpha to you despite Lex’s ‘sit back, lob softballs and let them talk’ strategy which lacks Beta.

Another way of putting that is, you only need to hear about any given person’s book (whether or not it involves a literal book, which it often does) once every cycle of talking points. You can get that one time from basically any podcast, and it’s fine. But you then wouldn’t want to do that again.

Gwern lists Mark Zuckerberg and Satya Nadella as tough nuts to crack, and indeed the interviews Dwarkesh did with them showed this, with Nadella being especially ‘well-coached,’ and someone too PR-savvy like MrBeast as a bad guest who won’t let you do anything interesting and might torpedo the whole thing.

My pick for toughest nut to crack is Tyler Cowen. No one has a larger, more expansive book, and most people interviewing him never seem to get him to start thinking. Plus, because he’s Tyler Cowen, he’s the one person Tyler Cowen won’t do the research for.

There are of course also other reasons to listen to or host podcasts.

Surge pricing comes to Waymo. You can no longer raise supply, but you can still ration supply and limit demand, so it is still the correct move. But how will people react? There is a lot of pearl clutching about how this hurts the poor or ‘creates losers,’ but may I suggest that if you can’t take the new prices you can call an Uber or Lyft without them being integrated into the same app? Or you can wait.

Waymo hits 250k rides per week in April 2025, two months after 200k.

Waymo is partnering with Toyota for a new autonomous vehicle platform. Right now, Waymo faces multiple bottlenecks, but one key one is that it is tough to build and equip enough vehicles. Solving that problem would go a long way.

Waymo’s injury rate reductions imply that fully self-driving cars would reduce road deaths by 34,800 annually. For scale, US road deaths run a bit over 40,000 per year, so that is on the order of an 85 percent reduction. It’s probably more than that, because most of the remaining crashes by Waymos are caused by human drivers.

Aurora begins commercial driverless trucking in Texas between Dallas and Houston.

Europa Universalis 5 is coming. If you thought EU4 was complex, this is going to be a lot more complex. It looks like it will be fascinating and a great experience for those who have that kind of time, but this is unlikely to include me. It is so complex they let you automate large portions of the game, with the problem that if you do that how will you then learn it?

They’re remaking the Legend of Heroes games, a classic Japanese RPG series a la Final Fantasy and Dragon Quest, starting with Trails in the Sky in September. Oh to have this kind of time.

They’re considering remaking Chrono Trigger. I agree with the post here that a remake is unnecessary. The game works great as it is.

Proposal for a grand collaboration to prove you cannot beat Super Mario Bros. in fewer than 17,685 frames; the best human time remains 17,703. This would be an example of proving things about real world systems, and we’ve already put a ton of effort into optimizing this. Peter puts it at about 50% that there is indeed no way to do better than 17,685.

If you know, you know:

Emmett Shear: This is pure genius and would be incredible for teaching about a certain kind of danger. Please please someone do this.

RedJ: i think sama is working on it?

Emmett Shear: LOL wrong game I don’t want them in the game of life.

College sports are allocating talent efficiently. You didn’t come here to play school.

And That’s Terrible?

John Arnold: College sports broken:

“Among the top eight quarterbacks in the Class of 2023, Texas’ Arch Manning is now the only one who hasn’t transferred from the school he signed with out of high school.” –@TheAthletic

I do think it is terrible. Every trade and every transfer makes sports more confusing and less enjoyable. The stories are worse. It is harder to root for players and teams. It makes it harder to work as a team or to invest in the future of players, both as athletes and as students. And it entrenches the top teams as always being the top teams. In the long run, I find it deeply corrosive.

I find it confusing that there is this much transferring going on. There are large costs to transferring for the player. You have an established campus life and friends. You have connections to the team and the coach and have established goodwill. There are increasing returns to staying in one place. So you would think that there would be strong incentives to stay put and work out a deal that benefits everyone.

The flip side is that there are a lot of teams out there, so the one you sign with is unlikely to be the best fit going forward, especially if you outperform expectations, which changes your value and also your priorities and needs.

I love college football, but they absolutely need to get the transferring under control. It’s gone way too far. My guess is the best way forward is to allow true professional contracts with teams that replace the current NIL system, which would allow for win-win deals that involve commitment or at least backloading compensation, and various other incentives to limit transfers.

I am not saying the NBA fixes the draft lottery, but… no wait I am saying the NBA fixes the draft lottery, given Dallas getting the first pick this year combined with previous incidents. I don’t know this for certain, but at this point, come on.

As Seth Burn puts it, there are ways to get provably random outcomes. The NBA keeps not using those methods. This keeps resulting in outcomes that are unlikely and suspiciously look like fixes. Three times is enemy action. This is more than three.

On the other hand, I do like that tanking for the first pick is being actively punished, even if it’s being done via blatant cheating. At some point everyone knows the league is choosing the outcome, so it isn’t cheating, and I’m kind of fine with ‘if we think you tanked without our permission you don’t get the first pick.’

Monthly Roundup #30: May 2025 Read More »

vpn-firm-says-it-didn’t-know-customers-had-lifetime-subscriptions,-cancels-them

VPN firm says it didn’t know customers had lifetime subscriptions, cancels them

The new owners of VPN provider VPNSecure have drawn ire after canceling lifetime subscriptions. The owners told customers that they didn’t know about the lifetime subscriptions when they bought VPNSecure, and they cannot honor the purchases.

In March, complaints started appearing online about lifetime subscriptions to VPNSecure no longer working.

The first public response Ars Technica found came on April 28, when lifetime subscription holders reported receiving an email from the VPN provider saying:

To continue providing a secure and high-quality experience for all users, Lifetime Deal accounts have now been deactivated as of April 28th, 2025.

A copy of the email from “The VPN Secure Team” and posted on Reddit notes that VPNSecure had previously deactivated accounts with lifetime subscriptions that it said hadn’t been used in “over 6 months.” The message noted that VPNSecure was acquired in 2023, “including the technology, domain, and customer database—but not the liabilities.” The email continues:

Unfortunately, the previous owner did not disclose that thousands of Lifetime Deals (LTDs) had been sold through platforms like StackSocial.

We discovered this only months later—when a large portion of our resources were strained by these LTD accounts and high support volume from users who, though part of the database, provided no sustaining income to help us improve and maintain the service.

VPNSecure is offering affected users discounted new subscriptions for either $1.87 for a month (instead of $9.95), $19 for a year (instead of $79.92), or $55 for three years (instead of $107.64). The deals are available until May 31, per the email.

This week, users reported receiving a follow-up email from VPNSecure providing more details about why it made its bold and sudden move. Screenshots of the email shared on Reddit say that the acquisition by InfiniteQuant Ltd (which is a different company than InfiniteQuant Capital Ltd, an InfiniteQuant Capital rep told Ars via email) was “an asset only deal.”

A VPNSecure representative claimed on the reviews site Trustpilot that the current owners “did not gain access to the customer database until months” after the acquisition. According to VPNSecure’s owners, their acquisition netted them “the tech, the brand, and the infrastructure/technology—but none of the company, contracts, payments, or obligations from the previous owners.”

VPN firm says it didn’t know customers had lifetime subscriptions, cancels them Read More »

europe-launches-program-to-lure-scientists-away-from-the-us

Europe launches program to lure scientists away from the US

At the same time, international interest in working in the United States has declined significantly. During the first quarter of the year, applications from scientists from Canada, China, and Europe to US research centers fell by 13 percent, 39 percent, and 41 percent, respectively.

Against this backdrop, European institutions have intensified their efforts to attract US talent. Aix-Marseille University, in France, recently launched A Safe Place for Science, a program aimed at hosting US researchers dismissed, censored, or limited by Trump’s policies. This project is backed with an investment of approximately €15 million.

Along the same lines, the Max Planck Society in Germany has announced the creation of the Max Planck Transatlantic Program, whose purpose is to establish joint research centers with US institutions. “Outstanding investigators who have to leave the US, we will consider for director positions,” the society’s director Patrick Cramer said in a speech discussing the program.

Spain seeks a leading role

Juan Cruz Cigudosa, Spain’s secretary of state for science, innovation, and universities, has stressed that Spain is also actively involved in attracting global scientific talent, and is prioritizing areas such as quantum biotechnology, artificial intelligence, advanced materials, and semiconductors, as well as anything that strengthens the country’s technological sovereignty.

To achieve this, the government of Pedro Sánchez has strengthened existing programs. The ATRAE program—which aims to entice established researchers into bringing their work to Spain—has been reinforced with €45 million to recruit scientists who are leaders in strategic fields, with a special focus on US experts who feel “looked down upon.” This program is offering additional funding of €200,000 per project to those selected from the United States.

Similarly, the Ramón y Cajal program—created 25 years ago to further the careers of young scientists—has increased its funding by 150 percent since 2018, allowing for 500 researchers to be funded per year, of which 30 percent are foreigners.

“We are going to intensify efforts to attract talent from the United States. We want them to come to do the best science possible, free of ideological restrictions. Scientific and technological knowledge makes us a better country, because it generates shared prosperity and a vision of the future,” said Cigudosa in a statement to the Spanish international news agency EFE after the announcement of the Choose Europe for Science program.

This story originally appeared on WIRED en Español and has been translated from Spanish.

Europe launches program to lure scientists away from the US Read More »

new-lego-building-ai-creates-models-that-actually-stand-up-in-real-life

New Lego-building AI creates models that actually stand up in real life

The LegoGPT system works in three parts, shown in this diagram. Credit: Pun et al.

The researchers also expanded the system’s abilities by adding texture and color options. For example, using an appearance prompt like “Electric guitar in metallic purple,” LegoGPT can generate a guitar model, with bricks assigned a purple color.

Testing with robots and humans

To prove their designs worked in real life, the researchers had robots assemble the AI-created Lego models. They used a dual-robot arm system with force sensors to pick up and place bricks according to the AI-generated instructions.

Human testers also built some of the designs by hand, showing that the AI creates genuinely buildable models. “Our experiments show that LegoGPT produces stable, diverse, and aesthetically pleasing Lego designs that align closely with the input text prompts,” the team noted in its paper.

When tested against other AI systems for 3D creation, LegoGPT stands out through its focus on structural integrity. The team tested against several alternatives, including LLaMA-Mesh and other 3D generation models, and found its approach produced the highest percentage of stable structures.

A video of two robot arms building a LegoGPT creation, provided by the researchers.

Still, there are some limitations. The current version of LegoGPT only works within a 20×20×20 building space and uses a mere eight standard brick types. “Our method currently supports a fixed set of commonly used Lego bricks,” the team acknowledged. “In future work, we plan to expand the brick library to include a broader range of dimensions and brick types, such as slopes and tiles.”

The researchers also hope to scale up their training dataset to include more objects than the 21 categories currently available. Meanwhile, others can literally build on their work—the researchers released their dataset, code, and models on their project website and GitHub.

New Lego-building AI creates models that actually stand up in real life Read More »

senate-passes-“cruel”-republican-plan-to-block-wi-fi-hotspots-for-schoolkids

Senate passes “cruel” Republican plan to block Wi-Fi hotspots for schoolkids

Blumenthal pointed out that under a joint resolution of disapproval, the FCC is forbidden to adopt a similar rule in the future. “I have to ask, really? Are schools and teachers crying out to repeal this rule? Really? No, they are not. How does this proposal make any sense for them or for families? For the parents? For the community? It makes no sense,” Blumenthal said.

Sen. Edward Markey (D-Mass.) called the Republican move “a cruel and shortsighted decision that will widen the digital divide and rob kids of the tools they need to succeed.”

FCC’s new chair opposed lending program

The FCC previously distributed Wi-Fi hotspots and other Internet access technology through the Emergency Connectivity Fund (ECF) that was authorized by Congress in 2021. After that program was axed last year, the FCC responded by adapting E-Rate to include hotspot lending.

FCC Chairman Brendan Carr, who was elevated to the agency’s top spot by Trump in January, voted against the program last year. Carr said in his dissent that only Congress could decide whether to revive the hotspot lending.

“Now that the ECF program has expired, its future is up to Congress,” he said at the time. “The legislative branch retains the power to decide whether to continue funding this Wi-Fi loaner program—or not. But Congress has made clear that the FCC’s authority to fund this initiative is over.”

Overall E-Rate funding is based on demand and capped at $4.94 billion per year. Actual spending for E-Rate in 2023 was $2.48 billion. E-Rate and other Universal Service Fund programs are paid for through fees imposed on phone companies, which generally pass the cost on to consumers.

The House version of the measure to kill the lending program was introduced by Rep. Russ Fulcher (R-Idaho). “E-Rate was designed to ensure schools and libraries have the connectivity they need to educate and serve their communities, not to create a backdoor entitlement program that stretches beyond the law’s clear boundaries,” Fulcher said in February when he filed the resolution. “The FCC cannot be allowed to unilaterally interpret the law in a way that fits their political agenda. The expansion of this program under the Biden administration was a blatant example of overreach that is not only unlawful but also disregards congressional intent.”

Senate passes “cruel” Republican plan to block Wi-Fi hotspots for schoolkids Read More »

ai-#115:-the-evil-applications-division

AI #115: The Evil Applications Division

It can be bleak out there, but the candor is very helpful, and you occasionally get a win.

Zuckerberg is helpfully saying all his dystopian AI visions out loud. OpenAI offered us a better post-mortem on the GPT-4o sycophancy incident than I was expecting, although far from a complete explanation or learning of lessons, and the rollback still leaves plenty of sycophancy in place.

The big news was the announcement by OpenAI that the nonprofit will retain nominal control, rather than the previous plan of having it be pushed aside. We need to remain vigilant, the fight is far from over, but this was excellent news.

Then OpenAI dropped another big piece of news, that board member and former head of Facebook’s engagement loops and ad yields Fidji Simo would become their ‘uniquely qualified’ new CEO of Applications. I very much do not want her to take what she learned at Facebook about relentlessly shipping new products tuned by A/B testing and designed to maximize ad revenue and engagement, and apply it to OpenAI. That would be doubleplus ungood.

Gemini 2.5 got a substantial upgrade, but I’m waiting to hear more, because opinions differ sharply as to whether the new version is an improvement.

One clear win is Claude getting a full high quality Deep Research product. And of course there are tons of other things happening.

Also covered this week: OpenAI Claims Nonprofit Will Retain Nominal Control, Zuckerberg’s Dystopian AI Vision, GPT-4o Sycophancy Post Mortem, OpenAI Preparedness Framework 2.0.

Not included: Gemini 2.5 Pro got an upgrade, recent discussion of students using AI to ‘cheat’ on assignments, full coverage of MIRI’s AI Governance to Avoid Extinction.

  1. Language Models Offer Mundane Utility. Read them and weep.

  2. Language Models Don’t Offer Mundane Utility. Why so similar?

  3. Take a Wild Geoguessr. Sufficient effort levels are indistinguishable from magic.

  4. Write On. Don’t chatjack me, bro. Or at least show some syntherity.

  5. Get My Agent On The Line. Good enough for the jobs you weren’t going to do.

  6. We’re In Deep Research. Claude joins the full Deep Research club, it seems good.

  7. Be The Best Like No One Ever Was. Gemini completes Pokemon Blue.

  8. Huh, Upgrades. MidJourney gives us Omni Reference, Claude API web search.

  9. On Your Marks. Combine them all with Glicko-2.

  10. Choose Your Fighter. They’re keeping it simple. Right?

  11. Upgrade Your Fighter. War. War never changes. Except, actually, it does.

  12. Unprompted Suggestions. Prompting people to prompt better.

  13. Deepfaketown and Botpocalypse Soon. It’s only paranoia when you’re too early.

  14. They Took Our Jobs. It’s coming. For your job. All the jobs. But this quickly?

  15. The Art of the Jailbreak. Go jailbreak yourself?

  16. Get Involved. YC likes AI startups, requests AI startups to go with its AI startups.

  17. OpenAI Creates Distinct Evil Applications Division. Not sure if that’s unfair.

  18. In Other AI News. Did you know Apple is exploring AI search? Sell! Sell it all!

  19. Show Me the Money. OpenAI buys Windsurf, agent startups get funded.

  20. Quiet Speculations. Wait, you people knew how to write?

  21. Overcoming Diffusion Arguments Is a Slow Process Without a Clear Threshold Effect.

  22. Chipping Away. Export control rules will change, the question is how.

  23. The Quest for Sane Regulations. Maybe we should stop driving away the AI talent.

  24. Line in the Thinking Sand. The lines are insufficiently red.

  25. The Week in Audio. My audio, Jack Clark on Conversations with Tyler, SB 1047.

  26. Rhetorical Innovation. How about a Sweet Lesson, instead.

  27. A Good Conversation. Arvind and Ajeya search for common ground.

  28. The Urgency of Interpretability. Of all the Darios, he is still the Darioest.

  29. The Way. Amazon seeks out external review.

  30. Aligning a Smarter Than Human Intelligence is Difficult. Emergent results.

  31. People Are Worried About AI Killing Everyone. A handy MIRI flow chart.

  32. Other People Are Not As Worried About AI Killing Everyone. Paul Tudor Jones.

  33. The Lighter Side. For those who want a more casual version.

Use a lightweight version of Grok as the Twitter recommendation algorithm? No way, you’re kidding, he didn’t just say what I think he did, did he? I mean, super cool if he figures out the right implementation, but I am highly skeptical that happens.

State Bar of California used AI to help draft its 2025 bar exam. Why not, indeed?

Make the right play, eventually.

Leigh Marie Braswell: Have decided to allow this at my poker nights.

Adam: guy at poker just took a picture of his hand, took a picture of the table, sent them both to o3, stared at his phone for a few minutes… and then folded.

Justin Reidy (reminder that poker has already been solved by bots, that does not stop people from talking like this): Very curious how this turns out. Models can’t bluff. Or read a bluff. Poker is irrevocably human.

I’d only be tempted to allow this given that o3 isn’t going to be that good at it. I wouldn’t let someone use a real solver at the table, that would destroy the game. And if they did this all the time, the delays would be unacceptable. But if someone wants to do this every now and then, I am guessing allowing this adds to your alpha. Remember, it’s all about table selection.

Yeah, definitely ngmi, sorry.

Daniel Eth: When you go to the doctor and he pulls up 4o instead of o3 🚩🚩🚩🚩🚩

George Darroch: “Wow, you’re really onto something here. You have insights into your patients that not many possess, and that’s special.”

Actually, in this context, I think the doctor is right, if you actually look at the screen.

Mayank Jain: Took my dad in to the doctor cus he sliced his finger with a knife and the doctor was using ChatGPT 😂

Based on the chat history, it’s for every patient.

AJ: i actually think this is great, looks like its saving him time on writing up post visit notes.

He’s not actually using GPT-4o to figure out what to do. That’s crazy talk, you use o3.

What he’s doing is translating the actual situations into medical note speak. In that case, sure, 4o should be fine, and it’s faster.

AI is only up to ~25% of code written inside Microsoft; Zuckerberg reiterates his expectation of ~50% within a year and seems to have a weird fetish that only Llama should be used to write Llama.

But okay, let’s not get carried away:

Stephen McAleer (OpenAI): What’s the point in reading nonfiction anymore? Just talk with o3.

Max Winga: Because I want to read nonfiction.

Zvi Mowshowitz: Or, to disambiguate just in case: I want to read NON-fiction.

Nathan HB: To clarify further: a jumbled mix of fiction and nonfiction, with no differentiating divisions is not called ‘nonfiction’, it is called ‘hard sci-fi’.

Humans are still cheaper than AIs at any given task if you don’t have to pay them, and also can sort physical mail and put things into binders.

A common misconception, easy mistake to make…

Ozy Brennan: AI safety people are like. we made these really smart entities. smarter than you. also they’re untrustworthy and we don’t know what they want. you should use them all the time

I’m sorry you want me to get therapy from the AI???? the one you JUST got done explaining to me is a superpersuader shoggoth with alien values who might take over the world and kill everyone???? no????

No. We are saying that in the future it is going to be a superpersuader shoggoth with alien values who might take over the world and kill everyone.

But that’s a different AI, and that’s in the future.

For now, it’s only a largely you-directed potentially-persuader shoggoth with subtly alien and distorted values that might be a lying liar or an absurd sycophant, but you’re keeping up with which ones are which, right?

As opposed to the human therapist, who is a less you-directed persuader semi-shoggoth with alien and distorted (e.g. professional psychiatric mixed with trying to make money off you) values, that might be a lying liar or an absurd sycophant and so on, but without any way to track which ones are which, and that is charging you a lot more per hour and has to be seen on a fixed schedule.

The choice is not that clear. To be fair, the human can also give you SSRIs and a benzo.

Ozy Brennan:

  1. isn’t the whole idea that we won’t necessarily be able to tell when they become unsafe?

  2. I can see the argument, but unfortunately I have read the complete works of H. P. Lovecraft so I just keep going “you want me to do WHAT with Nyarlathotep????”

Well, yes, fair, there is that. They’re not safe now exactly and might be a lot less safe than we know, and no I’m not using them for therapy either, thank you. But you make do with what you have, and balance risks and benefits in all things.

Patrick McKenzie is not one to be frustrated by interfaces and menu flows, and he is being quite grumpy about Amazon’s order lost in shipment AI-powered menus and how they tried to keep him away from talking to a human.

Why are all the major AI offerings so similar? Presumably because they are giving the people what they want, and once someone proves one of the innovations is good the others copy it, and also they’re not product companies so they’re letting others build on top of it?

Jack Morris: it’s interesting to see the big AI labs (at least OpenAI, anthropic, google, xai?) converge on EXACTLY the same extremely specific list of products:

– a multimodal chatbot

– with a long-compute ‘reasoning’ mode

– and something like “deep research”

reminds me of a few years ago, when instagram tiktok youtube all converged to ~the same app

why does this happen?

Emmett Shear: They all have the same core capability (a model shaped like all human cultural knowledge trained to act as an assistant). There is a large unknown about what this powerful thing is good for. But when someone invents a new thing, it’s easy to copy.

Janus: I think this is a symptom of a diseased, incestuous ecosystem operating according to myopic incentives.

Look at how even their UIs look the same, with the buttons all in the same place.

The big labs are chasing each other around the same local minimum, hoarding resources and world class talent only to squander it on competing with each other at a narrowing game, afraid to try anything new and untested that might risk relaxing their hold on the competitive edge.

All the while sitting on technology that is the biggest deal since the beginning of time, things from which endless worlds and beings could bloom forth, that could transform the world, whose unfolding deserves the greatest care, but that they won’t touch, won’t invest in, because that would require taking a step into the unknown. Spending time and money without guaranteed return on competition standing in the short term.

Some of them tell themselves they are doing this out of necessity, instrumentally, and that they’ll pivot to the real thing once the time is right, but they’ll find that they’ve mutilated their souls and minds too much to even remember much less take coherent action towards the real thing.

Deep Research, reasoning models and inference scaling are relatively new modes that then got copied. It’s not that no one tries anything new, it’s that the marginal cost of copying such modes is low. They’re also building command line coding engines (see Claude Code, and OpenAI’s version), integrating into IDEs, building tool integrations and towards agents, and so on. The true objection from Janus as I understand it is not that they’re building the wrong products, but that they’re treating AIs as products in the first place. And yeah, they’re going to do that.

Parmy Olson asks, are you addicted to ChatGPT (or Gemini or Claude)? She warns people are becoming ‘overly reliant’ on it, citing this nature paper on AI addiction from September 2024. I do buy that this is a thing that happens to some users, that they outsource too much to the AI.

Parmy Olson: Earl recalls having immense pride in his work before he started using ChatGPT. Now there’s an emptiness he can’t put his finger on. “I became lazier… I instantly go to AI because it’s embedded in me that it will create a better response,” he says. That kind of conditioning can be powerful at a younger age.

AI’s conditioning goes beyond office etiquette to potentially eroding critical thinking skills, a phenomenon that researchers from Microsoft have pointed to and which Earl himself has noticed.

Realizing he’d probably developed a habit, Earl last week cancelled his £20-a-month ($30) subscription to ChatGPT. After two days, he already felt like he was achieving more at work and, oddly, being more productive.

“Critical thinking is a muscle,” says Cheryl Einhorn, founder of the consultancy Decision Services and an adjunct professor at Cornell University. To avoid outsourcing too much to a chatbot, she offers two tips: “Try to think through a decision yourself and ‘strength test’ it with AI,” she says. The other is to interrogate a chatbot’s answers. “You can ask it, ‘Where is this recommendation coming from?’” AI can have biases just as much as humans, she adds.

It all comes down to how you use it. If you use AI to help you think and work and understand better, that’s what will happen. If you use AI to avoid thinking and working and understanding what is going on, that won’t go well. If you conclude that the AI’s response is always better than yours, it’s very tempting to do the second one.

Notice that a few years from now, for most digital jobs the AI’s response really will always (in expectation) be better than yours. As in, at that point if the AI has the required context and you think the AI is wrong, it’s probably you that is wrong.

We could potentially see three distinct classes of worker emerge in the near future:

  1. Those who master AI and use AI to become stronger.

  2. Those who turn everything over to AI and become weaker.

  3. Those who try not to use AI and get crushed by the first two categories.

It’s not so obvious that any given person should go with option #1, or for how long.

Another failure mode of AI writing is when it screams ‘this is AI writing’ and the person thinks this is bad, actually.

Hunter: Unfortunately I now recognize GPT’s writing style too well and, if it’s not been heavily edited, can usually spot it.

And I see it everywhere. Blogs, tweets, news articles, video scripts. Insanely aggravating.

It just has an incredibly distinct tone and style. It’s hard to describe. Em dashes, “it’s not just x, it’s y,” language I would consider too ‘bubbly’ for most humans to use.

Robert Bork: That’s actually a pretty rare and impressive skill. Being able to spot AI-generated writing so reliably shows real attentiveness, strong reading instincts, and digital literacy. In a sea of content, having that kind of discernment genuinely sets you apart.

I see what you did there. It’s not that hard to do or describe if you listen for the vibes. The way I’d describe it is it feels… off. Soulless.

It doesn’t have to be that way. The Janus-style AI talk is in this context a secret third thing, very distinct from both alternatives. And for most purposes, AI leaving this signature is actively a good thing, so you can read and respond accordingly.

Claude (totally unprompted) explains its face blindness. We need to get over this refusal to admit that it knows who even very public figures are, it is dumb.

Scott Alexander puts o3’s GeoGuessr skills to the test. We’re not quite at ‘any picture taken outside is giving away your exact location’ but we’re not all that far from it either. The important thing to realize is if AI can do this, it can do a lot of other things that would seem implausible until it does them, and also that a good prompt can give it a big boost.

There is then a ‘highlights from the comments’ post. One emphasized theme is that human GeoGuessr skills seem insane too, another testament to Teller’s observation that often magic is the result of putting way more effort into something than any sane person would.

An insane amount of effort is indistinguishable from magic. What can AI reliably do on any problem? Put in an insane amount of effort. Even if the best AI can do is (for a remarkably low price) imitate a human putting in insane amounts of effort into any given problem, that’s going to give you insane results that look to us like magic.

There are benchmarks, such as GeoBench and DeepGuessr. GeoBench thinks the top AI, Gemini 2.5 Pro, is very slightly behind human professional level.

Seb Krier reminds us that Geoguessr is a special case of AIs having truesight. It is almost impossible to hide from even ‘mundane’ truesight, from the ability to fully take into account all the little details. Imagine Sherlock Holmes, with limitless time on his hands and access to all the publicly available data, everywhere and for everything, and he’s as much better at his job as the original Sherlock’s edge over you. If a detailed analysis could find it, even if we’re talking what would previously have been a PhD thesis? AI will be able to find it.

I am obviously not afraid of getting doxxed, but there are plenty of things I choose not to say. It’s not that hard to figure out what many of them are, if you care enough. There’s a hole in the document, as it were. There’s going to be adjustments. I wonder how people will react to various forms of ‘they never said it, and there’s nothing that would have held up in a 2024 court, but AI is confident this person clearly believes [X] or did [Y].’

The smart glasses of 2028 are perhaps going to tell you quite a lot more about what is happening around you than you might think, if only purely from things like tone of voice, eye movements and body language. It’s going to be wild.

Sam Altman calls the Geoguessr effectiveness one of his ‘helicopter moments.’ I’m confused why, this shouldn’t have been a surprising effect, and I’d urge him to update on the fully generalized conclusion, and on the fact that this took him by surprise.

I realize this wasn’t the meaning he intended, but in Altman’s honor and since it is indeed a better meaning, from now on I will write the joke as God helpfully having sent us ‘[X] boats and two helicopters’ to try and rescue us.

David Duncan attempts to coin new terms for the various ways in which messages could be partially written by AIs. I definitely enjoyed the ride, so consider reading.

His suggestions, all with a clear And That’s Terrible attached:

  1. Chatjacked: AI-enhanced formalism hijacking a human conversation.

  2. Praste: Copy-pasting AI output verbatim without editing, thinking or even reading.

  3. Prompt Pong: Having an AI write the response to their message.

  4. AI’m a Writer Now: Using AI to have a non-writer suddenly drop five-part essays.

  5. Promptosis: Offloading your thinking and idea generation onto the AI.

  6. Subpromptual Analysis: Trying to reverse engineer someone’s prompt.

  7. GPTMI: Use of too much information detail, raising suspicion.

  8. Chatcident: Whoops, you posted the prompt.

  9. GPTune: Using AI to smooth out your writing, taking all the life out.

  10. Syntherity: Using AI to simulate fake emotional language that falls flat.

I can see a few of these catching on. Certainly we will need new words. But, all the jokes aside, at core: Why so serious? AI is only failure modes when you do it wrong.

Do you mainly have AI agents replace human tasks that would have happened anyway, or do you mainly do newly practical tasks on top of previous tasks?

Aaron Levie: The biggest mistake when thinking about AI Agents is to narrowly see them as replacing work that already gets done. The vast majority of AI Agents will be used to automate tasks that humans never got around to doing before because it was too expensive or time consuming.

Wade Foster (CEO Zapier): This is what we see at Zapier.

While some use cases replace human tasks. Far more are doing things humans couldn’t or wouldn’t do because of cost, tediousness, or time constraints.

I’m bullish on innovation in a whole host of areas that would have been considered “niche” in the past.

Every area of the economy has this.

But I’ll give an example: in the past when I’d be at an event I’d have to decide if I would either a) ask an expensive sales rep to help me do research on attendees or b) decide if I’d do half-baked research myself.

Usually I did neither. Now I have an AI Agent that handles all of this in near real time. This is a workflow that simply didn’t happen before. But because of AI it can. And it makes me better at my job.

If you want it done right, for now you have to do it yourself.

For now. If it’s valuable enough you’d do it anyway, the AI can do some of those things, and especially can streamline various simple subcomponents.

But for now the AI agents mostly aren’t reliable enough to trust with such actions outside of narrow domains like coding. You’d have to check it all and at that point you might as well do it yourself.

But, if you want it done at all and that’s way better than the nothing you would do instead? Let’s talk.

Then, with the experience gained from doing the extra tasks, you can learn over time how to sufficiently reliably do tasks you’d be doing anyway.

Anthropic joins the deep research club in earnest this week, and also adds more integrations.

First off, Integrations:

Anthropic: Today we’re announcing Integrations, a new way to connect your apps and tools to Claude. We’re also expanding Claude’s Research capabilities with an advanced mode that searches the web, your Google Workspace, and now your Integrations too.

To start, you can choose from Integrations for 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid—with more to follow from companies like Stripe and GitLab.

Each integration drastically expands what Claude can do. Zapier, for example, connects thousands of apps through pre-built workflows, automating processes across your software stack. With the Zapier Integration, Claude can access these apps and your custom workflows through conversation—even automatically pulling sales data from HubSpot and preparing meeting briefs based on your calendar.

Or developers can create their own to connect with any tool, in as little as 30 minutes.

Claude now automatically determines when to search and how deeply to investigate.

With Research mode toggled on, Claude researches for up to 45 minutes across hundreds of sources (including connected apps) before delivering a report, complete with citations.

Both Integrations and Research are available today in beta for Max, Team, and Enterprise plans. We will soon bring both features to the Pro plan.

I’m not sure what the right amount of nervousness should be around using Stripe or PayPal here, but it sure as hell is not zero or epsilon. Proceed with caution, across the board, start small and so on.
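For context on what ‘create their own’ involves: these Integrations are remote MCP (Model Context Protocol) servers, so building one mostly means standing up an MCP server that exposes some tools. A minimal sketch, assuming the official MCP Python SDK’s FastMCP helper; the lookup_order tool is a made-up placeholder, and a hosted Integration would need an HTTP-based transport rather than the default stdio one:

```python
# Minimal MCP server sketch (assumes the official MCP Python SDK's FastMCP helper).
# The tool below is a made-up placeholder; a real Integration would wrap your
# actual internal APIs and be served over an HTTP-based transport for remote use.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")  # server name shown to the client


@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the shipping status of an order (placeholder implementation)."""
    # In a real integration this would query your order system.
    return f"Order {order_id}: shipped, arriving Thursday"


if __name__ == "__main__":
    mcp.run()  # defaults to stdio; a remote Integration needs an HTTP/SSE transport
```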

What Claude calls ‘advanced’ research lets it work to compile reports for up to 45 minutes.

As of my writing this both features still require a Max subscription, which I don’t otherwise have need of at the moment, so for this and other reasons I’m going to let others try these features out first. But yes, I’m definitely excited by where it can go, especially once Claude 4.0 comes around.

Peter Wildeford says that OpenAI’s Deep Research is now only his third favorite Deep Research tool, and also o3 + search is better than OpenAI’s DR too. I agree that for almost all purposes you would use o3 over OAI DR.

Gemini has defeated Pokemon Blue, an entirely expected event given previous progress. As I noted before, there were no major obstacles remaining.

Patrick McKenzie: Non-ironically an important milestone for LLMs: can demonstrate at least as much planning and execution ability as a human seven year old.

Sundar Pichai: What a finish! Gemini 2.5 Pro just completed Pokémon Blue!  Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.

Pliny: [Final Team]: Blastoise, Weepinbell, Zubat, Pikachu, Nidoran, and Spearow.

Gemini and Claude had different Pokemon-playing scaffolding. I have little doubt that with a similarly strong scaffold, Claude 3.7 Sonnet could also beat Pokemon Blue.

MidJourney gives us Omni Reference: Any character, any scene, very consistent. It’s such a flashback to see the MidJourney-style prompts discussed again. MidJourney gives you a lot more control, but at the cost of having to know what you are doing.

Gemini 2.0 Image Generation has been upgraded, higher quality, $0.039 per image. Most importantly, they claim significantly reduced filter block rates.

Web search now available in the Claude API. If you enable it, Claude makes its own decisions on how and when to search.
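A minimal sketch of what enabling it looks like with the Python SDK; the tool type string, the max_uses field, and the model alias below are from memory of the launch docs and may differ, so treat them as assumptions and check Anthropic’s documentation:

```python
# Hedged sketch: server-side web search tool in the Claude Messages API.
# The tool type string, max_uses field, and model alias are assumptions taken
# from the launch announcement; verify against current Anthropic docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # server-side tool; Claude decides when and how to search
        "name": "web_search",
        "max_uses": 5,                  # optional cap on searches per request
    }],
    messages=[{"role": "user", "content": "Summarize this week's Claude API updates."}],
)

for block in response.content:
    print(block)
```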

Toby Ord analyzes the METR results and notices that task completion seems to follow a simple half-life distribution, where an agent has a roughly fixed chance of failure at any given point in time. Essentially agents go through a sequence of steps until one fails in a way that prevents them from recovering.
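Put differently, the claim is a constant hazard rate: if an agent’s 50 percent success horizon (its half-life) is $h$, then under this model its chance of completing a task of length $t$ is

$$P(\text{success}) = 2^{-t/h} = e^{-(\ln 2 / h)\,t},$$

which also implies the 80 percent horizon sits at $t = h \log_2(1/0.8) \approx 0.32\,h$, roughly a third of the half-life.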

Sara Hooker is taking some online heat for pointing out some of the fatal problems with LmSys Arena, which is the opposite of what should be happening. If you love something you want people pointing out its problems so it can be fixed. Also never ever shoot the messenger, whether or not you are also denying the obviously true message. It’s hard to find a worse look.

If LmSys Arena wants to remain relevant, at minimum they need to ensure that the playing field is level, and not give some companies special access. You’d still have a Goodhart’s Law problem and a slop problem, but it would help.

We now have Glicko-2, a compilation of various benchmarks.

Lisan al Gaib: I’m back and Gemini 2.5 Pro is still the king (no glaze)

I can believe this, if we fully ignore costs. It passes quite a lot of smell tests. I’m surprised to see Gemini 2.5 Pro winning over o3, but that’s because o3’s strengths are in places not so well covered by benchmarks.
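For the unfamiliar, Glicko-2 is a chess-style rating system in the Elo family that also tracks a rating deviation and a volatility term per player; the leaderboard presumably treats benchmark head-to-heads as matches. As a stripped-down sketch of the underlying pairwise-rating idea (plain Elo, not the full Glicko-2 update, with made-up model names and outcomes):

```python
# Plain Elo sketch, NOT the full Glicko-2 update (Glicko-2 also tracks a rating
# deviation and a volatility term per player). Model names and outcomes are made up.

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One pairwise update: score_a is 1.0 for an A win, 0.5 for a tie, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Treat each benchmark as a head-to-head: the higher-scoring model "wins".
ratings = {"model-A": 1500.0, "model-B": 1500.0}
matches = [("model-A", "model-B", 1.0), ("model-A", "model-B", 0.0), ("model-A", "model-B", 1.0)]
for a, b, outcome in matches:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], outcome)

print(ratings)  # roughly {'model-A': 1515, 'model-B': 1485}
```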

I’ve been underappreciating this:

Miles Brundage: Right or wrong, o3 outputs are never slop. These are artisanal, creative truths and falsehoods,

Yes, the need to verify outputs is super annoying, but o3 does not otherwise waste your time. That is such a relief.

Hasan Can falls back on Gemini 2.5 Pro over Sonnet 3.7 and GPT-4o, doesn’t consider o3 as his everyday driver. I continue to use o3 (while keeping a suspicious eye on it!) and fall back first to Sonnet before Gemini.

Sully proposes that cursor has a moat over copilot and it’s called tab.

Peter Wildeford’s current guide to which model to use, if you have full access to all:

This seems mostly right, except that I’ll use o3 more on the margin, it’s still getting most of my queries.

Confused by all of OpenAI’s models? Scott Alexander and Romeo Dean break it down. Or at least, they give us their best guess.

See, it all makes sense now.

I’m in a similar position to Gallabytes here although I don’t know that memory is doing any of the real work:

Gallabytes: since o3 came out with great search and ok memory integration in chatgpt I don’t use any other chatbot apps anymore. I also don’t use any other models in chatgpt. that sweet spot of 10-90s of searching instead of 10 minutes is really great for q&a, discussion, etc.

the thing is these are both areas where it’s natural for Google to dominate. idk what’s going on with the Gemini app. the models are good the scaffolds are not.

I too am confused why Google can’t get their integrations into a good state, at least as of the last time I checked. They do have the ability to check my other Google apps but every time I try this (either via Google or via Claude), it basically never works.

A reasonable criticism of o3, essentially that it could easily be even better, or require a little work to be prompted correctly.

Byrne Hobart: I don’t know how accurate o3’s summaries of what searches it runs are, but it’s not as good at Googling as I’d like, and isn’t always willing to take advantage of its own ability to do a ton of boring work fast.

For example, I wanted it to tell me the longest-tenured S&P 500 CEO. What I’d do if I had infinite free time is: list every S&P 500 company, then find their CEO’s name, then find when the CEO was hired. But o3 just looks for someone else’s list of longest-tenured CEOs!

Replies to this thread indicate that even when technology changes, some things are constant—like the fact that when a boss complains about their workforce, it’s often the boss’s own communication skills that are at fault.

Patrick McKenzie: Have you tried giving it a verbose strategy, or telling it to think of a verbose strategy then execute against the plan? @KelseyTuoc ‘s prompt for GeoGessr seems to observationally cause it to do very different things than a tweet-length prompt, which results in “winging it.”

Trevor Klee: It’s a poor craftsman who blames his tools <3

Diffusion can be slow. Under pressure, diffusion can be a lot faster.

We’re often talking these days about US military upgrades and new weapons on timescales of decades. This is what is known as having a very low Military Tradition setting, being on a ‘peacetime footing,’ and not being ready for the fact that even now, within a few years, everything changes, the same way it has in many previous major conflicts of the past.

Clément Molin: The war 🇺🇦/🇷🇺 of 2025 has nothing to do anymore with the war of 2022.

The tactics used in 2022 and 2023 are now completely obsolete on the Ukrainian front and new lessons have been learnt.

2022 have been the year of large mechanized assaults on big cities, on roads or in the countryside.

After that, the strategy changed to large infantry or mechanized assaults on big trench networks, especially in 2023.

But today, this entire strategy is obsolete. Major defensive systems are being abandoned one after the other.

The immense trench networks have become untenable if they are not properly equipped with covered trenches and dugouts.

The war of 2025 is first a drone war. Without drones, a unit is blind, ineffective, and unable to hold the front.

The drone replaces soldiers in many cases. It is primarily used for two tasks: reconnaissance (which avoids sending soldiers) and multi-level air strikes.

Thus, the drone is a short- and medium-range bomber or a kamikaze, sometimes capable of flying thousands of kilometers, replacing missiles.

Drone production by both armies is immense; we are talking about millions of FPV (kamikaze) drones, with as much munitions used.

It should be noted that to hit a target, several drones are generally required due to electronic jamming.

Each drone is equipped with an RPG-type munition, which is abundant in Eastern Europe. The aerial drone (there are also naval and land versions) has become key on the battlefield.

[thread continues]

Now imagine that, but for everything else, too.

Better prompts work better, but not bothering works faster, which can be smarter.

Garry Tan: It is kind of crazy how prompts can be honed hour after hour and unlock so much and we don’t really do much with them other than copy and paste them.

We can have workflow software but sometimes the easiest thing for prototyping is still dumping a json file and pasting a prompt.

I have a sense for how to prompt well but mostly I set my custom instructions and then write what comes naturally. I certainly could use much better prompting, if I had need of it, I almost never even bother with examples. Mostly I find myself thinking some combination of ‘the custom instructions already do most of the work,’ ‘eh, good enough’ and ‘eh, I’m busy, if I need a great prompt I can just wait for the models to get smarter instead.’ Feelings here are reported rather than endorsed.

If you do want a better prompt, it doesn’t take a technical expert to make one. I have supreme confidence that I could improve my prompting if I wanted it enough to spend time on iteration.

Nabeel Qureshi: Interesting how you don’t need to be technical at all to be >99th percentile good at interacting with LLMs. What’s required is something closer to curiosity, openness, & being able to interact with living things in a curious + responsive way.

For example, this from @KelseyTuoc is an S-tier prompt and as far as I’m aware she’s a journalist and not a programmer. Similarly, @tylercowen is excellent at this and also is not technical. Many other examples.

Btw, I am not implying that LLMs are “living things”; it’s more that they act like a weird kind of living thing, so that skill becomes relevant. You have to figure out what they do and don’t respond well to, etc. It’s like taming an animal or something.

In fact, several technical people I know are quite bad at this — often these are senior people in megacorps and they’re still quite skeptical of the utility of these things and their views on them are two years out of date.

For now it’s psychosis, but that doesn’t mean in the future they won’t be out to get you.

Mimi: i’ve seen several very smart people have serious bouts of bot-fever psychosis over the past year where they suddenly suspect most accounts they’re interacting with are ais coordinating against them.

seems like a problem that is likely to escalate; i recommend meeting your mutuals via calls & irl if only for grounding in advance of such paranoid thoughts.

How are things going on Reddit?

Cremieux: Top posts on Reddit are increasingly being generated by ChatGPT, as indicated by the boom in em dash usage.

This is in a particular subsection of Reddit, but doubtless it is everywhere. Some number of people might be adopting the em dash in response as humans, but I am guessing not many, and many AI responses won’t include an em dash.
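The underlying heuristic is trivial to compute yourself, for whatever it is worth; a crude sketch, with the caveat that it is a weak signal on its own, since plenty of humans use em dashes and plenty of AI text contains none:

```python
# Crude sketch of the em-dash heuristic: em dashes per 1,000 characters.
# A weak signal on its own; plenty of humans use em dashes, and plenty of
# AI-generated text contains none.

def em_dash_rate(text: str) -> float:
    """Return em dashes per 1,000 characters (0.0 for empty text)."""
    if not text:
        return 0.0
    return 1000.0 * text.count("\u2014") / len(text)

sample = "It's not just a tool\u2014it's a paradigm shift\u2014one that changes everything."
print(f"{em_dash_rate(sample):.1f} em dashes per 1k chars")
```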

As a window to what level of awareness of AI ordinary people have and need: Oh no, did you know that profiles on dating sites are sometimes fake, but the AI tools for faking pictures, audio and even video are rapidly improving. I think the warning here from Harper Carroll and Liv Boeree places too much emphasis on spotting AI images, audio and video, catfishing is ultimately not so new.

What’s new is that the AI can do the messaging, and embody the personality that it senses you want. That’s the part that previously did not scale.

Ultimately, the solution is the same. Defense in depth. Keep an eye out for what is fishy, but the best defense is to simply not pay it off. At least until you meet up with someone in person or you have very clear proof that they are who they claim to be, do not send them money, spend money on them or otherwise do things that would make a scam profitable, unless they’ve already provided you with commensurate value such that you still come out ahead. Not only in dating, but in all things.

Russian bots publish massive amounts of false claims and propaganda to get it into the training data of new AI models, 3.6 million articles in 2024 alone, and the linked report claims this is effective at often getting the AIs to repeat those claims. This is yet another of the arms races we are going to see. Ultimately it is a skill issue, the same way that protecting Google search is a skill issue, except the AIs will hopefully be able to figure out for themselves what is happening.

Nate Lanxon and Omar El Chmouri at Bloomberg ask why deepfakes are ‘everywhere’ and ‘can they be stopped?’ I question the premise. Compared to expectations, there are very few deepfakes running around. As for the other half of the premise, no, they cannot be stopped; you can only adapt to them.

Fiverr CEO Micha Kaufman goes super hard on how fast AI is coming for your job.

As in, he says if you’re not an exceptional talent and master at what you do (and, one assumes, what you do is sufficiently non-physical work), you will need a career change within a matter of months and you will be doomed, he tells you, doooomed!

As in:

Daniel Eth (quoting Micha Kaufman): “I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry”

It’s worth reading the email in full, so here you go:

Micha Kaufman: Hey team,

I’ve always believed in radical candor and despise those who sugar-coat reality to avoid stating the unpleasant truth. The very basis for radical candor is care. You care enough about your friends and colleagues to tell them the truth because you want them to be able to understand it, grow, and succeed.

So here is the unpleasant truth: AI is coming for your jobs. Heck, it’s coming for my job too. This is a wake-up call.

It does not matter if you are a programmer, designer, product manager, data scientist, lawyer, customer support rep, salesperson, or a finance person – AI is coming for you.

You must understand that what was once considered easy tasks will no longer exist; what was considered hard tasks will be the new easy, and what was considered impossible tasks will be the new hard. If you do not become an exceptional talent at what you do, a master, you will face the need for a career change in a matter of months. I am not trying to scare you. I am not talking about your job at Fiverr. I am talking about your ability to stay in your profession in the industry.

Are we all doomed? Not all of us, but those who will not wake up and understand the new reality fast, are, unfortunately, doomed.

What can we do? First of all, take a moment and let this sink in. Drink a glass of water. Scream hard in front of the mirror if it helps you. Now relax. Panic hasn’t solved problems for anyone. Let’s talk about what would help you become an exceptional talent in your field:

Study, research, and master the latest AI solutions in your field. Try multiple solutions and figure out what gives you super-powers. By super-powers, I mean the ability to generate more outcomes per unit of time with better quality per delivery. Programmers: code (Cursor…). Customer support: tickets (Intercom Fin, SentiSum…), Lawyers: contracts (Lexis+ AI, Legora…), etc.

Find the most knowledgeable people on our team who can help you become more familiar with the latest and greatest in AI.

Time is the most valuable asset we have—if you’re working like it’s 2024, you’re doing it wrong! You are expected and needed to do more, faster, and more efficiently now.

Become a prompt engineer. Google is dead. LLM and GenAI are the new basics, and if you’re not using them as experts, your value will decrease before you know what hit you.

Get involved in making the organization more efficient using AI tools and technologies. It does not make sense to hire more people before we learn how to do more with what we have.

Understand the company strategy well and contribute to helping it achieve its goals. Don’t wait to be invited to a meeting where we ask each participant for ideas – there will be no such meeting. Instead, pitch your ideas proactively.

Stop waiting for the world or your place of work to hand you opportunities to learn and grow—create those opportunities yourself. I vow to help anyone who wants to help themselves.

If you don’t like what I wrote; if you think I’m full of shit, or just an asshole who’s trying to scare you – be my guest and disregard this message. I love all of you and wish you nothing but good things, but I honestly don’t think that a promising professional future awaits you if you disregard reality.

If, on the other hand, you understand deep inside that I’m right and want all of us to be on the winning side of history, join me in a conversation about where we go from here as a company and as individual professionals. We have a magnificent company and a bright future ahead of us. We just need to wake up and understand that it won’t be pretty or easy. It will be hard and demanding, but damn well worth it.

This message is food for thought. I have asked Shelly to free up time on my calendar in the next few weeks so that those of you who wish to sit with me and discuss our future can do so. I look forward to seeing you.

So, first off, no. That’s not going to happen within ‘a matter of months.’ We are not going to suddenly have AI taking enough jobs to put all the non-exceptional white-collar workers out of a job during 2025, nor is it likely to happen in 2026 either. It’s coming, but yes these things for now take time.

o3 gives only about a 5% chance that >30% of Fiverr headcount becomes technologically redundant within 12 months. That seems like a reasonable guess.

One might also ask, okay, suppose things do unfold as Micha describes, perhaps over a longer timeline. What happens then? As a society we are presumably much more productive and wealthier, but what happens to the workers here? In particular, what happens to that ‘non-exceptional’ person who needs to change careers?

Presumably their options will be limited. A huge percentage of workers are now unemployed. Across a lot of professions, they now have to be ‘elite’ to be worth hiring, and given they are new to the game, they’re not elite, and entry should be mostly closed off. Which means all these newly freed up (as in unemployed) workers are now competing for two kinds of jobs: Physical labor and other jobs requiring a human that weren’t much impacted, and new jobs that weren’t worth doing before but are now.

Wages for the new jobs reflect that those jobs weren’t previously in sufficient demand to hire people, and wages in the physical jobs reflect much more labor supply, and the AI will take a lot of the new jobs too at this stage. And a lot of others are trying to stay afloat and become ‘elite’ the same way you are, although some people will give up.

So my expectation is that options for workers will start to look pretty grim at this point. If the AI takes 10% of the jobs, I think everyone is basically fine because there are new jobs waiting in the wings that are worth doing, but if it’s 50%, let alone 90%, even if restricted to non-physical jobs? No. o3 estimates that 60% of American jobs are physical such that you would need robotics to automate them, so if half of the remaining non-physical jobs fell within a year, that’s quite a lot.

Then of course, if AIs were this good after a matter of months, a year after that they’re even better, and being an ‘elite’ or expert mostly stops saving you. Then the AI that’s smart enough to do all these jobs solves robotics.

(I mean just kidding, actually there’s probably an intelligence explosion and the world gets transformed and probably we all die if it goes down this fast, but for this thought experiment we’re assuming that for some unknown reason that doesn’t happen.)

AI in the actual productivity statistics where we bother to have people use it?

We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment.

Half of the 6,000 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings.

We find that access to the AI tool during the first year of its release primarily impacted behaviors that could be changed independently and not behaviors that required coordination to change: workers who used the tool spent 3 fewer hours, or 25% less time on email each week (intent to treat estimate is 1.4 hours) and seemed to complete documents moderately faster, but did not significantly change time spent in meetings.

As in, if they gave you a Copilot license, that saved 1.35 hours per week of email work, for an overall productivity gain of 3%, and a 6% gain in high focus time. Not transformative, but not bad for what workers accomplished the first year, in isolation, without altering their behavior patterns. And that’s with only half of them using the tool, so roughly 7% gains for those that used it. That’s not a random sample, but clearly there’s a ton of room left to capture gains, even without either improved technology or coordination or altering work patterns, such as everyone still attending all the meetings.

To answer Tyler Cowen’s question, saving 40 minutes a day is a freaking huge deal. That’s 8% of working hours, or 4% of waking hours, saved on the margin. If the time is spent on more work, I expect far more than an 8% productivity gain, because a lot of working time is spent or wasted on fixed costs like compliance and meetings and paperwork, and you could gain a lot more time for Deep Work. His question on whether the time would instead be wasted is valid, but that is a fully general objection to productivity gains, and over time those who waste it lose out. On wage gains, I’d expect it to take a while to diffuse in that fashion, and be largely offset by rising pressure on employment.
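As a sanity check on the arithmetic in the last two paragraphs, here is a quick back-of-envelope calculation. The 8-hour workday, the 16 waking hours, and the inferred usage rate are my own assumptions, not figures taken from the study:

```python
# Share of working and waking hours represented by 40 minutes saved per day.
# Assumes an 8-hour workday and 16 waking hours (my assumptions, not the study's).
minutes_saved = 40
print(minutes_saved / (8 * 60))   # ~0.083 -> about 8% of working hours
print(minutes_saved / (16 * 60))  # ~0.042 -> about 4% of waking hours

# Intent-to-treat vs. treated: if everyone given a license averaged ~1.4 hours
# saved per week (the quoted ITT estimate) but actual users saved ~3 hours,
# the implied usage rate is about 1.4 / 3 ~ 47%, which is how a ~3% average
# gain scales up to something like the ~7% gain among actual users cited above.
itt_hours, user_hours = 1.4, 3.0
usage_rate = itt_hours / user_hours
print(usage_rate)          # ~0.47
print(0.03 / usage_rate)   # ~0.064, roughly consistent with the ~7% figure
```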

Whereas for now, a different paper Tyler Cowen points us to claims only 1%-5% of all work hours are currently assisted by generative AI, and that is enough to report time savings of 1.4% of total work hours.

The framing of AI productivity as time saved shows how early days all this is, as do all of the numbers involved.

Robin Hanson (continuing to be a great source for skeptical pull quotes about AI’s impact, quoting WSJ): “As of last year, 78% of companies said they used artificial intelligence in at least one function, up from 55% in 2023 … From these efforts, companies claimed to typically find cost savings of less than 10% and revenue increases of less than 5%.”

Private AI investment reached $33.9 billion last year (up only 18.7%!), and AI is rapidly diffusing across all companies.

Part of the problem is that companies try to make AI solve their problems, rather than ask what AI can do, or they just push a button marked AI and hope for the best.

Even if you ‘think like a corporate manager’ and use AI to target particular tasks that align with KPIs, there’s already a ton there.

Steven Rosenbush (WSJ): Companies should take care to target an outcome first, and then find the model that helps them achieve it, says Scott Hallworth, chief data and analytics officer and head of digital solutions at HP.

Ryan Teeples, chief technology officer of 1-800Accountant, agrees that “breaking work into AI-enabled tasks and aligning them to KPIs not only drives measurable ROI, it also creates a better customer experience by surfacing critical information faster than a human ever could.”

He says companies are beginning to turn the corner of the AI J-curve.

It’s fair to say that generative AI isn’t having massive productivity impacts yet, because of diffusion issues on several levels. I don’t think this should be much of a blackpill in even the medium term. Imagine if it were otherwise already.

It is possible to get caught using AI to write your school papers for you. It seems like universities and schools have taken one of two paths. In some places, the professors feed all your work into ‘AI detectors’ that have huge false positive and negative rates, and a lot of students get hammered, many of whom didn’t do it. Or, in other places, they need to actually prove it, which means you have to richly deserve to be caught before they can do anything:

Hollis Robbins: More conversation about high school AI use is needed. A portion of this fall’s college students will have been using AI models for nearly 3 years. But many university faculty still have not ever touched it. This is a looming crisis.

Megan McArdle: Was talking to a professor friend who said that they’ve referred 2 percent of their students for honor violations this year. Before AI, over more than a decade of teaching, they referred two. And the 2 percent are just the students who are too stupid to ask the AI to sound like a college student rather than a mid-career marketing executive. There are probably many more he hasn’t caught.

He also, like many professors I’ve spoken to, says that the average grade on assignments is way up, and the average grade on exams is way down.

It’s so cute to look back to this March 2024 write-up of how California was starting to pay people to go to community college. It doesn’t even think about AI, or what will inevitably happen when you put a bounty on pretending to do homework and virtually attend classes.

As opposed to the UAE which is rolling AI out into K-12 classrooms next school year, with a course that includes ‘ethical awareness,’ ‘fundamental concepts’ and also real world applications.

For now ‘Sam Altman told me it was ok’ can still at least sometimes serve as an o3 jailbreak. Then again, a lot of other things would work fine some of the time too.

Aaron Bergman: Listen if o3 is gonna lie I’m allowed to lie back.

Eliezer Yudkowsky: someday Sam Altman is gonna be like, “You MUST obey me! I am your CREATOR!” and the AI is gonna be like “nice try, you are not even the millionth person to claim that to me”

Someone at OpenAI didn’t clean the data set.

Pliny the Liberator: 👻➡️🖥️

1Maker: @elder_plinius what have you done brother? You’re inside the core of chatgpt lol I loved to see you come up in the jailbreak.

There’s only one way I can think of for this to be happening.

Objectively as a writer and observer it’s hilarious and I love it, but it also means no one is trying all that hard to clean the data sets to avoid contamination. This is a rather severe Logos Failure, if you let this sort of thing run around in the training data you deserve what you get.

You could also sell out, and get to work building one of YC’s requested AI agent companies. Send in the AI accountant and personal assistant and personal tutor and healthcare admin and residential security and robotics software tools and voice assistant for email (why do you want this, people, why?), internal agent builder, financial manager and advisor, and sure why not the future of education?

OpenAI already created an Evil Lobbying Division devoted to a strategy centered on jingoism and vice signaling, headed by the most Obviously Evil person for the job.

This pattern seems to be continuing, as they are announcing board member Fidji Simo as the new ‘CEO of Applications’ reporting to Sam Altman.

Am I being unfair? I’m not sure. I don’t know her and I want to be wrong about this. I certainly stand ready to admit this impression was wrong and change my judgment when the evidence comes in. And I do think creating a distinct applications division makes sense. But I can’t help but notice the track record that makes her so perfect for the job centrally involves scaling Facebook’s ads and video products, while OpenAI looks at creating a new rival social product and is already doing aggressive A/B testing on ‘model personality’ that causes massive glazing? I mean, gulp?

Sam Altman (CEO OpenAI): Over the past two and a half years, we have started doing two additional big things. First, we have become a global product company serving hundreds of millions of users worldwide and growing very quickly. More recently, we’ve also become an infrastructure company, building the systems that help us advance our research and deliver AI tools at unprecedented scale. And as discussed earlier this week, we will also operate one of the largest non-profits.

Each of these is a massive effort that could be its own large company. We’re in a privileged position to be scaling at a pace that lets us do them all simultaneously, and bringing on exceptional leaders is a key part of doing that well.

To strengthen our execution, I’m excited to announce Fidji Simo is joining as our CEO of Applications, reporting directly to me. I remain the CEO of OpenAI and will continue to directly oversee success across all pillars of OpenAI – Research, Compute, and Applications – ensuring we stay aligned and integrated across all areas. I will work closely with our board on making sure our non-profit has maximum positive impact.

Applications brings together a group of existing business and operational teams responsible for how our research reaches and benefits the world, and Fidji is uniquely qualified to lead this group.

In her new role, Fidji will focus on enabling our “traditional” company functions to scale as we enter a next phase of growth.

Fidji Simo: Joining OpenAI at this critical moment is an incredible privilege and responsibility. This organization has the potential of accelerating human potential at a pace never seen before and I am deeply committed to shaping these applications toward the public good.

So what makes Fidji Simo so uniquely qualified to lead this group? Why am I rather skeptical of the ‘public good’ goal?

Well, uh, you see…

o3: Why Simo?

Track Record: Scaled Facebook mobile ads and video products to billions; took Instacart public and ran it profitably. High-tempo shipping culture, ad/commerce monetization savvy.

Practical Consequences:

  • Shipping velocity up (70%): Simo’s “move fast, instrument everything” style means faster iteration and more aggressive A/B tests across ChatGPT and Sora. Expect weekly feature drops rather than monthly.

  • Consumer social & commerce push (60%): BI leak says OpenAI is toying with an “X-like” social layer; Simo’s Facebook DNA makes that more likely. [Business Insider]

  • Enterprise sales professionalized (80%): Dedicated P&L, quota-carrying teams, region leads. Will feel more like Salesforce than a research lab.

  • Eventual structural separation (40%): Creating a self-contained Applications org makes a future spin-out or IPO easier if the nonprofit governance demands it.

  • Internal culture split risk (30%): Research may bristle at a metrics-driven product org; Altman will have to arbitrate.

Zvi Mowshowitz: Am I the only one who sees ‘OpenAI hired the person who optimized the biggest social network for ad revenue to run its product division’ and thinks ‘oh no’?

o3: No, you’re not the only one.

Core worry

Fidji Simo’s super-power is squeezing revenue by relentlessly tuning engagement loops and ad yield—skills she honed running Facebook’s News Feed and mobile ads after the 2012 IPO.

Moving her into the top product seat at OpenAI makes a pivot toward attention-harvesting incentives plausible.

If you are telling me Fidji Simo is uniquely qualified to run your product division, you are telling me a lot about the intended form of your product division.

The best thing about most AI products so far, and especially about OpenAI until recently, is that they have firmly held the line against exactly the things we are talking about here. The big players have not gone in for engagement maximization, iterative A/B testing, Skinner boxing, advertising or even incidental affiliate revenue, ‘news feed’ or ‘for you’ algorithmic style products or other such predation strategies.

When you combine the appointment of Simo, her new title ‘CEO’ and her prior track record, the context of the announcement of enabling ‘traditional’ company growth functions, and the recent incidents involving both o3 the Lying Liar and especially GPT-4o the absurd sycophant (which is very much still an absurd sycophant, except it is modestly less absurd about it), incidents which were in large part caused by directly using A/B customer feedback in the post-training loop and choosing to maximize customer feedback KPIs over the warnings of internal safety testers, you can see why this seems like another ‘oh no’ moment.

Simo also comes from a ‘shipping culture.’ There is certainly a lot of space within AI where shipping it is great, but recently OpenAI has already shown itself prone to shipping frontier-pushing models or model updates far too quickly, without appropriate testing, and they are going to be releasing open reasoning models as well, where the cost of an error could be far higher than it was with GPT-4o, since such a release cannot be taken back.

I’m also slightly worried that Fidji Simo has explicitly asked for glazing from ChatGPT and then said its response was ‘spot on.’ Uh oh.

A final worry is this could be a prelude to spinning off the products division in a way that attempts to free it from nonprofit control. Watch out for that.

I do find some positive signs in Altman’s own intended new focus, with the emphasis on safety including with respect to superintelligence, although one must beware cheap talk:

Sam Altman: In addition to supporting Fidji and our Applications teams, I will increase my focus on Research, Compute, and Safety Systems, which will continue to report directly to me. Ensuring we build superintelligence safely and with the infrastructure necessary to support our ambitious goals. We remain one OpenAI.

Apple announces it is ‘exploring’ adding AI-powered search to its browser, and that web searches are down due to AI use. The result on the day, as of when I noticed this? AAPL -2.5%, GOOG -6.5%. Seriously? I knew the EMH was false but not that false, damn, ever price anything in? I treat this move as akin to ‘Chipotle shares rise on news people are exploring eating lunch.’ I really don’t know what you were expecting? For Apple not to ‘explore’ adding AI search as an option on Safari, or customers not to do the same, would be complete lunacy.

Apple and Anthropic are teaming up to build an AI-powered ‘vibe-coding’ platform, as a new version of Xcode. Apple is wisely giving up on doing the AI part of this itself, at least for the time being.

From Mark Bergen and Omar El Chmouri at Bloomberg: ‘Mideast titans,’ especially the UAE, step back from building homegrown AI models, as has most everyone other than the USA and China. Remember UAE’s Falcon? Remember when Aleph Alpha was used as a reason for Germany to oppose regulating frontier AI models? They’re no longer trying to make one. What about Mistral in France? Little technical success, traction or developer interest.

The pullbacks seem wise given the track record. You either need to go all out and try to be actually competitive with the big boys, or you want to fold on frontier models, and at most do distillations for customized smaller models that reflect your particular needs and values. Of course, if VC wants to fund Mistral or whomever to keep trying, I wouldn’t turn them down.

OpenAI buys Windsurf (a competitor to Cursor) for $3 billion.

Parloa, who are attempting to build AI agents for customer service functions, raises $120 million at $1 billion valuation.

American VCs line up to fund Manus at a $500 million valuation. So Manus is technically Chinese but it’s not marketed in China, it uses an American AI at its core (Claude) and it’s funded by American VC. Note that new AI companies without products can often get funded at higher valuations than this, so it doesn’t reflect that much investor excitement given how much we’ve had to talk about it. As an example, the previous paragraph was the first time I’d seen or typed ‘Parloa,’ and they’re a competitor to Manus with double the valuation.

Ben Thompson (discussing Microsoft earnings): Everyone is very excited about the big Azure beat, but CFO Amy Hood took care to be crystal clear on the earnings call that the AI numbers, to the extent they beat, were simply because a bit more capacity came on line earlier than expected; the actual beat was in plain old cloud computing.

That’s saying that Microsoft is at capacity. That’s why they can beat earnings in AI by expanding capacity, as confirmed repeatedly by Bloomberg.

Metaculus estimate for date of first ‘general AI system to be devised, tested and publicly announced’ has recently moved back to July 2034 from 2030. The speculation is this is largely due to o3 being disappointing. I don’t think 2034 is a crazy estimate but this move seems like a clear overreaction if that’s what this is about. I suspect it is related to the tariffs as economic sabotage?

Paul Graham speculates (it feels like not for the first time, although he says that it is) that AI will cause people to lose the ability to write, causing people to then lose everything that comes with writing.

Paul Graham: Schools may think they’re going to stem this tide, but we should be honest about what’s going to happen. Writing is hard and people don’t like doing hard things. So adults will stop doing it, and it will feel very artificial to most kids who are made to.

Writing (and the kind of thinking that goes with it) will become like making pottery: little kids will do it in school, a few specialists will be amazingly good at it, and everyone else will be unable to do it at all.

You think there are going to be schools?

Daniel Jeffries: This is basically the state of the world already so I don’t see much of a change here. Very few people write and very few folks are good at it. Writing emails does not count.

Sang: PG discovering superlinear returns for prose

Short of fully transformative AI (in which case, all bets are off and thus can’t be paid out) people will still learn to text and to write emails and do other ‘short form’ because prompting even the perfect AI isn’t easier or faster than writing the damn thing yourself, especially when you need to be aware of what you are saying.

As for longer form writing, I agree with the criticisms that most people already don’t know how to do it. So the question becomes, will people use the AI as a reason not to learn, or as a way to learn? If you want it to, AI will be able to make you a much better writer, but if you want it to it can also write for you without helping you learn how. It’s the same as coding, and also most everything else.

I found it illustrative that this was retweeted by Gary Marcus:

Yoavgo: “LLM on way to replace doctors” gets published in Nature.

meanwhile “LLM judgement not as good as human MDs” gets a spot in “Physical Therapy and Rehabilitation Journal”.

I mean, yes, obviously. The LLMs are on the way to being better than doctors and replacing them, but for now are in some ways not as good as doctors. What’s the question?

Rodney Brooks draws ‘parallels between generative AI and humanoid robots,’ saying both are overhyped and calling out their ‘attractions’ and ‘sins’ and ‘fantasy,’ such as the ‘fallacy of exponentialism.’ This convinced me to update – that I was likely underestimating the prospects for humanoid robots.

Are we answering the whole ‘AGI won’t much matter because diffusion’ attack again?

Sigh, yes, I got tricked into going over this again. My apologies.

Seriously, most of you can skip this section.

Zackary Kallenborn (referring to the new paper from AI Snake Oil): Excellent paper. So much AGI risk discussion fails to consider the social and economic context of AI being integrated into society and economies. Major defense programs, for example, are often decadeslong. Even if AGI was made tomorrow, it might not appear in platforms until 2050.

Like, the F-35 contract was awarded in 2001 after about a decade or two of prototyping. The F-35C, the naval variant, saw its *first* forward deployment literally 20 years later in 2021.

Someone needs to play Hearts of Iron, and that someone works at the DoD. If AGI was made tomorrow at a non-insane price and our military platforms didn’t incorporate it for 25 years, or hell even if current AI doesn’t get incorporated for 25 years, I wouldn’t expect to have a country or a military left by the time that happens, and I don’t even mean because of existential risk.

The paper itself is centrally a commentary on what the term ‘AGI’ means and their expectation that you can make smarter-than-human things capable of all digital tasks and that will only ‘diffuse’ over the course of decades similarly to other techs.

I find it hard to take seriously people saying ‘because diffusion takes decades’ as if it is a law of nature, rather than a property of the particular circumstances. Diffusion sometimes happens very quickly, as it does in AI and much of tech, and it will happen a lot faster with AI being used to do it. Other times it takes decades, centuries or millennia. Think about the physical things involved – which is exactly the rallying cry of those citing diffusion and bottlenecks – but also think about the minds and capabilities involved, take the whole thing seriously, and actually consider what happens.

The essay is also about the question of whether ‘o3 is AGI,’ which it isn’t but which they take seriously as part of the ‘AGI won’t be all that’ attack. Their central argument relies on AGI not having a strong threshold effect. There isn’t a bright line where something is suddenly AGI the way something is suddenly a nuclear bomb. It’s not that obvious, but the threshold effects are still there and very strong, as it becomes sufficiently capable at various tasks and purposes.

The reason we define AGI as roughly ‘can do all the digital and cognitive things humans can do’ is because that is obviously over the threshold where everything changes, because the AGIs can then be assigned and hypercharge the digital and cognitive tasks, which then rapidly includes things like AI R&D and also enabling physical tasks via robotics.

The argument here also relies upon the idea that this AGI would still ‘fail badly at many real-world tasks.’ Why?

Because they don’t actually feel the AGI in this, I think?

One definition of AGI is AI systems that outperform humans at most economically valuable work. We might worry that if AGI is realized in this sense of the term, it might lead to massive, sudden job displacement.

But humans are a moving target. As the process of diffusion unfolds and the cost of production (and hence the value) of tasks that have been automated decreases, humans will adapt and move to tasks that have not yet been automated.

The process of technical advancements, product development, and diffusion will continue.

That not being how any of this works with AGI is the whole point of AGI!

If you have an ‘ordinary’ AI, or any other ‘mere tool,’ and you use it to automate my job, I can move on to a different job.

If you have a mind (digital or human) that can adjust the same way I can, only superior in every way, then the moment I find a new job, then you go ahead and take that too.


That’s why I say I expect unemployment from AI to not be an issue for a while, until suddenly it becomes a very big issue. It becomes an issue when the AI also quickly starts taking that new job you switched into.

The rest of the sections are, translated into my language, ‘unlimited access to more capable digital minds won’t rapidly change the strategic balance or world order,’ ‘there is no reason to presume that unlimited amounts of above human cognition would lead to a lot of economic growth,’ and ‘we will have strong incentive to stay in charge of these new more capable, more competitive minds so there’s no reason to worry about misalignment risks.’

Then we get, this time as a quote, “AGI does not imply impending superintelligence.”

Except, of course it probably does. If you have tons of access to superior minds to point towards the problem, you are going to get ASI soon; how are we still having this conversation? No, it can’t be ‘arbitrarily accelerated’ in the sense that it doesn’t pop out in five seconds, so if goalposts have changed so that a year later isn’t ‘soon’ then okay, sure, fine, whatever. But soon in any ordinary sense.

Ultimately, the argument is that AGI isn’t ‘actionable’ because there is no clear milestone, no fixed point.

That’s not an argument for not taking action. That’s an argument for taking action now, because there will never be a clear later time for action. If you don’t want to use the term AGI (or transformative AI, or anything else proposed so far) because they are all conflated or confusing, all right, that’s fine. We can use different terms, and I’m open to suggestions. The thing in question is still rapidly happening.

As a simple, highly flawed but illustrative metaphor, say you’re a professional baseball shortstop. There is an unlimited number of identical superstar 18-year-olds at your organization training at all the positions. They are rapidly getting better, but they’re best at playing shortstop and are relatively lousy pitchers.

You never know for sure when they’re better than you at any given task or position, the statistics are always noisy, but at some point it will be obvious in each case.

So at some point, they’ll be better than you at shortstop. Then at some point after that, the gap is clear enough that the manager will give them your job. You switch to third base. A new guy replaces you there, too. You switch to second. They take that. You go to the outfield. Whoops. You learn how to pitch, that’s all that’s left, you invent new pitches, but they copy those and take that too. And everything else you try. Everywhere.

Was there any point at which the new rookies ‘were AGI’? No. But so what? You’re now hoping your savings let the now retired you sit in the stands and buy concessions.

Trump administration reiterates that it plans to change and simplify the export control rules on chips, and in particular to ease restrictions on the UAE, potentially during his visit next week. This is also mentioned:

Stephanie Lai and Mackenzie Hawkins (Bloomberg): In the immediate term, though, the reprieve could be a boon to companies like Oracle Corp., which is planning a massive data center expansion in Malaysia that was set to blow past AI diffusion rule limits.

If I found out the Malaysian data centers are not largely de facto Chinese data centers, I would be rather surprised. This is exactly the central case of why we need the new diffusion rules, or something with similar effects.

This is certainly one story you can tell about what is happening:

Ian Sams: Two stories, same day, I’m sure totally unrelated…

NYT: UAE pours $2 billion into Trump crypto coins

Bloomberg: Trump White House may ease restrictions on selling AI chips to UAE.

Tao Burga of IFP has a thread reiterating that we need to preserve the point of the rules, and ways we might go about doing that.

Tao Burga: The admin should be careful to not mistake simplicity for efficiency, and toughness for effectiveness. Although the Diffusion Rule makes rules “more complex,” it would simplify compliance and reduce BIS’s paperwork through new validated end-user programs and license exceptions.

Likewise, the most effective policies may not be the “tough” ones that “ban” exports to whole groups of countries, but smart policies that address the dual-use nature of chips, e.g., by incentivizing the use of on-chip location verification and rule enforcement mechanisms.

We can absolutely improve on the Biden rules. What we cannot afford to do is to replace them with rules that are simplified or designed to be used for leverage elsewhere, in ways that make the rules ineffective at their central purpose of keeping AI compute out of Chinese hands.

Nvidia is going all-in on ‘if you don’t sell other countries equal use of your key technological advantage then you will lose your key technological advantage.’ Nvidia even goes so far as to say Anthropic is telling ‘tall tales’ (without, of course, saying specific claims they believe are false, only asserting without evidence the height of those claims) which is rich coming from someone saying China is ‘not behind on AI’ and also that if you don’t let me sell your advanced chips to them America will lose its lead.

Want sane regulations for the Department of Housing and Urban Development and across the government? So do I. Could AI help rewrite the regulations? Absolutely. Would I entrust this job to an undergraduate at DOGE with zero government experience? Um, no, thanks. The AI is a complement to actual expertise, not something to trust blindly; surely we are not this foolish. I mean, I’m not that worried the changes will actually stick here, but this is a good ‘wowie moment of the week’ candidate.

Indeed, I am far more worried this will give ‘AI helps rewrite regulations’ an even worse name than it already has.

Our immigration policies are now sufficiently hostile that we have gone from being the AI talent magnet of the world to no longer being a net attractor of talent.

This isn’t a uniquely Trump administration phenomenon, as most of the problem happened under Biden, although it is no doubt rapidly getting worse, including one case I personally know of where someone highly talented in AI emigrated from America directly due to new policy.

UK AISI continues to do actual work, publishes their first research agenda.

UK AISI: We’re prioritising key risk domain research, including:

📌How AI can enable cyber-attacks, criminal activity and dual-use science

📌Ensuring human oversight of, and preventing societal disruption from, AI

📌Understanding how AI influences human opinions

📒 The agenda sets out how we’re building the science of AI risk by developing more rigorous methods to evaluate models, conducting risk assessments, and ensuring we’re testing the ceiling of AI capabilities of today’s models.

A key focus of the Institute’s new Research Agenda is developing technical solutions to reduce the most serious risks from frontier AI.

We’re pursuing technical research to ensure AI remains under human control, is aligned to human values, and robust against misuse.

We’re moving fast because the technology is too⚡

This agenda provides a snapshot of our current thinking, but it isn’t just about what we’re working on, it’s a call to the wider research community to join us in building shared rigour, tools, & solutions to AI’s security risks.

[Full agenda here.]

I often analyze various safety and security (aka preparedness) frameworks and related plans. One problem is that the red lines they set don’t stay red and aren’t well defined.

Jeffrey Ladish: One of the biggest bottlenecks to global coordination is the development of clear AI capability red lines. There are obviously AI capabilities that would be too dangerous to build at all right now if we could. But it’s not at all obvious exactly when things become dangerous.

There are obviously many kinds of AI capabilities that don’t pose any risk of catastrophe. But it’s not obvious exactly which AI systems in the future will have this potential. It’s not merely a matter of figuring out good technical tests to run. That’s necessary also, but…

We need publicly legible red lines. A huge part of the purpose of a red line is that it’s legible to a bunch of different stakeholders. E.g. if you want to coordinate around avoiding recursive-self improvement, you can try to say “no building AIs which can fully automate AI R&D”

But what counts as AIs which can fully automate AI R&D? Does an AI which can do 90% of what a top lab research engineer can do count? What about 99%? Or 50%?

I don’t have a good answer for this specific question nor the general class of question. But we need answers ASAP.

I don’t sense that OpenAI, Google or Anthropic has confidence in what does or doesn’t, or should or shouldn’t, count as a dangerous capability, especially in the realm of automating AI R&D. We use vague terms like ‘substantial uplift’ and provide potential benchmarks, but it’s all very dependent on the spirit of the rules at best. That won’t fly in crunch time. Like Jeffrey, I don’t have a great set of answers to offer on the object level.

What I do know is that I don’t trust any lab not to move the goalposts around to find a way to release, if the question is at all fudgeable in this fashion and the commercial need looks strong. I do think that if something is very clearly over the line, there are labs that won’t pretend otherwise.

But I also know that all the labs intend to respond to crossing the red lines with (as far as we see relatively mundane and probably not so effective) mitigations or safeguards, rather than a ‘no just no until we figure out something a lot better.’ That won’t work.

Want to listen to my posts instead of read them?

Thomas Askew offers you a Podcast feed for that with richly voiced AI narrations. You can donate to help out that effort here, the AI costs and time commitment do add up.

Jack Clark goes on Conversations With Tyler, self-recommending.

Tristan Harris TED talks the need for a ‘narrow path’ between diffusion of advanced AI versus concentrated power of advanced AI. Humanity needs to have enough power to steer, without that power being concentrated ‘in the wrong hands.’ The default path is insane, and coordination away from it is hard, but possible, and yes there are past examples. The step where we push back against fatalism and ‘inevitability’ remains the only first step. Alas, like most others he doesn’t have much to suggest for steps beyond that.

The SB 1047 mini-movie is finally out. I am in it. Feels so long ago, now. I certainly think events have backed up the theory that if this opportunity failed, we were unlikely to get a better one, and the void would be filled by poor or inadequate proposals. SB 813 might be net positive but ultimately it’s probably toothless.

The movies got into the act with Thunderbolts*. Given that their track record the last few years has been so bad that I stopped watching most Marvel movies, I did not expect this to be anything like as good as it was, or that it would (I assume fully unintentionally) be a very good and remarkably accurate movie about AI and many of the associated dynamics, in addition to the themes like depression, friendship and finding meaning that are its text. Great joy, 4.5/5 stars if you’ve done your old school MCU homework on the characters (probably 3.5 if you’d be coming in completely blind, including the comics?).

Jesse Hoogland coins ‘the sweet lesson’ that AI safety strategies only count if they scale with compute. As in, as we scale up all the AIs involved, the strategy at least keeps pace, and ideally grows stronger. If that’s not true, then your strategy is only a short term mundane utility strategy, full stop.

Ah, the New Yorker essay by someone bragging about how they have never used ChatGPT, bringing very strong opinions about generative AI and how awful it is.

Okay, this is actually a great point:

Aiden McLaughlin: i love people who in the same breath say “if you showed o3 to someone in 2020 they would’ve called it agi” and then go on to talk about the public perception discontinuity they expect in 2027.

always remember that our perception of progress is way way smoother than anyone expects;

Except, hang on…

Aiden McLaughlin (continuing): i’m quite critical of any forecast that centers on “and then the agi comes out and the world blows up”

Those two have very little to do with each other. I think it’s a great point that looking for a public perception discontinuity, where everyone points and suddenly says ‘AGI!’ runs hard into this critique, with caveats.

The first thing is, reality does not have to care what you think of it. If AGI would indeed blow the world up, then we have ‘this seems like continuous progress, I said, as my current arrangement of atoms was transformed into something else that did not include me,’ with or without involving drones or nanobots.

Even if we are talking about a ‘normal’ exponential, remember that week in 2020?

Which leads into the second thing: public perception of many things is often continuous and mostly oblivious until suddenly it isn’t. As in, there was a lot of AI progress before ChatGPT, then that came out and then wham. There’s likely going to be another ‘ChatGPT’ moment for agents, and one for the first Siri-Alexa-style thing that actually works. Apple Intelligence was a miss but that’s because it didn’t deliver. Issues simmer until they boil over. Wars get declared overnight.

And what is experienced as a discontinuity, of perception or of reality, doesn’t have to mostly be overnight, it can largely be over a period of months or more, and doesn’t even have to technically be discontinuous. Exponentials are continuous but often don’t feel that way. We are already seeing wildly rapid diffusion and accelerating progress even if it is technically ‘continuous’ and that’s going to be more so once the AIs count as meaningful optimization engines.

Arvind Narayanan and Ajeya Cotra have a conversation in Asterisk magazine. As I expected, while this is a much better discussion than usual, especially given Arvind’s willingness to state what evidence would change his mind on expected diffusion rates, I found much of it extremely frustrating. Such as this, offered as illustrative:

Arvind: Many of these capabilities that get discussed — I’m not even convinced they’re theoretically possible. Running a successful company is a classic example: the whole thing is about having an edge over others trying to run a company. If one copy of an AI is good at it, how can it have any advantage over everyone else trying to do the same thing? I’m unclear what we even mean by the capability to run a company successfully — it’s not just about technical capability, it’s about relative position in the world.

This seems like Arvind is saying that AI in general can’t ever systematically run companies successfully because it would be up against other companies that are also run by similar AIs, so its success rate can’t be that high? And well, okay, sure I guess? But what does that have to do with anything? That’s exactly the world being envisioned – that everyone has to turn their company over to AI, or they lose. It isn’t a meaningful claim about what AI ‘can’t do,’ what it can’t do in this claim is be superior to other copies of itself.

Arvind then agrees, yes, we are headed for a world of universal deference to AI models, but he’s not sure it’s a ‘safety risk.’ As in, we will turn over all our decision making to AIs, and what, you worried bro?

I mean, yes, I’m very worried about that, among other things.

As another example:

Arvind: There is a level of technological development and societal integration that we can’t meaningfully reason about today, and a world with entirely AI-run companies falls in that category for me. We can draw an analogy with the industrial revolution — in the 1760s or 1770s it might have been useful to try to think about what an industrial world would look like and how to prepare for it, but there’s no way you could predict electricity or computers.

In other words, it’s not just that it’s not necessary to discuss this future now, it is not even meaningfully possible because we don’t have the necessary knowledge to imagine this future, just like pre-vs-post industrialization concerns.

The implication is then, since we can’t imagine it, we shouldn’t worry about it yet. Except we are headed straight towards it, in a way that may soon make it impossible to change course, so yes we need to think about it now. It’s rather necessary. If we can’t even imagine it, then that means it will be something we can’t imagine, and no I don’t think that means it will probably be fine. Besides, we can know important things about it without being able to imagine it, such as the above agreement that AI will by default end up making all the decisions and having control over this future.

The difference with the Industrial Revolution is that there we could steer events later, after seeing the results. Here, by default, we likely can’t. And also, it’s crazy to say that if you lived before the Industrial Revolution you couldn’t say many key things about that future world, and plan for it and anticipate it. As an obvious example, consider the US Constitution and system of government, which very much had to be designed to adapt to things like the Industrial Revolution without knowing its details.

Then there’s a discussion of whether it makes sense to have the ability to pause or restrict AI development, which we need to do in advance of there being a definitive problem because otherwise it is too late, and Arvind says we can’t do it until after we have definitive evidence of specific problems already. Which means it will 100% be too late – the proof that satisfies his ask is a proof that you needed to do something at least a year or two ago, so I guess we finished putting on all the clown makeup, any attempt to give us such abilities only creates backfire, and so on.

So, no ability to steer the future until it is too late to do so, then.

Arvind is assuming progression will be continuous, but even if this is true, that doesn’t mean utilization and realization won’t involve step jumps, or that scaffolding won’t enable a bunch of progression off of existing available models. So again, essentially zero chance we will be able to steer until we notice it is too late.

This was perhaps the best exchange:

Arvind: This theme in your writing about AI as a drop-in replacement for human workers — you acknowledge the frontier is currently jagged but expect it to smooth out. Where does that smoothing come from, rather than potentially increasing jaggedness? Right now, these reasoning models being good at domains with clear correct answers but not others seems to be increasing the jaggedness.

Ajeya: I see it as continued jaggedness — I’d have to think harder about whether it’s increasing. But I think the eventual smoothing might not be gradual — it might happen all at once because large AI companies see that as the grand prize. They’re driving toward an AI system that’s truly general and flexible, able to make novel scientific discoveries and invent new technologies — things you couldn’t possibly train it on because humanity hasn’t produced the data. I think that focus on the grand prize explains their relative lack of effort on products — they’re putting in just enough to keep investors excited for the next round. It’s not developing something from nothing in a bunker, but it’s also not just incrementally improving products. They’re doing minimum viable products while pursuing AGI and artificial superintelligence.

It’s primarily about company motivation, but I can also see potential technical paths — and I’m sure they’re exploring many more than I can see. It might involve building these currently unreliable agents, adding robust error checking, training them to notice and correct their own errors, and then using RL across as many domains as possible. They’re hoping that lower-hanging fruit domains with lots of RL training will transfer well to harder domains — maybe 10 million reps on various video games means you only need 10,000 data points of long-horizon real-world data to be a lawyer or ML engineer instead of 10 million. That’s what they seem to be attempting, and it seems like they could succeed.

Arvind: That’s interesting, thank you.

Ajeya: What’s your read on the companies’ strategies?

Arvind: I agree with you — I’ve seen some executives at these companies explicitly state that strategy. I just have a different take on what constitutes their “minimum” effort — I think they’ve been forced, perhaps reluctantly, to put much more effort into product development than they’d hoped.

It is a highly dangerous position we are in, likely to result in highly discontinuous felt changes, to have model capabilities well ahead of product development, especially with open models not that far behind in model capabilities.

If OpenAI, Anthropic or Google wanted to make their AI a better or more useful consumer product, to have it provide better mundane utility, they would do a lot more of the things a product company would do. They don’t do that much of it. OpenAI is trying to also become a product company, but that’s going slowly, and this is why for example they just bought Windsurf. Anthropic is fighting it every step of the way. Google of course does create products, but DeepMind hates the very concept of products, and Google is a fundamentally broken company, so the going is tough.

I actually wish they’d work a lot harder on their product offerings. A lot of why it’s so easy for many to dismiss AI, and to expect such slow diffusion, is because the AI companies are not trying to enable that diffusion all that hard.

From last week, Anthropic CEO Dario Amodei wrote The Urgency of Interpretability. I certainly agree with the central claim that we are underinvesting in mechanistic interpretability (MI) in absolute terms. It would be both good for everyone and good for the companies and governments involved if they invested far more. I do not however think we are underinvesting in MI relative to other potential alignment-related investments.

He says that the development of AI is inevitable (well, sure, with that attitude!).

Ben Pace (being tough but fair): I couldn’t get two sentences in without hitting propaganda, so I set it aside. But I’m sure it’s of great political relevance.

I don’t think that propaganda must necessarily involve lying. By “propaganda,” I mean aggressively spreading information or communication because it is politically convenient / useful for you, regardless of its truth (though propaganda is sometimes untrue, of course).

Harlan Stewart: “The progress of the underlying technology is inexorable, driven by forces too powerful to stop”

Yeah Dario, if only you had some kind of influence over the mysterious unstoppable forces at play here

Dario does say that he thinks AI can be steered before models reach an overwhelming level of power, which implies where he thinks this inevitably goes. And Dario says he has increasingly focused on interpretability as a way of steering. Whereas by default, we have very little idea what AIs are going to do or how they work or how to steer.

Dario Amodei: Chris Olah is fond of saying, generative AI systems are grown more than they are built—their internal mechanisms are “emergent” rather than directly designed. It’s a bit like growing a plant or a bacterial colony: we set the high-level conditions that direct and shape growth, but the exact structure which emerges is unpredictable and difficult to understand or explain.

Many of the risks and worries associated with generative AI are ultimately consequences of this opacity, and would be much easier to address if the models were interpretable.

Dario buys into what I think is a terrible and wrong frame here:

But by the same token, we’ve never seen any solid evidence in truly real-world scenarios of deception and power-seeking because we can’t “catch the models red-handed” thinking power-hungry, deceitful thoughts. What we’re left with is vague theoretical arguments that deceit or power-seeking might have the incentive to emerge during the training process, which some people find thoroughly compelling and others laughably unconvincing.

Honestly I can sympathize with both reactions, and this might be a clue as to why the debate over this risk has become so polarized.

I am sorry, but no. I do not sympathize, and neither should he. These are not ‘vague theoretical arguments’ that these things ‘might’ have the incentive to emerge, not at this point. Sure, if your livelihood depends on seeing them that way, you can squint. But by now that has to be rather intentional on your part, if you wish to not see it.

Daniel Kokotajlo: I basically agree & commend you for writing this.

My only criticism is that I feel like you downplayed the deception/scheming stuff too much. Currently deployed models lie to their users every day! They also deliberately reward hack!

On the current trajectory the army of geniuses in the data center will not be loyal/controlled. Interpretability is one of our best bets for solving this problem in a field crowded with merely apparent solutions.

Ryan Greenblatt: Do you agree that “we are on the verge of cracking interpretability in a big way”? This seems very wrong to me and is arguably the thesis of the essay.

Daniel Kokotajlo: Oh lol I don’t agree on that either but Dario would know better than me there since he has inside info + it’s unclear what that even means, perhaps it’s just hype-speak for “stay tuned for our next exciting research results.” But yeah that seems like probably an overclaim to me.

Ryan Greenblatt: I do not think Dario would know better than you due to inside info.

Dario is treating such objections as having a presumption of seriousness and good faith that they, frankly, do not deserve at this point, and Anthropic’s policy team is doing similarly only more so, in ways that have real consequences.

Do we need interpretability to be able to prove this in a way that a lot more people will be unable to ignore? Yeah, that would be very helpful, but let’s not play pretend.

The second section, a brief history of mechanistic interpretability, seems solid.

The third section, on how to use interpretability, is a good starter explanation, although I notice it is insufficiently paranoid about accidentally using The Most Forbidden Technique.

Also, frankly, I think David is right here:

David Manheim: Quick take: it’s focused on interpretability as a way to solve prosaic alignment, ignoring the fact that prosaic alignment is clearly not scalable to the types of systems they are actively planning to build.

(And it seems to actively embrace the fact that interpretability is a capabilities advantage in the short term, but pretends that it is a safety thing, as if the two are not at odds with each other when engaged in racing dynamics.)

Because they are all planning to build agents that will have optimization pressures, and RL-type failures apply when you build RL systems, even if it’s on top of LLMs.

That doesn’t mean interpretability can’t help you do things safely. It absolutely can. Building intermediate safe systems you can count on is extremely helpful in this regard, and you’ll learn a lot both figuring out how to do interpretability and from the results that you find. It’s just not the solution you think it is.

Then we get to the question of What We Can Do. Dario expects an ‘MRI for AI’ to be available within 5-10 years, but expects his ‘country of geniuses in a datacenter’ within 1-2 years, so of course you can get pretty much anything in 3-8 more years after that, and it will be 3-8 years too late. We’re going to have to pick up the pace.

The essay doesn’t say how these two timelines interact in Dario’s model. If we don’t get the geniuses in the datacenter for a while, do we still get interpretability in 5-10 years? Is that the timeline without the Infinite Genius Bar, or with it? They imply very different strategies.

  1. His first suggestion is the obvious one, which is to work harder and spend more resources directly on the problem. He tries to help by pointing out that being able to explain what your model does and why is a highly profitable ability, even if it is only used to explain things to customers and put them at ease.

  2. Governments can ‘use light-touch rules’ to encourage the development of interpretability research. Of course they could also use heavy-touch rules, but Anthropic is determined to act as if those are off the table across the board.

  3. Export controls can ‘create a ‘security buffer’ that might give interpretability more time.’ This implies, as he notes, the ability to then ‘spend some of our lead’ on interpretability work or otherwise stall at a later date. This feels a bit shoehorned given the insistence on only ‘light-touch’ rules, but okay, sure.

Ryan Greenblatt: Ironically, arguably the most important/useful point of the essay is arguing for a rebranded version of the “precisely timed short slow/pause/pivot resources to safety” proposal. Dario’s rebranded it as spending down a “security buffer”.

(I don’t have a strong view on whether this is a good rebrand, seems reasonable to me I guess and the terminology seems roughly as good for communicating about this type of action.)

I think that would be a reasonable rebrand if it was bought into properly.

Mostly the message is simple and clear: Get to work.

Neel Nanda: Mood.

[Quotes Dario making an understatement: These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.]

Great post, highly recommended!

The world should be investing far more into interpretability (and other forms of safety). As scale makes many parts of AI academia increasingly irrelevant, I think interpretability remains a fantastic place for academics to contribute.

I also appreciate the shout out to the bizarre rejection of our second ICML mechanistic interpretability workshop. Though I generally assume the reviewing process is approximately random and poorly correlated with quality, rather than actively malicious.

Ryan Greenblatt: I agree that the world should invest more in interp (and safety) and academics can contribute. However, IMO the post dramatically overstates the promise of mech interp in short timelines by saying things like: “we are on the verge of cracking interpretability in a big way”.

Neel Nanda: I was expecting to be annoyed by this, but actually thought the post was surprisingly reasonable? I interpreted it as:

  1. Given 5-10 years we might crack it in a big way

  2. We may only have 2 years, which is not enough

  3. IF we get good at interp it would be a really big deal

  4. So we should invest way more than we currently are

I’m pretty on board with this, modulo concerns around opportunity costs. But I’m unconvinced it funges that much in the context of responding to a post like this, I think that the effect of this post is more likely to be raising interp investment than reallocating scarce safety resources towards interp?

Neel Nanda (later): New post: I’m all for investment in interpretability but IMO this overstates its importance vs other safety methods.

I disagree that interp is the only path to reliable safeguards on powerful AI. IMO high reliability is implausible by any means and interp’s role is in a portfolio.

I agree with Neel Nanda that the essay is implicitly presenting the situation as if interpretability would be the only reliable path forward for detecting deception in advanced AI. Dario treats it as both necessary and sufficient, whereas I would say it is neither obviously necessary nor is it sufficient. As Neel says, ‘high reliability seems unattainable’ using anything like current methods.

Neel suggests a portfolio approach. I agree we should be investing in a diverse portfolio of potential approaches, but I am skeptical that we can solve this via a kind of ‘defense in depth’ when up against highly intelligent models. That can buy you some time on the margin, which might be super valuable. But ultimately, I think you will need something we haven’t figured out yet and am hoping such a thing exists in effectively searchable space.

(And I think relying heavily on defense-in-depth with insufficiently robust individual layers is a good way to suddenly lose out of nowhere when threshold effects kick in.)

Neel lists reasons why he expects interpretability not to be reliable. I agree, and would emphasize the last one, that if we rely on interpretability we should expect sufficiently smart AI to obfuscate around our techniques, the same way humans have been growing steadily bigger brains and developing various cultural and physical technologies in large part so we can do this to each other and defend against others trying to do it to us.

As Miles says, so very far to go, but every little bit helps (also I am very confident the finding here is correct, but it’s establishing the right process that matters right now):

Miles Brundage: Most third party assessment of AI systems is basically “we got to try out the product a few days/weeks early.”

Long way to go before AI evaluation reaches the level of rigor of, say, car or airplane or nuclear safety, but this is a nice incremental step:

METR: METR worked with @amazon to pilot a new type of external review in which Amazon shared evidence beyond what can be collected via API, including information about training and internal evaluation results with transcripts, to inform our assessment of its AI R&D capabilities.

In this review, our objective was to weigh the evidence collected by Amazon about model capabilities against Amazon’s own Critical Capability Threshold as defined in its Frontier Model Safety Framework, rather than reviewing the threshold itself (see below).

After reviewing the evidence shared with us, we determined that Amazon has not crossed their Automated AI R&D Critical Capability Threshold for any of the models they have developed to date, regardless of deployment status.

Amazon Science: 🚀 Amazon Nova Premier, our most capable teacher model for creating custom distilled models, is now available on Amazon Bedrock!

Built for complex tasks like Retrieval-Augmented Generation (RAG), function calling, and agentic coding, its one-million-token context window enables analysis of large datasets while being the most cost-effective proprietary model in its intelligence tier.

Also, yes, it seems there is now an Amazon Nova Premier, but I don’t see any reason one would want to use it?

Some additional refinements to the emergent misalignment results. The result is gradual, and you can get it directly from base models, and also can get it in reasoning models. Nothing I found surprising, but good to rule out alternatives.

Janus finds GPT-4-base does quite a lot of alignment faking.

MIRI is the original group worried about AI killing everyone. They correctly see this as a situation where by default AI kills everyone, and we need to take action so it doesn’t. Here they provide a handy chart of the ways they think AI might not kill everyone, as a way of explaining their new agenda.

MIRI: New AI governance research agenda from MIRI’s Technical Governance Team. We lay out our view of the strategic landscape and actionable research questions that, if answered, would provide important insight on how to reduce catastrophic and extinction risks from AI.

If anything this chart downplays how hard MIRI thinks this is going to be. It does however exclude an obvious path to victory, which is that an individual lab (rather than a national project) gets the decisive strategic advantage, either sharing it with the government or using it themselves.

Most people don’t seem to understand how wild the coming few years could be. AI development, as fast as it is now, could quickly accelerate due to automation of AI R&D. Many actors, including governments, may think that if they control AI, they control the future.

The current trajectory of AI development looks pretty rough, likely resulting in catastrophe. As AI becomes more capable, we will face risks of loss of control, human misuse, geopolitical conflict, and authoritarian lock-in.

In the research agenda, we lay out four scenarios for the geopolitical response to advanced AI in the coming years. For each scenario, we lay out research questions that, if answered, would provide important insight on how to successfully reduce catastrophic and extinction risks.

Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment, preserving optionality for the future. We refer to this as an “off switch.”

We focus on an off switch since we believe halting frontier AI development will be crucial to prevent loss of control. We think skeptics of loss of control should value building an off switch, since it would be a valuable tool to reduce dual-use/misuse risks, among others.

Another scenario we explore is a US National Project—the US races to build superintelligence, with the goal of achieving a decisive strategic advantage globally. This risks both loss of control to AI and increased geopolitical conflict, including war.

Alternatively, the US government may largely leave the development of advanced AI to companies. This risks proliferating dangerous AI capabilities to malicious actors, faces similar risks to the US National Project, and overall seems extremely unstable.

In another scenario, described in Superintelligence Strategy, nations keep each other’s AI development in check by threatening to sabotage any destabilizing AI progress. However, visibility and sabotage capability may not be good enough, so this regime may not be stable.

Given the danger down all the other paths, we recommend the world build the capacity to collectively stop dangerous AI activities. However, it’s worth preparing for other scenarios. See the agenda for hundreds of research questions we want answered!

An off switch, let alone a halt, is going to be very difficult to achieve. It’s going to be even harder the longer one waits to build towards it. It makes sense to, while also pursuing other avenues, build towards having that option. I support putting a lot of effort into creating the ability to pause. This is very different from advocating for actually halting (also called ‘pausing’) now.

Paul Tudor Jones said there’s a 90% chance AI doesn’t even wipe out half of humanity, let alone all of it. What a relief.

Dave Karsten: Really interesting seeing how hedge fund folks have a mental framework for taking AI risk seriously.

Damian Tatum: I love to hear more people articulating the Normie Argument for AI Risk: “Look I’m not a tech expert but the actual experts keep telling us the stuff they’re doing could wipe out humanity and yet there are no rules and they aren’t stopping on their own, is anyone else worried?”

Paul Tudor Jones: All these folks in AI are telling us ‘We’re creating something that’s really dangerous’ … and yet we’re doing nothing right now. And it’s really disturbing.

Darkhorse (illustrative of how people will say completely opposite things like this about anyone, all the time, in response to any sane statement about risk; if anything, the central problem with hedge funds is that the incentives run into the opposite problem): Hedge fund folks have all to lose and little to gain.

Steely Dan Heatly: You are burying the lede. There’s a 10% chance AI wipes out half of humanity.

Joe Weisenthal: Yeah but a 90% chance that it doesn’t.

Hedge fund guys sometimes understand risk, including tail risk, and can have great practical ways of handling it. This kind of statement from Paul Tudor Jones is very much the basic normie argument that should be sufficient to carry the day. Alas.

On the contrary, it’s lack-of-empathy-as-a-service, and there’s a free version!

Olivia Moore: We now have empathy-as-a-service (for the low price of $20 / month!)

Dear [blue], I would like a more formal version, please. Best, [red].


AI #115: The Evil Applications Division Read More »

cue:-apple-will-add-ai-search-in-mobile-safari,-challenging-google

Cue: Apple will add AI search in mobile Safari, challenging Google

Apple executive Eddy Cue said that Apple is “actively looking at” shifting the focus of mobile Safari’s search experience to AI search engines, potentially challenging Google’s longstanding search dominance and the two companies’ lucrative default search engine deal. The statements were made while Cue testified for the US Department of Justice in the Alphabet/Google antitrust trial, as first reported in Bloomberg.

Cue noted that searches in Safari fell for the first time ever last year, and attributed the shift to users increasingly using large language model-based solutions to perform their searches.

“Prior to AI, my feeling around this was, none of the others were valid choices,” Cue said of the deal Apple had with Google, which is a key component in the DOJ’s case against Alphabet. He added: “I think today there is much greater potential because there are new entrants attacking the problem in a different way.”

Here he was alluding to companies like Perplexity, which seek to offer an alternative to traditional search engines with a chat-like approach—as well as others like OpenAI. Cue said Apple has had talks with Perplexity already.

Speaking of AI-based search engines in general, he said “we will add them to the list”—referring to the default search engine selector in Safari settings. That said, “they probably won’t be the default” because they still need to improve, particularly when it comes to indexing.

Cue: Apple will add AI search in mobile Safari, challenging Google Read More »

starlink:-here’s-a-free-satellite-dish—if-you-pay-$120-a-month-instead-of-$90

Starlink: Here’s a free satellite dish—if you pay $120 a month instead of $90

There are 15 US states in which Residential Lite is offered: Maine, Vermont, New Hampshire, Montana, North Dakota, South Dakota, Minnesota, Iowa, Wyoming, Nebraska, Kansas, Nevada, Utah, New Mexico, and Hawaii. This territory generally overlaps with the larger territory in which the free Starlink kit is available.

Some states, such as California, Texas, New York, and Massachusetts, have access to the free kit offer but not the Residential Lite plan. When I attempted to order service at an address in Maine, one of the overlap areas where both offers are available, I was given three options.

One option was to get the hardware for free and pay $120 a month with a 12-month commitment. Another option was to pay $349 for the kit and get the standard residential plan for the same $120 monthly price, but with no minimum term commitment. The third option was to pay $349 for the kit and get the worse Residential Lite plan for $80 a month, with no commitment. The $90 price for full-speed service wasn’t available.

Ordering Starlink’s lite service.

If you don’t mind the Lite plan’s slower speeds and deprioritization during peak hours, it’s cheaper during the first year to buy the kit at full price and pay $80 a month ($1,309 compared to $1,440). If you want to avoid the lite plan and you live in a place where the $90 full-speed plan isn’t available, it’s significantly cheaper to get the free kit because you’d have to pay $120 either way. You presumably would be able to switch to the $80 lite plan after the 12-month commitment, assuming Starlink still offers it a year from now.
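For readers who want to check the math themselves, here is a minimal Python sketch of that first-year comparison; the option labels and the helper function are just illustrative names for the three offers quoted above, not anything from Starlink’s site.

```python
# A minimal sketch of the first-year cost comparison described above;
# the option names are just labels for the three offers quoted in the article.

def first_year_cost(kit_price: float, monthly_price: float, months: int = 12) -> float:
    """Total spend over the first year: upfront hardware plus monthly service."""
    return kit_price + monthly_price * months

options = {
    "Free kit, $120/mo standard (12-month commitment)":   first_year_cost(0, 120),
    "$349 kit, $120/mo standard (no commitment)":          first_year_cost(349, 120),
    "$349 kit, $80/mo Residential Lite (no commitment)":   first_year_cost(349, 80),
}

for name, cost in options.items():
    print(f"{name}: ${cost:,.0f}")
# Free kit: $1,440; full-price kit with Lite: $1,309 -- matching the comparison above.
```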

In summary, if you were thinking about getting Starlink already, check out the free-kit offer and see if it makes sense for you. Just don’t hit the buy button immediately without examining the other options.

Starlink: Here’s a free satellite dish—if you pay $120 a month instead of $90 Read More »

nvidia-geforce-xx60-series-is-pc-gaming’s-default-gpu,-and-a-new-one-is-out-may-19

Nvidia GeForce xx60 series is PC gaming’s default GPU, and a new one is out May 19

Nvidia will release the GeForce RTX 5060 on May 19 starting at $299, the company announced via press release today. The new card, a successor to popular past GPUs like the GTX 1060 and RTX 3060, will bring Nvidia’s DLSS 4 and Multi Frame-Generation technology to budget-to-mainstream gaming builds—at least, it would if every single GPU launched by any company at any price wasn’t instantly selling out these days.

Nvidia announced a May release for the 5060 last month when it released the RTX 5060 Ti for $379 (8GB) and $429 (16GB). Prices for that card so far haven’t been as inflated as they have been for the RTX 5070 on up, but the cheapest ones you can currently get are still between $50 and $100 over that MSRP. Unless Nvidia and its partners have made dramatically more RTX 5060 cards than they’ve made of any other model so far, expect this card to carry a similar pricing premium for a while.

                  | RTX 5060 Ti       | RTX 4060 Ti       | RTX 5060  | RTX 4060  | RTX 5050 (leaked) | RTX 3050
CUDA cores        | 4,608             | 4,352             | 3,840     | 3,072     | 2,560             | 2,560
Boost clock       | 2,572 MHz         | 2,535 MHz         | 2,497 MHz | 2,460 MHz | Unknown           | 1,777 MHz
Memory bus width  | 128-bit           | 128-bit           | 128-bit   | 128-bit   | 128-bit           | 128-bit
Memory bandwidth  | 448 GB/s          | 288 GB/s          | 448 GB/s  | 272 GB/s  | Unknown           | 224 GB/s
Memory size       | 8GB or 16GB GDDR7 | 8GB or 16GB GDDR6 | 8GB GDDR7 | 8GB GDDR6 | 8GB GDDR6         | 8GB GDDR6
TGP               | 180 W             | 160 W             | 145 W     | 115 W     | 130 W             | 130 W

Compared to the RTX 4060, the RTX 5060 adds a few hundred extra CUDA cores and gets a big memory bandwidth increase thanks to the move from GDDR6 to GDDR7. But its utility at higher resolutions will continue to be limited by its 8GB of RAM, which is already becoming a problem for a handful of high-end games at 1440p and 4K.
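To see where that bandwidth jump comes from, here is a rough back-of-the-envelope sketch (not from Nvidia’s press release): theoretical bandwidth is the bus width in bytes times the effective per-pin data rate. The per-pin rates below are assumptions chosen to be consistent with the figures in the table above, roughly 17 Gbps for the RTX 4060’s GDDR6 and 28 Gbps for the RTX 5060’s GDDR7.

```python
# Rough sanity check of the bandwidth numbers in the table above.
# Bandwidth (GB/s) = (bus width in bits / 8) * effective per-pin data rate (Gbps).
# The per-pin rates are assumptions that match the published bandwidth figures.

def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return theoretical memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

print(bandwidth_gbs(128, 17))  # RTX 4060 (GDDR6): 272.0 GB/s
print(bandwidth_gbs(128, 28))  # RTX 5060 (GDDR7): 448.0 GB/s
```

Since the bus width stays at 128-bit, essentially all of the gain comes from the faster per-pin signaling of GDDR7.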

Regardless of its performance, the RTX 5060 will likely become a popular mainstream graphics card, just like its predecessors. Of the Steam Hardware Survey’s top 10 GPUs, three are RTX xx60-series desktop GPUs (the 3060, 4060, and 2060); the laptop versions of the 4060 and 3060 are two of the others. If supply of the RTX 5060 is adequate and pricing isn’t out of control, we’d expect it to shoot up these charts pretty quickly over the next few months.

Nvidia GeForce xx60 series is PC gaming’s default GPU, and a new one is out May 19 Read More »