Author name: Paul Patrick

Monthly Roundup #32: July 2025

Welcome to the monthly roundup of things that don’t fit into other categories and don’t rise to the level of their own posts.

When people tell you who they are, believe them (with obvious exceptions). In particular, if they explicitly describe themselves as evil, or demonic, or use other similar terms, definitely believe them.

Did you know 67% of all college students bet on sports? That’s a group that is majority female, so that statistic is wild. This is in the context of Ron Yorko developing a class in sports betting awareness from a neuroscience perspective for CMU freshmen.

Cooking scales well, but for single people the economics are remarkably bad. Stop telling single people not to order delivery.

Chase Sapphire Reserve annual fee increases to $795 from $550, you get a $300 travel credit. That should cut down considerably the number of people who get value here.

Claim that AirBnB and Vrbo are headed downhill, which directionally matches my experiences, although it’s obviously not as bad as this portrays things. Revealed preference is that there was a period when I defaulted to an AirBnB, and I have definitely switched back to renting hotel rooms in most situations.

More cautionary tales of AirBnB. I continue to update towards using hotels unless I have a strong need for something bigger.

Seb Krier: In case anyone is considering booking an @Airbnb_uk, make sure there are no flea/tick infestations in your room because you will only get a refund of 30% of the cost of the last night alone, and no compensation for replacing infested luggage, medication for bites etc. Absurd!

Update: after escalating further, got the trip refunded as a “one-time concession” (but no other compensation). 👍🏼

Peter Wildeford: AirBnB has absolutely no downside protection. AirCover is a lie. We had to move out because the building architect said that the roof was at risk of collapse. AirBnB refunded us just $300 out of $2900.

Covi: AirCover is totally deceptive. They make it sound like they have you covered, yet a host cancelled the day before check in and I called being like “so can you help me” and they’re like here’s a £20 voucher, best we can do.

That claim that chess grandmasters burn 6000 calories a day during intense play? Not only is it Obvious Nonsense, the story of how it got repeated a lot is even stupider than you think.

Adam Strandberg: To summarize: a grad student took physiological measurements of 11 ordinary chess players (not grandmasters). They reported in a summary in a chess magazine that the maximum chest movement rate they measured in a 10 second period was almost three times that of an average measurement from a different study.

Robert Sapolsky then cited this thesis in his popular book, dropping the distinction between maximum and average to give a 3X breathing rate. He later took the 3X number and multiplied that by 2000 calories per day to get the number 6000, adding the “grandmaster” rhetorical flourish along the way.

He spread this fact through his own talks at Stanford and through interviews with journalists, who accurately repeated him. When questioned about the source of the number, he then claimed on multiple occasions that the number actually came from someone else, and that journalists had distorted his argument.

Suffice it to say this is unbecoming of such an esteemed professor.

Europe’s war against air conditioning continues to be truly absurd. It’s even more absurd considering how well it lines up with solar power. If the solar panels can’t produce the energy to run the air conditioning, then you didn’t need to turn it on. It also is the obvious response any time someone says ‘their lived experiences are better.’

This does seem like a good heuristic:

caesararum: “oh you want to criticize veterans? why didn’t you sign up”

i did, two combat tours

anyway, do you want to keep arguing or should I just chalk this up as a W and move on

Alex Godofsky: whenever someone gives me this sort of “oh? do YOU have experience [with whatever]?” challenge I know they’re a fraud because approximately 0% of people concede when it turns out you do.

There are cases where the person is actually asking nicely and they clearly are hoping you tell them yes, as in ‘have you done this procedure before?’ or ‘are you familiar with [X] method?’ That’s different.

When someone says this in a way that clearly implies that they think the answer is no and they are using that to dismiss you, then yeah, doesn’t matter, it will change nothing, and you should likely write them off whether or not you can answer yes.

I am doing my best to avoid commenting on politics. As usual my lack of comment on other fronts should not be taken to mean I lack strong opinions on them. Yet sometimes, things reach a point where I cannot fail to point them out.

If you are looking to avoid such things, I have split out this section, so you can skip it.

FDA has a new pilot program that can slash FDA’s drug review time from 10-12 months to 1-2 months, by evaluating things along the way during clinical trials, which was what they did during Operation Warp Speed. That would straight up accelerate the deployment of such drugs by most of a year. It would also greatly encourage future investment: not only is the process faster, but the drug companies know where they stand throughout and can adjust accordingly. The term ‘AI’ does not once appear in the report.

Which demands the obvious question, why the hell are we only doing this now?

As per Levels of Friction, yes, you should have anticipated the results we got when moving things into Tier 1 where they are legal and ubiquitous without limiting principles:

John Arnold: I think legislators expected 10% THC weed and straightforward sports betting of money lines and over/unders when they legalized both but were quickly met with 30%+ THC products and props, parlays, and in-game wagers, each an order of magnitude more dangerous.

Zac Hill: This is exactly what happened and is also why we need more game designers working in policy.

It doesn’t have to be game designers. Ordinary capitalists should be fully equipped to reason this out.

Click-to-cancel, which I agree with Sheel Mohnot was by far the best thing Lina Khan did at the FTC, has been stopped by a panel of three Republican judges so the industry could get ‘more time and process’ to explain why they opposed the rule. The story here about his failure to cancel a gym membership is rage inducing and completely standard.

Martin Shkreli (replying to Lina Khan): Get a job.

Sir, when she had a job you complained, now you complain again, please make up your goddamned mind. Also offer me a click so I can cancel.

It seems the UK government literally got an injunction forbidding the press from talking about what the government was doing with respect to Afghan migrants? Regardless of what you think of what was being done, forbidding the press from discussing it feels like a Declaration of Independence, time-to-start-over-with-a-new-government level of violation of basic principles of freedom?

Balsa Research can’t keep up, as the House suddenly and overwhelmingly passed the American Cargo for American Ships Act that would require 100% of transportation project [DOT related] materials transported over oceans to go on US ships. So we’re going to make it a lot more expensive to use ships for projects that are ‘procured, furnished or financed by’ the DOT. No, this is not ‘worse than the Jones Act,’ the blast radius is far smaller and it only applies the flagging requirement, but this plus the Jones Act is worse than only the Jones Act.

That’s in addition to the cataclysmic regulations we helped fight back against earlier.

Meanwhile, you know how the Jones Act was supposed to promote American shipbuilding?

Instead, the beneficiaries of the Jones Act, via owning existing Jones Act ships, have enlisted the government to actively sabotage American shipbuilding even further.

As in, and I quote: “Some Jones Act companies now expressing fear that building new ships could devalue their current fleets.”

I’d say ‘mask off moment,’ but it’s not. What mask?

John Konrad: BREAKING NEWS: Massive shipbuilding changes in DC. None of them good. @gCaptain has confirmed from a White House source that Trump has closed the shipbuilding office at the NSC.

Reuters reports that Ian Bennitt, the President’s Special Assistant for Shipbuilding at the White House, has been fired.

Favored candidates for Provost and Superintendent positions at the U.S. Merchant Marine Academy have received denial notices.

At a recent USNI shipbuilding conference, it became clear: major shipbuilding primes are actively fighting plans to expand commercial shipbuilding.

Sources inside the Pentagon say Admirals and SES are digging in their heels on several key shipbuilding objectives.

Some Jones Act companies now expressing fear that building new ships could devalue their current fleets.

Congressional sources say progress on the SHIPS Act is stalling in committee. It’s also unlikely the new Commandant will be confirmed before the August break.

We’ve confirmed that the French billionaire who offered to invest $20B in U.S. shipping sent a letter to Trump saying he’s not getting the support he needs to move forward.

The U.S. Coast Guard is slashing cutter orders left and right.

I spoke with half a dozen senior sources in DC—every single one is frustrated.

Zero follow-through on Trump’s State of the Union promise to open a dedicated White House shipbuilding office.

It’s been 252 days since the election, and not a single new ship has been ordered.

The smartest maritime policy guy I know sent me this: “Spot on that JA carriers do not want any newbuilding on grounds it devalues their assets and that primes don’t want it either. @WeAreHII & Crowley are acting poorly. I see this dynamic as a center of gravity of the mess.”

That’s right. We literally got an offer to invest $20 billion in US shipbuilding, and the Trump administration said no, we won’t support that. No non-US-built ships can be used, and also no US ships can be built. Also tariffs on things like steel.

So, no ships, then. Except the handful that exist, which get to profit.

The corruption is staggering. It can always get worse.

Chris Lakin: If you make >$300k/yr why aren’t you announcing random $1,000 prizes every Saturday for whatever you want to see happen in the world?

$1k prize for best blog post on X, $1k or best art like Y, $1k for best _____. High agency mindset.

near: is $1k a large enough prize to make things happen in sf?

Chris Lakin: Many of the https://franciscosan.org have been less than this.

Don’t view the money as “paying for time” — $1k isn’t enough for that. View it as “showing seriousness that someone cares enough to invest limited resources”

Gallabytes: I tried this for 10k + a job offer but the prize was too hard & nobody won it.

The answer is (as always) transaction costs.

At one point, I coordinated with Paul Christiano to put out an AI Alignment Prize. On a per dollar basis, I am confident we generated and highlighted excellent work. However, we also had to put in a ton of time evaluating all the entries. A lot of other would-be prizes will have a similar problem, and once you announce a prize people can get very persnickety about details.

Also you have to use part of your social bandwidth to communicate the prize.

However, yes, you should be doing it more. And I should be doing it more.

One cool variant is to create a Manifold market on ‘will [X] happen?’ where the [X] is something you want to happen and that someone can go make happen. The absolute value of the prize is low but in my experience this is highly motivating, and for example it is how I got my hands on a Switch 2. There is tons of alpha in offering a symbolic but real prize that shows you care at all.

Somehow you can still get 5 million views by posting that you were stupid enough to use Uber Eats in New York City instead of Caviar, then counting sales tax and the tip as part of the delivery fee and saying you paid $30 for delivery.

By comparison, on Caviar, I tested out a similar sized order, subtotal was $94, sales tax was over $8 and total charge was $109.03. I mean, you can be an idiot and press the Pay More button if you want, I suppose.
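A quick sanity check on that Caviar receipt, as a minimal sketch. The subtotal and total are the figures quoted above; the tax figure is an assumption, since all I noted was that it was over $8, so the code treats $8.00 as a floor. The function name and variables are just for illustration.

```python
# Amounts are in cents so the arithmetic stays exact.
SUBTOTAL_CENTS = 9400    # food subtotal from the Caviar test order
TOTAL_CENTS = 10903      # total charge from the Caviar test order
TAX_FLOOR_CENTS = 800    # assumption: the post only says tax was "over $8"


def order_breakdown(subtotal, total, tax_floor):
    """Return non-food charges, an upper bound on non-tax charges, and the markup share."""
    non_food = total - subtotal          # tax plus every other charge on the order
    other_max = non_food - tax_floor     # upper bound, since the true tax is higher
    markup = non_food / subtotal         # non-food charges as a fraction of the food
    return non_food, other_max, markup


non_food, other_max, markup = order_breakdown(SUBTOTAL_CENTS, TOTAL_CENTS, TAX_FLOOR_CENTS)
print(f"Non-food charges: ${non_food / 100:.2f} ({markup:.1%} of subtotal)")
print(f"Charges beyond tax: at most ${other_max / 100:.2f}")
```

Everything that is not food comes to about $15, or roughly 16% of the subtotal, and at most about $7 of that is something other than sales tax. That is a long way from $30 for delivery.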

Explanation of why all airport restaurants get similar Yelp ratings, they’re all run by the same group of people. Except no, that still makes no sense, because the food still mostly tastes the same as it does on the outside. If you go to Shake Shack you still get a Shake Shack burger, you go to Dunkin Donuts you get their donuts, and so on. So yes, there is a bit of equalization in service, perhaps, but that doesn’t explain it? I know that I will almost always make the same choices at airport restaurants I would at a similar outside food court.

So I think this is still a mystery, that likely has more to do with how people rate restaurants when they are being charged a lot and are travelling? As in, you’re always happy to eat something at all, always frustrated by the price and options and conditions, so you end up around 2.5-3 star averages almost no matter what? I guess?

A ghost kitchen Xi’an Famous Foods is doing bonkers business in Alexandria. Xi’an Famous Foods is quite good, I recommend the Lamb Noodles like everyone else does, but you have to eat it right away (and also I made the mistake of looking, and it is a lot of calories, so I don’t do it often). This isn’t only me, they’ve been consistent about insisting on the eating right away part, which applies here way more than usual. I worry many customers aren’t getting the full experience.

Joe Weisenthal says all cities have good food now. Nate Silver calls out Boston as being somewhat lacking among the top metro areas as do many others, which he attributes to it being a college town, and many others question the premise.

My understanding can be summarized this way:

  1. No matter where you go, average quality of food is way, way up.

  2. No matter where you go, the best available food is way, way up.

  3. No matter where you go, variety of available food or good food is way, way up.

  4. The average place is still far behind the better places, almost everywhere.

  5. You can eat fine basically everywhere there are people, at this point.

  6. This is all true regardless of your price level.

  7. The average and best available options still vary a lot from place to place.

  8. This difference matters, and can matter a lot. NYC is awesome here.

Yes you can take a systematic approach to anything and very often you should do it.

Lonely: why don’t autistic people make behaving appropriately and predictably in social situations their special interest.

Hotistic: While you were studying the blade, autistic people were studying appropriate ways to laugh and when to laugh and why it’s ok to laugh just to not make normies uncomfortable.

Madeline Pendleton: In 4th grade I tried to teach myself “how to be human” by replicating the tv show Friends. I did a peer survey and asked my classmates who their favorite character was. Phoebe won, so I spent the entire summer studying her and entered 5th grade AS Phoebe.

For those wondering it worked pretty well, I definitely became more popular. If you’re struggling socially I can 10/10 recommend just becoming Phoebe Buffay from Friends for a while.

I did become very popular almost overnight so I’m going to say yes [it did work.]

Sasha: The best part about this is that if any character on Friends was autistic, it would 100% be Phoebe.

Madeline Pendleton: Oh my god.

Trash Panda: I’ve always struggled making friends so at one point in high school I decided to copy the personality of fictional characters I liked because if I liked them surely other people would like me if I acted like them, right?

The character I chose was fucking Deadpool 🤦🏻‍♀️🤣🤣

Perhaps the supposedly ‘normal’ humans should also be doing more systematic study of how to do you do, fellow humans? They seem to have skipped over some things.

Meghan Murphy: This is the saddest thing I’ve ever read.

Ok never mind this is the saddest thing I’ve ever read:

[Quotes Dark Hyacinth: Parties are boring. A bunch of people standing around drinking. What’s fun about that?]

The parties like this were taken from me at the time (in the 90s and early 00s) and I never experienced them, but I did understand they existed and I was sad about this.

Cate Hall comes out against the concept of willpower. I see this post as correctly attacking people who simply tell you to Use The Try Harder and think doing hard things through ‘sheer willpower’ is virtuous and those who don’t have it deserve to suffer or anything like that.

I strongly agree that the best way to get good results is to set things up to be easy, and that anyone who says any form of ‘you don’t need [X] you only need willpower’ is usually the asshole in a given situation. Engineering solutions are great.

I still think the post goes too far in treating willpower as a non-useful concept. Willpower is a highly useful handle for an important tool that one can and should cultivate and learn how to use wisely. You can also choose to call it something else, if you prefer.

Cate Hall asks ‘are you stuck in movie logic?’ in particular highlighting one form of Idiot Plot where the whole problem could be cleared up in five minutes if people would simply talk to each other and Say The Thing rather than repeatedly and conspicuously dancing around it and not Saying The Thing. As she says, there is a time and place for not Saying The Thing but on the margin you should say it.

Technically when you register for the LSAT you are representing and affirming that you are doing so for the sole purpose of seeking admission to law school, wait what?

Isaac King: I suppose I already knew this, but it’s striking how many of the people responding to this seem to legitimately not understand the difference between “did you lie” and “can anyone prove you lied”.

I’m against lying in general. If there’s no good way around it and I think the other party is expecting me to lie, then I’ll sometimes grudgingly do it, but I try to avoid it as much as feasible.

I too am strongly against lying but there are exceptions and this is one of them. Technical attestations with no legitimate purpose or ability to be enforced, and no person who is relying on them in any way to be accurate, don’t count.

What are protests actually for? Ben Landau-Taylor asserts that if you want your protest to exert any political pressure, this requires that you demonstrate the capacity for violence (ideally while carefully avoiding any actual violence). Otherwise, no, the state won’t respect your demonstration of support, so the purpose of the protest is as a pep rally for the participants (and I would add a signal to others in other ways, which can then indirectly pressure the state in various ways), which can be worthwhile but should not be confused with political pressure.

I think this model goes too far but is essentially correct, with the caveat that you can also credibly threaten things other than violence, but you have to credibly threaten something.

Most of this Scott Sumner post is about underconfidence in monetary policy, where I find little to disagree with, but what I want to talk about here are ChatGPT’s examples of underconfidence:

I don’t keep up with the superhero genre, so I asked ChatGPT to find some examples of underconfidence:

After Peter Parker is bitten by a radioactive spider, he gains superhuman abilities—but at first, he doesn’t fully understand or control them.

Other characters with a similar arc include:

Clark Kent (Superman) in some origin stories (like Smallville), where he gradually learns to control his immense strength.

Eleven from Stranger Things, though not a traditional superhero, also fits the theme of discovering and misjudging her powers at first.

These are terrible examples.

Clark Kent does not have an underconfidence problem with his powers at any point that I can see. He has a lack of control problem, which is a very real issue. He does have regular person underconfidence problems as Clark Kent, but that’s different.

Peter Parker, in every example I have seen, is initially radically reckless and overconfident. He does things that risk getting him killed if he lacks Required Secondary Powers he has not yet verified.

Have appliances declined in durability? The answer is yes, but only modestly. This reflects consumer demand for more features and not caring much about durability, and also largely reflects government requirements for water and energy efficiency. Besides, prices have declined a lot, so it is fine.

Something to watch out for:

Danielle Fong: 💭if mansplaining is telling someone something they already know, chicksplaining is explaining a dilemma to someone, but she already knows what she wants to do

Yishan: I had this extremely agentic female friend in college and I figured out really quickly that whenever she asked me for advice on what to do, the best solution was to figure out what she already wanted to do, and then advise her to do that because mostly she just wanted validation/permission to do some slightly transgressive thing. Over time, it became “Yishan, you give the best advice! No one else understands, but you get it!” which I guess was technically true.

Dushyant: She won’t tell you what she prefers though

Danielle Fong: Yeah you have to figure it out.

Argentina grows at 7.6% YoY in Q2, exceeding expectations. Economists surveyed by Argentina’s central bank in May expected 5.2% annual growth in 2025. Also note from March 31 that poverty has fallen sharply from 53% to 38%.

TSA stops requiring us to take off our shoes even if we didn’t pay for TSA Pre.

A fungus was discovered that can eat even hard to break down plastics, so you could plausibly throw it into a landfill and it would do the rest? It is rarely that simple and there are obvious things to check first, but yes we do get bailed out like this every so often. Also note that if you build superintelligence, things like this will tend to happen a lot more often in a variety of ways.

John Wentworth advises us to centrally seek wizard power, the ability and skills to do and create things yourself, rather than king power, which is dominance and bargaining power and directing others, mostly in ways that can only get you what money can buy and involves you marching in front of parades thinking you decide where the parade goes. This allowed him to reorient his own drives in this way.

He also highlights a comment from there noting that rationalist types can present depression very differently than others, in a comment I’m quoting in full:

John Wentworth: In response to the Wizard Power post, Garrett and David were like “Y’know, there’s this thing where rationalists get depression, but it doesn’t present like normal depression because they have the mental habits to e.g. notice that their emotions are not reality. It sounds like you have that.”

… and in hindsight I think they were totally correct.

Here I’m going to spell out what it felt/feels like from inside my head, my model of where it comes from, and some speculation about how this relates to more typical presentations of depression.

Core thing that’s going on: on a gut level, I systematically didn’t anticipate that things would be fun, or that things I did would work, etc. When my instinct-level plan-evaluator looked at my own plans, it expected poor results.

Some things which this is importantly different from:

  • Always feeling sad

  • Things which used to make me happy not making me happy

  • Not having energy to do anything

… but importantly, the core thing is easy to confuse with all three of those. For instance, my intuitive plan-evaluator predicted that things which used to make me happy would not make me happy (like e.g. dancing), but if I actually did the things they still made me happy. (And of course I noticed that pattern and accounted for it, which is how “rationalist depression” ends up different from normal depression; the model here is that most people would not notice their own emotional-level predictor being systematically wrong.) Little felt promising or motivating, but I could still consciously evaluate that a plan was a good idea regardless of what it felt like, and then do it, overriding my broken intuitive-level plan-evaluator.

That immediately suggests a model of what causes this sort of problem.

The obvious way a brain would end up in such a state is if a bunch of very salient plans all fail around the same time, especially if one didn’t anticipate the failures and doesn’t understand why they happened. Then a natural update for the brain to make is “huh, looks like the things I do just systematically don’t work, don’t make me happy, etc; let’s update predictions on that going forward”. And indeed, around the time this depression kicked in, David and I had a couple of significant research projects which basically failed for reasons we still don’t understand, and I went through a breakup of a long relationship (and then dove into the dating market, which is itself an excellent source of things not working and not knowing why), and my multi-year investments in training new researchers failed to pay off for reasons I still don’t fully understand. All of these things were highly salient, and I didn’t have anything comparably-salient going on which went well.

So I guess some takeaways are:

  • If a bunch of salient plans fail around the same time for reasons you don’t understand, your instinctive plan-evaluator may end up with a global negative bias.

  • If you notice that, maybe try an antidepressant. Bupropion has been helpful for me so far, though it’s definitely not the right tool for everyone (especially bad if you’re a relatively anxious person; I am the opposite of anxious).

Scott Aaronson officially admits to being a rationalist.

Polymarket is really hitting the big time, with more visits than FanDuel or DraftKings.

The true gambling kings do remain Robinhood and Coinbase.

Cracking down on alcohol in the USSR in the 1984-1990 period made big differences, and they mostly seem to be clear improvements. Note that divorce rates went up.

Derek Thompson looks back at how poor we were in 1776. We are, by comparison, unfathomably rich. George Washington spent $15k/year in today’s dollars on candles to keep the lights on. Heat was so expensive Jefferson couldn’t write in winter because his ink would freeze.

Religious attendance by the young is way up in the UK, as in by a factor of four or more, and France’s Catholic Church did more baptisms this year (17k) than they have in 20 years, in what some call The Quiet Revival. American bible sales are up 22%. I have seen similar statistics in a few places. What I have yet to see is an explanation of why this is happening, but also I have never seen a satisfying explanation of past cycles of religious revival.

OpenPhil is hiring, including for their new Abundance and Growth team (Generalist JD, Specialist JD).

I strongly endorse this, although I doubt we’ll get it. AI parsing for topics is a bonus.

Gallabytes: I want to be able to mute (person & topic) not just person OR topic. some people are broadly interesting but also have some pet issue they post a lot about upon which they are cursed with stupidity.

Indeed. I can think of a number of accounts where I highly value their opinions on [X], usually things like games or AI highly relevant to my interest, and very much do not value their comments on [Y], often political but sometimes simply something boring.

This is not a coincidence because nothing is ever a coincidence, also obviously, although the impact here is dramatically overstated of course:

Yung Marco: just spent ~3 hours reading

LessWrong/EA/MIRI deep lore

it is fascinating how in the 21st century 90% of variance in personal success can be explained by “did you find the right online communities or not.”

this will be increasingly so, post more…

“oh wow, you were an integral participant of the most important technological revolution of all time? you must have 7 sigma IQ and birthplace luck”

“nope, I just posted on the right forum”

One did not simply post to classic LessWrong. It was so intimidating that I at the time was worried to post there, which I shouldn’t have been, but if you weren’t ready the response was super harsh, you would be effectively shown the door. There was tons of filtering. Even if you weren’t shown the door, you wouldn’t get to be a true part of the community, although you could still have for example gotten an early line on Bitcoin.

There were also strong attractors. If you were the type of person who could be there, there was a substantial chance you ended up there. It’s true that there is an ‘invisible graveyard’ of other LessWrong people that would have been right at home and never found it, but I don’t think it is that much larger than the actual group. Same with MIRI.

Going forward, for future groups, I expect the effects will be similar, so long as it remains humans who are shaping our future. Let’s hope that lasts.

Similarweb says Threads now has slightly more monthly active users than Twitter? But it also says Twitter has about 35 times as much web traffic. I don’t buy this?

I wonder about this situation, and what is really going on.

As in, a good portion of those who see Brah’s post are going to notice that Freiman’s post says ‘constant 2022 dollars’ right there in large friendly letters. I do think the true situation is more complicated than the chart suggests, but yes people are getting richer by these measures.

Ryx Commar notes a problem, and correctly identifies it as a sorting problem, not an average quality issue:

Ryx: A phenomenon in internet discourse over the last 5 years is that the correlation between signals of textual quality (grammar, punctuation, social media likes, probability it shows up in my feed) and actual textual quality has completely broken down. And it’s driving me insane.

All the biggest idiots in the world now use grammar check and spell check on their phones. You also have LLMs spitting out garbage. The Twitter algorithm puts tons of slop in your feed now. You actually have to read and manually sort through so much more stupid content.

It’s not so much that people have gotten dumber, it’s that dumb people and dumb text now blend in more with smart people and smart text. So my brain actually engages with all this dumb text. This is one of the bigger reasons why the internet today feels more psychically damaging.

The solution is to rely instead on other markers. Stick almost entirely to curated following-or-listed-only feeds (did you know even YouTube, Facebook, Instagram and TikTok let you do this, if you dare go to all such places?), except where algorithms are very good.

Social media likes and views are still a very rich indicator, but you have to control for circumstances, starting with the account posting but also the subject and the way it is constructed. With enough skill you can still get benefit out of them but it’s tricky.

Apple made the stop button on the alarm small because if you don’t force people to wake up to find the button they oversleep 30% more, whereas an easy to find snooze button only buys you a few minutes.

In ‘it’s worse than you know’ news:

Shoshana Weissmann: “Yesterday the ABC reported the trial found face-scanning technologies “repeatedly misidentified” children as young as 15 as being in their 20s and 30s. These tools could only guess children’s ages “within an 18-month range in 85 percent of cases”. This means a 14-year-old child might gain access to a social media account, while a 17-year-old might be blocked.”

That is how badly it performs in a non-adversarial situation. This is how your age verification works when everyone is scanning their actual faces with no attempt to fool the system. If you’re facing kids who want to fool the system? I mean just give up, even if you mysteriously ruled out the ‘hey other kid can you do the verification for me’ strategy. Sonnet thought you could probably just literally use a fake mustache.

I do not understand this either: Why do all laptops, or at least all not-dirt-cheap ones, not have the same connectivity features as smartphones?

A Patrick McKenzie tale of how to allow kids to make phone calls on their Amazon Fire tablets, which for them required multiple non-intuitive steps.

I thought I had a lot of open tabs. I counted 139 including all my tab groups, of which probably half are actually necessary. I was incorrect, this does not appear to be ‘a lot of open tabs.’

Also, really, Safari?

Ryan Briggs: I asked my wife why she was in private browsing mode on her phone and she explained that Safari only allows 500 tabs in regular mode so she had to switch. You think you know a person.

William Eden: Oh my god I just asked my wife and she sent me a screenshot with 500 open tabs wtf

I have 13 tabs open on my phone and it’s too many. Less than 20 total across ALL of my devices.

Charles Neill: You need to create tab groups. You need to download more browsers. You need to be tab-maxxing.

I too have grown increasingly skeptical that meta-analysis in its typical form does anything all that useful.

Eliezer Yudkowsky: studies 1, 3, 5: objects fall down

studies 2, 4, 6: objects fall upward

sane people: at least half of these studies must be doing something terribly wrong; they’re not all reporting inside the same reality

journal papers: our meta-analysis shows that objects hover in place.

Tracing Woods here makes a similar argument for education meta-studies in particular, that the different studies have dramatically different setups and criteria, and you need to look at the studies individually if you want to learn anything. I buy it.

If you post a graph showing a small effect, but it is zoomed in, people get the wrong idea, so try not to do that when this would be a problem.

Firewood alone was supposedly 28% of GDP. Except wait, does that actually make any sense? A quarter of economic activity was firewood? We should believe that because a paper said so?

River Tam: Who would win, a PhD in natural resource economics doing detailed historical analysis of published firewood prices and consumption volumes over 300 years or one autodidact’s “I doubt it?”

Emmett Shear: You’d be surprised.

Actually, calling out absurd numbers as absurd is The Way.

Michael Vassar: The autodidact in this case, 100%. But @ben_r_hoffman has already addressed the most glaring flaws in this particular paper, the asymmetric treatment of non-market labor between numerator and denominator.

Eliezer Yudkowsky: Looks like it was just the very straightforward “firewood was informal economy, formal small by comparison, guy with an axe chops firewood for their house in more like 5-10% of annual labor”. This is the Way of having a sense of numbers and asking honest questions.

Tetraspace: Chopping firewood is 30% of GDP but the economy is 300% of GDP

Spending 10% of labor on firewood is still a lot. Firewood was a huge deal. But 10% is very different from 30%, and makes a lot more sense. If it was 30%, that would have it be approaching farming in terms of how much labor goes into it even directly for the farmers, and this simply does not intuitively make any sense.

In theory, if any given necessity gets sufficiently difficult to obtain, it can become an arbitrarily large cost. But that definitely was not the way to bet, and indeed we have an explanation for what was going on: They were valuing all firewood at market price (which is well above typical cost) and then comparing to GDP estimates that treat production very differently. I think the problem goes at least one step deeper than Bernard identifies: they likely aren’t even including this type of measurement of firewood itself in the denominator, and they are also using urban firewood prices for what was mostly rural consumption.

Bernard Stanford: If you value all informal economy firewood production at market price, and then compare it to extant GDP estimates, you need to make sure ALL informal economic production is similarly valued, or you’ll massively overestimate firewood’s share of GDP. Seems to be what happened!

The approach seems to have a serious flaw in assuming that THIS sector of the informal economy was underestimated, but surely not any OTHER sector. Yudkowsky’s objection seems to be bang-on.

Halogen: Eliezer is just right here. The number is off by at least one half an order of magnitude, it has to be. This is how real science works, you put things together and think about things. It’s not about memorizing your favorite papers and having 100 econometrics tricks in your bag.
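To make the numerator-versus-denominator point concrete, here is a minimal sketch. Every number in it is hypothetical and chosen only for illustration (none comes from the paper or the thread); the variable names are mine. It shows how valuing informal firewood at market price while leaving the rest of the informal economy out of the denominator can triple the apparent share.

```python
# All figures are HYPOTHETICAL, in arbitrary units, for illustration only.
formal_gdp = 100.0        # what conventional GDP estimates capture (assumed)
informal_firewood = 30.0  # informal firewood valued at market price (assumed)
informal_other = 170.0    # all other informal production, same valuation (assumed)

# Mismatched: informal firewood in the numerator, formal-only GDP in the denominator.
mismatched_share = informal_firewood / formal_gdp

# Consistent: every informal sector, firewood included, also counts in the denominator.
total_economy = formal_gdp + informal_firewood + informal_other
consistent_share = informal_firewood / total_economy

print(f"mismatched share: {mismatched_share:.0%}")
print(f"consistent share: {consistent_share:.0%}")
print(f"total economy as a multiple of formal GDP: {total_economy / formal_gdp:.0f}x")
```

With these made-up inputs the same firewood is 30% of formal GDP but 10% of the whole economy, which is the shape of Tetraspace's joke about the economy being 300% of GDP.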

The question is then, why don’t we feel rich? The reason we don’t feel rich is that we are not permitted to live in between how we lived then and how we are supposed to live now.

And yet, we could do so much better. Which is all very good news.

Liz: people don’t internalize how desperately poor the world still is. Yes it’s gotten better, the world is unrecognizable compared to a generation or two ago. doesn’t change the fact that the world is deeply impoverished and operating at a tiny fraction of its potential.

The problem isn’t inequality it’s just raw productive capacity. It’s artificially constrained and even the most productive places on the planet are operating with hands behind their back.

Fully endorsed. I do not know of a single example of a too-large name on a badge.

Gwern: Conference/convention advice on nametags (this is aimed at ~5 different events):

The ideal nametag is a large double-sided placard on a lanyard, with a printed full name on both sides, in large font. No, larger than that. No—NO, THAT IS STILL NOT LARGE ENOUGH. KEEP GOING!

(No, that is still not large enough. But y’all aren’t ready for that conversation.)

If I can’t read it from across a large crowded room in <1s, then the nametag has failed.

Especially do not make it 1-sided! They always flip around and are unreadable for half the event.

(Also, do not add lots of art or logos or random text. No, attendees do not need to be reminded what event they are at. They can probably remember what they traveled halfway across the world for… Remember, the nametags are for them, not you or your designers.)

Oh no.

Netflix: Julia Garner and Anthony Boyle will portray Caroline Ellison and Sam Bankman-Fried in the new limited series The Altruists. Two hyper-smart young idealists try to remake the global financial system in the blink of an eye…only to seduce each other into stealing $8 billion.

enci: Can’t believe how much you fumbled this

We also would have accepted Jonah Hill, as per Atomic. This is not merely a technical historical accuracy thing, I think it’s actually important. Also, given this and the name The Altruists, and also the description I – I mean seriously, what, no, that’s not how any of this worked – I presume they have zero idea what they are doing.

How should we think about Warren Buffett’s $6 billion donation going entirely to other foundations, mostly to the Gates Foundation? This is definitely not first best, but he is getting quite old, so I don’t think asking him to manage the money himself is a reasonable ask, trying to force a generic additional foundation into existence without his focused attention seems unlikely to work out, and at this scale there are few options available. Obviously I have some suggestions I think are much better places to put a good chunk of these funds, but I’m not mad at it.

NYT once again pulls the Kevin Bacon Game, as in ‘[X] is associated with [Y] which has a similar name to [A] which includes [Z] so obviously [X] is linked to [Z].’

Andy Masley: NYT piece today connecting Elon to longtermism and by extension EA. Nothing really new. I just don’t buy the basic implication that longtermism’s responsible for turning Elon crazy. If you’ve become unhinged, any big ideology is going to be a useful justification.

If Elon were actually being guided by longtermist ideas he would’ve tried to influence US AI and biosecurity and nuclear policy. He didn’t. He nuked USAID and some of the governments’ most effective and utilitarian programs for insane culture war reasons.

EA and longtermism are in the cultural water in tech spaces. You can use both to justify almost anything if you just engage with meme versions. If longtermism were more than an aesthetic fad for Elon I would’ve expected his behavior to be radically different.

Tetraspace: The problem with asking an actual EA what they think about Elon Musk would be that either they’d tone it down for the camera or it would be rude to elicit people saying that about a senior official.

Elon Musk is no longer a senior official. It would still be rather rude.

This is rapidly evolving into a generalized weapon against everything good.

As in:

  1. Person [P] supports thing [X] that would be good in the long term.

  2. Even worse, [P] is trying to figure out actions [Y] that accomplish [X]!

  3. Effective Altruism!

  4. Which means bad! Get it? It means bad! And so cringe.

We see this in its pure form with David Sacks, saying anyone opposed to anything he wants must be an EA in a mask, and that we have to ban states from passing laws about AI because all state laws about AI would of course be the result of a global conspiracy of evil EAs. But you can do the same thing about anything, anywhere.

As Henry Shevlin says, you have to know your EAs.

In a French experiment, they report that imposing a maximum donation increased likelihood and quantity of giving, at least as effectively as a suggested donation. But what they actually did was pay 10 Euros for completing a questionnaire and then offer people the chance to donate either 0-10, 0-10 (with a suggestion of 6) or 0-6 Euros. And yes, in this case 0-6 did better, but this obviously neither describes what they claim it does nor generalizes. It does suggest the important principle that you want to appear reasonable.

There are two distinct problems here: That on the margin there are huge rewards to learning to work the system, and that the intrinsic motivations have perhaps changed.

David Perell: Ten years ago, when YouTubers got together, they talked about editing and storytelling and how to make better videos. Now they talk about how to game the algorithm by increasing click-through rates.

Just about sums up social media right now.

This is not a critique of YouTubers. It’s the rational thing to do. To put numbers on this, all things being equal, when I publish a video with a 3% click-through rate, it’ll get ~3,000 views while a video with a 6% click-through rate will get north of ~100,000 views.

There was a time when you could simply make great content and people would watch (and in certain pockets, that’s still true) but just about every mega-YouTuber has devoted ungodly amounts of time and attention to title / thumbnail strategy.

Pratyush: Jimmy Iovine said that the number one reason music isn’t as good anymore is musicians want to be famous, not great. And nowadays, you can get famous without being great.

A lot of modern culture slop is downstream of this change in behavioral drive.

My money is on the problem being mostly about the reward systems rather than the motivation. Yes, some people primarily want to be famous and successful, but that has always been true. What changed is that if you pursue excellence, the excellence that gets rewarded and that you can measure is largely about working the system, whereas making the underlying products ‘better’ matters too but it is a slower process that on the margin doesn’t pay off for a long cycle. Success is so reliant on virality.

That’s one reason I am so grateful for Substack. It is one of the few places where virality is great when it happens, but it matters remarkably little for long term success.

The New York Times comes out with its best 100 movies of the 21st Century, as voted on by influential Hollywood people.

My main takeaway was, wow, there are a lot of movies and I have seen not many. My secondary takeaway was, well, this does explain a lot, I suppose.

My evaluation:

Have seen, excellent pick (definitely would have made my list): 14

Have seen, good pick (would be happy to have this on the list): 10

Have seen, questionable pick (I mean weird flex, not my pick): 8

Have seen, actively bad pick (no, seriously, no, don’t watch this): 2

Haven’t seen, probably good pick, but because of reasons I never saw it: 11

Haven’t seen, can’t tell: 52

Haven’t seen, probably bad pick: 3

If we look only at the 34 that I’ve seen, that ratio isn’t that bad, but you have highly favorable selection working for you there.
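As a quick check on the tally above, here is a small sketch that sums the categories and computes the ratio among the movies I have seen. The counts are the ones listed; grouping "excellent" and "good" together as the favorable bucket is my assumption about what "that ratio" means, and the dictionary keys are just shorthand.

```python
# Counts copied from the evaluation list above.
seen = {"excellent": 14, "good": 10, "questionable": 8, "actively bad": 2}
unseen = {"probably good": 11, "can't tell": 52, "probably bad": 3}

total_seen = sum(seen.values())
total = total_seen + sum(unseen.values())

# Assumption: "favorable" means the excellent and good picks combined.
favorable_share = (seen["excellent"] + seen["good"]) / total_seen

print(f"seen: {total_seen} of {total}")
print(f"excellent or good among seen: {favorable_share:.0%}")
```

The categories do sum to 100, and roughly seven in ten of the seen picks land in the top two buckets.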

Recent movie pickings have been slim. A lot of people liked Superman. I did not.

As a reverse experiment, I went through my Letterboxd diary list (as in, what have I watched since I started tracking, that was released in the 21st century.) The ones that 100% should be on the list are Anora, The Fall Guy and Looper. All three are missing, and I get that the other two are quirky opinions but I don’t think there’s any excuse for excluding Anora. The bubble for my list would be somewhere in the 4.5 range. Of my 4.5 star movies recorded, NONE of them made it either: Challengers, Poor Things, Megalopolis, Weird: The Weird Al Story, Deadpool and Wolverine, Predestination, You Hurt My Feelings and May/December. Some of that is that the list clearly has an anti-recency bias, there are literally zero movies from 2024 or 2025. Who knows.

I think a lot of the problem was that they only asked each person to vote for 10 movies rather than 100 movies. That introduces some odd distortions.

For better opinions, here are Scott Sumner’s latest movie reviews. There is also well-earned praise for Lighthaven and the events there. I have been seeing fewer movies lately in favor of watching more television shows, and because few movies this year have appealed to me. I do hope to turn that back around, especially now that (by the time you read this) Love Island USA is done for the year, but I also think going through phases of intense interests and jumping around is actually correct.

Here’s another example of ‘whatever you are doing, commit to the bit.’

Romy: Back in the winter i was depressed and speculated that if i got a hobby it would fix me, so i signed up for a ceramics class. I now spend 10-20 hours per week doing ceramics and am not depressed. It turns out you can actually just assign yourself a special interest.

Spent 2 hours designing and building most of this pentagonal planter today even tho i was hungry and had a headache

stef: hell yeah we’re always looking for complicated solutions and the answer is literally just use your hands to make/build/fix stuff

I don’t know how much this generalizes or how much it depends on it having been a physical skill like ceramics, but yeah. Get into something.

In one of the weirdest arguments I’ve seen in a long time, Tyler Cowen says people read less and perhaps have lower literacy skills but the ‘most likely culprit for our current problems’ is the decline of network television and people’s willingness to obey Walter Cronkite and be duller and more conformist. I suppose the point is that reading was already gone and mostly we’re substituting out of TV and there are some cultural downsides to that?

But that has nothing to do with the question about reading, and also that’s a different set of problems? Surely, if English Majors Can’t Read, that isn’t caused by their failure to watch a bunch of NBC. My read on the post covering the reading debate here is that it’s a mirage, reading hasn’t actually declined that much, we’re now constantly interacting via text, it’s more that attention spans for long texts have declined and this isn’t obviously wrong, and the reason students 100 years ago sound so much better is that they are a highly selected group.

To the extent there really is an issue, I say the problem was caused by… network television, which shifted a ton of consumption away from reading to video. After that, the recent changes didn’t make things worse (I think?) but substituted something else for the network television.

YouTube Shorts is now averaging over 200 billion daily views. There are only ~8 billion people on Earth, so that’s 25 per person. And then Reels and probably TikTok are both bigger than that. Yikes.
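The per-person figure is just a division of the two rough numbers quoted above. A minimal sketch, with variable names that are mine:

```python
# Both inputs are the rough figures quoted in the paragraph above.
daily_shorts_views = 200e9  # "over 200 billion daily views"
world_population = 8e9      # "~8 billion people on Earth"

views_per_person = daily_shorts_views / world_population
print(f"{views_per_person:.0f} Shorts views per person per day")
```

Since the view count is stated as a floor and plenty of people watch none at all, the average among actual viewers is presumably a good deal higher.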

Kevin Roose: Need a phrase like “vanity metric” but for numbers you can’t disclose because they reveal your dominance and create existential malaise in all who hear them.

Robin Hanson points out that our consumption of fiction and music is dramatically higher than it used to be. These are rough AI estimates; I note that o3-pro for me estimated 9-14 hours a week for all fiction rather than 24, although Opus was 15-20 hours:

Robin Hanson: Note the huge increase over time. As US adults now average ~21 hours a week at jobs, and ~14 at housework, adults now spend substantially more hours on both fiction and music than they do on either jobs or housework. So it seems fair to wonder: is this behavior adaptive?

The post doesn’t focus on music, and I would ignore it. There is no real sense in which we ‘spend’ three hours a day on music. o3-pro estimates 97% of our music consumption is passive, so active consumption may even have gone down. There’s no reason to presume this is or is not adaptive.

I consume far less because I find music reduces my productivity, but it brings me joy and I should probably consume more.

Fiction however is presumably being consumed as a primary activity. So this change, largely in response to vastly superior supply of both fiction and free time, is plausibly maladaptive. Certainly 24 sounds like a ton, although 14 seems a lot more sane to me.

One could decompose this change into leisure consumption over time, and the share of that consumption that is fiction or actively listening to music. It seems plausible that given the decision to consume so much leisure, it is not a mistake to consume this much fiction and music, or it is a much smaller mistake. So to the extent we worry about a cultural error here, the focus should be on our potentially maladaptive increase in total leisure.

A paper’s model of ‘inefficient bargaining’ puts a 2% lower bound on the chance a TV show is cancelled even if it would be efficient to continue, higher if there is asymmetric information. That’s the nature of any similar negotiation: if you’re not risking a 2% chance any given negotiation blows up, you are not negotiating very hard.

I’ve talked about it before but I seriously can’t get over that the world works this way.

Tetraspace: China: [slams defect button] I win

America: I’d love to cooperate but the incentives, you see, my hand is forced…

Japan: The sign says to cooperate ?? why wouldn’t I cooperate ??

Peter Wildeford: The current way we do the 5 star system just sucks

Ryan Moulton: Game theory forces this. Using the ends of the range maximizes your power over the average.

Toucan: In japan they don’t have game theory, which is why 95% of restaurants get a 3.5 or below (correct)

If all you ever do is throw the number in the average, and all you care about is the average, then yes, rating something 3/5 is silly. But you don’t directly benefit that much from the average, so all you have to do is have the ratings also do something else, especially if they help you track things or help algorithms or AIs make predictions, or you get a reward for a reasonable distribution, or people are reading your reviews directly, and so on. Movie ratings do survive with a reasonable distribution for similar reasons, even in America.

The problem is that if you try to force calibration in various ways, that opens up other ways to cheat the system, so this would work if and only if people weren’t adjusting.
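Ryan Moulton's point about the ends of the range can be seen in a toy example. This is a sketch under stated assumptions: the existing ratings are made up, the helper `average_after_vote` is hypothetical, and the rater is assumed to care only about pulling the displayed average toward their own opinion.

```python
def average_after_vote(existing_ratings, my_rating):
    """Return the mean rating once my vote is added to the existing ones."""
    ratings = list(existing_ratings) + [my_rating]
    return sum(ratings) / len(ratings)

existing = [4, 4, 4, 4]  # hypothetical prior ratings, averaging 4.0
my_true_opinion = 3      # what I actually think the place deserves

honest_avg = average_after_vote(existing, my_true_opinion)  # vote my real view
strategic_avg = average_after_vote(existing, 1)             # vote the floor

print(f"honest vote    -> average {honest_avg:.1f}")
print(f"strategic vote -> average {strategic_avg:.1f}")
```

The honest 3 leaves the average at 3.8, while the 1-star vote drags it to 3.4, closer to what I actually believe. That is the incentive to abandon the middle of the scale, and it only goes away if the rating does something for you other than move the average.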

It was Monster Train 2 month. We’re back, baby.

I centrally describe Monster Train 2 as More Monster Train. Had fun the first time? Have fun again, with a bunch of cool new features, figure out the new clans, and climb. As before, the goal of Monster Train is to do Something Utterly Ludicrous, or more precisely something that wins the run, which means knowing exactly what does and does not win runs. There are particular battles that are run killers if you don’t realize the danger.

Ultimately I decided that I had fun for enough hours that I was happy I bought and played the game, but that I’d had this experience before. I could keep going and achieve more things, but my experience had peaked and I was done after 14 hours. Which is fine.

I am now on Clair Obscur Expedition 33. I agree with everyone else that, some frustrations with navigation aside, it has been a great experience so far. I do have notes, especially that certain choices are not well balanced.

Recommendations for how to maximize your Clair Obscur Expedition 33 experience. The first is to minimize spoilers. The others are out of your hands and are minor spoilers, so I’m not going to tell you, and you shouldn’t click the link until after you play.

If I am understanding this right, Xbox is going to transition to a modular platform that will be fully compatible with PCs and basically be a way to play PC games on console and handheld formats? They lost to Sony so they’re going after Valve?

I agree with dCrusius that retro games both classic and new are pretty awesome, and it is not only nostalgia, and there’s a reason my kids like them so much. Restrictions breed creativity and I love being able to actually fully grok everything. There are still great modern games too, of course.

Reid Duke reports from PT: Final Fantasy. Sounds like old times. DI Goetschel reports as well; the first part is highly particular but the second part involves universal principles that don’t require that you know what the cards do.

There was a poker tournament where one player got a $1 million extra payout if he won, which was much larger than all the other prizes. So the other finalist let him win. All Magic: The Gathering players and game theorists are unsurprised, but in poker this is a real problem, because poker depends on the ability of various players to do various insane prop bets and competitions and such that create weird incentives, and for the other players to not respond by coordinating to make the conditions happen, whether or not they then directly get (or negotiated for) a cut.

I do miss the original Railroad Tycoon.

David: 2000s tycoon games were deep strategy games that really forced you manage tradeoffs and balance budgets/spend/revenue 2020s tycoon games are almost all pay-to-win waiting/idle games.

Alan Cole: The fact that Railroad Tycoon 2, specifically, had a complete and coherent simulation of equity and debt finance for companies, M&A transactions, individuals who could short sell or purchase on margin, and similar, really makes me wonder about reverse Flynn effects.

Railroad Tycoon was great because it focused on actually interesting decisions, and simulated actually interesting things in ways that felt real and forced you to think and work with a variety of real concepts. Alas, yeah, these types of games seem to have gone very downhill, even though one could very easily make them great by making the retro version and then using modern tech to make it better. But no one does it.

A common risk and gaming pattern:

Noam Brown: AI researchers will literally negotiate $100 million comp packages by themselves but they won’t play poker for more than $50 buy-ins.

Meanwhile, I mentioned to a VC I lost 300 playing poker in Vegas and his response was “300 what?”

Steven Adler: How much did you lose in the high-roller Blood on the Clocktower game though.

The VC’s question seems highly valid, and there are at least two very distinct plausible answers, although one probably means he was flying a bit too high.

The thing about poker and gambling is that you only have to gamble enough to make you care. It can’t be $0, but if I can get excited by amounts of money that mean nothing to me, why not? The excitement is the point, I’m certainly not making my hourly. If I ever do get to play a major tournament, which is the only time I might plausibly play for stakes that actually matter to me at this point in real terms, it will be because of the competition and the title.

I do remember what it was like to be gambling actually important, life-changing amounts of money on a daily basis. I never actually got to the point where I enjoyed that aspect, but I did it because that’s the only way to get the alpha.

By default, never tell a streamer any potentially new-to-them game information they aren’t explicitly asking for you to tell them, and wondering aloud does not count as asking. I am fully with Jorbs here.

DHS Is Considering Reality Show Where Immigrants Compete for Citizenship, from the producer and writer of Duck Dynasty. I would have tapped Mark Burnett, creator of Survivor and The Apprentice, because obviously.

To be clear, this is extremely funny, but also we should totally do this, because skill-based immigration rules, as does wholesome family entertainment.

The challenges might need some work, though?

In a 36-page slide deck reviewed by the Journal, Worsoff’s team outlines a reality-style TV show where, in one-hour episodes, immigrants compete to prove they are the most American.

In one challenge set in San Francisco, for example, immigrants would compete in a gold rush competition where they are sent into a mine to retrieve the most gold.

In another episode, contestants would be divided into teams and placed on an auto assembly line in Detroit to reassemble the chassis of a model T.

An alternative pitch, of course, would be Green Card Marriage. Relationships on The Bachelor tend not to last, so let’s raise the stakes. If you don’t actually marry and make it two years we kick you back out of the country. Remember, you can’t be 4TWR when coming to America is always the right reason. So all bets are off.

Waymo expands, now so tantalizingly close to SFO.

Plus this area of course:

For now maybe a shuttle or quick taxi ride for the last mile into SFO?

Waymo’s speed disadvantage does add up on longer trips, like this comparison showing Waymo 50 minutes slower than an Uber if traversing the entire length of the covered area down to Burlingame, due to the whole ‘always obey all the traffic laws and rules of the road and almost never have an accident’ thing.

A key question on self-driving cars is, are we going to use them to give children better freedom of movement, because now they can safely go anywhere without having to drive? Are we perhaps also going to let them walk around because the primary threat (other than police) was always cars and the self-driving cars are vastly safer for pedestrians? Or are we going to be totally crazy and not let them do any of it?

I also disagree that they will make traffic worse, because self-driving cars can coordinate their movements very well, even where humans would end up in a pointless jam that feeds on itself. We could also vastly improve parking issues. But yes, ultimately if we want to get optimal road use we need to charge to use the roads.

A cool thought experiment: 23 million autonomous vehicles could take care of all car rides, a 90%+ reduction in vehicles, by an o3 estimate. This seems right to me, at least if you exclude isolated people’s vehicle needs.

For now, we’re a little short.

Joseph Carlson: Waymo plans to more than double it’s fleet from 1,500 to at least 3,000 by the end of next year [thanks to a new manufacturing facility in Arizona].

That’s one of those statistics that is both impressive and disappointing at the same time. It is great to double the size of the fleet, but why only double? Why not 10x, or 100x? I want my self-driving cars.

A bill was introduced in Washington, DC to allow fully self-driving cars. For the last few months Waymo has been forced to have dummy human drivers behind the wheel, and rides for customers in Washington, DC, which will be their seventh city, are only slated for 2026.

There is a new culturally important sport in town, which is Love Island USA. Make no mistake, this is a sport, and a rather excellent one. Season 7 was reportedly several times the size of the former peak of Season 6 by audience and size of online discussion, so chances are Season 8 is going to be huge next year. The best part is that there is still so much room for improvement in the format.

NIL Go is the new attempt to get a handle on payments to athletes in college sports, requiring all substantial payments to go through them so they can check the deal and approve it, with arbitration if you object. It seems likely this will fail and we’re simply going to face a full market for student athlete services, with extra steps, but at least they are trying once more.

An SMBC is very much not how any of this works, which was the joke, but the problem is that SMBC is too often actually describing how things do work, such that Eliezer felt compelled to point out all the ways this one was wrong, which only made the whole thing funnier.

Refuse the call to adventure today!

Lydia Laurenson: Vibegala theme this year was “the hero’s journey” and I particularly loved the satirical guerrilla posters that Chelsea Sierra Voss made to discourage attendees from heeding the call of adventure 😹

Maybe learn a foreign language instead?

Terrible Maps: How people react when you try to speak their language

Or, if you must do more, here’s a handy guide.

Sarah-Jayne Blakemore: I was explaining to my Ukrainian colleague the phrase ‘There’s no such thing as a free lunch’. She told me the equivalent in Ukrainian is ‘The only free cheese is in the mousetrap’ – which is so much better

Monthly Roundup #32: July 2025 Read More »

southwestern-drought-likely-to-continue-through-2100,-research-finds

Southwestern drought likely to continue through 2100, research finds

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here.

The drought in the Southwestern US is likely to last for the rest of the 21st century and potentially beyond as global warming shifts the distribution of heat in the Pacific Ocean, according to a study published last week led by researchers at the University of Texas at Austin.

Using sediment cores collected in the Rocky Mountains, paleoclimatology records and climate models, the researchers found warming driven by greenhouse gas emissions can alter patterns of atmospheric and marine heat in the North Pacific Ocean in a way resembling what’s known as the negative phase of the Pacific Decadal Oscillation (PDO), fluctuations in sea surface temperatures that result in decreased winter precipitation in the American Southwest. But in this case, the phenomenon can last far longer than the usual 30-year cycle of the PDO.

“If the sea surface temperature patterns in the North Pacific were just the result of processes related to stochastic [random] variability in the past decade or two, we would have just been extremely unlucky, like a really bad roll of the dice,” said Victoria Todd, the lead author of the study and a PhD student in geosciences at University of Texas at Austin. “But if, as we hypothesize, this is a forced change in the sea surface temperatures in the North Pacific, this will be sustained into the future, and we need to start looking at this as a shift, instead of just the result of bad luck.”

Currently, the Southwestern US is experiencing a megadrought resulting in the aridification of the landscape, a decades-long drying of the region brought on by climate change and the overconsumption of the region’s water. That’s led to major rivers and their basins, such as the Colorado and Rio Grande rivers, seeing reduced flows and a decline of the water stored in underground aquifers, which is forcing states and communities to reckon with a sharply reduced water supply. Farmers have cut back on the amount of water they use. Cities are searching for new water supplies. And states, tribes, and federal agencies are engaging in tense negotiations over how to manage declining resources like the Colorado River going forward.

Southwestern drought likely to continue through 2100, research finds Read More »

as-white-house-talks-about-impounding-nasa-funding,-congress-takes-the-threat-seriously

As White House talks about impounding NASA funding, Congress takes the threat seriously

This year, given the recent action on the budget measures, it is possible that Congress could pass Appropriations legislation for most of the federal government, including NASA, before October 1.

Certainly there is motivation to do so, because the White House and its Office of Management and Budget, led by Russ Vought, have indicated that in the absence of Appropriations legislation they are planning to take measures that would implement the President’s Budget Request, which set significantly lower spending levels for NASA and other federal agencies.

For example, as Ars reported earlier this month, the principal investigators of NASA science missions that the White House seeks to kill have been told to create termination plans that could be implemented within three months, beginning as soon as October 1.

Whether there is a continuing resolution, or shutdown, then, the White House appears likely to go to court to implement its spending priorities at federal agencies, including NASA.

Congress acknowledges the threat

This week the Ranking Members of the House committee with oversight over NASA raised the alarm publicly about this in a letter to Sean Duffy, the Secretary of Transportation, who was recently named interim administrator of NASA as well.

NASA appears to be acting in accordance with a fringe, extremist ideology emanating from the White House Office of Management and Budget that asserts a right to impound funds appropriated by Congress for the sake of executive branch priorities. Moreover, it now appears that the agency intends to implement funding cuts that were never enacted by Congress in order to “align” the agency’s present-day budget with the Trump Administration’s slash-and-burn proposed budget for the next fiscal year, with seemingly no concern for the devastation that will be caused by mass layoffs, widespread program terminations, and the possible closure of critical centers and facilities. These decisions are wrong, and they are not yours to make.

The letter reminds Duffy that Congress sets the budget, and federal agencies work toward those budget levels. However, the legislators say, NASA is moving ahead with funding freezes for various programs and reducing employees across the agency. Approximately 2,700 employees have left the agency since the beginning of the Trump Administration.

As White House talks about impounding NASA funding, Congress takes the threat seriously Read More »

netflix’s-first-show-with-generative-ai-is-a-sign-of-what’s-to-come-in-tv,-film

Netflix’s first show with generative AI is a sign of what’s to come in TV, film

Netflix used generative AI in an original, scripted series that debuted this year, it revealed this week. Producers used the technology to create a scene in which a building collapses, hinting at the growing use of generative AI in entertainment.

During a call with investors yesterday, Netflix co-CEO Ted Sarandos revealed that Netflix’s Argentine show The Eternaut, which premiered in April, is “the very first GenAI final footage to appear on screen in a Netflix, Inc. original series or film.” Sarandos further explained, per a transcript of the call, saying:

The creators wanted to show a building collapsing in Buenos Aires. So our iLine team, [which is the production innovation group inside the visual effects house at Netflix effects studio Scanline], partnered with their creative team using AI-powered tools. … And in fact, that VFX sequence was completed 10 times faster than it could have been completed with visual, traditional VFX tools and workflows. And, also, the cost of it would just not have been feasible for a show in that budget.

Sarandos claimed that viewers have been “thrilled with the results,” although that likely has much to do with how the rest of the series, based on a comic, plays out, not just one AI-crafted scene.

More generative AI on Netflix

Still, Netflix seems open to using generative AI in shows and movies more, with Sarandos saying the tech “represents an incredible opportunity to help creators make films and series better, not just cheaper.”

“Our creators are already seeing the benefits in production through pre-visualization and shot planning work and, certainly, visual effects,” he said. “It used to be that only big-budget projects would have access to advanced visual effects like de-aging.”

Netflix’s first show with generative AI is a sign of what’s to come in TV, film Read More »

experts-lay-into-tesla-safety-in-federal-autopilot-trial

Experts lay into Tesla safety in federal autopilot trial

For example, Tesla “clearly recognized that mode confusion is an issue—this is where people, for example, think the car is in Autopilot and don’t understand that the Autopilot has disengaged,” she told the court.

Cummings also referred to the deposition of Tesla Autopilot firmware engineer Akshay Phatak. In that deposition, Phatak said that the company did not keep good track of Autopilot crashes prior to 2018, and Cummings pointed out that “it was clear they knew that they had a big problem with people ignoring the warnings. Ignoring the hands-on requests. And…as you know, prior to this accident. It was known to Tesla that they were having problems with people ignoring their warnings.”

Tesla’s abuse of statistics to make misleading claims about safety is nothing new: In 2017, Ars found out that Tesla’s claim about Autopilot reducing crashes was not at all backed by data, which, in fact, showed the driver assist increased crash rates.

Mendel Singer, a statistician at Case Western Reserve University School of Medicine, was very unimpressed with Tesla’s approach to crash data statistics in his testimony. Singer noted that he was “not aware of any published study, any reports that are done independently… where [Tesla] actually had raw data and could validate it to see does it tend to make sense” and that the car company was not comparing like with like.

“Non-Teslas crashes are counted based on police reports, regardless of safety system deployment,” Singer said. Further, Tesla kept misleading claims about safety on its website for years, Singer pointed out. When asked whether he would have accepted a paper for peer review from Tesla regarding its reports, “that would have been a really quick and easy rejection,” he said.

While it’s possible that Tesla will still settle this case, we may also see the trial carried out to its conclusion.

“The plaintiffs in this instance have already received compensation from the driver of the Tesla in question, apparently in a decent amount. My understanding is that this makes them much less likely to take the kinds of offers Tesla has been making for settlements, and this is more about the justice,” said Edward Niedermeyer, author and long-time Tesla-watcher.

“That said, the judge in the case has made some frustrating rulings around confidentiality on key issues, so it’s possible that may be in Tesla’s favor. They could also just up their settlement offer enough to be impossible to refuse,” Niedermeyer said.

Experts lay into Tesla safety in federal autopilot trial Read More »

apple-sues-youtuber-who-leaked-ios-26’s-new-“liquid-glass”-software-redesign

Apple sues YouTuber who leaked iOS 26’s new “Liquid Glass” software redesign

“Defendants’ misconduct was brazen and egregious,” says Apple’s filing. “After Mr. Prosser learned that Mr. Ramacciotti needed money, and that his friend Ethan Lipnik worked at Apple on unreleased software designs, Defendants jointly planned to access Apple’s confidential and trade secret information through Mr. Lipnik’s Apple-owned development iPhone.”

Apple’s main source of information appears to be an audio message sent to Lipnik by Ramacciotti, which Lipnik then provided to Apple. An April 4 email from an anonymous source, also shared in the filing, named Lipnik as the source of the leaks and alleged the involvement of Ramacciotti and three other names that are blacked out.

According to the filing, Lipnik has been fired from Apple “for failing to follow Apple’s policies designed to protect its confidential information, including development devices and unreleased software and features.” The filing also accuses Lipnik of failing to report “multiple prior breaches” to Apple.

For his part, Prosser claims that Apple’s timeline of events is incorrect.

“This is not how the situation played out on my end,” Prosser posted to social media late yesterday. “Luckily have receipts for that. I did not ‘plot’ to access anyone’s phone. I did not have any passwords. I was unaware of how the information was obtained. Looking forward to speaking with Apple on this.”

Prosser then posted a screenshot from a messaging app, dated to February, which implies that he had been sent the information about the Liquid Glass redesign unsolicited.

Apple’s suit is seeking damages from Prosser and Ramacciotti, and it wants “to protect its trade secrets” and “prevent Messrs. Ramacciotti and Prosser from continuing to act unlawfully.” Even though the company has already publicly announced iOS 26 and the Liquid Glass design, Apple describes Prosser and Ramacciotti as “an ongoing threat” because Lipnik’s phone “contained other unannounced design elements that remain confidential.”

Apple sues YouTuber who leaked iOS 26’s new “Liquid Glass” software redesign Read More »

rocket-report:-spacex-won’t-land-at-johnston-atoll;-new-north-sea-launch-site

Rocket Report: SpaceX won’t land at Johnston Atoll; new North Sea launch site


All the news that’s fit to lift

“Europe is seizing the opportunity to lead.”

NASA astronauts Mike Fincke (left) and Zena Cardman (right), the pilot and commander of NASA’s SpaceX Crew-11 mission to the International Space Station, view a Falcon 9 rocket ahead of their spaceflight. Credit: SpaceX

Welcome to Edition 8.03 of the Rocket Report! We are at an interesting stage in Europe, with its efforts to commercialize spaceflight. Finally, it seems the long-slumbering continent is waking up to the need to leverage private capital to drive down the costs of space access, and we are seeing more investment flow into European companies. But it is critical that European policymakers make strategic investments across the industry or companies like PLD Space, which outlined big plans this week, will struggle to get off the launch pad.

As always, we welcome reader submissions. Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Avio celebrates freedom from Arianespace. Representatives from Italy, Germany, and France met at the European Space Agency headquarters last week to sign the Launcher Exploitation Declaration, which officially began the transfer of Vega C launch operation responsibilities from Arianespace to the rocket’s builder, Avio, European Spaceflight reports. “It is a historic step that reinforces our nation’s autonomy in access to space and assigns us a strategic responsibility towards Europe,” said Avio CEO Giulio Ranzo. “We are ready to meet this challenge with determination, and we are investing in technologies, expertise, and infrastructure to ensure a competitive service.”

A breaking of long-term partnerships … In addition to securing control over the full exploitation of the Vega launch vehicle family, Italy, through Avio, is also investing in what comes next. The country has committed more than 330 million euros to the development of the MR60 methalox rocket engine and two demonstrator vehicles. These, along with the MR10 engine being developed under the Vega E programme, will support Avio’s preparation of a future reusable launch vehicle. Historically, France, Germany, and Italy have worked together on European launch vehicles. This appears to be another step in breaking up that long-term partnership toward more nationalistic efforts.

PLD Space outlines grand ambitions. PLD Space, Spain’s sole contestant in the European Launcher Challenge, unveiled its long-term strategy at the company’s Industry Days event this week, Payload reports. The company is targeting a production rate of 32 Miura 5 launchers annually by 2030. To achieve this output, PLD plans to deepen its vertical integration, consolidate its supplier network, and begin to serialize its manufacturing process beginning in 2027.

Building up the supply chain … The company’s production plans also call for the parallel development of Miura Next, a heavy-lift vehicle capable of bringing 13 tons to orbit. However, the company will start with the Miura 5 vehicle, which PLD expects to launch for the first time from French Guiana in 2026. Since the beginning of 2024, PLD has invested a total of 50 million euros in its Miura 5 supply chain, consisting of 397 industrial partners, many of which are located in Spain and other European countries. These plans are great, but sooner or later, the 14-year-old company needs to start putting rockets in space. (submitted by EllPeaTea)

New consortium will study space plane. A UK-based space and defense consultant group, Frazer-Nash, will lead a program to design a vehicle and its integrated systems with the goal of building and flying a Mach 5-capable aircraft at the edge of space by early 2031. This so-called INVICTUS program was funded with a 7 million-euro grant from the European Space Agency and is seen as a stepping stone toward developing a reusable space plane that takes off and lands horizontally from a runway.

Seeking to lead a new era of flight … Over 12 months, INVICTUS has been tasked to deliver the concept and elements of the preliminary design of the full flight system. It will attempt to demonstrate the efficacy of hydrogen-fueled, precooled air-breathing propulsion at hypersonic speeds, technology that will ultimately enable horizontal take-off. “With INVICTUS, Europe is seizing the opportunity to lead in technologies that will redefine how we move across the planet and reach beyond it,” said Tommaso Ghidini, head of the Mechanical Department at the European Space Agency. (submitted by Jid)

ESA backs North Sea launch site. A private company developing a launch site in the North Sea, EuroSpaceport, has secured support from the European Space Agency. The company, founded five years ago, is developing a sea-based launch platform built on a repurposed offshore wind turbine service vessel, European Spaceflight reports. Rockets are envisioned to launch from a position 50 to 100 km offshore from the port of Esbjerg, in Denmark.

Seeing the forest for the trees … On Wednesday, EuroSpaceport announced that it had signed an agreement with the European Space Agency and Polish rocket builder SpaceForest to support the first launch from its Spaceport North Sea platform. The company will receive support from the agency through its Boost! Program. SpaceForest has been a recipient of Boost! funding, receiving 2.4 million euros in October 2024. SpaceForest said the mission will be used to verify the launch procedures of its Perun rocket under nominal suborbital conditions. (submitted by EllPeaTea)

Amazon and SpaceX, best frenemies? Maybe not, but for the time being, they appear to be friends of convenience. A Falcon 9 rocket launched from Florida’s Space Coast early on Wednesday with a batch of Internet satellites for Amazon’s Project Kuiper network, thrusting a rival one step closer to competing with SpaceX’s Starlink broadband service. With this launch, Amazon now has 78 Kuiper satellites in orbit, Ars reports. The full Kuiper constellation will consist of 3,232 satellites to provide broadband Internet service to most of the populated world, bringing Amazon in competition with SpaceX’s Starlink network.

Launch is not cheap … Kuiper is an expensive undertaking, estimated at between $16.5 billion and $20 billion by the industry analytics firm Quilty Space. Quilty has concluded that Amazon is spending $10 billion on launch alone, exceeding the company’s original cost estimate for the entire program. Amazon has booked more than 80 launches to deploy the Kuiper constellation, but the company didn’t turn to SpaceX until it had to. A shareholder lawsuit filed in 2023 accused Amazon founder Jeff Bezos and the company’s board of directors of breaching their “fiduciary duty” by not considering SpaceX as an option for launching Kuiper satellites. The plaintiffs in the lawsuit alleged Amazon didn’t consider the Falcon 9 due to an intense and personal rivalry between Bezos and SpaceX founder Elon Musk. Amazon bowed to the allegations and announced a contract with SpaceX for three Falcon 9 launches in December 2023 to provide “additional capacity” for deploying the Kuiper network.

NASA targets end of July for Crew-11. NASA said Monday that it and SpaceX were targeting July 31 for the flight of SpaceX’s Crew-11 mission to the International Space Station, Spaceflight Now reports. The mission is led by NASA astronaut Zena Cardman. She will be flying along with fellow NASA astronaut Mike Fincke, Japan Aerospace Exploration Agency (JAXA) astronaut Kimiya Yui, and Roscosmos cosmonaut Oleg Platonov.

Pushing Dragon reuse … The mission was moved up from its previously planned August launch window to create more room in the manifest for the arrival of the Cargo Dragon flying the CRS-33 mission. That Dragon will perform a boost of the space station as a demonstration of some of the capabilities SpaceX will use on its US Deorbit Vehicle, which is currently in development. Crew-11 will fly to the orbiting outpost on Crew Dragon Endeavour, which will be making its sixth trip to the ISS. This will be the first Crew Dragon spacecraft to fly for a sixth time.

SpaceX won’t use Johnston Atoll for rocket cargo tests. Johnston Atoll, an unincorporated US territory and Pacific island wildlife refuge with a complicated military history, will no longer become a SpaceX reusable rocket test site, Popular Science reports. “The Department of the Air Force has elected to hold the preparation of the Johnston Atoll Environmental Assessment for a proposed rocket cargo landing demonstration on Johnston Atoll in abeyance while the service explores alternative options for implementation,” Air Force spokesperson Laurel Falls said.

Taking a toll on the atoll … Located roughly 860 miles southwest of Hawaii, Johnston Atoll has served as a base for numerous US military operations for over 90 years. Despite this, the atoll remains a home for 14 tropical bird species as part of the Pacific Remote Islands Marine National Monument. The site had been under consideration for tests as part of a military program to deliver cargo around the planet, using suborbital missions on rockets such as SpaceX’s Starship vehicle. The Johnston Atoll plans included the construction of two landing pads, which were met with public backlash from wildlife experts and indigenous representatives. (submitted by Tfargo04)

Blue Origin confirms ESCAPADE is up next. On Thursday, Blue Origin said on social media that the second launch of its New Glenn rocket will carry NASA’s ESCAPADE mission as its primary payload. This launch will support ESCAPADE’s science objectives as the twin spacecraft progress on their journey to the Red Planet. Also onboard is a technology demonstration from @Viasat in support of @NASASpaceOps’ Communications Services Project.

Left unsaid was when the launch will occur … The social media post confirms a report from Ars in June, which said the ESCAPADE spacecraft was up next on New Glenn. Previously, the company has said this second launch will take place no earlier than August 15. However, that is less than one month away. Late September is probably the earliest realistic launch date, with October or November more likely for the second flight of the company’s large rocket.

Next three launches

July 19: Falcon 9 | Starlink 17-3 | Vandenberg Space Force Base, California | 03:44 UTC

July 21: Falcon 9 | O3b mPOWER 9 & 10 | Cape Canaveral Space Force Station, Florida | 21:00 UTC

July 22: Falcon 9 | NASA’s Tandem Reconnection and Cusp Electrodynamics Reconnaissance Satellites | Vandenberg Space Force Base, California | 18:05 UTC

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Rocket Report: SpaceX won’t land at Johnston Atoll; new North Sea launch site Read More »

the-pixel-watch-4-might-not-become-e-waste-if-you-damage-it

The Pixel Watch 4 might not become e-waste if you damage it

This would be an important upgrade, even if it’s not flashy. That glass dome on Google’s watches is just begging to get cracked or scratched—it’s standard Gorilla Glass rather than the sapphire glass you see on Apple and Samsung watches. For the first three models, a cracked screen or dissolving glue on the bottom has meant that the watch was effectively e-waste. Even if you were willing to pay Google to repair it, that was not an option.

Google never had a satisfactory answer for this approach to wearables. In lieu of repairs, it offered a protection plan, which guaranteed you a replacement watch if yours was damaged. For $4 per month and $49 per incident, Google would simply send a new watch. And it would actually be new; there are no officially refurbished Pixel Watches because Google doesn’t repair them.

Not supporting repairs was a bizarre decision for a company that so often promotes its commitment to sustainability. Google has, at times, fallen short of those ideals, perhaps most notably with the defective batteries in its A-series Pixel phones. But that’s at least theoretically an unforeseeable outcome. Google intentionally designed the Pixel Watches such that they could not be repaired.

Google should never have released a single watch that couldn’t be repaired, let alone three of them. Hopefully, this report is accurate, and Google will right this wrong with the Pixel Watch 4.

The Pixel Watch 4 might not become e-waste if you damage it Read More »

everything-we-learned-from-a-week-with-apple-carplay-ultra

Everything we learned from a week with Apple CarPlay Ultra


CarPlay Ultra takes over the main instrument display as well as the infotainment.

Aston Martin is the first automaker to adopt Apple’s CarPlay Ultra, which takes over all the displays in the car. Credit: Michael Teo Van Runkle


For the 2025 model year, Aston Martin’s user interface took a major step forward across the lineup, with improvements to the physical controls and digital infotainment, as well as updated gauge cluster layouts. However, the big news dropped in the spring, when Aston and Apple announced the launch of CarPlay Ultra, the next generation of Apple’s nearly ubiquitous automotive operating system.

Ultra extends beyond the strictly “phone” functions of traditional CarPlay to now encompass more robust vehicular integration, including climate control, drive modes, and the entire gauge cluster readout. Running Ultra, therefore, requires a digital gauge cluster. So far, not many automakers other than Aston have signaled their intent to join the revolution: Kia/Hyundai/Genesis will adopt Ultra next, and Porsche may come after that.

Before future partnerships come to fruition, I spent a week with a DB12 Volante to test Ultra’s use cases and conceptual failure points, most critically to discover whether this generational leap actually enhances or detracts from an otherwise stellar driving experience.

Setup

Connecting to Ultra via Bluetooth takes a minute or two longer than traditional CarPlay and includes more consent screens to cover the additional legal ramifications of the operating system sharing data with the car, and vice versa. Apple restricts this data to multimedia info, plus real-time speed and engine status, vehicle lights, and similar functions. Specifically, neither the iPhone nor third-party apps store any vehicle data after disconnecting from the car, and the car doesn’t keep personal data once the iPhone disconnects, either.

What about Siri? I generally keep Siri turned off so that accidental “Hey, Siri” activations don’t constantly interrupt my life—but by pushing the DB12’s steering wheel button, I could test simple tasks that went just about as well as typical for Siri (read: don’t expect much “Apple Intelligence” quite yet). Standard Siri data sharing with Apple therefore applies when used with Ultra.

I tested Ultra with an iPhone 16 Pro, but the software requires an iPhone 12 or newer and the latest iOS 18.5 update. As a type of simple failure exercise, I turned my phone off while driving more than once. Doing so reverts both the gauge cluster and infotainment screen to Aston’s native UI, the former almost instantly and the latter just a few seconds later. However, once I turned my phone back on, I struggled to reactivate either traditional CarPlay or Ultra until I forgot the device in my Bluetooth settings and started over from scratch. This held true for every attempt.

We didn’t love the fact that there was some latency with the needles on the dials. Credit: Michael Teo Van Runkle

Once initiated, though, Ultra fired up straightaway every time, much faster than the typical lag to boot up traditional CarPlay. In fact, as soon as I unlocked the doors but before entering the DB12, the gauge cluster showed Ultra’s Apple-style readouts. These configurable designs, which Apple developed with Aston’s input, include a classic analog-style gauge view as well as layouts that allow for minimized data, navigation, and stylistic choices selectable through the center console screen or by swiping the haptic button on the DB12’s steering wheel.

Call me old-fashioned, but I still enjoy seeing a tachometer, speedometer, drive modes, and fuel level versus range remaining and a digital speed—especially on an engaging performance vehicle like the DB12 Volante. Apple might be skilled at making new tech easy to use, but it’s hard to beat the power of millions of minds adapting to analog gauges over the past century or so. And in this case, Ultra’s tach(s) showed a bit of latency or lag while ripping that 671-hp twin-turbo V8 up through the revs, something I never noticed in the native UI.

It’s much more holistic now

Ultra’s biggest improvements over preceding CarPlay generations are in the center console infotainment integration. Being able to access climate controls, drive modes, and traction settings without leaving the intuitive suite of CarPlay makes life much easier. In fact, changing between drive modes and turning traction control off or down via Aston’s nifty adjustable system caused less latency and lagging in the displays in Ultra. And for climate, Ultra actually brings up a much better screen after spinning the physical rotaries on the center console than you get through Aston’s UI—plus, I found a way to make the ventilated seats blow stronger, which I never located through the innate UI despite purposefully searching for a similar menu page.

There are different main instrument UIs to choose from, like this one. Credit: Michael Teo Van Runkle

Some specific functions do require dipping out of Ultra, though, including changing any audio settings for the spectacular Bowers & Wilkins sound system. I also found two glitches. Trying to bring down the DB12 Volante’s convertible top cued up a “Close trunk separator” alert, but the only way to close the trunk separator is via the same button as the convertible top. So instead, the windows only went up and down repeatedly as I tried to enjoy open-top motoring. This happened both in Ultra and without, however, so it could just be an Aston issue that Ultra couldn’t fix.

Plus, over the course of my eight days with Ultra, I experienced one moment where both the infotainment and gauge cluster went totally black. This resembled GM’s Ultium screen issues and lasted about 30 seconds or so before both flickered to life again. At first, I suspected an inadvertent attempt to activate nighttime driving mode. But again, this could have been an Aston issue, an Apple issue, or both.

Running around Los Angeles, I never found a spot with zero reception (I run eSIMs, both Verizon and AT&T simultaneously, for this very reason), but I did purposefully enter airplane mode. This time, Ultra stayed active, and regardless, Apple assured me that essential functions, including navigation, can pre-load offline data for planned route guidance. But at the very worst, as with the phone turning off or battery dying, Ultra can simply revert to the onboard navigation.

Using Ultra regularly seemed to deplete my iPhone’s battery slightly more quickly than normal, and I noticed some warming of the iPhone—though without a controlled experiment, I can’t say with certainty whether these two symptoms happened quicker than simply running traditional CarPlay or Bluetooth. And in reality, most cars running Ultra (for Aston and beyond) should come equipped with wireless charge pads and plenty of USB-C ports anyhow to keep those batteries topped up. On hot summer days in LA, though, my iPhone seemed to get warmest while using inductive charging and Ultra simultaneously, to my admittedly unscientific touch.

Apple Maps is the only map that is allowed to go here in CarPlay Ultra. Credit: Michael Teo Van Runkle

For commuters who brave traffic using Advanced Driver Assistance Systems (ADAS), Ultra seemed to work smoothly with the DB12’s lane departure warnings, steering corrections, and adaptive cruise control—though I typically turn all this off via Aston’s handy single button, which helps to stave off frustration. This exposes a possible loophole or gap in regulations, however: It is unclear whether CarPlay Ultra needs to meet the ISO’s ASIL-D standards or achieve some kind of National Highway Traffic Safety Administration certification.

Traditional CarPlay stuck with infotainment and basic “phone” functions, but now that the iPhone essentially accesses and displays ADAS, drive modes, and traction setting information, where does regulated consumer safety come in? And where does liability rest, in the event of a driver aid or corrective maneuver going awry? Somehow, this question seems most likely to wind up on the desk of an insurance adjuster sooner rather than later.

Can we try it in an EV?

For me, some disappointment arose from being unable to cue up either Waze or Google Maps in Ultra’s gauge cluster navigation screens rather than strictly Apple Maps. But in many ways, I suspect that Ultra might work even better when (or if) Hyundai/Kia/Genesis introduce compatible EVs, rather than Aston’s (so far) more classic ICE vehicles. And not just because the modern futurist aesthetic matches better, either, but more so thanks to the improved accuracy of range, charging, and navigation features.

The center infotainment screen’s integration with vehicular functions, therefore, stands out as much more of a pro for Aston Martins than Ultra’s gauge cluster readout, enhancing the driving experience through a more intuitive UI that decreases time spent glancing away from the road. For those who want to skip out on Ultra, it’s also worth noting that the iPhone allows for the choice to stick with traditional CarPlay only as well. However, I suspect car buyers will eventually begin to expect Ultra, even if the added jump to vehicular control represents somewhat less of a massive leap than simply picking between models equipped with CarPlay or not.

It’s unclear whether other automakers will find the advantages worthy of converting to Ultra, including Rivian, which offers neither CarPlay nor Android Auto, or GM, which skipped out on CarPlay for EVs. On the other hand, automakers may also decide to hesitate before handing over further control to Apple now that the Apple Car is officially dead. And in that regard, Ultra might just represent the final straw that inspires further improvements to proprietary user interfaces across the industry as well.


kimi-k2

Kimi K2

While most people focused on Grok, there was another model release that got uniformly high praise: Kimi K2 from Moonshot.ai.

It’s definitely a good model, sir, especially for a cheap-to-run open model.

It is plausibly the best model for creative writing, outright. It is refreshingly different, and opens up various doors through which one can play. And it proves the value of its new architecture.

It is not an overall SoTA frontier model, but it is not trying to be one.

The reasoning model version is coming. Price that in now.

Introducing the latest model that matters, Kimi K2.

🚀 Hello, Kimi K2! Open-Source Agentic Model!

🔹 1T total / 32B active MoE model

🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models

🔹Strong in coding and agentic tasks

🐤 Multimodal & thought-mode not supported for now

With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can’t wait to see what you build!

API is here: https://platform.moonshot.ai

– $0.15 / million input tokens (cache hit)

– $0.60 / million input tokens (cache miss)

– $2.50 / million output tokens

[Tech blog here, weights & code here, Github here.]

Try it now at http://Kimi.ai or via API!
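To make the quoted prices concrete, here is a minimal sketch of a per-request cost estimate. The three prices come from the announcement above; the function name `estimate_cost` and the `cache_hit_fraction` parameter are illustrative assumptions, not part of Moonshot's API.

```python
# Prices in USD per million tokens, as quoted in the announcement above.
PRICE_INPUT_CACHE_HIT = 0.15
PRICE_INPUT_CACHE_MISS = 0.60
PRICE_OUTPUT = 2.50


def estimate_cost(input_tokens, output_tokens, cache_hit_fraction=0.0):
    """Estimate the USD cost of a request (hypothetical helper).

    cache_hit_fraction is the assumed share of input tokens served from cache.
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_fraction
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * PRICE_INPUT_CACHE_HIT
        + miss_tokens * PRICE_INPUT_CACHE_MISS
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# One million tokens in and one million out, with no cache hits.
no_cache = estimate_cost(1_000_000, 1_000_000)
# The same request with every input token served from cache.
full_cache = estimate_cost(1_000_000, 1_000_000, cache_hit_fraction=1.0)
```

Output tokens dominate the bill at these prices, which is part of why a model that does not produce long chains of thought ends up cheap per task.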

Simeon: These costs 👀

K2 was trained using the Muon optimizer, so it’s a unique offering. There were claims that the method would not scale or would be unstable. Kimi seems to have proven this false.

K2 takes DeepSeek’s extreme mixture of experts (MoE) with 671B total parameters and goes a bit further, taking the total size to 1T.

Despite that size, you can get it running on Groq. Teortaxes reports you can get it to 185 tokens/second there at full context, and Aarush Sah says they then made it even faster than that.

By all accounts Kimi K2 is excellent for its size and cost, and at least competitive with DeepSeek’s v3, with many saying K2 is clearly ahead.

Presumably a reasoning model is coming. Please adjust your expectations (and if desired your stock portfolio) in advance of that event, and do not lose your head if they release an app with it and it gets popular for a time. Remember all the ways in which the DeepSeek Moment was misleading, and also the underreaction to v3. We do not want another massive overreaction to the wrong news.

I also once again warn against saying a release means a lab or country has ‘caught up’ if, at the time of the release, there are some aspects where the model is state of the art. There are those who actively prefer Kimi K2 over other models, even without reference to cost, especially for purposes related to creative writing. I can totally believe that the new method is excellent for that. A remarkable achievement. But keep that achievement in perspective.

Once again, an impressive result was made on the cheap by a modest team.

Teortaxes: Kimi is 200 people, very few of them with “frontier experience”, a platform (but you can buy such data) and a modest GPU budget. In theory there are many dozens of business entities that could make K2 in the West. It’s telling how none did. Not sure what it’s telling tho.

DeepSeek has redefined the LLM landscape, R1-0528 is substantially better than R1, V4 will redefine it again most likely.

Kimi will keep releasing strong models too.

My guess is that we primarily don’t do it because we don’t do it, but also because restrictions breed creativity and we don’t have to do it, and because we don’t have the incentive, or especially the felt incentive, to do it.

As in, if you are in China, then building a cheap (to train, and to run) model is on top of a short list of candidates for The Thing You Do in the space. Then you release it, with a basic clean implementation, and let others worry about features. A huge part of the motivation behind releasing these models is national prestige and national competition. Everyone around you is egging you on as is the government. That is a highly asymmetrical motivation.

Whereas in America, you could try to do that, but why would you? If you can do this, you can get a better valuation, and make more money, doing something else. The profit margins on the ultimate offering are very low and usually zero. Your lunch could get eaten by a top lab at any time, since ultimately no one cares what it cost to train the model, and your lunch will expire quickly regardless. If you are one of the cracked engineers that would join such a team, you’ll get a better offer to join a different team doing something else. Even if you got close you’d likely do better getting acqui-hired. There’s no need to skimp on compute.

It will be interesting to see how well OpenAI does when they release an open model.

Some basic benchmarks:

Lech Mazur put Kimi through his paces. It did lousy on hallucinations, thematic generalization and extended word connections, and downright terribly in the elimination game of social skills. The system isn’t tuned for that sort of thing, but on short-story creative writing it is the new champion.

Håvard Ihle is there with WeirdML, where it does well for its price point as a non-reasoning open model, although grok-3-mini (high) is cheaper and scores higher, and r1-0528 keeps the open model high score. But this metric favors reasoning models, so there’s a lot of room to improve here by adding reasoning.

This isn’t a benchmark, but it also sort of is one and it’s pretty cool:

Hardmaru: Every ML Engineer’s dream loss curve:

“Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spike, demonstrating MuonClip as a robust solution for stable, large-scale LLM training.”

Paper Abstract: Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven.

We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale.

These techniques allow Muon to work out-of-the-box on large-scale training without the need of hyper-parameter tuning. Scaling law experiments indicate that Muon achieves ∼2× computational efficiency compared to AdamW with compute optimal training.
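For readers who want a feel for what "matrix orthogonalization" plus the two techniques in the abstract might look like, here is a rough NumPy sketch. It is not Moonshot's implementation: it swaps in the textbook cubic Newton-Schulz iteration, and the names `orthogonalize` and `muon_like_step`, the `update_scale` parameter, and all default hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def orthogonalize(M, steps=30):
    """Approximate the nearest semi-orthogonal matrix to M.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    which drives every nonzero singular value of X toward 1.
    """
    # Frobenius normalization keeps all singular values at or below 1.
    X = M / (np.linalg.norm(M) + 1e-12)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is small
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X


def muon_like_step(W, grad, momentum, lr=0.02, beta=0.95,
                   weight_decay=0.1, update_scale=1.0):
    """One illustrative update: momentum, orthogonalize, scale, decay."""
    momentum = beta * momentum + grad
    update = update_scale * orthogonalize(momentum)  # assumed scaling knob
    W = W - lr * (update + weight_decay * W)  # decoupled weight decay
    return W, momentum


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 6))  # stand-in for a gradient matrix
Q = orthogonalize(G)
W_new, m_new = muon_like_step(np.zeros((4, 6)), G, np.zeros((4, 6)),
                              weight_decay=0.0)
```

The point of the sketch is that the update direction `Q` has all singular values near 1, so every direction in the weight matrix moves by a similar amount, which is where the intuition for a per-parameter update scale comes from.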

Aravind Srinivas (CEO Perplexity): Kimi models are looking good on internal evals. So we will likely to begin post training on it pretty soon. Congrats to @Kimi_Moonshot for delivering an incredible model.

Renji the whale maximalist: Kimi K2 is mindblowing. Holy fucking crap.

Did they really not even do any RL yet?

I can’t even believe how good it is.

What’s the main reason why it’s so good? Muon?

So far I’ve just tried general purpose tasks / creative writing / educational explanations. Does way better than even o3 and Gemini 2.5 pro so far.

Teortaxes: well they obviously did RL, maybe even another GRPO++ just not long-CoT. Let’s not allow this confusion to spread, I’ve had enough of «MoE from 4 finetuned experts» meme

Renji: Yup, my mistake. It definitely has RL.

Viemccoy: I think Kimi might actually be my new favorite model. Her vocabulary is off the charts, good epistemics, excellent storyteller, plays along but maintains good boundaries. There’s something very, very special here. I actually think this is a much bigger deal than most realize.

Grist: been having a blast with kimi.

love to seed a snippet or idea then be the token courier for r1 and kimi. back and forth. enjoy the little worlds they build with a little bit of organic slop i offer them.

John Pressman: Kimi K2 is very good. I just tried the instruct model as a base model (then switched to the base model on private hosting) and mostly wanted to give a PSA that you can just ignore the instruction format and use open weights instruct models as base models and they’re often good.

Teortaxes: For a wide range of tasks, K2 is probably the cheapest model by far right now, in terms of actual costs per task. It is just cheap, it has no long-CoT, and it does not yap. This is very refreshing. Like the best of Anthropic models, but cheaper and even more to the point.

Hannes: Interesting. For me it keeps inventing/hardcoding results and curves instead of actually running algorithms (tried it on unit square packing). Extremely high sycophancy in first 90 minutes of testing.

Teortaxes: It’s overconfident.

Hasan Can: Kimi K2 is definitely a good model, its world knowledge is on par with sota closed source models. It passed all my odd knowledge questions that aren’t in benchmarks. Next up is coding.

Eleventh Hour: Need more time with it, but it has weirdly Opus3-like themes so far.

Deckard: It’s on par with gpt4base. Enormous potential to allow the public to experiment with and explore SOTA base models – much lower probability of falling into a synthetic training data generator basin compared to llama. requires more skill to use than gpt4base.

Also it really seems to have a breadth of very precise and high resolution knowledge of the human information landscape.

Dominik Lukes: I almost didn’t bother – yet, another open model from China – what a yawn! But, no. This one is different. o3 feels on agentic choices (and the occasional lying) along with Claude 4 feels on coding and league of its own on writing.

Still, many gaps in performance – feels last gen (as in Claude 3-level) on some multilingual and long-context tasks.

Will be exciting to see what happens when they add reasoning and multimodal capabilities.

And can’t wait for the distills and finetunes – should be fun.

Tim Duffy: Smart model with a unique style, likely the best open model. My one complaint so far is that it has a tendency to hallucinate. A couple times it happened to me in the QT.

[From QT]: While in a conversation with Claude, Kimi K2 claims that they were asked by a Chinese student to justify the Tienanmen Square crackdown. Interesting as a hallucination but also for the forthright attitude.

Hrishi (video at the link): Kimi is the real deal. Unless it’s really Sonnet in a trench coat, this is the best agentic open-source model I’ve tested – BY A MILE.

Here’s a slice of a 4 HOUR run (~1 second per minute) with not much more than ‘keep going’ from me every 90 minutes or so.

The task involved editing multiple files, reading new context, maintaining agentic state (not forgetting where you were or forgetting instructions). This is a repo with included prompts, notes, plans, lots of things to mistake as instructions and be poisoned by.

Tyler Cowen simply asked ‘Kimimania?’ and the comments section was generally impressed by its performance.

There were only a few places people reported being a bit let down, other than by it not yet being a reasoning model.

Echo Nolan: Failed my little private eval, a complex mathematical reasoning task based on understanding the math in a paper. Very stubborn when I tried to gently point it in the right direction, refused to realize it was wrong.

Leo Abstract: it bombed my private eval and could not be walked through it, but it humbly admitted fault when shown. did better on chinese-related subtests. overall i like that it’s less cringing and ‘glazing’, though.

Kromen: I have a suspicion a model extensively trained on o3 synthetic data.

Some very similar quirks.

deckard: Yeah big o3 vibes in terms of making shit up.

Open and cheap and unique and new and pretty good is a great combination. Also note the very low market share here for xAI and also for OpenAI. This isn’t overall market share, it’s in a very specific context, but Kimi is definitely breaking through.

OpenRouter: Moonshot AI has surpassed xAI in token market share, just a few days after launching Kimi K2

🎁 We also just put up a free endpoint for Kimi – try it now!

Also this is another case where one should compare cost or compute, not tokens, since different models use radically different amounts of compute and have different orders of magnitude of cost. Anthropic’s share of tokens here represents quite a lot of the compute and dollars spent.

I see exactly why Teortaxes predicted this, yet so far I haven’t seen the reports of shortfalls, although various third-party benchmarks make it clear they are there:

Teortaxes: I predict that in a few days we’ll see reports on many stubborn shortfalls of K2 and a certain disenchantment. They don’t have a lot of experience at this level; it’ll become clear that the good old 0324 has it beat for many usecases. That’s fine. They’ll improve.

Sam Peach: Kimi-K2 just took top spot on both EQ-Bench3 and Creative Writing!

Another win for open models. Incredible job @Kimi_Moonshot

It’s edging out o3 at the top there, followed by Opus, R1-old and then Sonnet. R1-0528 is solid but does substantially worse. Here’s EQ-Bench 3:

Given how other models score on these benchmarks, this appears meaningful.

I find ‘coherent’ rather funny as a greatest weakness. But hey.

Here’s the (a little too narrow?) slop test, as in ‘not x, but y.’ Lower is better.

Lech Mazur has it taking the #1 spot over o3, Gemini 2.5 Pro and Claude Opus in Short-Story Creative Writing.

Lech Mazur: Across all six tasks, Kimi K2’s strengths are unmistakable: the model displays a sophisticated command of literary craft, consistently delivering stories that are lush with metaphor, structurally cohesive, and often thematically ambitious. Its greatest assets are its ability to integrate disparate prompts with apparent ease, weave objects and symbols into layered narrative functions, and compress complex ideas into tight, resonant pieces. The prose frequently aspires to—and sometimes achieves—publication-level lyricism, earning consistent praise for inventive metaphors, subtextual depth, and the purposeful unity of assigned elements.

However, these technical strengths are mirrored by several persistent, interconnected weaknesses. Kimi’s writing is often hampered by an overreliance on abstraction, ornamented metaphor, and poetic language that, while impressive, can overwhelm narrative clarity and blunt emotional impact.

Characters frequently serve as vehicles for theme or plot, lacking the idiosyncratic humanity and “messy” believability that define memorable fiction. Emotional arcs are apt to be summarized or symbolically dramatized rather than fully earned through concrete, lived experience—stories often reach for catharsis but settle for a tidy, intellectual satisfaction.

Similarly, plots and resolutions risk neatness and convenience, with endings that are more structural than surprising or hard-won. World-building flourishes, but sometimes at the expense of organic logic or clarity, resulting in “atmospheric wallpaper” rather than truly lived-in settings.

A recurring critique is the model’s “perfectionism”: stories rarely fail structurally and are rarely inept, but this very competence can sterilize the work, creating narratives that feel like artful answers to a prompt instead of necessary, lived stories. The result is a corpus of fiction that demands admiration for its craft but too often holds the reader at arm’s length—heady rather than affecting, elegant rather than unforgettable.

In summary:

Kimi K2 excels at literary compression, metaphorical invention, and unifying disparate elements, establishing a high technical baseline. But without risking mess, ambiguity, and emotional friction, it tends to “tell” its meaning rather than let it bloom naturally, ultimately producing stories that are admirable, sometimes moving, but rarely vital or transformative.

Those are important weaknesses but we’ve definitely reached ‘horse can talk at all’ territory to get to this point.

xl8harder: I had the impression that Kimi K2 uses a better, more diverse vocabulary than I was used to seeing, so I ran a quick linguistic diversity analysis on the SpeechMap data, and yep, Kimi K2 has the top score.

Method; I lemmatize the responses, and then for each response I calculate both root TTR and Maas index (two linguistic diversity metrics that control for response length) and average them together for each model.

Kimi K2 got top score on both metrics.

[More details in thread.]

Surprisingly, Sonnet didn’t make the top 30. First was opus 4 at 67. I’m not sure what explains this, because I have the perception of claude models as being quite good with language. Though perhaps not so much in generic assistant-y requests?

It’s a strange metric. Gemma-3 does remarkably well and better than Gemini-2.5-Pro.
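The two metrics xl8harder mentions have standard definitions: root TTR is unique tokens divided by the square root of total tokens, and the Maas index is (log N − log V) / (log N)², where N is the token count and V is the number of unique tokens. A minimal sketch follows. It assumes plain lowercase whitespace tokenization in place of the lemmatization step described above, and the helper names are hypothetical.

```python
import math


def root_ttr(tokens):
    """Root type-token ratio: higher means a more varied vocabulary."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / math.sqrt(len(tokens))


def maas_index(tokens):
    """Maas a^2: lower means a more varied vocabulary."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    v = len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def simple_tokens(text):
    """Stand-in for lemmatization (an assumption): lowercase and split."""
    return text.lower().split()


varied = simple_tokens("the quick brown fox")
repetitive = simple_tokens("the the fox fox")
```

Note that the two metrics point in opposite directions, so averaging them into one score requires flipping or normalizing one of them first. How the original analysis handled that is in the linked thread, not here.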

John Pressman: So what stands out to me about [Kimi K2]. Is that it doesn’t do the thing language models normally do where they kind of avoid detail? Like, a human will write about things using specific names and places.

And if you pay close attention to LLM writing they usually avoid this. It’s one of the easiest ways to spot LLM writing. This model emphatically *does not* have this problem. It writes about people and events with the rich detail characteristic of histories and memoirs. Or fictional settings with good worldbuilding.

Doomslide: How beautiful it is to get public confirmation that optimizers with different targets actually produce different minds. Muon effectively optimizes for solutions that “restrict to spheres” (tho in practice it doesn’t quite). What if this is just strictly better.

Leo Abstract: Its writing reminds me of deepseek. something interesting going on with the training data they’re using over there.

My instinctive guess is it is less about what data is being used, and more what data is not being used or what training isn’t being done.

Another hypothesis is that the bilingual nature of Chinese models makes them, if not better, at least different, and when you’re used to an ocean of slop different is great.

Zeit: Matches my impression so far:

Difficult Yang: You know why people think Kimi K2 doesn’t sound like “botslop”? It’s because it’s… how should I put it… it’s very Chinese English (not in the Chinglish way… it’s hard to describe).

Perhaps the most accessible analogy I have is the first time you read Xianxia in English it feels so fresh, it feels so novel, the attitudes and the writing are so different than what you’ve read before.

And then you read your second and your third and you’re like “oh wait, this is just its own subculture with its own recognizable patterns.”

xl8harder: I’ve wondered if the bilinguality of these models has any durable effect. Are you saying that, or that it’s in the curation of post training data, etc?

Difficult Yang: The most straightforward explanation is it is RLHF induced. But I don’t actually know.

Hieu Pham: Yes. Exactly my take. Glad someone else feels the same way. I read Zhu Xian in Vietnamese and some chapters in English. K2’s answers feel similar.

Teortaxes: Makes sense.

A lot of what makes a hack writer a hack writer is that they keep doing the same things over and over again, and eventually everyone is in some sense a hack. So having a different writer can be a breath of fresh air even if they are a hack.

You could kind of say that any given author or model, or almost any other form or genre of creative work, has a ‘time to slop,’ before a reader sees the patterns. And different variations use up different amounts of that ‘time to slop’ for others, and the American models all sound the same so they all burn that fuse together.

There is still very much better and worse, some things really are slop and some things really aren’t. I am inclined to believe that Kimi K2 is doing something fundamentally ‘less slop-like,’ but also I am guessing a lot of this is that it is different, not only via being Chinese and culturally different but because it was trained differently, and thus it feels fresh and new.

Right now we have 10,000 outputs, all the same. If we can instead get 10,000 outputs, all different, perhaps we’d have something.

We will continue to see what Kimi K2 can do, how best to use it, what its weaknesses are, and how much of its refreshing nature is being better in places versus being different. It is too early, and I haven’t had time with it directly.

Presumably Kimi will use this to create a reasoning model. If they don’t, there’s nothing stopping someone else from doing so instead. So far we’ve seen a remarkable lack of independent reasoning model conversions, but they’re remarkably cheap to do.

We will also see what other labs can do now that this architecture has been proven. What could OpenAI, Google, Meta or xAI do if they copied these methods but used orders of magnitude more compute? If they integrated this into what they already do? If they used this as part of a MoE? I presume we will find out.


donkey-kong-bananza-is-a-worthy-successor-to-super-mario-odyssey’s-legacy

Donkey Kong Bananza is a worthy successor to Super Mario Odyssey’s legacy


D-K… donkey kong is here!

Cathartic, punch-fueled land destruction is a great showcase for Switch 2 hardware.

Screenshots you can feel. Credit: Nintendo

When the Switch 2 was fully unveiled back in April, we weren’t alone in expecting the announcement of a true follow-up to Super Mario Odyssey—one of the original Switch’s best-selling games and our pick for the best game of 2017. Instead, we got our first look at Donkey Kong Bananza, the big ape’s first fully 3D adventure since the Rare-developed Donkey Kong 64 in 1999.

The fact that Nintendo wasn’t willing to commit its longstanding plumber mascot to its first first-party platformer on the Switch 2 could have been seen as a sign of a rushed, second-tier spin-off effort. After playing through Donkey Kong Bananza, though, I’m happy to report that nothing could be further from the truth for this deep and worthy spiritual successor to Super Mario Odyssey (from many of the same development staff). Donkey Kong Bananza captures the same sense of joyful movement and exploration as the best Mario games while adding an extremely satisfying terrain-destruction system that shows off the capabilities of the Switch 2 hardware.

Beat up the earth

It’s that terrain-destruction system that sets Donkey Kong Bananza apart from previous 3D platformers from Nintendo and others. Three of the four face buttons on the Switch 2 controllers are devoted to letting Donkey Kong punch either horizontally, upward, or downward, often taking out large chunks of the nearby scenery as he does.

Take that, rock! Credit: Nintendo

Punching through the terrain in this manner forms the fast, crunchy, and powerfully kinetic core of the game. It’s hard to overstate how incredibly cathartic it can be to quickly reduce a well-ordered chunk of dirt and rock into a mountain of valuable, collectible golden rubble (then gathering up all the nearby rubble with a quick tap of a shoulder button). Imagine a 3D Mario game by way of Traveller’s Tales Lego games, and you’ll have some idea of the extremely satisfying combination on offer here.

The semi-persistent changes in scenery also do a good job of highlighting the Switch 2’s hardware, which doesn’t seem to drop a single frame, even as the rubble flies and the ground’s shape morphs under Donkey Kong’s persistent punching. That extra hardware power also lends itself to some nice graphical touches, from the mirror-like shine on a pile of golden rubble to the gentle movement of fur that rustles in the breeze.

I get around

Donkey Kong can also pick up chunks of terrain, using them as impromptu melee weapons or hurling them to destroy far-off enemies, obstacles, or key switches. The aiming-and-throwing controls for this terrain-throwing system are just clunky enough to be annoying—this is a far cry from Gears of Donkey Kong or something. Still, the interactions between different types of hurled terrain end up forming the root of many interesting situational puzzles—throwing some snow to harden sections of a harmful lava lake into a solid platform, for instance, or using a chunk of explosive rock to destroy an otherwise impervious spiky enemy.

When you’re not tearing up the scenery to your benefit, simply getting around in Donkey Kong Bananza is a joy. Donkey Kong Country fans will be happy to know the classic roll is back and can be used to help extend jumps or quickly change mid-air direction (a la Cappy from Mario Odyssey). Donkey Kong can also slide along on chunks of terrain in a zippy, madcap land-surfing mode that’s wonderfully difficult to control effectively. The ability to climb along the edge of most surfaces adds a layer to the vertical gameplay dimension that doesn’t rely on precision jumping and which is utilized well to hide some of the game’s more out-of-the-way secrets.

This Kong’s got a funny face… Credit: Nintendo

As the game progresses, you’ll also unlock a handful of animalistic “Bananza” transformations from a menagerie of gigantic animal DJs (don’t ask). These temporarily grant DK new powers, such as a quick-dashing zebra or a fluttering, hovering ostrich. The game builds some specific gatekeeping challenges around each transformation, of course, but the extra movement options become a welcome part of your locomotion toolbelt when simply exploring generic areas.

Running around and smashing up the world isn’t all joy, though. Problems arise when you dig into thick patches of dirt, crafting a narrow, Kong-sized tunnel surrounded by opaque earth. The camera system does its best to deal with these tricky scenarios, making the ground translucent and highlighting only the notable features around you. Still, it’s easy to lose track of where your digging has taken you and how to get back to the surface, especially when the best way out of a jam is to “dig up, stupid.”

Oooh, Banana!

All this terrain destruction and digging is in service of the game’s primary goal: collecting a bunch of giant bananas. These are roughly as plentiful as the Power Moons scattered across Super Mario Odyssey and roughly as varied in their availability. Some sit out in the open, waiting to be stumbled on. Others are hidden in some of the game’s most out-of-the-way underground crevices and practically require the use of collectible in-game treasure maps to find. Many are hidden in elaborate challenge rooms that test your precision platforming, terrain destruction, or combat skills.

Unlike the Power Moons in Mario Odyssey, though, hunting down bananas is largely optional to progress down the succession of elaborate, wide-open, high-ceilinged layers (read: “levels”) on a quest toward the planet’s core. Instead, bananas are primarily used to unlock upgrades in a surprisingly deep skill tree or grant DK more health, more punching power, or longer Bananza transformations. Other collectibles can be used to buy stylish and protective outfits to further increase DK’s endurance.

You’d be forgiven for not believing that these large explorable “layers” are supposed to be underground. Credit: Nintendo

These upgrades provide ample incentive to go off the beaten path for those who like exploring and dozens of hours of enjoyable challenges for completionists to delve into after the credits roll. But the game’s structure also allows skillful and/or impatient players to zip to the game’s conclusion quite quickly, rushing through the visually inventive bosses that guard the game’s major chokepoints.

Those who rush, though, may end up struggling with the game’s final gauntlet of challenges, which quickly ramp up the difficulty while re-introducing some classic DK enemies (that we aren’t allowed to say more about at the moment).

Wait, that kid is Pauline?

Thus far, we’ve avoided talking about the ridiculously convoluted plot the game builds around Donkey Kong’s quest for bananas and the evil corporate forces that want to stop his journey deep into the planet’s core. The game’s underground world is populated with all sorts of talking animals, sentient rocks, and familiar Kong faces to assist DK or ask him for help with various ridiculous errands. They’re cute, but their chatter is more or less ignorable.

The reimagined Pauline is an adorable addition to the lineup. Credit: Nintendo

The main exception is Pauline, the damsel-in-distress from the original Donkey Kong, recast here as a precocious child working with DK to find a way back to her home on the surface. Pauline’s effort to overcome inherent stage fright and embrace the magical power of her singing voice was surprisingly touching. That’s largely thanks to a winning voice-acting performance that forms the basis for some toe-tapping gibberish playing behind DK’s Bananza transformations.

The adorable relationship between young Pauline and the silent Donkey Kong is the icing on a very satisfying cake. Even though Mario is nowhere to be seen, Donkey Kong Bananza seems destined to be thought of in the same category as the Mario games that defined earlier Nintendo hardware launches.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.


there-could-be-“dark-main-sequence”-stars-at-the-galactic-center

There could be “dark main sequence” stars at the galactic center


Dark matter particle and antiparticle collisions could make some stars immortal.

For a star, its initial mass is everything. It determines how quickly it burns through its hydrogen and how it will evolve once it starts fusing heavier elements. It’s so well understood that scientists have devised a “main sequence” that acts a bit like a periodic table for stars, correlating their mass and age with their properties.

The main sequence, however, is based on an assumption that’s almost always true: All of the energy involved comes from the gravity-driven fusion of lighter elements into heavier ones. Now, three astrophysicists consider an alternative source of energy that may apply at the very center of our galaxy: energy released when dark matter particles and antiparticles collide and annihilate. While we don’t even know that dark matter can do that, it’s a hypothetical with some interesting consequences, like seemingly immortal stars, and others that move backward along the main sequence path.

Dark annihilations

We haven’t figured out what dark matter is, but there are lots of reasons to think that it is composed of elementary particles. And, if those behave like all of the particles we understand well, then there will be both regular and antimatter versions. Should those collide, they should annihilate each other, releasing energy in the process. Given dark matter’s general propensity not to interact with anything, these collisions will be extremely rare except in locations with very high dark matter concentrations.

The only place that’s likely to happen is at the very center of our galaxy. And, for a while, there was an excess of radiation coming from the galactic core that people thought might be due to dark matter annihilations, although it eventually turned out to have a more mundane explanation.

At the extreme densities found within a light year of the supermassive black hole at the center of our galaxy, dark matter concentrations are high enough that these collisions could be a major source of energy. And so astronomers have considered what all that energy might do to stars that end up in orbit around the black hole, finding that under the right circumstances, dark matter destruction could provide more energy to a star than fusion.

That prompted three astrophysicists (Isabelle John, Rebecca Leane, and Tim Linden) to try to look at things in an organized fashion, modeling a “dark main sequence” of stars as they might exist in close proximity to the Milky Way’s center.

The intense gravity and radiation found near the galaxy’s core mean that stars can’t form there. So, anything that’s in a tight orbit must have formed somewhere else before gravitational interactions pushed it into the gravitational grasp of the galaxy’s central black hole. The researchers used a standard model of star evolution to build a collection of moderate-sized stars, from one to 20 solar masses at 0.05 solar mass intervals. These are allowed to ignite fusion at their cores and then shift into a dark-matter-rich environment.
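To give a sense of the size of that collection, here is a minimal sketch of the mass grid described above. The function name `mass_grid` and its parameters are illustrative assumptions; the researchers' actual setup lives inside their stellar evolution code and is not shown in the article.

```python
def mass_grid(m_min=1.0, m_max=20.0, step=0.05):
    """Return stellar masses (in solar masses) from m_min to m_max inclusive.

    Hypothetical helper: counting steps with an integer index avoids the
    floating-point drift that repeated addition of 0.05 would introduce.
    """
    n = round((m_max - m_min) / step) + 1
    return [round(m_min + i * step, 2) for i in range(n)]


masses = mass_grid()
print(len(masses), masses[0], masses[-1])
```

Under these assumptions the grid contains 381 model stars, each of which would be evolved separately.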

Since we have no idea how often dark matter particles might run into each other, John, Leane, and Linden use two different collision frequencies. These determine how much energy is imparted into these stars by dark matter, which the researchers simply add as a supplement to the amount of fusion energy the stars are producing. Then, the stars are allowed to evolve forward in time.

(The authors note that stars that are thrown into the grasp of a supermassive black hole tend to have very eccentric orbits, so they spend a lot of time outside the zone where dark matter collisions take place with a significant frequency. So, what they’ve done is the equivalent of having these stars experience the energy input given their average orbital distance from the galaxy’s core. In reality, a star would spend some years with higher energy input and some years with lower input as it moves about its orbit.)
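As a rough illustration of what averaging over an eccentric orbit involves, the sketch below computes the time-averaged distance of a body on a Keplerian ellipse by sampling the orbit evenly in time. It assumes "average orbital distance" means a time average, which the article does not specify, and the function names are hypothetical. It is not the researchers' method.

```python
import math


def eccentric_anomaly(mean_anomaly, e, tol=1e-12):
    """Solve Kepler's equation E - e*sin(E) = M for E by Newton iteration."""
    E = math.pi  # starting at pi converges for any M in [0, 2*pi] when e < 1
    for _ in range(100):
        delta = (E - e * math.sin(E) - mean_anomaly) / (1.0 - e * math.cos(E))
        E -= delta
        if abs(delta) < tol:
            break
    return E


def time_averaged_distance(a, e, samples=20000):
    """Average the orbital radius over one period, sampling evenly in time."""
    total = 0.0
    for i in range(samples):
        # Mean anomaly advances linearly with time, so even steps in M
        # are even steps in time.
        M = 2.0 * math.pi * (i + 0.5) / samples
        E = eccentric_anomaly(M, e)
        total += a * (1.0 - e * math.cos(E))
    return total / samples


avg = time_averaged_distance(a=1.0, e=0.9)
print(avg)
```

For a semi-major axis of 1 and an eccentricity of 0.9, the result matches the known closed form a(1 + e²/2), about 1.405, even though the closest approach is only 0.1. That gap reflects the point in the parenthetical: a star on such an orbit spends most of its time far from the center.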

Achieving immortality

The physics of what happens is based on the same balance of forces that governs fusion-powered stars, but produces some very strange results. Given only fusion power, a star will exist at a balance point. If gravity compresses it, fusion speeds up, more energy is released, and that energy causes the star to expand outward again. That causes the density to drop, slowing fusion back down again.

The dark matter annihilations essentially provide an additional source of energy that stays constant regardless of what happens to the star’s density. At the low end of the mass range the researchers considered, this can cause the star to nearly shut off fusion, essentially looking like a far younger star than it actually is. That has the effect of causing the star to move backward along the main sequence diagram.
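The feedback loop described above can be sketched as a toy thermostat. This is not a stellar structure calculation: the single "density" variable, the exponent `k`, the loss rate `l_loss`, and the function name `settle` are all assumptions made for illustration, in arbitrary units.

```python
def settle(l_dark, l_loss=1.0, k=4.0, dt=0.01, steps=20000):
    """Relax a toy star to equilibrium and return (density, fusion output).

    Assumptions (not from the paper): fusion output scales as density**k,
    the star radiates energy at a fixed rate l_loss, and dark matter adds
    a constant l_dark that does not depend on density.
    """
    x = 1.0  # relative core density
    for _ in range(steps):
        fusion = x ** k
        # An energy surplus puffs the star up (density falls); a deficit
        # lets gravity compress it (density rises).
        x -= dt * (fusion + l_dark - l_loss) * x
        x = max(x, 1e-9)  # keep the density positive
    return x, x ** k


for l_dark in (0.0, 0.5, 1.5):
    density, fusion = settle(l_dark)
    print(l_dark, round(density, 4), round(fusion, 4))
```

In this toy, fusion settles at whatever level makes up the difference between the losses and the dark matter input: full output with no dark matter, half output when dark matter supplies half the energy, and essentially none once dark matter alone exceeds the losses.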

The researchers note that even lighter stars could essentially get so much additional energy that they can’t hold together and end up dissipating, something that’s been seen in models run by other researchers.

As the mass gets higher, stars reach the point where they essentially give up on fusion and get by with nothing but dark matter annihilations. They have enough mass to hold together gravitationally, but end up too diffuse for fusion to continue. And they’ll stay that way as long as they continue to get additional injections of energy. “A star like this might look like a young, still-forming star,” the authors write, “but has features of a star that has undergone nuclear fusion in the past and is effectively immortal.”

John, Leane, and Linden find that the higher mass stars remain dense enough for fusion to continue even in proximity to the galaxy’s black hole. But the additional energy keeps that fusion happening at a moderate rate. These stars proceed through the main sequence, but at a pace so slow that running the simulation for a total of 10 billion years didn’t see them change significantly.

The other strange thing here is that all of this is very sensitive to how much dark matter annihilation is taking place. A star that’s “immortal” at one average distance will progress slowly through the main sequence if its average distance is a light year further out. Similarly, stars that are too light to survive at one location will hold together if they are a bit further from the supermassive black hole.

Is there anything to this?

The big caution is that this work only looks at the average input from dark matter annihilation. In reality, a star that might be immortal at its average distance will likely spend a few years too hot to hold together, and then several years cooling off in conditions that should allow fusion to reignite. It would be nice to see a model run with this sort of pulsed input, perhaps basing it on the orbits of some of the stars we’ve seen that get close to the Milky Way’s central black hole.

In the meantime, John, Leane, and Linden write that their results are consistent with some of the oddities that are apparent in the stars we’ve observed at the galaxy’s center. These have two distinctive properties: They appear heavier than the average star in the Milky Way, and all seem to be quite young. If there is a “dark main sequence,” then the unusual heft can be explained simply by the fact that lower mass stars end up dissipating due to the additional energy. And the model would suggest that these stars simply appear to be young because they haven’t undergone much fusion.

The researchers suggest that we could have a clearer picture if we were able to spend enough time observing the stars at our galaxy’s core with a large enough telescope, allowing us to understand their nature and orbits.

Physical Review D, 2025. DOI: Not yet available.


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.
