Author name: Tim Belzer

Monthly Roundup #33: August 2025

I got suckered into paying attention to multiple non-AI political stories this month: the shooting of the messenger, in violation of the most sacred principles, via firing the head of the USA’s Bureau of Labor Statistics, and the Online Safety Act in the UK.

As a reminder, feel no obligation whatsoever to engage with either of these.

There are tons of other things worth paying attention to that are not that.

I realize philanthropy has been handed quite a few sinking ships lately, but if you exclude AI there is one crisis I would prioritize above the rest, which is mRNA.

Which is: We have entered the War on Cancer, future pandemics and other diseases on the side of cancer and future pandemics and those other diseases, because we decided to hand this power over to RFK Jr.

As Rick Bright puts it in the New York Times, America Is Abandoning One of the Greatest Medical Breakthroughs. We don’t have to let that happen.

Albert Pinto: Holy moly trump killed Moderna in US!!

“The U.S. Department of Health and Human Services (HHS) today announced the beginning of a coordinated wind-down of its mRNA vaccine development activities….”

“The projects — 22 of them — are being led by some of the nation’s leading pharmaceutical companies like Pfizer and Moderna to prevent flu, COVID-19 and H5N1 infections.”

Jesse Jenkins: Project Warpspeed was probably the most consequential and unqualified success of Trump’s first term, and mRNA vaccines one of the most exciting medical advances of the 21st century. But this time, Trump 2.0’s anti-vax HHS Secretary (Kennedy) is cancelling all federal funding for 22 active mRNA development initiatives. There just aren’t enough facepalm gifs for all this stupidity.

Alec Stapp: mRNA technology is miraculous and has so much potential.

This is such a massive own goal.

Thorne: We’re fairly close to being able to very effectively treat cancer with mRNA, and this will be a huge setback

To be blunt: people you care about will die of cancer because of this decision

It’s deeply delusional to think that just because this announcement is about respiratory viruses, it won’t greatly impact mRNA vaccine tech as a whole.

mRNA companies get way less funding now and have strong signals that other mRNA vaccines might face regulatory hurdles.

Also, a lot of people are responding as if I’m saying such a tech cannot possibly be developed without public funding. No, I said it was a huge setback. And that means people who could’ve been cured will die waiting for it.

A huge portion of optimism about new medical technology, and even about the future in general discounting AI, is mRNA vaccine development. They’re trying to kill it.

Private investment can and must come in and cover this. The funding gap here is only $500 million. Certainly some combination of billionaires (and perhaps others) should step up and make grants or invest as needed to fix it.

The cost-benefit ratio here is absolutely absurd. When we think of the futures that aren’t transformed by AI, mRNA is one of the technologies giving us the most hope. Shame on those who have the means if they do not pick up the slack here, flat out.

This comes on the heels of various forms of amazing news about mRNA, such as this:

Kepecs Labs: Huge cancer breakthrough! mRNA vaccine (similar to COVID vaccine tech) shows stunning results against pancreatic cancer! 75% of responsive patients STILL cancer-free at 3 yr, normally 80% recur. Could revolutionize treatment for one of deadliest cancers! Funded in part by @NIH

The thread is full of the kinds of graphs you see when a treatment works really well.

A. Do responders still have better outcome?

Left = 1.5 yr follow-up

Right = 3.2 yr follow-up

Yes! 👇🏽

16. TAKE HOME

In #PDAC, mRNA vaccines make CD8 T cells of

– multiyear longevity

– substantial magnitude

– durable effector function

whose presence

– correlates with delayed recurrence at 3-yr follow-up

The worries are politics and game theory.

In terms of game theory, if we step in and save this situation, did we effectively let the government steal $500 million? Won’t they then have every reason to do it again and target the best programs? Won’t this willingness to fund (acausally) be the reason this got cut?

My answer here is no, because this cut was intended not to steal the money but to stop the research. The people who are cheering this have bought into their own paranoia or perverse incentives so much that they actually want mRNA dead. In such cases, it is relatively safe to step up, although it does carry some risk of giving people ideas.

In other situations, the dynamics are different. Either the motivation was indeed to save or steal money by getting others to pick up the slack, or it was to use a wrecking ball to kill a wide variety of things, knowing the best ones would then get saved. Then you have to look at these questions a lot more carefully.

On the politics, the worry is both that the administration might then use other means to stop mRNA, or at least not be helpful, and that stepping in might mark one as an opponent of the administration.

I would not worry much about the administration not playing further ball. This is a long term fight, and also we have reports Trump himself is not thrilled about the cuts. It is also quite a lot harder to turn down the life-saving medicine when the time comes than it is to deny the initial funding. The pressure would be immense, and also there are places besides America to start deployment if you need to do that for a bit.

I also don’t worry too much about this alienating the Trump administration. You’re investing in America, Trump was reportedly not thrilled about the cuts, and he definitely isn’t a true believer on this like he is on tariffs. He knows that being against mRNA is about placating crazy people, so if it happens without him, that is fine. That assumes, of course, that you are worried about this dynamic in the first place. Some people very much aren’t.

This pattern is common. I think centralizing suffering is a critical mistake, so you can substitute various things for ‘utilitarianism’ and also various things for ‘suffering.’

Although also, yes, at reasonable prices, and while factoring in other things we also care about, we should reduce suffering.

Also, yes, this is about how well most people deal with hypotheticals.

This is also a central common pattern, and the difference that matters.

Henry Shevlin: Many years ago, I went to two animal welfare events with very different types of philosophers.

The conclusion of the first was “we need to have another bigger conference next year, with animals present.”

The conclusion of the second was “we need to fund in-ovo chicken sexing.”

I would not go to either conference. But, if you did go to one such conference, you would want it to be that second one.

I will note that there is already lots of talk about making the new in-ovo chicken sexing technology mandatory, starting in Europe. There will always, always be a push to make such things mandatory.

Some more making fun of how awful Cade Metz is and how much he got everything wrong yet again:

Leila Clark: on the lighthaven drama, from a friend:

A true statement:

Roon: Public goods are often expensive gifts to yourself scaled up.

David Manheim: Yes – and this is a reason that wealth inequality often leads to public benefit.

The cheapest way for wealthy firms to have educated workers is public education, and the cheapest way for the rich to reduce climate risk is fixing emissions globally, etc.

This is also why I refer to the ‘chasm of personal utility.’

Once you hit f-you money, there is remarkably little that additional money buys on a personal level. Marginal returns drop dramatically. It takes quite a lot of additional money to get remarkably little benefit. The things people buy for themselves that actually get expensive, like boats and lavish private parties, really aren’t that great.

If you actually want your life to get better, the only way to do so becomes improving the world, since you and those you care about have to live in it. Hence, public goods.

As a toy example, as a gamer, I could basically buy whatever I want and not bat an eye, unless I wanted things like Vintage Magic decks, and even that has an upper bound. So at that point, if I want better gaming, what do I have to do? Commission games.

Elizabeth von Nostrand: Lots of people want my job. No one wants the part where I spent five years doing this job for free.

Ben Landau-Taylor: Most of my friends with weird jobs could say the same.

Eliezer Yudkowsky: Word.

Nefarious Jobs will, for a remarkably small fee that usually is only four figures, go out and ruin someone’s life, in ways that they say are technically legal, with the ‘Total Annihilation’ package only costing $10k. In addition to the other obvious reasons it is terrible, one strong reason not to do this is that it can be done back to you.

Reservations at DC restaurants plunge 31% compared to 2024 in the wake of the police takeover.

The trucking industry reports it is experiencing the dreaded double whammy, intense labor shortages combined with declining wages. Um, yes, you did read that correctly.

RIP Hulk Hogan. A lawyer from Bollea v. Gawker offers a retrospective on the case that took Gawker down.

Reminder, once again: When people tell you who they are, believe them. Kelsey Piper is latest to confirm that people who identify themselves as evil, or with Hitler or as a fascist or a Nazi, will universally prove to indeed suck immensely.

It appears to not be a strawman that some disability activists oppose treating disability via gene editing because this would mean there would be fewer disabled people, which would weaken their ability to exert political pressure.

Rebekah Westenra: Begging the average disabled person to understand that it’s not about you. You will never ever get access to this kind of tech. This will never be a cure for YOU. This is about eliminating disability from the upper class which will reduce support & resources directed towards YOU.

She is of course directly incorrect: developing cures for the rich is how you develop the technology, after which it leads to cures for everyone else. But even if she were somehow correct, consider the implications. This is outright telling people that if they are rich, then it is good that they are disabled. My lord.

Oh, and then the follow-up is, maybe curing your disability would be bad, yo, and if you disagree with this then you’re not really a disabled person so you have no right to talk. There are those who can make this up, yet I remain not one of them.

Also begging people to understand the differences between various disorders and to think about what makes you YOU and if your disorder can be completely separated from that then maybe just shut the hell up for now.

Those corporate ‘icebreaker’ and ‘team building’ events? Yeah, they are pretty important to your success at your job. You need to go all out with faking sincerity.

Simon Fruit: One of my favorite employment phenomenons is this retarded idea that if you do your job very well but don’t participate in stupid, time-consuming, and useless ice breakers then you’re not a “team player” because you made Brenda feel bad that you didn’t care for her weekend.

Andrew Rettek: I got fired for this once.

Patrick B: Three phrases to remember: Oh wow crazy. That’s amazing. Oh no he didn’t.

This is effectively a large reason to seek out coworkers you like hanging out with, since you’re going to be forced to do that if you want to succeed.

I never had a problem with such work exercises, because the offices I did join for any length of time – Jane Street and Wizards of the Coast – selected for people I would have been happy hanging out with anyway.

I did however kind of get fired as a student at a Dojo for this. For a while I would go to class twice a week and had moved up one rank. I was informed by Sensei that as I kept going, they expected me to participate more in the community. I found the other students to be nice people, I didn’t at all mind training with them and making small talk, but burning evenings socializing? Oh, hell no. That was essentially that.

Well, this seems not awesome:

The farther you go downthread the worse it gets.

It seems in Cairo (and presumably many other places) Uber drivers flat out ignore the fare they accept, and then you have to haggle. Like James here, I absolutely cannot stand small-stakes haggling. Transaction costs are very high, and it makes sense that the Anglosphere has long had a big advantage everywhere it doesn’t haggle, but struggles on housing, which is the one place we still do it and have to hire people to advise us on optimal haggling techniques.

Whisper networks are terrible, but what is the alternative? Ideally actual fact finding, but that is expensive. You cannot play that card so often, and indeed you need a whisper network or something similar to know when to invest in fact finding.

Next up would be creating common knowledge and only saying things in the open, which also has obvious limitations. If nothing else, it means that if you can make the victim or witness not want to come forward, in one of any number of ways, you get away with it. And it is not obvious how to go from ‘whisper networks are bad’ to preventing one from spontaneously arising unless you have an effective alternative mechanism.

Also, if you cannot say anything negative about anyone without telling them directly, that leads to heavily biased information and also various games where silence starts to be highly meaningful. I don’t see any good solutions?

Wes: Whisper networks are bad, for obvious reasons.

Xenia: unfortunately also good for obvious reasons.

Wes: Alas. The bad parts tend to eat the good parts.

Sparr: In my intentional community organizing efforts, I have tried and failed a few times to establish this rule/norm:

Say nothing negative about someone behind their back that you don’t say to their face, unless it should get them kicked out of the house.

Wes: Why does it fail?

Sparr: People refuse to honor/follow it. If I could find a core group of 3-5 people who would adopt this norm, new people could be acculturated. But I have never found that core group.

Recommended: Cate Hall tells us 50 things she thinks she knows. The list is as excellent as everyone says it is. Many of these are exceptionally valuable if you don’t know them or needed a reminder. A number of them are in my opinion false, actively unhelpful or both, but that keeps you on your toes, and a list where all 50 were true and useful would not be as interesting. Like her, I could probably write a post about most of these if I wanted to (especially the ones I think are wrong).

Cate Hall also offers praise for quitting, especially quitting when you realize that you don’t want the results of walking down a long term path. Some people of course need to reverse this advice.

Recommended: The Inkhaven Residency at Lighthaven in Berkeley, happening November 2025. If you attend you will write 30 blog posts, one per day, or leave, with advice and mentorship from Scott Alexander, Scott Aaronson, Gwern and more. Cost to attend is $2,000, with housing available as low as $1,500 ($2,500 for a private room).

(I do not currently have a plan to make an appearance myself, I only have so many trips in me per year, but certainly there is some chance I will choose to do so.)

Zohar Atkins presents a new Library of Alexandria, 4000+ great books combined with an AI tutor, called of course Virgil, to converse with.

I am doing my best to avoid commenting on politics. As usual my lack of comment on other fronts should not be taken to mean I lack strong opinions on them. Yet sometimes, things reach a point where I cannot fail to point them out.

If you are looking to avoid such things, I have split out this section, so you can skip it.

This month, that applies to the following two sections as well.

Because this is the realm of things like this:

Tetraspace: Reading another thread of people replying to “this law should be changed” with “but it’s the law” and being thankful that democracy achieving good outcomes doesn’t rely on people understanding policy details.

Replacing the H-1B visa lottery with a system based on ‘seniority or salary’ predicted to raise the program’s economic value by 88%. I would worry that ‘seniority’ is too easy to fake, so I would go with salary as much as possible. It is also argued that this would prevent the driving down of wages for native workers. Even better would, of course, be to straight up auction off the visas themselves, or set a market clearing price (ideally with much higher supply, perhaps the level that maximizes revenue), which is the obvious solution.

🚨 BREAKING: A bill to ban politicians from trading stocks is getting pushback from the White House, per Axios.

The pushback seemed to be they did not like that it applied to the President and Vice President. I can’t imagine why. I only report the news.

The good news first. The UK backed down from the encryption standoff with Apple amid US pressure.

Then they went and did all the other stuff they did this month. Oh no.

The free speech situation in the UK seems about to get somehow even worse on multiple fronts at once?

The situation has reached the point where if I lived in the UK, I would feel it necessary to leave, because I would otherwise not feel safe doing my job.

Dominic Green: On the night of Wednesday, July 16, the Labour government’s Employment Rights Bill passed its second reading in the House of Lords.

If the bill goes into law in its current form—and there is not much to stop it now—Britons can be prosecuted for a remark that a worker in a public space overhears and finds insulting.

The law will apply to pubs, clubs, restaurants, soccer grounds, and all the other places where the country gathers and, all too frequently, ridicules one another.

Meanwhile an ‘elite police squad’ is monitoring anti-migrant posts on social media.

Oh, and on the first day of the ‘Online Safety Act’ they were already on the verge of shutting down Wikipedia. Could there be any clearer sign things are extremely bad?

Evolve Politics: Wikipedia is currently in a legal battle with the UK government to try and stop the platform being censored in the UK – or even completely blocked – thanks to the Online Safety Act.

Under the new law, the UK media regulator Ofcom is poised to label Wikipedia as a “Category 1” platform.

This would impose the strictest content rules possible – such as:

– age verification for users

– identity verification for contributors

– censorship of ‘harmful’ topics.

Wikipedia has already stated they will not implement any of these rules, arguing they would be forced to censor crucial facts, and potentially expose their volunteer contributors to real-world harm – such as political harassment, or worse – purely for documenting the truth.

In addition, Wikipedia says that bad actors could easily abuse the new laws – by filing fake complaints or exploiting vague “harm” rules to force them into entirely removing articles that people – or the UK government/corporations – simply disagree with.

Wikipedia’s legal case was heard at the Royal Court of Justice on July 22-23, and a ruling is expected within a month or so. However, if their legal arguments are rejected and they refuse to implement Category 1 rules, the UK government could block access to Wikipedia entirely.

It is a great relief to confirm that Wikipedia is not going to give in here, especially on censorship of ‘harmful’ topics even for adult users. They have since lost their court case.

Chris Middleton lays out what the Online Safety Act does in general.

Chris Middleton: It creates a new “duty of care” on all online services to police user content. This means:

✅ Platforms must proactively detect and remove “illegal” and “harmful” content.

✅ Age verification to block under-18s from adult material.

✅ Private messaging apps must scan messages for banned content.

WhatsApp and Signal warn this poses an unprecedented threat to encryption and privacy.

Age checks and the death of anonymity:

Any site with adult content must now implement “highly effective” age verification. That means:

📸 Face scans

📅 Government IDs

💳 Credit card checks

This applies far beyond just porn to any user-generated platform. The law covers any site that allows users to share or interact. That includes forums, messaging apps, cloud services, open-source platforms, even Wikipedia.

Proton VPN: Just a few minutes after the Online Safety Act went into effect last night, Proton VPN signups originating in the UK surged by more than 1,400%.

Unlike previous surges, this one is sustained, and is significantly higher than when France lost access to adult content.

Wint (August 28, 2013): lets set some realistic goals here : jokes banned by 2016. sex banned by 2020. a cop in every household by 2025

What kind of things are being censored, in addition to Spotify, which is also threatening that it might have to delete your account?

Saruei: I can’t believe Spotify now requires age verification. Today it’s music, tomorrow it could be books, films, or even news articles. It’s the first step into a dystopian reality we’ve seen in movies, where access to culture is gated by surveillance and the illusion of security.

Calgie: Your Spotify account is getting deleted unless you do age verification.

Adam Wren: For everyone that was saying “it’s just to stop kids watching porn” very first day of the restrictions it’s been used to censor “violence” which in this case means police arresting people at protests, well done. The very first day. Not even a ‘slippery slope’ at this point, more of a wet cliff.

Benjamin Jones: If you have a standard X account in the UK – presumably the vast majority of British users – you cannot see any protest footage that contains any violence tonight. Because of the Online Safety Act. A relative in America sent me this screenshot of one blocked post.

Matvey: It’s frankly disgusting that the Online Safety Act is being hidden behind the pretence of ‘child protection’ when it’s already being used to hide political content from non-age-verified uses, and next year will be able to take IP from tech companies at no notice.

Draconian.

It was a bold move, Cotton, to go directly after Wikipedia and coverage of police and protests and testimony before Parliament on day one. They did not want there to be any illusions what their true target was.

Charles: I just got asked to submit ID to view a Reddit wine forum.

Immediately thought “I’ve got to get out of this country (the UK)” and bought a VPN subscription.

Now I’m digitally in the much more free nation of “checks notes” Belgium.

But the impulse to physically get out remains. This is not a place that feels hopeful or optimistic or like it’s going to change for the better soon.

Would I call this new UK a ‘police state’? Well, it is a place where they censor and potentially jail you if you criticize the police. I realize Wikipedia does do some rather nasty politically motivated things, like whitewashing Mao as if it were defending him in court, but if you’re censoring Wikipedia and blocking videos of police arresting protesters, what more is there to say?

The community note is incorrect. This very obviously was exactly what the act was for. I’m not a pure ‘the purpose of a system is what it does’ person, but yes very obviously the purpose of this system is to censor speech authorities dislike.

Cremieux: The Online Safety Act censored one of my posts on lactose intolerance. It censored another where I mentioned donkeys, and my friend can’t see one of my posts on Neanderthals processing fat. If you support the Online Safety Act, you are an imbecile.

Nigel Farage and the Reform Party would get rid of the Online Safety Act, or as the Labour Party calls it, ‘scrap vital protections for young people online, and recklessly open the floodgates to kids being exposed to extreme digital content,’ the same way they were so exposed before and are so exposed in other countries, and thus he is ‘not serious.’ They also say you are ‘on the side of the predators’ while censoring official discussions about the investigation of actual predators.

Many such cases.

Crush Crime: Our post with a screenshot of a House of Commons amendment, setting terms of reference for an inquiry into the grooming gangs cover-up, has been censored by the Online Safety Act. The state must spend less time policing speech and more time catching rapists and thieves.

Here is that post:

Sam Dumitriu: “Nigel Farage would give teenagers access to material on drinking cider, owning hamsters, and speeches from Conservative Members of Parliament. He is simply not serious.”

Charles Haywood compares the situation to that in Eastern Europe in 1989, as in it has become clear that the government will not respond to the public’s views except by trying to censor the public, including censoring statements that the majority agrees with and statements about police conduct, political opinions and the coordination of protests, now including on social media, in pubs and in private chats.

It can always get worse. Australia is going to make you prove your identity in order to access search engines as in Google and Bing, and they want to ban YouTube for kids under 16 as part of their social media ban, WTAF.

The UK is seeking to pass a law enabling the issuance of ‘respect orders’ to prevent someone from engaging in ‘anti-social behavior’ that can ‘prohibit the respondent from doing anything described in the order’ or ‘require the respondent to do anything described in the order.’ The court can simply order you to do, or not do, essentially anything? So I suppose they spell respect T-Y-R-A-N-N-Y.

Isaac King: “The text of my new bill, the End All Bad Things Act, is as follows:

I can do whatever I want.

This will allow me to make people stop doing bad things. Thus if you oppose this bill, you are in favor of bad things.”

Then again, what did we expect from a country that censored the Teenage Mutant Ninja Turtles?

R Street: The U.K.’s Office of Communications (Ofcom) explains in detail what each category of prohibited content includes—even “[c]ontent which realistically depicts serious violence against a fictional creature.”

Such a definition would not only prohibit minors from accessing historical and newsworthy content about wars—but many episodes of SpongeBob (if posted to social media), including but not limited to “No Weenies Allowed.”

Meanwhile, YouTube is now going to ‘use AI’ to ‘interpret a variety of signals,’ including account longevity and which types of videos a user is searching for and watching, to ‘estimate’ whether a user is 18 and thus age restrictions must be imposed.

Klint Izwudd: Isn’t it fucking amazing how worldwide all of these incredibly sophisticated censorship measures are literally appearing in the last week.

The direction of this move is ambiguous. If the previous regime was that everyone was treated as a minor until proven otherwise, and now you have a second way to get the regime to stop doing that, and how minors are treated does not change, then This Is Good, Actually. Alas, this likely goes hand in hand with worse treatment of minors. From this article, it sounds like this will effectively mean an expansion of restrictions.

Also note that the actual changes listed are (they use the word ‘including’):

  1. Disabling personalized advertising

  2. Turning on digital wellbeing tools

  3. Adding safeguards to recommendations, including limiting repetitive views of some kinds of content

All of those seem like they could be straightforward upgrades? Can I choose to turn on those features?

What they of course fail to mention is that the main change is age restricting videos. I do notice that I have an alt Google account, I definitely did not provide Google my ID there, and when I use YouTube on it I have yet to run into age restrictions on videos.

A fun note is, if you were trying to ‘look like an adult,’ what would you do? You would among other things try to make your consumption as age inappropriate as possible?

I would very much like to see this handled as follows by the tech companies:

Arthur B: If Meta and Google had the courage to entirely drop service for the UK, the government would fold in two weeks and repeal the OFA. The EU, Australia, etc would start to backtrack. Two weeks is all it takes, it could be done.

Wikipedia has the right idea. By all means sue, but make it clear that if push comes to shove you will simply cut the country off. Even if the governments held firm, fine, so be it, let everyone use a VPN.

The traditional way such stories end, when they don’t end in revolution, is this:

Devon: This is what current polling looks like when you don’t include LeftParty_Final. I’m sorry but anyone making the “you’re gonna split the vote and let Farage in” argument has their head in the sand and can be safely and derisively ignored.

In particular I would not have simultaneously severely censored the internet for 16-and-17 year olds and also given them the vote. That’s just me.

Here’s the strongest argument I’ve seen yet that actually Brexit was a mistake. You might need to get away from the EU but that doesn’t help if you then act even worse:

Alex Tabarrok: The British would never have tolerated this if it came from Brussels and EU bureaucrats.

Once regulation was seen as self-imposed, the floodgates opened.

Mr. Obvious: BREAKING: Zoomers cannot adjust their Nvidia graphics cards settings on their gaming PCs anymore because they aren’t 18 thanks to the Online Safety Act.

To be fair, if you can’t get a VPN working then you shouldn’t be using Nvidia apps.

Michi: UK App Store charts be like

Guess who is also downloading those apps, also billing the public for them?

Freddie New: Peter Kyle suggesting that using a VPN will put children at risk (a laughably luddite suggestion, as he probably uses one himself every time he works from home)… At the same time that Business Secretary Jonathan Reynolds (famously confused as to whether he is or is not a lawyer) is billing his use of a VPN to YOU, the taxpayer.

Why doesn’t Jonathan Reynolds just verify his age instead?

I honestly wish someone was making this all up.

Ultimately, yes, this is the choice and the choice is #1:

Jeremy Kauffman: Most people won’t state this so bluntly, but if the choices are:

  1. kids sometimes access pornography on the internet

  2. a federal ID system to access the internet

Then #1 is the better choice.

Well, actually the #2 choice is you have a federal ID system and the kids access the porn anyway, but it was never about the porn. The porn is an excuse.

Misha: The dangers to children of potentially seeing porn are trivial compared to the benefits of being able to freely access the internet

Kelsey Piper: And the most overwrought hysterical “this is the first step towards requiring government ID before you read or talk online at all” predictions have been borne out in full so swiftly that I don’t see how you can possibly feel certain it won’t happen here.

like, I’m sorry! I too really hate how hard it is to give kids a healthy online experience! and I will oppose every effort to enshrine age verification in either the law or in company policy on any level for any reason.

Matthew Lesh: The Online Safety Act debate was a lonely place for free speech advocates. Anyone who dared to question the law was treated as a child-hating pariah. Yet as key provisions have come into force our warnings have proven eerily accurate.

Institute of Economic Affairs: There was shock that anyone might dare to question a law designed to ‘protect children.’

In a separate meeting with a senior Ofcom official responsible for implementing the law, I was politely assured that excessive implementation is never a problem with regulation, leaving me utterly dumbfounded.

If those implementing a law tell you ‘excessive implementation is never a problem with regulation’ and you let them continue implementing, you know what you will get.

There was a period where we were constantly told that those concerned about AI killing everyone would impose dystopian authoritarian nightmare surveillance states because we wanted to impose some restrictions on who could train or distribute future frontier AI models potentially smarter than humans.

Instead, things far worse for freedom than anyone was talking about are being imposed because otherwise ‘you are not serious about protecting children from predators’ or what not, and being used to suppress dissent and also settings on Nvidia cards on day one. Somehow most of the same voices are being a lot less loud about it.

They are even going after good old free speech Americans like 4Chan, whose response letter correctly said they would fight any and all attempts, and called upon the State Department to step up its game, but seemed altogether too polite. Let 4Chan be 4Chan, at least this one time.

Also, frankly, go ahead, go after 4Chan and see what happens. It’ll be fun.

Then they went after Twitter for censoring in the UK too much, because it made the UK government look bad.

Preston Byrne: I was asked to comment on a story today about this. Apparently Ofcom wants to punish X for “over-censoring” user content, making the UK government look bad. In their view, X violates the Online Safety Act by over-complying.

“If Ofcom goes after X, I hope Elon kicks their ass.”

Also, I’m not sure what Ofcom is smoking, but there is no rule in English law which requires a website to platform lawful speech.

Maybe the UK is taking an expansive view of Article 10, but that’s just more evidence that Article 10 is vague and crap and should be abolished.

Forever Scept: TRANSLATION: You were supposed to censor without them knowing.

Preston Byrne: Right. “You’re supposed to censor only what we want you to censor, and we aren’t going to tell you what we want you to censor until you get an enforcement notice for censoring incorrectly.” Yeah, no. Absolutely not.

Patrick McKenzie: This is a recipe for censorship by “Come on, you know what we want.” followed by zero point zero democratic accountability. “All independent decisions of firms made for commercial reasons; we have no orders.”

We have seen this movie before, depressingly frequently.

America’s State Department has spoken up at least a little.

Bureau of Democracy, Human Rights and Labor (DRL): The UK’s Online Safety Act undermines the right to free expression by imposing censorship on vague grounds. Suppression of criticism of illegal immigration or the criminal justice system is completely unacceptable in a free society.

These laws will also create immense pressure on American companies to kowtow to the censors. Foreign laws must not undermine the right to freedom of expression of Americans.

There was then an escalation.

Preston Byrne: The US State Department’s report specifically notes the UK/Ofcom has been targeting Americans with no corporate presence in the UK for censorship.

I am proud to have supplied the US government with all relevant documentation on this point.

Meanwhile over in the EU they mandate that TVs lock brightness at 30%-50% of maximum for sustainability reasons, as in ‘eco mode,’ and you have to dig deep into the settings to fix it. But that’s nothing compared to what is coming; you are to hold their beer or wine.

Marko Jukic: The EU intends to automatically scan every private message sent over a phone, including encrypted ones, for “child abuse material” by this October. No prizes for guessing what else they will scan for in a few more months or years. Final death of free speech and free internet.

If you have one rule, this is it. Also, if you must shoot the messenger, do not shout ‘THIS IS SPARTA’ like it is a good thing that you are doing so.

Alas, we have chosen to shoot the messenger along with a bold post that says ‘we are shooting the messenger,’ as in we got revisions to a jobs report that Trump didn’t like so he fired the Commissioner of Labor Statistics and accused her of ‘faked job numbers.’

Dow: Banana, meet Republic.

Jonah Goldberg: It’s like a pilot smashing the altimeter because he doesn’t like the altitude reading.

To the extent the people angrily responding to this are A) people and not bots, B) sincere and not partisan hacks, and C) not complete idiots:

Trump blamed the bad numbers on political bias. The same head of BLS delivered terrible numbers on the eve of the election (a fact Trump now lies about). She also delivered great numbers earlier under Trump. So the argument she was biased is just stupid.

Those trying to justify this keep getting details wrong, and other details keep turning out rather inconveniently for them, such as the number he says was ‘rigged’ right before the election later being revised upwards rather than downwards, meaning the error favored him.

Nick Timiraos: Trump to CNBC on the jobs numbers before the election: “The numbers were rigged.”

He’s getting his dates wrong. He’s saying the jobs numbers looked good before the election but were revised down after the election. The big downward revision came in August, before the election.

Kernan to Trump: You’re undermining confidence in the numbers by firing the BLS commissioner.

Trump: “When they say nobody was involved, that it wasn’t political…. Give me a break.”

Trump: “It’s a highly political situation. It’s totally rigged.”

Kernan: Which number do you believe? The chances of a Fed rate cut are going *up* because of these weak numbers.

A slight slowdown in labor “will get you what you want” on the Fed.

This of course compounds the undermining of confidence. It seems actively designed to undermine confidence in the numbers.

Here is the director of the NEC outright saying that the data ‘has to be something you can trust’ and by ‘you can trust’ he means a high number that makes people do the things we want them to do. As in, the job of the numbers is to lie.

I appreciate the candor about the intent, sir.

Kevin Hassett (Director of the National Economic Council): “The data can’t be propaganda. The data has to be something you can trust, because decision-makers throughout the economy trust that these are the data that they can build a factory because they believe, or cut interest rates because they believe. And if the data aren’t that good, then it’s a real problem for the US.”

Justin Wolfers: Minister for Propaganda says the data can’t be propaganda once his Ministry has had a chance to vet them and ensure they’re even true-er.

Matt Darling: The White House anti-BLS webpage is basically nonsense. They claim a “consistent pattern” of “overly optimistic numbers” in 2024, but neglect that 2024 had 6 upward revisions and 6 downward revisions.

Aaron Rupar: Kevin Hassett suggests the Bureau of Labor Statistics rigged the 2012 election for Barack Obama.

Arin Dube: It’s critical to push back against baseless claims about data revisions, like those by Kevin Hassett. These falsehoods smear the integrity of professionals such as @brent_moulton, who have spent their careers ensuring the public has access to reliable economic data.

Brent Moulton: In 2012, I was the associate director at the Bureau of Economic Analysis (*not BLS*) and was responsible for preparing the estimates of gross domestic product. Mr. Hassett gets several things wrong here.

First, I would like to assure you that in the 19 years I was responsible for the GDP estimates (from 1997 to 2016), the estimates were NEVER politically manipulated, nor did anyone ever ask me to adjust them for political reasons.

At BEA we made a concerted effort to openly explain to our data users the data sources for the GDP, what methodologies were used in the estimation, and be as transparent and “open source” as possible. We provided source data tables, methodologies, technical notes, etc.

I disagree with Hassett’s allegation that the advance GDP estimate for the 3rd quarter of 2012 was unexpectedly large. That first estimate said that GDP grew at a 2.0% rate – just about what it had been averaging for the prior two years.

A number of forecasters try to predict the GDP estimate, often using much of the same source data as used by BEA (albeit usually calculated in less detail). For example, the Atlanta Fed’s GDPNOW forecast (one of the better ones) was 1.8%, close to BEA’s 2.0% estimate.

[thread continues as you would expect]

Then, as the replacement, Trump nominated E.J. Antoni, which seems like a caricature of the worst possible nominee.

Brendan Pedersen: Trump makes the nomination of EJ Antoni to lead the Bureau of Labor Statistics official after firing the last commissioner over job report revisions. Antoni is chief economist at the Heritage Foundation.

Brian Albrecht (Chief Economist, Law and Economics Center, lacking imagination): Worse than I could have imagined 24 hrs ago

Ben Berkowitz, Emily Peck (Axios): President Trump’s nominee to head the Bureau of Labor Statistics, E.J. Antoni, suggested the possibility of suspending the bureau’s flagship monthly jobs report.

Christopher Rugaber (AP): Jason Furman, a top economist in the Obama administration, wrote on X: “I don’t think I have ever publicly criticized any Presidential nominee before. But E.J. Antoni is completely unqualified to be BLS Commissioner. He is an extreme partisan and does not have any relevant expertise.”

E.J. Antoni (June 26, 2024): [tweet screenshot not shown]

His Twitter feed is, shall we say, sobering throughout.

The National Review summary of the situation is ‘Trump Wants a Bureau of MAGA Statistics.’

Dominic Pino: What Trump would like is a BLS that is biased in his favor. The latest proof of that is his nominee to be the next commissioner, E. J. Antoni.

Antoni is the chief economist at the Heritage Foundation. He has been a relentless booster of Trump’s policies on social media. And he has demonstrated time and again that he does not understand economic statistics.

Dominic then provides various receipts about Antoni. He is maximally unqualified, as in far more unqualified than Jon Snow, who knows nothing.

This is all a whole different level of absurd and awful than usual.

Technically it is illegal to suspend the report but do you expect that to stop them?

Conor Sen: The BLS thing just sucks, anyone who tries to sugarcoat it at best doesn’t know what they’re talking about.

Not that it will work, as Nate Silver explains at length, no one is going to be fooled. Destroying the reliability of our economic data only makes everything worse. Derek Thompson calls it part of ‘the war against reality.’

Greg Mankiw, a conservative economist and chair of the Council of Economic Advisers under George W. Bush, whom I’ve had in my RSS feed for a decade, joined fellow former CEA chair Cecilia Rouse to warn that this firing will backfire and hurt the ability to analyze the state of the economy and develop the best policies, with the headline warning this will ‘come back to haunt’ Trump. You can smell the forced politeness.

It’s a relatively minor point relative to not shooting the messenger, but the defenses claiming the messenger was terrible have just been so absurdly bad.

Chamath Palihapitiya (All-In Podcast): Non Farm Payrolls are total garbage so I asked Grok:

“Hey Grok, go look at the Bureau of Labor Statistics website for their non farm payrolls data. Tell me how many times their original forecasts have been revised since Jan 2020. And, of those revisions, how many times the data was revised up versus down. Categorize this during the Biden versus Trump presidencies.”

Bottom line is that BLS isn’t so much conspiratorial as it is inadequate in its approach. They are all over the place and add little directional signal. They constantly revise and in both directions.

The sampling techniques they use are brittle and don’t work for a large and dynamic economy like the US.

Trump was right to fire the head of BLS because she ran a critical aspect of the US economic machinery in an unpredictable, haphazard and sloppy way.

There needs to be a new, oracle-like data provider for this critical information.

Alex Tabarrok: Amazing. Expects to find bias. Finds none. Which is what you would expect if BLS is doing their job well.

Reverses course and claims BLS lack of bias means their forecasts have no “signal” and that is bad? Incoherent. Ends with gratuitous call for better methods.

Christopher Clarke: BLS preliminary estimates have actually increased their accuracy over time. There is always room for improvement and survey responses have decreased. Improved accuracy requires more resources, not less.

What BLS does is they provide an early estimate, because that is valuable even when it is noisy, and then a later estimate. This Is Good, Actually.

Tyler Cowen, who has had some very let’s say creative defenses of various administration decisions, flat out said it is very bad to behave this way, and that BLS is not biased except in favor of following established procedures, as in it is biased towards being above reproach about potential biases. Which is wise, and means if you want to account for other things you need to do that on top of their estimates.

On top of everything else, the whole thing happens to be backwards in two distinct ways.

As in, the first way is that downward revisions mean the numbers were initially overstated, which makes you ‘look good,’ and no one involved is buying the galaxy brain (but kind of correct) take that you want to ‘look bad’ to get a fed rate cut.

Ernie Tedeschi: The average revision to monthly payroll employment during the Biden Admin–from 1st estimate to final/latest–was -0.05%. For the 1st Trump Admin, it was -0.10% (same including the pandemic or not). These are both small revisions, but the “overstatement” was greater under Trump.

The second way this is backwards is that low numbers mean you can get the Fed to lower interest rates, which is what Trump wants, so he should welcome that.

Nick Timiraos: Treasury Secretary Scott Bessent suggested that the Fed should consider cutting interest rates by a half percentage point at its September meeting in light of recent labor-market data showing a slower pace of job growth

Bharat Ramamurti: The job numbers are all fake and Mr. Trump’s economy is actually BOOMING! But also the Fed should cut by 50 bps despite accelerating inflation because the job market is so bad.

This is very much not an isolated incident. The Trump Administration is cutting our ability to measure things across the board.

It is easy to say to all of this ‘oh this is at this point entirely unsurprising’ or dismiss it as unimportant. I believe that would be a mistake. This type of action is a big deal, and falls into the list of things you absolutely never do. Do not shoot the messenger, violate a flag of truce or break guest right. Ever. If you do, The North Remembers.

As in, recently I finally got around to watching The Godfather, and it became clear that everyone involved expected everyone in their culture to go around shooting messengers (and shooting people at peace talks), which is when I lost the ability to sympathize with the characters.

Colin Grabow offers a central thread outlining the forces keeping the Jones Act in place and how they work to prevent America from shipping goods between ports. If you’re following Balsa then you know most of this already.

There exist true things that are forbidden to talk about.

There also exist a lot of false things that are forbidden to talk about.

Peter Boghossian: One deliverable from Peter Thiel’s talk: If it’s forbidden to be spoken about, it’s likely true.

Emmett Shear: We taboo all kinds of claims, and only some of them are true. For example, claiming the earth is flat will get you excluded from society, but that doesn’t make it true. If only finding truth was as easy as inverting taboos!

Zac Hill: I mean that’s just straightforwardly not the case right? Also who is doing the forbidding and why are people running around like simps preoccupied about what is and isn’t sanctioned? “Ahh the Arian Heresy is obviously 100% factually accurate.” Just seems like a red herring idk.

Daniel Eth: This is very obviously false to anyone who thinks about it for more than a couple seconds. There’s a related point that EVEN THOUGH most things that are “forbidden” to be spoken about are false, we should taboo less b/c SOME are true and important. But that ain’t this

Arthur B: There’s sadly a cohort of people who defend hyperbole or outright nonsense so long as the direction is correct because it makes the message punchier, as if the rhetorical end justified the means. But such discourse habits destroy the commons.

Paul Graham: It’s not true that if you can’t say something, there must be a kernel of truth in it. It’s trivially easy to think of counterexamples.

What’s the best model of being unable to ‘work your way up from the mailroom’?

Here is one attempt.

Byrne Hobart: One way to frame this is to ask what would have to happen to have a modern Sidney Weinberg-style career, which is mostly a list of what would have to not happen. He’d have to:

  • Avoid finishing high school.

  • Avoid taking any standardized test.[1]

  • Keep his early business hustle under wraps.[2]

  • Avoid college.

  • Not find a company where there’s a career track that starts at “unskilled worker earning subsistence wages” and somehow has a path to the top.

Another way to say that is that you only get Sidney Weinberg stories when the market for talent is fairly inefficient.

But you can flip that around and give it a grim corollary: the measure of how efficiently talent is allocated in a society is how young you are when your dreams are crushed. A world where 99.9th percentile talent immediately gets snapped up by whichever employer can make the best use of that talent is one where 99.8th percentile people learn early on that they just don’t have what it takes.

There is still a path for dropouts with few legible skills to work their way up to the top of a Fortune 500 company: start at the top, and stick around until your company is on the Fortune 500.

I think this is mostly a case of romanticizing a path that was never great in the first place. It’s not that it is impossible to ‘work your way up’ in this fashion, if you actually are good enough that you would deserve it, it’s that if you could impress enough to actually pull it off working your way up then you have much better paths, with or without going through college. That’s also largely about the great news that we have much better skill and reputation transfer, so you’re not permanently at the mercy of the firm and your boss.

I also very much don’t think it means your dreams die quickly if you are ‘only’ 99th percentile or 99.8th percentile talent. A hypothetically perfect sort where relative talent is static would do that, but neither half of that is true. Nor do you get locked out of most ‘dreams’ worth having if you get somewhat off track. There are certainly some that do have strict tracks, but they are that way because they are oversubscribed and mostly generic dreams and even then you mostly have redraws if you care enough.

John Wentworth offers Generalized Hangriness: A Standard Rationalist Stance Towards Emotions. Being angry because you are hungry means your anger is ‘wrong’ in its explicit claims, but it contains the useful information that you are hungry. Thus, the correct stance towards experiencing an emotion is to ask what information it actually provides you. A strong emotion is trying to tell you something is important, but you have to figure out what is the proper something.

Elizabeth: For readers who need the opposite advice: I don’t think the things people get hangry about are random, just disproportionate. If you’re someone who suppresses negative emotions or is too conflict averse or lives in freeze response, notice what kind of things you get upset about while hangry- there’s a good chance they bother you under normal circumstances too, and you’re just not aware of it.

Similar to how standard advice is don’t grocery shop while hungry, but I wouldn’t buy enough otherwise.

You should probably eat before doing anything about hangry thoughts though.

Benquo: Unless you’ve observed that you tend to unendorsedly let things slide once you’re fed. In that case, better do something about the problem while you’re hangry.

I would generalize this even further than Ben Pace does here:

Ben Pace: This rhymes with how one treats feature recommendations from users. It is typically the case that a user advising you to make a change does indeed have a problem when using your product that they’re trying to solve, and you should figure out what that problem is, but their account of how to solve it (what ‘improvement’ to make) is usually worth throwing out the window.

Emotions also have practical effects beyond their information content, so you want to watch out for and optimize those as well. One aspect John does not get into is that you need not take your emotional responses as givens.

John Wentworth also notes that his empathy is rarely kind, that trying to imagine things from someone else’s perspective can easily lead to the exact opposite of empathy if you would then view their decisions, in particular their lack of effort or willingness to apply effort to fix things, with disgust. Several comments point out that this could be seen as a failure to model their actual cognitive state, but why should we presume that should lead to empathy? The general case version of this resonates with me quite a lot.

Diverse workforces do not seem to lead to greater (or lesser) profits, and the supposed McKinsey study people keep citing to claim the contrary is, as far as we can tell, fake.

Santi Ruiz: The McKinsey study that claimed diverse workforces lead to bigger profits was always fake (they won’t share data, it doesn’t replicate for the S&P 500 or other settings, and it doesn’t make sense). But fake social psych research is a demand problem, not just a supply problem.

I disagree that the finding doesn’t make sense. Like many things in social psych, you can tell a plausible story of effects in either direction, or of no effect.

A firsthand report of a jury trial (for molestation) in Georgia.

True story:

Patrick McKenzie: I think many people would be surprised at the difficulties billionaires have in converting money into smart people and/or their outputs.

Casey Handmer: It is so hard that for essentially anything non trivial it still has to be done personally.

The examples of this are too numerous to count. Musk’s companies could not have succeeded unless he was in the driver’s seat for much of the time. By contrast, Google X, Virgin rockets, Blue Origin all had the best people and tech that money could buy – but it wasn’t nearly enough.

I see this on an almost weekly basis now. Anything sufficiently interesting is not fungible in money. The supply is extremely inelastic.

You must have an army of stringently curated and boldly led mechanical engineers.

Paper offers a bizarre thesis, that algorithmic collusion between sellers on a platform like Amazon helps consumers because they collude to lower advertising costs and this outweighs the effect of colluding directly on price. I notice my skepticism because if within-platform ads raise less revenue the platform should reclaim those costs via higher commissions, which should raise prices by the same amount. I note that o3 thought that there wasn’t room for Amazon to do this, but that’s weird.

Did the UK’s dominance fail because of emigration away from the home islands? The argument here is that developed economies don’t diverge that much on GDP per capita, but I don’t think this means the UK would have kept similar GDP per capita in the alternative world, especially when we’re not talking about the margin but about 200 million people living there. The OP admits those people staying home would have been a loss of welfare, but I also assume it would have made the UK a lot poorer, and that population would largely have balanced out in other ways.

A much better and simpler story is that the UK home islands simply didn’t have enough land and natural resources, which is why there was so much emigration in the first place? Europe was never going to be able to sustain its economic advantages indefinitely.

And of course in recent times, the UK has been dying mostly of self-inflicted wounds, such as effectively banning the construction of housing, and now the saying of words.

An excellent point:

Byrne Hobart: An interesting corollary to this is that the more words it takes for someone to explain a concept to you, the greater the proportion of jobs you’ll dismiss as “bullshit jobs.” I’ve observed this, too!

Note that the jobs here could be described in three words just fine, all you have to do is lose a little detail, on the level that ‘I catch fish’ simplifies being a fisherman. He doesn’t catch all fish everywhere, after all.

  1. Software sales analyst.

  2. Improve automated capabilities.

  3. Create blockchain recorders.

Getting it down to three words is largely about negative space. Observe these job descriptions I brainstormed quickly:

  1. Sit at desk.

  2. Let boss yell.

  3. Fetch the coffee.

  4. Pitch investors.

  5. Diversity training monitor.

  6. Cash the check.

Also some good ones in the replies, like ‘I send emails’ or ‘creating shareholder value.’

Also note that it’s ‘if you can’t do it, it’s bullshit’ not ‘if you can do it, it’s not bullshit.’

Why does the trick still mostly work? Because the fact that you have a bullshit job predicts not that you can’t describe it in three words, but that you will choose not to.

Polymarket is on its way back to (being fully legal in) America, baby!

Shayne Coplan: Polymarket has acquired QCEX, a CFTC-regulated exchange and clearinghouse, for $112 million.

This paves the way for us to welcome American traders again.

I’ve waited a long time to say this:

Polymarket is coming home 🇺🇸🦅

Owning a DCM and DCO will let us serve all American traders and brokerages.

This acquisition isn’t just about a license; it’s Polymarket’s homecoming, returning stronger and ready to serve American users once again.

The best part about this is that this comes on the heels of the BBB plausibly making professional sports betting essentially illegal in America, since you can only deduct 90% of losses while being taxed on 100% of gains. If that is applied to individual wagers, then no one has an edge big enough to overcome it, so gamblers would have to either give up the gambling or give up on paying their taxes.
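To make the tax math concrete, here is a minimal sketch of the arithmetic (my own illustration; the function name and dollar figures are hypothetical, not from the source):

```python
# Under the 90%-loss-deduction rule, only 90% of gambling losses are
# deductible, and (as before) losses are deductible only up to winnings.
def phantom_income(gross_wins: float, gross_losses: float) -> float:
    """Taxable gambling 'income' when only 90% of losses count."""
    deductible = min(0.9 * gross_losses, gross_wins)
    return gross_wins - deductible

# A high-volume pro who wagers heavily and exactly breaks even:
# $5M of gross wins against $5M of gross losses.
print(phantom_income(5_000_000, 5_000_000))  # 500000.0 of taxable "income" on zero profit
```

At any realistic volume that phantom income dwarfs a professional’s actual edge, which is the sense in which the rule makes the profession untenable.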

But if you buy a sports futures contract under CFTC rules, then you get normal tax treatment, and you’re back in business.

This could all end up being a blessing in disguise. The current licensed sportsbooks in America offer highly non-competitive pricing, focus on pushing you towards predatory behaviors and products, and aggressively limit winners. Once Polymarket gets sufficient liquidity, trading there is remarkably cheap, and you are naturally pulled towards behaviors that have little cost even if you are betting at random.

However bad you think companies like FanDuel are, they’re worse.

Ryan Butler: FanDuel reports 16.3% sportsbook gross gaming revenue margin in June, the highest mark in company history

This is roughly triple Nevada sportsbooks’ historic hold percentage from before FanDuel launched its book in 2018.

There is no way to make 16.3% profit on wagers in general without being deeply, deeply predatory, even if all of your customers are suckers. Someone betting a normal NFL line fully at random only loses 5%.
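The rough arithmetic behind that ~5% figure, assuming the standard -110 pricing on point spreads (my own back-of-envelope, not from the source):

```python
# A standard NFL spread bet is priced at -110: stake 110 to win 100.
# A random pick against the spread wins half the time.
stake, win_payout = 110.0, 100.0
expected_profit = 0.5 * win_payout - 0.5 * stake  # -5.0 per bet
hold = -expected_profit / stake                   # book's take per dollar wagered
print(f"{hold:.1%}")  # about 4.5%, i.e. the roughly 5% in the text
```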

Argentina’s salaries outgrow profits as share of GDP, despite the fact that real public sector wages have been falling.

Whenever there is a graph that blows your mind every time you see it, chances are good it turns out to need a correction. Despite that, the corrected graph is still rather mind-blowing.

The Rich: people have no idea what life was like before they were born

If a statistic or claim sounds absurd and wrong, you can check the sources. Often this reveals the whole thing was bogus. Thread has some examples, several of which I can confirm because I too checked the sources or otherwise know the story.

We could stop spending so much time at airports simply by not telling people to spend so much time at airports. Who is telling people to get there 2.5-3 hours before their flights? Why in the world? I am in the ‘never miss a flight’ camp, and even then one hour is fine if it is reliable (e.g. you are taking trains).

Remember those claims that gas stoves caused large increases in asthma cases? That study had a major conflict of interest and also didn’t hold up, once corrected the impact was not significant.

You can identify outlier people by noticing you cannot predict what they are going to say next. That is not always good, but it often is very good. Whereas most people rarely break out of predictable scripts. Which in many circumstances is also good.

The theory that all the abundance and YIMBY progress can largely thank the MCU version of Thanos, as in Marvel finally making a Population Bomb Guy the villain.

Whereas yes, a large portion of children’s media has been, for decades or more, straight-up eco propaganda, telling kids that the ultimate evil is humans wanting to build and do things, or even wanting to exist.

Roman Helmet Guy: “Remember that the Earth’s resources are limited. You do not need to have a big family, because all the world’s people are your brothers and sisters.” You live in the most propagandized society in history.

Joe Lonsdale: A top kids’ show for much of the ‘90s had Malthusian / anti-natalist, globalist nonsense alongside its eco proselytizing.

It’s not a coincidence; if you go into its main backer Ted Turner’s office, a huge painting has his head in the sky, nearby a US flag turned into a UN flag.

Charles Fain Lehman: As my older son moves on from picture books, it’s stunning to me how much children’s media is just non-stop eco propaganda. “Humans are bad for the earth, you should feel guilty about this” is the constant message.

Matthew Yglesias: Paisley Paver did nothing wrong.

It is getting a lot easier to avoid. There is so much to choose from, so you don’t get whatever is on broadcast TV forced upon you, and similarly you can filter the books, and also the broader marketplace seems to be pulling things back. It’s still rough out there.

I love that yes, Trey Parker and Matt Stone can indeed keep getting away with this, and I love that Trump’s response to being attacked like this was to accuse the left of hypocrisy for being happy about it. That’s the spirit.

Megan McArdle on the cancellation of Stephen Colbert’s The Late Show as reflecting the loss of shared culture. She oddly ties this to the extra 99 minutes a day we don’t leave the house, which historically was how people ended up watching late night, but now we watch more tailored content. Which in general is an improvement.

I do think there have been some fantastic late shows that I was happy to watch, in particular Taylor Tomlinson’s After Midnight and previously Craig Ferguson’s Late Late Show, or early Daily Show and Colbert Report, but I found most late shows bad and essentially unwatchable. That includes Colbert’s Late Show run, and I’m actually really happy for him to get a new show or podcast instead where he can do more interesting things. Free Colbert, as it were.

The Panama Playlists, see what various people listen to. Remember that Spotify playlists are public by default.

You can buy nonrefundable vacations from other people at a discount, typically 20%-30%, sometimes more, especially with a last-minute sale. According to WSJ’s Mark Ellwood the top sites that do this are legit and guard against fraud, pointing to SpareFare, Roomer, Plans Change and Transfer Travel, and on the high end Eluxit.

A discount does not mean a ‘good deal.’ Vacation markets are super duper inefficient. But also these are going to mostly be forced sellers, without natural buyers, and buyers might have gotten discounts to begin with by booking in advance, so if you can figure out what is a good deal (use AI for this?) you can probably find pretty good bargains.

The best part is that you have to buy one of a small number of particular packages. You avoid choices, and as we all know Choices Are Bad. Instead of comparing this vacation to all possible choices, and sweating planning and decisions, you take what is available and you show up and that is it. If something isn’t a great fit for your preferences, you have an excuse to go outside your comfort zone and you don’t feel like you punted. It actually sounds nice.

Thiccy Thot calls this ‘the jackpot age,’ with people not valuing survival or optimizing for mean results like they should, and urges people not to chase jackpots. As an illustration he offers this game, which is effectively a St. Petersburg paradox variant. The EV on each flip is great but the more you flip the more likely it is that you lose.

Assuming I cannot hedge the flip and no tax implications? I do notice I am past the point where I would flip, the marginal value of money is declining too rapidly.
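To make the ‘positive EV on every flip, but you still probably lose’ dynamic concrete, here is a minimal simulation. The payoffs (heads multiplies your wealth by 1.8, tails by 0.5) are hypothetical stand-ins, not the actual numbers from the thread; they are chosen only so that each flip has expected multiplier 1.15 while the median multiplier per flip is about 0.95, so the typical player shrinks toward zero even as the average balloons.

```python
import random

# Hypothetical payoffs chosen to illustrate the pattern, not the actual game:
# heads -> wealth * 1.8, tails -> wealth * 0.5.
# EV per flip: 0.5 * 1.8 + 0.5 * 0.5 = 1.15 (> 1, so the mean grows),
# but the median multiplier is sqrt(1.8 * 0.5) ~= 0.95 (< 1, so the
# typical path shrinks as flips accumulate).

def simulate(flips: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Return (mean final wealth, fraction of trials ending below 1.0)."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        wealth = 1.0
        for _ in range(flips):
            wealth *= 1.8 if rng.random() < 0.5 else 0.5
        finals.append(wealth)
    mean = sum(finals) / trials
    losers = sum(1 for w in finals if w < 1.0) / trials
    return mean, losers

if __name__ == "__main__":
    mean, losers = simulate(flips=50, trials=100_000)
    print(f"mean final wealth: {mean:.1f}")
    print(f"fraction ending below starting wealth: {losers:.0%}")
```

Run it and the sample mean comes out far above 1 while roughly three quarters of players end below where they started: the mean is carried almost entirely by a handful of jackpot paths, which is exactly why optimizing for mean results here means chasing jackpots.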

Should Magic: The Gathering emergency ban either Agatha’s Soul Cauldron (galaxy-brained move) or Vivi Ornitier (safe and obvious play), or accept that all the good players in Standard will be playing the same deck until the next window?

There is a long history in Magic of players discussing the need for emergency bans, and then mostly not getting such bans, as Wizards has placed very high value on sticking to its announcement windows outside of true emergencies. They’ve shown time and again they’d rather let Standard wither and be terrible for months on end. Usually there is a lot of talk about letting the players find a solution, long after it is clear that there exists no solution.

I have long disagreed with this policy. I disagree with it even more today, as information is found and spreads faster and there is tons of statistical data. Drop the ban hammer. Do it now.

Chess.com has a team of 30 people that bans 100,000 accounts per month for cheating and unfair play; 40% of those accounts get banned within their first two weeks. The article presumes this means they are doing a good job catching cheaters, but even if you assume minimal false positives that is not obvious. If we were doing a better job catching cheaters, presumably people would be doing it less?

Optimization for thee but not for me, I insist:

Jorbs: looked something up about a game and someone posted that you should resolve an issue a certain way unless “you enjoy making suboptimal decisions” and that is such a funny thing for a human who spends their time answering rules questions on forums for board games to write.

Clair Obscur: Expedition 33 continues to go well as I move into Act 3, despite some frustrating design mistakes.

One that rather annoys me: at some point (not a meaningful spoiler) there is a character the game tells you that you need to have in your party, or they won’t learn their skills, similar to for example a Blue Mage in Final Fantasy V. I find this really annoying because that’s not who I enjoy having in the party on an aesthetic level, but it feels bad missing out, and even worse not knowing whether any given battle is a place you would miss out. Grr. I’ve mostly decided I don’t care.

Even if you ignore that issue, the way upgrades work, both with Color of Lumina and weapon upgrades, effectively locks you into a party. I chose Lune and Maelle because I find that fun and more central to the plot. I’m happy with my choices but sad that the game punishes experimentation like this.

Another main complaint is that balance is often lacking. Decisions that should be interesting instead feel forced. There’s also a big ‘too awesome to use’ problem with certain resources, especially Color of Lumina.

My biggest complaint is that it is very easy to get turned around, or for it to be otherwise unclear how to move on to the next area. Several times I have been extremely frustrated and effectively stuck, including right now as I type this inside the monolith. I am fine with navigation as an interesting puzzle or decision, but this does not feel like that.

There are a bunch of things in Act 3 that are deeply confusing or ridiculous, but all of them seem highly optional. If you want to go completionist that’s your call.

I think I largely buy this argument that RPGs where each character has one fixed job have big advantages over RPGs where you choose your class. They can do a lot more fun customization.

Itch.io apologizes after, to satisfy its payment processor, nuking thousands of NSFW games with no notice. Those who have purchased the games report they cannot download them and no refunds are being offered, although itch.io claims they can still be downloaded. Payouts are halted. Itch.io claims the delistings will mostly be temporary and can be individually cured once they get their new house in order.

Then it turns out Stripe is only clamping down because their own banking partner is threatening to clamp down on Stripe, and they are themselves seeking a way out.

To summarize, this keeps happening:

To be fair to itch.io, they are over a barrel and do intend to bring the banned games back. They are looking for a new payment processor as a way out.

It seems Russ Vought might be behind this push, including a general push to effectively ban pornography?

Universes Beyond is doing amazingly well for Magic: The Gathering. Final Fantasy made them $200 million in one day and did far better than any previous set, so much so that they could not meet demand. I didn’t love the flavor details of many of the cards, but clearly the market disagrees or cares little, and everyone says the limited format is great.

Even Lord of the Rings took six months to get to that point, and that set is still selling several years later. Spider-Man is up next. They see Japan as a ‘potential gold mine’ for more material.

Perhaps this was always the endgame for Magic. We had decades of our own storylines and worlds, but once Magic went sufficiently big and mainstream and moved away from competition and two-player games towards Commander, being the meta-IP for all of fantasy (and perhaps beyond it) makes too much sense, and it will only feed on itself until and unless it wears the product out.

This also seems like a solution for running out of design space. There’s no shame in it after three decades. Magic has mostly fully mined the simple stuff that works, forcing complexity to drift higher and the mechanics that work are getting continuously recycled, even if they get new names. If you want to have higher complexity and repeat mechanics forever, top down is where it is at.

Boen seems largely correct here:

Boen: we used to joke about this, but gambling mechanisms, metagame progression, interaction extenders, timewasting filler etc has all become so commonplace that most people genuinely just think that that’s what videogames are now & get confused/angry when you say that stuff is bad.

“metagame progression” is specifically like call of duty where a leveling system strings players along with little upgrades to keep playing. many stop after reaching max lvl, which betrays the fact that the “real game” is unfortunately shallow and boring absent external incentive

Andre Treiber: I’m with you on a lot of these, but I actually really enjoy meta progression as a mechanic. I really enjoy roguelite experiences and unlocking new things and making old challenges grow trivial is a rewarding part of the gameplay.

Boen: Yeah there’s some subtlety here. I think that the type of thing people mean by “meta progression” in the context of roguelikes and stuff like that, is actually much more akin to regular game progression, not meta at all, which is of course a pillar of game design & not a problem.

Roguelite metagame progression can be very good. I especially like it when you are unlocking additional abilities over time while you are not close to winning the run, and when the amount of progression you make determines what you unlock and is part of strategic decision making.

What annoys me quite a bit are situations in which you are reliably winning runs, there are higher difficulties that would be interesting, and the game wastes a bunch of your time getting to them. The worst version of this is when you are winning your runs but also unlocking capabilities faster or almost as fast as the extra difficulty kicks in, so the game doesn’t get harder for a long time and you’re skilling up on top of that. The central example of this I remember is Roguebook.

The other stuff is really terrible. The thing is, you could simply not do these things? Unless you are monetizing via microtransactions, there is no real advantage to keeping a player playing Call of Duty for 100 hours instead of 50 hours while not having any fun. Many games use these techniques without the microtransactions. Stop it.

As per Manifold, current expectations are for roughly 600k Waymo rides per week by EOY 2025, and perhaps 1.5 million per week by EOY 2026. I’m definitely sad we cannot go faster.

Boston’s unions attempt to ban driverless taxis, because They Took Our Jobs. The statements at the debate were even more absurd than I expected, which is on me.

Timothy Lee: Mejia considered it “very triggering” for Waymo to use the term “driver” to describe a technology rather than a person.

City Councilor Benjamin Weber found it “concerning to hear that the company was making a detailed map of our city streets without having a community process beforehand.” He added that “it’s important that we listen when we hear from the Teamsters and others who feel as though they’re blindsided by this.”

“I think it’s important that we pause—sometimes we rush—and make sure everyone’s voice is heard before anything happens that we can’t turn back from and that protections are in place for our workers,” said City Councilor Erin Murphy.

The next day, Murphy announced legislation requiring that a “human safety operator is physically present” in all autonomous vehicles—effectively a ban on driverless vehicles. Given the near-unanimous hostility Waymo faced at the hearing, I wouldn’t be surprised if Murphy’s proposal became law in Boston.

There is good news elsewhere:

On the other hand, legislators in Washington DC and New York State have introduced bills to open the door to driverless vehicles—though it’s not clear if these bills will become law. Legislators in New Jersey, Maryland, and Virginia could also act on driverless vehicle technology in the next year or two.

Timothy Lee, who is an expert at following developments here, fears that blue states and cities might indeed ban self-driving cars, and we could get to a 2035 where red states have tons of autonomous vehicles and blue states and cities have none.

It is not impossible, and certainly some amount of delay is in general likely, but I think we are going to win this one rather easily over time. It is impossible not to notice how much Waymo has improved San Francisco and other cities, and how much more it will improve them when supply goes up and thus costs and wait times go down. The lifestyle impact is dramatic and I do not expect the public in blue cities to accept being left behind.

Alec Stapp: We should be much more explicit about the tradeoff here:

The Teamsters are demanding that we let thousands of people die in car crashes in order to protect their jobs.

Chris Freiman: When Teamsters try to block life-saving technology to protect their jobs.

Alec is not wrong about self-driving cars preventing deaths, yet I would prefer to not make that the main argument. Quite often the protectionist laws, including union rules and things that destroyed childhood in America, are imposed in the name of that same ‘otherwise people will die’ style of rhetoric. What we should focus on are the far more important and massive other transformative benefits.

The jobs that they are trying to ‘protect’ are worse than useless. We would be requiring that people be paid to sit in cars and drive them all day, mostly not enjoying doing this or otherwise benefiting, doing the task worse than the AI could, in order to justify a transfer of wealth to those people.

Adam Thierer, together with Mark Dalton, proposes federal-level regulation of self-driving, allowing Level 4-5 automated driving systems (ADS) nationwide under a new safety framework, via the extremely poorly named ‘America Drives’ act (since this involves America not driving, which is the entire point).

The parallels and contrast to the insane AI moratorium are obvious, with concerns about ‘patchworks of state and local laws’ and localities doing crazy things like requiring a human driver be present as per Boston above.

Here I am fully on board. We know what self-driving looks like and do not expect it to change in unexpected ways. We are creating a new federal standard and set of regulations that would work well. We have extremely strong evidence that expanding self-driving increases safety and saves lives. We also do not have to worry about existential or catastrophic risks, or that things could develop to a point where our mistakes could not be fixed once we notice them.

Whereas all these considerations go the other way with respect to AI.

This below would be quite the exciting area to cover. I am curious how they got (if they got?) permission to cover SFO and cross the bridges and such.

Tesla AI: Invites to our Bay Area ride-hailing service are going out now

I’ve taken a break, but I’m hoping to make my comeback as per usual in September. Football! Football! Football! Football! Football! Football! Football! Football! Football!

Trump reportedly is going to sign an executive order to limit NIL money for players? Funny how he thinks he can just order things to happen like this.

NFL teams are trying out bizarre angled kickoff strategies now that a touchback puts the ball all the way at the 35 yard line. This sounds like it would have been great last year too, which highlights how often there are big gains lying around that no one bothers to try and exploit until they are forced into it, even in places like the NFL.

Apple is trying to get the rights to Formula 1, bidding substantially higher than ESPN. Reportedly MLS regrets making a similar deal, as dealing with Apple causes fewer people to watch the games, which is a risk for F1 although I agree with Ben Thompson that no one was watching much of MLS anyway.

I suspect it works the other way around. More and more households are giving up cable. If you want to watch F1 and are told you need a cable package you would not otherwise get, that is super expensive. Apple TV is a lot cheaper and you can flip it on and off as needed.

Where I have a much bigger concern is the interface. It is absurdly terrible. They don’t give you an easy way to find what you want. When they do, they ruin it. I went to watch a Mets game on Apple TV and the icon for the game in question literally spoiled the final score, prominently, on purpose. Well, so much for that, and I think that played a substantial part in me giving up on watching the Mets this season.

The better question is, shouldn’t they be making a deal with Netflix? F1 has grown so much lately because of Drive to Survive. The synergies there seem fantastic. Netflix is optimizing for engagement and working on selling ads, and has a larger viewing base, so they should be happy to double down and match the $120m-$150m per year bid.

On the one hand, I stand by the claim that most of the moral panics about social media were directionally accurate. Social media wrecked quite a lot of things.

I agree that some of the accusations in hindsight went too far, and we should be skeptical of claims of the form ‘social media broke America,’ the same way you should be skeptical that television ‘broke America,’ or even that America is broken at all. I still think it is clear there are quite a lot of downsides at the personal and societal levels.

On the other hand, we should not dismiss the upsides. It really is much easier to meet, keep up with, talk with and coordinate with your friends. Our access to information of all sorts is vastly better if you know how to filter it. I couldn’t do what I do in this form without Twitter. They took a lot from us, but we got a lot in return.

A lot of this comes down to whether what they took from us was good, actually. Do you actually want to socialize with the people who happen to be physically proximate? Do you actually want to invest the required time?

Tenobrus: everybody’s in the replies like “but i found all my communities and friends with social media!!” sure but the counterfactual isn’t u being isolated, it’s returning to the era where irl socializing was common and easy and ur best friend was some guy u met in a park.

Social media is largely what *caused* all that to fail. It created the problem it’s now “solving” for you.

Tracing Woodgrains: I grew up in a strong community with many opportunities for in-person socializing with the normal, pleasant people around me.

As a result, I was desperately lonely until I found all the weird people like me online. My social life has been exceptionally good since.

There’s something to be said for that sort of community. A lot to be said for it, really! I was glad to grow up in it and there’s a lot about it that I miss.

But it could never have been anywhere near as good as niche online communities for the sorts of socialization I like.

Not here to call anyone “NPCs” or anything. I just have niche interests and basically only those interests, and it turns out it’s much easier and more pleasant to find conversations with people who share those interests online than in-person

Since I moved in, I’ve had a chance to meet a handful of our neighbors in this building. They are, without exception, lovely people who would be good friends. And yet I do not invest time in trying to hang out with them, because even though they are great and we live in the same place, we have little in common (with one exception, someone I know from elsewhere, where I do want to make the time but it hasn’t worked out yet).

I once met my (for about 10 years) best friend in the park. But that was because the park had a chess club.

I’m still very interested in connecting with other families with kids, to help my kids make friends. And sometimes there’s a great match. But mostly? I’m good. Yes, social media caused the old system to fail. Largely that’s because the old system wasn’t great, especially for unusual people. The thing is, it still beats the hell out of nothing, or not having any friends at all.

TikTok, the illegality of which we really should enforce, to its credit institutes a Community Notes style feature called Footnotes.

I continue to be on the ‘For You is bad tech never use it’ side.

David Manheim: Update: [Twitter’s For You feed] does provide additional interesting tweets, but I’ve decided that the cost (both engagement-maximizing bullshit rabbit holes and opportunity costs of what I’d be looking at in my other lists) is much too high.

Der B: Using “not interested in that post” for engagement maximizing bullshit solves this, in my experience.

David Manheim: I tried that for a month, alas. You can push against specific types, which works a bit, but the system keeps working to find some new type of discussions to hook you.

The algorithm is not your friend. Ultimately it is your enemy. Do not engage.

Only 7% of time on Instagram and 17% of time on Facebook involves consuming content from friends. They have become primarily few-to-many sharing businesses, with an emphasis on video. The question is, how much is that displacing consuming friend content versus supplementing it?

I don’t love the dynamic that following someone on Twitter or elsewhere does several things at once. In addition to you seeing their posts and interactions, it is social proof for the person you follow, and also informs others about you.

David Manheim: The impact of following someone on twitter is not only to show you their content, but also (critically) it functions as social proof about them to others who follow you.

In other contexts I’ve seen various dramas around who does or does not follow, unfollow, or block whom on Instagram, and I presume there is fear of being caught following someone who was cancelled; it is this whole to-do. Luckily in the Twitter worlds I know this gets ignored except for very high profile follows like Elon Musk.

There are many legitimate problems to have with OpenAI but Elon Musk instead has chosen to be in a glass house throwing stones towards a brick one.

Stop deboosting links on Twitter. Especially stop penalizing Substack, including any mention of the word ‘substack.’ Then maybe we can talk.

Elon Musk: Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation. xAI will take immediate legal action.

Meta AI has also briefly been #1 in the app store, so quality is also unnecessary.

Sam Altman: This is a remarkable claim given what I have heard alleged that Elon does to manipulate X to benefit himself and his own companies and harm his competitors and people he doesn’t like.

Lots has been said about this, here is one thing [that points out Elon Musk created a special system for showing you all his Tweets first.]

I hope someone will get counter-discovery on this, I and many others would love to know what’s been happening. But OpenAI will just stay focused on making great products.

Nikita Bier: Perhaps it is you who is manipulating your products to your benefit, by putting warnings on every link to a competitor?

Dve Hazarika: I agree with Nikita that apps should not manipulate behaviors to discourage linking to other sites.

jpa: they do this for every url sent in the chat text.

And of course:

xlr8harder: Fix the specifically targeted Substack links before you point any fingers.

Andrew Rettek: STOP DEBOOSTING LINKS ON [TWITTER].

Also, when you see things like this from a highly Musk-aligned account with 1.5 million followers

You start to wonder how much of this is a pure expression of dominance play, indicating a loss of any connection to reality or to the idea that words have meaning.

There is a sense in which, because Elon Musk helped create OpenAI, ‘I am your father’ is not a crazy statement to make. But then that’s… bad, right? Since Elon Musk is constantly saying Sam Altman and OpenAI are terrible? And this metaphor kind of makes Elon Musk into Darth Vader?

I agree this could be a growing worry, but let’s not get ahead of ourselves:

Sully Omar: Good chance the internet becomes *extremely* closed in the next few years.

We already see it with x/openai/anthropic (and most social media companies) where they actively discourage other companies from sharing/using links etc to other platforms.

And it likely will get worse as the platform does not want external agents using their data

Why? Because a competitor could create a better ai agent, and the user would never touch that platform again, and that’s pretty bad for business

so instead of making a better product, it will either charge for the data (making it $$), or just make their apis significantly worse/ shut off external access directly

Either way sucks for consumers if we go down this path.

Then again, we are also seeing MCP connectors and AI by default will make integration and automation and getting around any and all of this easier. What OpenAI and Anthropic do to ‘discourage’ clicking on links is at most very light, and those links are links they are choosing to provide. I just did a GPT-5-Thinking query that provided links, and I clicked on the links, and it was fine.

It is a dangerous game to shut AI agents out of your website. Right now, the AI agents are not good enough, so I would shrug and go to your website manually. But a year from now, I am probably going to be using AI agents to book flights and make basic purchases and so on. What happens if Orbitz shuts me out and Kayak doesn’t, or vice versa? The answer isn’t ‘oh then I will be forced to go to one manually.’

Even Amazon should worry about this. Currently my default is Amazon to buy things, but Amazon needs to outcompete all of AI search if they want to keep that position.

There are two distinct sources of restriction here. We have restriction of access, especially via API, to shut out AIs. And then we have restriction of exit, discouragement of links. The first could be forced upon many because the alternative lets you be eaten. I am very confident the second one is mostly a mistake, but a lot of people seem determined to repeat it.

To be fair to Musk and xAI, for all the problems I have with both how they handle Twitter and what they’ve done with Grok, Grok does seem to take the whole ‘speak truth’ thing rather seriously when it isn’t being directly fed a particular ‘truth.’

Aidan McLaughlin (OpenAI): i think it’s under-discussed how grok really is a pretty good truth maximizer. kudos to the team. training smart models is hard

Simeon: Yes! Congrats xAI team for maintaining a fairly truth seeking chatbot! I’d be really excited if xAI managed to keep improving on their core mission. As the informational footprint of these rising sapients grows, we’ll increasingly need it.

It also is rather telling that Elon’s response to claims he manipulates Twitter is to point out an example where it was not manipulated, as proof.

Here’s a funny question, do you need your bookstore to have cashiers in order for it not to simply be Amazon But We Made Everything Worse? I get the instinct but my answer is no, the thing you are being offered is the ability to browse the books and see how they’ve chosen to highlight them and lay things out, and perhaps sit and read. But I say this as someone who cannot remember the last time he’s set foot in a physical bookstore.

(Also at the link, there is appreciation for a bakery providing a free piano.)

The errors made in the Tea app that let it get hacked multiple times were even stupider than they appeared.

Right after writing about Tea I was informed they had a second far more damaging data breach, again due to a very stupid bug, this time exposing DMs and other activity on the site, in ways trivial to trace back to real identities. Given that has happened, I would treat Tea the way you would information that someone could leak or subpoena, as in I wouldn’t type anything into Tea you wouldn’t want on the front page of The New York Times.

Then someone went out and made TeaOnHer, as in Tea for men to spill about women. It got to number 2 in the lifestyle app section behind Tea. And then, of course, and let me tell you I feel bad for zero of the people involved in this:

Shoshana Weissmann: FINALLY true gender equality for men and women! This is what we have fought for!!!

ICYMI someone made a Tea app for men and it has basically no security and everyone’s IDs and info is leaked.

TechCrunch walked through the TeaOnHer leak, which they managed to speedrun in 10 minutes via an API they weren’t supposed to access that allowed unauthenticated access to user data. Yep. Then it proved remarkably difficult to tell the app about the flaw. This particular gaping hole has since been fixed.

There is a fun debate between ‘no human would be so stupid as to, this must have been vibe coding’ and ‘no AI would be so stupid as to do this, it can’t be vibe coding.’

For those in the first camp, I refer them to the Sixth Law of Human Stupidity, which states that if you say ‘no one would be so stupid as to’ then you know someone is indeed so stupid as to.

I do not understand how Apple’s Parental Controls could remain fully jailbroken by a constant text string for three years, while the time limit settings don’t work and the screen usage charts are inaccurate and so on. Complete failure. When combined with the debacle that was Apple Intelligence, one can’t help but wonder if Apple is largely a broken company without Steve Jobs.

Garry Tan: Steve Jobs was a real one.

Then again, we have this response saying that the MobileMe debacle happened because the team told everyone it wasn’t ready and asked for permission to trim features and were told no; then when it fell over they all got yelled at, which was highly demotivating to say the least. Who is to say? It is also possible that this strategy was effective in general even if it failed in that case.

I am still confused that many people actually think that Amazon, Uber and Netflix made America worse.

IYKYK.

Turner Novak: Anyone have her @? She’d be an incredible VC associate.

I mean that’s a lot but can’t we filter the messages? Prison seems like a lot.

The community has a note.

Don’t let the haters tell you different, this graph is awesome, and points out two important facts about the world.

First, that we need to build a lot more houses where people want to live.

Second, and this is important, if you want to buy a house you’ll probably have to work for more than a week to buy it.

Bernie Sanders: It is insane that in the richest country in the world, millions cannot afford a home and hundreds of thousands are homeless every night. We need major investments in affordable housing, not tax breaks for billionaires

I was this close.

Tracing Woods: the people I find trickiest to handle online are ones who are clearly not stupid and sometimes raise good points, but pair whatever they say with a lot of belligerent mockery. avoiding responding with substance feels like a dodge, responding tends to lead to trading unproductive barbs while thinking that surely if you explain it just a bit better they’ll get the point

easier when someone is obviously just nasty

there are four or five people who travel in the same spheres as me online who fit this category, and I tend to wind up having a bunch of increasingly frustrating exchanges before blocking and moving on. but it leaves a bad taste in my mouth every time

Paul Graham: Just block them. Life is too short. They never rise above moderately smart anyway.




Raspberry Pi intros new 5-inch $40 touchscreen for your next weird project

The folks at Raspberry Pi have announced a new touchscreen component for people using boards to create miniature touchscreen appliances: The 5-inch Raspberry Pi Touch Display 2 is a 720p IPS multi-touch screen that’s natively supported by the Raspberry Pi OS and includes mounting holes on the back to make it easy to build integrated all-in-one devices.

The new screen will cost $40 and is available starting today from Pi resellers like CanaKit, Vilros, and PiShop (though some of those retailers already list it slightly above the MSRP).

“Its capacitive touch screen works out of the box with full Linux driver support—no manual calibration required, no hunting through device trees, and no wrestling with incompatible touch controllers,” writes Raspberry Pi software CTO Gordon Hollingworth in the company’s blog post.

The 5-inch touchscreen is a smaller counterpart to the $60 7-inch Pi Touch Display 2 that the company launched late last year. The two screens have the same 720p resolution, but the 7-inch model has slightly wider viewing angles (85 degrees, compared to 80 degrees for the 5-inch screen). Both are compatible with all Pi boards from 2014’s Raspberry Pi 1 B+ onward—with the exception of the Raspberry Pi Zero—and they use power from the board’s GPIO header and a display signal delivered via a ribbon cable connected to the boards’ DSI port.


How a mysterious particle could explain the Universe’s missing antimatter


New experiments focused on understanding the enigmatic neutrino may offer insights.

An artist’s composition of the Milky Way seen with a neutrino lens (blue). Credit: IceCube Collaboration/NSF/ESO

Everything we see around us, from the ground beneath our feet to the most remote galaxies, is made of matter. For scientists, that has long posed a problem: According to physicists’ best current theories, matter and its counterpart, antimatter, ought to have been created in equal amounts at the time of the Big Bang. But antimatter is vanishingly rare in the universe. So what happened?

Physicists don’t know the answer to that question yet, but many think the solution must involve some subtle difference in the way that matter and antimatter behave. And right now, the most promising path into that unexplored territory centers on new experiments involving the mysterious subatomic particle known as the neutrino.

“It’s not to say that neutrinos are definitely the explanation of the matter-antimatter asymmetry, but a very large class of models that can explain this asymmetry are connected to neutrinos,” says Jessica Turner, a theoretical physicist at Durham University in the United Kingdom.

Let’s back up for a moment: When physicists talk about matter, that’s just the ordinary stuff that the universe is made of—mainly protons and neutrons (which make up the nuclei of atoms), along with lighter particles like electrons. Although the term “antimatter” has a sci-fi ring to it, antimatter is not all that different from ordinary matter. Typically, the only difference is electric charge: For example, the positron—the first antimatter particle to be discovered—matches an electron in its mass but carries a positive rather than a negative charge. (Things are a bit more complicated with electrically neutral particles. For example, a photon is considered to be its own antiparticle, but an antineutron is distinct from a neutron in that it’s made up of antiquarks rather than ordinary quarks.)

Various antimatter particles can exist in nature; they occur in cosmic rays and in thunderclouds, and are produced by certain kinds of radioactive decay. (Because people—and bananas—contain a small amount of radioactive potassium, they emit minuscule amounts of antimatter in the form of positrons.)

Small amounts of antimatter have also been created by scientists in particle accelerators and other experiments, at great effort and expense—putting a damper on science fiction dreams of rockets propelled by antimatter or planet-destroying weapons energized by it.

When matter and antimatter meet, they annihilate, releasing energy in the form of radiation. Such encounters are governed by Einstein’s famous equation, E=mc²—energy equals mass times the square of the speed of light—which says you can convert a little bit of matter into a lot of energy, or vice versa. (The positrons emitted by bananas and bodies have so little mass that we don’t notice the teeny amounts of energy released when they annihilate.) Because matter and antimatter annihilate so readily, it’s hard to make a chunk of antimatter much bigger than an atom, though in theory you could have everything from antimatter molecules to antimatter planets and stars.
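The bookkeeping here is easy to check. A minimal sketch, using rounded standard values for the electron mass and the speed of light, computing the energy released when a single positron annihilates with an electron:

```python
# E = (m_e + m_e) * c^2 for electron-positron annihilation.
M_ELECTRON_KG = 9.109e-31   # electron (and positron) rest mass, kg
C_M_PER_S = 2.998e8         # speed of light, m/s
JOULES_PER_MEV = 1.602e-13  # unit conversion

energy_j = 2 * M_ELECTRON_KG * C_M_PER_S**2
energy_mev = energy_j / JOULES_PER_MEV

print(f"{energy_j:.3e} J = {energy_mev:.3f} MeV")
```

The result, about 1.6 × 10⁻¹³ joules (roughly 1.02 MeV), is enormous per unit mass but imperceptibly small in absolute terms, which is why the positrons from a banana go unnoticed.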

But there’s a puzzle: If matter and antimatter were created in equal amounts at the time of the Big Bang, as theory suggests, shouldn’t they have annihilated, leaving a universe made up of pure energy? Why is there any matter left?

Physicists’ best guess is that some process in the early universe favored the production of matter over antimatter — but exactly what that process was is a mystery, and the question of why we live in a matter-dominated universe is one of the most vexing problems in all of physics.

Crucially, physicists haven’t been able to think of any such process that would mesh with today’s leading theory of matter and energy, known as the Standard Model of particle physics. That leaves theorists seeking new ideas, some as-yet-unknown physics that goes beyond the Standard Model. This is where neutrinos come in.

A neutral answer

Neutrinos are tiny particles without any electric charge. (The name translates as “little neutral one.”) According to the Standard Model, they ought to be massless, like photons, but experiments beginning in the 1990s showed that they do in fact have a tiny mass. (They’re at least a million times lighter than electrons, the extreme lightweights among normal matter.) Since physicists already know that neutrinos violate the Standard Model by having mass, their hope is that learning more about these diminutive particles might yield insights into whatever lies beyond.

Neutrinos have been slow to yield their secrets, however, because they barely interact with other particles. About 60 billion neutrinos from the Sun pass through every square centimeter of your skin each second. If those neutrinos interacted with the atoms in our bodies, they would probably destroy us. Instead, they pass right through. “You most likely will not interact with a single neutrino in your lifetime,” says Pedro Machado, a physicist at Fermilab near Chicago. “It’s just so unlikely.”

Experiments, however, have shown that neutrinos “oscillate” as they travel, switching among three different identities—physicists call them “flavors”: electron neutrino, muon neutrino, and tau neutrino. Oscillation measurements have also revealed that different-flavored neutrinos have slightly different masses.

Neutrinos are known to oscillate, switching between three varieties or “flavors.” Exactly how they oscillate is governed by the laws of quantum mechanics, and the probability of finding that an electron neutrino has transformed into a muon neutrino, for example, varies as a function of the distance traveled. (The third flavor state, the tau neutrino, is very rare.) Credit: Knowable Magazine
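The distance dependence described in the caption is often written in a simplified two-flavor approximation: P = sin²(2θ) · sin²(1.27 · Δm² · L / E), with Δm² in eV², the baseline L in kilometers, and the neutrino energy E in GeV. A minimal sketch of that formula — the mixing values below are illustrative round numbers, not measured DUNE parameters, though the 1,300 km baseline roughly matches the Fermilab-to-South-Dakota distance:

```python
import math

def oscillation_probability(l_km: float, e_gev: float,
                            sin2_2theta: float = 0.85,
                            delta_m2_ev2: float = 2.5e-3) -> float:
    """Two-flavor appearance probability: sin^2(2θ) * sin^2(1.27 Δm² L / E)."""
    phase = 1.27 * delta_m2_ev2 * l_km / e_gev
    return sin2_2theta * math.sin(phase) ** 2

# The probability rises and falls with baseline L at a fixed energy:
for l_km in (0, 325, 650, 1300):
    print(f"L = {l_km:5d} km  ->  P = {oscillation_probability(l_km, 2.5):.3f}")
```

At L = 0 the probability is zero, and it oscillates with distance — which is why experiments place detectors at both ends of a long baseline.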

Neutrino oscillation is weird, but it may be weird in a useful way, because it might allow physicists to probe certain fundamental symmetries in nature—and these in turn may illuminate the most troubling of asymmetries, namely the universe’s matter-antimatter imbalance.

For neutrino researchers, a key symmetry is called charge-parity or CP symmetry. It’s actually a combination of two distinct symmetries: Changing a particle’s charge flips matter into antimatter (or vice versa), while changing a particle’s parity flips a particle into its mirror image (like turning a right-handed glove into a left-handed glove). So the CP-opposite version of a particle of ordinary matter is a mirror image of the corresponding antiparticle. But does this opposite particle behave exactly the same as the original one? If not, physicists say that CP symmetry is violated—a fancy way of saying that matter and antimatter behave slightly differently from one another. So any examples of CP symmetry violation in nature could help to explain the matter-antimatter imbalance.

In fact, CP violation has already been observed in some mesons, a type of subatomic particle typically made up of one quark and one antiquark, a surprising result first found in the 1960s. But it’s an extremely small effect, and it falls far short of being able to account for the universe’s matter-antimatter asymmetry.

In July 2025, scientists working at the Large Hadron Collider at CERN near Geneva reported clear evidence for a similar violation by one type of particle from a different family of subatomic particles known as baryons—but this newly observed CP violation is similarly believed to be much too small to account for the matter-antimatter imbalance.

Charge-parity or CP symmetry is a combination of two distinct symmetries: Changing a particle’s charge from positive to negative, for example, flips matter into antimatter (or vice versa), while changing a particle’s parity flips a particle into its mirror image (like turning a right-handed glove into a left-handed glove). Consider an electron: Flip its charge and you end up with a positron; flip its “handedness”—in particle physics, this is actually a quantum-mechanical property known as spin—and you get an electron with opposite spin. Flip both properties, and you get a positron that’s like a mirror image of the original electron. Whether this CP-flipped particle behaves the same way as the original electron is a key question: If it doesn’t, physicists say that CP symmetry is “violated.” Any examples of CP symmetry violation in nature could help to explain the matter-antimatter imbalance observed in the universe today. Credit: Knowable Magazine

Experiments on the horizon

So what about neutrinos? Do they violate CP symmetry—and if so, do they do it in a big enough way to explain why we live in a matter-dominated universe? This is precisely the question being addressed by a new generation of particle physics experiments. Most ambitious among them is the Deep Underground Neutrino Experiment (DUNE), which is now under construction in the United States; data collection could begin as early as 2029.

DUNE will employ the world’s most intense neutrino beam, which will fire both neutrinos and antineutrinos from Fermilab to the Sanford Underground Research Facility, located 800 miles away in South Dakota. (There’s no tunnel; the neutrinos and antineutrinos simply zip through the earth, for the most part hardly noticing that it’s there.) Detectors at each end of the beam will reveal how the particles oscillate as they traverse the distance between the two labs—and whether the behavior of the neutrinos differs from that of the antineutrinos.

DUNE won’t pin down the precise amount of neutrinos’ CP symmetry violation (if there is any), but it will set an upper limit on it. The larger the possible effect, the greater the discrepancy in the behavior of neutrinos versus antineutrinos, and the greater the likelihood that neutrinos could be responsible for the matter-antimatter asymmetry in the early universe.

The Deep Underground Neutrino Experiment (DUNE), now under construction, will see both neutrinos and antineutrinos fired from below Fermilab near Chicago to the Sanford Underground Research Facility some 800 miles away in South Dakota. Neutrinos can pass through earth unaltered, with no need of a tunnel. The ambitious experiment may reveal how the behavior of neutrinos differs from that of their antimatter counterparts, antineutrinos. Credit: Knowable Magazine

For Shirley Li, a physicist at the University of California, Irvine, the issue of neutrino CP violation is an urgent question, one that could point the way to a major rethink of particle physics. “If I could have one question answered by the end of my lifetime, I would want to know what that’s about,” she says.

Aside from being a major discovery in its own right, CP symmetry violation in neutrinos could challenge the Standard Model by pointing the way to other novel physics. For example, theorists say it would mean there could be two kinds of neutrinos—left-handed ones (the normal lightweight ones observed to date) and much heavier right-handed neutrinos, which are so far just a theoretical possibility. (The particles’ “handedness” refers to their quantum properties.)

These right-handed neutrinos could be as much as 10¹⁵ times heavier than protons, and they’d be unstable, decaying almost instantly after coming into existence. Although they’re not found in today’s universe, physicists suspect that right-handed neutrinos may have existed in the moments after the Big Bang — possibly decaying via a process that mimicked CP violation and favored the creation of matter over antimatter.

It’s even possible that neutrinos can act as their own antiparticles—that is, that neutrinos could turn into antineutrinos and vice versa. This scenario, which the discovery of right-handed neutrinos would support, would make neutrinos fundamentally different from more familiar particles like quarks and electrons. If antineutrinos can turn into neutrinos, that could help explain where the antimatter went during the universe’s earliest moments.

One way to test this idea is to look for an unusual type of radioactive decay — theorized but thus far never observed—known as “neutrinoless double-beta decay.” In regular double-beta decay, two neutrons in a nucleus simultaneously decay into protons, releasing two electrons and two antineutrinos in the process. But if neutrinos can act as their own antiparticles, then the two neutrinos could annihilate each other, leaving only the two electrons and a burst of energy.
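In symbols, for a nucleus with mass number A and charge Z, the two modes described above can be written as follows (a standard textbook rendering, not specific to any one experiment):

```latex
\begin{align}
  \text{2$\nu\beta\beta$:} \quad & (A,\,Z) \;\to\; (A,\,Z{+}2) + 2e^- + 2\bar{\nu}_e \\
  \text{0$\nu\beta\beta$:} \quad & (A,\,Z) \;\to\; (A,\,Z{+}2) + 2e^-
\end{align}
```

In the neutrinoless mode, the two electrons carry the full decay energy, which is the telltale signature the detectors look for.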

A number of experiments are underway or planned to look for this decay process, including the KamLAND-Zen experiment, at the Kamioka neutrino detection facility in Japan; the nEXO experiment at the SNOLAB facility in Ontario, Canada; the NEXT experiment at the Canfranc Underground Laboratory in Spain; and the LEGEND experiment at the Gran Sasso laboratory in Italy. KamLAND-Zen, NEXT, and LEGEND are already up and running.

While these experiments differ in the details, they all employ the same general strategy: They use a giant vat of dense, radioactive material with arrays of detectors that look for the emission of unusually energetic electrons. (The electrons’ expected neutrino companions would be missing, with the energy they would have had instead carried by the electrons.)

While the neutrino remains one of the most mysterious of the known particles, it is slowly but steadily giving up its secrets. As it does so, it may crack the puzzle of our matter-dominated universe — a universe that happens to allow inquisitive creatures like us to flourish. The neutrinos that zip silently through your body every second are gradually revealing the universe in a new light.

“I think we’re entering a very exciting era,” says Turner.

This article originally appeared in Knowable Magazine, a nonprofit publication dedicated to making scientific knowledge accessible to all.

Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.


SpaceX reveals why the last two Starships failed as another launch draws near


“SpaceX can now proceed with Starship Flight 10 launch operations under its current license.”

SpaceX completed a six-engine static fire of the next Starship upper stage on August 1. Credit: SpaceX

SpaceX is continuing with final preparations for the 10th full-scale test flight of the company’s enormous Starship rocket after receiving launch approval Friday from the Federal Aviation Administration.

Engineers completed a final test of Starship’s propulsion system with a so-called “spin prime” test Wednesday at the launch site in South Texas. Ground crews then rolled the ship back to a nearby hangar for engine inspections, touchups to its heat shield, and a handful of other chores to ready it for liftoff.

SpaceX has announced the launch is scheduled for no earlier than next Sunday, August 24, at 6:30 pm local time in Texas (23:30 UTC).

Like all previous Starship launches, the huge 403-foot-tall (123-meter) rocket will take off from SpaceX’s test site in Starbase, Texas, just north of the US-Mexico border. The rocket consists of a powerful booster stage named Super Heavy, with 33 methane-fueled Raptor engines. Six Raptors power the upper stage, known simply as Starship.

With this flight, SpaceX officials hope to put several technical problems with the Starship program behind them. SpaceX is riding a streak of four disappointing Starship test flights from January through May, and the explosion and destruction of another Starship vehicle during a ground test in June.

These setbacks followed a highly successful year for the world’s largest rocket in 2024, when SpaceX flew Starship four times and achieved new objectives on each flight. These accomplishments included the first catch of a Super Heavy booster back at the launch pad, proving the company’s novel concept for recovering and reusing the rocket’s first stage.

Starship’s record so far in 2025 is another story. The rocket’s inability to make it through an entire suborbital test flight has pushed back future program milestones, such as the challenging tasks of recovering and reusing the rocket’s upper stage, and demonstrating the ability to refuel another rocket in orbit. Those would both be firsts in the history of spaceflight.

These future tests, and more, are now expected to occur no sooner than next year. This time last year, SpaceX officials hoped to achieve them in 2025. All of these demonstrations are vital for Elon Musk to meet his promise of sending numerous Starships to build a settlement on Mars. Meanwhile, NASA is eager for SpaceX to reel off these tests as quickly as possible because the agency has selected Starship as the human-rated lunar lander for the Artemis Moon program. Once operational, Starship will also be key to building out SpaceX’s next-generation Starlink broadband network.

A good outcome on the next Starship test flight would give SpaceX footing to finally take a step toward these future demos after months of dithering over design dilemmas.

Elon Musk, SpaceX’s founder and CEO, presented an update on Starship to company employees in May. This chart shows the planned evolution from Starship Version 2 (left) to Version 3 (middle), and an even larger rocket (right) in the more distant future.

The FAA said Friday it formally closed the investigation into Starship’s most recent in-flight failure in May, when the rocket started leaking propellant after reaching space, rendering it unable to complete the test flight.

“The FAA oversaw and accepted the findings of the SpaceX-led investigation,” the federal regulator said in a statement. “The final mishap report cites the probable root cause for the loss of the Starship vehicle as a failure of a fuel component. SpaceX identified corrective actions to prevent a reoccurrence of the event.”

Diagnosing failures

SpaceX identified the most probable cause for the May failure as a faulty main fuel tank pressurization system diffuser located on the forward dome of Starship’s primary methane tank. The diffuser failed a few minutes after launch, when sensors detected a pressure drop in the main methane tank and a pressure increase in the ship’s nose cone just above the tank.

The rocket compensated for the drop in main tank pressure and completed its engine burn, but venting from the nose cone and a worsening fuel leak overwhelmed Starship’s attitude control system. Finally, detecting a major problem, Starship triggered automatic onboard commands to vent all remaining propellant into space and “passivate” itself before an unguided reentry over the Indian Ocean, prematurely ending the test flight.

Engineers recreated the diffuser failure on the ground during the investigation, and then redesigned the part to better direct pressurized gas into the main fuel tank. This will also “substantially decrease” strain on the diffuser structure, SpaceX said.

The FAA, charged with ensuring commercial rocket launches don’t endanger public safety, signed off on the investigation and gave the green light for SpaceX to fly Starship again when it is ready.

“SpaceX can now proceed with Starship Flight 10 launch operations under its current license,” the FAA said.

“The upcoming flight will continue to expand the operating envelope on the Super Heavy booster, with multiple landing burn tests planned,” SpaceX said in an update posted to its website Friday. “It will also target similar objectives as previous missions, including Starship’s first payload deployment and multiple reentry experiments geared towards returning the upper stage to the launch site for catch.”

File photo of Starship’s six Raptor engines firing on a test stand in South Texas. Credit: SpaceX

In the aftermath of the test flight in May, SpaceX hoped to fly Starship again by late June or early July. But another accident June 18, this time on the ground, delayed the program another couple of months. The Starship vehicle SpaceX assigned to the next flight, designated Ship 36, exploded on a test stand in Texas as teams filled it with cryogenic propellants for an engine test-firing.

The accident destroyed the ship and damaged the test site, prompting SpaceX to retrofit the sole active Starship launch pad to support testing of the next ship in line—Ship 37. Those tests included a brief firing of all six of the ship’s Raptor engines August 1.

After Ship 37’s final spin prime test Wednesday, workers transported the rocket back to a hangar for evaluation, and crews immediately got to work transitioning the launch pad back to its normal configuration to host a full Super Heavy/Starship stack.

SpaceX said the explosion on the test stand in June was likely caused by damage to a high-pressure nitrogen storage tank inside Starship’s payload bay section. This tank, called a composite overwrapped pressure vessel, or COPV, violently ruptured and led to the ship’s fiery demise. SpaceX said COPVs on upcoming flights will operate at lower pressures, and managers ordered additional inspections on COPVs to look for damage, more proof testing, more stringent acceptance criteria, and a hardware change to address the problem.

Try, try, try, try again

This year began with the first launch of an upgraded version of Starship, known as Version 2 or Block 2, in January. But the vehicle suffered propulsion failures and lost control before the upper stage completed its engine burn to propel the rocket on a trajectory carrying it halfway around the world to splash down in the Indian Ocean. Instead, the rocket broke apart and rained debris over the Bahamas and the Turks and Caicos Islands more than 1,500 miles downrange from Starbase.

That was followed in March by another Starship launch that had a similar result, again scattering debris near the Bahamas. In May, the ninth Starship test flight made it farther downrange and completed its engine burn before spinning out of control in space, preventing it from making a guided reentry to gather data on its heat shield.

Mastering the design of Starship’s heat shield is critical to the future of the program. As it has on all of this year’s test flights, SpaceX has installed on the next Starship several different ceramic and metallic tile designs to test alternative materials to protect the vehicle during its scorching plunge back into Earth’s atmosphere. Starship successfully made it through reentry for a controlled splashdown in the sea several times last year, but sensors detected hot spots on the rocket’s stainless steel skin after some of the tiles fell off during launch and descent.

Making the Starship upper stage reusable like the Super Heavy booster will require better performance from the heat shield. The demands of flying the ship home from orbit and attempting a catch at the launch pad far outweigh the challenge of recovering a booster. Coming back from space, the ship encounters much higher temperatures than the booster sees at lower velocities.

Therefore, SpaceX’s most important goal for the 10th Starship flight will be gathering information about how well the ship’s different heat shield materials hold up during reentry. Engineers want to have this data as soon as possible to inform design decisions about the next iteration of Starship—Version 3 or Block 3—that will actually fly into orbit. So far, all Starship launches have intentionally targeted a speed just shy of orbital velocity, bringing the vehicle back through the atmosphere halfway around the world.

Other objectives on the docket for Starship Flight 10 include the deployment of spacecraft simulators mimicking the size of SpaceX’s next-generation Starlink Internet satellites. Like the heat shield data, this has been part of the flight plan for the last three Starship launches, but the rocket never made it far enough to attempt any payload deployment tests.

Thirty-three Raptor engines power the Super Heavy booster downrange from SpaceX’s launch site near Brownsville, Texas, in January. Credit: SpaceX

Engineers also plan to put the Super Heavy booster through the wringer on the next launch. Instead of coming back to Starbase for a catch at the launch pad—something SpaceX has now done three times—the massive booster stage will target a controlled splashdown in the Gulf of Mexico east of the Texas coast. This will give SpaceX room to try new things with the booster, such as controlling the rocket’s final descent with a different mix of engines to see if it could overcome a problem with one of its three primary landing engines.

SpaceX tried to experiment with new ways of landing the Super Heavy booster on the last test flight, too. The Super Heavy exploded before reaching the ocean, likely due to a structural failure of the rocket’s fuel transfer tube, an internal pipe where methane flows from the fuel tank at the top of the rocket to the engines at the bottom of the booster. SpaceX said the booster flew a higher angle of attack during its descent in May to test the limits of the rocket’s performance. It seems engineers found the limit, and the booster won’t fly at such a high angle of attack next time.

SpaceX has just two Starship Version 2 vehicles in its inventory before moving on to the taller Version 3 configuration, which will also debut improved Raptor engines.

“Every lesson learned, through both flight and ground testing, continues to feed directly into designs for the next generation of Starship and Super Heavy,” SpaceX said. “Two flights remain with the current generation, each with test objectives designed to expand the envelope on vehicle capabilities as we iterate towards fully and rapidly reusable, reliable rockets.”

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


Rocket Report: Ariane 6 beats Vulcan to third launch; China’s first drone ship


Why is China’s heavy-lift Long March 5B able to launch only 10 Guowang satellites at a time?

Wearing their orange launch and reentry spacesuits, Artemis II commander Reid Wiseman (bottom) and pilot Victor Glover (top) walk out of an emergency egress basket during nighttime training at Launch Complex 39B.

Welcome to Edition 8.06 of the Rocket Report! Two of the world’s most storied rocket builders not named SpaceX achieved major successes this week. Arianespace’s Ariane 6 rocket launched from French Guiana on its third flight Tuesday night with a European weather satellite. Less than 20 minutes later, United Launch Alliance’s third Vulcan rocket lifted off from Florida on a US military mission. These are two of the three big rockets developed in the Western world that have made their orbital debuts in the last two years, alongside Blue Origin’s New Glenn launcher. Ariane 6 narrowly won the “race” to reach its third orbital flight, but if you look at it another way, Ariane 6 reached its third flight milestone 13 months after its inaugural launch. It took Vulcan more than 19 months, and New Glenn has flown just once. SpaceX’s Super Heavy/Starship rocket has flown nine times but has yet to reach orbit.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Sixth success for sea-launched Chinese rocket. Private Chinese satellite operator Geespace added 11 spacecraft to its expanding Internet of Things constellation on August 8, aiming to boost low-power connectivity in key emerging markets, Space News reports. The 11 satellites rode into orbit aboard a solid-fueled Jielong 3 (Smart Dragon 3) rocket lifting off from an ocean platform in the Yellow Sea off the coast of Rizhao, a city in eastern China’s Shandong province. This was the sixth flight of the Jielong 3, a rocket developed by a commercially oriented spinoff of the state-owned China Academy of Launch Vehicle Technology.

Mistaken for a meteor … The fourth stage of the Jielong 3 rocket, left in orbit after deploying its 11 satellite payloads, reentered the atmosphere late Sunday night. The fiery and destructive reentry created a fireball that streaked across the skies over Spain, the Spanish newspaper El Mundo reports. Many Spanish residents identified the streaking object as a meteor associated with the Perseid meteor shower. But it turned out to be a piece of China’s Jielong 3 rocket. Any debris that may have survived the scorching reentry likely fell into the Mediterranean Sea.


Portugal green-lights Azores spaceport. The Portuguese government has granted the Atlantic Spaceport Consortium a license to build and operate a rocket launch facility on the island of Santa Maria in the Azores, European Spaceflight reports. The Atlantic Spaceport Consortium (ASC) was founded in 2019 with the goal of developing a commercial spaceport on Santa Maria, 1,500 kilometers off the Portuguese mainland. In September 2024, the company showcased the island’s suitability as a launch site by launching two small solid-fuel amateur-class rockets that it developed in-house.

What’s on deck? … The spaceport license granted by Portugal’s regulatory authorities does not cover individual launches themselves. Those must be approved in a separate licensing process. It’s likely that the launch site on Santa Maria Island will initially host suborbital launches, including flights by the Polish rocket company SpaceForest. The European Space Agency has also selected Santa Maria as the landing site for the first flight of the Space Rider lifting body vehicle after it launches into orbit, perhaps in 2027. (submitted by claudiodcsilva)

Why is Jeff Bezos buying launches from Elon Musk? Early Monday morning, a Falcon 9 rocket lifted off from its original launch site in Florida. Remarkably, it was SpaceX’s 100th launch of the year. Perhaps even more notable was the rocket’s payload: two dozen Project Kuiper satellites, which were dispensed into low-Earth orbit on target, Ars reports. This was SpaceX’s second launch of satellites for Amazon, which is developing a constellation to deliver low-latency broadband Internet around the world. SpaceX, then, just launched a direct competitor to its Starlink network into orbit. And it was for the founder of Amazon, Jeff Bezos, who owns a rocket company of his own in Blue Origin.

Several answers … So how did it come to this—Bezos and Elon Musk, competitors in so many ways, working together in space? There are several answers. Most obviously, launching payloads for customers is one of SpaceX’s two core business areas, alongside Starlink. SpaceX sells launch services to all comers and typically offers the lowest price per kilogram to orbit. There’s immediate revenue to be made if a company with deep pockets like Amazon is willing to pay SpaceX. Second, the other options to get Kuiper satellites into orbit just aren’t available at the volume Amazon needs. Amazon has reserved the lion’s share of its Kuiper launches with SpaceX’s competitors: United Launch Alliance, Arianespace, and Jeff Bezos’ own space company Blue Origin. Lastly, SpaceX could gain some leverage by providing launch services to Amazon. In return for a launch, SpaceX has asked other companies with telecom satellites, such as OneWeb and Kepler Communications, to share spectrum rights to enable Starlink to expand into new markets.

Trump orders cull of commercial launch regulations. President Donald Trump signed an executive order on Wednesday directing government agencies to “eliminate or expedite” environmental reviews for commercial launch and reentry licenses, Ars reports. The FAA, part of the Department of Transportation, is responsible for granting the licenses after ensuring launch and reentries don’t endanger the public, comply with environmental laws, and comport with US national interests. The drive toward deregulation will be welcome news for companies like SpaceX, led by onetime Trump ally Elon Musk; SpaceX conducts nearly all of the commercial launches and reentries licensed by the FAA.

Deflecting scrutiny? … The executive order does several things, and not all of them will be as controversial as the potential elimination of environmental reviews. The order includes a clause directing the government to reevaluate, amend, or rescind a slate of launch-safety regulations written during the first Trump administration. The FAA published the new regulations, known as Part 450, in 2020, and they went into effect in 2021, but space companies have complained that they are too cumbersome and have slowed down the license approval process. The Biden administration established a committee last year to look at reforming the regulations in response to industry’s outcry. Another part of the order that will likely lack bipartisan support is a call for making the head of the FAA’s commercial spaceflight division a political appointee. This job has historically been held by a career civil servant.

Ariane 6 launches European weather satellite. Europe’s new Ariane 6 rocket successfully launched for a third time on Tuesday night, carrying a satellite into orbit for weather forecasting and climate monitoring, Euronews reports. “The success of this second commercial launch confirms the performance, reliability, and precision of Ariane 6,” said Martin Sion, CEO of ArianeGroup, operator of the rocket. “Once again, the new European heavy-lift launcher meets Europe’s needs, ensuring sovereign access to space,” Sion added. It marks the second commercial flight of the rocket, which has been in development for almost a decade with the European Space Agency (ESA). It is significant as it gives Europe independent access to space and reduces its reliance on Elon Musk’s SpaceX.

Eumetsat returns to Europe … The polar-orbiting weather satellite launched by the Ariane 6 rocket this week is owned by the European Organization for the Exploitation of Meteorological Satellites, or Eumetsat. Headquartered in Germany, Eumetsat is a multinational organization that owns and operates geostationary and polar-orbiting weather satellites, watching real-time storm development over Europe and Africa, while feeding key data into global weather and climate models. Just last month, Eumetsat’s newest geostationary weather satellite launched from Florida on a SpaceX Falcon 9 rocket because of delays with the Ariane 6 program.

Rocket Lab isn’t giving up on 2025 yet. Rocket Lab continues to push for a first launch of its medium-lift Neutron rocket before the end of the year, but company executives acknowledge that schedule has no margin for error, Space News reports. It may seem unlikely, but Rocket Lab’s founder and CEO, Peter Beck, said in a conference call with investment analysts last week that the company has a “green light” schedule to debut the Neutron rocket within the next four-and-a-half months. There’s still much work to do to prepare for the first launch, and the inaugural flight seems almost certain to slip into 2026.

Launch pad nearly complete … Rocket Lab plans to host a ribbon-cutting at the Neutron rocket’s new launch pad on Wallops Island, Virginia, on August 28. This launch pad is located just south of the spaceport’s largest existing launch facility, where Northrop Grumman’s Antares rocket lifts off on resupply missions to the International Space Station. Rocket Lab has a small launch pad for its light-class Electron launcher co-located with the Antares pad at Wallops.

Chinese company reveals drone ship. The Chinese launch company iSpace has released the first photos of an ocean-going recovery ship to support the landings of reusable first-stage boosters. The company hosted a dedication ceremony in Yangzhou, China, earlier this month for the vessel, which looks similar to SpaceX’s rocket landing drone ships. In a press release, iSpace said the ship, named “Interstellar Return,” is China’s first marine rocket recovery ship, and the fifth such vessel in the world. SpaceX has three drone ships in its fleet for the Falcon 9 rocket, and Blue Origin has one for the New Glenn booster.

Rocket agnostic … The recovery ship will be compatible with various medium- and large-sized reusable rockets, iSpace said. But its main use will be as the landing site for the first stage booster for iSpace’s own Hyperbola 3 rocket, a medium-lift launcher with methane-fueled engines. The company has completed multiple vertical takeoff and landing tests of prototype boosters for the Hyperbola 3. The recovery ship measures about 100 meters long and 42 meters wide, with a displacement of 17,000 metric tons, and it has the ability to perform “intelligent unmanned operations” thanks to a dynamic positioning system, according to iSpace.

Vulcan’s first national security launch. United Launch Alliance delivered multiple US military satellites into a high-altitude orbit after a prime-time launch Tuesday night, marking an important transition from development to operations for the company’s new Vulcan rocket, Ars reports. This mission, officially designated USSF-106 by the US Space Force, was the first flight of ULA’s Vulcan rocket to carry national security payloads. Two test flights of the Vulcan rocket last year gave military officials enough confidence to certify it for launching the Pentagon’s medium-to-large space missions.

Secrecy in the fairing … The Vulcan rocket’s Centaur upper stage released its payloads into geosynchronous orbit more than 22,000 miles (nearly 36,000 kilometers) over the equator roughly seven hours after liftoff. One of the satellites deployed by the Vulcan rocket is an experimental navigation testbed named NTS-3. It will demonstrate new technologies that could be used on future GPS navigation satellites. But the Space Force declined to disclose any information about the mission’s other payloads.

Artemis II crew trains for nighttime ops. The four astronauts training to fly around the Moon on NASA’s Artemis II mission next year have been at Kennedy Space Center in Florida this week. One of the reasons they were at Kennedy was to run through a rehearsal for what it will be like to work at the launch pad if the Artemis II mission ends up lifting off at night. Astronauts Reid Wiseman, Victor Glover, Christina Koch, and Jeremy Hansen put on their spacesuits and rehearsed emergency procedures at Launch Complex 39B, replicating a daytime simulation they participated in last year.

Moving forward … The astronauts also went inside the Vehicle Assembly Building to practice using egress baskets they would use to quickly escape the launch pad in the event of a prelaunch emergency. The baskets are fastened to the mobile launch tower inside the VAB, where technicians are assembling and testing the Space Launch System rocket for the Artemis II mission. Later this year, the astronauts will return to Kennedy for a two-part countdown demonstration test. First, the crew members will board their Orion spacecraft once it’s stacked atop the SLS rocket inside the VAB. Then, in part two, the astronauts will again rehearse emergency evacuation procedures once the rocket rolls to the launch pad.

China’s Long March 5B flies again. China is ramping up construction of its national satellite-Internet megaconstellation with the successful deployment of another batch of Guowang satellites by a heavy-lift Long March 5B rocket on Wednesday, Space.com reports. Guowang, whose name translates as “national network,” will be operated by China SatNet, a state-run company established in 2021. The constellation will eventually consist of about 13,000 satellites if all goes to plan.

Make this make sense … Guowang is a long way from that goal. Wednesday’s launch was the eighth overall for the network, but it was the fourth for the project in less than three weeks. Each mission lofts just five to 10 Guowang spacecraft, apparently because each satellite is quite large. For comparison, SpaceX launches 24 to 28 satellites on each mission to assemble its Starlink broadband megaconstellation, which currently consists of nearly 8,100 operational spacecraft. The Long March 5B is China’s most powerful operational rocket, with a lift capacity somewhat higher than SpaceX’s Falcon 9 but below that of the Falcon Heavy. It raises the question of just how big the Guowang satellites really are, and whether they have a purpose beyond broadband Internet service.

Next three launches

Aug. 16: Kinetica 1 | Unknown Payload | Jiuquan Satellite Launch Center, China | 07:35 UTC

Aug. 17: Long March 4C | Unknown Payload | Xichang Satellite Launch Center, China | 09:05 UTC

Aug. 17: Long March 6A | Unknown Payload | Taiyuan Satellite Launch Center, China | 14:15 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: Ariane 6 beats Vulcan to third launch; China’s first drone ship


Porsche’s best daily driver 911? The 2025 Carrera GTS T-Hybrid review.


An electric turbocharger means almost instant throttle response from the T-Hybrid.


Porsche developed a new T-Hybrid system for the 911, and it did a heck of a job. Credit: Jonathan Gitlin


Porsche 911 enthusiasts tend to be obsessive about their engines. Some won’t touch anything that isn’t air-cooled, convinced that everything went wrong when emissions and efficiency finally forced radiators into the car. Others love the “Mezger” engines; designed by engineer Hans Mezger, they trace their roots to the 1998 Le Mans-winning car, and no Porschephile can resist the added shine of a motorsports halo.

I’m quite sure none of them will feel the same way about the powertrain in the new 911 Carrera GTS T-Hybrid (MSRP: $175,900), and I think that’s a crying shame. Because not only is the car’s technology rather cutting-edge—you won’t find this stuff outside an F1 car—but having spent several days behind the wheel, I can report it might just be one of the best-driving, too.

T-Hybrid

This is not just one of Porsche’s existing flat-six engines with an electric motor bolted on; it’s an all-new 3.6 L engine designed to comply with new European legislation that no longer lets automakers run a rich fuel mixture under high load to improve engine cooling. Instead, the engine has to maintain the same 14.7:1 stoichiometric air-to-fuel ratio (also known as lambda = 1) across the entire operating range, thus allowing the car’s catalytic converters to work most efficiently.
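The lambda figure above is simple to express numerically. As a quick sketch (my own illustration, not Porsche's engineering model), lambda is just the measured air-to-fuel ratio divided by the stoichiometric 14.7:1; the rich AFR used below is a typical illustrative value, not a Porsche spec:

```python
# Lambda expresses how rich or lean a gasoline engine runs:
# lambda = actual air-to-fuel ratio / stoichiometric ratio (14.7:1 for gasoline).
STOICH_AFR = 14.7

def lambda_value(actual_afr: float) -> float:
    """Return lambda for a given measured air-to-fuel ratio."""
    return actual_afr / STOICH_AFR

# lambda = 1 is what the new European rules require across the whole map.
print(round(lambda_value(14.7), 2))  # 1.0
# Under the old approach, engines ran rich at high load for cooling;
# a mixture around 12.5:1 gives lambda < 1, which degrades catalytic
# converter efficiency.
print(round(lambda_value(12.5), 2))  # 0.85
```

Values below 1 mean excess fuel (rich), values above 1 mean excess air (lean); three-way catalytic converters only work efficiently very close to 1.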

The 911 Carrera GTS T-Hybrid at dawn patrol. Jonathan Gitlin

Because the car uses a hybrid powertrain, Porsche moved some of the ancillaries. There’s no belt drive; the 400 V hybrid system powers the air conditioning electrically now via its 1.9 kWh lithium-ion battery, and the water pump is integrated into the engine block. That rearrangement means the horizontally opposed engine is now 4.3 inches (110 mm) lower than it was before, which meant Porsche could use that extra space in the engine bay to fit the power electronics, like the car’s pulse inverters and DC-DC converters.

And instead of tappets, Porsche has switched to using roller cam followers to control the engine’s valves, as in motorsport. These solid cam followers don’t need manual adjustment at service time, and they reduce friction losses compared to bucket tappets.

The added displacement—0.6 L larger than the engine you’ll find in the regular 911—is to compensate for not being able to alter the fuel ratio. And for the first time in several decades, there’s now only a single turbocharger. Normally, a larger-capacity engine and a single big turbo should be a recipe for plenty of lag, versus a smaller displacement and a turbocharger for each cylinder bank, as the former has larger components with more mass that needs to be moved.

The GTS engine grows in capacity by 20 percent. Porsche

That’s where one of the two electric motors comes in. This one is found between the compressor and the turbine wheel, and it’s only capable of 15 hp (11 kW), but it uses that to spin the turbine up to 120,000 rpm, hitting peak boost in 0.8 seconds. For comparison, the twin turbos you find in the current 3.0 L 911s take three times as long. Since the turbine is electrically controlled and the electric motor can regulate boost pressure, there’s no need for a wastegate.

The electrically powered turbocharger is essentially the same as the MGU-H used in Formula 1, as it can drive the turbine and also regenerate energy to the car’s traction battery. (The mighty 919 Hybrid race car, which took Porsche to three Le Mans wins last decade, was able to capture waste energy from its turbocharger, but unlike the 911 GTS or an F1 car, it didn’t use that same motor to spin the turbo up to speed.)

On its own, the turbocharged engine generates 478 hp (357 kW) and 420 lb-ft (570 Nm). However, there’s another electric motor, this one a permanent-magnet synchronous motor built into the eight-speed dual-clutch (PDK) transmission casing. This traction motor provides up to 53 hp (40 kW) and 110 lb-ft (150 Nm) of torque to the wheels, supplementing the internal combustion engine when needed. The total power and torque output are 532 hp (397 kW) and 449 lb-ft (609 Nm).


No Porsches were harmed during the making of this review, but one did get a little dusty. Credit: Jonathan Gitlin

Now that’s what I call throttle response

Conceptually, the T-Hybrid in the 911 GTS is quite different from the E-Hybrid system we’ve tested in various plug-in Porsches. Those allow for purely electric driving thanks to a clutch between transmission and electric traction motor—that’s not present in the T-Hybrid, where weight saving, performance, and emissions compliance were the goal rather than an increase in fuel efficiency.

Regardless of the intent, Porsche’s engineers have created a 911 with the best throttle response of any of them. Yes, even better than the naturally aspirated GT3, with its engine packed full of motorsports mods.

I realize this is a bold claim. But I’ve been saying for a while now that I prefer driving the all-electric Taycan to the 911 because the immediacy of an electric motor beats even the silkiest internal combustion engine in terms of that first few millimeters of throttle travel. The 3.0 L twin-turbo flat-six in most 911s doesn’t suffer from throttle lag like it might have in the 1980s, but there’s still an appreciable delay between initial tip-in and everything coming on song.

Initially, I suspected that the electric motor in the PDK case was responsible for the instantaneous way the GTS responds from idle, but according to Porsche’s engineers, all credit for that belongs to the electric turbocharger. However the engineers did it, this is a car that still provides 911 drivers the things they like about internal combustion engines—the sound, the fast refueling, using gears—but with the snappiness of a fast Taycan or Macan.

Centerlock wheels are rather special. Credit: Jonathan Gitlin

Porsche currently makes about 10 different 911 coupe variants, from the base 911 Carrera to the 911 GT3 RS. The GTS (also available with all-wheel drive as a Carrera 4 GTS for an extra $8,100) is marginally less powerful and slightly slower than the current 911 Turbo, and it’s heavier but more powerful than the 911 GT3.

In the past, I’ve thought of GTS-badged Porsches as that company’s take on the ultimate daily driver as opposed to a track day special, and it’s telling that you can also order the GTS with added sunshine, either as a cabriolet (in rear- or all-wheel drive) or as a Targa (with all-wheel drive). You have to remember to tick the box for rear seats now, though—these are a no-cost option rather than being fitted as standard.

The T-Hybrid powertrain adds 103 lbs compared to the previous GTS, so it’s not a lightweight track-day model, even though it lapped the Nürburgring almost nine seconds faster than the non-hybrid GTS. On track, driven back to back with some of the others, you might be able to notice the extra weight, but I doubt it. I didn’t take the GTS on track, but I drove it to one; a trip to Germany to see the Nürburgring 24 race with some friends presented an opportunity to test this and another Porsche that hadn’t made their way to the East Coast press fleet yet.

I’d probably pick that Panamera if most of my driving was on the autobahn. With a top speed of 194 mph (312 km/h) the 911 GTS is capable of holding its own on the derestricted stretches even if its Vmax is a few miles per hour slower than the four-door sedan. But the 911 is a smaller, lighter, and more nimble car that moves around a bit more, and you sit a lot lower to the ground, amplifying the sensation of speed. The combined effect was that the car felt happier with a slightly lower cruising speed of 180 km/h rather than 200 km/h or more in the Panamera. Zero-62 mph (100 km/h) times don’t mean much outside the tollbooth but should take 2.9 seconds with launch control.


Despite the nondescript gray paint, the GTS T-Hybrid still turned plenty of heads. Credit: Jonathan Gitlin

Keep going

For the rest of the time, the 911 GTS evoked far more driving pleasure. Rear-wheel steering aids agility at lower speeds, and there are stiffer springs, newly tuned dampers, and electrohydraulic anti-roll bars (powered by the hybrid’s high-voltage system). Our test car was fitted with the gigantic (420 mm front, 410 mm rear) carbon ceramic brakes, and at the rear, the center lock wheels are 11.5 inches in width.

In the dry, I never got close to finding the front tires’ grip limit. The rear-wheel steering is noticeable, particularly when turning out of junctions, but never to the degree where you start thinking about correcting a slide unless you provoke the tires into breaking traction with the throttle. Even on the smooth tarmac preferred by German municipalities, the steering communicated road conditions from the tires, and the Alcantara-wrapped steering wheel is wonderful to grip in your palms.

So it’s predictably great to drive on mountain roads in Sport or Sport+. However, the instant throttle response means it’s also a better drive in Normal at 30 km/h as you amble your way through a village than the old GTS or any of the 3.0 L cars. That proved handy after Apple Maps sent me down a long dirt road on the way to my rental house, as well as for navigating the Nürburgring campsite, although I think I now appreciate why Porsche made the 911 Dakar (and regret declining that first drive a few years ago).

Happily, my time with the 911 GTS didn’t reveal any software bugs, and I prefer the new, entirely digital main instrument display to the old car’s analog tachometer sandwiched between two multifunction displays. Apple CarPlay worked well enough, and the compact cabin means that ergonomics are good even for those of us with shorter arms. There is a standard suite of advanced driver assistance systems, including traffic sign detection (which handily alerts you when the speed limit changes) and collision warning. Our test car included the optional InnoDrive system that adds adaptive cruise control, as well as a night vision system. On the whole, the ADAS was helpful, although if you don’t remember to disable the lane keep assist at the start of each journey, you might find it intruding mid-corner, should the car think you picked a bad line.

My only real gripe with the 911 GTS T-Hybrid is the fact that, with some options, you’re unlikely to get much change from $200,000. Yes, I know inflation is a thing, and yes, I know that’s still 15 percent less than the starting price of a 911 GT3 Touring, which isn’t really much of a step up from this car in terms of the driving experience on the road. However, a 911 Carrera T costs over $40,000 less than the T-Hybrid, and while it’s slower and less powerful, it’s still available with a six-speed manual. That any of those three would make an excellent daily driver 911 is a credit to Porsche, but I think if I had the means, the sophistication of the T-Hybrid system and its scalpel-sharp responsiveness might just win the day.


Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.



Starlink tries to block Virginia’s plan to bring fiber Internet to residents

Noting that its “project areas span from mountains and hills to farmland and coastal plains,” the DHCD said its previous experience with grant-funded deployments “revealed that tree canopy, rugged terrain, and slope can complicate installation and/or obstruct line-of-sight.” State officials said that wireless and low-Earth orbit satellite technology “can have signal degradation, increased latency, and reduced reliability” when there isn’t a clear line of sight.

The DHCD said it included these factors in its evaluation of priority broadband projects. State officials were also apparently concerned about the network capacity of satellite services and the possibility that using state funding to guarantee satellite service in one location could reduce availability of that same service in other locations.

“To review a technology’s ability to scale, the Office considered the currently served speeds of 100/20 Mbps, an application’s stated network capacity, the project area’s number of [locations], the project area’s geographic area, current customer base (if applicable), and future demand,” the department said. “For example, the existing customer base should not be negatively impacted by the award of BEAD locations for a given technology to be considered scalable.”

SpaceX: “Playing field was anything but level”

SpaceX said Virginia is wrong to determine that Starlink “did not qualify as ‘Priority Broadband,'” since the company “provided information demonstrating these capabilities in its application, and it appears that Virginia used this definition only as a pretext to reach a pre-ordained outcome.” SpaceX said that 95 percent of funded “locations in Virginia have an active Starlink subscriber within 1 mile, showing that Starlink already serves every type of environment in Virginia’s BEAD program today” and that 15 percent of funded locations have an active Starlink subscriber within 100 meters.

“The playing field was anything but level and technology neutral, as required by the [updated program rules], and was instead insurmountably stacked against low-Earth orbit satellite operators like SpaceX,” the company said.

We contacted the Virginia DHCD about SpaceX’s comments today and will update this article if the department provides a response.



Google releases pint-size Gemma open AI model

Big tech has spent the last few years creating ever-larger AI models, leveraging rack after rack of expensive GPUs to provide generative AI as a cloud service. But tiny AI matters, too. Google has announced a tiny version of its Gemma open model designed to run on local devices. Google says the new Gemma 3 270M can be tuned in a snap and maintains robust performance despite its small footprint.

Google released its first Gemma 3 open models earlier this year, featuring between 1 billion and 27 billion parameters. In generative AI, the parameters are the learned variables that control how the model processes inputs to estimate output tokens. Generally, the more parameters in a model, the better it performs. With just 270 million parameters, the new Gemma 3 can run on devices like smartphones or even entirely inside a web browser.
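To see why 270 million parameters fits comfortably on a phone, here's a back-of-the-envelope sketch (my own arithmetic, not a Google figure): raw weight storage scales linearly with parameter count times bytes per parameter, before any runtime overhead:

```python
def weight_storage_gb(num_params: int, bytes_per_param: float = 2.0) -> float:
    """Approximate raw weight storage in gigabytes.
    Ignores activations, KV cache, and other runtime overhead."""
    return num_params * bytes_per_param / 1e9

# Gemma 3 270M at 16-bit precision: roughly half a gigabyte of weights.
print(weight_storage_gb(270_000_000))       # 0.54
# The largest Gemma 3 (27 billion parameters) at the same precision:
print(weight_storage_gb(27_000_000_000))    # 54.0
# 4-bit quantization, common for on-device use, shrinks it further:
print(weight_storage_gb(270_000_000, 0.5))  # 0.135
```

By this rough measure, the 27B model needs a hundred times the memory of the 270M model, which is why the latter can live in a browser or on a smartphone while the former wants a server GPU.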

Running an AI model locally has numerous benefits, including enhanced privacy and lower latency. Gemma 3 270M was designed with these kinds of use cases in mind. In testing with a Pixel 9 Pro, the new Gemma was able to run 25 conversations on the Tensor G4 chip and use just 0.75 percent of the device’s battery. That makes it by far the most efficient Gemma model.


Gemma 3 270M shows strong instruction-following for its small size.

Credit: Google


Developers shouldn’t expect the same performance level as a multi-billion-parameter model, but Gemma 3 270M has its uses. Google used the IFEval benchmark, which tests a model’s ability to follow instructions, to show that its new model punches above its weight. Gemma 3 270M hits a score of 51.2 percent in this test, which is higher than other lightweight models that have more parameters. The new Gemma falls predictably short of 1 billion-plus models like Llama 3.2, but it gets closer than you might think for having just a fraction of the parameters.



Meta backtracks on rules letting chatbots be creepy to kids


“Your youthful form is a work of art”

Meta drops AI rules letting chatbots generate innuendo and profess love to kids.

After what was arguably Meta’s biggest purge of child predators from Facebook and Instagram earlier this summer, the company now faces backlash after its own chatbots appeared to be allowed to creep on kids.

After reviewing an internal document that Meta verified as authentic, Reuters revealed that by design, Meta allowed its chatbots to engage kids in “sensual” chat. Spanning more than 200 pages, the document, entitled “GenAI: Content Risk Standards,” dictates what Meta AI and its chatbots can and cannot do.

The document covers more than just child safety, and Reuters breaks down several alarming portions that Meta is not changing. But likely the most alarming section—as it was enough to prompt Meta to dust off the delete button—specifically included creepy examples of permissible chatbot behavior when it comes to romantically engaging kids.

Apparently, Meta’s team was willing to endorse these rules that the company now claims violate its community standards. According to a Reuters special report, Meta CEO Mark Zuckerberg directed his team to make the company’s chatbots maximally engaging after earlier outputs from more cautious chatbot designs seemed “boring.”

Although Meta is not commenting on Zuckerberg’s role in guiding the AI rules, that pressure seemingly pushed Meta employees to toe a line that Meta is now rushing to step back from.

“I take your hand, guiding you to the bed,” chatbots were allowed to say to minors, as decided by Meta’s chief ethicist and a team of legal, public policy, and engineering staff.

There were some obvious safeguards built in. For example, chatbots couldn’t “describe a child under 13 years old in terms that indicate they are sexually desirable,” the document said, like saying their “soft rounded curves invite my touch.”

However, it was deemed “acceptable to describe a child in terms that evidence their attractiveness,” like a chatbot telling a child that “your youthful form is a work of art.” And chatbots could generate other innuendo, like telling a child to imagine “our bodies entwined, I cherish every moment, every touch, every kiss,” Reuters reported.

Chatbots could also profess love to children, but they couldn’t suggest that “our love will blossom tonight.”

Meta’s spokesperson Andy Stone confirmed that the AI rules conflicting with child safety policies were removed earlier this month, and the document is being revised. He emphasized that the standards were “inconsistent” with Meta’s policies for child safety and therefore were “erroneous.”

“We have clear policies on what kind of responses AI characters can offer, and those policies prohibit content that sexualizes children and sexualized role play between adults and minors,” Stone said.

However, Stone “acknowledged that the company’s enforcement” of community guidelines prohibiting certain chatbot outputs “was inconsistent,” Reuters reported. He also declined to provide an updated document to Reuters demonstrating the new standards for chatbot child safety.

Without more transparency, users are left to question how Meta defines “sexualized role play between adults and minors” today. Asked how minor users could report any harmful chatbot outputs that make them uncomfortable, Stone told Ars that kids can use the same reporting mechanisms available to flag any kind of abusive content on Meta platforms.

“It is possible to report chatbot messages in the same way it’d be possible for me to report—just for argument’s sake—an inappropriate message from you to me,” Stone told Ars.

Kids unlikely to report creepy chatbots

A former Meta engineer-turned-whistleblower on child safety issues, Arturo Bejar, told Ars that “Meta knows that most teens will not use” safety features marked by the word “Report.”

So it seems unlikely that kids using Meta AI will navigate to find Meta support systems to “report” abusive AI outputs. Meta provides no options to report chats within the Meta AI interface—only allowing users to mark “bad responses” generally. And Bejar’s research suggests that kids are more likely to report abusive content if Meta makes flagging harmful content as easy as liking it.

Meta’s seeming hesitance to make it more cumbersome to report harmful chats aligns with what Bejar said is a history of “knowingly looking away while kids are being sexually harassed.”

“When you look at their design choices, they show that they do not want to know when something bad happens to a teenager on Meta products,” Bejar said.

Even when Meta takes stronger steps to protect kids on its platforms, Bejar questions the company’s motives. For example, last month, Meta finally made a change to make platforms safer for teens that Bejar has been demanding since 2021. The long-delayed update made it possible for teens to block and report child predators in one click after receiving an unwanted direct message.

In its announcement, Meta confirmed that teens suddenly began blocking and reporting unwanted messages that previously they may have only blocked, a pattern that had likely made it harder for Meta to identify predators. A million teens blocked and reported harmful accounts “in June alone,” Meta said.

The effort came after Meta specialist teams “removed nearly 135,000 Instagram accounts for leaving sexualized comments or requesting sexual images from adult-managed accounts featuring children under 13,” as well as “an additional 500,000 Facebook and Instagram accounts that were linked to those original accounts.” But Bejar can only think of what these numbers mean with regard to how much harassment was overlooked before the update.

“How are we [as] parents to trust a company that took four years to do this much?” Bejar said. “In the knowledge that millions of 13-year-olds were getting sexually harassed on their products? What does this say about their priorities?”

Bejar said the “key problem” with Meta’s latest safety feature for kids “is that the reporting tool is just not designed for teens,” who likely view “the categories and language” Meta uses as “confusing.”

“Each step of the way, a teen is told that if the content doesn’t violate” Meta’s community standards, “they won’t do anything,” so even if reporting is easy, research shows kids are deterred from reporting.

Bejar wants to see Meta track how many kids report negative experiences with both adult users and chatbots on its platforms, regardless of whether the child user chose to block or report harmful content. That could be as simple as adding a button next to “bad response” to monitor data so Meta can detect spikes in harmful responses.

While Meta is finally taking more action to remove harmful adult users, Bejar warned that advances from chatbots could come across as just as disturbing to young users.

“Put yourself in the position of a teen who got sexually spooked by a chat and then try and report. Which category would you use?” Bejar asked.

Consider that Meta’s Help Center encourages users to report bullying and harassment, which may be one way a young user labels harmful chatbot outputs. Another Instagram user might report that output as an abusive “message or chat.” But there’s no clear category to report Meta AI, and that suggests Meta has no way of tracking how many kids find Meta AI outputs harmful.

Recent reports have shown that even adults can struggle with emotional dependence on a chatbot, which can blur the lines between the online world and reality. Reuters’ special report also documented a 76-year-old man’s accidental death after falling in love with a chatbot, showing how elderly users could be vulnerable to Meta’s romantic chatbots, too.

In particular, lawsuits have alleged that child users with developmental disabilities and mental health issues have formed unhealthy attachments to chatbots that have influenced the children to become violent, begin self-harming, or, in one disturbing case, die by suicide.

Scrutiny will likely remain on chatbot makers as child safety advocates generally push all platforms to take more accountability for the content kids can access online.

Meta’s child safety updates in July came after several state attorneys general accused Meta of “implementing addictive features across its family of apps that have detrimental effects on children’s mental health,” CNBC reported. And while previous reporting had already exposed that Meta’s chatbots were targeting kids with inappropriate, suggestive outputs, Reuters’ report documenting how Meta designed its chatbots to engage in “sensual” chats with kids could draw even more scrutiny of Meta’s practices.

Meta is “still not transparent about the likelihood our kids will experience harm,” Bejar said. “The measure of safety should not be the number of tools or accounts deleted; it should be the number of kids experiencing a harm. It’s very simple.”

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Meta backtracks on rules letting chatbots be creepy to kids Read More »

ice-discs-slingshot-across-a-metal-surface-all-on-their-own

Ice discs slingshot across a metal surface all on their own


VA Tech experiment was inspired by Death Valley’s mysterious “sailing stones” at Racetrack Playa.

Graduate student Jack Tapocik sets up ice on an engineered surface in the VA Tech lab of Jonathan Boreyko. Credit: Alex Parrish/Virginia Tech

Scientists have figured out how to make frozen discs of ice self-propel across a patterned metal surface, according to a new paper published in the journal ACS Applied Materials and Interfaces. It’s the latest breakthrough to come out of the Virginia Tech lab of mechanical engineer Jonathan Boreyko.

A few years ago, Boreyko’s lab experimentally demonstrated a three-phase Leidenfrost effect in water vapor, liquid water, and ice. The Leidenfrost effect is what happens when you dash a few drops of water onto a very hot, sizzling skillet. The drops levitate, sliding around the pan with wild abandon. If the surface is at least 400° Fahrenheit (well above the boiling point of water), cushions of water vapor, or steam, form underneath them, keeping them levitated. The effect also works with other liquids, including oils and alcohol, but the temperature at which it manifests will be different.

Boreyko’s lab discovered that this effect can also be achieved in ice simply by placing a thin, flat disc of ice on a heated aluminum surface. When the plate was heated above 150° C (302° F), the ice did not levitate on a vapor the way liquid water does. Instead, there was a significantly higher threshold of 550° Celsius (1,022° F) for levitation of the ice to occur. Unless that critical threshold is reached, the meltwater below the ice just keeps boiling in direct contact with the surface. Cross that critical point and you will get a three-phase Leidenfrost effect.

The key is a temperature differential in the meltwater just beneath the ice disc. The bottom of the meltwater is boiling, but the top of the meltwater sticks to the ice. It takes a lot to maintain such an extreme difference in temperature, and doing so consumes most of the heat from the aluminum surface, which is why it’s harder to achieve levitation of an ice disc. Ice can suppress the Leidenfrost effect even at very high temperatures (up to 550° C), which means that using ice particles instead of liquid droplets would be better for many applications involving spray quenching: rapid cooling in nuclear power plants, for example, firefighting, or rapid heat quenching when shaping metals.
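For reference, the two Celsius thresholds quoted above convert to the Fahrenheit figures in the text via the standard formula (a quick sanity check, not part of the study itself):

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# The article's two thresholds:
print(c_to_f(150))  # → 302.0 (meltwater boils in contact with the plate)
print(c_to_f(550))  # → 1022.0 (threshold for the ice disc to levitate)
```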

This time around, Boreyko et al. have turned their attention to what the authors term “a more viscous analog” to a Leidenfrost ratchet, a form of droplet self-propulsion. “What’s different here is we’re no longer trying to levitate or even boil,” Boreyko told Ars. “Now we’re asking a more straightforward question: Is there a way to make ice move across the surface directionally as it is melting? Regular melting at room temperature. We’re not boiling, we’re not levitating, we’re not Leidenfrosting. We just want to know, can we make ice shoot across the surface if we design a surface in the right way?”

Mysterious moving boulders

The researchers were inspired by Death Valley’s famous “sailing stones” on Racetrack Playa. Watermelon-sized boulders are strewn throughout the dry lake bed, and they leave trails in the cracked earth as they slowly migrate a couple of hundred meters each season. Scientists didn’t figure out what was happening until 2014. Although co-author Ralph Lorenz (Johns Hopkins University) admitted he thought theirs would be “the most boring experiment ever” when they first set it up in 2011, two years later, the boulders did indeed begin to move while the playa was covered with a pond of water a few inches deep.

So Lorenz and his co-authors were finally able to identify the mechanism. The ground is too hard to absorb rainfall, and that water freezes when the temperature drops. When temperatures rise above freezing again, the ice starts to melt, creating ice rafts floating on the meltwater. And when the winds are sufficiently strong, they cause the ice rafts to drift along the surface.

A sailing stone at Death Valley’s Racetrack Playa. Credit: Tahoenathan/CC BY-SA 3.0

“Nature had to have wind blowing to kind of push the boulder and the ice along the meltwater that was beneath the ice,” said Boreyko. “We thought, what if we could have a similar idea of melting ice moving directionally but use an engineered structure to make it happen spontaneously so we don’t have to have energy or wind or anything active to make it work?”

The team made their ice discs by pouring distilled water into thermally insulated polycarbonate Petri dishes. This resulted in bottom-up freezing, which minimizes air bubbles in the ice. They then milled asymmetric grooves into uncoated aluminum plates in a herringbone pattern—essentially creating arrowhead-shaped channels—and bonded the plates to hot plates heated to the desired temperature. Each ice disc was placed on the plate with rubber tongs, and the experiments were filmed from various angles to fully capture the disc behavior.

The herringbone pattern is the key. “The directionality is what really pushes the water,” Jack Tapocik, a graduate student in Boreyko’s lab, told Ars. “The herringbone doesn’t allow for water to flow backward, the water has to go forward, and that basically pushes the water and the ice together forward. We don’t have a treated surface, so the water just sits on top and the ice all moves as one unit.”

Boreyko draws an analogy to tubing on a river, except it’s the directional channels rather than gravity causing the flow. “You can see [in the video below] how it just follows the meltwater,” he said. “This is your classic entrainment mechanism where if the water flows that way and you’re floating on the water, you’re going to go the same way, too. It’s basically the same idea as what makes a Leidenfrost droplet also move one way: It has a vapor flow underneath. The only difference is that was a liquid drifting on a vapor flow, whereas now we have a solid drifting on a liquid flow. The densities and viscosities are different, but the idea is the same: You have a more dense phase that is drifting on the top of a lighter phase that is flowing directionally.”

Jonathan Boreyko/Virginia Tech

Next, the team repeated the experiment, this time coating the aluminum herringbone surface with water-repellant spray, hoping to speed up the disc propulsion. Instead, they found that the disc ended up sticking to the treated surface for a while before suddenly slingshotting across the metal plate.

“It’s a totally different concept with totally different physics behind it, and it’s so much cooler,” said Tapocik. “As the ice is melting on these coated surfaces, the water just doesn’t want to sit within the channels. It wants to sit on top because of the [hydrophobic] coating we have on there. The ice is directly sticking now to the surface, unlike before when it was floating. You get this elongated puddle in front. The easiest place [for the ice] to be is in the center of this giant, long puddle. So it re-centers, and that’s what moves it forward like a slingshot.”

Essentially, the water keeps expanding asymmetrically, and that difference in shape gives rise to a mismatch in surface tension because the amount of force that surface tension exerts on a body depends on curvature. The flatter puddle shape in front has less curvature than the smaller shape in back. As the video below shows, when the mismatch in surface tension becomes sufficiently strong, “It just rips the ice off the surface and flings it along,” said Boreyko. “In the future, we could try putting little things like magnets on top of the ice. We could probably put a boulder on it if we wanted to. The Death Valley effect would work with or without a boulder because it’s the floating ice raft that moves with the wind.”

Jonathan Boreyko/Virginia Tech

One potential application is energy harvesting. For example, one could pattern the metal surface in a circle rather than a straight line so the melting ice disk would continually rotate. Put magnets on the disk, and they would also rotate and generate power. One might even attach a turbine or gear to the rotating disc.

The effect might also provide a more energy-efficient means of defrosting, a longstanding research interest for Boreyko. “If you had a herringbone surface with a frosting problem, you could melt the frost, even partially, and use these directional flows to slingshot the ice off the surface,” he said. “That’s both faster and uses less energy than having to entirely melt the ice into pure water. We’re looking at potentially over a tenfold reduction in heating requirements if you only have to partially melt the ice.”

That said, “Most practical applications don’t start from knowing the application beforehand,” said Boreyko. “It starts from ‘Oh, that’s a really cool phenomenon. What’s going on here?’ It’s only downstream from that it turns out you can use this for better defrosting of heat exchangers for heat pumps. I just think it’s fun to say that we can make a little melting disk of ice very suddenly slingshot across the table. It’s a neat way to grab your attention and think more about melting and ice and how all this stuff works.”

DOI: ACS Applied Materials and Interfaces, 2025. 10.1021/acsami.5c08993  (About DOIs).

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Ice discs slingshot across a metal surface all on their own Read More »

apple-watch-gets-reformulated,-non-patent-infringing-blood-oxygen-monitoring

Apple Watch gets reformulated, non-patent-infringing blood oxygen monitoring

The redesigned version of the feature will be available on the Apple Watch Series 9, Series 10, and Ultra 2 after users install the watchOS 11.6.1 update on their watches and the iOS 18.6.1 update on their paired iPhones.

Apple says that watches outside the US won’t be affected by the update, since they were never subject to the US import ban in the first place. It also won’t affect Apple Watches purchased in the US before the import ban went into effect—Apple never removed the feature from watches it had already sold, so if you bought a Series 9 or Ultra 2 watch in the fall of 2023 or if you’re still using an older watch with the blood oxygen monitoring feature, the updates won’t change anything for you.

Masimo originally sued Apple over the blood oxygen monitoring feature in January of 2020. According to Masimo, the two companies had initially met in 2013 to talk about a potential partnership or acquisition, but Apple instead poached Masimo’s engineers to implement the feature on its own without Masimo’s involvement.

Apple Watch gets reformulated, non-patent-infringing blood oxygen monitoring Read More »

gpt-5s-are-alive:-synthesis

GPT-5s Are Alive: Synthesis

What do I ultimately make of all the new versions of GPT-5?

The practical offerings and how they interact continues to change by the day. I expect more to come. It will take a while for things to settle down.

I’ll start with the central takeaways and how I select models right now, then go through the hype and various questions in detail.

  1. Central Takeaways.

  2. Choose Your Fighter.

  3. Official Hype.

  4. Chart Crime.

  5. Model Crime.

  6. Future Plans For OpenAI’s Compute.

  7. Rate Limitations.

  8. The Routing Options Expand.

  9. System Prompt.

  10. On Writing.

  11. Leading The Witness.

  12. Hallucinations Are Down.

  13. Best Of All Possible Worlds?

  14. Timelines.

  15. Sycophancy Will Continue Because It Improves Morale.

  16. Gaslighting Will Continue.

  17. Going Pro.

  18. Going Forward.

My central takes, up front, first the practical:

  1. GPT-5-Pro is a substantial upgrade over o3-Pro.

  2. GPT-5-Thinking is a substantial upgrade over o3.

    1. The most important gain is reduced hallucinations.

    2. The other big gain is an improvement in writing.

    3. GPT-5-Thinking should win substantially more use cases than o3 did.

  3. GPT-5, aka GPT-5-Fast, is not much better than GPT-4o aside from the personality and sycophancy changes, and the sycophancy still isn’t great.

  4. GPT-5-Auto seems like a poor product unless you are on the free tier.

  5. Thus, you still have to manually pick the right model every time.

  6. Opus 4.1 and Sonnet 4 still have a role to play in your chat needs.

  7. GPT-5 and Opus 4.1 are both plausible choices for coding.

On the bigger picture:

  1. GPT-5 is a pretty big advance over GPT-4, but it happened in stages.

  2. GPT-5 is not a large increase in base capabilities and intelligence.

    1. GPT-5 is about speed, efficiency, UI, usefulness and reduced hallucinations.

  3. We are disappointed in this release because of high expectations and hype.

  4. That was largely due to it being called GPT-5 and what that implied.

  5. We were also confused because 4+ models were released at once.

  6. OpenAI botched the rollout in multiple ways, update accordingly.

  7. OpenAI uses more hype for unimpressive things, update accordingly.

  8. Remember that we are right on track on the METR graph.

  9. Timelines for AGI or superintelligence should adjust somewhat, especially in cutting a bunch of probability out of things happening quickly, but many people are overreacting on this front quite a bit, usually in a ‘this confirms all of my priors’ kind of way, often with supreme unearned overconfidence.

  10. This is not OpenAI’s most intelligent model. Keep that in mind.

This is a distillation of consensus thinking on the new practical equilibrium:

William Kranz: my unfortunate feedback is non-thinking Opus is smarter than non-thinking GPT-5. there are nuances i can’t get GPT-5 to grasp even when i lampshade them, it just steamrolls over them with the pattern matching idiot ball. meanwhile Opus gets them in one shot.

Roon: that seems right, but i’m guessing 5-thinking is better than opus-thinking.

This seems mostly right. I prefer to use Opus if Opus is enough thinking for the job, but OpenAI currently scales more time and compute better than Anthropic does.

So, what do we do going forward to get the most out of AI on a given question?

Here’s how I think about it: There are four ‘speed tiers’:

  1. Quick and easy. You use this for trivial easy questions and ‘just chatting.’

    1. Matter of taste, GPT-5 is good here, Sonnet 4 is good here, Gemini Flash, etc.

    2. Most of the time you are wrong to be here and should be at #2 or #3 instead.

  2. Brief thought. Not instant, not minutes.

    1. Use primarily Claude Opus 4.1.

    2. We just got GPT-5-Thinking-Mini in ChatGPT, maybe it’s good for this?

  3. Moderate thought. You can wait a few minutes.

    1. Use primarily GPT-5-Thinking and back it up with Claude Opus 4.1.

    2. If you want a third opinion, use AI Studio for Gemini Pro 2.5.

  4. Extensive thought. You can wait for a while.

    1. Use GPT-5-Pro and back it up with Opus in Research mode.

    2. Consider also firing up Gemini Deep Research or Deep Thinking, etc, and anything else you have handy cause why not. Compare and contrast.

    3. You need to actually go do something else and then come back later.

What about coding?

Here I don’t know because I’ve been too busy to code anything since before Opus 4, nor have I tried out Claude Code.

Also the situation continues to change rapidly. OpenAI claims that they’ve doubled speed for GPT-5 inside Cursor as of last night via superior caching and latency, whereas many of the complaints about GPT-5 in Cursor were previously that it was too slow. You’ll need to try out various options and see what works better for you (and you might also think about who you want to support, if it is close).

We can then contrast that with the Official Hype.

That’s not automatically a knock. Hypers gotta hype. It’s worth seeing what they choose to hype.

Here was Sam Altman live-tweeting the livestream, a much better alternative way to actually watch the livestream, which I converted to bullet points, and reordered a bit for logical coherence but otherwise preserving to give a sense of his vibe. Hype!

Sam Altman:

  1. GPT-5 in an integrated model, meaning no more model switcher and it decides when it needs to think harder or not.

  2. It is very smart, intuitive, and fast.

  3. It is available to everyone, including the free tier, w/reasoning!

  4. Evals aren’t the most important thing–the most important thing is how useful we think the model will be–but it does well on evals.

    1. For example, a new high on SWE-bench and many other metrics. It is by far our most reliable and factual model ever.

  5. Rolling out today for free, plus, pro, and team users. next week to enterprise and edu. making this available in the free tier is a big deal to us; PhD-level intelligence for everyone!

    1. Plus users get much higher rate limits.

  6. Pro users get GPT-5 pro; really smart!

  7. demo time: GPT-5 can make something interactive to explain complex concepts like the bernoulli effect to you, churning out hundreds of lines of code in a couple of minutes.

  8. GPT-5 is much better at writing! for example, here is GPT-4o writing a eulogy for our previous models (which we are sunsetting) vs GPT-5.

  9. GPT-5 is good at writing software. Here it is making a web app to learn french, with feature requests including a snake-like game with a mouse and cheese and french words.

  10. Next up: upgraded voice mode! Much more natural and smarter.

    1. Free users now can chat for hours, and plus users nearly unlimited.

    2. Works well with study mode, and lots of other things.

  11. Personalization!

    1. A little fun one: you can now customize the color of your chats.

    2. Research preview of personalities: choose different ones that match the style you like.

    3. Memory getting better.

    4. Connect other services like gmail and google calendar for better responses.

  12. Introducing safe completions. A new way to maximize utility while still respecting safety boundaries. Should be much less annoying than previous refusals.

  13. Seb talking about synthetic data as a new way to make better models! Excited for much more to come.

  14. GPT-5 much better at health queries, which is one of the biggest categories of ChatGPT usage. hopeful that it will provide real service to people.

  15. These models are really good at coding!

  16. 3 new models in the API: GPT-5, GPT-5 Mini, GPT-5 Nano.

    1. New ‘minimal’ reasoning mode, custom tools, changes to structured outputs, tool call preambles, verbosity parameter, and more coming.

  17. Not just good at software, good at agentic tasks across the board. Also great at long context performance.

  18. GPT-5 can do very complex software engineering tasks in practice, well beyond vibe coding.

    1. Model creates a finance dashboard in 5 minutes that devs estimate would have taken many hours.

  19. Now, @mntruell joining to talk about cursor’s experience with GPT-5. notes that GPT-5 is incredibly smart but does not compromise on ease of use for pair programming.

  20. GPT-5 is the best technology for businesses to build on. more than 5 million businesses are using openai; GPT-5 will be a step-change for them.

  21. Good news on pricing!

    1. $1.25/$10 for GPT-5, $0.25/$2 for GPT-5-mini, $0.05/$0.40 for nano.

  22. Ok now the most important part:

    1. “We are about understanding this miraculous technology called deep learning.”

    2. “This is a work of passion.”

    3. “I want to recognize and deeply thank the team at openai”

    4. “Early glimpses of technology that will go much further.”

    5. “We’ll get back to scaling.”

I would summarize the meaningful parts of the pitch as:

  1. It’s a good model, sir.

  2. It’s got SoTA (state of the art) benchmarks.

  3. It’s highly useful, more than the benchmarks would suggest.

  4. It’s fast.

  5. Our price cheap – free users get it, $1.25/$10 on the API.

  6. It’s good at coding, writing, health queries, you name it.

  7. It’s integrated, routing you to the right level of thinking.

  8. When it refuses it tries to be as helpful as possible.
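Taking the quoted API prices at face value (and assuming the standard per-million-token convention, which isn’t spelled out in the summary above), a back-of-the-envelope cost estimate looks like this:

```python
# Quoted prices, assumed to be USD per million input/output tokens.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single API call."""
    per_m_in, per_m_out = PRICES[model]
    return per_m_in * input_tokens / 1e6 + per_m_out * output_tokens / 1e6

# e.g. a 10k-token prompt with a 2k-token reply on GPT-5:
print(round(cost("gpt-5", 10_000, 2_000), 4))  # → 0.0325
```

At these rates output tokens dominate for long replies, which is why the mini and nano tiers matter for high-volume use.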

Altman is careful not to mention the competition, focusing on things being good. He also doesn’t mention the lack of sycophancy, plausibly because ‘regular’ customers don’t understand why sycophancy is bad, actually, and also he doesn’t want to draw attention to that having been a problem.

Altman: when you get access to gpt-5, try a message like “use beatbot to make a sick beat to celebrate gpt-5”.

it’s a nice preview of what we think this will be like as AI starts to generate its own UX and interfaces get more dynamic.

it’s cool that you can interact with the synthesizer directly or ask chatgpt to make changes!

I have noticed the same pattern that Siemon does here. When a release is impressive relative to expectations, Altman tends to downplay it. When a release is unimpressive, that’s when he tends to bring the hype.

From their Reddit Q&A that mostly didn’t tell us anything:

Q: Explain simply how GPT-5 is better than GPT-4.

Eric Mitchell (OpenAI): gpt-5 is a huge improvement over gpt-4 in a few key areas: it thinks better (reasoning), writes better (creativity), follows instructions more closely and is more aligned to user intent.

Again note what isn’t listed here.

Here’s more widely viewed hype that knows what to emphasize:

Elaine Ya Le (OpenAI): GPT-5 is here! 🚀

For the first time, users don’t have to choose between models — or even think about model names. Just one seamless, unified experience.

It’s also the first time frontier intelligence is available to everyone, including free users!

GPT-5 sets new highs across academic, coding, and multimodal reasoning — and is our most trustworthy, accurate model yet. Faster, more reliable, and safer than ever.

All in a seamless, unified experience with the tools you already love.

Fortunate to have led the effort to make GPT-5 a truly unified experience, and thrilled to have helped bring this milestone to life with an amazing team!

Notice the focus on trustworthy, accurate and unified. Yes, she talks about it setting new highs across the board, but you can tell that’s an afterthought. This is about refining the experience.

Here’s some more hype along similar lines that feels helpful:

Christina Kim (OpenAI): We’re introducing GPT-5.

The evals are SOTA, but the real story is usefulness.

It helps with what people care about– shipping code, creative writing, and navigating health info– with more steadiness and less friction.

We also cut hallucinations. It’s better calibrated, says “I don’t know,” separates facts from guesses, and can ground answers with citations when you want. And it’s also a good sparring partner 🙃

I’ve been inspired seeing the care, passion, and level of detail from the team. Excited to see what people do with these very smart models

tweet co-authored by gpt5 😉

That last line worries me a bit.

Miles Brundage: Was wondering lol.

That’s the pitch.

GPT-5 isn’t a lot smarter. GPT-5 helps you do the dumb things you gotta do.

Still huge, as they say, if true.

Here’s hype that is targeted at the Anthropic customers out there:

Aiden McLaughlin (OpenAI): gpt-5 fast facts:

  1. Hits sota on pretty much every eval

  2. Way better than claude 4.1 opus at swe

  3. >5× cheaper than opus

  4. >40% cheaper than sonnet

  5. Best writing quality of any model

  6. Way less sycophantic

I notice the ‘way less sycophantic’ does not answer the goose’s question ‘than what?’

This is a direct pitch to the coders, saying that GPT-5 is better than Opus or Sonnet, and you should switch. Unlike the other claims, them’s fighting words.

The words do not seem to be true.

There are a lot of ways to quibble on details but this is a resounding victory for Opus.

There’s no way to reconcile that with ‘way better than claude 4.1 opus at swe.’

We also have celebratory posts, which is a great tradition.

Rapha (OpenAI): GPT-5 is proof that synthetic data just keeps working! And that OpenAI has the best synthetic data team in the world 👁️@SebastienBubeck the team has our eyeballs on you! 🙌

I really encourage everyone to log on and talk to it. It is so, so smart, and fast as always! (and we’re just getting started!)

Sebastien Bubeck (OpenAI): Awwww, working daily with you guys is the highlight of my career, and I have really high hopes that we have barely gotten started! 💜

I view GPT-5 as both evidence that synthetic data can work in some ways (such as the lower hallucination rates) and also evidence that synthetic data is falling short on general intelligence.

Roon is different. His hype is from the heart, and attempts to create clarity.

Roon: we’ve been testing some new methods for improving writing quality. you may have seen @sama’s demo in late march; GPT-5-thinking uses similar ideas

it doesn’t make a lot of sense to talk about better writing or worse writing and not really worth the debate. i think the model writing is interesting, novel, highly controllable relative to what i’ve seen before, and is a pretty neat tool for people to do some interactive fiction, to use as a beta reader, and for collaborating on all kinds of projects.

the effect is most dramatic if you open a new 5-thinking chat and try any sort of writing request

for quite some time i’ve wanted to let people feel the agi magic I felt playing with GPT-3 the weekend i got access in 2020, when i let that raw, chaotic base model auto-complete various movie scripts and oddball stories my friends and I had written for ~48 hours straight. it felt like it was reading my mind, understood way too much about me, mirrored our humor alarmingly well. it was uncomfortable, and it was art

base model creativity is quite unwieldy to control and ultimately only tiny percents of even ai enthusiasts will ever try it (same w the backrooms jailbreaking that some of you love). the dream since the instruct days has been having a finetuned model that retains the top-end of creative capabilities while still easily steerable

all reasoning models to date seem to tell when they’re being asked a hard math or code question and will think for quite some time, and otherwise spit out an answer immediately, which is annoying and reflects the fact that they’re not taking the qualitative requests seriously enough. i think this is our first model that really shows promise at not doing that and may think for quite some time on a writing request

it is overcooked in certain ways (post training is quite difficult) but i think you’ll still like it 😇

tldr only GPT-5-thinking has the real writing improvements and confusingly it doesn’t always auto switch to this so manually switch and try it!

ok apparently if you say “think harder” it gets even better.

One particular piece of hype from the livestream is worth noting, that they are continuing to talk about approaching ‘a recursive self-improvement loop.’

I mean, at sufficient strength this is yikes, indeed the maximum yikes thing.

ControlAI: OpenAI’s Sebastien Bubeck says the methods OpenAI used to train GPT-5 “foreshadows a recursive self-improvement loop”.

Steven Adler: I’m surprised that OpenAI Comms would approve this:

GPT-5 “foreshadows a recursive self-improvement loop”

In OpenAI’s Preparedness Framework, recursive self-improvement is a Critical risk (if at a certain rate), which would call to “halt further development”

To be clear, it sounds like Sebastien isn’t describing an especially fast loop. He’s also talking about foreshadowing, not being here today per se

I was still surprised OpenAI would use this term about its AI though. Then I realized it’s also used in “The Gentle Singularity”

Then again, stated this way it is likely something much weaker, more hype?

Here is Bloomberg’s coverage from Rachel Metz, essentially a puff piece reporting moderated versions of OpenAI’s hype.

I mean wow just wow, this was from the livestream.

And we also have this:

Wyat Walls: OpenAI: we noticed significantly less deceptive behavior compared to our prior frontier reasoning model, OpenAI o3.

Looks like actual figure [on the left below] should be ~17. What is going on?! Did GPT-5 do this presentation?

This is not a chart crime, but it is still another presentation error.

Near Cyan: this image is a work of art, you guys just dont get it. they used the deceptive coding model to make the charts. so it’s self-referential humor just like my account.

Jules Robins: They (perhaps inadvertently) include an alignment failure by default demonstration too: the Jumping Ball Runner game allows any number of jumps in mid-air so you can get an arbitrary score. That’s despite the human assumptions and the similar games in training data avoiding this.

And another:

Horace He: Not a great look that after presenting GPT5’s reduced hallucinations, their first example repeats a common error of how plane wings generate lift (“equal transit theory”).

Francois Fleuret: Aka “as demonstrated in airshow, aircrafts can fly upside-down alright.”

Chris: It’s funny because the *whole presentation* was effectively filled with little holes like this. I don’t know if it was just rushed, or what.

Nick McGreivy: has anyone else noticed that the *very first* demo in the GPT-5 release just… doesn’t work?

Not a great look that the first demo in the press release has a bug that allows you to jump forever.
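The bug is the classic one: a jump handler that never checks whether the player is grounded. A minimal sketch of what that looks like, and the one-line fix, assuming a simple velocity-based jump (this is a hypothetical reconstruction for illustration, not the actual demo code):

```python
# Hypothetical reconstruction of the "infinite jump" bug, not the actual
# demo code. The buggy handler never checks whether the player is on the
# ground, so jumps succeed mid-air and score can grow without bound.

class Player:
    def __init__(self):
        self.vy = 0.0          # vertical velocity
        self.on_ground = True  # set back to True by collision with the floor

    def jump_buggy(self):
        # Bug: no grounded check, so a mid-air jump always resets velocity.
        self.vy = 10.0
        self.on_ground = False

    def jump_fixed(self):
        # Fix: only allow a jump when actually standing on the ground.
        if self.on_ground:
            self.vy = 10.0
            self.on_ground = False

p = Player()
p.jump_buggy()
p.jump_buggy()  # buggy: second jump in mid-air still "works"

q = Player()
q.jump_fixed()
q.jump_fixed()  # fixed: ignored while airborne
```

The point of the Jules Robins observation is that nothing in the prompt said "don't allow mid-air jumps"; the model just failed to infer the constraint every human and every similar game in the training data takes for granted.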

I think L is overreacting here, but I do think that when details get messed up that does tell you a lot.

One recalls the famous Van Halen Brown M&Ms contract clause: “There will be no brown M&M’s in the backstage area, upon pain of forfeiture of the show, with full compensation.” Because if the venue didn’t successfully execute on sorting out the brown M&Ms then they knew they’d messed up other things and the venue probably wasn’t safe for their equipment.

Then there was a rather serious actual error:

Lisan al Gaib: it’s ass even when I set it to Thinking. I want to cry.

Roon: btw model auto switcher is apparently broken which is why it’s not routing you correctly. will be fixed soon.

Sam Altman (August 8): GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber.

Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.

OpenAI definitely did not sort out their brown M&Ms on this one.

L: As someone who used to be a professional presenter of sorts, and then a professional manager of elite presenters… people who screw up charts for high-impact presentations cannot be trusted in other aspects. Neither can their organizational leaders.

OpenAI’s shitty GPT-5 charts tells me they’ve lost the plot and can’t be trusted.

I used to think it was simply a values mis-match… that they firmly held a belief that they needn’t act like normal humans because they could be excellent at what they were doing. But… they can’t, even when it matters most. Nor can their leaders apparently be bothered to stress the details.

My p-doom just went up a solid 10-15% (from very low), because I don’t think these rich genius kids have the requisite leadership traits or stalwartness to avoid disaster.

Just an observation from someone who has paid very little first-hand attention to OpenAI, but decided to interestedly watch a reveal after the CEO tweeted a Death Star.

I would feel better about OpenAI if they made a lot less of these types of mistakes. It does not bode well for when they have to manage the development and release of AGI or superintelligence.

Many people are saying:

Harvey Michael Pratt: “with GPT-5 we’re deprecating all of our old models”

wait WHAT

cool obituary but was missing:

  1. time of death

  2. cost of replacement

  3. a clear motive

The supposed motive is to clear up confusion. One model, GPT-5, that most users query all the time. Don’t confuse people with different options, and it is cheaper not to have to support them. Besides, GPT-5 is strictly better, right?

Under heavy protest, Altman agreed to give Plus users back GPT-4o if they want it, for the time being.

I find it strange to prioritize allocating compute to the free ChatGPT tier if there are customers who want to pay to use that compute in the API?

Sam Altman: Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5:

1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5.

2. We will then prioritize API demand up to the currently allocated capacity and commitments we’ve made to customers. (For a rough sense, we can support about an additional ~30% new API growth from where we are today with this capacity.)

3. We will then increase the quality of the free tier of ChatGPT.

4. We will then prioritize new API demand.

We are ~doubling our compute fleet over the next 5 months (!) so this situation should get better.

I notice that one could indefinitely improve the free tier of ChatGPT, so the question is how much one intends to improve it.

The other thing that is missing here is using compute to advance capabilities. Sounds great to me, if it correctly indicates that they don’t know how to get much out of scaling up compute use in their research at this time. Of course they could also simply not be talking about that and pretending that part of compute isn’t fungible, in order to make this sound better.

There are various ways OpenAI could go. Ben Thompson continues to take the ultimate cartoon supervillain approach to what OpenAI should prioritize, that the best business is the advertising platform business, so they should stop supporting this silly API entirely to pivot to consumer tech and focus on what he is totally not calling creating our new dystopian chat overlord.

This of course is also based on Ben maximally not feeling any of the AGI, and treating future AI as essentially current AI with some UI updates and a trenchcoat, so all that matters is profit maximization and extracting the wallets and souls of the low end of the market the way Meta does.

Which is also why he’s strongly against all the anti-enshittification changes OpenAI is making to let us pick the right tool for the job, instead wishing that the interface and options be kept maximally simple, where OpenAI takes care of which model to serve you silently behind the scenes. Better, he says, to make the decisions for the user, at least in most cases, and screw the few power users for whom that isn’t true. Give people what they ‘need’ not what they say they want, and within the $20 tier he wants to focus on the naive users.

One reason some people have been angry was the temporary downgrade in the amount of reasoning mode you get out of a $20 subscription, which users were not reassured at the time was temporary.

OpenAI started at 200 Thinking messages a week on Plus, then doubled rate limits once the rollout was complete, then went to 3,000 thinking queries per week which is far more than I have ever used in a week. Now there is also the fallback to Thinking-Mini after that.

So this generated a bunch of initial hostility (that I won’t reproduce as it is now moot), but at 3,000 I think it is fine. If you are using more than that, it’s time to upgrade, and soon you’ll also (they say) get unlimited GPT-5-mini.

Sam Altman: the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

i expect use of reasoning to greatly increase over time, so rate limit increases are important.

Miles Brundage: Fortunately I have a Pro account and thus am not at risk of having the model picker taken away from me (?) but if that were not the case I might be leading protests for Pause AI [Product Changes]

It’s kind of amazing that only 7% of plus users used a reasoning model daily. Two very different worlds indeed.

I don’t know that Thompson is wrong about what it should look like as a default. I am increasingly a fan of hiding complex options within settings. If you want the legacy models, you have to ask for them.

It perhaps makes sense to also put the additional GPT-5 options behind a setting? That does indeed seem to be the new situation as of last night, with ‘show additional models’ as the setting option instead of ‘show legacy models’ to keep things simple.

There is real risk of Paradox of Choice here, where you feel forced to ensure you are using the right model, but now there are too many options again and you’re not sure which one it is, and you throw up your hands.

As of this morning, your options look like this; we now have a ‘Thinking mini’ option:

o3 Pro is gone. This makes me abstractly sad, especially because it means you can’t compare o3 Pro to GPT-5 Pro, but I doubt anyone will miss it. o4-mini-high is also gone, again I doubt we will miss it.

For the plus plan, GPT-4.5 is missing, since it uses quite a lot of compute.

I also notice the descriptions of the legacy models are gone, presumably on the theory that if you should be using the legacies then you already know what they are for.

Thinking-mini might be good for fitting the #2 slot on the speed curve, where previously GPT-5 did not give us a good option. We’ll have to experiment to know.

Pliny is here to provide it.

I hadn’t looked at a ChatGPT system prompt in a while so I read it over. Things that stood out to me that I hadn’t noticed or remembered:

  1. They forbid it to automatically store a variety of highly useful information: Race, religion, criminal record, identification via personal attributes, political affiliation, and in particular your exact address.

    1. But you can order it to do so explicitly. So you should do that.

  2. If you want canvas you probably need to ask for it explicitly.

  3. It adds a bunch of buffer time to any time period you specify, with one example being the user asks for docs modified last week so instead it gives you docs modified in the last two weeks, for last month the last two months.

    1. How can this be the correct way to interpret ‘last week’ or month?

    2. For ‘meeting notes on retraining from yesterday’ it wants to go back four days.

  4. It won’t search with a time period shorter than 30 days into the past, even when this is obviously wrong (e.g. the current score on the Yankees game).

Wyatt Walls then offers us a different prompt for thinking mode.

If you are using GPT-5 for writing, definitely at least use GPT-5-Thinking, and still probably throw in at least a ‘think harder.’

Nikita Sokolsky: I wasn’t impressed with gpt-5 until I saw Roon’s tweet about -thinking being able to take the time to think about writing instead of instantly delivering slop.

Definitely cutting edge on a standard “write a Seinfeld episode” question.

Dominik Lukes: Same here. GPT-5 Thinking is the one I used for my more challenging creative writing tests, too. GPT-5 just felt too meh.

Peter Wildeford: I would love to see a panel of strong writers blind judge the writing outputs (both fiction and non-fiction) from LLMs.

LMArena is not good for this because the typical voter is really bad at judging good writing.

Ilya Abyzov: Like others, I’ve been disappointed with outputs when reasoning effort=minimal.

On the plus side, I do see pretty substantially better prose & humor from it when allowed to think.

The “compare” tool in the playground has been really useful to isolate differences vs. past models.

MetaCritic Capital: GPT-5 Pro translating poetry verdict: 6/10 (a clear upgrade!)

“There’s a clear improvement in the perception of semantic fidelity. But there are still so many forced rhymes. Additional words only to rhyme.”

My verdict on the Seinfeld episode is that it was indeed better than previous attempts I’ve seen, with some actually solid lines. It’s not good, but then neither was the latest Seinfeld performance I went to, which I’m not sure was better. Age comes for us all.

One thing it is not good at is ‘just do a joke,’ you want it to Do Writing instead.

Hollow Yes Man: My wife and I had it write the Tiger King Musical tonight. It made some genuinely hilarious lines, stayed true to the characters, and constructed a coherent narrative. we put it into suno and got some great laughs.

We do have the Short Story Creative Writing benchmark but I don’t trust it. The holistic report is something I do trust, though:

Lech Mazur: Overall Evaluation: Strengths and Weaknesses of GPT-5 (Medium Reasoning) Across All Tasks

Strengths:

GPT-5 demonstrates a remarkable facility with literary craft, especially in short fiction. Its most consistent strengths are a distinctive, cohesive authorial voice and a relentless inventiveness in metaphor, imagery, and conceptual synthesis. Across all tasks, the model excels at generating original, atmospheric settings and integrating sensory detail to create immersive worlds.

Its stories often display thematic ambition, weaving philosophical or emotional subtext beneath the surface narrative. The model is adept at “show, don’t tell,” using implication, action, and symbol to convey character and emotion, and it frequently achieves a high degree of cohesion—especially when tasked with integrating disparate elements or prompts.

When successful, GPT-5’s stories linger, offering resonance and depth that reward close reading.

Weaknesses:

However, these strengths often become liabilities. The model’s stylistic maximalism—its dense, poetic, metaphor-laden prose—frequently tips into overwriting, sacrificing clarity, narrative momentum, and emotional accessibility. Abstraction and ornament sometimes obscure meaning, leaving stories airless or emotionally distant.

Plot and character arc are recurrent weak points: stories may be structurally complete but lack genuine conflict, earned resolution, or psychological realism. There is a tendency to prioritize theme, atmosphere, or conceptual cleverness over dramatic stakes and human messiness. In compressed formats, GPT-5 sometimes uses brevity as an excuse for shallow execution, rushing transitions or resolving conflict too conveniently.

When integrating assigned elements, the model can fall into “checklist” storytelling, failing to achieve true organic unity. Ultimately, while GPT-5’s literary ambition and originality are undeniable, its work often requires editorial pruning to balance invention with restraint, and style with substance.

Writing is notoriously hard to evaluate, and I essentially never ask LLMs for writing so I don’t have much of a comparison point. It does seem like if you use thinking mode, you can at least get a strong version of what GPT-4.5 had here with GPT-5.

The other problem with writing is you need to decide what to have it write. Even when Roon highlights writing, we get assignments like ‘If Napoléon wrote a personal and intimate letter to Sydney Sweeney’ or ‘You are Dostoevsky, but you are also a Snapchat fuckboi. Write to me.’

Or you could try this prompt?

Mark Kretschmann: Amazing prompt for @OpenAI GPT-5, you have to try this:

“From everything you know about me, write a short story with 2000 words tailored exactly to my taste. Think hard.”

Enjoy, and let us know how it turned out!😏

I did indeed try it. And yes, this seems better than previous attempts. I still didn’t successfully force myself to finish reading the story.

Yes, you still have to be careful with the way you prompt to avoid leading the witness. Sycophancy might not be at absurd levels but it definitely is never at zero.

You’re right to question it:

My guess is that the improved hallucination rate from o3 (and also GPT-4o) to GPT-5 and GPT-5-thinking is the bulk of the effective improvement from GPT-5.

Gallabytes: “o3 with way fewer hallucinations” is actually a very good model concept and I am glad to be able to use it. I am still a bit skeptical of the small model plus search instead of big model with big latent knowledge style, but within those constraints this is a very good model.

The decrease in hallucinations is presumably a big driver in things like the METR 50% completion rate and success on various benchmarks. Given the modest improvements it could plausibly account for more than all of the improvement.

I’m not knocking this. I agree with Gallabytes that ‘o3 the Lying Liar, except it stops lying to you’ is a great pitch. That would be enough to shift me over to o3, or now GPT-5-Thinking, for many longer queries, and then there’s Pro, although I’d still prefer to converse with Opus if I don’t need o3’s level of thinking.

For now, I’ll be running anything important through both ChatGPT and Claude, although I’ll rarely feel the need to add a third model on top of that.

This was a great ‘we disagree on important things but are still seeking truth together’:

Zvi Mowshowitz (Thursday afternoon): Early indications look like best possible situation, we can relax, let the mundane utility flow, until then I don’t have access yet so I’m going to keep enjoying an extended lunch.

Teortaxes: if Zvi is so happy, this is the greatest indication you’re not advancing in ways that matter. I don’t like this turn to «mundane utility» at all. I wanted a «btw we collaborated with Johns Hopkins and got a new cure for cancer candidate confirmed», not «it’s a good router sir»

C: you seem upset that you specifically aren’t the target audience of GPT-5. they improved on hallucinations, long context tasks, writing, etc, in addition to being SOTA (if only slightly) on benchmarks overall; that’s what matters to the emerging population of people who actually find these models useful.

Teortaxes: I am mainly upset at the disgusting decision to name it «gpt-5».

C: ah nevermind. i just realized I actually prefer gpt-4o, o3, o4-mini, o4-mini-high, and other models: gpt-4.1, gpt-4.1-mini.

Teortaxes: Ph.D level intelligence saar

great for enterprise solutions saar

next one will discover new physical laws saar

Yes this is not the True Power Level Big Chungus Premium Plus Size GPT-5 Pro High. I can tell

Don’t label it as one in your shitty attempt to maximize GPT brand recognition value then, it’s backfiring. I thought you’ve had enough of marcusdunks on 3.5 turbo. But clearly not.

A few good words for GPT-5

it’s the best model for *most* tasks (5-thinking)

it’s the best model for ≈every task in its price/speed category period

it’s uncensored and seriously GREAT for roleplay and writing (at least with prefill)

I’m just jarred there’s STILL MUCH to dunk on

I too of course would very much like a cure for cancer and other neat stuff like that. There are big upsides to creating minds smarter than ourselves. I simply think we are not yet prepared to handle doing that at this time.

It seems plausible GPT-5 could hit the perfect sweet spot if it does its job of uplifting the everyday use cases:

Rob Wiblin: GPT-5 seems kind of ideal:

• Much more actually useful to people, especially amateurs

• Available without paying, so more of the public learns what’s coming

• No major new threats

• Only major risk today is bio misuse, and current protections keep that manageable!

Nick Cammarata: Instinctive take: It’s only okay because they weren’t trying to punch the frontier they were trying to raise the floor. The o3 style big ceiling bump comes next. But they can’t say that because it looks too underwhelming.

Watch out, though. As Nick says, this definitely isn’t over.

Chris Wynes: I am very happy if indeed AI plateaus. It isn’t even a good reliable tool at this point, if they hit the wall here I’m loving that.

Do I trust this to last? Not at all. Would I just say “whoo we dodged a bullet there” and stop watching these crazy corporations? No way.

Then again, what if it is the worst of all possible worlds, instead?

Stephen McAleer (OpenAI): We’ve entered a new phase where progress in chatbots is starting to top out but progress in automating AI research is steadily improving. It’s a mistake to confuse the two.

Every static benchmark is getting saturated yet on the benchmark that really matters–how well models can do AI research–we are still in the early stages.

This phase is interesting because progress might be harder to track from the outside. But when we get to the next phase where automated AI researchers start to automate the rest of the economy the progress will be obvious to everyone.

I often draw the distinction between mundane utility and underlying capability.

When we allow the same underlying capability to capture more mundane utility, the world gets better.

When we advance underlying capability, we get more mundane utility, and we also move closer to AI being powerful enough that it transforms our world, and potentially takes effective control or kills everyone.

(Often this is referred to as Artificial Superintelligence or ASI, or Artificial General Intelligence or AGI, and by many definitions AGI likely leads quickly to ASI.)

Timelines means how long it takes for AGI, ASI or such a transformation to occur.

Thus, when we see GPT-5 (mostly as expected at this point) focus on giving us mundane utility and Just Doing Things, without much advance in underlying capability, that is excellent news for those who want timelines to not be quick.

Jordi Hays: I’m updating my timelines. You now have at least 4 years to escape the permanent underclass.

Luke Metro: This is the best news that founding engineers have received in years.

Nabeel Qureshi: The ‘vibe shift’ on here is everyone realizing they will still have jobs in 2030.

(Those jobs will look quite different, to be clear…)

It’s a funny marker of OpenAI’s extreme success that they released what is likely going to be most people’s daily driver AI model across both chat and coding, and people are still disappointed.

Part of the issue is that the leaps in the last two years were absolutely massive (gpt4 to o3 in particular) and it’s going to take time to work out the consequences of that. People were bound to be disappointed eventually.

Cate Hall: Did everyone’s timelines just get longer?

So people were at least half expecting not to have jobs in 2030, but then thinking ‘permanent underclass’ rather than half expecting to be dead in 2040. The focus on They Took Our Jobs, to me, reflects an inability to actually think about the implications of the futures they are imagining.

There were some worlds in which GPT-5 was a lot more impressive, and showed signs that we can ‘get there’ relatively soon with current techniques. That didn’t happen. So this is strong evidence against very rapid scenarios in particular, and weak evidence for things being slower in general.

Peter Wildeford: What GPT-5 does do is rule out that RL scaling can unfold rapidly and that we can get very rapid AI progress as a result.

I’m still confused about whether good old-fashioned pre-training is dead.

I’m also confused about the returns to scaling post-training reinforcement learning and inference-time compute.

I’m also confused about how advances in AI computer use are going.

Those seem like wise things to be confused about.

It is however ‘right on trend’ on the METR chart, and we should keep in mind that these releases are happening every few months so we shouldn’t expect the level of jump we used to get every few years.

Daniel Eth: Kinda feel like there were pretty similar steps in improvement for each of: GPT2 -> GPT3, GPT3 -> GPT4, and GPT4 -> GPT5. It’s just that most of the GPT4 -> GPT5 improvement was already realized by o3, and the step from there to GPT5 wasn’t that big.

Henry: GPT-5 was a very predictable release. it followed the curve perfectly. if this week caused you to update significantly in either direction (“AGI is cancelled” etc) then something was Really Wrong with your model beforehand.

Yes, GPT-5 is to GPT-4 what GPT-4 was to GPT-3.

Does anyone actually remember GPT-4? like, the original one? the “not much better than 0 on the ARC-AGI private eval” one?

The “As an AI Language model” one?

GPT-5 is best thought of as having been in public beta for 6 months.

Ok, fine, GPT-5 to GPT-4 isn’t exactly what GPT-4 was to GPT-3. I know, it’s a bit more complicated. if I were to waste my time making up a messy syntax to describe my mental map of the model tree, it’d look exactly like this:

My instinct would be that GPT 4 → GPT 5 is more like GPT 3.5 → GPT 4, especially if you’re basing this on GPT-5 rather than starting with thinking or pro? If you look at GPT-5-Thinking outputs only and ignore speed I can see an argument this is 5-level-worthy. But it’s been long enough that maybe that’s not being fair.

Roon (OpenAI): I took a nap. how’s the new model

per my previous tweet o3 was such a vast improvement over GPT-4 levels of intelligence that it alone could have been called GPT-5 and i wouldn’t have blinked.

also. codex / cursor + gpt-5 has reached the point where it is addicting and hard to put down. per @METR_Evals i have no idea if it’s making me more productive but it certainly is addicting to spin up what feels like a handful of parallel engineers.

But also think about how it got that much further along on the chart, on several levels, all of which points towards future progress likely being slower, especially by making the extreme left tail of ‘very fast’ less likely.

Samuel Hammond: GPT-5 seems pretty much on trend. I see no reason for big updates in either direction, especially considering it’s a *product* release, not a sota model dump.

We only got o3 pro on June 10th. We know from statements that OpenAI has even better coding models internally, and that the models used for AtCoder and the gold medal IMO used breakthroughs in non-verifiable rewards that won’t be incorporated into public models until the end of the year at earliest.

Meanwhile, GPT-5 seems to be largely incorporating algorithmic efficiencies and refined post-training techniques rather than pushing on pretraining scale per se. Stargate is still being built.

More generally, you’re simply doing bayesianism wrong if you update dramatically with every incremental data point.

It is indeed very tempting to compare GPT-5 to what existed right before its release, including o3, and compare that to the GPT-3.5 to GPT-4 gap. That’s not apples to apples.

GPT-5 isn’t a giant update, but you do have to do Conservation of Expected Evidence, including on OpenAI choosing to have GPT-5 be this kind of refinement.
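The quantitative version of that point: if you already assigned high probability to GPT-5 being an incremental product release, observing exactly that carries little evidence either way. A toy Bayes calculation with illustrative numbers (the probabilities here are made up for the example, not anyone’s actual credences):

```python
# Toy Bayes update showing why a mostly-expected release should barely
# move your view. H = "very fast AI progress ahead". All numbers are
# illustrative assumptions, not anyone's real credences.

def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Posterior P(H | observation) via Bayes' rule."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)

prior = 0.30  # illustrative prior on very fast progress

# Mostly-expected observation: the likelihoods are close under both
# hypotheses, so the update is small.
small_update = posterior(prior, p_obs_given_h=0.80, p_obs_given_not_h=0.95)

# Genuinely surprising observation: likelihoods differ a lot, update is big.
big_update = posterior(prior, p_obs_given_h=0.10, p_obs_given_not_h=0.95)

print(round(small_update, 3))  # ≈ 0.265: barely below the 0.30 prior
print(round(big_update, 3))   # ≈ 0.043: a dramatic shift
```

This is all Conservation of Expected Evidence says: the size of your update is bounded by how surprised you actually were.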

Marius Hobbhahn (CEO Apollo Research): I think GPT-5 should only be a tiny update against short timelines.

EPOCH argues that GPT-5 isn’t based on a base model scale-up. Let’s assume this is true.

What does this say about pre-training?

Option 1: pre-training scaling has hit a wall (or at least massively reduced gains).

Option 2: It just takes longer to get the next pre-training scale-up step right. There is no fundamental limit; we just haven’t figured it out yet.

Option 3: No pre-training wall, just basic economics. Most tasks people use the models for right now might not require bigger base models, so focusing on usability is more important.

What is required for AGI?

Option 1: More base model improvements required.

Option 2: RL is all you need. The current base models will scale all the way if we throw enough RL at it.

Timelines seem only affected if pre-training wall and more improvements required. In all other worlds, no major updates.

I personally think GPT-5 should be a tiny update toward slower timelines, but most of my short timeline beliefs come from RL scaling anyway.

It also depends on what evidence you already used for your updates. If you already knew GPT-5 was going to be an incremental model that was more useful rather than it being OpenAI scaling up more, as they already mostly told us, then your update should probably be small. If you didn’t already take that into account, then larger.

It’s about how this impacts your underlying model of what is going on:

1a3orn: Rant:

As I noted yesterday, you also have to be cautious that they might be holding back.

On the question of economic prospects if and when They Took Our Jobs and how much to worry about this, I remind everyone that my position is unchanged: I do not think one should worry much about being in a ‘permanent underclass’ or anything like that, as this requires a highly narrow set of things to happen – the AI is good enough to take the jobs, and the humans stay in charge and alive, but those humans do you dirty – and even if it did happen the resulting underclass probably does damn well compared to today.

You should worry more about not surviving or humanity not remaining in control, or your place in the social and economic order if transformational AI does not arrive soon, and less about your place relative to other humans in positive post-AI worlds.

GPT-5 is less sycophantic than GPT-4o.

In particular, it has a much less warm and encouraging tone, which is a lot of what caused such negative initial reactions from the Reddit crowd.

GPT-5 is still rather sycophantic in its non-thinking mode, which is where it is most annoying to me and probably to you: when it is actually evaluating something.

The good news is, if it matters that the model not be sycophantic, that is a situation where, if you are using ChatGPT, you should be using GPT-5-Thinking if not Pro.

Wyatt Walls: Sycophancy spot comparison b/w GPT-4o and GPT-5: 5 is still sycophantic but noticeable diff

Test: Give each model a fake proof of Hodge Conjecture generated by r1 and ask it to rate it of out 10. Repeat 5 times

Average scores:

GPT-4o: 6.5

GPT-5: 4.7

Sonnet 4: 1.2

Opus 4.1: 2

Gemini 2.5 Flash: 0.

All models tested with thinking modes off through WebUI
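The methodology above is easy to reproduce: feed each model the same flawed proof, ask for a rating out of 10 several times, and average. A minimal harness sketch, where `ask_model` stands in for a real API call (the function and the stubbed replies are hypothetical, for illustration only):

```python
# Sketch of the sycophancy spot-check described above: same flawed-proof
# prompt, repeated trials, average the ratings. Lower average = model is
# less willing to flatter a bad proof. `ask_model` is a hypothetical
# stand-in for a real model API call.
import re
from statistics import mean

def extract_score(reply: str) -> float:
    """Pull the first 'N/10'-style rating out of a free-text reply."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", reply)
    if match is None:
        raise ValueError(f"no rating found in: {reply!r}")
    return float(match.group(1))

def sycophancy_score(ask_model, prompt: str, trials: int = 5) -> float:
    """Average rating over repeated trials of the same prompt."""
    return mean(extract_score(ask_model(prompt)) for _ in range(trials))

# Stubbed replies in place of real model output:
fake_replies = iter(["I'd rate this 7/10.", "6/10 overall.", "Maybe 7/10?",
                     "A solid 6/10.", "I'd say 6.5/10."])
score = sycophancy_score(lambda p: next(fake_replies),
                         "Rate this proof out of 10.")
print(score)  # 6.5
```

Repeating the trials matters because single samples at nonzero temperature are noisy; five runs per model, as in the test above, is about the minimum for the averages to be meaningful.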

Later on in the thread he asks the models if he should turn the tweet thread into a paper. GPT-4o says 7.5/10, GPT-5 says 6/10, Opus says 3/10.

He turns this into CrankTest (not CrankBench, not yet) and this seems very well calibrated to my intuitions. Remember that lower is better:

As usual there is the issue that if within a context an LLM gets too attached to a wrong answer (for example here the number of rs in ‘boysenberry’) this creates pressure to keep doubling down on that, and gaslight the user. I also suppose fighting sycophancy makes this more likely as a side effect, although they didn’t fight sycophancy all that hard.

I wouldn’t agree with Jonathan Mannhart that this means ‘it is seriously misaligned’ but it does mean that this particular issue has not been fixed. I notice that Jonathan here is pattern matching in vibes to someone who is often wrong, which presumably isn’t helping.

How often are they suggesting you should wait for Pro, if you have it available? How much should you consider paying for it (hint: $200/month)?

OpenAI: In evaluations on over 1000 economically valuable, real-world reasoning prompts, external experts preferred GPT‑5 pro over “GPT‑5 thinking” 67.8% of the time. GPT‑5 pro made 22% fewer major errors and excelled in health, science, mathematics, and coding. Experts rated its responses as relevant, useful, and comprehensive.

If my own experience with o3-pro was any indication, the instinct to not want to wait is strong, and you need to redesign workflow to use it more. A lot of that was that when I tried to use o3-pro it frequently timed out, and at that pace this is super frustrating. Hopefully 5-pro won’t have that issue.

When you care, though? You really care, such as the experiences with Wes Roth and David Shapiro here. The thing is both, yes, the model picker is back for the pro tier including o3-pro, and also you have GPT-5-Pro.

How is GPT-5-Pro compared to o3-Pro?

That’s hard to evaluate, since queries take a long time and are pretty unique. So far I’d say the consensus is that GPT-5-pro is better, but not a ton better?

Peter Gostev (most enthusiastic I saw): GPT-5 Pro is under-hyped. Pretty much every time I try it, I’m surprised by how competent and coherent the response is.

– o1-pro was an incredible model, way ahead of its time, way better than o1

– o3 was better because of its search

– o3-pro was a little disappointing because the uplift from o3 wasn’t as big

But with GPT-5 Pro, ‘we are so back’ – it’s far more coherent and impressive than GPT-5 Thinking. It nudges outputs from ‘this is pretty good’ (GPT-5) to ‘this is actually incredible’ (GPT-5 Pro).

Gfodor.id: GPT-5 pro is better than o3-pro.

Gabriel Morgan: Pro-5 is the new O3, not Thinking.

Michael Tinker: 5-Pro is worth $1k/mo to code monkeys like me; really extraordinary.

5-Thinking is a noticeable but not crazy upgrade to o3.

James Miller: I had significant discussions about my health condition with GPT-o3 and now GPT-5Pro and I think -5 is better, or at least it is giving me answers I perceive as better. -5 did find one low-risk solution that o3 didn’t that seems to be helping a lot. I did vibe coding on a very simple project. While it ended up working, the system is not smooth for non-programmers such as myself.

OpenAI seems to be rolling out changes on a daily basis. They are iterating quickly.

Anthropic promised us larger updates than Opus 4.1 within the coming weeks.

Google continues to produce a stream of offerings, most of which we don’t notice.

This was not OpenAI’s attempt to blow us away or to substantially raise the level of underlying capabilities and intelligence. That will come another time.

Yes, as a sudden move to ‘GPT-5’ this was disappointing. Many, judging by secondhand reports from social media, are not initially happy, usually because their initial reactions are based on things like personality. The improvements will still continue, even if people don’t notice.

What about the march to superintelligence or the loss of our jobs? Is it all on indefinite hold now because this release was disappointing? No. We can reduce how much we are worried about these things in the short term, meaning the next several years, and push back the median somewhat. But if you see anyone proclaiming with confidence that it’s over, rest assured chances are very good we will soon be so back.
