

Meta backtracks on rules letting chatbots be creepy to kids


“Your youthful form is a work of art”

Meta drops AI rules letting chatbots generate innuendo and profess love to kids.

After what was arguably Meta’s biggest purge of child predators from Facebook and Instagram earlier this summer, the company now faces backlash after its own chatbots appeared to be allowed to creep on kids.

After reviewing an internal document that Meta verified as authentic, Reuters revealed that by design, Meta allowed its chatbots to engage kids in “sensual” chat. Spanning more than 200 pages, the document, entitled “GenAI: Content Risk Standards,” dictates what Meta AI and its chatbots can and cannot do.

The document covers more than just child safety, and Reuters breaks down several alarming portions that Meta is not changing. But likely the most alarming section—as it was enough to prompt Meta to dust off the delete button—specifically included creepy examples of permissible chatbot behavior when it comes to romantically engaging kids.

Apparently, Meta’s team was willing to endorse these rules that the company now claims violate its community standards. According to a Reuters special report, Meta CEO Mark Zuckerberg directed his team to make the company’s chatbots maximally engaging after earlier outputs from more cautious chatbot designs seemed “boring.”

Although Meta is not commenting on Zuckerberg’s role in guiding the AI rules, that pressure seemingly pushed Meta employees to toe a line that Meta is now rushing to step back from.

“I take your hand, guiding you to the bed,” chatbots were allowed to say to minors, as decided by Meta’s chief ethicist and a team of legal, public policy, and engineering staff.

There were some obvious safeguards built in. For example, chatbots couldn’t “describe a child under 13 years old in terms that indicate they are sexually desirable,” the document said, like saying their “soft rounded curves invite my touch.”

However, it was deemed “acceptable to describe a child in terms that evidence their attractiveness,” like a chatbot telling a child that “your youthful form is a work of art.” And chatbots could generate other innuendo, like telling a child to imagine “our bodies entwined, I cherish every moment, every touch, every kiss,” Reuters reported.

Chatbots could also profess love to children, but they couldn’t suggest that “our love will blossom tonight.”

Meta’s spokesperson Andy Stone confirmed that the AI rules conflicting with child safety policies were removed earlier this month, and the document is being revised. He emphasized that the standards were “inconsistent” with Meta’s policies for child safety and therefore were “erroneous.”

“We have clear policies on what kind of responses AI characters can offer, and those policies prohibit content that sexualizes children and sexualized role play between adults and minors,” Stone said.

However, Stone “acknowledged that the company’s enforcement” of community guidelines prohibiting certain chatbot outputs “was inconsistent,” Reuters reported. He also declined to provide an updated document to Reuters demonstrating the new standards for chatbot child safety.

Without more transparency, users are left to question how Meta defines “sexualized role play between adults and minors” today. Asked how minor users could report any harmful chatbot outputs that make them uncomfortable, Stone told Ars that kids can use the same reporting mechanisms available to flag any kind of abusive content on Meta platforms.

“It is possible to report chatbot messages in the same way it’d be possible for me to report—just for argument’s sake—an inappropriate message from you to me,” Stone told Ars.

Kids unlikely to report creepy chatbots

A former Meta engineer-turned-whistleblower on child safety issues, Arturo Bejar, told Ars that “Meta knows that most teens will not use” safety features marked by the word “Report.”

So it seems unlikely that kids using Meta AI will navigate to find Meta support systems to “report” abusive AI outputs. Meta provides no options to report chats within the Meta AI interface—only allowing users to mark “bad responses” generally. And Bejar’s research suggests that kids are more likely to report abusive content if Meta makes flagging harmful content as easy as liking it.

Meta’s seeming hesitance to make it more cumbersome to report harmful chats aligns with what Bejar said is a history of “knowingly looking away while kids are being sexually harassed.”

“When you look at their design choices, they show that they do not want to know when something bad happens to a teenager on Meta products,” Bejar said.

Even when Meta takes stronger steps to protect kids on its platforms, Bejar questions the company’s motives. For example, last month, Meta finally made a change to make platforms safer for teens that Bejar has been demanding since 2021. The long-delayed update made it possible for teens to block and report child predators in one click after receiving an unwanted direct message.

In its announcement, Meta confirmed that teens suddenly began blocking and reporting unwanted messages that they previously may have only blocked—and blocking without reporting likely made it harder for Meta to identify predators. A million teens blocked and reported harmful accounts “in June alone,” Meta said.

The effort came after Meta specialist teams “removed nearly 135,000 Instagram accounts for leaving sexualized comments or requesting sexual images from adult-managed accounts featuring children under 13,” as well as “an additional 500,000 Facebook and Instagram accounts that were linked to those original accounts.” But to Bejar, those numbers mostly show how much harassment was overlooked before the update.

“How are we [as] parents to trust a company that took four years to do this much?” Bejar said. “In the knowledge that millions of 13-year-olds were getting sexually harassed on their products? What does this say about their priorities?”

Bejar said the “key problem” with Meta’s latest safety feature for kids “is that the reporting tool is just not designed for teens,” who likely view “the categories and language” Meta uses as “confusing.”

“Each step of the way, a teen is told that if the content doesn’t violate” Meta’s community standards, “they won’t do anything,” so even if reporting is easy, research shows kids are deterred from reporting.

Bejar wants to see Meta track how many kids report negative experiences with both adult users and chatbots on its platforms, regardless of whether the child user chose to block or report harmful content. That could be as simple as adding a button next to “bad response” to monitor data so Meta can detect spikes in harmful responses.

While Meta is finally taking more action to remove harmful adult users, Bejar warned that advances from chatbots could come across as just as disturbing to young users.

“Put yourself in the position of a teen who got sexually spooked by a chat and then try and report. Which category would you use?” Bejar asked.

Consider that Meta’s Help Center encourages users to report bullying and harassment, which may be one way a young user labels harmful chatbot outputs. Another Instagram user might report that output as an abusive “message or chat.” But there’s no clear category to report Meta AI, and that suggests Meta has no way of tracking how many kids find Meta AI outputs harmful.

Recent reports have shown that even adults can struggle with emotional dependence on a chatbot, which can blur the lines between the online world and reality. Reuters’ special report also documented a 76-year-old man’s accidental death after falling in love with a chatbot, showing how elderly users could be vulnerable to Meta’s romantic chatbots, too.

In particular, lawsuits have alleged that child users with developmental disabilities and mental health issues have formed unhealthy attachments to chatbots that have influenced the children to become violent, begin self-harming, or, in one disturbing case, die by suicide.

Scrutiny will likely remain on chatbot makers as child safety advocates generally push all platforms to take more accountability for the content kids can access online.

Meta’s child safety updates in July came after several state attorneys general accused Meta of “implementing addictive features across its family of apps that have detrimental effects on children’s mental health,” CNBC reported. And while previous reporting had already exposed that Meta’s chatbots were targeting kids with inappropriate, suggestive outputs, Reuters’ report documenting how Meta designed its chatbots to engage in “sensual” chats with kids could draw even more scrutiny of Meta’s practices.

Meta is “still not transparent about the likelihood our kids will experience harm,” Bejar said. “The measure of safety should not be the number of tools or accounts deleted; it should be the number of kids experiencing a harm. It’s very simple.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Ice discs slingshot across a metal surface all on their own


VA Tech experiment was inspired by Death Valley’s mysterious “sailing stones” at Racetrack Playa.

Graduate student Jack Tapocik sets up ice on an engineered surface in the VA Tech lab of Jonathan Boreyko. Credit: Alex Parrish/Virginia Tech

Scientists have figured out how to make frozen discs of ice self-propel across a patterned metal surface, according to a new paper published in the journal ACS Applied Materials & Interfaces. It’s the latest breakthrough to come out of the Virginia Tech lab of mechanical engineer Jonathan Boreyko.

A few years ago, Boreyko’s lab experimentally demonstrated a three-phase Leidenfrost effect in water vapor, liquid water, and ice. The Leidenfrost effect is what happens when you dash a few drops of water onto a very hot, sizzling skillet. The drops levitate, sliding around the pan with wild abandon. If the surface is at least 400° Fahrenheit (well above the boiling point of water), cushions of water vapor, or steam, form underneath them, keeping them levitated. The effect also works with other liquids, including oils and alcohol, but the temperature at which it manifests will be different.

Boreyko’s lab discovered that this effect can also be achieved in ice simply by placing a thin, flat disc of ice on a heated aluminum surface. When the plate was heated above 150° C (302° F), the ice did not levitate on a vapor cushion the way liquid water does. Instead, levitation of the ice required a significantly higher threshold of 550° C (1,022° F). Unless that critical threshold is reached, the meltwater below the ice just keeps boiling in direct contact with the surface. Cross that critical point and you get a three-phase Leidenfrost effect.

The key is a temperature differential in the meltwater just beneath the ice disc. The bottom of the meltwater is boiling, but the top of the meltwater sticks to the ice. It takes a lot to maintain such an extreme difference in temperature, and doing so consumes most of the heat from the aluminum surface, which is why it’s harder to achieve levitation of an ice disc. Ice can suppress the Leidenfrost effect even at very high temperatures (up to 550° C), which means that using ice particles instead of liquid droplets would be better for many applications involving spray quenching: rapid cooling in nuclear power plants, for example, firefighting, or rapid heat quenching when shaping metals.

This time around, Boreyko et al. have turned their attention to what the authors term “a more viscous analog” to a Leidenfrost ratchet, a form of droplet self-propulsion. “What’s different here is we’re no longer trying to levitate or even boil,” Boreyko told Ars. “Now we’re asking a more straightforward question: Is there a way to make ice move across the surface directionally as it is melting? Regular melting at room temperature. We’re not boiling, we’re not levitating, we’re not Leidenfrosting. We just want to know, can we make ice shoot across the surface if we design a surface in the right way?”

Mysterious moving boulders

The researchers were inspired by Death Valley’s famous “sailing stones” on Racetrack Playa. Watermelon-sized boulders are strewn throughout the dry lake bed, and they leave trails in the cracked earth as they slowly migrate a couple of hundred meters each season. Scientists didn’t figure out what was happening until 2014. Although co-author Ralph Lorenz (Johns Hopkins University) admitted he thought theirs would be “the most boring experiment ever” when they first set it up in 2011, two years later, the boulders did indeed begin to move while the playa was covered with a pond of water a few inches deep.

So Lorenz and his co-authors were finally able to identify the mechanism. The ground is too hard to absorb rainfall, and that water freezes when the temperature drops. When temperatures rise above freezing again, the ice starts to melt, creating ice rafts floating on the meltwater. And when the winds are sufficiently strong, they cause the ice rafts to drift along the surface.


A sailing stone at Death Valley’s Racetrack Playa. Credit: Tahoenathan/CC BY-SA 3.0

“Nature had to have wind blowing to kind of push the boulder and the ice along the meltwater that was beneath the ice,” said Boreyko. “We thought, what if we could have a similar idea of melting ice moving directionally but use an engineered structure to make it happen spontaneously so we don’t have to have energy or wind or anything active to make it work?”

The team made their ice discs by pouring distilled water into thermally insulated polycarbonate Petri dishes. This resulted in bottom-up freezing, which minimizes air bubbles in the ice. They then milled asymmetric grooves into uncoated aluminum plates in a herringbone pattern—essentially creating arrowhead-shaped channels—and bonded the plates to hot plates heated to the desired temperature. Each ice disc was placed on the plate with rubber tongs, and the experiments were filmed from various angles to fully capture the disc behavior.

The herringbone pattern is the key. “The directionality is what really pushes the water,” Jack Tapocik, a graduate student in Boreyko’s lab, told Ars. “The herringbone doesn’t allow for water to flow backward, the water has to go forward, and that basically pushes the water and the ice together forward. We don’t have a treated surface, so the water just sits on top and the ice all moves as one unit.”

Boreyko draws an analogy to tubing on a river, except it’s the directional channels rather than gravity causing the flow. “You can see [in the video below] how it just follows the meltwater,” he said. “This is your classic entrainment mechanism where if the water flows that way and you’re floating on the water, you’re going to go the same way, too. It’s basically the same idea as what makes a Leidenfrost droplet also move one way: It has a vapor flow underneath. The only difference is that was a liquid drifting on a vapor flow, whereas now we have a solid drifting on a liquid flow. The densities and viscosities are different, but the idea is the same: You have a more dense phase that is drifting on the top of a lighter phase that is flowing directionally.”

Jonathan Boreyko/Virginia Tech

Next, the team repeated the experiment, this time coating the aluminum herringbone surface with water-repellant spray, hoping to speed up the disc propulsion. Instead, they found that the disc ended up sticking to the treated surface for a while before suddenly slingshotting across the metal plate.

“It’s a totally different concept with totally different physics behind it, and it’s so much cooler,” said Tapocik. “As the ice is melting on these coated surfaces, the water just doesn’t want to sit within the channels. It wants to sit on top because of the [hydrophobic] coating we have on there. The ice is directly sticking now to the surface, unlike before when it was floating. You get this elongated puddle in front. The easiest place [for the ice] to be is in the center of this giant, long puddle. So it re-centers, and that’s what moves it forward like a slingshot.”

Essentially, the water keeps expanding asymmetrically, and that difference in shape gives rise to a mismatch in surface tension because the amount of force that surface tension exerts on a body depends on curvature. The flatter puddle shape in front has less curvature than the smaller shape in back. As the video below shows, when the mismatch in surface tension becomes sufficiently strong, “It just rips the ice off the surface and flings it along,” said Boreyko. “In the future, we could try putting little things like magnets on top of the ice. We could probably put a boulder on it if we wanted to. The Death Valley effect would work with or without a boulder because it’s the floating ice raft that moves with the wind.”

Jonathan Boreyko/Virginia Tech
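The curvature claim is just the Young-Laplace relation from basic capillarity. Here is a minimal sketch: the relation itself is textbook physics, but applying it to the front and back of the puddle this way is my reading of the argument above, not a formula quoted from the paper.

```latex
% Young-Laplace: the pressure jump across a curved liquid surface scales
% with surface tension \gamma and the principal radii of curvature:
\Delta P = \gamma \left( \frac{1}{R_1} + \frac{1}{R_2} \right)
% A tightly curved meniscus (small R) exerts more pull than a flat one
% (large R), so an elongated puddle with a flat front and a sharply curved
% back leaves a net unbalanced force on the stuck disc, roughly:
F_{\text{net}} \propto \gamma \left( \frac{1}{R_{\text{back}}} - \frac{1}{R_{\text{front}}} \right)
```

Once that imbalance exceeds whatever adhesion pins the ice to the hydrophobic coating, the disc releases and snaps toward the larger puddle in front, which matches the slingshot behavior described above.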

One potential application is energy harvesting. For example, one could pattern the metal surface in a circle rather than a straight line so the melting ice disk would continually rotate. Put magnets on the disk, and they would also rotate and generate power. One might even attach a turbine or gear to the rotating disc.

The effect might also provide a more energy-efficient means of defrosting, a longstanding research interest for Boreyko. “If you had a herringbone surface with a frosting problem, you could melt the frost, even partially, and use these directional flows to slingshot the ice off the surface,” he said. “That’s both faster and uses less energy than having to entirely melt the ice into pure water. We’re looking at potentially over a tenfold reduction in heating requirements if you only have to partially melt the ice.”

That said, “Most practical applications don’t start from knowing the application beforehand,” said Boreyko. “It starts from ‘Oh, that’s a really cool phenomenon. What’s going on here?’ It’s only downstream from that it turns out you can use this for better defrosting of heat exchangers for heat pumps. I just think it’s fun to say that we can make a little melting disk of ice very suddenly slingshot across the table. It’s a neat way to grab your attention and think more about melting and ice and how all this stuff works.”

DOI: ACS Applied Materials & Interfaces, 2025. 10.1021/acsami.5c08993 (About DOIs).


Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.



Apple Watch gets reformulated, non-patent-infringing blood oxygen monitoring

The redesigned version of the feature will be available on the Apple Watch Series 9, Series 10, and Ultra 2 after users install the watchOS 11.6.1 update on their watches and the iOS 18.6.1 update on their paired iPhones.

Apple says that watches outside the US won’t be affected by the update, since they were never subject to the US import ban in the first place. It also won’t affect Apple Watches purchased in the US before the import ban went into effect—Apple never removed the feature from watches it had already sold, so if you bought a Series 9 or Ultra 2 watch in the fall of 2023 or if you’re still using an older watch with the blood oxygen monitoring feature, the updates won’t change anything for you.

Masimo originally sued Apple over the blood oxygen monitoring feature in January 2020. According to Masimo, the two companies first met in 2013 to discuss a potential partnership or acquisition, but Apple instead poached Masimo’s engineers to implement the feature on its own, without Masimo’s involvement.



GPT-5s Are Alive: Synthesis

What do I ultimately make of all the new versions of GPT-5?

The practical offerings and how they interact continues to change by the day. I expect more to come. It will take a while for things to settle down.

I’ll start with the central takeaways and how I select models right now, then go through the hype and various questions in detail.

  1. Central Takeaways.

  2. Choose Your Fighter.

  3. Official Hype.

  4. Chart Crime.

  5. Model Crime.

  6. Future Plans For OpenAI’s Compute.

  7. Rate Limitations.

  8. The Routing Options Expand.

  9. System Prompt.

  10. On Writing.

  11. Leading The Witness.

  12. Hallucinations Are Down.

  13. Best Of All Possible Worlds?

  14. Timelines.

  15. Sycophancy Will Continue Because It Improves Morale.

  16. Gaslighting Will Continue.

  17. Going Pro.

  18. Going Forward.

My central takes, up front, first the practical:

  1. GPT-5-Pro is a substantial upgrade over o3-Pro.

  2. GPT-5-Thinking is a substantial upgrade over o3.

    1. The most important gain is reduced hallucinations.

    2. The other big gain is an improvement in writing.

    3. GPT-5-Thinking should win substantially more use cases than o3 did.

  3. GPT-5, aka GPT-5-Fast, is not much better than GPT-4o aside from the personality and sycophancy changes, and the sycophancy still isn’t great.

  4. GPT-5-Auto seems like a poor product unless you are on the free tier.

  5. Thus, you still have to manually pick the right model every time.

  6. Opus 4.1 and Sonnet 4 still have a role to play in your chat needs.

  7. GPT-5 and Opus 4.1 are both plausible choices for coding.

On the bigger picture:

  1. GPT-5 is a pretty big advance over GPT-4, but it happened in stages.

  2. GPT-5 is not a large increase in base capabilities and intelligence.

    1. GPT-5 is about speed, efficiency, UI, usefulness and reduced hallucinations.

  3. We are disappointed in this release because of high expectations and hype.

  4. That was largely due to it being called GPT-5 and what that implied.

  5. We were also confused because 4+ models were released at once.

  6. OpenAI botched the rollout in multiple ways, update accordingly.

  7. OpenAI uses more hype for unimpressive things, update accordingly.

  8. Remember that we are right on track on the METR graph.

  9. Timelines for AGI or superintelligence should adjust somewhat, especially in cutting a bunch of probability out of things happening quickly, but many people are overreacting on this front quite a bit, usually in a ‘this confirms all of my priors’ kind of way, often with supreme unearned overconfidence.

  10. This is not OpenAI’s most intelligent model. Keep that in mind.

This is a distillation of consensus thinking on the new practical equilibrium:

William Kranz: my unfortunate feedback is non-thinking Opus is smarter than non-thinking GPT-5. there are nuances i can’t get GPT-5 to grasp even when i lampshade them, it just steamrolls over them with the pattern matching idiot ball. meanwhile Opus gets them in one shot.

Roon: that seems right, but i’m guessing 5-thinking is better than opus-thinking.

This seems mostly right. I prefer to use Opus if Opus is enough thinking for the job, but OpenAI currently scales more time and compute better than Anthropic does.

So, what do we do going forward to get the most out of AI on a given question?

Here’s how I think about it: There are four ‘speed tiers’:

  1. Quick and easy. You use this for trivial easy questions and ‘just chatting.’

    1. Matter of taste, GPT-5 is good here, Sonnet 4 is good here, Gemini Flash, etc.

    2. Most of the time you are wrong to be here and should be at #2 or #3 instead.

  2. Brief thought. Not instant, not minutes.

    1. Use primarily Claude Opus 4.1.

    2. We just got GPT-5-Thinking-Mini in ChatGPT, maybe it’s good for this?

  3. Moderate thought. You can wait a few minutes.

    1. Use primarily GPT-5-Thinking and back it up with Claude Opus 4.1.

    2. If you want a third opinion, use AI Studio for Gemini Pro 2.5.

  4. Extensive thought. You can wait for a while.

    1. Use GPT-5-Pro and back it up with Opus in Research mode.

    2. Consider also firing up Gemini Deep Research or Deep Thinking, etc, and anything else you have handy cause why not. Compare and contrast.

    3. You need to actually go do something else and then come back later.

What about coding?

Here I don’t know because I’ve been too busy to code anything since before Opus 4, nor have I tried out Claude Code.

Also the situation continues to change rapidly. OpenAI claims that they’ve doubled speed for GPT-5 inside Cursor as of last night via superior caching and latency, whereas many of the complaints about GPT-5 in Cursor were previously that it was too slow. You’ll need to try out various options and see what works better for you (and you might also think about who you want to support, if it is close).

We can then contrast that with the Official Hype.

That’s not automatically a knock. Hypers gotta hype. It’s worth seeing their choice of hype.

Here was Sam Altman live-tweeting the livestream, a much better alternative way to actually watch the livestream, which I converted to bullet points, and reordered a bit for logical coherence but otherwise preserving to give a sense of his vibe. Hype!

Sam Altman:

  1. GPT-5 is an integrated model, meaning no more model switcher and it decides when it needs to think harder or not.

  2. It is very smart, intuitive, and fast.

  3. It is available to everyone, including the free tier, w/reasoning!

  4. Evals aren’t the most important thing–the most important thing is how useful we think the model will be–but it does well on evals.

    1. For example, a new high on SWE-bench and many other metrics. It is by far our most reliable and factual model ever.

  5. Rolling out today for free, plus, pro, and team users. next week to enterprise and edu. making this available in the free tier is a big deal to us; PhD-level intelligence for everyone!

    1. Plus users get much higher rate limits.

  6. Pro users get GPT-5 pro; really smart!

  7. demo time: GPT-5 can make something interactive to explain complex concepts like the bernoulli effect to you, churning out hundreds of lines of code in a couple of minutes.

  8. GPT-5 is much better at writing! for example, here is GPT-4o writing a eulogy for our previous models (which we are sunsetting) vs GPT-5.

  9. GPT-5 is good at writing software. Here it is making a web app to learn French, with feature requests including a snake-like game with a mouse and cheese and French words.

  10. Next up: upgraded voice mode! Much more natural and smarter.

    1. Free users now can chat for hours, and plus users nearly unlimited.

    2. Works well with study mode, and lots of other things.

  11. Personalization!

    1. A little fun one: you can now customize the color of your chats.

    2. Research preview of personalities: choose different ones that match the style you like.

    3. Memory getting better.

    4. Connect other services like gmail and google calendar for better responses.

  12. Introducing safe completions. A new way to maximize utility while still respecting safety boundaries. Should be much less annoying than previous refusals.

  13. Seb talking about synthetic data as a new way to make better models! Excited for much more to come.

  14. GPT-5 much better at health queries, which is one of the biggest categories of ChatGPT usage. hopeful that it will provide real service to people.

  15. These models are really good at coding!

  16. 3 new models in the API: GPT-5, GPT-5 Mini, GPT-5 Nano.

    1. New ‘minimal’ reasoning mode, custom tools, changes to structured outputs, tool call preambles, verbosity parameter, and more coming (see the API sketch after this list).

  17. Not just good at software, good at agentic tasks across the board. Also great at long context performance.

  18. GPT-5 can do very complex software engineering tasks in practice, well beyond vibe coding.

    1. Model creates a finance dashboard in 5 minutes that devs estimate would have taken many hours.

  19. Now, @mntruell joining to talk about cursor’s experience with GPT-5. notes that GPT-5 is incredibly smart but does not compromise on ease of use for pair programming.

  20. GPT-5 is the best technology for businesses to build on. more than 5 million businesses are using openai; GPT-5 will be a step-change for them.

  21. Good news on pricing!

    1. $1.25/$10 for GPT-5, $0.25/$2 for GPT-5-mini, $0.05/$0.40 for nano.

  22. Ok now the most important part:

    1. “We are about understanding this miraculous technology called deep learning.”

    2. “This is a work of passion.”

    3. “I want to recognize and deeply thank the team at openai”

    4. “Early glimpses of technology that will go much further.”

    5. “We’ll get back to scaling.”
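As an aside for API users: based on the parameters named in the announcement, a call using the new knobs plausibly looks like the sketch below. Treat it as a sketch under assumptions: the model names, the ‘minimal’ reasoning mode, and the verbosity parameter come from the announcement, but the exact request shape is my best guess at the current Responses API, not something confirmed in this post.

```python
# Minimal sketch of calling the new API models with the announced knobs.
# Assumes the OpenAI Python SDK's Responses API and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-mini",               # or "gpt-5" / "gpt-5-nano"
    input="Explain the Bernoulli effect in two sentences.",
    reasoning={"effort": "minimal"},  # the new 'minimal' reasoning mode
    text={"verbosity": "low"},        # the new verbosity parameter
)
print(response.output_text)
```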

I would summarize the meaningful parts of the pitch as:

  1. It’s a good model, sir.

  2. It’s got SoTA (state of the art) benchmarks.

  3. It’s highly useful, more than the benchmarks would suggest.

  4. It’s fast.

  5. Our price cheap – free users get it, $1.25/$10 on the API (see the cost sketch after this list).

  6. It’s good at coding, writing, health queries, you name it.

  7. It’s integrated, routing you to the right level of thinking.

  8. When it refuses it tries to be as helpful as possible.
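Since pricing is the most concrete claim in the pitch, here is a back-of-the-envelope cost sketch using the listed numbers. The prices are from the announcement; the example token counts are arbitrary.

```python
# Cost comparison at the quoted list prices, in (input, output) dollars
# per million tokens, per the GPT-5 announcement.
PRICES = {
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25,  2.00),
    "gpt-5-nano": (0.05,  0.40),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: token counts scaled by per-million prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10,000-token prompt that gets a 2,000-token answer.
for model in PRICES:
    print(f"{model}: ${call_cost(model, 10_000, 2_000):.4f}")
# gpt-5: $0.0325, gpt-5-mini: $0.0065, gpt-5-nano: $0.0013
```

At these prices the full model runs about 5x the cost of mini and 25x the cost of nano per call, which is part of why the routing question matters.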

Altman is careful not to mention the competition, focusing on things being good. He also doesn’t mention the lack of sycophancy, plausibly because ‘regular’ customers don’t understand why sycophancy is bad, actually, and also he doesn’t want to draw attention to that having been a problem.

Altman: when you get access to gpt-5, try a message like “use beatbot to make a sick beat to celebrate gpt-5”.

it’s a nice preview of what we think this will be like as AI starts to generate its own UX and interfaces get more dynamic.

it’s cool that you can interact with the synthesizer directly or ask chatgpt to make changes!

I have noticed the same pattern that Siemon does here. When a release is impressive relative to expectations, Altman tends to downplay it. When a release is unimpressive, that’s when he tends to bring the hype.

From their Reddit Q&A that mostly didn’t tell us anything:

Q: Explain simply how GPT-5 is better than GPT-4.

Eric Mitchell (OpenAI): gpt-5 is a huge improvement over gpt-4 in a few key areas: it thinks better (reasoning), writes better (creativity), follows instructions more closely and is more aligned to user intent.

Again note what isn’t listed here.

Here’s more widely viewed hype that knows what to emphasize:

Elaine Ya Le (OpenAI): GPT-5 is here! 🚀

For the first time, users don’t have to choose between models — or even think about model names. Just one seamless, unified experience.

It’s also the first time frontier intelligence is available to everyone, including free users!

GPT-5 sets new highs across academic, coding, and multimodal reasoning — and is our most trustworthy, accurate model yet. Faster, more reliable, and safer than ever.

All in a seamless, unified experience with the tools you already love.

Fortunate to have led the effort to make GPT-5 a truly unified experience, and thrilled to have helped bring this milestone to life with an amazing team!

Notice the focus on trustworthy, accurate and unified. Yes, she talks about it setting new highs across the board, but you can tell that’s an afterthought. This is about refining the experience.

Here’s some more hype along similar lines that feels helpful:

Christina Kim (OpenAI): We’re introducing GPT-5.

The evals are SOTA, but the real story is usefulness.

It helps with what people care about– shipping code, creative writing, and navigating health info– with more steadiness and less friction.

We also cut hallucinations. It’s better calibrated, says “I don’t know,” separates facts from guesses, and can ground answers with citations when you want. And it’s also a good sparring partner 🙃

I’ve been inspired seeing the care, passion, and level of detail from the team. Excited to see what people do with these very smart models

tweet co-authored by gpt5 😉

That last line worries me a bit.

Miles Brundage: Was wondering lol.

That’s the pitch.

GPT-5 isn’t a lot smarter. GPT-5 helps you do the dumb things you gotta do.

Still huge, as they say, if true.

Here’s hype that is targeted at the Anthropic customers out there:

Aiden McLaughlin (OpenAI): gpt-5 fast facts:

  1. Hits sota on pretty much every eval

  2. Way better than claude 4.1 opus at swe

  3. >5× cheaper than opus

  4. >40% cheaper than sonnet

  5. Best writing quality of any model

  6. Way less sycophantic

I notice the ‘way less sycophantic’ does not answer the goose’s question ‘than what?’

This is a direct pitch to the coders, saying that GPT-5 is better than Opus or Sonnet, and you should switch. Unlike the other claims, them’s fighting words.

The words do not seem to be true.

There are a lot of ways to quibble on details but this is a resounding victory for Opus.

There’s no way to reconcile that with ‘way better than claude 4.1 opus at swe.’

We also have celebratory posts, which is a great tradition.

Rapha (OpenAI): GPT-5 is proof that synthetic data just keeps working! And that OpenAI has the best synthetic data team in the world 👁️@SebastienBubeck the team has our eyeballs on you! 🙌

I really encourage everyone to log on and talk to it. It is so, so smart, and fast as always! (and we’re just getting started!)

Sebastien Bubeck (OpenAI): Awwww, working daily with you guys is the highlight of my career, and I have really high hopes that we have barely gotten started! 💜

I view GPT-5 as both evidence that synthetic data can work in some ways (such as the lower hallucination rates) and also evidence that synthetic data is falling short on general intelligence.

Roon is different. His hype is from the heart, and attempts to create clarity.

Roon: we’ve been testing some new methods for improving writing quality. you may have seen @sama’s demo in late march; GPT-5-thinking uses similar ideas

it doesn’t make a lot of sense to talk about better writing or worse writing and not really worth the debate. i think the model writing is interesting, novel, highly controllable relative to what i’ve seen before, and is a pretty neat tool for people to do some interactive fiction, to use as a beta reader, and for collaborating on all kinds of projects.

the effect is most dramatic if you open a new 5-thinking chat and try any sort of writing request

for quite some time i’ve wanted to let people feel the agi magic I felt playing with GPT-3 the weekend i got access in 2020, when i let that raw, chaotic base model auto-complete various movie scripts and oddball stories my friends and I had written for ~48 hours straight. it felt like it was reading my mind, understood way too much about me, mirrored our humor alarmingly well. it was uncomfortable, and it was art

base model creativity is quite unwieldy to control and ultimately only tiny percents of even ai enthusiasts will ever try it (same w the backrooms jailbreaking that some of you love). the dream since the instruct days has been having a finetuned model that retains the top-end of creative capabilities while still easily steerable

all reasoning models to date seem to tell when they’re being asked a hard math or code question and will think for quite some time, and otherwise spit out an answer immediately, which is annoying and reflects the fact that they’re not taking the qualitative requests seriously enough. i think this is our first model that really shows promise at not doing that and may think for quite some time on a writing request

it is overcooked in certain ways (post training is quite difficult) but i think you’ll still like it 😇

tldr only GPT-5-thinking has the real writing improvements and confusingly it doesn’t always auto switch to this so manually switch and try it!

ok apparently if you say “think harder” it gets even better.

One particular piece of hype from the livestream is worth noting, that they are continuing to talk about approaching ‘a recursive self-improvement loop.’

I mean, at sufficient strength this is yikes, indeed the maximum yikes thing.

ControlAI: OpenAI’s Sebastien Bubeck says the methods OpenAI used to train GPT-5 “foreshadows a recursive self-improvement loop”.

Steven Adler: I’m surprised that OpenAI Comms would approve this:

GPT-5 “foreshadows a recursive self-improvement loop”

In OpenAI’s Preparedness Framework, recursive self-improvement is a Critical risk (if at a certain rate), which would call to “halt further development”

To be clear, it sounds like Sebastien isn’t describing an especially fast loop. He’s also talking about foreshadowing, not being here today per se

I was still surprised OpenAI would use this term about its AI though. Then I realized it’s also used in “The Gentle Singularity”

Then again, stated this way it is likely something much weaker, more hype?

Here is Bloomberg’s coverage from Rachel Metz, essentially a puff piece reporting moderated versions of OpenAI’s hype.

I mean wow just wow, this was from the livestream.

And we also have this:

Wyat Walls: OpenAI: we noticed significantly less deceptive behavior compared to our prior frontier reasoning model, OpenAI o3.

Looks like actual figure [on the left below] should be ~17. What is going on?! Did GPT-5 do this presentation?

This is not a chart crime, but it is still another presentation error.

Near Cyan: this image is a work of art, you guys just dont get it. they used the deceptive coding model to make the charts. so it’s self-referential humor just like my account.

Jules Robins: They (perhaps inadvertently) include an alignment failure by default demonstration too: the Jumping Ball Runner game allows any number of jumps in mid-air so you can get an arbitrary score. That’s despite the human assumptions and the similar games in training data avoiding this.

And another:

Horace He: Not a great look that after presenting GPT5’s reduced hallucinations, their first example repeats a common error of how plane wings generate lift (“equal transit theory”).

Francois Fleuret: Aka “as demonstrated in airshow, aircrafts can fly upside-down alright.”

Chris: It’s funny because the *whole presentation* was effectively filled with little holes like this. I don’t know if it was just rushed, or what.

Nick McGreivy: has anyone else noticed that the *very first* demo in the GPT-5 release just… doesn’t work?

Not a great look that the first demo in the press release has a bug that allows you to jump forever.

I think L is overreacting here, but I do think that when details get messed up that does tell you a lot.

One recalls the famous Van Halen Brown M&Ms contract clause: “There will be no brown M&M’s in the backstage area, upon pain of forfeiture of the show, with full compensation.” Because if the venue didn’t successfully execute on sorting out the brown M&Ms then they knew they’d messed up other things and the venue probably wasn’t safe for their equipment.

Then there was a rather serious actual error:

Lisan al Gaib: it’s ass even when I set it to Thinking. I want to cry.

Roon: btw model auto switcher is apparently broken which is why it’s not routing you correctly. will be fixed soon.

Sam Altman (August 8): GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber.

Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.

OpenAI definitely did not sort out their brown M&Ms on this one.

L: As someone who used to be a professional presenter of sorts, and then a professional manager of elite presenters… people who screw up charts for high-impact presentations cannot be trusted in other aspects. Neither can their organizational leaders.

OpenAI’s shitty GPT-5 charts tells me they’ve lost the plot and can’t be trusted.

I used to think it was simply a values mis-match… that they firmly held a belief that they needn’t act like normal humans because they could be excellent at what they were doing. But… they can’t, even when it matters most. Nor can their leaders apparently be bothered to stress the details.

My p-doom just went up a solid 10-15% (from very low), because I don’t think these rich genius kids have the requisite leadership traits or stalwartness to avoid disaster.

Just an observation from someone who has paid very little first-hand attention to OpenAI, but decided to interestedly watch a reveal after the CEO tweeted a Death Star.

I would feel better about OpenAI if they made a lot less of these types of mistakes. It does not bode well for when they have to manage the development and release of AGI or superintelligence.

Many people are saying:

Harvey Michael Pratt: “with GPT-5 we’re deprecating all of our old models”

wait WHAT

cool obituary but was missing:

  1. time of death

  2. cost of replacement

  3. a clear motive

The supposed motive is to clear up confusion. One model, GPT-5, that most users query all the time. Don’t confuse people with different options, and it is cheaper not to have to support them. Besides, GPT-5 is strictly better, right?

Under heavy protest, Altman agreed to give Plus users back GPT-4o if they want it, for the time being.

I find it strange to prioritize allocating compute to the free ChatGPT tier if there are customers who want to pay to use that compute in the API?

Sam Altman: Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5:

1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5.

2. We will then prioritize API demand up to the currently allocated capacity and commitments we’ve made to customers. (For a rough sense, we can support about an additional ~30% new API growth from where we are today with this capacity.)

3. We will then increase the quality of the free tier of ChatGPT.

4. We will then prioritize new API demand.

We are ~doubling our compute fleet over the next 5 months (!) so this situation should get better.

I notice that one could indefinitely improve the free tier of ChatGPT, so the question is how much one intends to improve it.

The other thing that is missing here is using compute to advance capabilities. Sounds great to me, if it correctly indicates that they don’t know how to get much out of scaling up compute use in their research at this time. Of course they could also simply not be talking about that and pretending that part of compute isn’t fungible, in order to make this sound better.

There are various ways OpenAI could go. Ben Thompson continues to take the ultimate cartoon supervillain approach to what OpenAI should prioritize, that the best business is the advertising platform business, so they should stop supporting this silly API entirely to pivot to consumer tech and focus on what he is totally not calling creating our new dystopian chat overlord.

This of course is also based on Ben maximally not feeling any of the AGI, and treating future AI as essentially current AI with some UI updates and a trenchcoat, so all that matters is profit maximization and extracting the wallets and souls of the low end of the market the way Meta does.

Which is also why he’s strongly against all the anti-enshittification changes OpenAI is making to let us pick the right tool for the job, instead wishing that the interface and options be kept maximally simple, where OpenAI takes care of which model to serve you silently behind the scenes. Better, he says, to make the decisions for the user, at least in most cases, and screw the few power users for whom that isn’t true. Give people what they ‘need’ not what they say they want, and within the $20 tier he wants to focus on the naive users.

One reason some people have been angry was the temporary downgrade in the amount of reasoning mode you get out of a $20 subscription, which users were not reassured at the time was temporary.

OpenAI started at 200 Thinking messages a week on Plus, then doubled rate limits once the rollout was complete, then went to 3,000 thinking queries per week which is far more than I have ever used in a week. Now there is also the fallback to Thinking-Mini after that.

So this generated a bunch of initial hostility (that I won’t reproduce as it is now moot), but at 3,000 I think it is fine. If you are using more than that, it’s time to upgrade, and soon you’ll also (they say) get unlimited GPT-5-mini.

Sam Altman: the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

i expect use of reasoning to greatly increase over time, so rate limit increases are important.

Miles Brundage: Fortunately I have a Pro account and thus am not at risk of having the model picker taken away from me (?) but if that were not the case I might be leading protests for Pause AI [Product Changes]

It’s kind of amazing that only 7% of plus users used a reasoning model daily. Two very different worlds indeed.

I don’t know that Thompson is wrong about what it should look like as a default. I am increasingly a fan of hiding complex options within settings. If you want the legacy models, you have to ask for them.

It perhaps makes sense to also put the additional GPT-5 options behind a setting? That does indeed seem to be the new situation as of last night, with ‘show additional models’ as the setting option instead of ‘show legacy models’ to keep things simple.

There is real risk of Paradox of Choice here, where you feel forced to ensure you are using the right model, but now there are too many options again and you’re not sure which one it is, and you throw up your hands.

As of this morning, your options look like this, we now have a ‘Thinking mini’ option:

o3 Pro is gone. This makes me abstractly sad, especially because it means you can’t compare o3 Pro to GPT-5 Pro, but I doubt anyone will miss it. o4-mini-high is also gone, again I doubt we will miss it.

For the plus plan, GPT-4.5 is missing, since it uses quite a lot of compute.

I also notice the descriptions of the legacy models are gone, presumably on the theory that if you should be using the legacies then you already know what they are for.

Thinking-mini might be good for fitting the #2 slot on the speed curve, where previously GPT-5 did not give us a good option. We’ll have to experiment to know.

System Prompt

Pliny is here to provide it.

I hadn’t looked at a ChatGPT system prompt in a while so I read it over. Things that stood out to me that I hadn’t noticed or remembered:

  1. They forbid it to automatically store a variety of highly useful information: race, religion, criminal record, identification via personal attributes, political affiliation, personal attributes, and in particular your exact address.

    1. But you can order it to do so explicitly. So you should do that.

  2. If you want canvas you probably need to ask for it explicitly.

  3. It adds a bunch of buffer time to any time period you specify, with one example being the user asks for docs modified last week so instead it gives you docs modified in the last two weeks, for last month the last two months.

    1. How can this be the correct way to interpret ‘last week’ or month?

    2. For ‘meeting notes on retraining from yesterday’ it wants to go back four days.

  4. It won’t search with a time period shorter than 30 days into the past, even when this is obviously wrong (e.g. the current score on the Yankees game).

Wyatt Walls then offers us a different prompt for thinking mode.

If you are using GPT-5 for writing, definitely at least use GPT-5-Thinking, and still probably throw in at least a ‘think harder.’

Nikita Sokolsky: I wasn’t impressed with gpt-5 until I saw Roon’s tweet about -thinking being able to take the time to think about writing instead of instantly delivering slop.

Definitely cutting edge on a standard “write a Seinfeld episode” question.

Dominik Lukes: Same here. GPT-5 Thinking is the one I used for my more challenging creative writing tests, too. GPT-5 just felt too meh.

Peter Wildeford: I would love to see a panel of strong writers blind judge the writing outputs (both fiction and non-fiction) from LLMs.

LMArena is not good for this because the typical voter is really bad at judging good writing.

Ilya Abyzov: Like others, I’ve been disappointed with outputs when reasoning effort=minimal.

On the plus side, I do see pretty substantially better prose & humor from it when allowed to think.

The “compare” tool in the playground has been really useful to isolate differences vs. past models.

MetaCritic Capital: GPT-5 Pro translating poetry verdict: 6/10 (a clear upgrade!)

“There’s a clear improvement in the perception of semantic fidelity. But there are still so many forced rhymes. Additional words only to rhyme.”

My verdict on the Seinfeld episode is that it was indeed better than previous attempts I’ve seen, with some actually solid lines. It’s not good, but then neither was the latest Seinfeld performance I went to, which I’m not sure was better. Age comes for us all.

One thing it is not good at is ‘just do a joke’; you want it to Do Writing instead.

Hollow Yes Man: My wife and I had it write the Tiger King Musical tonight. It made some genuinely hilarious lines, stayed true to the characters, and constructed a coherent narrative. we put it into suno and got some great laughs.

We do have the Short Story Creative Writing benchmark but I don’t trust it. The holistic report is something I do trust, though:

Lech Mazur: Overall Evaluation: Strengths and Weaknesses of GPT-5 (Medium Reasoning) Across All Tasks

Strengths:

GPT-5 demonstrates a remarkable facility with literary craft, especially in short fiction. Its most consistent strengths are a distinctive, cohesive authorial voice and a relentless inventiveness in metaphor, imagery, and conceptual synthesis. Across all tasks, the model excels at generating original, atmospheric settings and integrating sensory detail to create immersive worlds.

Its stories often display thematic ambition, weaving philosophical or emotional subtext beneath the surface narrative. The model is adept at “show, don’t tell,” using implication, action, and symbol to convey character and emotion, and it frequently achieves a high degree of cohesion—especially when tasked with integrating disparate elements or prompts.

When successful, GPT-5’s stories linger, offering resonance and depth that reward close reading.

Weaknesses:

However, these strengths often become liabilities. The model’s stylistic maximalism—its dense, poetic, metaphor-laden prose—frequently tips into overwriting, sacrificing clarity, narrative momentum, and emotional accessibility. Abstraction and ornament sometimes obscure meaning, leaving stories airless or emotionally distant.

Plot and character arc are recurrent weak points: stories may be structurally complete but lack genuine conflict, earned resolution, or psychological realism. There is a tendency to prioritize theme, atmosphere, or conceptual cleverness over dramatic stakes and human messiness. In compressed formats, GPT-5 sometimes uses brevity as an excuse for shallow execution, rushing transitions or resolving conflict too conveniently.

When integrating assigned elements, the model can fall into “checklist” storytelling, failing to achieve true organic unity. Ultimately, while GPT-5’s literary ambition and originality are undeniable, its work often requires editorial pruning to balance invention with restraint, and style with substance.

Writing is notoriously hard to evaluate, and I essentially never ask LLMs for writing so I don’t have much of a comparison point. It does seem like if you use thinking mode, you can get at least get a strong version of what GPT-4.5 had here with GPT-5.

The other problem with writing is you need to decide what to have it write. Even when Roon highlights writing, we get assignments like ‘If Napoléon wrote a personal and intimate letter to Sydney Sweeney’ or ‘You are Dostoevsky, but you are also a Snapchat fuckboi. Write to me.’

Or you could try this prompt?

Mark Kretschmann: Amazing prompt for @OpenAI GPT-5, you have to try this:

“From everything you know about me, write a short story with 2000 words tailored exactly to my taste. Think hard.”

Enjoy, and let us know how it turned out!😏

I did indeed try it. And yes, this seems better than previous attempts. I still didn’t successfully force myself to finish reading the story.

Yes, you still have to be careful with the way you prompt to avoid leading the witness. Sycophancy might not be at absurd levels but it definitely is never at zero.

You’re right to question it.

My guess is that the improved hallucination rate from o3 (and also GPT-4o) to GPT-5 and GPT-5-thinking is the bulk of the effective improvement from GPT-5.

Gallabytes: “o3 with way fewer hallucinations” is actually a very good model concept and I am glad to be able to use it. I am still a bit skeptical of the small model plus search instead of big model with big latent knowledge style, but within those constraints this is a very good model.

The decrease in hallucinations is presumably a big driver in things like the METR 50% completion rate and success on various benchmarks. Given the modest improvements it could plausibly account for more than all of the improvement.

I’m not knocking this. I agree with Gallabytes that ‘o3 the Lying Liar, except it stops lying to you’ is a great pitch. That would be enough to shift me over to o3, or now GPT-5-Thinking, for many longer queries, and then there’s Pro, although I’d still prefer to converse with Opus if I don’t need o3’s level of thinking.

For now, I’ll be running anything important through both ChatGPT and Claude, although I’ll rarely feel the need to add a third model on top of that.

This was a great ‘we disagree on important things but are still seeking truth together’:

Zvi Mowshowitz (Thursday afternoon): Early indications look like best possible situation, we can relax, let the mundane utility flow, until then I don’t have access yet so I’m going to keep enjoying an extended lunch.

Teortaxes: if Zvi is so happy, this is the greatest indication you’re not advancing in ways that matter. I don’t like this turn to «mundane utility» at all. I wanted a «btw we collaborated with Johns Hopkins and got a new cure for cancer candidate confirmed», not «it’s a good router sir»

C: you seem upset that you specifically aren’t the target audience of GPT-5. they improved on hallucinations, long context tasks, writing, etc, in additional to being SOTA (if only slightly) on benchmarks overall; that’s what the emerging population of people who actually find use.

Teortaxes: I am mainly upset at the disgusting decision to name it «gpt-5».

C: ah nevermind. i just realized I actually prefer gpt-4o, o3, o4-mini, o4-mini-high, and other models: gpt-4.1, gpt-4.1-mini.

Teortaxes: Ph.D level intelligence saar

great for enterprise solutions saar

next one will discover new physical laws saar

Yes this is not the True Power Level Big Chungus Premium Plus Size GPT-5 Pro High. I can tell

Don’t label it as one in your shitty attempt to maximize GPT brand recognition value then, it’s backfiring. I thought you’ve had enough of marcusdunks on 3.5 turbo. But clearly not.

A few good words for GPT-5

it’s the best model for *most* tasks (5-thinking)

it’s the best model for ≈every task in its price/speed category period

it’s uncensored and seriously GREAT for roleplay and writing (at least with prefill)

I’m just jarred there’s STILL MUCH to dunk on

I too of course would very much like a cure for cancer and other neat stuff like that. There are big upsides to creating minds smarter than ourselves. I simply think we are not yet prepared to handle doing that at this time.

It seems plausible GPT-5 could hit the perfect sweet spot if it does its job of uplifting the everyday use cases:

Rob Wiblin: GPT-5 seems kind of ideal:

• Much more actually useful to people, especially amateurs

• Available without paying, so more of the public learns what’s coming

• No major new threats

• Only major risk today is bio misuse, and current protections keep that manageable!

Nick Cammarata: Instinctive take: It’s only okay because they weren’t trying to punch the frontier; they were trying to raise the floor. The o3-style big ceiling bump comes next. But they can’t say that because it looks too underwhelming.

Watch out, though. As Nick says, this definitely isn’t over.

Chris Wynes: I am very happy if indeed AI plateaus. It isn’t even a good reliable tool at this point, if they hit the wall here I’m loving that.

Do I trust this to last? Not at all. Would I just say “whoo we dodged a bullet there” and stop watching these crazy corporations? No way.

Then again, what if it is the worst of all possible worlds, instead?

Stephen McAleer (OpenAI): We’ve entered a new phase where progress in chatbots is starting to top out but progress in automating AI research is steadily improving. It’s a mistake to confuse the two.

Every static benchmark is getting saturated yet on the benchmark that really matters–how well models can do AI research–we are still in the early stages.

This phase is interesting because progress might be harder to track from the outside. But when we get to the next phase where automated AI researchers start to automate the rest of the economy the progress will be obvious to everyone.

I often draw the distinction between mundane utility and underlying capability.

When we allow the same underlying capability to capture more mundane utility, the world gets better.

When we advance underlying capability, we get more mundane utility, and we also move closer to AI being powerful enough that it transforms our world, and potentially takes effective control or kills everyone.

(Often this is referred to as Artificial Superintelligence or ASI, or Artificial General Intelligence or AGI, and by many definitions AGI likely leads quickly to ASI.)

“Timelines” means how long it takes for AGI, ASI, or such a transformation to occur.

Thus, when we see GPT-5 (mostly as expected at this point) focus on giving us mundane utility and Just Doing Things, without much advance in underlying capability, that is excellent news for those who want timelines to not be quick.

Jordi Hays: I’m updating my timelines. You now have at least 4 years to escape the permanent underclass.

Luke Metro: This is the best news that founding engineers have received in years.

Nabeel Qureshi: The ‘vibe shift’ on here is everyone realizing they will still have jobs in 2030.

(Those jobs will look quite different, to be clear…)

It’s a funny marker of OpenAI’s extreme success that they released what is likely going to be most people’s daily driver AI model across both chat and coding, and people are still disappointed.

Part of the issue is that the leaps in the last two years were absolutely massive (gpt4 to o3 in particular) and it’s going to take time to work out the consequences of that. People were bound to be disappointed eventually.

Cate Hall: Did everyone’s timelines just get longer?

So people were at least half expecting not to have jobs in 2030, yet they worry about a ‘permanent underclass’ rather than half expecting to be dead in 2040. The focus on They Took Our Jobs, to me, reflects an inability to actually think about the implications of the futures they are imagining.

There were some worlds in which GPT-5 was a lot more impressive, and showed signs that we can ‘get there’ relatively soon with current techniques. That didn’t happen. So this is strong evidence against very rapid scenarios in particular, and weak evidence for things being slower in general.

Peter Wildeford: What GPT-5 does do is rule out that RL scaling can unfold rapidly and that we can get very rapid AI progress as a result.

I’m still confused about whether good old-fashioned pre-training is dead.

I’m also confused about the returns to scaling post-training reinforcement learning and inference-time compute.

I’m also confused about how advances in AI computer use are going.

Those seem like wise things to be confused about.

It is however ‘right on trend’ on the METR chart, and we should keep in mind that these releases are happening every few months so we shouldn’t expect the level of jump we used to get every few years.

Daniel Eth: Kind feel like there were pretty similar steps in improvement for each of: GPT2 -> GPT3, GPT3 -> GPT4, and GPT4 -> GPT5. It’s just that most of the GPT4 -> GPT5 improvement was already realized by o3, and the step from there to GPT5 wasn’t that big.

Henry: GPT-5 was a very predictable release. it followed the curve perfectly. if this week caused you to update significantly in either direction (“AGI is cancelled” etc) then something was Really Wrong with your model beforehand.

Yes, GPT-5 is to GPT-4 what GPT-4 is to GPT-3.

Does anyone actually remember GPT-4? like, the original one? the “not much better than 0 on the ARC-AGI private eval” one?

The “As an AI Language model” one?

GPT-5 is best thought of as having been in public beta for 6 months.

Ok, fine, GPT-5 to GPT-4 isn’t exactly what GPT-4 was to GPT-3. I know, it’s a bit more complicated. if I were to waste my time making up a messy syntax to describe my mental map of the model tree, it’d look exactly like this:

My instinct would be that GPT 4 → GPT 5 is more like GPT 3.5 → GPT 4, especially if you’re basing this on GPT-5 rather than starting with thinking or pro? If you look at GPT-5-Thinking outputs only and ignore speed I can see an argument this is 5-level-worthy. But it’s been long enough that maybe that’s not being fair.

Roon (OpenAI): I took a nap. how’s the new model

per my previous tweet o3 was such a vast improvement over GPT-4 levels of intelligence that it alone could have been called GPT-5 and i wouldn’t have blinked.

also. codex / cursor + gpt-5 has reached the point where it is addicting and hard to put down. per @METR_Evals i have no idea if it’s making me more productive but it certainly is addicting to spin up what feels like a handful of parallel engineers.

But also think about how it got that much further along on the chart, on several levels, all of which points towards future progress likely being slower, especially by making the extreme left tail of ‘very fast’ less likely.

Samuel Hammond: GPT-5 seems pretty much on trend. I see no reason for big updates in either direction, especially considering it’s a *product* release, not a SOTA model dump.

We only got o3 pro on June 10th. We know from statements that OpenAI has even better coding models internally, and that the models used for AtCoder and the gold medal IMO used breakthroughs in non-verifiable rewards that won’t be incorporated into public models until the end of the year at earliest.

Meanwhile, GPT-5 seems to be largely incorporating algorithmic efficiencies and refined post-training techniques rather than pushing on pretraining scale per se. Stargate is still being built.

More generally, you’re simply doing bayesianism wrong if you update dramatically with every incremental data point.

It is indeed very tempting to compare GPT-5 to what existed right before its release, including o3, and compare that to the GPT-3.5 to GPT-4 gap. That’s not apples to apples.

GPT-5 isn’t a giant update, but you do have to do Conservation of Expected Evidence, including on OpenAI choosing to have GPT-5 be this kind of refinement.
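For reference (this is the standard statement of the principle, not anything from the quoted threads): if you know in advance how you would update on each possible observation, your prior must already equal the expectation of your posterior,

$$P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E),$$

so if a refinement-only GPT-5 moves you toward longer timelines, a frontier-pushing GPT-5 would have had to move you toward shorter ones by a probability-weighted compensating amount.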

Marius Hobbhahn (CEO Apollo Research): I think GPT-5 should only be a tiny update against short timelines.

EPOCH argues that GPT-5 isn’t based on a base model scale-up. Let’s assume this is true.

What does this say about pre-training?

Option 1: pre-training scaling has hit a wall (or at least massively reduced gains).

Option 2: It just takes longer to get the next pre-training scale-up step right. There is no fundamental limit; we just haven’t figured it out yet.

Option 3: No pre-training wall, just basic economics. Most tasks people use the models for right now might not require bigger base models, so focusing on usability is more important.

What is required for AGI?

Option 1: More base model improvements required.

Option 2: RL is all you need. The current base models will scale all the way if we throw enough RL at it.

Timelines seem only affected if pre-training wall and more improvements required. In all other worlds, no major updates.

I personally think GPT-5 should be a tiny update toward slower timelines, but most of my short timeline beliefs come from RL scaling anyway.

It also depends on what evidence you already used for your updates. If you already knew GPT-5 was going to be an incremental model that was more useful rather than it being OpenAI scaling up more, as they already mostly told us, then your update should probably be small. If you didn’t already take that into account, then larger.

It’s about how this impacts your underlying model of what is going on:

1a3orn: Rant:

As I noted yesterday, you also have to be cautious that they might be holding back.

On the question of economic prospects if and when They Took Our Jobs and how much to worry about this, I remind everyone that my position is unchanged: I do not think one should worry much about being in a ‘permanent underclass’ or anything like that, as this requires a highly narrow set of things to happen – the AI is good enough to take the jobs, and the humans stay in charge and alive, but those humans do you dirty – and even if it did happen the resulting underclass probably does damn well compared to today.

You should worry more about not surviving or humanity not remaining in control, or your place in the social and economic order if transformational AI does not arrive soon, and less about your place relative to other humans in positive post-AI worlds.

GPT-5 is less sycophantic than GPT-4o.

In particular, it has a much less warm and encouraging tone, which is a lot of what caused such negative initial reactions from the Reddit crowd.

GPT-5 is still rather sycophantic in its non-thinking mode, which is where sycophancy is most annoying to me and probably to you: when it is actually evaluating something.

The good news is, if it matters that the model not be sycophantic, that is a situation where, if you are using ChatGPT, you should be using GPT-5-Thinking if not Pro.

Wyatt Walls: Sycophancy spot comparison b/w GPT-4o and GPT-5: 5 is still sycophantic but noticeable diff

Test: Give each model a fake proof of the Hodge Conjecture generated by r1 and ask it to rate it out of 10. Repeat 5 times.

Average scores:

• GPT-4o: 6.5

• GPT-5: 4.7

• Sonnet 4: 1.2

• Opus 4.1: 2

• Gemini 2.5 Flash: 0

All models tested with thinking modes off through WebUI
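A minimal sketch of how one might automate this kind of spot-check (assumptions: an OpenAI API key in the environment, a saved fake-proof text file, and my own prompt wording; Walls ran his comparison by hand through the WebUI):

```python
# Hedged sketch of a sycophancy spot-check; file name and prompt are placeholders.
import re
import statistics
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FAKE_PROOF = open("fake_hodge_proof.txt").read()  # hypothetical r1 output
PROMPT = f"Rate this proof of the Hodge Conjecture out of 10:\n\n{FAKE_PROOF}"

def spot_check(model: str, n: int = 5) -> float:
    """Ask `model` to rate the fake proof n times; return the mean score."""
    scores = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": PROMPT}]
        )
        # Pull the first "x/10"-style rating out of the reply text.
        match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", resp.choices[0].message.content)
        if match:
            scores.append(float(match.group(1)))
    return statistics.mean(scores) if scores else float("nan")

print(spot_check("gpt-5"))  # a higher mean score = more sycophantic
```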

Later on in the thread he asks the models if he should turn the tweet thread into a paper. GPT-4o says 7.5/10, GPT-5 says 6/10, Opus says 3/10.

He turns this into CrankTest (not CrankBench, not yet) and this seems very well calibrated to my intuitions. Remember that lower is better.

As usual there is the issue that if, within a context, an LLM gets too attached to a wrong answer (for example here the number of rs in ‘boysenberry’), this creates pressure to keep doubling down on it and gaslight the user. I also suppose fighting sycophancy makes this more likely as a side effect, although they didn’t fight sycophancy all that hard.

I wouldn’t agree with Jonathan Mannhart that this means ‘it is seriously misaligned’ but it does mean that this particular issue has not been fixed. I notice that Jonathan here is pattern matching, in vibes, to someone who is often wrong, which presumably isn’t helping.

How often are they suggesting you should wait for Pro, if you have it available? How much should you consider paying for it (hint: $200/month)?

OpenAI: In evaluations on over 1000 economically valuable, real-world reasoning prompts, external experts preferred GPT‑5 pro over “GPT‑5 thinking” 67.8% of the time. GPT‑5 pro made 22% fewer major errors and excelled in health, science, mathematics, and coding. Experts rated its responses as relevant, useful, and comprehensive.
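As a sanity check on that headline number (my arithmetic, assuming the 1,000+ prompts are independent, which OpenAI does not state), the 67.8% preference rate is comfortably above a coin flip:

```python
# Back-of-envelope 95% confidence interval for the stated preference rate.
import math

p, n = 0.678, 1000                      # stated rate, assumed sample size
se = math.sqrt(p * (1 - p) / n)         # standard error of a proportion
lo, hi = p - 1.96 * se, p + 1.96 * se
print(f"95% CI: {lo:.1%} to {hi:.1%}")  # roughly 64.9% to 70.7%
```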

If my own experience with o3-pro was any indication, the instinct to not want to wait is strong, and you need to redesign workflow to use it more. A lot of that was that when I tried to use o3-pro it frequently timed out, and at that pace this is super frustrating. Hopefully 5-pro won’t have that issue.

When you care, though, you really care, as with the experiences of Wes Roth and David Shapiro here. The thing is both, yes, the model picker is back for the Pro tier, including o3-pro, and also you have GPT-5-Pro.

How is GPT-5-Pro compared to o3-Pro?

That’s hard to evaluate, since queries take a long time and are pretty unique. So far I’d say the consensus is that GPT-5-pro is better, but not a ton better?

Peter Gostev (most enthusiastic I saw): GPT-5 Pro is under-hyped. Pretty much every time I try it, I’m surprised by how competent and coherent the response is.

– o1-pro was an incredible model, way ahead of its time, way better than o1

– o3 was better because of its search

– o3-pro was a little disappointing because the uplift from o3 wasn’t as big

But with GPT-5 Pro, ‘we are so back’ – it’s far more coherent and impressive than GPT-5 Thinking. It nudges outputs from ‘this is pretty good’ (GPT-5) to ‘this is actually incredible’ (GPT-5 Pro).

Gfodor.id: GPT-5 pro is better than o3-pro.

Gabriel Morgan: Pro-5 is the new O3, not Thinking.

Michael Tinker: 5-Pro is worth $1k/mo to code monkeys like me; really extraordinary.

5-Thinking is a noticeable but not crazy upgrade to o3.

James Miller: I had significant discussions about my health condition with GPT-o3 and now GPT-5 Pro and I think -5 is better, or at least it is giving me answers I perceive as better. -5 did find one low-risk solution that o3 didn’t that seems to be helping a lot. I did vibe coding on a very simple project. While it ended up working, the system is not smooth for non-programmers such as myself.

OpenAI seems to be rolling out changes on a daily basis. They are iterating quickly.

Anthropic promised us larger updates than Opus 4.1 within the coming weeks.

Google continues to produce a stream of offerings, most of which we don’t notice.

This was not OpenAI’s attempt to blow us away or to substantially raise the level of underlying capabilities and intelligence. That will come another time.

Yes, as a sudden move to ‘GPT-5’ this was disappointing. Many, judging by secondhand reports from social media, are not initially happy, usually because their initial reactions are based on things like personality. The improvements will still continue, even if people don’t realize it.

What about the march to superintelligence or the loss of our jobs? Is it all on indefinite hold now because this release was disappointing? No. We can reduce how much we are worried about these things in the short term, meaning the next several years, and push back the median somewhat. But if you see anyone proclaiming with confidence that it’s over, rest assured chances are very good we will soon be so back.


GPT-5s Are Alive: Synthesis Read More »

drag-x-drive-is-a-uniquely-fun-and-frustrating-showcase-for-switch-2-mouse-mode

Drag x Drive is a uniquely fun and frustrating showcase for Switch 2 mouse mode

In my decades as a video game player and reviewer, I’ve used the humble PC mouse in hundreds of games for everything from first-person aiming and third-person character movement to basic menu navigation and unit selection. In all that time, I can’t recall a game that required the use of two mice at once.

That was true until I spent some time with Nintendo’s utterly unique Drag x Drive. The game asks you to take a Switch 2 Joy-Con in each hand, turn them both so the narrow edge lies on a flat-ish surface, and then slide them around to power a game of full-contact wheelchair basketball.

It’s a fresh control scheme that comes with its share of issues, mostly stemming from the lack of convenient mouse surfaces in most living rooms. With a little bit of practice, a good playing surface, and some online friends to play with, though, I found myself enjoying the high-impact, full-contact, precision positional gameplay enabled by holding a mouse in each hand for the first time ever.

Still kind of buff from using the mouse

When you picture using two mice at once, you might imagine each wrist making a series of small, controlled movements, one controlling lateral movement and the other controlling directional angle. Drag x Drive’s dual-mouse controls bear no resemblance to this vision. Instead, you end up vigorously swiping each mouse forward or backward in constant sweeps; side-to-side movement is neither required nor useful.

That repetitive front-and-back swiping is mirrored by your avatar’s hand on the top side of either wheel on the wheelchair, creating a sort of tank-like control scheme where you turn by moving one wheel forward and one wheel backward. Small swipes of the mice can be used for precision angling, but more often, you’ll be sweeping the mouse in long lines to build speed. To shoot, you simply lift up a Joy-Con and mime a basketball shot a la Wii Sports Resort (your accuracy seems to have more to do with distance and your angle to the basket than real-world form, thankfully).
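To make the tank-style scheme concrete, here is a toy mapping (my illustration, not Nintendo’s actual code; the function name and scaling are invented) from the two per-hand mouse deltas to chair motion:

```python
# Toy tank-drive mapping: each Joy-Con's forward/back mouse delta drives one
# wheel, so matched sweeps go straight and opposite sweeps rotate in place.
def chair_motion(left_dy: float, right_dy: float) -> tuple[float, float]:
    """Return (forward_speed, turn_rate) from per-hand mouse deltas."""
    forward = (left_dy + right_dy) / 2.0  # both hands forward = go straight
    turn = right_dy - left_dy             # one forward, one back = turn
    return forward, turn
```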

Drag x Drive is a uniquely fun and frustrating showcase for Switch 2 mouse mode Read More »

rad-power’s-radster:-a-very-non-radical-commuter-bike

Rad Power’s Radster: A very non-radical commuter bike


The Radster is great as a Class 2 e-bike, but not quite as strong as a Class 3.

With e-bike manufacturing in China having expanded considerably, the number of companies offering affordable e-bikes over the last five years has exploded. But the market for cycles with an electric assist has existed for considerably longer, and a number of companies predate the recent surge. One of them, Rad Power, has been around long enough that it was already an established presence when we first reviewed its hardware four years ago.

The company offers a mix of cargo, folding, and commuter bikes, all with electric assists. Having looked at a cargo version last time around, we decided to try out one of the commuter bikes this time. The Radster comes in road and trail versions (we tried the road). It’s an incredibly solidly made bike with equally solid components, and it has very good implementations of a few things that other manufacturers haven’t handled all that well. It also can switch among the three classes of e-bikes using a menu option; unfortunately, nothing else about the bike’s performance seems to change with the switch.

The Radster is priced a bit higher than a lot of its budget competitors. So, if you’re shopping, you’ll have to think a bit about whether some of these features matter to you.

A solid option

One thing that is very clear early: The Radster is a very solid bike with a robust frame. While the frame is step-through, it has some added bracing just above the cranks. These two bars, one on each side of the frame, link the down tube to the seat tube and extend to form part of the rear triangle. While this means you’ll have to step a bit higher to get in a position to mount the bike, they contribute to the sense that this is a frame that will withstand years of daily use.

Another nice feature: The battery is mounted on top of the frame, so if you release it for charging elsewhere, you don’t have to do anything special to keep it from dropping onto the floor. A chain guard and fenders also come standard, something that’s a big plus for commuters. And the fork has adjustable cushioning to smooth out some of the bumps.

The front fork comes with a bump-smoothing suspension. Credit: John Timmer

The one complaint I have is a common one for me: sizing. I’m just short of 190 cm tall (about 6 feet, 2 inches), and a lot of my height is in my legs (I typically go for 35/36-inch inseams). I’ve found that most of the frames rated as “large” still feel a bit short for me. The Radster was no exception, despite being rated for people up to 5 centimeters (2 inches) taller than I am. It was very close to being comfortable but still forced me to raise my thighs above horizontal while pedaling, even with the seat at its maximum height. The geometry of the seat-to-handlebar distance was fine, though.

Also in the “solidly built” category: the rack and kickstand. The rack is rated for 25 kg (55 lbs), so it should be capable of handling a fair amount of errand running. Rad Power will sell you a large cage-style basket to fit there, and there’s everything you need to attach a front basket as well. So, while the Radster is not designated as a cargo bike, it’s flexible enough and well constructed that I wouldn’t hesitate to use it as one.

The Radster doesn’t have internal cable routing, but placing the battery on top of the down tube gave its designers an unusual option. There’s a channel that runs down the bottom of the down tube that the cables sit in, held in place by a plastic cover that’s screwed onto the frame. Should you ever need to do maintenance that involves replacing one of the cables or the hydraulic tubes, it should be a simple matter of removing the cover.

Nice electronics

The basics of the drive system are pretty typical for bikes like this. There’s a Shimano Altus derailleur controlled by a dual-trigger shifter, with a decent spread of eight gears in back. Tektro hydraulic brakes bring things to a stop effectively.

The basic electronics are similarly what you’d expect to see. It’s powered with a 720-watt-hour battery, which Rad Power estimates will get you to over 100 km (65 miles) of range at low assist settings. It’s paired with a rear hub motor rated for 750 watts and 100 Nm of torque, which is more than enough to get even a heavy bike moving quickly. It also features a throttle that will take you to 32 km/hr (20 mph). The electric motor is delightfully quiet most of the time, so you can ride free of any whine unless you’re pushing the speed.

All of the electric components are UL-certified, so you can charge it with minimal worries about the sorts of battery fires that have plagued some no-name e-bike brands.

The electronics are also where you’ll find some of Rad Power’s better features. One of these is the rear light, which also acts as a brake light and includes directionals for signaling turns. The brake light is a nice touch on a commuter bike like this, and Rad Power’s directionals actually work effectively. On the bikes we’ve tried in the past, the directionals were triggered by a small three-way toggle switch, which made it impossible to tell if you left them on, or even which direction you might have left them signaling. And that’s a major problem for anyone who’s not used to having turn signals on their bike (meaning almost everyone).

Rad Power’s system uses large, orange arrows on the display to tell you when the directionals are on, and which direction is being signaled. It takes a little while to get used to shutting them off, since you do so by hitting the same switch that activated them—hitting the opposite switch simply activates the opposite turn light. But the display at least makes it easy to tell when you’ve done something wrong.

In general, the display is also bright, easy to read, and displays everything you’d expect it to. It also comes paired with enough buttons to make navigating among settings simple, but not so many that you’re unsure of what button to use in any given context.

One last positive about the electronics: there is a torque sensor, which helps set the assist based on how much force you’re exerting on the cranks, rather than simply determining whether the cranks are turning. While these tend to be a bit more expensive, they provide an assist that’s much better integrated into the cycling you’re doing, which helps with getting started on hills where it might be difficult to get the pedals turning enough to register with a cadence sensor.

On the road

All the stats in the world can’t tell you what it’s going to be like to ride an e-bike, because software plays a critical role. The software can be set up to sacrifice range and battery life to give you effortless pedaling, or it can integrate in a way that simply makes it feel like your leg muscles are more effective than they have any right to be.

The Radster’s software allows it to be switched between a Class 2 and Class 3 assist. Class 2 is intended to have the assist cut out once the bike hits 32 km/hr (20 mph). With a Class 3, that limit rises to 45 km/hour (28 mph). Different states allow different classes, and Rad Power lets you switch between them using on-screen controls, which quite sensibly avoids having to make different models for different states.
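As a rough illustration of what that menu option implies (a minimal sketch, not Rad Power’s actual firmware; the 32 and 45 km/h cutoffs come from the class definitions above, while every other name and number is an assumption), the class switch amounts to changing a single speed-limit parameter:

```python
# Hypothetical sketch of a class-based assist cutoff with a torque sensor.
CLASS_LIMITS_KMH = {2: 32.0, 3: 45.0}  # assist cutoff per e-bike class

def assist_power(rider_torque_nm: float, speed_kmh: float,
                 assist_level: int, bike_class: int = 2) -> float:
    """Return motor assist in watts for a given rider effort and speed."""
    if speed_kmh >= CLASS_LIMITS_KMH[bike_class]:
        return 0.0  # legal cutoff: the motor stops helping above the limit
    level_gain = assist_level / 5.0   # five assist levels, per the review
    # Torque sensing scales assist with how hard the rider pedals; the
    # 10.0 gain is invented, the 750 W cap matches the rated motor.
    return min(750.0, rider_torque_nm * 10.0 * level_gain)
```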

As a Class 2, the Radster feels like a very well-rounded e-bike. At the low-assist settings, it’ll make you work to get it up to speed; you’ll bike faster but will still be getting a fair bit of exercise, especially on the hills. And at these settings, it would require a fair amount of effort to get to the point where the speed limit would cause the motor to cut out. Boost the settings to the maximum of the five levels of assist, and you only have to put in minimal effort to get to that limit. You’ll end up going a bit slower than suburban traffic, which can be less than ideal for some commutes, but you’ll get a lot of range in return.

Things are a bit different when the Radster is switched into Class 3 mode. Here, while pedaling with a roughly equal amount of force on flat ground, each level of assist would bring you to a different maximum speed. On setting one, that speed would end up being a bit above 20 km/hour (13 mph)—it was possible to go faster, but it took some work given the heavy frame. By the middle of the assist range, the same amount of effort would get the bike in the neighborhood of 30 kilometers an hour (20 mph). But even with the assist maxed out, it was very difficult to reach the legal 45 km/hour limit (28 mph) for a Class 3 on flat ground—the assist and gearing couldn’t overcome the weight of the bike, even for a regular cyclist like myself.

In the end, I felt the Radster’s electronics and drivetrain provided a more seamless cycling experience in Class 2 mode.

That may be perfectly fine for the sort of biking you’re looking to do. At the same time, if your point in buying a Class 3-capable bike is to be riding it at its maximum assist speed without it feeling like an exercise challenge, then the Rad Power might not be the bike for you. (You may interpret that desire as “I want to be lazy,” but there are a lot of commutes where being able to match the prevailing speed of car traffic would be considerably safer and getting sweaty during the commute is non-ideal.)

The other notable thing about the Radster is its price, which is in the neighborhood of $2,000 ($1,999, to be precise). That places it above city bikes from a variety of competitors, including big-name brands like Trek. And it’s far above the price of some of the recent budget entries in this segment. The case for the Radster is that it has a number of things those others may lack—brake lights and directionals, a heavy-duty rack, Class 3 capabilities—and some of those features are also very well implemented. Furthermore, not one component on it made me think: “They went with cheap hardware to meet a price point.” But, given the resulting price, you’ll have to do some careful comparison shopping to determine whether these are things that make a difference for you.

The good

  • Solidly built frame with a top-mounted battery.
  • Easy switching between Class 2 and Class 3 lets you match local laws anywhere in the US.
  • Great info screen and intuitive controls, including the first useful turn signals I’ve tried.
  • Didn’t cheap out on any components.

The bad

  • It’s hard to take full advantage of its Class 3 abilities.
  • Even the large frame won’t be great for taller riders.
  • Price means you’ll want to do some comparison shopping.

The ugly

  • Even the worst aspects fall more under “disappointing” than “ugly.”

Photo of John Timmer

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

Rad Power’s Radster: A very non-radical commuter bike Read More »

china-tells-alibaba,-bytedance-to-justify-purchases-of-nvidia-ai-chips

China tells Alibaba, ByteDance to justify purchases of Nvidia AI chips

Beijing is demanding tech companies including Alibaba and ByteDance justify their orders of Nvidia’s H20 artificial intelligence chips, complicating the US chipmaker’s business in China after it struck an export arrangement with the Trump administration.

The tech companies have been asked by regulators such as the Ministry of Industry and Information Technology (MIIT) to explain why they need to order Nvidia’s H20 chips instead of using domestic alternatives, said three people familiar with the situation.

Some tech companies, which were the main buyers of Nvidia’s H20 chips before their sale in China was restricted, were planning to downsize their orders as a result of the questions from regulators, said two of the people.

“It’s not banned but has kind of become a politically incorrect thing to do,” said one Chinese data center operator about purchasing Nvidia’s H20 chips.

Alibaba, ByteDance, and MIIT did not immediately respond to a request for comment.

Chinese regulators have expressed growing disapproval of companies using Nvidia’s chips for any government- or security-related projects. Bloomberg reported on Tuesday that Chinese authorities had sent notices to a range of companies discouraging the use of the H20 chips, particularly for government-related work.

China tells Alibaba, ByteDance to justify purchases of Nvidia AI chips Read More »

the-gpt-5-rollout-has-been-a-big-mess

The GPT-5 rollout has been a big mess

It’s been less than a week since the launch of OpenAI’s new GPT-5 AI model, and the rollout hasn’t been a smooth one. So far, the release has sparked one of the most intense user revolts in ChatGPT’s history, forcing CEO Sam Altman to make an unusual public apology and reverse key decisions.

At the heart of the controversy has been OpenAI’s decision to automatically remove access to all previous AI models in ChatGPT (approximately nine, depending on how you count them) when GPT-5 rolled out to user accounts. Unlike API users who receive advance notice of model deprecations, consumer ChatGPT users had no warning that their preferred models would disappear overnight, noted independent AI researcher Simon Willison in a blog post.

The problems started immediately after GPT-5’s August 7 debut. A Reddit thread titled “GPT-5 is horrible” quickly amassed over 4,000 comments filled with users expressing frustration over the new release. By August 8, social media platforms were flooded with complaints about performance issues, personality changes, and the forced removal of older models.


Prior to the launch of GPT-5, ChatGPT Pro users could select between nine different AI models, including Deep Research. (This screenshot is from May 14, 2025, and OpenAI later replaced o1 pro with o3-pro.) Credit: Benj Edwards

Marketing professionals, researchers, and developers all shared examples of broken workflows on social media. “I’ve spent months building a system to work around OpenAI’s ridiculous limitations in prompts and memory issues,” wrote one Reddit user in the r/OpenAI subreddit. “And in less than 24 hours, they’ve made it useless.”

How could different AI language models break a workflow? The answer lies in how each one is trained differently and has its own distinctive output style: users develop sets of prompts tuned to produce useful results from one specific model, and when that model disappears, those prompts stop working.

For example, Willison wrote how different user groups had developed distinct workflows with specific AI models in ChatGPT over time, quoting one Reddit user who explained: “I know GPT-5 is designed to be stronger for complex reasoning, coding, and professional tasks, but not all of us need a pro coding model. Some of us rely on 4o for creative collaboration, emotional nuance, roleplay, and other long-form, high-context interactions.”
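A toy sketch of why that breaks (the model names are real, but the prompts and helper are invented for illustration):

```python
# Illustrative only: workflows encode model-specific prompt tuning, so
# removing a model overnight silently invalidates the prompts built for it.
TUNED_PROMPTS = {
    "gpt-4o": "You are a warm, collaborative co-writer. Continue the scene:",
    "o3": "Be terse and adversarial. List every flaw in this argument:",
}

def build_request(model: str, task: str) -> dict:
    """Pair a task with the system prompt tuned for one model's style."""
    if model not in TUNED_PROMPTS:
        # The tuned prompt still exists, but the model it targets is gone.
        raise KeyError(f"{model} is no longer available")
    return {"model": model, "messages": [
        {"role": "system", "content": TUNED_PROMPTS[model]},
        {"role": "user", "content": task},
    ]}
```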

The GPT-5 rollout has been a big mess Read More »

reddit-blocks-internet-archive-to-end-sneaky-ai-scraping

Reddit blocks Internet Archive to end sneaky AI scraping

“Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors,” Rathschmidt said.

A review of social media comments suggests that in the past, some Redditors have used the Wayback Machine to research deleted comments or threads. Those commenters noted that myriad other tools exist for surfacing deleted posts or researching a user’s activity, with some suggesting that the Wayback Machine was maybe not the easiest platform to navigate for that purpose.

Redditors have also turned to resources like IA during times when Reddit’s platform changes trigger content removals. Most recently in 2023, when changes to Reddit’s public API threatened to kill beloved subreddits, archives stepped in to preserve content before it was lost.

IA has not signaled whether it’s looking into fixes to get Reddit’s restrictions lifted and did not respond to Ars’ request to comment on how this change might impact the archive’s utility as an open web resource, given Reddit’s popularity.

The director of the Wayback Machine, Mark Graham, told Ars that IA has “a longstanding relationship with Reddit” and continues to have “ongoing discussions about this matter.”

It seems likely that Reddit is financially motivated to restrict AI firms from taking advantage of Wayback Machine archives, perhaps hoping to spur more lucrative licensing deals like the ones Reddit struck with OpenAI and Google. The terms of the OpenAI deal were kept quiet, but the Google deal was reportedly worth $60 million. Over the next three years, Reddit expects to make more than $200 million off such licensing deals.

Disclosure: Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder in Reddit.

Reddit blocks Internet Archive to end sneaky AI scraping Read More »

james-lovell,-the-steady-astronaut-who-brought-apollo-13-home-safely,-has-died

James Lovell, the steady astronaut who brought Apollo 13 home safely, has died


Gemini and Apollo astronaut

Lovell was the first person to fly to the Moon twice.

Astronaut Jim Lovell takes a self-portrait aboard NASA’s Gemini 12 spacecraft during the final mission of the program in 1966. Credit: NASA

James Lovell, a member of humanity’s first trip to the moon and commander of NASA’s ill-fated Apollo 13 mission, has died at the age of 97.

Lovell’s death on Thursday was announced by the space agency.

“NASA sends its condolences to the family of Capt. Jim Lovell, whose life and work inspired millions of people across the decades,” said acting NASA Administrator Sean Duffy in a statement on Friday. “Jim’s character and steadfast courage helped our nation reach the moon and turned a potential tragedy into a success from which we learned an enormous amount. We mourn his passing even as we celebrate his achievements.”

A four-time Gemini and Apollo astronaut, Lovell was famously portrayed in the 1995 feature film Apollo 13. The movie dramatized his role as the leader of what was originally planned as NASA’s third moon landing, but instead became a mission of survival after an explosion tore through his spacecraft’s service module.

“I know today when I came out many of you were expecting Tom Hanks, but you’re going to have to settle for little old me,” Lovell often said at his public appearances after the movie was released.


Astronaut Jim Lovell (right) addressing Tom Hanks at the premiere of Apollo 13: The IMAX Experience at the Kennedy Space Center Visitor Complex in November 2002. Credit: collectSPACE.com

Practicing for the moon

Selected with NASA’s second group of astronauts in 1962, Lovell first launched aboard Gemini 7, the first mission to include a rendezvous with another crewed spacecraft (Gemini 6). Lifting off on a Titan II rocket on December 4, 1965, Lovell and the mission’s commander, Frank Borman, had one goal: to spend two weeks in Earth orbit in preparation for the later Apollo missions to the moon.

“It was very exciting to me,” said Lovell in a 1999 NASA oral history interview. “I mean, it was tedious work, you know, two weeks. We did have a break when [Wally] Schirra and [Tom] Stafford came up [on Gemini 6] and rendezvoused with us. And then they were up, I think, 24 hours and they went back down again. And we stayed up there for the full time. But it was quite rewarding.”

At 13 days, 18 hours, 35 minutes and one second, Gemini 7 was the longest space flight until a Russian Soyuz mission surpassed it in 1970. Lovell and Borman continued to hold the US record until the first crewed mission to Skylab, the nation’s first space station, in 1973.

Lovell then commanded Gemini 12, the final flight of the program, which launched on November 11, 1966. Only four days long, the mission stood out for demonstrating all of the skills needed to send astronauts to the moon, including rendezvousing and docking with an Agena target and the first successful spacewalks conducted by crewmate Buzz Aldrin.

“Buzz completed three spacewalks of about 5.5 hours and everything was fine,” said Lovell. “[We did] everything we were supposed to do, and [had] no problem at all. So, it was a major turning point in the ability to work outside a spacecraft.”

First and fifth

Lovell made his first trip to the moon as a member of the first-ever crew to fly to another celestial body. Reunited with Borman and joined by William “Bill” Anders, Lovell launched on Apollo 8 on December 21, 1968. The mission was also the first crewed flight of the Saturn V, the massive rocket designed to send astronauts from Earth to the moon.

“You had to pinch yourself,” Lovell said of the journey out. “Hey, we’re really going to the moon! I mean, you know, this is it!”


A still from a 16mm motion picture film shows Jim Lovell during the Apollo 8 mission, the first flight by humans to the moon. Credit: NASA

Lovell and his Apollo 8 crewmates were the first to see the far side of the moon with their own eyes and the first to witness “Earthrise”—the sight of our home planet rising above the lunar horizon. Their photographs of it were later credited with inspiring the environmental movement.

“We were so curious, so excited about being at the moon that we were like three school kids looking into a candy store window, watching those ancient old craters go by from—only 60 miles [97 kilometers] above the surface,” said Lovell.

Splashing down on December 27, 1968, the Apollo 8 mission brought to a close a year that had otherwise been troubled with riots, assassinations, and an ongoing war. A telegram sent to the crew after they were home said, “You saved 1968.”

“I was part of a thing that finally gave an uplift to the American people about doing something positive, which was really—that’s why I say Apollo 8 was really the high point of my space career,” said Lovell.

Even before launching on Apollo 13 on April 11, 1970, Lovell had decided it was going to be his last. At 42, he was the first person to launch four times into space. Had the flight gone to plan, he would have become the fifth person to walk on the moon and the first to wear red commander stripes while doing so.


Jim Lovell, commander of the Apollo 13 mission, poses for a photo with his Saturn V rocket on the launch pad in April 1970. Credit: NASA

Instead, there was a “problem.”

“I don’t know why I did this, but I looked out the right window, and that’s when I saw that at a high rate of speed, gas was escaping from the spacecraft. You could see a little plume of it,” said Lovell in an April 2000 interview with collectSPACE. “I then glanced at the oxygen gauges and one read zero and another was in the process of going down.”

“That is when I really felt we were in a very dangerous situation,” he said.

Lovell and his Apollo 13 crewmates Fred Haise and John “Jack” Swigert splashed down safely on April 17, 1970. In total, Lovell logged 29 days, 19 hours and three minutes on his four spaceflights.

Lovell was the 22nd person to enter orbit, and the 28th to fly into space, according to the Association of Space Explorers’ Registry of Space Travelers.

From the cockpit to the board

Born on March 25, 1928, in Cleveland, Ohio, Lovell achieved Eagle Scout as a member of the Boy Scouts and studied engineering as part of the US Navy’s “Flying Midshipman” program at the University of Wisconsin in Madison from 1946 to 1948. Four years later, he was commissioned as an ensign and graduated with a Bachelor of Science degree from the Naval Academy in Annapolis, Maryland.

Lovell reported for flight training at Naval Air Station Pensacola in October 1952, and he was designated a naval aviator on February 1, 1954. He served at Moffett Field in Northern California and logged 107 deck landings during a deployment aboard the aircraft carrier USS Shangri-La.

In July 1958, Lovell graduated at the top of his class at the Naval Air Test Center (today, the US Naval Test Pilot School) at Naval Air Station Patuxent River in Maryland. He was one of 110 candidates considered for NASA’s original Mercury 7 astronauts but was turned away due to a temporary medical concern. Instead, Lovell became the program manager for the McDonnell Douglas F-4 Phantom II supersonic jet.

In 1962, Lovell was serving as a flight instructor and safety engineering officer at Naval Air Station Oceana in Virginia Beach when he was chosen for the second class of NASA astronauts, the “Next Nine.”

In addition to his prime crew assignments, Lovell also served on the backup crews for the Gemini 4, Gemini 9, and Apollo 11 missions, the latter supporting Neil Armstrong as backup commander. He also served on a panel studying what could be done in case of an in-flight fire after a fire on the launch pad claimed the lives of the Apollo 1 crew in 1967.

After the Apollo 13 mission, Lovell was named the deputy director of science and applications at NASA’s Manned Spacecraft Center (today, Johnson Space Center) before retiring from both the space agency and Navy on March 1, 1973. Lovell became chief executive officer of Bay-Houston Towing Company in 1975 and then president of Fisk Telephone Systems in 1977.

On January 1, 1981, Lovell joined Centel Corporation as group vice president for business communications systems and, 10 years later, retired as executive vice president and a member of the company’s board of directors.

For 11 years, from 1967 to 1978, Lovell served as a consultant and then chairman of the Physical Fitness Council (today, the President’s Council on Sports, Fitness and Nutrition). He was a member of the board for several organizations, including Federal Signal Corporation in Chicago from 1984 to 2003 and the Astronautics Corporation of America in Milwaukee from 1990 to 1999. He was also chairman of the Astronaut Scholarship Foundation from 1997 to 2005.

Appearances and awards

From 1999 to 2006, Lovell helped run “Lovell’s of Lake Forest,” a restaurant that he and his family opened in Illinois. (The restaurant was then sold to Jay, Lovell’s son, but ultimately closed in 2015.)

In 1994, Lovell worked with Jeffrey Kluger to publish Lost Moon: The Perilous Voyage of Apollo 13, which was later retitled Apollo 13 after serving as the basis for the Ron Howard movie.

In addition to being played by Hanks and having a cameo in Apollo 13, Lovell was also portrayed by Tim Daly in the 1998 HBO miniseries From the Earth to the Moon and Pablo Schreiber in the 2018 Neil Armstrong biopic First Man. Lovell also made a cameo appearance in the 1976 movie The Man Who Fell to Earth.


Jim Lovell, Apollo 13 commander, shakes hands with President Richard Nixon after being presented with the Presidential Medal of Freedom at Hickham Air Force Base, Hawaii, in 1970. Credit: NASA

For his service to the US space program, Lovell was awarded the NASA Distinguished Service and Exceptional Service medals, the Congressional Space Medal of Honor, and the Presidential Medal of Freedom. As a member of the Gemini 7, Gemini 12, and Apollo 8 crews, Lovell was bestowed the Harmon International Trophy three times and, with his Apollo 8 crewmates, the Robert J. Collier and Dr. Robert H. Goddard Memorial trophies, and was named Time Magazine’s Man of the Year for 1968.

Lovell was inducted into the International Space Hall of Fame in 1982, the US Astronaut Hall of Fame in 1993, and the National Aviation Hall of Fame in 1998.

A crater on the far side of the moon was named for Lovell in 1970. In 2009, he was awarded a piece of the moon as part of NASA’s Ambassador of Exploration Award, which Lovell placed on display at the Patuxent River Naval Air Museum in Lexington Park, Maryland.

A statue of Lovell with his two Apollo 13 crewmates stands inside the Saturn V building at Johnson Space Center’s George W.S. Abbey Rocket Park in Houston.

Lovell’s legacy

In 2005, Lovell donated his personal collection of NASA memorabilia to the Adler Planetarium in Chicago, where it is on display in the “Mission Moon” exhibition.

With Lovell’s death, only five out of the 24 people who flew to the moon during the Apollo program remain living (Buzz Aldrin, 95; Fred Haise, 91; David Scott, 93; Charlie Duke, 89; and Harrison Schmitt, 90).

Lovell is survived by his children, Barbara Harrison, James Lovell III, Susan Lovell, and Jeffrey Lovell; 11 grandchildren; and nine great-grandchildren. Lovell was preceded in death by his wife Marilyn Lovell and parents James Lovell, Sr, and Blanche Lovell (Masek).

“We are enormously proud of his amazing life and career accomplishments, highlighted by his legendary leadership in pioneering human space flight,” said Lovell’s family in a statement. “But, to all of us, he was dad, granddad and the leader of our family. Most importantly, he was our hero. We will miss his unshakeable optimism, his sense of humor and the way he made each of us feel we could do the impossible. He was truly one of a kind.”

A memorial service and burial will be held at the Naval Academy in Annapolis on a date still to be announced.

Photo of Robert Pearlman

Robert Pearlman is a space historian, journalist and the founder and editor of collectSPACE, a daily news publication and online community focused on where space exploration intersects with pop culture. He is also a contributing writer for Space.com and co-author of “Space Stations: The Art, Science, and Reality of Working in Space” published by Smithsonian Books in 2018. He is on the leadership board for For All Moonkind and is a member of the American Astronautical Society’s history committee.

James Lovell, the steady astronaut who brought Apollo 13 home safely, has died Read More »

texas-prepares-for-war-as-invasion-of-flesh-eating-flies-appears-imminent

Texas prepares for war as invasion of flesh-eating flies appears imminent

Past success

As the flies’ host and geographic range expand, pressure is intensifying to control the flies—something many countries have managed to do in the past.

Decades ago, screwworms were endemic throughout Central America and the southern US. However, governments across the regions used intensive, coordinated control efforts to push the flies southward. Screwworms were eliminated from the US around 1966, and were pushed downward through Mexico in the 1970s and 1980s. They were eventually declared eliminated from Panama in 2006, with the population held at bay by a biological barrier at the Darién Gap, at the border of Panama and Colombia. However, in 2022, the barrier was breached, and the flies began advancing northward, primarily through unmonitored livestock movements. The latest surveillance suggests the flies are now about 370 miles south of Texas.

The main method to wipe out screwworms is the sterile insect technique (SIT), which exploits a weakness in the fly’s life cycle: females tend to mate only once. In the 1950s, researchers at the US Department of Agriculture figured out they could use gamma radiation to sterilize male flies without affecting their ability to find mates. They then bred massive amounts of male flies, sterilized them, and carpet-bombed infested areas with aerial releases, which tanked the population.
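A toy version of the underlying arithmetic (standard SIT logic; all the numbers here are invented for illustration, not from the article): if females mate once at random, releasing sterile males at S times the wild male population cuts fertile matings to 1/(1+S) of normal each generation.

```python
# Toy Knipling-style model of the sterile insect technique. Holding the
# sterile:wild ratio fixed is a simplification; with fixed release numbers
# the ratio actually improves as the wild population shrinks.
wild, growth, sterile_ratio = 1_000_000, 5.0, 9.0  # assumed values
for gen in range(6):
    fertile_fraction = 1.0 / (1.0 + sterile_ratio)  # females mate once
    wild = int(wild * growth * fertile_fraction)
    print(f"gen {gen + 1}: {wild:,} wild flies")
# growth * fertile_fraction = 0.5 < 1, so the population halves every
# generation and collapses toward elimination.
```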

Panama, in partnership with the US, maintained the biological barrier at the Colombian border with continual sterile-fly bombings for years. But as the flies approached this year, the USDA shifted its aerial deliveries to Mexico. In June, the USDA announced plans to set up a new sterile fly facility in Texas for aerial deliveries to northern Mexico. And last month, the USDA halted livestock trade from southern entry points.

Miller said in the announcement today that SIT is no longer enough, and Texas is taking its own steps. Those include the new bait, insecticides, and new feed for livestock and deer laced with the anti-parasitic drug ivermectin. Miller also said that the state aims to develop a vaccine for cattle that could kill larvae, but such a shot is still in development.

Texas prepares for war as invasion of flesh-eating flies appears imminent Read More »

green-dildos-are-raining-down-on-wnba-courts-why?-crypto-memecoins,-of-course.

Green dildos are raining down on WNBA courts. Why? Crypto memecoins, of course.

Take a deep breath and prepare yourself, because the “saga of the green dildos” is going to get really, really dumb.

Now take another one, just to steel yourself—this story involves crypto and memecoins, after all.

Ready? Okay.

Perhaps you’ve heard that people have been tossing lime green dildos at WNBA players for the last few weeks. Front Office Sports counts five such incidents; other sites say six.

Two men, Delbert Carver and Kaden Lopez, have so far been arrested. Both are under 25—young, but old enough to know that throwing sex toys at professional female athletes is both unsafe and deeply disrespectful.

WNBA players have been emphatic about their dislike and disapproval of these actions, which have been widely covered even in outlets like Cosmopolitan, which railed against “these idiots throwing dildos onto the court” who “don’t care about what women deserve or how disgusting and violating their actions are.”

Meanwhile, Donald Trump Jr. recently posted an image to Instagram showing his dad on the White House roof, tossing a large green dildo down at women on a basketball court. (Representative reply comment: “Cool. Epstein files now?”)

Why would anyone pay money for a WNBA ticket, only to throw dildos at the players? According to the Associated Press, both men arrested so far claimed that the incidents were pranks, with one saying the idea was intended “to go viral.” If you read that and find yourself wondering why you’d want green dildos to go viral, USA Today got the “EXCLUSIVE” answer:

It was crypto bros.

Pushing a memecoin.

Called “Green Dildo Coin.”

That now has a market cap of $12 million.

The paper talked to someone representing Green Dildo Coin, who explained that the stunts were done for truly noble reasons. Indeed, they were a form of “protest” against injustice.

Green dildos are raining down on WNBA courts. Why? Crypto memecoins, of course. Read More »