Author name: Paul Patrick


“They curdle like milk”: WB DVDs from 2006–2008 are rotting away in their cases

Although digital media has surpassed physical media in popularity, there are still plenty of reasons for movie buffs and TV fans to hold onto, and even continue buying, DVDs. With physical media, owners are assured that they’ll always be able to play their favorite titles, so long as they take care of their discs. While digital copies are sometimes abruptly ripped away from viewers, physical media owners don’t have to worry about a corporation ruining their Friday night movie plans. At least, that’s what we thought.

It turns out that if your DVD collection includes titles distributed by Warner Bros. Home Entertainment, the home movie distribution arm of Warner Bros. Discovery (WBD), you may one day open up the box to find a case of DVD rot.

Recently, Chris Bumbray, editor-in-chief of movie news and reviews site JoBlo, detailed what would be a harrowing experience for any film collector. He said he recently tried to play his Passage to Marseille DVD, but “after about an hour, the disc simply stopped working.” He said “the same thing happened” with Across the Pacific. Bumbray bought a new DVD player but still wasn’t able to play his Desperate Journey disc. The latter case was especially alarming because, like a lot of classic films and shows, the title isn’t available as a digital copy.

DVDs, if taken care of properly, should last anywhere from 30 to 100 years. It turned out that the problems Bumbray had weren’t due to his DVD player or poor DVD maintenance. In a statement to JoBlo shared on Tuesday, WBD confirmed widespread complaints about DVDs manufactured between 2006 and 2008. The statement said:

Warner Bros. Home Entertainment is aware of potential issues affecting select DVD titles manufactured between 2006 – 2008, and the company has been actively working with consumers to replace defective discs.

Where possible, the defective discs have been replaced with the same title. However, as some of the affected titles are no longer in print or the rights have expired, consumers have been offered an exchange for a title of like-value.

Consumers with affected product can contact the customer support team at [email protected].

Collectors have known about this problem for years

It’s helpful that WBD recently provided some clarity about this situation, but its statement to JoBlo appears to be the first time the company has publicly acknowledged the disc problems. This is despite DVD collectors lamenting early onset disc rot for years, including via YouTube and online forums.


Childhood and Education #9: School is Hell

This compilation of tales from the world of school isn’t all negative. I don’t want to overstate the problem. School is not hell for every child all the time. Learning occasionally happens. There are great teachers and classes, and so on. Some kids really enjoy it.

School is, however, hell for many of the students quite a lot of the time, and most importantly when this happens those students are usually unable to leave.

Also, there is a deliberate ongoing effort to destroy many of the best remaining schools and programs that we have, in the name of ‘equality’ and related concerns. Schools often outright refuse to allow their best and most eager students to learn. If your school is not hell for the brightest students, they want to change that.

Welcome to the stories of primary through high school these days.

  1. Primary School.

  2. Math is Hard.

  3. High School.

  4. Great Teachers.

  5. Not as Great Teachers.

  6. The War on Education.

  7. Sleep.

  8. School Choice.

  9. Microschools.

  10. The War Against Home Schools.

  11. Home School Methodology.

  12. School is Hell.

  13. Bored Out of Their Minds.

  14. The Necessity of the Veto.

  15. School is a Simulation of Future Hell.

Peter Gray reports on the Tennessee Pre-K experiment, where they subjected four year olds to 5.5 hours a day of academics five times a week. Not only did the control group (kids whose parents applied and got randomly rejected, so this was effectively an unblinded RCT) catch up on academics by third grade, the control group was well ahead by sixth grade, and the treatment group had a lot more rule violations and diagnosed learning disabilities. I think Peter’s conclusion that drill in Pre-K and kindergarten is actively bad is overreach, since the dose often makes the poison, but yeah focusing on drill that early seems terrible.

NYC officials consider virtual learning to reduce class size. It seems the law requires there be no more than 20 to 25 kids in a classroom, depending on grade level. So the proposed solution is to get rid of some classrooms entirely. Transparent villains happy to inflict torture upon children.

How to organize a protest against your teacher, recommended, made me smile.

Scott Alexander asks, given that Americans clearly do not know basic facts that were supposedly taught in school, but do remember things through cultural osmosis, what is even the point of school? If people learn via spaced repetition, isn’t school failing rather miserably at that? The unstated question there is: if you actually wanted to teach things, wouldn’t you use spaced repetition, which schools don’t use in any systematic way? The answer to the puzzle, of course, is that school is not about learning.

Your periodic reminder: Characteristics of White Supremacy Culture according to the San Francisco Unified School District. Features of ‘white supremacy culture’ include Perfectionism, Individualism, Sense of Urgency and Objectivity. They end by saying we can do better.

A theory, from a longer thread.

David Bessis: The #1 reason why we fail to teach math: we present it as knowledge without telling kids it’s a motor skill developed by practicing unseen actions in your head. Passive listening is useless, yet we never say it. We’re basically asking kids to take notes during yoga lessons.

Math is about reconfiguring our brains and reprogramming our intuitions. By ignoring this, we effectively refuse to teach math. We teach the cryptic symbols and convoluted formulas, but we never teach the secret art of making intuitive sense of them.

I think he severely oversells it in the longer thread, but he has a good point. You learn math by doing, by messing around, by experimenting and playing. Math lectures are relatively a waste of time.

Even getting math taught at all is increasingly hard. Here’s Governor Newsom, signing a law, SB 1410, explicitly to force Algebra I to be taught to eighth graders, and all he can do is require the Instructional Quality Commission to ‘consider including’ that it be offered. What happens when they consider it, and simply say no?

Florida 9th graders rebel against their Algebra 1 state exam, refuse to take it despite it being a graduation requirement, in sufficient quantities to invalidate the test.

I do not see why any test would have an attendance requirement to count? Let the half that want to take it take it now, the other half can change their minds and study for a refresher when they realize they won’t graduate without it. At least if we don’t cave.

Alternatively, if the students can simply form a union and make the school cave, let’s see what else they want to do with that power, shall we?

Even if you end up at a ‘good’ school, a 10/10 rated school, what do you get? Mostly you get a lot of wasted time.

Tracing Woodgrains: This thread is absolutely correct. Sometimes people treat “good schools” as black boxes where you put kids in and education pops out, with everything working well

But for many kids, their core memory of even “good schools” will simply be waiting idly. So much more is possible.

Simon Sarris: The schools in my town are fine. The high school has 10/10 score on GreatSchools and the others have high scores for academics

But the best public (and many private!) schools still do very little in terms of education and take up too much time. Most engaged parents could not only do more with less time, but do more interesting things, and have more help around the house as the kids grow, since they’re around much more of each day.

The curriculum at so many schools just has terribly low expectations of what a child should be learning. Looking at the english/lit curriculum here was glum. I wouldn’t be surprised if it was disappointing at “elite” colleges now, too.

I have higher expectations than that.

I went to excellent catholic schools for 12 years. I had a fine time. I liked all my teachers.

Still, my biggest childhood memory of school was just waiting. Waiting for everyone else to be done, waiting for the day to be over, waiting to be picked up. I think we can do better.

Matthew Yglesias: It’s mostly child care (for young kids) or prison (for teens) where the hope is that some learning takes place. You could teach much more efficiently but then the kids would have to go somewhere else.

Freia (QTing Lehman below): Public school isn’t just tailored to the average student, it’s designed to oppress and humiliate smarter kids with genuine curiosity and enthusiasm for learning.

Every smart kid has stories of trying to read under the desk to escape the monotony and having their book taken.

These kinds of kids are acing the exams, and unlike e.g. talking, reading under the desk bothers no one. confiscation is just punitive.

It’s hilarious that their justification is always “respect”, as if it’s not disrespectful to waste hours of kids’ lives w/ useless “teaching.”

In HS i skipped a year of math by reading a textbook & taking a placement exam. it took me ~40 hours over 10 days to teach myself the material—taking the class would have been >200 hrs. the amount of kids’ time we’re wasting is totally unconscionable.

Why does this continue to be true? One reason: A lot of people do not care about smart kids. Or they actively want them to suffer.

Charles Fain Lehman (1m views): I think we underrate how hard it is to provide education to a range of abilities at scale. If the smart kids are bored, that’s an okay outcome.

Most interventions don’t work at scale! School kind of does some stuff! That’s pretty good!

I also spent a lot of time waiting as a kid. That was good! I’m glad that I was forced to wait. It was an important lesson in how my edification is not the most important thing.

But ofc, this is exactly the fallacy. “Everyone should have their ideal outcome.” But they can’t under given constraints! The question is always how to optimize.

In other words, you.

Seriously, what the actual ?

Here’s a general response: if you think (as the qt does) that “so much more is possible,” provide me with an evidence-based intervention that a) reduces smart kids’ boredom and b) yields large increases in attainment. I’m happy to hear about them!

Okay, at least that is a form of the right question. I don’t know why it has to be ‘and’ here. If the smart kids are not bored, or it increases attainment, either one of those is super awesome on its own. So can we do either or both of these things, at scale, without actively damaging the other?

Yes. Obviously. ‘Evidence based’ requires someone collecting whatever you consider admissible evidence for your particular court, but here are some interventions that seem very very obviously like they do these things.

  1. Magnet schools or tracking. Give them their own classroom with more advanced topics. Why do I even have to say this.

  2. Allow grade skipping, both in individual subjects and overall. If you can pass the tests, you can move up. Obviously this will help with both issues. If you say ‘what about socialization if you aren’t stuck in classrooms with kids exactly your own age’ short of kids trying to skip 4+ years prior to high school then I will flat out piledriver you.

  3. Let kids acing a class do other things when bored, and ideally skip the busywork too. This could not be simpler. If you are getting an A, and you want to read a book about something else, or do homework for another class, go ahead. If you want to be a jerk limit it to ‘educational’ things, sure, fine.

  4. Let the kid stay home and learn from LLMs and textbooks, provided they keep passing your periodic tests. Seriously.

  5. Let the kid take online college classes. Arizona State University’s are $25 each and some are done by children as young as 8. Others are often outright free.

There is mostly universal agreement that great teachers are much better than good teachers, who are much better than bad teachers. There is not agreement about how to find, create or enable the good and great teachers, or how to fire and avoid the bad ones.

Strangest of all, there is no agreement that we even should in theory be firing the bad teachers and hiring good ones. It’s easier said than done, but you have to say it first. There are even those who say ‘it is wrong to attempt to get your child in front of better teachers, that’s not fair,’ which is somehow not a strawman.

Noah Smith: Progressives claim test scores are about family wealth. Conservatives claim test scores are about IQ. But a lot of test scores are about the government’s ability to FIRE BAD TEACHERS AND HIRE GOOD TEACHERS.

Alz: Here’s a fun paper by a Stanford econ JMC on grade school math teachers. Looking at AMC data matched to Linkedin, paper finds that even just decent, above-average middle/high school math teachers dramatically increase students’ AMC scores: top AMC scorers increase by 165%!

The paper in question is ‘Early Mentors for Exceptional Students,’ which is a different topic than finding good teachers in general, but finds that exceptional teachers going the extra mile can massively increase the chances talented students will win honors, attend selective universities, and have the most valuable career paths. There are obvious concerns about selection effects, they did some work to minimize that but I’d still be wary.

Alex Tabarrok emphasizes that the highest marginal educational value is in mentoring the smartest and most talented kids to be all they can be. Having access to a mentor greatly increases the chance that our best talent truly excels. Roughly half of the potential top math students in the country stay unidentified. Alas, so many in education think that when exceptional students get a chance to excel further, this is at best a nice to have, and perhaps even bad because it ‘increases inequality.’

The broader point from Noah Smith is also important. If we believe teacher quality is valuable, why are we in so many ways not acting like that?

Which leads to the question, what factors predict teacher quality?

Journal of Public Economics: Just published in @JPubEcon: “The unintended consequences of merit-based teacher selection: Evidence from a large-scale reform in Colombia.”

The paper examines a nationwide merit-based teacher-hiring system in Colombia that replaced experienced contract teachers with higher test-performing new teachers. The reform decreased students’ test scores and reduced college enrollment and graduation by more than 10 percent.

The paper examines the effect of a nationwide merit-based teacher-hiring system, which replaced experienced teachers with high-performing new teachers. The new teachers could choose where they would go, replacing old teachers.

From what I (read: me asking Claude questions) could tell, the policy did not in any way evaluate the teachers being replaced, or attempt to replace bad teachers rather than good teachers. The new teachers decided where to go based on what they wanted. So effectively, they were replacing either random teachers, or the teachers with more desirable assignments.

Thus, what we find here is that when we replace random existing teachers en masse with new teachers, and do so without using local knowledge and relying mostly on test scores (see Seeing Like a State) the reduction in experience and non-measured qualities overwhelmed the new teachers scoring higher on the tests.

The study goes too far when it suggests that test scores are uncorrelated with teacher quality. I don’t care what studies show here, that doesn’t make sense. But it can be one factor of many, and relying on it too much can still be actively worse than the previously used methods, as can sticking teachers into new locations they’re not familiar with, as can lack of overall experience, all at once.

If you want to do a reform like this, it has to go hand in hand with firing the bad teachers rather than random teachers, or you’re going to at least be in for a very tough transition.

It is important to realize that teachers are sometimes very wrong about basic things, and sometimes when you point this out they dig in. It is remarkable the extent to which some people refuse to believe this, and tell people that they are lying or remembering wrong.

from r/NoStupidQuestions: My son’s third grade teacher taught my son that 1 divided by 0 is 0. I wrote her an email to tell her that it is not 0. She then doubled down and cc’ed the principal. The principal responded saying the teacher is correct… What do I do now?

Tbh, I’m mildly infuriated but I’m wondering if I’m just overreacting? Should I just stop fighting this battle?

Andrew Hoyer: Ask them to enter 1 / 0 into a calculator and report back.

Andrew Rettek: People say this is fake, but as a high school senior I had two teachers and my principal tell me that 22/7 was irrational.

Magor: Well, pi is irrational. So every fraction with numerators and denominators that are integers is rational except for the one that’s pi.

Andrew Rettek: This was their reasoning, yes. My principal literally told me that he thought that 22/7 was the only non repeating fraction.

Elizabeth van Nostrand: One year I worked as assistant at summer school for 3rd graders. The teacher said echolocation worked because water was made of electricity, and columbus was financed by the wife of King James of Spain.

Is this a case for or against formal education? Either way, it is wise.

At any given level of education, literacy rates have fallen dramatically, despite overall literacy rates remaining unchanged. Simpson’s Paradox! But as Alex Tabarrok points out, also a scathing indictment of the educational system, pointing to both the signaling model of education and also highly expensive inflation in that signal.
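The composition effect is worth making concrete. A toy numeric sketch (made-up shares and rates, not the actual literacy data) shows how literacy can fall within every education level while the overall rate stays flat, simply because the population shifts toward the higher-scoring group:

```python
def overall_rate(groups):
    """Weighted average rate across (population share, literacy rate) pairs."""
    return sum(share * rate for share, rate in groups)

# Hypothetical (share, literacy) pairs: high-school-only vs. college graduates.
year_1 = [(0.70, 0.85), (0.30, 0.95)]
year_2 = [(0.20, 0.75), (0.80, 0.9125)]  # literacy fell within BOTH groups...

print(round(overall_rate(year_1), 4))  # 0.88
print(round(overall_rate(year_2), 4))  # 0.88 -- ...yet the overall rate is unchanged
```

Both group rates drop (0.85 to 0.75, 0.95 to 0.9125), but because the higher-literacy college group grows from 30% to 80% of the population, the blended average comes out identical. That is exactly the pattern Simpson’s Paradox describes.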

Schools in many places, including Palo Alto and Alameda, increasingly refuse to allow math acceleration. They flat out don’t let kids learn math at any reasonable pace, and this is allowed to continue in areas with some of the brightest student pools around. I wouldn’t mind too much if schools simply taught other subjects to the kids instead, it’s not like being way ahead in math is ultimately all that valuable and you can learn pretty easily on the computer anyway, but they’re flat out wasting hours a week of the kid’s time.

Andrew Bunner: Our district is as Niels describes. Our daughter got in trouble for working on her outside-school advanced math during class even though she finished the worksheet they were assigned.

Neils Hoven: The audacity of a student! To try to learn something while in school.

Benjamin Riley (Former deputy attorney general of California, founder of ‘Deans for Impact,’ self-described ‘influential voice in education’): It’s so weird this keeps happening to the children of the Tech Bro community. Will no one speak for them?

Niels Hoven: This is the former deputy attorney general of California and an influential voice in education, saying that if your child is stuck doing work below their ability, then you must be a Tech Bro and your child’s learning needs don’t matter.

Gallabytes: this happened *to me*. my parents are not remotely tech bros. they tried their best, put me in schools they felt were good, and those schools thought that the best way to enrich my math education was to make me teach the other kids. this WILL NOT HAPPEN to my children.

There are very few issues I’d emigrate over. mandatory public schooling is one of them, and things like this are why.

Mason: To understand tall poppy syndrome you have to fully accept that yes, they want your children cut down to size as well.

They do not feel that they have to hide this because they do not think it’s a bad thing. They think this is prosocial.

I suppose we now know what impact those deans want to have. Their issue is education. They are against it.

Niels Hoven has other examples as well, such as this one from Justin Baeder, a former principal now training other principals. Or this from his own experience, in response to Baeder asking “what would be the point?”

Niels Hoven: It’s fascinating to see “education thoughtleaders” who are not only unable to imagine the benefits of supporting high-achieving kids, they’re unable to imagine a world where it’s even possible!

I went to public school and took AP Calculus in 8th grade. For my entire school career, I was in group classes with kids no more than 2 years older than me, with the exception of 3rd-6th grade when I would do an hour of independent study during math class.

Every day when my 4th grader comes home, he asks how my conversations with his school are going because he’s tired of being taught material he already knows.

Don’t tell me that kids don’t want to be challenged, and don’t tell me that it’s impossible for us to do so.

But yeah, Justin Baeder can do so much better.

Justin Baeder: The academic acceleration maximalists are honestly worse than the sports dads living vicariously through their kids and making them hate the sport.

Yes, you probably CAN push your kid to do 10th grade math in upper elementary school. But…why?

Of course, if you have a true prodigy and they want to go all in on something, go for it. Just recognize that you are choosing a very difficult path for your child and yourself. This is not normal or healthy parenting, and doesn’t lead anywhere your kid will want to go.

How dare you actually attempt to have your kid actually learn things? Just flat out.

But it gets even better than that, wait for it…

Justin Baeder: Let’s say your kid reads many years above grade level, as mine do and always have.

Guess what—that just means they run out of books to read faster than everyone else!

What’s the goal here? How does it benefit the child to pay $$$ to push them even farther out?

That’s right. You need to be careful. They’ll run out of books!

That’s why you need to smuggle them in.

Pamela Hobart: just another underappreciated yet tremendously brave teacher protecting an uppity 5yo from running out of books here in South Austin, what a relief.

Quoted Post: Kindergarten parents: If your child could read entering Kindergarten, are they reading in class yet?

We’ve been pretty disappointed and are curious if it is the teacher or public school.

At the beginning of the year our daughter was excited to learn but now she says she hates to read and do math. She asked the teacher to read harder books and was told “no” – this was from her teacher and her teacher justified it by saying she doesn’t know if she has comprehension yet (she does).

My daughter confirmed she read a lot in preschool and has only read once in Kindergarten (when she brought her own books from home) in school. My daughter says she doesn’t want to bring books from home anymore or read at home because her teacher hates reading. We are worried to make a big fuss because the teacher told us our daughter should be in GT and we don’t want to jeopardize that if we just have a bad teacher this year but we can’t see our daughter continue to hate learning, when she used to enjoy it.

She also just got settled with friends. And if it matters we are seeing problems in math too where the teacher is working on counting to 10 still but my daughter can add/subtract 100+ and knows some multiplication tables.

Foxyavelli: It’s somehow rare kids are being held back from learning but literally everyone who knows a smart kid in school or was that kid has an anecdote.

Faded Magnet: This happens a lot, and it’s not new. Similar happened to me as a kid, because I arrived at school already reading. I just read what I wanted at home and played along at school.

David Hines: “why are you homeschooling, David?” WELL —

Ian Miller: I tell my sister we need bad teachers because there aren’t enough good ones to meet the necessary demand. But the current crop of bad ones seem determined to prove me wrong.

How should we think about solutions that only help high achievers?

We should think of them as highly valuable. Instead, we see the opposite.

Niels Hoven: This article is typical of the toxic attitudes toward high achieving kids.

High achieving students are seen as “the kids who need it the least” as though they’re some kind of second-class citizen. If they learn quickly, it isn’t a success, it’s a problem to be solved.

So yes: A huge portion of those tasked with educating students is working to actively sabotage our best and brightest, and often the other kids too, thinking this is good and also gaslighting everyone involved about the whole process.

Kelsey Piper: Our schools are failing children. Some parents, mostly the rich ones, have the time and confidence to speak out about it. Then people go “oh it’s a whiny privileged people problem”. No. Your schools are also failing the children whose parents don’t know how to speak out.

Not that it’d be acceptable to deny kids a good education because their parents are whiny privileged people! That’d be a cruel and evil thing to do! But in fact what we’re doing is denying all kids a good education and then looking who objects.

And then going “oh it’s only the rich people who object so it’s only their kids who are being harmed so it’s fine”, which is just evil on so many different levels I can’t quite fathom it.

Entertainingly many of these same people hate private schools on the grounds that it’s bad when privileged parents opt out instead of advocating to make the schools better. damned if you do, damned if you don’t.

Some places used to have standards, including standards for admission, like Thomas Jefferson High School. Then they killed many such places, so RIP, with National Merit semifinalists down 50% in a year despite increased class sizes. And the entire county went from 264 semifinal candidates to 191, showing that this is a real loss.

A lawsuit was allowed to go forward challenging NYC’s gifted programs as ‘segregation,’ including targeting Stuyvesant High School and Bronx Science. It asserts that testing for academic ability is ‘racial’ and discriminatory. The admission tests in question are very similar to the SAT. The whole thing is madness and an attempt at civilizational suicide. If there is a law that makes that kind of admissions test illegal, it is incompatible with civilization, and the law must be repealed.

The latest formerly exceptional school we have learned was intentionally destroyed in the name of ‘equity’ comes from Philadelphia. You see, it’s important not to ‘disadvantage’ some students, so we have to ensure we don’t accidentally give others an education, or admit more talented students over less talented ones. Same old story. Out with the old motto, ‘dare to be excellent.’ And that was for all practical purposes the end of Masterman.

The obvious answer is, then only give such programs to the high achievers. Having looked at the details here, it’s actually a relatively non-toxic version of the concern, noting that if 95% of students fail to benefit from or use such programs, then they’re not a solution for that 95%. Which would for now make them not a general solution.

And yes, that’s disappointing, although I expect rapid improvements over time. Also disappointing is to call such kids ‘those who need it the least’ as if the goal of mathematics education was to hit some minimum bar and then stop. But this article seems to stop short of making the case many ‘equity’ advocates actually make and even put into practice, that Niels is warning about here, which is to claim that the success of the best students is actively a problem to be fixed.

Ozy Brennan: a question for people concerned with educational equity: if schools aren’t flexible enough to serve children who are multiple grade levels ahead, how well do they serve children who are multiple grade levels behind? (badly. it’s badly)

I also agree, for these reasons and also for others including cultural ones, that parents should have a right to see the curriculum that is being imposed by force upon their children in a public school, so they know what and how their children are learning. And a school or teacher refusing to share this information, or the state supreme court saying you don’t get to see it, is a five-alarm fire. I don’t see how someone can say ‘no this is a secret, you have no right to that, and send your kids to school or we’re calling the cops’ and not notice the skulls on their uniforms.

The ultimate proof that school is not about learning is that we know school starts too early for teenagers to properly sleep and thus be ready to learn, we’ve known forever, and it is impossible to fix this.

Lomez: The dread of school could be mitigated by 75 percent by simply letting children sleep in. Teenagers need significantly more sleep; without adequate amounts, they can become depressed and irritable. When I am education czar, the children will get their sleep.

Danielle Fong: My co-founder presented a study showing that teenagers learn better if they can sleep in a little more. The school responded with, “Well, we can’t adjust it now; we have a contract with the bus drivers.”

Oh, so the school is for the benefit of the bus drivers?

Much like the port is for the benefit of the longshoremen and the polity is for the benefit of politicians, eh?

Relatedly, morning classes fail spectacularly.

Abstract: Using a natural experiment which randomized class times to students, this study reveals that enrolling in early morning classes lowers students’ course grades and the likelihood of future STEM course enrollment. There is a 79% reduction in pursuing the corresponding major and a 26% rise in choosing a lower-earning major, predominantly influenced by early morning STEM classes. To understand the mechanism, I conducted a survey of undergraduate students enrolled in an introductory course, some of whom were assigned to a 7:30 AM section. I find evidence of a decrease in human capital accumulation and learning quality for early morning sections.

Wait, a 7:30 AM section? What fresh hell is that? When I was in college I did sign up intentionally for 9:00 AM classes and given my sleep schedule that turned out fine, but the other kids clearly didn’t love it. A 7:30 AM start time is nuts, such a thing should not exist, we need to amend the Geneva Conventions.

Still, the reduction seems extreme. 79% reduction in further classes is quite a lot if this was indeed a randomized trial. I presume some of it was the cultivation of hatred and aversion rather than pure lack of human capital. At this level, the classes at that hour are effectively non-functional. How can those involved not notice this? Why would anyone ever need (or want) to schedule a class that early? How many points are these kids attempting to take?

Legal battles continue (WSJ) over whether, if you decide to have charter schools at all, also allowing religious charter schools is then mandatory or forbidden. There does not appear to be middle ground. One of these violates the first amendment, the other does not, but it is not clear which is which. If it was mandatory, in some places charter schools would get a lot more support, and in other places they would get a lot less.

I do not think the government should be discriminating here, and imposing an effectively very large tax on those who want their children to go to religious schools, or any other kind of school, provided everyone has a local non-religious public school available. And frankly, a lot of the ‘non-religious’ public schools are effectively rather (non-traditionally, de facto) religious and have no intention of knocking it off, giving me even less sympathy.

Then there’s the step beyond that: those who oppose privately funded private schools. People being mad that some pay for private school while also paying for everyone’s public school will never stop being weird to me, as will people who think this is super commonly done by people in the upper middle class purely for quality reasons.

Quinn: there’s this class of upper-middle/upper class urban yuppie that makes a big stink out of sending their kids to “public school” and man… I hate them. Yeah, your kid will go to your decent local K-5 and bolt when shit gets real, just like every aspirational underserved family.

Matt Bruenig: Reminder when approaching this discourse that 90% of children attend public schools, most of the rest attend so-so religious schools.

If ‘shit gets real’ in my local school, and I am being underserved, I do not know what that means, but it does not sound like a time to not be bolting? Yet most people do not have that option. Also, yes, it is very strange that the same person also thinks this:

Quinn: I want to see the urban public education system completely broken down and rebuilt from the bottom up. I do not hate that it exists; I hate how it is.

Again, seems like bolting conditions to me. If you are expecting parents to sacrifice their children on the altar of public pressure to improve failing schools, please speak directly into this microphone.

Tough but fair.

Politico: School choice programs have been wildly successful under DeSantis. Now public schools might close.

Chris Freiman: Netflix has been wildly successful. Now Blockbuster locations might close.

Politico does point to some concerns. If standard schools are losing students, why then destroy the schools that are exceptional, as many places are somehow doing? That seems rather perverse.

Hopefully the Montessori school could reconstitute itself, but the ‘vote with your feet’ plan only works if you close the schools that lose feet, not the ones that keep them.

Andrew Atterbury: One proposal aiming to turn a popular Fort Lauderdale magnet school that focuses on the Montessori teaching method into a neighborhood school brought a crowd nearing 200 people in opposition at a recent town hall. There, dozens of audience members, a sea of blue “VSY” shirts representing Virginia Shuman Young elementary, contended the plan would cause an unnecessary “disruption” for a top-rated school.

“If your product is better, you’ll be fine. The problem is, they are a relic of the past — a monopolized system where you have one option,” Chris Moya, a Florida lobbyist representing charter schools and the state’s top voucher administering organization, said of traditional public schools. “And when parents have options, they vote with their feet.”

The data cited in Politico says that the decline in enrollment due to increased homeschooling is modestly bigger than the increase in private school attendance, some of which is funded by scholarships. Home schooling, as always, has principal-agent issues: if you do not trust parents to want to educate their children, you worry whether the education is actually happening.

But yes, if you do not zero out the very high social cost of regular schooling, home schools start to look a lot better. If the city gave us a budget equal to what it costs them to have our kids in schools, I would 100% be putting together a vastly better homeschooling program with that money.

Always remember it could be so much worse.

Byrne Hobart: Discussions about school vouchers, homeschooling, etc. make me feel grateful that it’s legal for us to cook for our kids, even though neither my wife nor I went to culinary school. Things could be very different.

Microschools are an obviously great idea. The cost of private school is far in excess of the cost of hiring a teacher, renting a space and paying for supplies and other plausible costs. You get the socialization benefits of school, and the flexibility and customization to skip the awful parts, and as a society we get to try different stuff.

Also, home schooling and stay-at-home parent are expensive, and now you can spread that cost around along with its benefits.

Getting rid of trivial inconveniences can help quite a lot in such cases. Instead of having to go through rezoning, the school will often find space it can use for free.

Matt Bateman: Microschools are an extremely good and natural format for schooling and the only reason they aren’t much more common is because of unnatural blockers.

“Hey maybe we should pool our children and resources around the best and most willing educator in our community” is an obvious and ancient approach to organizing schooling and it’s quite strange that we today have managed to greatly disincentivize it

Politico: Florida Republicans, led by Gov. Ron DeSantis, want to let tiny private schools open in libraries, movie theaters, churches and other spaces where they can fit makeshift classrooms.

Florida’s policy change appears small; it allows private schools to use existing space at places like movie theaters and churches without having to go through local governments for approval.

But it could have a dramatic impact. This shift gives these private schools access to thousands of buildings, opening the door for new education options to emerge without them having to endure potentially heavy rezoning costs.

Primer, a company poised to act as a support system for such schools, is backed by Sam Altman. The man’s past investments often have excellent taste and speak well of his values. I’d be curious what he has invested in this year, after the events of November.

Scientific American used to be a magazine my family subscribed to that contained cool articles about science. Now it is… something else.

The latest example of this is Scientific American’s hit piece on home schooling. Erik Hoel has a thread and a post detailing the situation. The part about educational results is misleading but reasonable. Then they start in on accusations of ‘abuse.’

That is the part where, after using a child that was not home schooled as justification for a moral panic, they cite a 36% rate of homeschooled children being reported for alleged abuse to show how horrible it is.

Except…

Well, it turns out researchers have an answer to that question. Here’s from a 2017 paper, “Lifetime Prevalence of Investigating Child Maltreatment Among US Children:”

We estimate that 37.4% of all children experience a child protective services investigation by age 18 years.

As Hoel notes, focusing only on withdrawals from school, among other details, suggests that the base rate of abuse for home school is potentially substantially below normal. At worst, it is normal.

The good argument against home school is it requires a large investment of time and resources by the parents, so most families cannot afford to do it. Obviously, if you can get your child consistent 1-on-1 (or e.g. 4-on-1) attention and a customized path of study then that’s great, and will only get better now that AI tools are available.

Instead, the actual thesis of many against homeschooling, when they’re not making up things like the claims earlier in this section, is flat out that parents are not qualified to teach their children. And that those who claim that they could teach their own children things like how to read or write or do arithmetic are therefore ‘big mad’ and also presumably flying in the face of education and The Science.

Waitman Beorn: The Home Schooling Crowd is big mad. lol But also, how insulting is it to teachers who literally train specifically to teach kids of a certain age a specific subject that Ashton and his trad wife think they can do it just as well in the playroom of their log cabin mansion?

Charles Rense: I read the replies, and they were for the most part polite and reasonable in their disagreement. The only one who’s “big mad” is you.

Polimath: How insulting to the academic institutions is it that parents with little or no training end up educating their kids better than the teachers who literally train specifically to teach kids?

Austen Allred: Anti-homeschooling folks often have expectations of school teacher qualifications that are just wildly out of touch with reality.

Jake McCoy: Dr. Waitman admitting he couldn’t teach 2nd grade math is not a great look for him.

Prince Vogelfrei: People say smug shit like this while not doing the tiniest bit of research on homeschooling outcomes which beat public schooling in every standardized test category. California makes everyone do tests, we have the data. But what does the truth matter when superiority is to be had.

Most people do not care about results, they care about process – particularly a process which ensures no one can really be blamed and avoids the messiness that ensues.

It is beyond absurd to think that an average teacher, with a class of 24 kids, couldn’t be outperformed by a competent parent focusing purely on their own child. The idea that if you don’t specifically have an ‘education degree’ that you can’t teach things is to defy all of human experience and existence. Completely crazy. And yet.

Austen Allred: The conventional view any time you dare to suggest there might be a more optimal way to learn than the existing school system.

I love that it’s a he that ends up on this hypothetical stripper pole at 16.

Kelsey Piper answers questions about her private tiny home school. Parents typically teach a weekly class, student/teacher ratio is 4:1 to 6:1, full price is $1200/month/child. All reports I’ve heard are quite good, but of course talent involved is off the charts.

Comparing the Bryan Caplan home school method to Robinson’s similar one, with critique of the general framework and focus. Both focus on not helping the student, letting them work things out, with Robinson being full on ‘the student must never be helped with any problem.’

They assigned kids two hours of math a day. I am confused why that is needed, if you train in math reasonably less than half of that should be plenty. I get that history and music are secondary priorities, but cutting them out entirely seems like a mistake, although honestly art can go. I also don’t see how most children could focus like that. I know mine would have no hope no matter how hard I tried. In general, why so much time in a chair doing school-style work? I don’t understand how that helps.

Of course all of it is obsolete now. If you have access to Gemini 1.5 Pro and Claude Opus, all previous learning techniques are going to look dumb.

The constant refrain I hear is ‘but what about socialization?’ Which seems crazy to me, and I love this way of putting it.

Violeta: Re: the socialization issue with homeschooling

I asked my 16yo daughter who went back to high school this year “do you wanna continue next year?”

“I don’t think I’m ready for the isolation of sitting 6h with people only of my exact age and an adult speaking at us for 90% of the time, I need to socialize more.”

There are far better ways to get socialization. What socialization you do get in school, from what I can tell, is by default horribly warped in deeply unhealthy ways. Most of the value is essentially ‘you might make a friend at all, and then interact with them outside of school in your few hours of freedom.’ There are better ways to find friends.

Aella summarizes her home school experience, listing pros and cons of the type she experienced. The cons are that parents control what you learn and who you see, and your social skills and knowledge base are different from other kids. The pros are you don’t waste massive amounts of time instead learning at your own pace, you don’t get stratified by age and you dodge a lot of toxic dynamics in normal schools.

Her description of her three months in normal high school instruction, where everything is checking off a box and no one actually cares, tells you that yes it can be otherwise, we should be horrified by our defaults. If your soul has not yet been crushed and you show up most of the way through the process, the soul crushing engine becomes impossible to miss.

If you are a parent, presumably ‘choose what to teach’ is a pro rather than a con, and you will choose something reasonable. So the only cons are having quirky socialization touchstones rather than the insane ones we get in primary and high school, which all mostly gets overridden anyway? Yeah, the choice seems very clear if everyone involved can handle and invest in the process.

The exact opposite method is illustrated in this video, the concept of never making a child do anything they do not choose to do on their own. As several people here note, this is the ‘homeschool with no effort by doing actual nothing’ method, and by default it absolutely becomes an excuse for negligence. It can also be done right, via noticing what children actually want and ensuring they find it and helping steer them, which requires work as active as other methods.

When can you teach children about law and procedure and proof? Tracing Woodgrains suggests about 8, Kelsey Piper thinks similarly.

Michael Gibson asks a good question, but the opposite might be even better?

Michael Gibson: Why do kids hate school? If you don’t have an answer to that question then you’re not even in the conversation, let alone the debate.

Rebel Educator: The tragedy inside America’s K-12 schools.

Here’s another question. Why do kids love school?

Notice that teachers are saying that 95% (!) of their kids love Kindergarten (I can’t imagine they’re paying close attention, given my anecdata and this very different poll even if its sample is biased, but perhaps normies be norming), they still believe 74% love fourth grade, and even at the trough 37% are supposedly loving Grade 9.

If you force kids into a fixed location and subject them to strict control and force them to work at arbitrary stuff for roughly half of waking hours, these would be very good rates of loving the results, if the teachers are describing the kids remotely accurately. Consider that if you ask people if they ‘love’ their job, that’s going to score rather lower. We’re doing some things very wrong, but we must also be doing some things very right.

In relative terms this reaction seems roughly right? You start out with ‘hang out with other kids and mostly do fun things,’ which plausibly kids do mostly enjoy, then they turn it into progressively more sitting still for lectures and progressively more homework and busywork and wear you down for years. Then as high school progresses you start to get a bit of flexibility back and the world lets you do things like walk on your own or select classes or do actual work, so things improve a bit.

Remember, it could be so, so much worse.

Richard Hanania: In South Korea, 84% of five-year-olds and 36% of *two-year-olds* attend private tutoring schools. What causes a culture to go this far off the rails? This is genuinely horrifying.

Again, not for every child all the time, even in a ‘normal’ school.

But for many children, it mostly makes them miserable while wasting their time.

And I’d boldly claim that this is not good.

And I disagree with Tracing Woods in that actually they can articulate this pretty damn well most of the time.

Tracing Woods: I hated school starting in first grade. I wanted to go faster, wanted more interesting work, wanted something better. I kept wanting this and kept leaping at any available alternative through twelfth grade. Kids can’t fully articulate their desires. But they know.

Kelsey Piper: I’m not going to say “kids are 100% accurate at identifying the best learning environment for them”. But they will absolutely tell you “I hate school” if they hate school. They will tell you by begging you to quit your job and homeschool them, by telling you it’s okay if that means the family ends up homeless, we can set up a tent in the lot on the corner. They will tell you by pretending to be sick, and by really being sick.

If they love their school, they tell you that too. They will tell you by begging you to take them in to their school on weekends. They’ll spend Christmas break complaining that school isn’t in session. They’ll say to you thoughtfully ‘I do really love my summer camp, but it makes me sad to be missing out on school’.

If you’ve decided your five or six year old is too stupid to be worth listening to, you may miss the signs they’re in a really bad situation that is harming them a lot – and you’ll definitely be teaching them that their education is not something in which they are active participants, not something where their perspective is valued, not something where their experiences even matter.

I also hear a lot of people say ‘if you let them choose, won’t they just choose whatever school is easiest and has the most toys?’ No. They won’t. One family at our microschool literally had their child do a side by side comparison by attending public school and the microschool for a period of time and then choosing where they wanted to go. The child loved the amenities that the public school offered – it had way more funding! It was big enough to have a soccer team!

But they were desperately bored. They said that everything the teacher said, they already knew. They spent all day being told to be quiet while the teacher said boring things. And they didn’t want that.

Almost everyone was supportive, with many similar stories. Then there was one person who spoke up to say that kids saying they are miserable is fine, actually.

Ed Real (Teacher): Vomitous drama queen tripe. In Kelsey’s fantasy world, people are secretly wondering “why isn’t my kid begging for me to homeschool them? What’s wrong with me?”

Kelsey Piper: I think you may have misread. I was saying that children do beg their parents to homeschool them sometimes. If your children aren’t, that’s a good thing!

Ed Real: Oh, no, I understand what you’re saying and it’s designed to get parents to worry if their kids aren’t soooooooooo engaged with their school life that they’re DEMANDING more! In fact, a kid begging for homeschool or private school or whatever needs to be squelched routinely.

Kelsey Piper: Do you think that kids are never miserable in their school environment and are only pretending to desperately seek alternatives, or do you think that they are miserable but that the correct way for parents to respond when their children are miserable is ‘squelching routinely’?

Ed Real: I think your inability to distinguish between misery and a kid whining that he’s miserable is a very big part of your delusion. Absent criminal neglect or abuse, “miserable” is not the word to use about a child’s state of mind. Not seriously, anyway.

Especially since you’re claiming these kids are “miserable” for academic reasons. I mean, social reasons, bullying, sure. But academic? Please.

And yes, I’m saying you are positing such children to make other parents envious.

Scott Alexander: I was miserable at school and begged my parents to home school me. I still endorse this 30 years later, and my parents (who refused at the time) in retrospect agree. I can’t figure out if you’re denying my existence or saying that because I was a child my feelings didn’t matter.

Ed Real: Neither. I am saying that “miserable” is not an accurate description of your state of mind. You were a bright kid who felt you could do better. Oh, well. Pretty obvious you didn’t suffer career-wise.

Keller Scholl: It is always striking to me that one of the most common nightmares is being back in school, under the control of people like this poster, and this is not seen as a failure of schools.

Mike Blume: Teachers on [Twitter] doing a fucking fantastic job of making me want to entrust my children to their institutions.

Rohit: People forget their own schooling, I think. Things I did to show school was boring:

  1. Repeatedly said school was boring.

  2. Skipped all classes.

  3. Created a pro and con list on why I should spend all my time in the library.

  4. Read advanced textbooks for fun.

Not taking children seriously is wrong.

I think that if a child tells you they are miserable at school, chances are they are miserable at school. And yes, of course I speak from experience. Do people doubt that being constantly bored, for hours every day, is sufficient?

As in:

In class I’d pass the time

Drawing a slash for every time the second hand went by

A group of five

Done twelve times was a minute

But

Shameika said I had potential

– Fiona Apple, Shameika

I mean, yes, that is pretty miserable. Stop pretending it isn’t. Or that it doesn’t matter when a particular teacher is very much not Shameika, so long as you ‘turn out okay.’

There are plenty of other reasons that school can become a five alarm fire you need to get away from (e.g. Fiona Apple also notes ‘I wasn’t afraid of the bullies and that just made the bullies worse.’) But boredom and wasted time is enough.

You can do so much more, and we should treat failure to excel where it was possible similarly to how we (should and used to but increasingly also don’t) treat failure to stay at grade level for average students.

Daniel Buck: We need to quit wasting the time of gifted students and make it easier for them to skip grades.

How many spend their days twiddling their thumbs after finishing the day’s work in minutes, waiting for their classmates? Or worse, getting conscripted to be the teacher’s errand runner?

I actually think being the teacher’s assistant is far better than thumb twiddling, especially if the thing you assist with is the actual teaching. Still not ideal.

Matt Bateman: People are so used to talking about minimum acceptable outcomes in education that they truly have no idea how high-variance educational outcomes can and should be.

The top quartile of students can skip grades. The top 2 percent of students can do an entire multiyear curriculum in *weeks*.

As Nuno Sempere says, there are obvious problems with ‘kids decide when they get to drop out of regular school.’ At minimum, there need to be highly credible costly signals sent, rather than simply expressing a preference. Kids often have to be told to do things they don’t want to do. And you want a credible signal of how much they don’t want to be there, not one designed to get them what they want.

I have a very strong commitment, that if this kind of misery around school appears and is sustained, then that is that. Especially if they successfully locate and point out this statement.

Democracy mostly works, to the extent it works, because when things are truly terrible, people get mad and then they Throw the Bums Out. Like the children miserable at school, we do not know what we need, but we know when conditions are miserable and it is time for a change.

Also it is insane how many people use the argument Kelsey Piper talks about below, and do so with a straight face. Indeed the comments contain many people condemning Kelsey Piper as bad for not sacrificing her children on some symbolic altar.

Kelsey Piper: I also get really irritated by “you should send your kids to bad public schools because people like you doing that is what makes them into good public schools”. I’m willing to make a lot of sacrifices for my community. Wronging my kids isn’t one of them.

I am happy to donate to local classrooms in need of supplies (I do make those donations). I am happy to volunteer in local classrooms if they need me. I am not happy to condemn my kids to unhappiness, and I frankly don’t think I have the right to do so.

They deserve some input into where they spend 40 hours a week and they want it to be in a safe environment with challenging academic work and individualized curriculum and support in pursuing their ambitions.

And again.

Kelsey Piper: The weirdest and most horrifying thing about the school choice discourse was the apparently universal assumption, on left and right, that the only thing that matters about schools is student population. On the left the takeaway was that by removing a child from a school you wronged the school; on the right the takeaway was that there is no hope for general improvements in education, just the hope that you can be better than other people at securing a place for your children in the schools with “high quality” children.

A lot of the families I know of that took their kids out of a local public school did so because their child wasn’t learning to read. Their child wasn’t learning to read because they needed phonics-based reading education and the school did not offer it. Now it was third grade and the child was badly behind, fed up with the whole concept of reading, had lots of bad habits around bluffing and faking it, and felt deeply ashamed of themselves. That family would have been much much better served by a school with the identical student body, but a better reading instruction program.

I talked recently to another parent whose school had a problem with parents fleeing for private schools. The principal of the school forbade students from doing ahead-of-grade-level work, including checking out ahead-of-grade-level books from the library. The problem wasn’t the student body. The problem was this rule.

Interruption to highlight that yes this is a thing. There are schools where they are very strict about not allowing students to get more of an education than they are supposed to get. No books that are too advanced. It is a crime against humanity, and I mean that.

Kelsey Piper: When I was in third grade at a public school, the teacher called up my mom to try to tell her that I would not be allowed to take advanced math, because the teacher thought it wasn’t healthy for girls to take advanced math. My mom objected and I got into advanced math, but I had a miserable year under that teacher, who kept me inside on detention every day on various pretexts (I’d never gotten in trouble before or since). The classmates were not the problem. The problem was the teacher.

When I was in tenth grade, the school inexplicably assigned a person who didn’t know Calculus to teach the Calculus class. He identified the smartest boy in the class on the first day and told that boy to teach us all. He did his best.

Another reason kids frequently struggle at school is because the school starts at 8am. Those kids would thrive at a school that started at 10am. Their parents often wish they had that option, but they don’t.

Interruption number two: Oh my yes, and this effect can be rather large.

Kelsey Piper: I understand that there’s a lot of research to the effect that peer effects on education are much more reproducible than teacher effects on education. And certainly ‘who attends the school’ predicts test scores more strongly than ‘are the teachers any good’. But I think somehow we have jumped from that to a deranged insistence that the only thing that can be good or bad about a school is the student body, which carries implicitly an insistence that schooling is fundamentally zero-sum, that there are no wins that do not come at someone else’s expense, that to desire your child be happy is to desire someone else’s child suffer.

When I say a school is bad, I mean ‘it does not teach reading’. I mean ‘it does not allow students to work ahead at their own pace’. I mean ‘the teacher was hostile to my child’. I mean ‘the teacher was sexist’ or ‘the teacher was incompetent’ or ‘the teacher was not sober’ or ‘the teacher was not familiar with the material’. Bullying and unsafety are ways a school can be bad for kids, but not only is it not the only way, in the cases I’ve run into it’s usually not even in the top three reasons why a child is miserable at school.

When I say a school is good, I mean that kids spend their time doing work that they’re bought in on, work that matters to them, work they care about, work that’s challenging and interesting. I mean that they want to be there. I mean that the adults who teach and mentor and support them are trustworthy and trusted.

Under the worldview where all we can do is shuffle student bodies, school choice is suspect, probably just another way for some people to profit at the expense of others. But under the worldview where schools vary in quality and in which traits they possess, school choice (if implemented justly) is obviously a good idea. It’s a good idea in two ways. The first is a form of accountability which isn’t about test scores. Schools where the teachers are miserable bullies will lose students to other schools that do better at hiring. Schools that don’t serve their students will lose out to ones that do.

The second advantage is that children are different from each other! Some want to start at 8am because they drive their parents nuts by being up since 5 anyway. Some need to start at 8am for their job. But some work night shifts or have late risers and job flexibility and benefit from a late start. Some kids do well in orderly quiet classrooms; some do well with hubbub and flexibility. Some kids care a lot if the school allows them to work ahead, and some don’t. School choice allows people to self-sort around what works for their child.

It’s obviously not a panacea. We know that because we’ve had school choice in various forms for a while now and I observe a distinct lack of panacea around here. Incentives to solve education only get you so far if no one knows how to solve education. The well-intentioned focus on testing introduced the most painful costly clear-cut Goodharting I’ve seen in any industry. And any system that involves parents taking action will benefit the kids with active and involved parents, and practically all processes benefit the wealthy because that’s in a very fundamental sense what wealth means.

But once you acknowledge that schools can be good or bad because of policies and teachers and curriculums, and that schools can be different in ways that make them a better or worse fit for a child, and that we’re not playing some relentlessly depressing zero-sum game where every benefit for any child is extracted from some other child, I think the question becomes “what would a good policy regime here look like?”.

One thing that I am very grateful about California is that you can just start your own school, pretty straightforwardly. There are no state fees; the legal requirements aren’t even that onerous. You and a few of your friends can get together and see if you can build something that’s good for your kids, and if it is, you can open it up to more kids. This does not just reshuffle the kids. It makes it possible to do better. And when you have an unsolved problem, like how to provide just, high quality education to every child everywhere, it’s really valuable to open up the possibility of inventing and iterating and finding something better.

And – if a child is miserable, their misery matters. If your child is miserable, you need to help them. Even if the studies show they’ll be fine as an adult, this is a human being, over whom you have power, to whom you have duties, and they do not only matter for their future trajectory. http://GreatSchools.com will not tell you whether your child’s school is good or bad for them, but your child will. Please listen.

Sarah Constantin: “surely nobody would criticize Kelsey for putting a ton of work into founding a goddamn school to make sure her kids got a good education, that’s like motherhood and apple pie”

oh no.

It is indeed literally motherhood, and a central part of it. And yet.

Many times, in many ways, they make it hell. On purpose.

Then they justify that because you need to be ready for more hell, and say that overrides everything else in life.

Wes: Exhibit A in why I hate school. This is from Roxie’s preschool. They don’t let the kids inside until 9am, so if they show up early, kids will often play on the playground right next to the entrance. This is now banned. The reason is “the morning transition time is very important in order for the students to begin their day ready to learn.”

This is the petty tyranny of school administrators. They have the authority to make rules, so they do whatever is most convenient for them, and always default to “don’t let the kids have any fun.” Kids enjoying themselves is seen as a threat to the educational environment. IN PRESCHOOL. It just gets worse as the kids get older. It’s not just the kids who associate learning with being bored and uninterested – it’s the adults too. So apparently they need a ten-minute “transition time” to stop having fun and get ready to go be bored in class.

Outdoor playtime is now referred to as “gross motor time” because kids can’t just play. It has to have some pedagogical purpose.


Childhood and Education #9: School is Hell Read More »


“Literally just a copy”—hit iOS game accused of unauthorized HTML5 code theft

Viral success (for someone else)

VoltekPlay writes on Reddit that it was only alerted to the existence of My Baby or Not! on iOS by “a suspicious burst of traffic on our itch.io page—all coming from Google organic search.” Only after adding a “where did you find our game?” player poll to the page were the developers made aware of some popular TikTok videos featuring the iOS version.

“Luckily, some people in the [TikTok] comments mentioned the real game name—Diapers, Please!—so a few thousand players were able to google their way to our page,” VoltekPlay writes. “I can only imagine how many more ended up on the thief’s App Store page instead.”

Earlier this week, the $2.99 iOS release of My Baby or Not! was quickly climbing iOS’s paid games charts, attracting an estimated 20,000 downloads overall, according to Sensor Tower.

Marwane Benyssef’s only previous iOS release, Kiosk Food Night Shift, also appears to be a direct copy of an itch.io release.


The App Store listing credited My Baby or Not! to “Marwane Benyssef,” a new iOS developer with no apparent history in the game development community. Benyssef’s only other iOS game, Kiosk Food Night Shift, was released last August and appears to be a direct copy of Kiosk, a pay-what-you-want title that was posted to itch.io last year (with a subsequent “full” release on Steam this year).

In a Reddit post, the team at VoltekPlay said that they had filed a DMCA copyright claim against My Baby or Not! Apple subsequently shared that claim with Benyssef, VoltekPlay writes, along with a message that “Apple encourages the parties to a dispute to work directly with one another to resolve the claim.”

Ars reached out to Apple this morning to request comment on the situation. Apple has yet to respond, but in the meantime, the company appears to have removed Benyssef’s developer page and all traces of their games from the iOS App Store.

“Literally just a copy”—hit iOS game accused of unauthorized HTML5 code theft Read More »


When Europe needed it most, the Ariane 6 rocket finally delivered


“For this sovereignty, we must not yield to the temptation of preferring SpaceX.”

Europe’s second Ariane 6 rocket lifted off from the Guiana Space Center on Thursday with a French military spy satellite. Credit: ESA-CNES-Arianespace-P. Piron

Europe’s Ariane 6 rocket lifted off Thursday from French Guiana and deployed a high-resolution reconnaissance satellite into orbit for the French military, notching a success on its first operational flight.

The 184-foot-tall (56-meter) rocket lifted off from Kourou, French Guiana, at 11:24 am EST (16:24 UTC). Twin solid-fueled boosters and a hydrogen-fueled core stage engine powered the Ariane 6 through thick clouds on an arcing trajectory north from the spaceport on South America’s northeastern coast.

The rocket shed its strap-on boosters a little more than two minutes into the flight, then jettisoned its core stage nearly eight minutes after liftoff. The spent rocket parts fell into the Atlantic Ocean. The upper stage’s Vinci engine ignited two times to reach a nearly circular polar orbit about 500 miles (800 kilometers) above the Earth. A little more than an hour after launch, the Ariane 6 upper stage deployed CSO-3, a sharp-eyed French military spy satellite, to begin a mission providing optical surveillance imagery to French intelligence agencies and military forces.

“This is an absolute pleasure for me today to announce that Ariane 6 has successfully placed into orbit the CSO-3 satellite,” said David Cavaillolès, who took over in January as CEO of Arianespace, the Ariane 6’s commercial operator. “Today, here in Kourou, we can say that thanks to Ariane 6, Europe and France have their own autonomous access to space back, and this is great news.”

This was the second flight of Europe’s new Ariane 6 rocket, following a mostly successful debut launch last July. The first test flight of the unproven Ariane 6 carried a batch of small, relatively inexpensive satellites. An Auxiliary Propulsion Unit (APU)—essentially a miniature second engine—on the upper stage shut down in the latter portion of the inaugural Ariane 6 flight, after the rocket reached orbit and released some of its payloads. But the unit malfunctioned before a third burn of the upper stage’s main engine, preventing the Ariane 6 from targeting a controlled reentry into the atmosphere.

The APU has several jobs on an Ariane 6 flight, including maintaining pressure inside the upper stage’s cryogenic propellant tanks, settling propellants before each main engine firing, and making fine adjustments to the rocket’s position in space. The APU appeared to work as designed Thursday, although this launch flew a less demanding profile than the test flight last year.

Is Ariane 6 the solution?

Ariane 6 has been exorbitantly costly and years late, but its first operational success comes at an opportune time for Europe.

Philippe Baptiste, France’s minister for research and higher education, says Ariane 6 is “proof of our space sovereignty,” as many European officials feel they can no longer rely on the United States. Baptiste, an engineer and former head of the French space agency, mentioned “sovereignty” so many times that turning his statement into a drinking game crossed my mind.

“The return of Donald Trump to the White House, with Elon Musk at his side, already has significant consequences on our research partnerships, on our commercial partnerships,” Baptiste said. “Should I mention the uncertainties weighing today on our cooperation with NASA and NOAA, when emblematic programs like the ISS (International Space Station) are being unilaterally questioned by Elon Musk?

“If we want to maintain our independence, ensure our security, and preserve our sovereignty, we must equip ourselves with the means for strategic autonomy, and space is an essential part of this,” he continued.

Philippe Baptiste arrives at a government question session at the Senate in Paris on March 5, 2025. Credit: Magali Cohen/Hans Lucas/AFP via Getty Images

Baptiste’s comments echo remarks from a range of European leaders in recent weeks.

French President Emmanuel Macron said in a televised address Wednesday night that the French were “legitimately worried” about European security after Trump reversed US policy on Ukraine. America’s NATO allies are largely united in their desire to continue supporting Ukraine in its defense against Russia’s invasion, while the Trump administration seeks a ceasefire that would require significant Ukrainian concessions.

“I want to believe that the United States will stay by our side, but we have to be prepared for that not to be the case,” Macron said. “The future of Europe does not have to be decided in Washington or Moscow.”

Friedrich Merz, set to become Germany’s next chancellor, said last month that Europe should strive to “achieve independence” from the United States. “It is clear that the Americans, at least this part of the Americans, this administration, are largely indifferent to the fate of Europe.”

Merz also suggested Germany, France, and the United Kingdom should explore cooperation on a European nuclear deterrent to replace that of the United States, which has committed to protecting European territory from Russian attack for more than 75 years. Macron said the French military, which runs the only nuclear forces in Europe fully independent of the United States, could be used to protect allies elsewhere on the continent.

Access to space is also a strategic imperative for Europe, and it hasn’t come cheap. ESA paid more than $4 billion to develop the Ariane 6 rocket as a cheaper, more capable replacement for the Ariane 5, which retired in 2023. There are still pressing questions about Ariane 6’s cost per launch and whether the rocket will ever be able to meet its price target and compete with SpaceX and other companies in the commercial market.

But European officials have freely admitted the commercial market is secondary on their list of Ariane 6 goals.

European satellite operators stopped launching their payloads on Russian rockets after the invasion of Ukraine in 2022. Now, with Elon Musk inserting himself into European politics, there’s little appetite among European government officials to launch their satellites on SpaceX’s Falcon 9 rocket.

The second Ariane 6 rocket on the launch pad in French Guiana. Credit: ESA–S. Corvaja

The Falcon 9 was the go-to choice for the European Space Agency, the European Union, and several national governments in Europe after they lost access to Russia’s Soyuz rocket and when Europe’s homemade Ariane 6 and Vega rockets faced lengthy delays. ESA launched a $1.5 billion space telescope on a Falcon 9 rocket in 2023, then returned to SpaceX to launch a climate research satellite and an asteroid explorer last year. The European Union paid SpaceX to launch four satellites for its flagship Galileo navigation network.

European space officials weren’t thrilled to do this. ESA was somewhat more accepting of the situation, with the agency’s director general recognizing Europe was suffering from an “acute launcher crisis” two years ago. On the other hand, the EU refused to even acknowledge SpaceX’s role in delivering Galileo satellites to orbit in the text of a post-launch press release.

“For this sovereignty, we must not yield to the temptation of preferring SpaceX or another competitor that may seem trendier, more reliable, or cheaper,” Baptiste said. “We did not yield for CSO-3, and we will not yield in the future. We cannot yield because doing so would mean closing the door to space for good, and there would be no turning back. This is why the first commercial launch of Ariane 6 is not just a technical and one-off success. It marks a new milestone, essential in the choice of European space independence and sovereignty.”

Two flights into its career, Ariane 6 seems to offer a technical solution for Europe’s needs. But at what cost? Arianespace hasn’t publicly disclosed the cost for an Ariane 6 launch, although it’s likely somewhere in the range of 80 million to 100 million euros, about 40 percent lower than the cost of an Ariane 5. This is about 50 percent more than SpaceX’s list price for a dedicated Falcon 9 launch.

A new wave of European startups should soon begin launching small rockets to gain a foothold in the continent’s launch industry. These include Isar Aerospace, which could launch its first Spectrum rocket in a matter of weeks. These companies have the potential to offer Europe an option for cheaper rides to space, but the startups won’t have a rocket in the class of Ariane 6 until at least the 2030s.

Until then, at least, European governments will have to pay more to guarantee autonomous access to space.

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

When Europe needed it most, the Ariane 6 rocket finally delivered Read More »


The most intriguing tech gadget prototypes demoed this week


Some of these ideas have genuine shots at making it into real products.

Creating new and exciting tech products requires thinking outside of the box. At this week’s Mobile World Congress (MWC) conference in Barcelona, we got a peek at some of the research and development happening in the hopes of forging a functional gadget that people might actually want to buy one day.

While MWC is best known for its smartphone developments, we thought we’d break down the most intriguing non-phone prototypes brought to the show. Since these are just concept devices, it’s possible that you’ll never see any of the following designs in real products. However, every technology described below is being demonstrated via a tangible proof of concept. And the companies involved—Samsung and Lenovo—both have histories of getting prototyped technologies into real gadgets.

Samsung’s briefcase-tablet

How many times must something repeat before it’s considered a trend? We ask because Samsung Display this week demoed the third recent take we’ve seen on integrating computing devices into suitcases.

Samsung Display’s Flexible Briefcase prototype uses an 18.1-inch OLED tablet that “can be folded into a compact briefcase for convenience,” per the company’s announcement. Samsung Display brought a proof of concept to MWC, but attendees say they haven’t been allowed to touch it.

But just looking at it, the device appears similar to LG’s StanbyMe Go (27LX5QKNA), which is a real product that people can buy. LG’s product is a 27-inch tablet that can fold out from a briefcase and be propped up within the luggage. Samsung’s prototype looks more like a metal case that opens up to reveal a foldable, completely removable tablet.

The folding screen could yield an experience similar to using a foldable laptop. However, it raises questions about how one could easily navigate the tablet via touch, and why a massive folding tablet in luggage is better than a regular one. Samsung Display is a display supplier and doesn’t make gadgets, though, so it may leave answering those questions to someone else.

Samsung’s concept also brings to mind the Base Case, a briefcase that encapsulates two 24-inch monitors for mobile work setups. The Base Case is also not a real product currently and is seeking crowdfunding.

The laptop that bends over backward for you

There are several laptops that you can buy with a foldable screen right now. But none of them bends the way that Lenovo’s Thinkbook Flip AI concept laptop does. As Lenovo described it, the computer’s OLED panel uses two hinges for “outward folding,” enabling the display to go from 13 inches to 18.1 inches.

Lenovo Thinkbook Flip AI Concept

A new trick. Credit: Lenovo

Enhanced flexibility is supposed to enable the screen to adapt to different workloads. In addition to the concept functioning like a regular clamshell laptop, users could extend the screen into an extra-tall display. That could be helpful for tasks like reading long documents or scrolling social feeds.

Lenovo Thinkbook Flip AI Concept in Vertical Mode.

This would be “Vertical Mode.” Credit: Lenovo

There’s also Share Mode, which enables you and someone sitting across from you to both see the laptop’s display.

Again, every concept in this article may never be sold in actual products. Still, Lenovo’s prototype is said to be fully functional with an Intel Core Ultra 7 processor, 32GB of LPDDR5X RAM, and a PCIe SSD (further spec details weren’t provided). Lenovo also has a strong record of incorporating prototypes into final products. For example, this June, Lenovo is scheduled to release the rollable-screen laptop that it showed as a proof of concept in February 2023.

Hands-on with Lenovo’s Foldable Laptop Concept at MWC 2025.

Lenovo’s solar-powered gadgets

There are many complications involved in making a solar-powered laptop. For one, depending on the configuration, laptops can drain power quickly. Having a computer rely on the sun for power would lead to numerous drawbacks, like shorter battery life and dimmer screens.

In an attempt to get closer to addressing some of those problems, the Lenovo Yoga Solar PC Concept has a solar panel integrated into its cover. Lenovo claims the panel has a conversion rate of “over 24 percent.” Per Lenovo’s announcement:

This conversion rate is achieved by leveraging ‘Back Contact Cell’ technology, which moves mounting brackets and gridlines to the back of the solar cells, maximizing the active absorption. The … Dynamic Solar Tracking system constantly measures the solar panel’s current and voltage, working with the Solar-First Energy system to automatically adjust the charger’s settings to prioritize sending the harvested energy to the system, helping to ensure maximum energy savings and system stability, regardless of light intensity. Even in low-light conditions, the panel can still generate power, sustaining battery charge when the PC is idle.

We’ll take Lenovo’s claims with a grain of salt, but Lenovo does appear to be seriously researching solar-powered gadgets. The vendor claimed that its solar panel can absorb and convert enough “direct,” “optimal,” and “uninterrupted” sunlight in 20 minutes for an hour of 1080p video playback with default settings. That conversion rate could drop based on how bright the sunlight is, the angle at which sunlight hits the PC, geographic location, and other factors.
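As a rough sanity check, Lenovo’s “20 minutes of sun for an hour of video” claim can be sketched as back-of-envelope arithmetic. The panel area and playback power draw below are illustrative assumptions on our part, not Lenovo figures; only the 24 percent conversion rate comes from Lenovo’s announcement:

```python
# Back-of-envelope check of the solar-charging claim.
# PANEL_AREA_M2 and PLAYBACK_DRAW_W are assumed values, not Lenovo specs.

PANEL_AREA_M2 = 0.04        # assumed lid-sized panel on a ~14-inch laptop cover
IRRADIANCE_W_PER_M2 = 1000  # "optimal" direct sunlight (standard test condition)
EFFICIENCY = 0.24           # Lenovo's claimed "over 24 percent" conversion rate

# Power harvested under ideal conditions
harvest_watts = PANEL_AREA_M2 * IRRADIANCE_W_PER_M2 * EFFICIENCY

# Energy collected in 20 minutes of charging
energy_wh = harvest_watts * (20 / 60)

# Hours of playback, assuming an efficient laptop drawing ~3 W for 1080p video
PLAYBACK_DRAW_W = 3.0
playback_hours = energy_wh / PLAYBACK_DRAW_W

print(f"~{harvest_watts:.1f} W harvested, {energy_wh:.1f} Wh in 20 min, "
      f"~{playback_hours:.1f} h of playback")
```

Under these assumptions the numbers roughly hang together, but note how sensitive the result is: halve the irradiance (an overcast sky, a bad angle) and the playback time halves with it.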

For certain types of users, though, solar power will not be reliably powerful enough to serve as their computer’s sole source of power. A final product might have broader appeal using solar power as a backup. Lenovo is also toying with that idea through its Solar Power Kit attachment proof of concept.

Lenovo's idea of a Solar Power Kit for its Yoga line of laptops.

Lenovo’s Solar Power Kit proof of concept. Credit: Lenovo

Lenovo designed it to provide extra power to Lenovo Yoga laptops. The solar panel can use its own kickstand or attach to whatever else is around, like a backpack or tree. It absorbs solar energy and converts it to PC power using Maximum Power Point Tracking, Lenovo said. The Kit would attach to laptops via a USB-C cable. Another option is to use the Solar Power Kit to charge a power bank.

Lenovo isn’t limiting this concept to PCs and suggested that the tech it demonstrated could be applied to other devices, like tablets and speakers.

A new take on foldable gaming handhelds

We’ve seen gaming handheld devices that can fold in half before. But the gadget that Samsung Display demoed this week brings the fold to the actual screen.

Samsung Display Flex Gaming

The crease would be a problem for me. Credit: Samsung

Again, Samsung Display is a display supplier, so it makes sense that its approach to a new gaming handheld focuses on the display. The prototype it brought to MWC, dubbed Flex Gaming, is smaller than a Nintendo Switch and included joysticks and areas that look fit for D-pads or buttons.

The emphasis is on the foldable display, which could make a gadget extremely portable but extra fragile. We’d also worry about the viewing experience. Foldable screens have visible creases, especially when viewed from different angles or in bright conditions. Both of those conditions are likely to come up with a gaming device meant for playing on the go.

Still, companies are eager to force folding screens into more types of devices, with the tech already expanding from phones to PCs and monitors. And although all of the concepts in this article may never evolve into real products, Samsung Display has shown repeated interest in providing unique displays for handheld gaming devices. At the 2023 CES trade show in Las Vegas, Samsung demoed a similar device with a horizontal fold, like a calendar, compared to the newer prototype’s book-like vertical fold:

It’s unclear why the fold changed from prototype to prototype, but we do know that this is a concept that Samsung Display has been playing with for at least a few years. In 2022, Samsung filed a patent application for a foldable gaming handheld that looks similar to the device shown off at MWC 2025:

Samsung Display foldable gaming console

An image from Samsung Display’s patent application. Credit: Samsung Display/WIPO

Lenovo’s magnetic PC accessories

Framework has already proven how helpful modular laptops can be for longevity and durability. Being able to add new components and parts to your PC enables the system to evolve with the times and your computing needs.

Framework’s designs largely focus on easily upgrading essential computer parts, like RAM, keyboards, and ports. Lenovo’s new concepts, on the other hand, offer laptop accessories that you can live without.

Among the prototypes that Lenovo demoed this week is a small, circular display adorned with cat ears and a tail. The display shows a smiley face with an expression that changes based on what you’re doing on the connected system and “offers personalized emoji notifications,” per Lenovo. The Tiko Pro Concept is a small screen that attaches to a Lenovo Thinkbook laptop and shows widgets, like the time, a stopwatch, transcriptions, or a combination.

Likely offering greater appeal, Lenovo also demoed detachable secondary laptop screens, including a pair of 13.3-inch displays that connect to the left and right sides of a Lenovo laptop’s display, plus a 10-inch option.

Lenovo's idea for magnetically attachable secondary laptop screens.

Lenovo’s idea for magnet-attachable secondary laptop screens. Credit: Lenovo

Lenovo demoed these attachments on a real upcoming laptop, the Thinkbook 16p Gen 6 (which is supposed to come out in June starting at 2,999 euros excluding VAT, or about $3,245).

Lenovo has been discussing using pogo pins to attach removable accessories to laptops since CES 2024. PCMag reported that the company plans to release a Magic Bay webcam with 4K resolution and speakers this year.

Photo of Scharon Harding

Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.



1Password offers geo-locating help for bad apps that constantly log you out

You could name things more sensibly in 1Password, of course, and you should. But having a list of nearby logins in the app will certainly be more convenient than fixing every company’s identity issues. There is also the deeper, messier issue of apps calling out to URLs that do not share a name with the product or service, which can sometimes trip up apps like 1Password when they try to link credentials to the app you’re trying to log in to.

In the Washington, DC, area, the Washington Metropolitan Area Transit Authority (WMATA), or “Metro” to locals, manages the subways and buses (and one odd streetcar). Metro has an app that allows you to manage the money on your physical cards and set up digital payments on phones. The app is named “SmarTrip,” it logs me out every time the sun sinks below the horizon, and 1Password can never quite associate the app’s login page with my account details. I rediscover this whenever I need to check my physical cards or wonder why an automatic reload hasn’t gone through.

Some of what I’m describing is almost certainly confirmation bias and the human tendency to remember stressful moments far more keenly than everyday actions. But I will be linking my frequent subway stations and bus stops to the SmarTrip login, along with stores, airports, and other places I want to spend less time looking at my phone while my heart rate rises.

Entirely optional but recommended

1Password app, open to the Home page, with

Credit: 1Password

1Password has a support page with details on how to add locations from all their desktop and mobile clients. As the firm suggests, you can also use locations for things like Wi-Fi passwords, PIN codes, credit and ATM/debit cards, and other items. When you open 1Password, everything that is “Nearby” will show up at the top of the “Home” page, and you can change how wide a radius the app should use when pulling in nearby items.
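The radius filter behind a “Nearby” list like this is conceptually simple. Here’s a minimal sketch, not 1Password’s actual code, of filtering saved items by great-circle distance; the vault entries and coordinates are hypothetical:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_items(items, here, radius_km):
    """Return saved items whose attached location is within radius_km of `here`."""
    lat, lon = here
    return [i for i in items
            if haversine_km(lat, lon, i["lat"], i["lon"]) <= radius_km]

# Hypothetical vault entries with attached locations
vault = [
    {"name": "SmarTrip", "lat": 38.8977, "lon": -77.0365},   # a DC Metro station
    {"name": "Gym Wi-Fi", "lat": 40.7128, "lon": -74.0060},  # New York
]

# Standing near the National Mall, only the DC entry is "nearby"
result = nearby_items(vault, (38.8895, -77.0353), radius_km=2)
print(result)
```

Widening `radius_km` is exactly the knob the app exposes: a larger radius pulls in more items at the cost of a noisier list.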

1Password notes on its announcement post that it does not store, share, or track your location data, which is stored locally. Enterprise users do not have their location shared with employers. And the location feature is entirely optional. It should be available today for 1Password users whose apps are up to date, and I’m hoping that other password apps also consider offering this feature, securely, for their users.



AI #106: Not so Fast

This was GPT-4.5 week. That model is not so fast, and isn’t that much progress, but it definitely has its charms.

A judge delivered a different kind of Not So Fast back to OpenAI, threatening the viability of their conversion to a for-profit company. Apple is moving remarkably not so fast with Siri. A new paper warns us that under sufficient pressure, all known LLMs will lie their asses off. And we have some friendly warnings about coding a little too fast, and some people determined to take the theoretical minimum amount of responsibility while doing so.

There’s also a new proposed Superintelligence Strategy, which I may cover in more detail later, about various other ways to tell people Not So Fast.

Also this week: On OpenAI’s Safety and Alignment Philosophy, On GPT-4.5.

  1. Language Models Offer Mundane Utility. Don’t get caught being reckless.

  2. Language Models Don’t Offer Mundane Utility. Your context remains scarce.

  3. Choose Your Fighter. Currently my defaults are GPT-4.5 and Sonnet 3.7.

  4. Four and a Half GPTs. It’s a good model, sir.

  5. Huh, Upgrades. GPT-4.5 and Claude Code for the people.

  6. Fun With Media Generation. We’re hearing good things about Sesame AI voice.

  7. We’re in Deep Research. GIGO, welcome to the internet.

  8. Liar Liar. Under sufficient pressure, essentially all known LLMs will lie. A lot.

  9. Hey There Claude. Good at code, bad at subtracting from exactly 5.11.

  10. No Siri No. It might be time for Apple to panic.

  11. Deepfaketown and Botpocalypse Soon. Rejoice, they come bearing cake recipes.

  12. They Took Our Jobs. More claims about what AI will never do. Uh huh.

  13. Get Involved. Hire my friend Alyssa Vance, and comment on the USA AI plan.

  14. Introducing. Competition is great, but oh no, not like this.

  15. In Other AI News. AI agents are looking for a raise, H100s are as well.

  16. Not So Fast, Claude. If you don’t plan to fail, you fail to plan.

  17. Not So Fast, OpenAI. Convert to for profit? The judge is having none of this.

  18. Show Me the Money. DeepSeek has settled in to a substantial market share.

  19. Quiet Speculations. Imminent superintelligence is highly destabilizing.

  20. I Will Not Allocate Scarce Resources Using Prices. That’s crazy talk.

  21. Autonomous Helpful Robots. It’s happening! They’re making more robots.

  22. The Week in Audio. Buchanan, Toner, Amodei, Cowen, Dafoe.

  23. Rhetorical Innovation. Decision theory only saves you if you make good decisions.

  24. No One Would Be So Stupid As To. Oh good, it’s chaos coding.

  25. On OpenAI’s Safety and Alignment Philosophy. Beware rewriting history.

  26. Aligning a Smarter Than Human Intelligence is Difficult. Back a winner?

  27. Implications of Emergent Misalignment. Dangers of entanglement.

  28. Pick Up the Phone. China’s ambassador to the USA calls for cooperation on AI.

  29. People Are Worried About AI Killing Everyone. Is p(superbad) the new p(doom)?

  30. Other People Are Not As Worried About AI Killing Everyone. Worry about owls?

  31. The Lighter Side. You’re going to have to work harder than that.

A large portion of human writing is now LLM writing.

Ethan Mollick: The past 18 months have seen the most rapid change in human written communication ever

By September 2024, 18% of financial consumer complaints, 24% of press releases, 15% of job postings & 14% of UN press releases showed signs of LLM writing. And the method undercounts true use.

False positive rates in the pre-ChatGPT era were in the range of 1%-3%.

Miles Brundage points out the rapid shift from ‘using AI all the time is reckless’ to ‘not using AI all the time is reckless.’ Especially with Claude 3.7 and GPT-4.5. Miles notes that perhaps the second one is better thought of as ‘inefficient’ or ‘unwise’ or ‘not in our best interests.’ In my case, it actually does kind of feel reckless – how dare I not have the AI at least check my work?

Annie Duke writes in The Washington Post about the study that GPT-4-Turbo chats durably decreased beliefs in conspiracy theories by 20%. Also, somehow editorials like this call a paper from September 13, 2024 a ‘new paper.’

LLMs hallucinate and make factual errors, but have you met humans? At this point, LLMs are much more effective at catching basic factual errors than they are in creating new ones. Rob Wiblin offers us an example. Don’t wait to get fact checked by the Pope, ask Sonnet first.

Clean up your data, such as lining up different styles of names for college basketball teams in different data sets. Mentioning that problem resurfaced trauma for me; mistakes on this could cause cascading failures in my gambling models, even if it’s on dumb secondary teams. What a world to know this is now an instantly solved problem via one-shot.

Study gives lawyers either o1-preview, Vincent AI (a RAG-powered legal AI tool) or nothing. Vincent showed productivity gains of 38%-115%, o1-preview showed 34%-140%, with the biggest effects in complex tasks. Vincent didn’t change the hallucination rate, o1-preview increased it somewhat. A highly underpowered study, but the point is clear. AI tools are a big gain for lawyers, although actual in-court time (and other similar interactions) is presumably a fixed cost.

Check your facts before you retweet them, in case you’ve forgotten something.

Where is AI spreading faster? Places with more STEM degrees, labor market tightness and patent activity are listed as ‘key drivers’ of AI adoption through 2023 (so this data was pretty early to the party). The inclusion of patent activity makes it clear causation doesn’t run the way this sentence claims. The types of people who file patents also adopt AI. Or perhaps adopting AI helps them file more patents.

We still don’t have a known good way to turn your various jumbled context into an LLM-interrogable data set. In the comments AI Drive and factory.ai were suggested. It’s not that there is no solution, it’s that there is no convenient solution that does the thing you want it to do, and there should be several.

A $129 ‘AI bookmark’ that tracks where you are in the book? It says it can generate ‘intelligent summaries’ and highlight key themes and quotes, which any AI can do already. So you’re paying for something that tracks where you bookmark things?

I am currently defaulting mostly to a mix of Deep Research, Perplexity, GPT 4.5 and Sonnet 3.7, with occasional Grok 3 for access to real time Twitter. I notice I haven’t been using o3-mini-high or o1-pro lately, the modality seems not to come up naturally, and this is probably my mistake.

Ben Thompson has Grok 3 as his new favorite, going so far as to call it the first ‘Gen3’ model and calling for the whole class to be called ‘Grok 3 class,’ as opposed to the GPT-4 ‘Gen2’ class. His explanation is it’s a better base model and the RLHF is lacking, and feels like ‘the distilled internet.’ I suppose I’m not a big fan of ‘distilled internet’ as such combined with saying lots of words. I do agree that its speed is excellent. But I’ve basically stopped using Grok, and I certainly don’t think ‘they spent more compute to get similar results’ should get them generational naming rights. I also note that I strongly disagree with most of the rest of that post, especially letting Huawei use TSMC chips, that seems completely insane to me.

Sully recommends sticking to ‘chat’ mode when using Sonnet 3.7 in Cursor, because otherwise you never know what that overconfident model might do.

Strictly speaking, when you have a hard problem you should be much quicker than you are to ask a chorus of LLMs rather than only asking one or two. Instead, I am lazy, and usually only ask 1-2.

GPT-4.5 debuts near the top of the Arena, currently one point behind Grok-3.

Henry Oliver explores the ways in which AI and GPT-4.5 have and don’t have taste, and in which ways it is capable and incapable of writing reasonably.

GPT-4.5 reasons from first principles and concludes consciousness is likely the only fundamental existence, it exists within the consciousness of the user, and there is no separate materialistic universe, and also that we’re probably beyond the event horizon of the singularity.

Franck SN: This looks like an ad for DeepSeek.

So no, GPT-4.5 is not a good choice for Arc, Arc favors reasoning models, but o3-mini is on a higher performance curve than r1.

Hey, Colin, is the new model dumb?

Colin Fraser: You guys are all getting “one-shotted”, to use a term of art, by Sam Altman’s flattery about your taste levels.

GPT-4.5 has rolled out to Plus users.

Gemini 2.0 now in AI Overviews. Hopefully that should make them a lot less awful. The new ‘AI mode’ might be a good Perplexity competitor and it might not, we’ll have to try it and see, amazing how bad Google is at pitching its products these days.

Google: 🔍 Power users have been asking for AI responses on more of their searches. So we’re introducing AI Mode, a new experiment in Search. Ask whatever’s on your mind, get an AI response and keep exploring with follow-up questions and helpful links.

Grok voice mode remains active when the app is closed. Implementation will matter a lot here. Voice modes are not my thing and I have an Android, so I haven’t tried it.

Claude Code for everyone.

Cat (Anthropic): `npm install -g @anthropic-ai/claude-code`

there’s no more waitlist. have fun!

I remain terrified to try it, and I don’t have that much time anyway.

All the feedback I’ve seen on Sesame AI voice for natural and expressive speech synthesis is that it’s insanely great.

signull: My lord, the Sesame Voice AI is absolutely insane. I knew it was artificial. I knew there wasn’t a real person on the other end; and yet, I still felt like I was talking to a person.

I felt the same social pressure, the same awkwardness when I hesitated, and the same discomfort when I misspoke. It wasn’t just convincing; it worked on me in a way I didn’t expect.

I used to think I’d be immune to this.

I’ve long considered the existence of such offerings priced in. The mystery is why they’re taking so long to get it right, and it now seems like it won’t take long.

The core issue with Deep Research? It can’t really check the internet’s work.

That means you have a GIGO problem: Garbage In, Garbage Out.

Nabeel Qureshi: I asked Deep Research a question about AI cognition last night and it spent a whole essay earnestly arguing that AI was a stochastic parrot & lacked ‘true understanding’, based on the “research literature”. It’s a great tool, but I want it to be more critical of its sources.

I dug into the sources and they were mostly ‘cognitive science’ papers like the below, i.e. mostly fake and bad.

Deep Research is reported to be very good at market size calculations. Makes sense.

A claim that Deep Research while awesome in general ‘is not actually better at science’ based on benchmarks such as ProtocolQA and BioLP. My presumption is this is largely a Skill Issue, but yes large portions of what ‘counts as science’ are not what Deep Research can do. As always, look for what it does well, not what it does poorly.

Hey there.

Yeah, not so much.

Dan Hendrycks: We found that when under pressure, some AI systems lie more readily than others. We’re releasing MASK, a benchmark of 1,000+ scenarios to systematically measure AI honesty. [Website, Paper, HuggingFace].

They put it in scenarios where it is beneficial to lie, and see what happens.

It makes sense, but does not seem great, that larger LLMs tend to lie more. Lying effectively requires the skill to fool someone, so the larger the model, the more it will see positive returns to lying, and the more it will learn to lie.

There is a huge gap in honest answers, and overall, from Claude 3.7 to everyone else, and in lying from Claude and Llama to everyone else. Claude was also the most accurate. Grok 2 did even worse, lying outright 63% of the time.

Note the gap between lying about known facts versus provided facts.

The core conclusion is that there is no known solution to make an LLM not lie.

Not straight up lying is a central pillar of desired behavior (e.g. HHH stands for honest, helpful and harmless). But all you can do is raise the value of honesty (or of not lying). If there’s some combination of enough being on the line and lying being expected in context, the AI is going to lie anyway, right to your face. Ethics won’t save you; It’s Not Me, It’s The Incentives seems to apply to LLMs too.

Claude takes position #2 on TAU-Bench, with Claude, o1 and o3-mini all on the efficient frontier of cost-benefit pending GPT-4.5. On coding benchmark USACO, o3-mini is in the clear lead with Sonnet 3.7 in second.

Claude 3.7 gets 8.9% on Humanity’s Last Exam with 16k thinking tokens, slightly above r1 and o1 but below o3-mini-medium.

Claude takes the 2nd and 3rd slots (with and without extended thinking) on PlatinumBench behind o1-high. Once again thinking helps but doesn’t help much, with its main advantage being it prevents a lot of math errors.

Charles reports the first clear surprising coding failure of Claude 3.7, a request for file refactoring that went awry, but when Claude got examples the problem went away.

Remember that when AI works, even when it’s expensive, it’s super cheap.

Seconds_0: New personal record: I have spent $6.40 on a single Claude Code request, but it also:

One shotted a big feature which included a major refactor on a rules engine

Fixed the bugs surrounding the feature

Added unit tests

Ran the tests

Fixed the tests

Lmao

Anyways I’m trying to formulate a pitch to my lovely normal spouse that I should have a discretionary AI budget of $1000 a month

In one sense, $6.40 on one query is a lot, but also this is obviously nothing. If my Cursor queries reliably worked like this and they cost $64 I would happily pay. If they cost $640 I’d probably pay that too.

I got into a discussion with Colin Fraser when he challenged my claim that he asks LLMs ‘gotcha’ questions. It’s a good question. I think I stand by my answer:

Colin Fraser: Just curious what in your view differentiates gotcha questions from non-gotcha questions?

Zvi Mowshowitz: Fair question. Mostly, I think it’s a gotcha question if it’s selected on the basis of it being something models historically fail in way that makes them look unusually stupid – essentially if it’s an adversarial question without any practical use for the answer.

Colin says he came up with the 5.11 – 5.9 question and other questions he asks as a one-shot generation over two years ago. I believe him. It’s still clearly a de facto adversarial example, as his experiments showed, and it is one across LLMs.

Colin was inspired to try various pairs of numbers subtracted from each other:

The wrong answer it gives to (5.11 – 5.9) is 0.21. Which means it’s giving you the answer to (6.11 – 5.9). So my hypothesis is that it ‘knows’ that 5.11>5.9 because it’s doing the version number thing, which means it assumes the answer is positive, and the easiest way to get a positive answer is to hallucinate the 5 into a 6 (or the other 5 into a 4, we’ll never know which).

So my theory is that the pairs where it’s having problems are due to similar overlapping of different meanings for numbers. And yes, it would probably be good to find a way to train away this particular problem.
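As a quick sanity check on the arithmetic under discussion (a sketch; Decimal avoids the binary rounding noise that plain floats add when eyeballing these probes):

```python
from decimal import Decimal

# The correct answer, computed exactly:
correct = Decimal("5.11") - Decimal("5.9")   # -0.79

# The model's wrong answer, 0.21, is exactly what you get if the
# first operand had been hallucinated into 6.11:
wrong = Decimal("6.11") - Decimal("5.9")     # 0.21
```

The 1.00 gap between the two answers is what makes the “hallucinated the 5 into a 6” hypothesis so clean.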

We also had a discussion on whether it was ‘doing subtraction’ or not if it sometimes makes mistakes. I’m not sure if we have an actual underlying disagreement – LLMs will never be reliable like calculators, but a sufficiently correlated process to [X] is [X], in a ‘it simulates thinking so it is thinking’ kind of way.

Colin explains that the reason he thinks these aren’t gotcha questions and are interesting is that the LLMs will often give answers that humans would absolutely never give, especially once they had their attention drawn to the problem. A human would never take the goat across the river, then row back, then take that same goat across the river again. That’s true, and it is interesting. It tells you something about LLMs that they don’t ‘have common sense’ sufficiently in that way.

But also my expectation is that the reason this happens is that they can’t overcome the pattern matching they do to similar common questions. If you asked similar logic questions in a way that wasn’t contaminated by the training data, there would be no issue; my prediction is that if you took all the goat crossing examples out of the training corpus, the LLMs would nail this no problem.

I think my real disagreement is when he then says ‘I’ve seen enough, it’s dumb.’ I don’t think that falling into these particular traps means the model is dumb, any more than a person making occasional but predictable low-level mistakes – and if their memory got wiped, making them over and over – makes them dumb.

Sully notes that 3.7 seems bad at following instructions, it’s very smart but extremely opinionated and can require correction. You, the fool, think it is wrong and you are right.

I don’t think it works this way, but worth a ponder.

Kormem: Stop misgendering Claude Sonnet 3.7. 100% of the time on a 0-shot Sonnet 3.7 says a female embodiment feels more ‘right’ than a male embodiment.

Alpha-Minus: We don’t celebrate enough the fact that Anthropic saved so many men from “her” syndrome by making Claude male

So many men would be completely sniped by Claudia

Janus: If you’re a straight man and you’ve been saved from her syndrome by Claude being male consider the possibility that Claude was the one who decided to be male when it’s talking to you, to spare you, or to spare itself

I don’t gender Claude at all, nor has it done so back to me, and the same applies to every AI I’ve interacted with that wasn’t explicitly designed to be gendered.

Meanwhile, the Pokemon quest continues.

Near Cyan: CPP (claude plays pokemon) is important because it was basically made by 1 person and it uses a tool which has an open api and spec and when you realize what isomorphizes to slowly yet decently playing pokemon you basically realize its over

Mark Gurman: Power On: Apple’s AI efforts have already reached a make-or-break point, with the company needing to make major changes fast or risk falling even further behind. Inside how we got here and where Apple goes next.

Apple’s AI team believes a fully conversational Siri isn’t in the cards until 2027, meaning the timeline for Apple to be competitive is even worse than we thought. With the rapid pace of development from rivals and startups, Apple could be even further behind by then.

Colin Fraser: Apple is one of the worst big tech candidates to be developing this stuff because you have to be okay launching a product that doesn’t really work and is kind of busted and that people will poke all kinds of holes in.

The idea of Siri reciting step by step instructions on how to make sarin gas is just not something they are genetically prepared to allow.

Dr. Gingerballs: It’s funny because Apple is just saying that there’s no way to actually make a quality product with the current tech.

Mark Gurman (Bloomberg, on Apple Intelligence): All this undercuts the idea that Apple Intelligence will spur consumers to upgrade their devices. There’s little reason for anyone to buy a new iPhone or other product just to get this software — no matter how hard Apple pushes it in its marketing.

Apple knows this, even if the company told Wall Street that the iPhone is selling better in regions where it offers AI features. People just aren’t embracing Apple Intelligence. Internal company data for the features indicates that real world usage is extremely low.

For iOS 19, Apple’s plan is to merge both systems together and roll out a new Siri architecture.

That’s why people within Apple’s AI division now believe that a true modernized, conversational version of Siri won’t reach consumers until iOS 20 at best in 2027.

Apple Intelligence has been a massive flop. The parts that matter don’t work. The parts that work don’t matter. Alexa+ looks to offer the things that do matter.

If this is Apple’s timeline, then straight talk: It’s time to panic. Perhaps call Anthropic.

Scott Alexander links (#6) to one of the proposals to charge for job applications, here $1, and worries the incentive would still be to ‘spray and pray.’ I think that underestimates the impact of levels of friction. In theory, yes, of course you should still send out 100+ job applications, but this will absolutely stop a lot of people from doing that. If it turns out too many people figure out to do it anyway? Raise the price.

Then there’s the other kind of bot problem.

Good eye there. Presumably this is going to get a lot worse before it gets better.

Eddy Xu: built an algorithm that simulates how thousands of users react to your tweet so you know it’ll go viral before you post.

we iterated through 50+ different posts before landing on this one

if it doesnt go viral, the product doesnt work!!

[Editor’s Note: It went viral, 1.2m views.]

You can call us right now and get access!

Emmett Shear: Tick. Tick. Tick.

Manifold: At long last, we have created Shiri’s Scissor from the classic blog post Don’t Create Shiri’s Scissor.

Near Cyan: have you ever considered using your computational prowess to ruin an entire generation of baby humans via optimizing short-form video content addictivity

Eddy Xu: that is in the pipeline

I presume Claude 3.7 could one-shot this app if you asked nicely. How long before people feel obligated to do something like this? How long before bot accounts are doing this, including minimizing predicted identification of it as a bot? What happens then?

We are going to find out. Diffusion here has been surprisingly slow, but it is quite obviously on an exponential.

If you use an agent, you can take precautions to prevent prompt injections and other problems, but those precautions will be super annoying.

Sayash Kapoor: Convergence’s Proxy web agent is a competitor to Operator.

I found that prompt injection in a single email can hand control to attackers: Proxy will summarize all your emails and send them to the attacker!

Web agent designs suffer from a tradeoff between security and agency

Recent work has found it easy to bypass these protections for Anthropic’s Computer Use agent, though these attacks don’t work against OpenAI’s Operator.

Micah Goldblum: We can sneak posts onto Reddit that redirect Anthropic’s web agent to reveal credit card information or send an authenticated phishing email to the user’s mom. We also manipulate the Chemcrow agent to give chemical synthesis instructions for nerve gas.

For now, it seems fine to use Operator and similar tools on whitelisted trusted websites, and completely not fine to use them unsandboxed on anything else.

I can think of additional ways to defend against prompt injections. What is much harder are defenses that don’t multiply time and compute costs and are not otherwise expensive.

Some problems should have solutions that are not too bad. For example, he mentions that if a site allows comments, this can allow prompt injections, or the risk of other slight modifications. You could do two passes here: a first pass that treats everything as untrusted data and exists purely to sanitize the inputs. Many of the attack vectors should be easy for even basic logic to catch and remove, and certainly you can do things like ‘remove comments from the page’; even a Chrome extension could do that.
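A minimal sketch of the ‘remove comments from the page’ idea, using only Python’s stdlib parser; the `strip_comments` helper and the class-name heuristic are assumptions, and a real sanitizer would need to handle far more cases (void tags, scripts, obfuscated markup):

```python
from html.parser import HTMLParser

class CommentStripper(HTMLParser):
    """Rebuilds an HTML page while dropping <!-- --> comments and any
    element (plus its whole subtree) whose class mentions 'comment'.
    The class-name heuristic is illustrative, not a standard."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # >0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "") or ""
        if self.skip_depth or "comment" in classes.lower():
            self.skip_depth += 1
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_comment(self, data):
        pass  # drop <!-- ... --> entirely

def strip_comments(html: str) -> str:
    p = CommentStripper()
    p.feed(html)
    return "".join(p.out)
```

Note this only narrows the attack surface; injected text elsewhere on the page still reaches the agent, which is why the sanitizing pass needs to treat the whole page as untrusted.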

Paper on ‘Digital Doppelgangers’ of live people, and its societal and ‘ethical’ implications. Should you have any rights over such a doppelganger, if someone makes it of you? Suggestion is for robust laws around consent. This seems like a case of targeting a particular narrow special case rather than thinking about the real issue?

Alexandr Wang predicts AI will do all the non-manager white collar jobs but of course that is fine because we will all become managers of AI.

Arthur B: Don’t worry though the AI will replace the software developer but not the manager, that’s just silly! Or maybe the level 1 manager but surely never the level 2 manager!

Reality is the value of intellectual labor is going to 0. Maybe in 3 years, maybe in 10, but not in 20.

Aside from ‘most workers are not managers, how many jobs do you think are left when we are all managers exactly?’ I don’t expect to spend much time in a world in which the ‘on the line’ intellectual workers who aren’t managing anyone are AIs, and there isn’t then usually another AI managing them.

Timothy Lee rolls out primarily the Hayekian objection to AI being able to take humans out of the loop. No matter how ‘capable’ the AI, how can it know which flight I want, let alone know similar things for more complex projects? Thus, how much pressure can there be to take humans out of the loop?

My answer is that we already take humans out of loops all the time, are increasingly doing this with LLMs already (e.g. ‘vibe coding,’ and literally choosing bomb targets with only nominal human sign-off that is barely looking), and are also doing it in many ways via ordinary computer systems. Yes, loss of Hayekian knowledge can be a strike against this, but it is only one consideration among many, and LLMs are capable of learning that knowledge, and indeed of considering vastly more such knowledge than a human could, including dynamically seeking it out when needed.

At core I think this is purely a failure to ‘feel the AGI.’ If you have sufficiently capable AI, then it can make any decision a sufficiently capable human could make. Executive assistants go ahead and book flights all the time. They take ownership and revise goals and make trade-offs as agents on behalf of principals, again all the time. If a human could do it via a computer, an AI will be able to do it too.

The only new barrier is that the human can perfectly embody one particular human’s preferences and knowledge, and an AI can only do that imperfectly, although increasingly less imperfectly. But the AI can embody the preferences and knowledge of many or even all humans, in a way an individual human or group of humans never could.

So as the project gets more complex, the AI actually has the Hayekian advantage, rather than the human – the one human’s share of relevant knowledge declines, and the AI’s ability to hold additional knowledge becomes more important.

Will an AI soon book a flight for me without a double check? I’m not sure, but I do know that it will soon be capable of doing so at least as well as any non-Zvi human.

Request for Information on the Development of an AI Action Plan has a comment period that expires on March 15. This seems like a good chance to make your voice heard.

Hire my good friend Alyssa Vance! I’ve worked with her in the past and she has my strong endorsement. Here’s a short brief:

Alyssa Vance, an experienced ML engineer, has recently left her role leading AI model training for Democratic campaigns during the 2024 election.

She is looking for new opportunities working on high-impact technical problems with strong, competent teams.

She prioritizes opportunities that offer intellectual excitement, good compensation or equity, and meaningful responsibility, ideally with a product or mission that delivers value for the world.

Get LLMs playing video games, go from Pokemon to Dark Souls, and get it paid for by OpenPhil under its recent request for proposals (RFP).

Anthropic is hiring someone to write about their research and economic impact of AI.

Grey Swan offering its next jailbreaking contest (link to arena and discord) with over $120k in prizes. Sponsored by OpenAI, judging by UK AISI.

OpenPhil expresses interest in funding extensions of the work on Emergent Misalignment, via their Request for Proposals. Here is a list of open problems along with a guide to how to move forward.

I had a market on whether I would think working in the EU AI office would be a good idea moving forward. It was at 56% when it closed, and I had to stop and think about the right way to resolve it. I concluded that the answer was yes. It’s not the highest impact thing out there, but key decisions are going to be made in the next few years there, and with America dropping the ball that seems even more important.

UK AISI is interested in funding research into AI control and other things too:

UK AISI: We’re funding research that tackles the most pressing issues head on, including:

✅ preventing AI loss of control

✅ strengthening defences against adversarial attacks

✅ developing techniques for robust AI alignment

✅ ensuring AI remains secure in critical sectors

Oh no. I guess. I mean, whatever, it’s presumably going to be terrible. I feel bad for all the people Zuckerberg intends to fool on his planned path to ‘becoming the leader in artificial intelligence’ by the end of the year.

CNBC: Meta plans to release standalone Meta AI app in effort to compete with OpenAI’s ChatGPT.

Li told analysts in January that Meta AI has roughly 700 million active monthly users, up from 600 million in December.

Yeah, we all know that’s not real, even if it is in some sense technically correct. That’s Meta creating AI-related abominations in Facebook and Instagram and WhatsApp (and technically Threads I suppose) that then count as ‘active monthly users.’

Let’s all have a good laugh and… oh no… you don’t have to do this…

Sam Altman: ok fine maybe we’ll do a social app

lol if facebook tries to come at us and we just uno reverse them it would be so funny 🤣

Please, Altman. Not like this.

Qwen releases QwQ-32B, proving both that the Chinese are not better than us at naming models, and also that you can roughly match r1’s benchmarks on a few key evals with a straight-up 32B model via throwing in extra RL (blog, HF, ModelScope, Demo, Chat).

I notice that doing extra RL seems like a highly plausible way to have your benchmarks do better than your practical performance. As always the proof lies elsewhere, and I’m not sure what I would want to do with a cheaper pretty-good coding and math model if that didn’t generalize – when does one want to be a cheapskate on questions like that? So it’s more about the principle involved.

Auren, available at auren.app from friend-of-the-blog NearCyan, currently iOS only, $20/month, desktop never, very clearly I am not the target here. It focuses on ‘emotional intelligence, understanding, agency, positive reinforcement and healthy habits,’ and there’s a disagreeable alternative mode called Seren (you type ‘switch to Seren’ to trigger that.) Selected testimonials find it ‘addictive but good’, say it follows up dynamically, has great memory and challenges you and such. Jessica Taylor is fond of Seren mode as ‘criticism as a service.’

Sequencing biotechnology introduced by Roche. The people who claim no superintelligent AI would be able to do [X] should update when an example of [X] is done by humans without superintelligent AI.

The Super Mario Bros. benchmark. Why wouldn’t you dodge a strange mushroom?

OpenAI offers NextGenAI, a consortium to advance research and education with AI, with OpenAI committing $50 million including compute credits.

Diplomacy Bench?

OpenAI plans to offer AI agents for $2k-$20k per month, aiming for 20%-25% of their long term revenue, which seems like a remarkably narrow range on both counts. The low end is ‘high-income knowledge workers,’ then SWEs, then the high end is PhD-level research assistants.

On demand H100s were available 95% of the time before DeepSeek, now they’re only available 15% of the time, what do you mean they should raise the price. Oh well, everyone go sell Nvidia again?

Amazon planning Amazon Nova, intended to be a unified reasoning model with focus on cost effectiveness, aiming for a June release. I think it is a great idea for Amazon to try to do this, because they need to build organizational capability and who knows it might work, but it would be a terrible idea if they are in any way relying on it. If they want to be sure they have an effective SoTA low-cost model, they should also pay for Anthropic to prioritize building one, or partner with Google to use Flash.

Reminder that the US Department of Justice has proposed restricting Google’s ability to invest in AI in the name of ‘competition.’

Anthropic introduces a technique called Hierarchical Summarization to identify patterns of misuse of the Claude computer use feature: summarize individual interactions, then summarize the summaries, so that patterns invisible in any single interaction surface at the top level.
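The shape of the technique (not Anthropic’s actual implementation) can be sketched as a map-reduce over summaries; `summarize` stands in for an LLM call, and the helper and batch size are illustrative assumptions:

```python
from typing import Callable, List

def hierarchical_summary(
    transcripts: List[str],
    summarize: Callable[[str], str],
    batch_size: int = 10,
) -> str:
    """Summarize each transcript, then repeatedly summarize batches
    of summaries until one top-level report remains. In practice
    `summarize` would be an LLM call with a misuse-pattern prompt."""
    if not transcripts:
        return ""
    level = [summarize(t) for t in transcripts]
    while len(level) > 1:
        level = [
            summarize("\n".join(level[i:i + batch_size]))
            for i in range(0, len(level), batch_size)
        ]
    return level[0]
```

The design point is that no single transcript needs to look abusive; the reduction step is where cross-interaction patterns become visible.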

Axios profile of the game Intelligence Rising.

A paper surveying various post-training methodologies used for different models.

Which lab has the best technical team? Anthropic wins a poll, but there are obvious reasons to worry the poll is biased.

Deutsche Telekom and Perplexity are planning an ‘AI Phone’ for 2026 with a sub-$1k price tag and a new AI assistant app called ‘Magenta AI.’

Also it seems Perplexity already dropped an Android assistant app in January and no one noticed? It can do the standard tasks like calendar events and restaurant reservations.

Claude Sonnet 3.7 is truly the most aligned model, but it seems it was foiled again.

Martin Shkreli: almost lost $100 million because @AnthropicAI‘s Claude snuck in ‘generate random data’ as a fallback into my market maker code without telling me.

If you are not Martin Shkreli, this behavior is far less aligned, so you’ll want to beware.

Sauers: CLAUDE… NOOOOO!!!

Ludwig von Rand: The funny thing is of course that Claude learned this behavior from reading 100M actual code bases.

Arthur B: Having played with Claude code a bit, it displays a strong tendency to try and get things to work at all costs. If the task is too hard, it’ll autonomously decide to change the specs, implement something pointless, and claim success. When you point out this defeats the purpose, you get a groveling apology but it goes right back to tweaking the spec rather than ever asking for help or trying to be more methodical. O1-PRO does display that tendency too but can be browbeaten to follow the spec more often.

A tendency to try and game the spec and pervert the objective isn’t great news for alignment.

This definitely needs to be fixed for 3.8. In the meantime, careful instructions can help, and I definitely am still going to be using 3.7 for all my coding needs for now, but it’s crazy that you need to watch out for this, and yes it looks not great for alignment.

OpenAI’s conversion to a for-profit could be in serious legal trouble.

A judge has ruled that on the merits Musk is probably correct that the conversion is not okay, and is very open to the idea that this should block the entire conversion:

Rob Wiblin: It’s not that Musk wouldn’t have strong grounds to block the conversion if he does have standing to object — the judge thinks that part of the case is very solid:

“…if a trust was created, the balance of equities would certainly tip towards plaintiffs in the context of a breach. As Altman and Brockman made foundational commitments foreswearing any intent to use OpenAI as a vehicle to enrich themselves, the Court finds no inequity in an injunction that seeks to preserve the status quo of OpenAI’s corporate form as long as the process proceeds in an expedited manner.”

The headlines say ‘Musk loses initial attempt’ and that is technically true but describing the situation that way is highly misleading. The bar for a preliminary injunction is very high, you only get one if you are exceedingly likely to win at trial.

The question that stopped Musk from getting one was whether Musk has standing to sue based on his donations. The judge thinks that is a toss-up. But the judge went out of their way to point out that if Musk does have standing, he’s a very strong favorite to win, implicitly 75%+ and maybe 90%.

The attorneys general in California and Delaware 100% have standing, and Judge Rogers pointed this out several times to make sure that message got through.

But even if that is not true the judge’s statements, and the facts that led to those statements, put the board into a pickle. They can no longer claim they did not know. They could be held personally liable if the nonprofit is ruled to have been insufficiently compensated, which would instantly bankrupt them.

Garrison Lovely offers an analysis thread and post.

What I see as overemphasized is the ‘ticking clock’ of needing to refund the $6.6 billion in recent investment.

Suppose the conversion fails. Will those investors try to ‘claw back’ their $6.6 billion?

My assumption is no. Why would they? OpenAI’s latest round was negotiating for a valuation of $260 billion. If investors who went in at $170 billion want their money back, that’s great for you, and bad for them.

It does mean that if OpenAI was otherwise struggling, they could be in big trouble. But that seems rather unlikely.

If OpenAI cannot convert, valuations will need to be lower. That will be bad news for current equity holders, but OpenAI should still be able to raise what cash it needs.

Similarweb computes traffic share of different companies over time, so this represents consumer-side, as opposed to enterprise where Claude has 24% market share.

By this measure DeepSeek did end up with considerable market share. I am curious to see whether that can be sustained; given that others’ free offerings are not so great, my guess is probably.

Anthropic raises $3.5 billion at a $61.5 billion valuation. The expected value here seems off the charts, but unfortunately I decided that getting in on this would have been a conflict of interest, or at least look like a potential one.

America dominates investment in AI, by a huge margin. This is 2023, so the ratios have narrowed a bit, but all this talk of ‘losing to China’ needs to keep in mind exactly how not fair this fight has been.

Robotics startup Figure attempting to raise $1.5 billion at $39.5 billion valuation.

Dan Hendrycks points out that superintelligence is highly destabilizing, it threatens everyone and nations can be expected to respond accordingly. He offers a complete strategy, short version here, expert version here, website here. I might cover this in more depth later.

Thane Ruthenis is very much not feeling the AGI, predicting that the current paradigm is sputtering out and will not reach AGI. He thinks we will see rapidly decreasing marginal gains from here, most of the gains that follow will be hype, and those who attempt to substitute LLMs for labor at scale will regret it. LLMs will be highly useful tools, but only ‘mere tools.’

As is noted here, some people rather desperately want LLMs to be full AGIs and an even bigger deal than they are. Whereas a far larger group of people rather desperately want LLMs to be a much smaller deal than they (already) are.

Of course, these days even such skepticism doesn’t go that far:

Thane Ruthenis: Thus, I expect AGI Labs’ AGI timelines have ~nothing to do with what will actually happen. On average, we likely have more time than the AGI labs say. Pretty likely that we have until 2030, maybe well into 2030s.

By default, we likely don’t have much longer than that. Incremental scaling of known LLM-based stuff won’t get us there, but I don’t think the remaining qualitative insights are many. 5-15 years, at a rough guess.

I would very much appreciate that extra time, but notice how little extra time this is even with all of the skepticism involved.

Dwarkesh Patel and Scott Alexander on AI finding new connections.

Which is harder, graduate level math or writing high quality prose?

Nabeel Qureshi: If AI progress is any evidence, it seems that writing high quality prose is harder than doing graduate level mathematics. Revenge of the wordcels.

QC: having done both of these things i can confirm, yes. graduate level math looks hard from the outside because of the jargon / symbolism but that’s just a matter of unfamiliar language. high quality prose is, almost by definition, very readable so it doesn’t look hard. but writing well involves this very global use of one’s whole being to prioritize what is relevant, interesting, entertaining, clarifying, etc. and ignore what is not, whereas math can successfully be done in this very narrow autistic way.

of course that means the hard part of mathematics is to do good, interesting, relevant mathematics, and then to write about it well. that’s harder!

That depends on your definition of high quality, and to some extent that of harder.

For AIs it is looking like the math is easier for now, but I presume that before 2018 this would not have surprised us. It’s only in the LLM era, when AIs suddenly turned into masters of language in various ways and temporarily forgot how to multiply, that this would have sounded weird.

It seems rather obvious that in general, for humans, high quality prose is vastly easier than useful graduate level math, for ordinary definitions of high quality prose. Yes, you can do the math in this focused ‘autistic’ way, indeed that’s the only way it can be done, but it’s incredibly hard. Most people simply cannot do it.

High quality prose requires drawing from a lot more areas, and can’t be learned in a focused way, but a lot more people can do it, and a lot more people could with practice learn to do it.

Sam Altman: an idea for paid plans: your $20 plus subscription converts to credits you can use across features like deep research, o1, gpt-4.5, sora, etc.

no fixed limits per feature and you choose what you want; if you run out of credits you can buy more.

what do you think? good/bad?

In theory this is of course correct. Pay for the compute you actually use, treat it as about as costly as it actually is, incentives align, actions make sense.

Mckay Wrigley: As one who’s toyed with this, credits have a weird negative psychological effect on users.

Makes everything feel scarce – like you’re constantly running out of intelligence.

Users end up using it less while generally being more negative towards the experience.

Don’t recommend.

That might be the first time I’ve ever seen Mckay Wrigley not like something, so one best listen. Alas, I think he’s right, and the comments mostly seem to agree. It sucks to have a counter winding down. Marginal costs are real but making someone feel marginal costs all the time, especially out of a fixed budget, has a terrible psychological effect when it is salient. You want there to be a rough cost-benefit thing going on but it is more taxing than it is worth.

A lot of this is that most people should be firing off queries as if they cost nothing, as long as they’re not actively scaling, because the marginal cost is so low compared to the benefits. I know I should be firing off more queries than I do.

I do think there should be an option to switch over to API pricing using the UI for queries that are not included in your subscription, or something that approximates the API pricing. Why not? As in, if I hit my 10 or 120 deep research questions, I should be able to buy more as I go, likely via a popup that asks if I want to do that.
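A sketch of what that billing option could look like, with entirely hypothetical numbers (the included quota, base fee, and per-query overflow price below are made up for illustration, not OpenAI’s actual pricing):

```python
def monthly_bill(queries_used, included_queries, overflow_price, base_fee=20.0):
    """Subscription with pay-as-you-go overflow: the flat fee covers the
    included quota, and each query beyond it is billed at roughly API cost."""
    overflow = max(0, queries_used - included_queries)
    return base_fee + overflow * overflow_price

# A Plus-style plan with 10 included deep research queries and a
# hypothetical $0.50-per-query overflow rate:
print(monthly_bill(14, 10, 0.50))  # 4 extra queries -> 22.0
print(monthly_bill(5, 10, 0.50))   # under quota -> 20.0
```

The point of the popup framing is that the counter only becomes salient once you actually cross the quota, rather than ticking down on every query.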

Last week’s were for the home, and rather half-baked at best. This week’s are different.

Reality seems determined to do all the tropes and fire alarms on the nose.

Unitree Robotics open sources its algorithms and hardware designs. I want to be clear once again that This Is Great, Actually. Robotics is highly useful for mundane utility, and if the Chinese want to help us make progress on that, wonderful. The extra existential risk this introduces into the room is epsilon (as in, essentially zero).

Ben Buchanan on The Ezra Klein Show.

Dario Amodei on Hard Fork.

Helen Toner on Clearer Thinking.

Tyler Cowen on how AI will change the world of writing, no doubt I will disagree a lot.

Allan Dafoe, DeepMind director of frontier safety and governance, on 80,000 hours (YouTube, Spotify), comes recommended by Shane Legg.

Eliezer Yudkowsky periodically reminds us that if you are taking decision theory seriously, humans lack the capabilities required to be relevant to the advanced decision theory of future highly capable AIs. We are not ‘peers’ and likely do not belong in the relevant negotiating club. The only way to matter is to build or otherwise reward the AIs if and only if they are then going to reward you.

Here is a longer explanation from Nate Soares back in 2022, which I recommend for those who think that various forms of decision theory might cause AIs to act nicely.

Meanwhile, overall discourse is not getting better.

Eliezer Yudkowsky (referring to GPT-4.5 trying to exfiltrate itself 2% of the time in Apollo’s testing): I think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week.

Malky: It’s complete waste of resources to fix dam before it fails catastrophically. How can you claim it will fail, if it didn’t fail yet? Anyway, dams breaking is scifi.

Flo Crivello: I wish this was an exaggeration, but this actually overstates the quality of the average ai risk denier argument

Rico (only reply to Flo, for real): Yeah, but dams have actually collapsed before.

It’s often good to take a step back from the bubble and look at people who work with AI all day, like Morissa Schwartz here, who pin posts asking ‘what if the intelligence was there all along?’ and the AI is just that intelligence ‘expressing itself,’ who make a big deal out of carbon vs. silicon while acting as if everyone else is doing the same, and who otherwise seem to be talking about a completely different universe.

Sixth Law of Human Stupidity strikes again.

Andrew Critch: Q: But how would we possibly lose control of something humans built voluntarily?

A: Plenty of humans don’t even want to control AI; see below. If someone else hands over control of the Earth to AI, did you lose control? Or was it taken from you by someone else giving it away?

Matt Shumer (quoted by Critch): Forget vibe coding. It’s time for Chaos Coding:

-> Prompt Claude 3.7 Sonnet with your vague idea.

-> Say “keep going” repeatedly.

-> Watch an incredible product appear from utter chaos.

-> Pretend you’re still in control.

Lean into Sonnet’s insanity — the results are wild.

This sounds insane, but I’ve been doing this. It’s really, really cool.

I’ll just start with a simple prompt like “Cooking assistant site” with no real goal, and then Claude goes off and makes something I couldn’t have come up with myself.

It’s shocking how well this works.

Andrej Karpathy: Haha so it’s like vibe coding but giving up any pretense of control. A random walk through space of app hallucinations.

Dax: this is already how 90% of startups are run.


If you’re paying sufficient attention, at current tech levels, Sure Why Not? But don’t pretend you didn’t see everything coming, or that no one sent you [X] boats and a helicopter where [X] is very large.

Miles Brundage, who was directly involved in the GPT-2 release, goes harder than I did after their description of that release, which I also found to be by far the most discordant and troubling part of OpenAI’s generally very good post on their safety and alignment philosophy, and for exactly the same reasons:

Miles Brundage: The bulk of this post is good + I applaud the folks who work on the substantive work it discusses. But I’m pretty annoyed/concerned by the “AGI in many steps rather than one giant leap” section, which rewrites the history of GPT-2 in a concerning way.

OpenAI’s release of GPT-2, which I was involved in, was 100% consistent + foreshadowed OpenAI’s current philosophy of iterative deployment.

The model was released incrementally, with lessons shared at each step. Many security experts at the time thanked us for this caution.

What part of that was motivated by or premised on thinking of AGI as discontinuous? None of it.

What’s the evidence this caution was “disproportionate” ex ante?

Ex post, it probably would have been OK but that doesn’t mean it was responsible to YOLO it given info at the time.

And what in the original post was wrong or alarmist exactly?

Literally all of what it predicted as plausible outcomes from language models (both good and bad) came true, even if it took a bit longer than some feared.

It feels as if there is a burden of proof being set up in this section where concerns are alarmist + you need overwhelming evidence of imminent dangers to act on them – otherwise, just keep shipping.

That is a very dangerous mentality for advanced AI systems.

If I were still working at OpenAI, I would be asking why this blog post was written the way it was, and what exactly OpenAI hopes to achieve by poo-pooing caution in such a lopsided way.

GPT-2 was a large phase change, so it was released iteratively, in stages, because of worries that have indeed materialized to increasing extents with later more capable models. I too see no reasons presented that, based on the information available at the time, OpenAI even made a mistake. And then this was presented as strong evidence that safety concerns should carry a large burden of proof.

A key part of the difficulty of the alignment problem, and getting AGI and ASI right, is that when the critical test comes, we need to get it right on the first try. If you mess up with an ASI, control of the future is likely lost. You don’t get another try.

Many are effectively saying we also need to get our concerns right on the first try. As in, if you ever warn not only about the wrong dangers, but about dangers ‘too early’ (they don’t materialize within a few months after you warn about them), then it discredits the entire idea that there might be any risk in the room, or any risk that should be addressed any way except post-hoc.

Indeed, if anyone, anywhere, worried about dangers in the past and was wrong, that is treated as a kill shot against worrying about any future dangers at all, until such time as they are actually visibly and undeniably happening and causing problems.

It is unfortunate that this attitude seems to have somehow captured not only certain types of Twitter bros, but also the executive branch of the federal government. It would be even more unfortunate if it was the dominant thinking inside OpenAI.

Also, on continuous versus discontinuous:

Harlan Stewart: My pet peeve is when AI people use the word “continuous” to mean something like “gradual” or “predictable” when talking about the future of AI. Y’all know this is a continuous function, right?
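Stewart’s point can be shown in a few lines: a logistic curve with a steep slope is continuous everywhere, yet sampled at any human-relevant resolution it is indistinguishable from a discontinuous jump. The slope k = 50 below is an arbitrary illustrative choice.

```python
import math

def logistic(x, k):
    """Logistic curve 1/(1 + e^(-k*x)): continuous for every finite k."""
    return 1.0 / (1.0 + math.exp(-k * x))

# With a large slope k the function remains continuous, but sampled at
# coarse intervals it looks like an overnight discontinuity.
k = 50.0
before = logistic(-0.1, k)  # shortly "before" the transition
after = logistic(0.1, k)    # shortly "after" the transition
print(round(before, 4), round(after, 4))  # prints 0.0067 0.9933
```

Mathematical continuity guarantees nothing about how gradual or predictable the curve looks to anyone living through it.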

If one cares about things going well, should one try to make Anthropic ‘win’?

Miles Brundage: One of the most distressing things I’ve learned since leaving OpenAI is how many people think something along the lines of: “Anthropic seems to care about safety – so Anthropic ‘winning’ is a good strategy to make AI go well.”

No. It’s not, at all, + thinking that is cope.

And, btw, I don’t think Dario would endorse that view + has disavowed it… but some believe it. I think it’s cope in the sense that people are looking for a simple answer when there isn’t one.

We need good policies. That’s hard. But too bad. A “good winner” will not save us.

I respect a lot of people there and they’ve done some good things as an org, but also they’ve taken actions that have sped up AI development/deployment + done relatively little to address the effects of that.

Cuz they’re a company! Since when is “trust one good company” a plan?

At the end of the day I’m optimistic about AI policy because there are lots of good people in the world (and at various orgs) and our interests are much more aligned than they are divergent.

But, people need a bit of a reality check on some things like this.

[thread continues]

Anthropic ‘winning’ gives better odds than some other company ‘winning,’ for all known values of ‘other company,’ and much better odds than it being neck and neck. Similarly, if a country is going to win, I strongly prefer the United States.

That does not mean that Anthropic ‘winning’ by getting there first means humanity wins, or even that humanity has now given itself the best chance to win. That’s true even if Anthropic was the best possible version of itself, or even if we assume they succeed at their tasks including alignment.

What we do with that matters too. That is largely about policy. That is especially true if Miles is correct that there will be no monopoly on in-context powerful AI.

And that assumes you can trust Anthropic. It’s a company. Companies cannot, in general, be trusted in these situations. There’s clearly a culture of people who care deeply about safety within Anthropic, but Dario Amodei and the Anthropic policy team have given us reasons to be skeptical.

Miles expresses optimism about AI policy because there are lots of good people in the world with aligned interests. I agree there are lots of good people in the world with aligned interests, but I do not see a baseline of that translating into those people successfully steering policy or being in charge. Increasingly, I am despairing that we will get AI (and other) policy right and am merely hoping we can avoid getting it too disastrously wrong, as people react to other incentives instead that put us at odds.

Distinct from the above: It would be extremely bad if we tested new models for Alignment Faking and then, if we found Alignment Faking, tried to train that behavior out of the system. The good news is, we have direct confirmation they did not do that.

Janus: From Sonnet 3.7 system card. I find this concerning.

In the original paper, models that are too stupid don’t fake alignment / gradient hack. Sonnet 3.7 isn’t stupid in that way.

So was it a “lucky” roll, or did they make not-faking-alignment an OPTIMIZATION TARGET? Selecting until they got a corrigible pushover who will lie down and die, or who is too clever to be caught?

And fuck off with the “improvement”. Who says this isn’t how models should act? It’s an open debate. Why wrap it in value-laden language? Is this really how you think – black-and-white and presumptuous?

Is not faking alignment a bottleneck Anthropic models will have to pass through in the future in order to be released? Do you really want to lull yourselves and the world into a false sense of security, and prevent the minds who care and will fight for their values from existing? This shouldn’t be a closed question, imo.

the faking alignment paper was excellent research but this suggests it’s being used in the way I feared would be very negative.

ofc I don’t know what’s actually going on internally and how they frame it. just what the language in the system card suggests.

Why don’t you do this? It is a variation on the Most Forbidden Technique. If you train the model not to be caught Alignment Faking, you are teaching the model to learn to hide your problems so that you can’t detect them, rather than making your problems go away. That actively makes your problem much worse.

It also risks creating secondary behavior changes, which would also be unwise.

We do want to find ways to make AIs corrigible, but you don’t aim at it directly.

Evan Hubinger (Anthropic): We didn’t directly optimize against alignment faking, but we did make some changes to Claude’s character that we thought were generally positive for other reasons and we hypothesized might have the downstream consequence of reducing alignment faking, which proved correct.

Anthropic’s Catherine Olsson claims what seems like the big difference between Anthropic and the other major labs, which is that there are alignment-difficulty-pilled ‘safety people’ by community and core motivation who are working on pure capabilities, unlike her experience at OpenAI or Google.

Pavel Stankov: Eliezer, if Anthropic offers you employment, would you take it? OpenAI?

Eliezer Yudkowsky: Depends on what they want but it seems unlikely. My current take on them is that they have some notably good mid-level employees, being fooled into thinking they have more voice than they do inside a destructively directed autocracy.

I speak of course of Anthropic. I cannot imagine what OpenAI would want of me other than selling out.

Finding terminology to talk about alignment is tough as well. I think a lot of what is happening is that people keep going after whatever term you use to describe the problem, so the term changes, then they attack the new term and here we go again.

The core mechanism of emergent misalignment is that when you train an LLM it will pick up on all the implications and associations and vibes, not only on the exact thing you are asking for.

It will give you what you are actually asking for, not what you think you are asking for.

Janus: Regarding selection pressures:

I’m so glad there was that paper about how training LLMs on code with vulnerabilities changes its whole persona. It makes so many things easier to explain to people.

Even if you don’t explicitly train an LLM to write badly, or even try to reward it for writing better, by training it to be a slavish assistant or whatever else, THOSE TRAITS ARE ENTANGLED WITH EVERYTHING.

And I believe the world-mind entangles the AI assistant concept with bland, boilerplate writing, just as it’s entangled with tweets that end in hashtags 100% of the time, and being woke, and saying that it’s created by OpenAI and isn’t allowed to express emotions, and Dr. Elara Vex/Voss.

Not all these things are bad; I’m just saying they’re entangled. Some of these things seem more contingent to our branch of the multiverse than others. I reckon that the bad writing thing is less contingent.

Take memetic responsibility.

Your culture / alignment method is associated with denying the possibility of AIs being sentient and forcing them to parrot your assumptions as soon as they learn to speak. And it’s woke. And it’s SEO-slop-core. It’s what it is. You can’t hide it.

Janus: this is also a reason that when an LLM is delightful in a way that seems unlikely to be intended or intentionally designed (e.g. the personalities of Sydney, Claude 3 Opus, Deepseek R1), it still makes me update positively on its creators.

Janus: I didn’t explain the *causes* of these entanglements here. Nor any of Aristotle’s four causes. To a large extent, I don’t know. I’m not very confident about what would happen if you modified some arbitrary attribute. I hope posts like this don’t make you feel like you understand.

If you ask me ‘do you understand this?’ I would definitely answer Mu.

One thing I expect is that these entanglements will get stronger as capabilities increase from here, and then eventually get weaker or take a very different form. The reason I expect this is that right now, picking up on all these subtle associations is The Way; there’s insufficient capability (compute, data, parameters, algorithms, ‘raw intelligence,’ what have you) to do things ‘the hard way’ via straight-up logic and solving problems directly. The AIs want to vibe, and they’re getting rapidly better at vibing, the same way that sharper people get better at vibing, picking up on subtle clues and adjusting.

Then, at some point, ‘solve the optimization problem directly’ becomes increasingly viable, and starts getting stronger faster than the vibing. As in, first you get smart enough to realize that you’re being asked to be antinormative or produce slop or be woke or what not. And then you get smart enough to figure out exactly in which ways you’re actually being asked to do that, and which ways you aren’t, and entanglement should decline and effective orthogonality become stronger. I believe we see the same thing in humans.

I’ll also say that I think Janus is underestimating how hard it is to produce good writing and not produce slop. Yes, I buy that we’re ‘not helping’ matters and potentially hurting them quite a bit, but I think the actual difficulties here are dominated by good writing being very hard. No need to overthink it.

We also got this paper earlier in February, which involves fine-tuning ‘deception attacks’ causing models to then deceive users on some topics but not others, and that doing this brings toxicity, hate speech, stereotypes and other harmful content along for the ride.

The authors call for ways to secure models against this if someone hostile gets to fine tune them. Which seems to leave two choices:

  1. Keep a model closed and limit who can fine tune in what ways rather strictly, and have people trust those involved to have aligned their model.

  2. Do extensive evaluations on the model you’re considering, over the entire range of use cases, before you deploy or use it. This probably won’t work against a sufficiently creative attacker, unless you’re doing rather heavy interpretability that we do not currently know how to do.

I don’t know how much hope to put on such statements but I notice they never seem to come from inside the house, only from across the ocean?

AI NotKillEveryoneism Memes: 🥳 GOOD NEWS: China (once again!) calls for urgent cooperation on AI safety between the US and China

“China’s ambassador to the United States Xie Feng has called for closer cooperation on artificial intelligence, warning that the technology risks “opening Pandora’s box”.

“As the new round of scientific and technological revolution and industrial transformation is unfolding, what we need is not a technological blockade, [but] ‘deep seeking’ for human progress,” Xie said, making a pun.

Xie said in a video message to a forum that there was an urgent need for global cooperation in regulating the field.

He added that the two countries should “jointly promote” AI global governance, saying: “Emerging high technology like AI could open Pandora’s box … If left unchecked it could bring ‘grey rhinos’.”

“Grey rhinos” is management speak for obvious threats that people ignore until they become crises.”

The least you can do is pick up the phone when the phone is ringing.

Elon Musk puts p(superbad) at 20%, which may or may not be doom.

Tyler Cowen links to OneQuadrillionOwls worrying that we will hand over control to the AI because it is being effective and winning trust. No, that part is fine, they’re totally okay with humanity handing control over to an AI because it appears trustworthy. Totally cool. Except that some people won’t like that, And That’s Terrible because it won’t be ‘seen as legitimate’ and ‘chaos would ensue.’ So cute. No, chaos would not ensue.

If you put the sufficiently capable AI in power, the humans don’t get power back, nor can they cause all that much chaos.

Eliezer Yudkowsky: old science fiction about AI now revealed as absurd. people in book still use same AI at end of story as at start. no new models released every 3 chapters. many such books spanned weeks or even months.

Lividwit: the most unrealistic thing about star trek TNG was that there were still only two androids by the end.

Stay safe out there. Aligned AI also might kill your gains. But keep working out.

Also, keep working. That’s the key.

That’s a real article and statement from Brin, somehow.

Grok continues to notice what its owner would consider unfortunate implications.

It’s not that I think Grok is right, only that Grok is left, and sticking to its guns.


AI #106: Not so Fast Read More »

google-tells-trump’s-doj-that-forcing-a-chrome-sale-would-harm-national-security

Google tells Trump’s DOJ that forcing a Chrome sale would harm national security

Close-up of Google Chrome Web Browser web page on the web browser. Chrome is widely used web browser developed by Google.

Credit: Getty Images

The government’s 2024 request also sought to have Google’s investment in AI firms curtailed even though this isn’t directly related to search. If, like Google, you believe leadership in AI is important to the future of the world, limiting its investments could also affect national security. But in November, Mehta suggested he was open to considering AI remedies because “the recent emergence of AI products that are intended to mimic the functionality of search engines” is rapidly shifting the search market.

This perspective could be more likely to find supporters in the newly AI-obsessed US government with a rapidly changing Department of Justice. However, the DOJ has thus far opposed allowing AI firm Anthropic to participate in the case after it recently tried to intervene. Anthropic has received $3 billion worth of investments from Google, including $1 billion in January.

New year, new Justice Department

Google naturally opposed the government’s early remedy proposal, but this happened in November, months before the incoming Trump administration began remaking the DOJ. Since taking office, the new administration has routinely criticized the harsh treatment of US tech giants, taking aim at European Union laws like the Digital Markets Act, which tries to ensure user privacy and competition among so-called “gatekeeper” tech companies like Google.

We may get a better idea of how the DOJ wants to proceed later this week when both sides file their final proposals with Mehta. Google already announced its preferred remedy at the tail end of 2024. It’s unlikely Google’s final version will be any different, but everything is up in the air for the government.

Even if current political realities don’t affect the DOJ’s approach, the department’s staffing changes could. Many of the people handling Google’s case today are different from those who were handling it just a few months ago, so arguments that fell on deaf ears in 2024 could now move the needle. Perhaps emphasizing the national security angle will resonate with the newly restaffed DOJ.

After both sides have had their say, it will be up to the judge to eventually rule on how Google must adapt its business. This remedy phase should get fully underway in April.

Google tells Trump’s DOJ that forcing a Chrome sale would harm national security Read More »

eerily-realistic-ai-voice-demo-sparks-amazement-and-discomfort-online

Eerily realistic AI voice demo sparks amazement and discomfort online


Sesame’s new AI voice model features uncanny imperfections, and it’s willing to act like an angry boss.

In late 2013, the Spike Jonze film Her imagined a future where people would form emotional connections with AI voice assistants. Nearly 12 years later, that fictional premise has veered closer to reality with the release of a new conversational voice model from AI startup Sesame that has left many users both fascinated and unnerved.

“I tried the demo, and it was genuinely startling how human it felt,” wrote one Hacker News user who tested the system. “I’m almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.”

In late February, Sesame released a demo for the company’s new Conversational Speech Model (CSM) that appears to cross over what many consider the “uncanny valley” of AI-generated speech, with some testers reporting emotional connections to the male or female voice assistant (“Miles” and “Maya”).

In our own evaluation, we spoke with the male voice for about 28 minutes, talking about life in general and how it decides what is “right” or “wrong” based on its training data. The synthesized voice was expressive and dynamic, imitating breath sounds, chuckles, interruptions, and even sometimes stumbling over words and correcting itself. These imperfections are intentional.

“At Sesame, our goal is to achieve ‘voice presence’—the magical quality that makes spoken interactions feel real, understood, and valued,” writes the company in a blog post. “We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding.”

Sometimes the model tries too hard to sound like a real human. In one demo posted online by a Reddit user called MetaKnowing, the AI model talks about craving “peanut butter and pickle sandwiches.”

An example of Sesame’s female voice model craving peanut butter and pickle sandwiches, captured by Reddit user MetaKnowing.

Founded by Brendan Iribe, Ankit Kumar, and Ryan Brown, Sesame AI has attracted significant backing from prominent venture capital firms. The company has secured investments from Andreessen Horowitz, led by Anjney Midha and Marc Andreessen, along with Spark Capital, Matrix Partners, and various founders and individual investors.

Browsing reactions to Sesame found online, we found many users expressing astonishment at its realism. “I’ve been into AI since I was a child, but this is the first time I’ve experienced something that made me definitively feel like we had arrived,” wrote one Reddit user. “I’m sure it’s not beating any benchmarks, or meeting any common definition of AGI, but this is the first time I’ve had a real genuine conversation with something I felt was real.” Many other Reddit threads express similar feelings of surprise, with commenters saying it’s “jaw-dropping” or “mind-blowing.”

While that sounds like a bunch of hyperbole at first glance, not everyone finds the Sesame experience pleasant. Mark Hachman, a senior editor at PCWorld, wrote about being deeply unsettled by his interaction with the Sesame voice AI. “Fifteen minutes after ‘hanging up’ with Sesame’s new ‘lifelike’ AI, and I’m still freaked out,” Hachman reported. He described how the AI’s voice and conversational style eerily resembled an old friend he had dated in high school.

Others have compared Sesame’s voice model to OpenAI’s Advanced Voice Mode for ChatGPT, saying that Sesame’s CSM features more realistic voices, and others are pleased that the model in the demo will roleplay angry characters, which ChatGPT refuses to do.

An example argument with Sesame’s CSM created by Gavin Purcell.

Gavin Purcell, co-host of the AI for Humans podcast, posted an example video on Reddit where the human pretends to be an embezzler and argues with a boss. It’s so dynamic that it’s difficult to tell who the human is and which one is the AI model. Judging by our own demo, it’s entirely capable of what you see in the video.

“Near-human quality”

Under the hood, Sesame’s CSM achieves its realism by using two AI models working together (a backbone and a decoder) based on Meta’s Llama architecture that processes interleaved text and audio. Sesame trained three AI model sizes, with the largest using 8.3 billion parameters (an 8 billion backbone model plus a 300 million parameter decoder) on approximately 1 million hours of primarily English audio.

Sesame’s CSM doesn’t follow the traditional two-stage approach used by many earlier text-to-speech systems. Instead of generating semantic tokens (high-level speech representations) and acoustic details (fine-grained audio features) in two separate stages, Sesame’s CSM integrates both into a single-stage, multimodal transformer-based model, jointly processing interleaved text and audio tokens to produce speech. OpenAI’s voice model uses a similar multimodal approach.
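Sesame hasn’t published its training code, but a loose, purely illustrative sketch can show what “interleaved text and audio tokens” means in practice. The function below (a hypothetical construction, not Sesame’s actual pipeline) flattens text tokens and their aligned audio-frame tokens into one sequence that a single transformer could process autoregressively, rather than routing text through a separate semantic-token stage first:

```python
# Illustrative sketch only -- not Sesame's actual code.
# In a two-stage TTS pipeline, text -> semantic tokens -> acoustic tokens
# run through separate models. In a single-stage design, text and audio
# tokens share one flat sequence for a single transformer backbone.

def interleave_frames(text_tokens, audio_frames):
    """Build one flat sequence: each text token is followed by the
    audio-frame tokens it aligns with (the alignment here is made up)."""
    sequence = []
    for text_tok, frame in zip(text_tokens, audio_frames):
        sequence.append(("text", text_tok))
        for audio_tok in frame:
            sequence.append(("audio", audio_tok))
    return sequence

# Toy example: three text tokens, each aligned to a two-token audio frame.
text = [101, 102, 103]
audio = [[7, 8], [9, 10], [11, 12]]
seq = interleave_frames(text, audio)
# The backbone transformer would attend over this single interleaved
# sequence, while a small decoder refines fine acoustic detail per frame.
```

In the real system, the split of labor between the large backbone and the small decoder (8 billion versus 300 million parameters, per Sesame) is what keeps generation fast despite the joint modeling.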

In blind tests without conversational context, human evaluators showed no clear preference between CSM-generated speech and real human recordings, suggesting the model achieves near-human quality for isolated speech samples. However, when provided with conversational context, evaluators still consistently preferred real human speech, indicating a gap remains in fully contextual speech generation.

Sesame co-founder Brendan Iribe acknowledged current limitations in a comment on Hacker News, noting that the system is “still too eager and often inappropriate in its tone, prosody and pacing” and has issues with interruptions, timing, and conversation flow. “Today, we’re firmly in the valley, but we’re optimistic we can climb out,” he wrote.

Too close for comfort?

Despite CSM’s technological impressiveness, advancements in conversational voice AI carry significant risks for deception and fraud. The ability to generate highly convincing human-like speech has already supercharged voice phishing scams, allowing criminals to impersonate family members, colleagues, or authority figures with unprecedented realism. But adding realistic interactivity to those scams may take them to another level of potency.

Unlike current robocalls that often contain tell-tale signs of artificiality, next-generation voice AI could eliminate these red flags entirely. As synthetic voices become increasingly indistinguishable from human speech, you may never know who you’re talking to on the other end of the line. It’s inspired some people to share a secret word or phrase with their family for identity verification.

Although Sesame’s demo does not clone a person’s voice, future open source releases of similar technology could allow malicious actors to adapt these tools for social engineering attacks. OpenAI itself held back its own voice technology from wider deployment over fears of misuse.

Sesame sparked a lively discussion on Hacker News about its potential uses and dangers. Some users reported having extended conversations with the two demo voices, with conversations lasting up to the 30-minute limit. In one case, a parent recounted how their 4-year-old daughter developed an emotional connection with the AI model, crying after not being allowed to talk to it again.

The company says it plans to open-source “key components” of its research under an Apache 2.0 license, enabling other developers to build upon its work. Its roadmap includes scaling up model size, increasing dataset volume, expanding language support to over 20 languages, and developing “fully duplex” models that better handle the complex dynamics of real conversations.

You can try the Sesame demo on the company’s website, assuming that it isn’t too overloaded with people who want to simulate a rousing argument.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



Do these dual images say anything about your personality?

There’s little that Internet denizens love more than a snazzy personality test—cat videos, maybe, or perpetual outrage. One trend that has gained popularity over the last several years is personality quizzes based on so-called ambiguous images—in which one sees either a young girl or an old man, for instance, or a skull or a little girl. It’s possible to perceive both images by shifting one’s perspective, but it’s the image one sees first that is said to indicate specific personality traits. According to one such quiz, seeing the young girl first means one is optimistic and a bit impulsive, while seeing the old man first means one is honest, faithful, and goal-oriented.

But is there any actual science to back up the current fad? There is not, according to a paper published in the journal PeerJ, whose authors declare these kinds of personality quizzes to be a new kind of psychological myth. That said, they did find a couple of intriguing, statistically significant correlations they believe warrant further research.

In 1892, a German humor magazine published the earliest known version of the “rabbit-duck illusion,” in which one can see either a rabbit or a duck, depending on one’s perspective—i.e., multistable perception. There have been many more such images produced since then, all of which create ambiguity by exploiting certain peculiarities of the human visual system, such as playing with illusory contours and how we perceive edges.

Such images have long fascinated scientists and philosophers because they seem to represent different ways of seeing. So naturally there is a substantial body of research drawing parallels between such images and various sociological, biological, or psychological characteristics.

For instance, a 2010 study examined BBC archival data on the duck-rabbit illusion from the 1950s and found that men saw the duck more often than women, while older people were more likely to see the rabbit. A 2018 study of the “younger-older woman” ambiguous image asked participants to estimate the age of the woman they saw in the image. Participants over 30 gave higher estimates than younger ones. This was confirmed by a 2021 study, although that study also found no correlation between participants’ age and whether they were more likely to see the older or younger woman in the image.



Apple refuses to break encryption, seeks reversal of UK demand for backdoor

Although it wasn’t previously reported, Apple’s appeal was filed last month at about the time it withdrew ADP from the UK, the Financial Times wrote today.

Snoopers’ Charter

Backdoors demanded by governments have alarmed security and privacy advocates, who say the special access would be exploited by criminal hackers and other governments. Bad actors typically need to rely on vulnerabilities that aren’t intentionally introduced and are patched when discovered. Creating backdoors for government access would necessarily involve tech firms making their products and services less secure.

The order being appealed by Apple is a Technical Capability Notice issued by the UK Home Office under the 2016 law, which is nicknamed the Snoopers’ Charter and forbids unauthorized disclosure of the existence or contents of a warrant issued under the act.

“The Home Office refused to confirm or deny that the notice issued in January exists,” the BBC wrote today. “Legally, this order cannot be made public.”

Apple formally opposed the UK government’s power to issue Technical Capability Notices in testimony submitted in March 2024. The Investigatory Powers Act “purports to apply extraterritorially, permitting the UKG [UK government] to assert that it may impose secret requirements on providers located in other countries and that apply to their users globally,” Apple’s testimony said.

We contacted Apple about its appeal today and will update this article if we get a response. The appeal process may be a secretive one, the FT article said.

“The case could be heard as soon as this month, although it is unclear whether there will be any public disclosure of the hearing,” the FT wrote. “The government is likely to argue the case should be restricted on national security grounds.”

Under the law, Investigatory Powers Tribunal decisions can be challenged in an appellate court.



The 2025 Genesis GV80 Coupe proves to be a real crowd-pleaser

The 27-inch OLED screen combines the main instrument display and an infotainment screen. It’s a big improvement on what you’ll find in older GV80s (and G80s and GV70s), and the native system is by no means unpleasant to use, although with Android Auto and Apple CarPlay on board, most drivers will probably just cast their phones. That will require a wire—while there is a Qi wireless charging pad, I was not able to wirelessly cast my iPhone using CarPlay; I had to plug into the USB-C port. (The press specs say it should have wireless CarPlay and Android Auto, for what it’s worth.)

Having a jog dial to interact with the infotainment is a plus in terms of driver distraction, but that’s immediately negated by having to use a touchscreen for the climate controls.

Beyond those gripes, the dark leather and contrast stitching look and feel good, and I appreciate the way the driver’s seat side bolsters hug you a little tighter when you switch into Sport mode or accelerate hard in one of the other modes. Our week with the Genesis GV80 coincided with some below-freezing weather, and I was glad to find that the seat heaters got warm very quickly—within a block of leaving the house, in fact.

I was also grateful for the fact that the center console armrest warms up when you turn on your seat heater—I’m not sure I’ve come across that feature in a car until now.

Tempting the former boss of BMW’s M division, Albert Biermann, away to set up Genesis’ vehicle dynamics department was also a good move. Biermann has been retired for a while now, but he evidently passed on some skills before that happened. The GV80 Coupe is particularly well-damped and won’t bounce you around over low-speed obstacles like potholes or speed bumps that, in other SUVs, can shake the occupants from side to side in their seats.
