Author name: Tim Belzer


The US is now the largest investor in commercial spyware

Paragon, responding to the committee’s findings, accused Italian authorities of refusing to conduct a thorough technical verification—an assessment it argued could have resolved the issue.

Apart from focusing on investment, the Atlantic Council notes that the global spyware market is “growing and evolving,” with its dataset expanded to include four new vendors, seven new resellers or brokers, 10 new suppliers, and 55 new individuals linked to the industry.

Newly identified vendors include Israel’s Bindecy and Italy’s SIO. Among the resellers are front companies connected to NSO products, such as Panama’s KBH and Mexico’s Comercializadora de Soluciones Integrales Mecale, as highlighted by the Mexican government. New suppliers named include the UK’s Coretech Security and UAE’s ZeroZenX.

The report highlights the central role that these resellers and brokers play, stating that it is “a notably under-researched set of actors.” According to the report, “These entities act as intermediaries, obscuring the connections between vendors, suppliers, and buyers. Oftentimes, intermediaries connect vendors to new regional markets.”

“This creates an expanded and opaque spyware supply chain, which makes corporate structures, jurisdictional arbitrage, and ultimately accountability measures a challenge to disentangle,” Sarah Graham, who coauthored the report, tells WIRED.

“Despite this, resellers and brokers are not a current feature of policy responses,” she says.

The study reveals the addition of three new countries linked to spyware activity—Japan, Malaysia, and Panama. Japan in particular is a signatory to international efforts to curb spyware abuse, including the Joint Statement on Efforts to Counter the Proliferation and Misuse of Commercial Spyware and the Pall Mall Process Code of Practice for States.

“The discovery of entities operating in new jurisdictions, like Japan, highlights potential conflicts of interest between international commitments and market dynamics,” Graham says.

Despite efforts by the Biden administration to constrain the spyware market through its executive order, trade and visa restrictions, and sanctions, the industry has continued to operate largely without restraint.



Senator blasts Microsoft for making default Windows vulnerable to “Kerberoasting”

Wyden said his office’s investigation into the Ascension breach found that the ransomware attackers’ initial entry into the health giant’s network was the infection of a contractor’s laptop after using Microsoft Edge to search Microsoft’s Bing site. The attackers were then able to expand their hold by attacking Ascension’s Active Directory and abusing its privileged access to push malware to thousands of other machines inside the network. The means for doing so, Wyden said: Kerberoasting.

“Microsoft has become like an arsonist”

“Microsoft’s continued support for the ancient, insecure RC4 encryption technology needlessly exposes its customers to ransomware and other cyber threats by enabling hackers that have gained access to any computer on a corporate network to crack the passwords of privileged accounts used by administrators,” Wyden wrote. “According to Microsoft, this threat can be mitigated by setting long passwords that are at least 14 characters long, but Microsoft’s software does not require such a password length for privileged accounts.”

Additionally, Green noted, the continuing speed of GPUs means that even when passwords appear to be strong, they can still fall to offline cracking attacks. That’s because the cryptographic hashes created by the default RC4/Kerberos configuration use no cryptographic salt and only a single iteration of the MD4 algorithm. The combination means an offline cracking attack can make billions of guesses per second, a thousandfold advantage over the same password hashed by non-Kerberos authentication methods.
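To make the mechanics concrete, here is a minimal sketch, not Microsoft's code, of why this is so cheap to attack. Per RFC 4757, the RC4-HMAC Kerberos key is the NT hash: an unsalted, single-pass MD4 digest of the UTF-16LE password. The helper names below are our own, and the direct key comparison stands in for the real ticket-decryption check that cracking tools such as hashcat (mode 13100) perform on GPUs:

```python
# Sketch: why Kerberoasting is cheap against RC4-HMAC service tickets.
# Per RFC 4757, the RC4-HMAC Kerberos key is the NT hash,
# MD4(UTF-16LE(password)), with no salt and no key stretching.
# Note: hashlib's "md4" comes from OpenSSL; OpenSSL 3 may need the
# legacy provider enabled for MD4 to be available.
import hashlib

def rc4_hmac_key(password: str) -> bytes:
    """NT hash used as the RC4-HMAC Kerberos key: MD4 of UTF-16LE."""
    return hashlib.new("md4", password.encode("utf-16-le")).digest()

# No salt: two accounts with the same password share one key, so one
# precomputed guess can be tested against every captured ticket.
assert rc4_hmac_key("Summer2025!") == rc4_hmac_key("Summer2025!")

# Offline dictionary attack: one MD4 digest per guess. Real tools
# verify each derived key by decrypting captured ticket material;
# that step is elided here and replaced by a direct key comparison.
def crack(captured_key: bytes, wordlist: list[str]) -> str | None:
    for guess in wordlist:
        if rc4_hmac_key(guess) == captured_key:  # one MD4 per guess
            return guess
    return None

print(crack(rc4_hmac_key("Summer2025!"), ["password", "Summer2025!"]))
```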

Referring to the Active Directory default, Green wrote:

It’s actually a terrible design that should have been done away with decades ago. We should not build systems where any random attacker who compromises a single employee laptop can ask for a message encrypted under a critical password! This basically invites offline cracking attacks, which do not need even to be executed on the compromised laptop—they can be exported out of the network to another location and performed using GPUs and other hardware.

More than 11 months after announcing its plans to deprecate RC4/Kerberos, the company has provided no timeline for doing so. What’s more, Wyden said, the announcement was made in a “highly technical blog post on an obscure area of the company’s website on a Friday afternoon.” Wyden also criticized Microsoft for declining to “explicitly warn its customers that they are vulnerable to the Kerberoasting hacking technique unless they change the default settings chosen by Microsoft.”



One of Google’s new Pixel 10 AI features has already been removed

Google is one of the most ardent proponents of generative AI technology, as evidenced by the recent launch of the Pixel 10 series. The phones were announced with more than 20 new AI experiences, according to Google. However, one of them is already being pulled from the company’s phones. If you go looking for your Daily Hub, you may be disappointed. Not that disappointed, though, as it has been pulled because it didn’t do very much.

Many of Google’s new AI features only make themselves known in specific circumstances, for example when Magic Cue finds an opportunity to suggest an address or calendar appointment based on your screen context. The Daily Hub, on the other hand, asserted itself multiple times throughout the day. It appeared at the top of the Google Discover feed, as well as in the At a Glance widget right at the top of the home screen.

Just a few weeks after release, Google has pulled the Daily Hub preview from Pixel 10 devices. You will no longer see it in Google Discover or in the home screen widget. After 9to5Google spotted the removal, the company issued a statement explaining its plans.

“To ensure the best possible experience on Pixel, we’re temporarily pausing the public preview of Daily Hub for users. Our teams are actively working to enhance its performance and refine the personalized experience. We look forward to reintroducing an improved Daily Hub when it’s ready,” a Google spokesperson said.



Microsoft ends OpenAI exclusivity in Office, adds rival Anthropic

Microsoft’s Office 365 suite will soon incorporate AI models from Anthropic alongside existing OpenAI technology, The Information reported, ending years of exclusive reliance on OpenAI for generative AI features across Word, Excel, PowerPoint, and Outlook.

The shift reportedly follows internal testing that revealed Anthropic’s Claude Sonnet 4 model excels at specific Office tasks where OpenAI’s models fall short, particularly in visual design and spreadsheet automation, according to sources familiar with the project cited by The Information, who stressed the move is not a negotiating tactic.

Anthropic did not immediately respond to Ars Technica’s request for comment.

In an unusual arrangement showing the tangled alliances of the AI industry, Microsoft will reportedly purchase access to Anthropic’s models through Amazon Web Services—both a cloud computing rival and one of Anthropic’s major investors. The integration is expected to be announced within weeks, with subscription pricing for Office’s AI tools remaining unchanged, the report says.

Microsoft maintains that its OpenAI relationship remains intact. “As we’ve said, OpenAI will continue to be our partner on frontier models and we remain committed to our long-term partnership,” a Microsoft spokesperson told Reuters following the report. The tech giant has poured over $13 billion into OpenAI to date and is currently negotiating terms for continued access to OpenAI’s models as part of ongoing talks about their partnership.

Stretching back to 2019, Microsoft’s tight partnership with OpenAI until recently gave the tech giant a head start in AI assistants based on language models, allowing for a rapid (though bumpy) deployment of OpenAI-technology-based features in Bing search and the rollout of Copilot assistants throughout its software ecosystem. It’s worth noting, however, that a recent report from the UK government found no clear productivity boost from using Copilot AI in daily work tasks among study participants.



AI vs. MAGA: Populists alarmed by Trump’s embrace of AI, Big Tech

Some Republicans are still angry over the deplatforming of Trump by tech executives once known for their progressive politics. They had been joined by a “vocal and growing group of conservatives who are fundamentally suspicious of the benefits of technological innovation,” Thierer said.

With MAGA skeptics on one side and Big Tech allies of the president on the other, a “battle for the soul of the conservative movement” is under way.

Popular resentment is now a threat to Trump’s Republican Party, warn some of its biggest supporters—especially if AI begins displacing jobs as many of its exponents suggest.

“You can displace farm workers—what are they going to do about it? You can displace factory workers—they will just kill themselves with drugs and fast food,” Tucker Carlson, one of the MAGA movement’s most prominent media figures, told a tech conference on Monday.

“If you do that to lawyers and non-profit sector employees, you will get a revolution.”

It made Trump’s embrace of Silicon Valley bosses a “significant risk” for his administration ahead of next year’s midterm elections, a leading Republican strategist said.

“It’s a real double-edged sword—the administration is forced to embrace [AI] because if the US is not the leader in AI, China will be,” the strategist said, echoing the kind of argument made by Sacks and fellow Trump adviser Michael Kratsios for their AI policy platform.

“But you could see unemployment spiking over the next year,” the strategist said.

Other MAGA supporters are urging Trump to tone down at least his public cheerleading for an AI sector so many of them consider a threat.

“The pressure that is being placed on conservatives to fall in line… is a recipe for discontent,” said Toscano.

By courting AI bosses, the Republican Party, which claims to represent the pro-family movement, religious communities, and American workers, appeared to be embracing those who are antithetical to all of those groups, he warned.

“The current view of things suggests that the most important members of the party are those that are from Silicon Valley,” Toscano said.

Additional reporting by Cristina Criddle in San Francisco.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.



Childhood and Education #14: The War On Education

The purported main purpose of school, and even of childhood, is educating children.

Many people are actively opposed to this idea.

Either they have other priorities that matter more, or sometimes they outright think that your child learning things is bad and you should feel bad for wanting that.

Some even say it openly. They actively and openly want to stop your child from learning things, and want to put your child into settings where they will not learn things. And they say that this is good.

Or they simply assert that the primary point of education is as a positional good, where what matters is your relative standing. And then they pretend this doesn’t imply they both should and are going around preventing children from learning.

In other places, we simply epically fail at education and don’t seem to care. Or education ‘experts’ claim that things that obviously work don’t work, or that things that obviously don’t work, do work.

Consider this section some combination of a peek into this alternative universe of thought and the fun of multiple meta levels of shooting fish in a barrel?

I present, HT to Pamela Hobart, who makes many of the same points: Freddie DeBoer writes the long ‘Education Doesn’t Work 3.0,’ which is ‘a comprehensive argument that education cannot close academic gaps.’

What? Was it supposed to do that? Would you want it to?

Very obviously the only way to close academic gaps fully is to go full Handicapper General and ban bright kids from getting educations. Thus, The War on Education.

Freddie starts off saying we can’t admit some kids aren’t smart, and some kids will naturally do better at school than others, to which I say you just admitted it, and I’m happy to admit it, and everyone I talk to is willing to admit it, so who is this mysterious we. It is, presumably, a Certain Type of Guy who is an ‘education expert’ of some kind and presumably has a maximum of one child who has gone to school.

Freddie DeBoer: Our educational debates are largely useless because most people engaged in those debates assume out of hand that, absent unusual circumstances like severe neglect or abuse or the presence of developmental or cognitive disabilities, any student can be taught to any level of academic success, and any failure to induce academic success in students is the result of some sort of unfortunate error.

Well, it depends.

If you mean ‘those debates’ as in those between those ‘education experts’? Then perhaps yes, they make these types of absurdly stupid assumptions. If you mean ‘debates among actual regular humans,’ then no. Obviously not. One would question whether Freddie has met such people.

Education can raise the absolute performance of most students modestly, but it almost never meaningfully reshuffles the relative distribution of ability and achievement.

Um, again, what exactly were we trying to do? Educate the children? Or make sure we don’t educate the children? Half and half?

I mean, I guess Freddie then does a job repeatedly exposing ‘the contradictions’ as it were in the entire equality project, but the barrel already has a lot of bullet holes, the water is leaking and the fish are dead.

So we get more fun lines like this:

We have spent an immense amount of effort, manpower, time, and treasure on forcing students to meet procrustean academic standards, despite the fact that we have overwhelming evidence that their relative performance is largely fixed.

Yes, obviously, also yes the extra money is mostly being wasted but even if it wasn’t the whole point was presumably to (drum roll) educate the children.

Why in the world would we spend tons of resources and time on relative education, which by definition is zero-sum and a Red Queen’s race? That doesn’t make sense. There’s a fixed amount of relative education.

At the end of this essay, I will argue that education is important, does matter, and is worth funding – but that what’s now assumed to be its primary purpose, moving students around in quantitative educational metrics, is actually what education does worst.

Who thought that was its primary purpose, either the metrics or the thing itself?

Meanwhile, the reason this was brought to my attention is that his ‘absolute learning has value’ t-shirt is raising questions supposedly answered by the shirt:

What This Essay Does Not Argue:

  • That absolute learning (that is, learning as measured against a standard or benchmark or criterion) has no value; rather, relative learning is practically and morally dominant in these discussions because only relative learning (sometimes discussed in terms of educational mobility) can better one’s economic fortunes, and it is that potential that underlies our entire modern educational debates and the reason for obsession with achievement gaps.

The next section is ‘I Assure You, You Do Care About Relative Learning.’

I assure him that I don’t.

His first argument is that relative learning indicates absolute learning. That is true but saying this means therefore you care about relative learning (checkmate, liberals?) is not how logic or words work. Caring about the territory does not mean you care about the (not very accurate) map.

Second, while I am happy to concede that absolute learning happens all the time, this should not be mistaken for saying that absolute learning is easily achieved, reliable, or consistent.

I don’t understand why this is supposed to be a relevant argument here. It seems like he’s saying I care about [X], but actually [X] is hard, so instead I care about [Y]?

Most importantly, though, is a simple reality: the consequences of education are derived from relative performance, not absolute.

In the vast majority of scenarios where education is relevant, applicants of whatever type are being evaluated relative to peers.

There’s saying the quiet part out loud, and then there’s this.

The purpose of education is… to do well on applications?

He concedes that one might learn to drive and then use this skill to usefully operate a moving vehicle, but says this type of education is rare – that most education has no actual use whatsoever, other than as a positional good to grab a larger share of stuff.

Then he goes through that schools are ‘not guilty’ because improving educational outcomes is impossible anyway. Transferring does nothing. Charter schools don’t work. Interventions don’t work (literally “Nothing ‘Works’”), full null hypothesis.

All right, so now we have a section ‘So What Should We Do?’

Very obviously, if you actually believed all that, you would want to dramatically reduce spending, both in money and in the time of children, on school, since school is almost entirely about relative position. Spending more on school, trying to achieve more or improve performance, in this model, is a defection against everyone else. So we should ban attempts to educate children, beyond some basic skills, and focus on practical stuff like learning to drive. Completely reorient childhood.

So having preregistered that, let’s see what he recommends.

  1. Improve air quality. Okay, sure, that is one of the somethings that work, although again I don’t understand why he thinks improving performance is good.

  2. Lower our educational standards. Don’t make kids learn (for example) abstract math. Yes, that makes perfect sense for Freddie: if the learning is useless, you shouldn’t require it. Again, if he is right then we should go further, and ban such learning. Why are we letting kids engage in a zero-sum competition?

  3. Soft tracking. Again, good idea, not sure what it has to do with the post.

  4. Invest in a robust safety net. Maybe? That’s a different department.

Then he tries to pivot back to ‘actually education matters.’

Education creates the conditions for children and young adults to discover ideas, literature, science, and art that might otherwise remain inaccessible.

It provides the structured time and social environment where curiosity can blossom, where students can learn how to think about problems that don’t have easy answers, and where they can build lasting relationships with peers and mentors.

The point of school, then, is not to guarantee that every child climbs into the top decile of performance but to offer each student the chance to cultivate knowledge, resilience, and imagination in ways that enrich their lives.

So absolute learning of something does matter after all, then. I mean, this description does not match what I know about actual schools, nor would I design anything like a current school if those were the goals. And he doesn’t seem interested in a redesign or asking how to maximize the things that he thinks matter. But hey.

Meanwhile, here’s the top comment, so yes things do get pretty insane:

James K: This is what I pay you for, Freddie, thank you for being so clear-headed about this topic.

I’ve been teaching for 16 years now and it boggles my mind that the band teacher can literally get on stage and say “We have the beginner, intermediate, and advanced band for you” and of course the baseball team can be divided into Varsity and JV, but I am not allowed to say that some kids are not smart enough to handle my AP classes because this means I don’t BELIEVE IN THEM or am supporting TRACKING (always said in the tones people reserve for the words ‘eugenics’ or ‘segregation’).

I mean, yes. It means you support tracking because tracking is good. It means you don’t believe in them in the sense that you don’t believe in things that aren’t real.

So no, in that sense Freddie isn’t arguing with a strawman. Which means that the entire system of education is being run by people who are at war with education.

A Tennessee teen is suing his school for ‘compensatory education’ after graduating with a 3.4 GPA while being unable to read, or even spell his own name, and the school system has the audacity to defend against that lawsuit.

But the school took no action, the suit says, other than giving him 24 hours to complete his assignments.

But even this “solution” was a problem. Because when William was at home with his schoolwork, he relied on AI programs like ChatGPT and Grammarly to complete his assignments for him, according to the judge who ruled on his suit last week. As a result, William continued to achieve high marks on his classwork throughout his entire four years of high school, even though teachers knew he was illiterate.

If you can’t read, using ChatGPT is kind of crazy – you’re presumably scanning or copy-pasting in text you don’t understand, then copying out text you don’t understand and hoping for the best.

Scott Alexander asks: What happened to NAEP Scores? He says they are ‘not good’:

Well, they’re not great, obviously, to the extent you can trust the scores to map to Reality, but they are still above the start of the graph in 1998, and we’re talking about a seven-point difference. That’s less than a fifth of a standard deviation. This is nothing; if anything this shows that Covid didn’t change things much?
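For scale, a rough back-of-the-envelope calculation, assuming the commonly cited NAEP scale standard deviation of roughly 35 to 40 points (the post itself does not state which SD sits behind the ‘fifth of a standard deviation’ figure):

```latex
% Effect size of a 7-point NAEP drop, assuming an SD of 35--40 scale points.
\frac{7}{40} \approx 0.18 \ \mathrm{SD}
\quad\text{to}\quad
\frac{7}{35} = 0.20 \ \mathrm{SD}
```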

The comments section of Scott’s post is full of despair about classroom conditions getting worse, shifts to teaching strategies that don’t work (see the Mississippi reforms but in reverse), discipline collapsing and teachers having no tools if kids don’t play along, many teachers quitting, chronic absenteeism happening and being accepted and tolerated, and many families not prioritizing education, on top of the continuing trends involving smartphones. That’s on top of the obvious ‘Covid took away a lot of schooling’ concern that Scott starts with.

What seems more meaningful than the overall smaller drop is the widening gap between low and high performers, another trend predating Covid. Scott has several graphs showing this, and I am convinced this is real, with a variety of causes. If you are properly equipped and motivated, you can avoid the pitfalls described above, and you have access to the entire web and world and a lot of new resources, now even including AI. Whereas when the bottom falls out, the bottom falls out.

Meanwhile in 12th grade, nearly half scored below ‘the basic level,’ which involves things like ‘using percentages to solve real-world problems,’ and reading scores hit a new low. What we are doing, including adding funding, is clearly not working. Or rather, it is working hard, only not at the goal of children learning academic skills.

The War on Algebra in particular is still perhaps the craziest thing I’ve ever seen, actively preventing children from learning math out of spite. In the sense that it is both very clearly purely destructive and evil, and also horribly unpopular across the board, and also highly salient to a lot of voters. Yes, I do know what their arguments are for doing it, and they very much do not make it any better.

And yet it still happened, and it happens across the board, straight up Handicapper General style.

Ro Khanna: It is absurd that Palo Alto School district just voted to remove honors biology for all students & already removed honors English. They call it de-laning. I call it an assault on excellence. I took many honors classes at Council Rock High in PA.

Autumn Looijen: I ran the campaign to bring algebra back to SF’s middle schools.

It was the most popular thing I ever worked on. Voters don’t want to take away opportunities for kids who can’t afford private school.

If anyone wants to put this on the ballot in Palo Alto, happy to advise.

Maud Maron: We are trying to do a version of this in NYC! Would love to have you speak to NYC parents who want to have algebra & geometry options in Middle School.

Meanwhile in San Francisco, the war rages on. It seems the city has not yet been retaken by sanity on all fronts, although there are some promising signs. All of this seems like it has to be beyond unpopular, in a ‘cause families to move out’ way, yet here we were again not too long ago (it got better, for now):

Garry Tan: San Francisco schools is trying its absolute hardest to make sure all middle income families who could move out of the city do so right away.

“Grading for Equity” is going to be a real disaster and I guess this is a boon for SF private schools and Burlingame housing prices.

For education bureaucrats who ruin our public schools with the most unfair and anti-merit policies: BUSINESS IS BOOMING.

Someone needs to investigate the Schools of Education that spawn these policies because it is a real danger to public schools everywhere.

Basically this scam is Idiocracy in real life.

Mike Solana: the san francisco board of education must immediately fire the superintendent. if they do not, they must all be removed from power.

I don’t get how this falls under ‘you can just do things’ but it seems it did, at least until people sounded the alarm?

John Trasvina (The Voice of SF): Without seeking approval of the San Francisco Board of Education, Superintendent of Schools Maria Su plans to unveil a new Grading for Equity plan on Tuesday that will go into effect this fall at 14 high schools and cover over 10,000 students.

The school district is already negotiating with an outside consultant to train teachers in August in a system that awards a passing C grade to as low as a score of 41 on a 100-point exam.

Were it not for an intrepid school board member, the drastic change in grading with implications for college admissions and career readiness would have gone unnoticed and unexplained. It is buried in a three-word phrase on the last page of a PowerPoint presentation embedded in the school board meeting’s 25-page agenda.

Grading for Equity eliminates homework or weekly tests from being counted in a student’s final semester grade. All that matters is how the student scores on a final examination, which can be taken multiple times.

Under the San Leandro Unified School District’s grading for equity system touted by the San Francisco Unified School District and its consultant, a student with a score as low as 80 can attain an A and as low as 21 can pass with a D.

Derek Thompson: New SF public school plan would

– eliminate homework and weekly tests from counting toward semester grade

– allow students to take the final exam multiple times

– convert all B grades into As, and all Fs into Cs

It’s hard to see the difference between this policy and what you’d get if a bunch of 10yos locked the teachers in a closet and rewrote the rules.

Karen Vaites: More media attention here, please! 🚩🚩🚩

Jared Walczak: The sad irony is that Grading for Equity is virtually the opposite of Teaching for Equity, because under this system, the only kids who might get a real education are those from families that take more into their own hands, bringing higher expectations and resources to bear.

So, effectively no grading, then. You can do whatever you want all semester, no homework (so perhaps there’s some upside here?), phone out in class every day, whatever, all you have to do to pass is get 21% on an exam you can take multiple times. That was going to be it.

And Maria Su could just do this on her own? What?

It turns out that enough backlash does matter, and this combination of graft and civilizational suicide took the loss on this one.

SF Standard: Just in: SFUSD is delaying a planned “grading for equity” initiative after the proposal sparked furious backlash.

Kelsey Piper: SF superintendent backed off immediately after the flood of negative feedback. This strikes me as a pretty dramatic change from how previous standards erosions were received, and a really good sign.

Most politicians want to make their constituents happy, and often their information environment is kind of terrible for that. It’s worth advocating for the stuff that matters to you. Don’t be an asshole, but be clear and outspoken.

San Francisco’s turnaround happened very fast. The Bay Area could become one of the best-governed parts of the United States inside a few years if we work to make it happen.

Well, maybe. They say they are ‘delaying’ the initiative. Which means they’re presumably going to keep paying the consultants, and they are going to try again to destroy all the incentives and measurements involved in education.

Fighting against algebra and grading is bad enough, but reading?

As in, people who want to ban teaching kids to read until age 6. No. Seriously.

Because they’re ‘not ready.’

Erik Hoel: 62% of American kids have a tablet at age 6.

They spend 3.5 hours every day on screens (increasingly, TikTok).

And because our school system waits so long to teach reading, they never get a chance to become readers.

“Education experts” have been saying for decades that we must wait to start teaching reading until 6-7 for neuroscientific reasons. These reasons appear, as far as I can tell, to be basically made up. Consider this recent article, which quotes a bunch of experts on this.

E.g., Maryanne Wolf says that brain myelination needs to reach a certain stage, and that teaching reading prior to 5 is “really wrong” and that she would ban teaching reading prior to 6 nationwide if she could.

Siberian Fox: what in the fuck

I was playing Pokémon before 6 if I recall correctly, if not, other games that require reading

good to know that there are people in the US that think this should be super illegal

In a good school, a 1st grader will be reading quite a lot, actually.

I’m not going to bother quoting more of the evidence because this is so utterly Obvious Nonsense as to be beyond belief. Frankly, if I had a child that was 6 years old and couldn’t read I would not be thinking ‘good it is finally time,’ I would be debating exactly how much to panic and reassuring my wife not to panic far more.

The 3-year-old I am currently supervising can somewhat read. I could read before my first memory (which was at 5 and involves reading books), so I don’t know exactly when it happened, and I learned without anyone trying to teach me.

I am going maximum opposite. There is no higher priority than teaching a child to read as early as they can handle it, and every actual parent knows this. There is a reason why the advice is constantly read to them, read with them, push reading. Reading enables everything else. The entire ‘education’ establishment really does need to flat out join the delenda est club.

In the name of them developing empathy for the people you force them to teach? As long as they pass a certain threshold of knowledge, the rest of their childhood, and indeed life, belongs to the people, and they’re a horrible person if they think otherwise, and the purpose of school is to teach them this?

No, seriously, this is something quite a lot of people, especially those in education, actually believe.

Setting all ethical or moral considerations aside, and even assuming that is the goal, what in the world makes you think this is going to work in the direction you want?

Tracing Woods: If a child is in a class, they should be there because it is the best environment to help them learn, not so they can act as an unpaid tutor to provide vague “peer effects” to others. A system that abuses children as resources instead of teaching them to their level is unethical.

Joe McReynolds (3.4m views): That’s what life *is*, though! If you’re unusually smart/talented, your primary purpose in life is to help lift up others who weren’t born/raised as lucky as you were. Learning that sooner rather than later is important for developing a sense of altruism and communitarianism.

“With great power comes great responsibility” is a simple, true statement. To the extent being born with unusual (intellectual) power is an “innate characteristic,” that good luck means that you owe the universe hard work. You’re born with a debt that takes a lifetime to repay.

There it is, very explicitly. These people actually believe this. If you’re talented, your purpose in life is to be enslaved, to be forced to help others. Your life does not belong to you. Your labor does not belong to you. Your time does not belong to you. Who cares whether that benefits you? You belong to the people, from each according to their ability, at the barrel of a gun.

Kelsey Piper: If I were actively trying to extinguish my children’s sense of altruism, compassion and responsibility I can’t think of a better plan than forcing them to spend all of their time doing random ‘altruistic’ chores they didn’t choose, and aren’t equipped to succeed at.

If you say to your kid “I’m going to volunteer this weekend to help socialize cats”, they’ll probably come along and they may discover a lifelong love of helping animals! if you force them to spend all their time on it, guess what, they’re gonna hate it.

if you want your children to be people who give generously to their communities and their broader world, be that kind of person yourself and let them witness the ways in which this is part of the good for you and for your community.

SteelBlaidd: Why do they need to go to college to learn to be teachers if kids can be expected to do it in elementary school?

Kelsey Piper: some people out here believe that homeschooling is immoral since you don’t know enough to teach your kids and also that a smart 9 yo can do it.

Ben Hoffman: The important thing is that the 9yo is forced to do it without pay to teach them that being educated means going along with nonsense dramas, which qualifies them to get paid to go along with nonsense dramas when old enough that that’s dramatically appropriate.

Not only that the smart 9yo can do it, but that they should be forced to do it without pay. While the parent is forbidden to do it, because they are unqualified.

Ryan Moulton (QTing Joe above): Everybody is dunking on this, and I get why, but I’m a little more sympathetic. Particularly in lower grades, developing empathy for people different from you or dumber than you is a really important thing to get out of school, comparably important to getting through math faster.

Sarah Constantin: It is super common for kids to openly taunt anyone who’s not as good as them (at anything!) and I do think it’s important for them to learn manners, grace, and sportsmanship…

but it doesn’t deserve as much time in the day as math class.

Gallabytes: I would add that these kinds of assignments don’t necessarily breed empathy; it’s pretty easy for them to create contempt instead.

Sarah Constantin: yeahhh.

Gallabytes: feels like people talk about school in far mode as some inscrutable thing.

if I put *you* in a room with people you had 4 sigma on and told you to teach them math Or Else, how would you feel? how would this make you feel about them?

I believe we should treat Joe’s perspective the same way we would treat others who would force people with certain characteristics to labor for no compensation.

See my discussion of Alpha School for extensive previous discussions.

Tracing Woods (reference documents and more details in thread): in 1930, researchers studied ability grouping and concluded you needed to adjust the curriculum to make it work

in 1960, more confidently so

then in 1990, they studied grouping without changing curriculum, concluded it was useless, and advocated to get rid of ability grouping

over time the field got better and better at studying the form of ability grouping that everybody had known was pointless for sixty years while just sorta disregarding the form that kept getting results

I get so mad every time I read this stupid study

the field of education set itself back generations because it kept listening to people who thought ability grouping was “antidemocratic and antiegalitarian” and as such badly wanted it not to work

we had it figured out in 1936 and then we threw it away for kicks.

I am not a fan of the idea of educating children in 2025 primarily via traditional classes. Traditional classes feel like learning a lot more than they actually cause learning.

But I accept that we are going to keep doing this for a while.

Given you are going to have traditionally shaped classes on various subjects, very obviously you want to track their progress and group those children by ability.

Grouping children into classes by ability, as covered in Education #11, has the following advantages. It:

  1. Helps almost all children learn more, whether they are behind, ahead, or neither. There are some corner cases where kids are ‘on the bubble’ between tracks, or get tracked wrong, but mostly the harm is opportunity cost: missed benefits.

  2. Is universally popular with parents, to the extent that ‘ending tracking’ is the least popular serious policy proposal we have ever seen. As in, David Shor says ‘removing advanced classes from schools’ is literally the single most unpopular policy Blue Rose has ever polled (yet there goes Palo Alto doing exactly this).

  3. Is even popular with the classroom teachers themselves.

Ability grouping, done wisely, so utterly obviously works as to make the alternate hypothesis absurd.

Tracing Woods: will someone struggling with basic arithmetic and someone who knows calculus benefit from the same instruction? no.

would selective schools and the students in them benefit from opening their doors to everyone? they certainly don’t seem to think so!

do athletics orgs advocate grouping young LeBron into his local mixed-ability rec league so he can train and progress? no.

do gifted kids who accelerate learn more advanced material when they’re presented with it? yes.

if people see school as a democratic equalizer where everyone should learn the same things, they find ability grouping doesn’t work (doesn’t accomplish that goal). if people see school as a place where people should learn specific subjects and progress to whatever level they can, they find it does work (does accomplish that goal). and because education research is dominated by the former, onlookers glance at its output and say “huh, the results are mixed. guess we’ll never know!”

there is no substitute for understanding what is going on at a ground level.

The question is how to make it work best, not whether it works. Very obviously, as the next section discusses, it is possible to massively screw it up if you try hard enough.

Jordan Michelson: Why would *teachers’ unions* oppose ability grouping? It makes no sense.

Matthew Yglesias: Ideology.

Karen Vaites: The average American would be shocked by the degree to which K-12 education is ruled by ideology. Beliefs about teaching and how we want learning to work often trump evidence about what does, in fact, work.

Tracing Woods: “Trace, why are you making such a big deal out of something everybody already agrees with?”

Because the people we trust to direct society on this topic at every level oppose it.

And I hope that maybe if I shout enough about that people will really internalize what that means.

Exactly. Everyone agrees we want [X], where [X] is tracking. We keep talking about it because [~X] keeps actually happening.

I presume opposition is mostly ideology. Full stop. They want to prevent the wrong kids from learning too much. They are sacrificing the kids on their altar.

Academics and education ‘experts,’ despite the literature and all actual observations and everyone involved saying that tracking helps all kids learn better, keep lamenting that parents want tracking, and work to destroy it, often in the name of ‘equity.’

It is common to see people claim that ‘the research’ says ‘downstreamed’ kids who are grouped at lower ability do worse rather than better as a result. As far as I can tell this is simply not true; these people simply think it ‘should’ be true, and seek out ways to say it anyway.

Tracing Woods: what happens when academics study parental perception of ability grouping?

They lament that parents of students at all levels favor it even though it’s BAD.

Virtually no parents support ending ability grouping.

Parents of kids in both remedial and G/T programs agreed that their kids should be grouped with kids of equal ability.

80% of special-ed parents, 90% of parents with kids in remedial courses, and 98% of parents with kids in advanced programs agreed that their kids were helped by it.

This was all very disappointing to the authors.

Why do the authors here, like many other academics and education experts and many school principals who somehow end up actually destroying such programs, say that this is bad, despite everyone involved in the actual schools agreeing it helps all of the students?

Because the ‘educators’ who determine policy (as opposed to the teachers whose job is to actually educate children) consistently have decided that they do not care about the life experiences of families and children or helping children learn.

What they care about, other than money, is preventing learning rather than causing learning. Or, as they call it, ‘equality’ or ‘equity.’ Never mind that this ‘equity’ directly hurts the students who are otherwise being ‘denied’ it, what matters is that they be given ‘opportunity to learn and equality of educational opportunity.’ Educational opportunity shall be destroyed until this is achieved. If that leads to everyone getting a worse education, even the worst off kids, well, that’s not their department’s KPI.

I don’t quite agree with Anton that these people ‘hate you and your children.’ They only hate that you and your children might do better than other children, and want to prevent this from happening. They only hate you if you oppose this goal.

Garry Tan: Ability grouping in school (honors/AP) depends on your frame. If school’s job is to equalize, grouping looks like a fail. If it’s to let kids sprint ahead, it works.

Academia worships equalizing—so the School of Education bureaucrats become anti-education, ruining schools.

When you examine the list of nonprofits and academics that want to remove advanced math from classrooms and water down the standards for all students it will leave you shaken.

It’s not a fringe movement. It is School of Education Orthodoxy.

Tracing Woods: This is a fair question! Opposition to ability grouping is a fringe idea opposed by the great majority of parents. So which obscure, fringe organizations are pushing it?

Let’s ask the National Council of Teachers of English what they think:

Or what about the most prominent law casebook publishers?

Or consider the National Association of Secondary School Principals.

How about the Association of State Supervisors of Mathematics, NCSM: Leadership in Mathematics Education, and the National Council of Teachers of Mathematics (NCTM)?

You know it.

If it is a choice between the form of academia that wants to prevent children from learning, and the form of academia that helps teach useful things, and one or the other must be destroyed?

The choice seems clear.

Tracing Woods: my modest proposal to every university that has published research claiming ability grouping (with paired curricular modification) doesn’t work:

detrack. remove your admissions standards. remove course prerequisites.

if detracking works, let’s create Harvard Community College.

However, you do have to choose a reasonable implementation. Is it possible we also in some ways are screwing implementation up so badly that adding what we call ability grouping to the mix, as implemented in practice, could make things worse?

North Carolina excluded half its qualified students from advanced math. They tried to pass a law to fix some of this. The schools fought back.

Tracing Woods: What happened when North Carolina changed its laws to require top-scoring students to be placed in advanced math?

The state board of education changed the test cutoffs, subverting the intent of the law by dropping almost all students from the top-scoring category.

Janet Johnson and John Wittle: The law was intended to help high school students who excelled in their math classes move into the advanced track. Before the scoring change, 11% of high school students statewide scored at Level 5, the highest level, with some districts seeing rates as high as 25%. The EVAAS Prediction vs. Performance table (above) showed that in 2009, 42,144 students were predicted to be successful in 8th-grade algebra, and only 18,670 students were enrolled. Using EVAAS prediction as the metric would have given 23,474 more students access to advanced math.

After the law passed, which required schools to admit all Level 5 students to advanced classes, and the state changed the scoring scale, fewer than 1,500 (too few to report) high school students in the entire state achieved Level 5.

The schools had technically complied with the law while completely subverting its intent. These charts are from the NC School Report Cards. After 2019, the Math 1 Performance charts show no Level 5 and very little Level 4, similar to this.

The full post on ‘the Algebra gatekeepers’ keeps outlining all the tactics used to ensure that kids do not learn algebra, especially disadvantaged kids. As you read it, it keeps getting worse.

For example, we attended a meeting with the parents of a middle school girl who had earned an A in 7th-grade pre-algebra but was denied enrollment in 8th-grade algebra despite her and her parents’ wishes. The teachers argued that her formative assessments didn’t align with her summative performance, suggesting her previous success didn’t guarantee future outcomes.

The language arts teacher claimed the student had “appeared to struggle” during benchmark activities and that earning an A seemed “harder for her than for other students who achieved similar summative data results.” The math teacher who had given her an A pointed to C grades on some earlier formative assessments, arguing that despite her subsequent A performance on chapter tests and other summative data points, these initial benchmark scores indicated she “sometimes struggles with foundational concepts during the formative assessment process.”

The administrators nodded knowingly as teachers referenced “inconsistent performance across benchmark measures” and “concerns about the gap between formative and summative data trends.” They suggested that while her final grades represented solid summative data, the formative assessment patterns revealed “areas of concern” that made advanced placement inadvisable, regardless of what the summative data actually showed about her mastery of the subject matter. This is just an example of the kind of talk the parents encountered, so the church advocacy group started bringing someone from our staff to the meetings to help them cut through this.

This type of circular reasoning, where success was reframed as evidence of potential failure, was typical of how schools justified excluding qualified students from advanced courses.

Administrators would routinely promise that students could move to advanced tracks “later, in high school,” but our analyses and other research we found indicate students rarely moved to the advanced track after the 8th grade.

The tracking system created rigid pathways where missing 8th-grade algebra typically meant students couldn’t reach calculus by graduation, limiting their college and career options.

One fifth grader exemplified the problem. He had scored at the highest level on every test, had a 98% EVAAS prediction for success, and straight As on his report card. He was also officially classified as Academically Gifted by the school. When a new advanced math class was created, he wasn’t invited. When he asked to join, his parents were told no, he needed to be “recommended.” School officials told us, with straight faces, that his consistent past success was no indication he could succeed in advanced math, that they were keeping him out for his own good.

What North Carolina is doing here, excluding lots of qualified students, does at least seem better than ending algebra entirely for everyone, I suppose.

Pamela Hobart also looked into the same writeup and offers her own thread, noting that this steps fully into cartoon villainy.

The true teaching of algebra to kids who are ready for it is almost impossible to find. Even if you send your child to a ‘gifted’ school, they mostly won’t let kids get more than a year or two ahead of ‘schedule.’ Schools instead think it is better that kids be bored for five years, for their own good you see.

Raymond Arnold points to the best objection I have seen: if a more advanced option exists, many parents will push for it even when it is inappropriate for their particular child, or use it to push their child way too hard. While the absence of such an option means some kids are bored, it saves a lot of families from being forced into a Red Queen’s race.

This is a real cost, but the prior should be rather extremely stacked against ‘if we let kids learn more then parents would try to have their kids learn too much and this would be bad,’ especially when you can gate the advanced classes with objective tests. Yes, parents can push their kids to study harder to try and pass those tests, but that’s a risk I am willing to take.

What is the steelman case that ‘ability grouping doesn’t work’ or ‘ability grouping has been tried and didn’t work’ in some particular context?

This piece by Karen Vaites is perhaps the closest, in particular on early reading grouping. It convinced me that in practice you really can mess up implementation badly enough to make things actively worse, and that we are in fact messing up badly enough that this is a real possibility.

As in, what happens in practice is that you group kids by a measurement of abstract ‘reading level’ and then focus on ‘achievement’ of ‘reading level,’ forbid them to read anything beyond ‘reading level,’ and don’t ask what actual skills they need, don’t move them between groupings as their skills change, and then wonder why it isn’t working.

One could almost say, if you look at the details, that the teachers are using ‘reading groups’ as a substitute for actually teaching the children to read. You put them in a group and then you did your job. Again, yeah, I can see how that wouldn’t work. Indeed, if you are outsourcing the teaching job to other kids, then at that point you actively want uneven groups, because you want to group students with student teachers.

Whereas once students reach ‘escape velocity’ on reading, which the better students do relatively early, they no longer need a teacher; they just need motivation and permission to read books. Yet the system seems designed to stop kids from reading books they want to read if those books are deemed ‘too hard’? A kid can tell you if a book is too hard: they won’t want to read it.

One big complaint is that it is ‘hard to measure reading level.’ I don’t think it is hard. You can observe a lot just by watching. The problem is that you’re measuring a set of distinct reading skills as if it was one number, and then treating that one number as real, and also abdicating all the real work.

Sarah Sparks: But evidence suggests that the practice may be less beneficial than teachers think: It can exacerbate achievement gaps and even slow reading growth for some children unless the groups are fluid and focused on skills rather than overall achievement.

“What we’ve discovered is that it’s fine to have a group of students of different levels, as long as they all are working on the same learning needs,” said Carol Connor, an education professor at the University of California, Irvine, who developed the program. “You can have students of different reading abilities who all need to work on decoding. … What doesn’t work is if you put your kids who already know how to decode in a group to learn how to decode, again. You receive more behavior problems because they’re really bored, … and our research suggests that it has a negative effect on their growth.”

Karen Vaites: Tim Shanahan breaks down key research and its instructional implications in The Instructional Level Concept Revisited: Teaching with Complex Text. As researchers looked into the effectiveness of working at reading level, studies found that it “has made no difference—that is, the kids taught from grade level materials do as well as those at an instructional level—or the instructional level placements have led to less learning.”

More recently, he highlights additional new evidence from a study of third graders: “Results indicate that weaker readers, using texts at two, three, and four grade levels above their instructional levels with the assistance of lead readers [other, better reading, third graders], outscored both proficient and less proficient students in the control group across multiple measures of reading achievement.”

From a question: My daughter is in first grade. Her classroom teachers have all the books in the classroom library leveled, and students are not allowed to go beyond their reading level during “Independent” reading.

From the answer: Your daughter’s aspirations as a reader are the problem here. Some kids are allowed to read the red-dot books and others are stuck with the baby books with the blue dots. She wants to be a red dot kid, to hang with the red dot kids, to be seen as a red dot kid … but her teacher can only see her as a blue dot kid and she must learn to stay to her own bookshelf with her own kind if she is going to succeed in this classroom.

In the meantime, explain to your daughter that the teacher is trying to help her but that we teachers sometimes don’t get it right, and that you can’t always “fight city hall.”

So yes, you group primarily by what aspects the kids need to work on most, and that works better. Sure. I can totally believe that is a better strategy. Skill issue.

Instead of using it to figure out what kids need to do to learn to read and putting them in position to learn that, it sounds like grouping is being used to prevent kids from meaningfully reading? The purpose is to gate reading behind general tests? To spread ‘equality’ to progress on different reading aspects?

Sarah Sparks: It sounds like good sense. “Kids should be reading just-right texts as they grow as readers.” That just sounds sensible, doesn’t it? Many urban legends do… until you know better.

I can see why that might actively backfire. This isn’t about ‘ability grouping’ not working. It’s about failing to actually group by the relevant ability, and it’s about the ‘just-right text’ theory that seems to me obviously wrong.

Sarah Sparks: During Tier 1 instruction, you want all kids working with grade level texts; students reading below grade level will need scaffolding and support (as well as targeted Tier 2 and/or 3 intervention).

This promotes equity, for it’s the best mechanism for helping below-benchmark students to catch up.

It also honors the fact that a fifth grader who reads at second grade level is still thinking at the level of a fifth grader, and he or she will remain engaged and motivated by learning content and vocabulary at his or her developmental level. (No more baby books for big kids, y’all!)

For details on how to do this, check out:

Ignore the ‘this promotes equity’ framing, since you could simply say ‘this promotes learning to read.’ Equity via catching up those lagging is good, and you call it learning.

The theory here is that age matters a lot. That if you are a fifth grader, your ability to learn is inherently much stronger than that of a second grader, whereas the ability of different second graders is alike? Equality (of those at the same age) for me, inequality (of those at different ages) for thee. Whereas the correct model is that each kid has a different ability to learn different things, that usually improves steadily with age.

But also note that this is saying that the best way for many students at second grade reading level to learn reading is to assign them to read fifth grade level books, indeed to mandate it. Yes, I can believe that. So why are we so often telling kids at second grade reading level or literally in second grade or both that they can’t read the fifth grade books even when they want to?

The other theory present in this proposal is, how about using techniques that actually teach kids reading. And yeah, I agree, that would be great.

Tracing Woods (replying to Vaites): This is an extremely useful article that deserves a full, thorough response.

My short response is that I agree narrowly (most leveled readers seem quite bad, training specific skills matters, in-class grouping for reading is quite popular and often pretty uninspired) and disagree broadly (there is pretty strong evidence for the value of several forms of ability grouping, drawing from eg Direct Instruction, Success for All, acceleration/gifted literature; ability grouping has acquired a bad reputation for reasons mostly unrelated to its performance; “grade level” is the wrong measure) in a way that would be productive to hash out more fully.

I agree fully that the real question is cultural.

In case you didn’t realize that there is a war. There has been for a while.


Childhood and Education #14: The War On Education Read More »

why-accessibility-might-be-ai’s-biggest-breakthrough

Why accessibility might be AI’s biggest breakthrough

For those with visual impairments, language models can summarize visual content and reformat information. Tools like ChatGPT’s voice mode with video and Be My Eyes allow a machine to describe real-world visual scenes in ways that were impossible just a few years ago.

AI language tools may be providing unofficial stealth accommodations for students—support that doesn’t require formal diagnosis, workplace disclosure, or special equipment. Yet this informal support system comes with its own risks. Language models do confabulate—the UK Department for Business and Trade study found 22 percent of users identified false information in AI outputs—which could be particularly harmful for users relying on them for essential support.

When AI assistance becomes dependence

Beyond the workplace, the drawbacks may have a particular impact on students who use the technology. The authors of a 2025 study on students with disabilities using generative AI cautioned, “Key concerns students with disabilities had included the inaccuracy of AI answers, risks to academic integrity, and subscription cost barriers.” Students in that study had ADHD, dyslexia, dyspraxia, and autism, with ChatGPT being the most commonly used tool.

Mistakes in AI outputs are especially pernicious because, due to grandiose visions of near-term AI technology, some people think today’s AI assistants can perform tasks that are actually far outside their scope. As research on blind users’ experiences suggested, people develop complex (sometimes flawed) mental models of how these tools work, showing the need for higher awareness of AI language model drawbacks among the general public.

For the UK government employees who participated in the initial study, these questions moved from theoretical to immediate when the pilot ended in December 2024. After that time, many participants reported difficulty readjusting to work without AI assistance—particularly those with disabilities who had come to rely on the accessibility benefits. The department hasn’t announced the next steps, leaving users in limbo. When participants report difficulty readjusting to work without AI while productivity gains remain marginal, accessibility emerges as potentially the first AI application with irreplaceable value.

Why accessibility might be AI’s biggest breakthrough Read More »

in-court-filing,-google-concedes-the-open-web-is-in-“rapid-decline”

In court filing, Google concedes the open web is in “rapid decline”

Advertising and the open web

Google objects to this characterization. A spokesperson calls it a “cherry-picked” line from the filing that has been misconstrued. Google’s position is that the entire passage is referring to open-web advertising rather than the open web itself. “Investments in non-open web display advertising like connected TV and retail media are growing at the expense of those in open web display advertising,” says Google.

If we assume this is true, it doesn’t exactly let Google off the hook. As AI tools have proliferated, we’ve heard from Google time and time again that traffic from search to the web is healthy. When people use the web more, Google makes more money from all those eyeballs on ads, and indeed, Google’s earnings have never been higher. However, Google isn’t just putting ads on websites—Google is also big in mobile apps. As Google’s own filings make clear, in-app ads are by far the largest growth sector in advertising. Meanwhile, time spent on non-social and non-video content is stagnant or slightly declining, and as a result, display ads on the open web earn less.

So, whether Google’s wording in the filing is meant to address the web or advertising on the web may be a distinction without a difference. If ads on websites aren’t making the big bucks, Google’s incentives will undoubtedly change. While Google says its increasingly AI-first search experience is still consistently sending traffic to websites, it has not released data to show that. If display ads are in “rapid decline,” then it’s not really in Google’s interest to continue sending traffic to non-social and non-video content. Maybe it makes more sense to keep people penned up on its platform where they can interact with its AI tools.

Of course, the web isn’t just ad-supported content—Google representatives have repeatedly trotted out the claim that Google’s crawlers have seen a 45 percent increase in indexable content since 2023. This metric, Google says, shows that open web advertising could be imploding while the web is healthy and thriving. We don’t know what kind of content is in this 45 percent, but given the timeframe cited, AI slop is a safe bet.

If the increasingly AI-heavy open web isn’t worth advertisers’ attention, is it really right to claim the web is thriving as Google so often does? Google’s filing may simply be admitting to what we all know: the open web is supported by advertising, and ads increasingly can’t pay the bills. And is that a thriving web? Not unless you count AI slop.

In court filing, Google concedes the open web is in “rapid decline” Read More »

nobel-laureate-david-baltimore-dead-at-87

Nobel laureate David Baltimore dead at 87

Nobel Prize-winning molecular biologist and former Caltech president David Baltimore—who found himself at the center of controversial allegations of fraud against a co-author—has died at 87 from cancer complications. He shared the 1975 Nobel Prize in Physiology or Medicine for his work upending the then-consensus that cellular information flowed only in one direction. Baltimore is survived by his wife of 57 years, biologist Alice Huang, as well as a daughter and granddaughter.

“David Baltimore’s contributions as a virologist, discerning fundamental mechanisms and applying those insights to immunology, to cancer, to AIDS, have transformed biology and medicine,” current Caltech President Thomas F. Rosenbaum said in a statement. “David’s profound influence as a mentor to generations of students and postdocs, his generosity as a colleague, his leadership of great scientific institutions, and his deep involvement in international efforts to define ethical boundaries for biological advances fill out an extraordinary intellectual life.”

Baltimore was born in New York City in 1938. His father worked in the garment industry, and his mother later became a psychologist at the New School and Sarah Lawrence. Young David was academically precocious and decided he wanted to be a scientist after spending a high school summer learning about mouse genetics at the Jackson Laboratory in Maine. He graduated from Swarthmore College and earned his PhD in biology from Rockefeller University in 1964 with a thesis on the study of viruses in animal cells. He joined the Salk Institute in San Diego, married Huang, and moved to MIT in 1968; in 1982 he became the founding director of the Whitehead Institute.

Baltimore initially studied viruses like polio and mengovirus that make RNA copies of their RNA genomes to replicate, but later turned his attention to retroviruses, which have enzymes that make DNA copies of viral RNA. He made a major breakthrough when he proved the existence of that viral enzyme, now known as reverse transcriptase. Previously, scientists had thought that the flow of information went from DNA to RNA to protein synthesis. Baltimore showed that process could be reversed, ultimately enabling researchers to use disabled retroviruses to insert genes into human DNA to correct genetic diseases.

Nobel laureate David Baltimore dead at 87 Read More »

yes,-ai-continues-to-make-rapid-progress,-including-towards-agi

Yes, AI Continues To Make Rapid Progress, Including Towards AGI

That does not mean AI will successfully make it all the way to AGI and superintelligence, or that it will make it there soon or on any given time frame.

It does mean that AI progress, while it could easily have been even faster, has still been historically lightning fast. It has exceeded almost all expectations from more than a few years ago. And it means we cannot disregard the possibility of High Weirdness and profound transformation happening within a few years.

GPT-5 had a botched rollout and was only an incremental improvement over o3, o3-Pro and other existing OpenAI models, but was very much on trend and a very large improvement over the original GPT-4. Nor would one disappointing model from one lab have meant that major further progress must be years away.

Imminent AGI (in the central senses in which the term AGI is used, where imminent means years rather than decades) remains a very real possibility.

Part of this post is devoted to covering in full Gary Marcus’s latest editorial in The New York Times, since that is the paper of record read by many in government. I felt that piece was in many places highly misleading to the typical Times reader.

Imagine if someone said ‘you told me in 1906 that there was increasing imminent risk of a great power conflict, and now it’s 1911 and there has been no war, so your fever dream of a war to end all wars is finally fading.’ Or saying that you were warned in November 2019 that Covid was likely coming, and now it’s February 2020 and no one you know has it, so it was a false alarm. That’s what these claims sound like to me.

I have to keep emphasizing this because it now seems to be an official White House position, with prominent White House official Sriram Krishnan going so far as to say on Twitter that AGI any time soon has been ‘disproven,’ and David Sacks spending his time ranting and repeating Nvidia talking points almost verbatim.

When pressed, there is often a remarkably narrow window in which ‘imminent’ AGI is dismissed as ‘proven wrong.’ But this is still used as a reason to structure public policy and one’s other decisions in life as if AGI definitely won’t happen for decades, which is Obvious Nonsense.

Sriram Krishnan: I’ll write about this separately but think this notion of imminent AGI has been a distraction and harmful and now effectively proven wrong.

Prinz: “Imminent AGI” was apparently “proven wrong” because OpenAI chose to name a cheap/fast model “GPT-5” instead of o3 (could have been done 4 months earlier) or the general reasoning model that won gold on both the IMO and the IOI (could have been done 4 months later).

Rob Miles: I’m a bit confused by all the argument about GPT-5, the truth seems pretty mundane: It was over-hyped, they kind of messed up the launch, and the model is good, a reasonable improvement, basically in line with the projected trend of performance over time.

Not much of an update.

To clarify a little, the projected trend GPT-5 fits with is pretty nuts, and the world is on track to be radically transformed if it continues to hold. Probably we’re going to have a really wild time over the next few years, and GPT-5 doesn’t update that much in either direction.

Rob Miles is correct here as far as I can tell.

If imminent means ‘within the next six months’ or maybe up to a year I think Sriram’s perspective is reasonable, because of what GPT-5 tells us about what OpenAI is cooking. For sensible values of imminent that are more relevant to policy and action, Sriram Krishnan is wrong, in a ‘I sincerely hope he is engaging in rhetoric rather than being genuinely confused about this, or his imminently only means in the next year or at most two’ way.

I am confused how he can be sincerely mistaken given how deep he is into these issues. I do look forward to Sriram providing a full explanation of why he believes this, so we can quickly clear this up, because this is a crazy thing to actually believe. So far all we have heard is ‘GPT-5.’

Not only is imminent AGI not disproven, there are continuing important claims that it is likely. Here is some clarity on Anthropic’s continued position, as of August 31.

Prinz: Jack, I assume no changes to Anthropic’s view that transformative AI will arrive by the end of next year?

Jack Clark: I continue to think things are pretty well on track for the sort of powerful AI system defined in machines of loving grace – buildable end of 2026, running many copies 2027. Of course, there are many reasons this could not occur, but lots of progress so far.

Anthropic’s valuation has certainly been on a rocket ship exponential.

Do I agree that we are on track to meet that timeline? No. I do not. I would be very surprised to see it go down that fast, and I am surprised that Jack Clark has not updated based on, if nothing else, previous projections by Anthropic CEO Dario Amodei falling short. I do think it cannot be ruled out. If it does happen, I do not think you have any right to be outraged at the universe for it.

It is certainly true that Dario Amodei’s early prediction of AI writing most of the code (as in 90% of all code within 3-6 months after March 11) has been proven definitively false. It was not a good prediction, because the previous generation definitely wasn’t ready, and even if it had been, that’s not how diffusion works. The reality is more like 40% of all code generated by AI and 20%-25% of what goes into production.

Which is still a lot, but a lot less than 90%.

Here’s what I said at the time about Dario’s prediction:

Zvi Mowshowitz (AI #107): Dario Amodei says AI will be writing 90% of the code in 6 months and almost all the code in 12 months. I am with Arthur B here, I expect a lot of progress and change very soon but I would still take the other side of that bet. The catch is: I don’t see the benefit to Anthropic of running the hype machine in overdrive on this, at this time, unless Dario actually believed it.

I continue to be confused about why he said it; it’s highly unstrategic to hype this way. I can only assume on reflection this was an error about diffusion speed more than it was an error about capabilities? On reflection yes I was correctly betting ‘no’ but that was an easy call. I dock myself more points on net here, for hedging too much and not expressing the proper level of skepticism. So yes, this should push you towards putting less weight on Anthropic’s projections, although primarily on the diffusion front.

As always, remember that projections of future progress include the possibility, nay the inevitability, of discovering new methods. We are not projecting ‘what if the AI labs all keep ramming their heads against the same wall whether or not it works.’

Ethan Mollick: 60 years of exponential growth in chip density was achieved not through one breakthrough or technology, but a series of problems solved and new paradigms explored as old ones hit limits.

I don’t think current AI has hit a wall, but even if it does, there are many paths forward now.

Paul Graham: One of the things that strikes me when talking to AI insiders is how they believe both that they need several new discoveries to get to AGI, and also that such discoveries will be forthcoming, based on the past rate.

My talks with AI insiders also say we will need new discoveries, and we definitely will need new major discoveries in alignment. But it’s not clear how big those new discoveries need to be in order to get there.

I agree with Ryan Greenblatt that precise timelines for AGI don’t matter that much in terms of actionable information, but big jumps in the chance of things going crazy within a few years can matter a lot more. This is similar to questions of p(doom), where as long as you are in the Leike Zone of a 10%-90% chance of disaster, you mostly want to react in the same ways, but outside that range you start to see big changes in what makes sense.

Ryan Greenblatt: Pretty short timelines (<10 years) seem likely enough to warrant strong action and it's hard to very confidently rule out things going crazy in <3 years.

While I do spend some time discussing AGI timelines (and I’ve written some posts about it recently), I don’t think moderate quantitative differences in AGI timelines matter that much for deciding what to do. For instance, having a 15-year median rather than a 6-year median doesn’t make that big of a difference. That said, I do think that moderate differences in the chance of very short timelines (i.e., less than 3 years) matter more: going from a 20% chance to a 50% chance of full AI R&D automation within 3 years should potentially make a substantial difference to strategy.

Additionally, my guess is that the most productive way to engage with discussion around timelines is mostly to not care much about resolving disagreements, but then when there appears to be a large chance that timelines are very short (e.g., >25% in <2 years) it's worthwhile to try hard to argue for this. I think takeoff speeds are much more important to argue about when making the case for AI risk.

I do think that having somewhat precise views is helpful for some people in doing relatively precise prioritization within people already working on safety, but this seems pretty niche.

Given that I don’t think timelines are that important, why have I been writing about this topic? This is due to a mixture of: I find it relatively quick and easy to write about timelines, my commentary is relevant to the probability of very short timelines (which I do think is important as discussed above), a bunch of people seem interested in timelines regardless, and I do think timelines matter some.

Consider reflecting on whether you’re overly fixated on details of timelines.

Jason Calacanis of the All-In Podcast (where he is alongside AI Czar David Sacks) has a bold prediction, if you believe that his words have or are intended to have meaning. Which is an open question.

Jason: Before 2030 you’re going to see Amazon, which has massively invested in [AI], replace all factory workers and all drivers … It will be 100% robotic, which means all of those workers are going away. Every Amazon worker. UPS, gone. FedEx, gone.

Aaron Slodov: hi @Jason how much money can i bet you to take the other side of the factory worker prediction?

Jason (responding to video of himself saying the above): In 2035 this will not be controversial take — it will be reality.

Hard, soul-crushing labor is going away over the next decade. We will be deep in that transition in 2030, when humanoid robots are as common as bicycles.

Notice the goalpost move of ‘deep in that transition’ in 2030 versus saying full replacement by 2030, without seeming to understand there is any contradiction.

These are two very different predictions. The original ‘by 2030’ prediction is Obvious Nonsense unless you expect superintelligence and a singularity, probably involving us all dying. There’s almost zero chance otherwise. Technology does not diffuse that fast.

Plugging 2035 into the 2030 prediction is also absurd, if we take the prediction literally. No, you’re not going to have zero workers at Amazon, UPS and FedEx within ten years unless we’ve not only solved robotics and AGI, we’ve also diffused those technologies at full scale. In which case, again, that’s a singularity.

I am curious what his co-podcaster David Sacks or Sriram Krishnan would say here. Would they dismiss Jason’s confident prediction as already proven false? If not, how can one be confident that AGI is far? Very obviously you can’t have one without the other.

GPT-5 is not a good reason to dismiss AGI, and to be safe I will once again go into why, and why we are making rapid progress towards AGI.

GPT-5 and GPT-4 were both major leaps in benchmarks from the previous generation.

The differences are dramatic, and the time frame between releases was similar.

The actual big difference? That there was only one incremental release between GPT-3 and GPT-4, GPT-3.5, with little outside competition. Whereas between GPT-4 and GPT-5 we saw many updates. At OpenAI alone we saw GPT-4o, and o1, and o3, plus updates that didn’t involve number changes, and at various points Anthropic’s Claude and Google’s Gemini were plausibly on top. Our frog got boiled slowly.

Epoch AI: However, one major difference between these generations is release cadence. OpenAI released relatively few major updates between GPT-3 and GPT-4 (most notably GPT-3.5). By contrast, frontier AI labs released many intermediate models between GPT-4 and 5. This may have muted the sense of a single dramatic leap by spreading capability gains over many releases.

Benchmarks can be misleading, especially as we saturate essentially all of them often well ahead of predicted schedules, but the overall picture is not. The mundane utility and user experience jumps across all use cases are similarly dramatic. The original GPT-4 was a modest aid to coding, GPT-5 and Opus 4.1 transform how it is done. Most of the queries I make with GPT-5-Thinking or GPT-5-Pro would not be worth bothering to give to the original GPT-4, or providing the context would not even be possible. So many different features have been improved or added.

This idea, frequently pushed by David Sacks among others, that everyone’s models are about the same and aren’t improving? These claims are simply not true. Observant regular users are not about to be locked into one model or ecosystem.

Everyone’s models are constantly improving. No one would seriously consider using models from the start of the year for anything but highly esoteric purposes.

The competition is closer than one would have expected. There are three major labs, OpenAI, Anthropic and Google, that each have unique advantages and disadvantages. At various times each have had the best model, and yes currently it is wise to mix up your usage depending on your particular use case.

Those paying attention are always ready to switch models. I’ve switched primary models several times this year alone, usually switching to a model from a different lab, and tested many others as well. And indeed we must switch models often either way, as everyone’s models are expected to change on the order of every few months, in ways that break the same things that would break if you swapped GPT-5 for Opus or Gemini or vice versa. One notes that these models typically run on three distinct sets of chips (Nvidia for GPT-5, Amazon Trainium for Anthropic and Google TPUs for Gemini), yet we barely notice.

Most people notice AI progress much better when it impacts their use cases.

If you are not coding, and not doing interesting math, and instead asking simple things that do not require that much intelligence to answer correctly, then upgrading the AI’s intelligence is not going to improve your satisfaction levels much.

Jack Clark: Five years ago the frontier of LLM math/science capabilities was 3 digit multiplication for GPT-3. Now, frontier LLM math/science capabilities are evaluated through condensed matter physics questions. Anyone who thinks AI is slowing down is fatally miscalibrated.

David Shapiro: As I’ve said before, AI is “slowing down” insofar as most people are not smart enough to benefit from the gains from here on out.

Once you see this framing, you see the contrast everywhere.

Patrick McKenzie: I think a lot of gap between people who “get” LLMs and people who don’t is that some people understand current capabilities to be a floor and some people understand them to be either a ceiling or close enough to a ceiling.

And even if you explain “Look this is *obviously* a floor” some people in group two will deploy folk reasoning about technology to say “I mean technology decays in effectiveness all the time.” (This is not considered an insane POV in all circles.)

And there are some arguments which are persuasive to… people who rate social pressure higher than received evidence of their senses… that technology does actually frequently regress.

For example, “Remember how fast websites were 20 years ago before programmers crufted them up with ads and JavaScript? Now your much more powerful chip can barely keep up. Therefore, technological stagnation and backwards decay is quite common.”

Some people would rate that as a powerful argument. Look, it came directly from someone who knew a related shibboleth, like “JavaScript”, and it gestures in the direction of at least one truth in the observable universe.

Oh the joys of being occasionally called in as the Geek Whisperer for credentialed institutions where group two is high status, and having to titrate how truthful I am about their worldview to get the message across.

As in, it’s basically this graph but for AI:

Here’s another variant of this foolishness, note the correlation to ‘hitting a wall’:

Prem Kumar Aparanji: It’s not merely the DL “hitting a wall” (as @GaryMarcus put it & everybody’s latched on) now as predicted, even the #AI data centres required for all the training, fine-tuning, inferencing of these #GenAI models are also now predicted to be hitting a wall soon.

Quotes from Futurism: For context, Kupperman notes that Netflix brings in just $39 billion in annual revenue from its 300 million subscribers. If AI companies charged Netflix prices for their software, they’d need to field over 3.69 billion paying customers to make a standard profit on data center spending alone — almost half the people on the planet.

“Simply put, at the current trajectory, we’re going to hit a wall, and soon,” he fretted. “There just isn’t enough revenue and there never can be enough revenue. The world just doesn’t have the ability to pay for this much AI.”

Prinz: Let’s assume that AI labs can charge as much as Netflix per month (they currently charge more) and that they’ll never have any enterprise revenue (they already do) and that they won’t be able to get commissions from LLM product recommendations (will happen this year) and that they aren’t investing in biotech companies powered by AI that will soon have drugs in human trial (they already have). How will they ever possibly be profitable?
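As a quick sanity check on the arithmetic being argued over, here is a minimal sketch in Python using only the figures quoted above (Netflix’s $39 billion from 300 million subscribers, and the claimed 3.69 billion customers). Kupperman’s actual data center spend figure is not quoted here, so the sketch only works out what those numbers imply per subscriber and in total.

```python
# Sanity check of the quoted Netflix comparison, using only the numbers above.
netflix_annual_revenue = 39e9  # dollars per year (quoted figure)
netflix_subscribers = 300e6    # subscribers (quoted figure)

per_subscriber = netflix_annual_revenue / netflix_subscribers
print(f"Netflix revenue per subscriber: ${per_subscriber:.0f}/year "
      f"(~${per_subscriber / 12:.2f}/month)")

# Kupperman's claim: AI firms would need 3.69 billion customers paying
# Netflix prices to make a standard profit on data center spending alone.
required_customers = 3.69e9
implied_revenue = required_customers * per_subscriber
print(f"Implied annual revenue at Netflix prices: ${implied_revenue / 1e9:.0f}B")
# -> ~$130/year per subscriber, ~$480B/year in total
```

In other words, the implicit bet being criticized is that consumer subscriptions alone must generate roughly half a trillion dollars a year, which is exactly the assumption Prinz is pushing back on.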

He wrote a guest opinion essay. Things didn’t go great.

That starts with the false title (as always, not entirely up to the author, and it looks like it started out as a better one), dripping with unearned condescension, ‘The Fever Dream of Imminent ‘Superintelligence’ Is Finally Breaking,’ and the opening paragraph in which he claims Altman implied GPT-5 would be AGI.

Here is the lead:

GPT-5, OpenAI’s latest artificial intelligence system, was supposed to be a game changer, the culmination of billions of dollars of investment and nearly three years of work. Sam Altman, the company’s chief executive, implied that GPT-5 could be tantamount to artificial general intelligence, or A.G.I. — A.I. that is as smart and as flexible as any human expert.

Instead, as I have written, the model fell short. Within hours of its release, critics found all kinds of baffling errors: It failed some simple math questions, couldn’t count reliably and sometimes provided absurd answers to old riddles. Like its predecessors, the A.I. model still hallucinates (though at a lower rate) and is plagued by questions around its reliability. Although some people have been impressed, few saw it as a quantum leap, and nobody believed it was A.G.I. Many users asked for the old model back.

GPT-5 is a step forward but nowhere near the A.I. revolution many had expected. That is bad news for the companies and investors who placed substantial bets on the technology.

Did you notice the stock market move in AI stocks, as those bets fell down to Earth when GPT-5 was revealed? No? Neither did I.

The argument above is highly misleading on many fronts.

  1. GPT-5 is not AGI, but this was entirely unsurprising – expectations were set too high, but nothing like that high. Yes, Altman teased that it was possible AGI could arrive relatively soon, but at no point did Altman claim that GPT-5 would be AGI, or that AGI would arrive in 2025. Approximately zero people had median estimates of AGI in 2025 or earlier, although some have estimated the end of 2026, in particular Anthropic (which, via Jack Clark, continues to say ‘powerful’ AI is buildable by end of 2026, not that AGI arrives in 2026).

  2. The claim that it ‘couldn’t count reliably’ is especially misleading. Of course GPT-5 can count reliably. The evidence here is a single adversarial example. For all practical purposes, if you ask GPT-5 to count something, it will count that thing.

  3. The ‘old riddles’ claim is highly misleading. If you give it an actual old riddle it will nail it. What GPT-5 and other models get wrong are, again, adversarial examples that do not exist ‘in the wild’ but are crafted to pattern match well-known other riddles while having a different answer. Why should we care?

  4. GPT-5 is still not fully reliable, but it is framed as being highly unreliable, which in most circumstances is not the case. Yes, if you need many 9s of reliability LLMs are not yet for you, but neither are humans.

  5. AI valuations and stocks continue to rise, not fall.

  6. Yes, the fact that OpenAI chose to have GPT-5 not be a scaled up model does tell us that directly scaling up model size alone has ‘lost steam’ in relative terms due to the associated costs, but this is not news; o1 and o3 (and GPT-4.5) tell us this as well. We are now working primarily on scaling and improving in other ways, but very much there are still plans to scale up more in the future. In the context of all the other facts quoted about other scaled up models, it seems misleading not to tell readers that GPT-5 is not scaled up.

  7. Claims here are about failures of GPT-5-Auto or GPT-5-Base, whereas the ‘scaled up’ version of GPT-5 is GPT-5-Pro or at least GPT-5-Thinking.

  8. Gary Marcus clarifies that his actual position is on the order of 8-15 years to AGI, with 2029 being ‘awfully unlikely.’ That is a highly reasonable timeline, but it is also pretty imminent. That’s crazy soon. That’s something I would want to be betting on heavily, and preparing for at great cost. AGI that soon seems like the most important thing happening in the world right now, if likely true.

    1. The article does not give any particular timeline, and does not imply we will never get to AGI, but I very much doubt those reading the post would come away with the impression that things strictly smarter than people are only about 10 years away. I mean, yowsers, right?

The fact that ‘many users asked for the old model back’ is true, but it lacks the important context that what users wanted was the old personality, so it risks giving an uninformed reader the wrong impression.

To Gary’s credit, he then does hedge, as I included in the quote, acknowledging GPT-5 is indeed a good model representing a step forward. Except then:

And it demands a rethink of government policies and investments that were built on wildly overinflated expectations.

Um, no? No it doesn’t. That’s silly.

The current strategy of merely making A.I. bigger is deeply flawed — scientifically, economically and politically. Many things, from regulation to research strategy, must be rethought.

As many now see, GPT-5 shows decisively that scaling has lost steam.

Again, no? That’s not the strategy. Not ‘merely’ doing that. Indeed, a lot of the reason GPT-5 was so relatively unimpressive was GPT-5 was not scaled up so much. It was instead optimized for compute efficiency. There is no reason to have to rethink much of anything in response to a model that, as explained above, was pretty much exactly on the relevant trend lines.

I do appreciate this:

Gary Marcus: However, as I warned in a 2022 essay, “Deep Learning Is Hitting a Wall,” so-called scaling laws aren’t physical laws of the universe like gravity but hypotheses based on historical trends.

As in, the ‘hitting the wall’ claim was back in 2022. How did that turn out? Look at GPT-5, look at what we had available in 2022, and tell me we ‘hit a wall.’

What does ‘imminent’ superintelligence mean in this context?

Gary Marcus (NYT): The chances of A.G.I.’s arrival by 2027 now seem remote.

Notice the subtle goalpost move, as AGI ‘by 2027’ means AGI 2026. These people are gloating, in advance, that someone predicted a possibility of privately developed AGI in 2027 (with a median in 2028, in the AI 2027 scenario OpenBrain tells the government but does not release its AGI right away to the public) and then AGI will have not arrived, to the public, in 2026.

According to my sources (Opus 4.1 and GPT-5 Thinking) even ‘remote’ still means on the order of 2% chance in the next 16 months, implying an 8%-25% chance in 5 years. I don’t agree, but even if one did, that’s hardly something one can safely rule out.
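For what it’s worth, the low end of that range is roughly what a flat extrapolation gives you. A minimal sketch, assuming a constant hazard rate (my assumption here, not the models’):

```python
import math

p_16_months = 0.02   # "remote": ~2% chance in the next 16 months, per the above
horizon = 60         # five years, in months

# Memoryless extrapolation: convert the 16-month probability to a
# per-month hazard rate, then compound it out to the full horizon.
hazard = -math.log(1 - p_16_months) / 16
p_5_years = 1 - math.exp(-hazard * horizon)
print(f"Constant-hazard 5-year probability: {p_5_years:.1%}")  # ~7.3%
```

Getting from there to the top of the 8%-25% range requires the hazard rate to rise over time, which is of course exactly where the disagreement lies.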

But then, there’s this interaction on Twitter that clarifies what Gary Marcus meant:

Gary Marcus: Anyone who thinks AGI is impossible: wrong.

Anyone who thinks AGI is imminent: just as wrong.

It’s not that complicated.

Peter Wildeford: what if I think AGI is 4-15 years away?

Gary Marcus: 8-15 and we might reach an agreement. 4 still seems awfully unlikely to me. Too many core cognitive problems aren’t really being addressed, and solutions may take a while to roll out once we find the basic insights we are lacking.

But it’s a fair question.

That’s a highly reasonable position one can take. Awfully unlikely (but thus possible) in four years, likely in 8-15, median timeline of 2036 or so.

Notice that on the timescale of history, 8-15 years until likely AGI, the most important development in the history of history if and when it happens, seems actually kind of imminent and important? That should demand an aggressive policy response focused on what we are going to do when we get there, not be treated as a reason to dismiss the issue.

Imagine saying, in 2015, ‘I think AGI is far away, we’re talking 18-25 years’ and anticipating the looks you would get.

The rest of the essay is a mix of policy suggestions and research direction suggestions. If indeed he is right about research directions, of which I am skeptical, we would still expect to see rapid progress soon as the labs realize this and pivot.

A common tactic among LLM doubters, which was one of the strategies used in the NYT editorial, is to show a counterexample, where a model fails a particular query, and say ‘model can’t do [X]’ or the classic Colin Fraser line of ‘yep it’s dumb.’

Here’s a chef’s kiss example I saw on Monday morning:

I mean, that’s very funny, but it is rather obvious how it happened with the strawberries thing all over Twitter and thus the training data, and it tells us very little about overall performance.

In such situations, we have to differentiate between different procedures, the same as in any other scientific experiment. As in:

Did you try to make it fail, or try to set it up to succeed? Did you choose an adversarial or a typical example? Did you get this the first time you tried it or did you go looking for a failure? Are you saying it ‘can’t [X]’ because it can’t ever do [X], because it can’t ever do [X] out of the box, it can’t reliably do [X], or it can’t perfectly do [X], etc?

If you conflate ‘I can elicit wrong answers on [X] if I try’ with ‘it can’t do [X]’ then the typical reader will have a very poor picture.

Daniel Litt (responding to NYT article by Gary Marcus that says ‘[GPT-5] failed some simple math questions, couldn’t count reliably’): While it’s true one can elicit poor performance on basic math question from frontier models like GPT-5, IMO this kind of thing (in NYTimes) is likely to mislead readers about their math capabilities.

Derya Unutmaz: AI misinformation at the NYT is at its peak. What a piece of crap “newspaper” it has become. It’s not even worth mentioning the author of this article-but y’all can guess. Meanwhile, just last night I posted a biological method invented by GPT-5 Pro, & I have so much more coming!

Ethan Mollick: This is disappointing. Purposefully underselling what models can do is a really bad idea. It is possible to point out that AI is flawed without saying it can’t do math or count – it just isn’t true.

People need to be realistic about capabilities of models to make good decisions.

I think the urge to criticize companies for hype blends into a desire to deeply undersell what models are capable of. Cherry-picking errors is a good way of showing odd limitations to an overenthusiastic Twitter crowd, but not a good way of making people aware that AI is a real factor.

Shakeel: The NYT have published a long piece by Gary Marcus on why GPT-5 shows scaling doesn’t work anymore. At no point does the piece mention that GPT-5 is not a scaled up model.

[He highlights the line from the post, ‘As many now see, GPT-5 shows decisively that scaling has lost steam.’]

Tracing Woods: Gary Marcus is a great demonstration of the power of finding a niche and sticking to it

He had the foresight to set himself up as an “AI is faltering” guy well in advance of the technology advancing faster than virtually anyone predicted, and now he’s the go-to

The thing I find most impressive about Gary Marcus is the way he accurately predicted AI would scale up to an IMO gold performance and then hit a wall (upcoming).

Gary Marcus was not happy about these responses, and doubled down on ‘but you implied it would be scaled up, no takesies backsies.’

Gary Marcus (replying to Shakeel directly): this is intellectually dishonest, at BEST it at least as big as 4.5 which was intended as 5 which was significantly larger than 4 it is surely scaled up compared to 4 which is what i compared it to.

Shakeel: we know categorically that it is not an OOM scale up vs. GPT-4, so … no. And there’s a ton of evidence that it’s smaller than 4.5.

Gary Marcus (QTing Shakeel): intellectually dishonest reply to my nytimes article.

openai implied implied repeatedly that GPT-5 was a scaled up model. it is surely scaled up relative to GPT-4.

it is possible – openAI has been closed mouth – that it is same size as 4.5 but 4.5 itself was surely scaled relative to 4, which is what i was comparing with.

amazing that after years of discussion of scaling the new reply is to claim 5 wasn’t scaled at all.

Note that if it wasn’t, contra all the PR, that’s even more reason to think that OpenAI knows damn well that is time for leaning on (neuro)symbolic tools and that scaling has reached diminishing returns.

JB: It can’t really be same in parameter count as gpt4.5 they really struggled serving that and it was much more expensive on the API to use

Gary Marcus: so a company valued at $300b that’s raised 10 of billions didn’t have the money to scale anymore even though there whole business plan was scaling? what does that tell you?

I am confused how one can claim Shakeel is being intellectually dishonest. His statement is flat out true. Yes, of course the decision not to scale up tells us something.

It tells me that they want to scale how much they serve the model and how much they do reasoning at inference time, and that this was the most economical solution for them at the time. JB is right that very, very obviously GPT-4.5 is a bigger model than GPT-5 and it is crazy to not realize this.

A post like this would be incomplete if I failed to address superforecasters.

I’ve been over this several times before, where superforecasters reliably have crazy slow projections for progress and even crazier predictions that when we do make minds smarter than ourselves that is almost certainly not an existential risk.

My coverage of this started way back in AI #14 and AI #9 regarding existential risk estimates, including Tetlock’s response to AI 2027. One common theme in such timeline projections is predicting Nothing Ever Happens even when this particular something has already happened.

Now that the dust has settled on models getting IMO Gold in 2025, it is a good time to look back on the fact that domain experts expected less progress in math than we got, and superforecasters expected a lot less, across the board.

Forecasting Research Institute: Respondents—especially superforecasters—underestimated AI progress.

Participants predicted the state-of-the-art accuracy of ML models on the MATH, MMLU, and QuaLITY benchmarks by June 2025.

Domain experts assigned probabilities of 21.4%, 25%, and 43.5% to the achieved outcomes. Superforecasters assigned even lower probabilities: just 9.3%, 7.2%, and 20.1% respectively.

The International Mathematical Olympiad results were even more surprising. AI systems achieved gold-level performance at the IMO in July 2025. Superforecasters assigned this outcome just a 2.3% probability. Domain experts put it at 8.6%.

Garrison Lovely: This makes Yudkowsky and Paul Christiano’s predictions of IMO gold by 2025 look even more prescient (they also predicted it a ~year before this survey was conducted).

Note that even Yudkowsky and Christiano had only modest probability that the IMO would fall as early as 2025.
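To make the size of the miss concrete, here is a quick scoring sketch using only the probabilities quoted above, assuming each listed number was the probability assigned to the outcome that actually occurred. This scores only these four resolved-yes items, not anyone’s overall track record:

```python
# Brier scores on the four quoted items, all of which resolved "yes".
# On a resolved-true event, each forecast contributes (1 - p)^2; lower is better.
forecasts = {
    "MATH":     {"superforecasters": 0.093, "domain experts": 0.214},
    "MMLU":     {"superforecasters": 0.072, "domain experts": 0.250},
    "QuaLITY":  {"superforecasters": 0.201, "domain experts": 0.435},
    "IMO gold": {"superforecasters": 0.023, "domain experts": 0.086},
}

for group in ("superforecasters", "domain experts"):
    brier = sum((1 - f[group]) ** 2 for f in forecasts.values()) / len(forecasts)
    print(f"{group}: mean Brier score {brier:.2f}")
# -> superforecasters ~0.82, domain experts ~0.58
# (a maximally uninformative 50% on everything would have scored 0.25)
```

By this admittedly selected yardstick, both groups did far worse than an ignorance prior would have, and the superforecasters did substantially worse than the domain experts.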

Andrew Critch: Yeah sorry forecasting fam, ya gotta learn some AI if you wanna forecast anything, because AI affects everything and if ya don’t understand it ya forecast it wrong.

Or, as I put it back in the unrelated-to-AI post Rock is Strong:

Everybody wants a rock. It’s easy to see why. If all you want is an almost always right answer, there are places where they almost always work.

The security guard has an easy to interpret rock because all it has to do is say “NO ROBBERY.” The doctor’s rock is easy too, “YOU’RE FINE, GO HOME.” This one is different, and doesn’t win the competitions even if we agree it’s cheating on tail risks. It’s not a coherent world model.

Still, on the desk of the best superforecaster is a rock that says “NOTHING EVER CHANGES OR IS INTERESTING” as a reminder not to get overexcited, and to not assign super high probabilities to weird things that seem right to them.

Thus:

Daniel Eth: In 2022, superforecasters gave only a 2.3% chance of an AI system achieving an IMO gold by 2025. Yet this wound up happening. AI progress keeps being underestimated by superforecasters.

I feel like superforecasters are underperforming in AI (in this case even compared to domain experts) because two reference classes are clashing:

• steady ~exponential increase in AI

• nothing ever happens.

And for some reason, superforecasters are reaching for the second.

Hindsight is hindsight, and yes you will get a 98th percentile result 2% of the time. But I think at 2.3% for 2025 IMO Gold, you are not serious people.

That doesn’t mean that being serious people was the wise play here. The incentives might well have been to follow the ‘nothing ever happens’ rock. We still have to realize this, as we can indeed smell what the rock is cooking.

A wide range of potential paths of AI progress are possible. There are a lot of data points that should impact the distribution of outcomes, and one must not overreact to any one development. One should especially not overreact to not being blown away by progress for a span of a few months. Consider your baseline that’s causing that.

My timelines for hitting various milestones, including various definitions of AGI, involve a lot of uncertainty. I think not having a lot of uncertainty is a mistake.

I especially think saying either ‘AGI almost certainly won’t happen within 5 years’ or ‘AGI almost certainly will happen within 15 years,’ would be a large mistake. There are so many different unknowns involved.

I can see treating full AGI in 2026 as effectively a Can’t Happen. I don’t think you can extend that even to 2027, although I would lay large odds against it hitting that early.

A wide range of medians seem reasonable to me. I can see defending a median as early as 2028, or one that extends to 2040 or beyond if you think it is likely that anything remotely like current approaches cannot get there. I have not put a lot of effort into picking my own number since the exact value currently lacks high value of information. If you put a gun to my head for a typical AGI definition I’d pick 2031, but with no ‘right to be surprised’ if it showed up in 2028 or didn’t show up for a while. Consider the 2031 number loosely held.

To close out, consider once again: even if we agreed with Gary Marcus and said 8-15 years, with a median of 2036? Take a step back and realize how soon and crazy that is.


Yes, AI Continues To Make Rapid Progress, Including Towards AGI Read More »

supreme-court-chief-justice-lets-trump-fire-ftc-democrat,-at-least-for-now

Supreme Court Chief Justice lets Trump fire FTC Democrat, at least for now

1935 Supreme Court is key precedent

The key precedent in the case is Humphrey’s Executor v. United States, a 1935 ruling in which the Supreme Court unanimously held that the president can only remove FTC commissioners for inefficiency, neglect of duty, or malfeasance in office. Trump’s termination notices to Slaughter and Bedoya said they were being fired simply because their presence on the commission “is inconsistent with my Administration’s priorities.”

The Trump administration argues that Humphrey’s Executor shouldn’t apply to the current version of the FTC because it exercises significant executive power. But the appeals court, in a 2-1 ruling, said “the present-day Commission exercises the same powers that the Court understood it to have in 1935 when Humphrey’s Executor was decided.”

“The government has no likelihood of success on appeal given controlling and directly on point Supreme Court precedent,” the panel majority said.

But while the government was found to have no likelihood of success in the DC Circuit appeals court, its chances are presumably much better in the Supreme Court. The Supreme Court previously stayed District Court decisions in cases involving Trump’s removal of Democrats from the National Labor Relations Board, the Merit Systems Protection Board, and the Consumer Product Safety Commission.

In a 2020 decision involving the Consumer Financial Protection Bureau, the court said in a footnote that its 1935 “conclusion that the FTC did not exercise executive power has not withstood the test of time.” If the Supreme Court ultimately rules in favor of Trump, it could throw out the Humphrey’s Executor ruling or clarify it in a way that makes it inapplicable to the FTC.

But Humphrey’s Executor is still a binding precedent, Slaughter’s opposition to the administrative stay said. “This Court should not grant an administrative stay where the court below simply ‘follow[ed] the case which directly controls,’ as it was required to do,” the Slaughter filing said.

Supreme Court Chief Justice lets Trump fire FTC Democrat, at least for now Read More »

f1-in-italy:-look-what-happens-when-the-downforce-comes-off

F1 in Italy: Look what happens when the downforce comes off

That was enough to allow Piastri past. However, the team instructed the championship leader to slow down and relinquish the position to Norris. It was a team mistake, not a driver mistake, and McLaren is doing everything in its power to ensure the eventual champion gets there because of their driving and not some external factor. Piastri didn’t sound exactly happy on the radio. But F1 is a team sport, and racing drivers are employees—when your boss gives you an order, it’s wise to do what they ask and argue about it after the fact, if continued employment is one of your goals.

Oscar Piastri (L) and Lando Norris (R) have a very 21st century relationship. Jakub Porzycki/NurPhoto via Getty Images

For many, a slow pit stop is just one of those things bestowed by the racing gods, and even Verstappen pointed that out when informed by his engineer of the change in positions behind him. After the race, Norris seemed a little embarrassed to have been given the place back, but the emerging consensus from former drivers was that, since Norris had been asked about pit stop priority, and had been undercut anyway, that was sufficient to excuse the request.

McLaren’s approach to handling its drivers is markedly different from the all-out war we saw when Lewis Hamilton and Fernando Alonso raced for it in 2007. Then, neither went home with the big trophy at the end of the year—their infighting allowed Kimi Raikkonen to take the title for Ferrari instead.

That won’t happen this year; either Norris or Piastri will be crowned at the end of the season, with the other having to wait at least another year. The pair have even been asked how they want the team to celebrate in the event the other driver wins—a sensitivity that feels refreshingly new for Formula 1.

Formula 1 heads to Azerbaijan in two weeks for another low-downforce race. Can we expect another Verstappen victory?

F1 in Italy: Look what happens when the downforce comes off Read More »