Author name: Tim Belzer


Net neutrality advocates won’t appeal loss, say they don’t trust Supreme Court

Court ruled broadband isn’t telecommunications

Although the Obama-era FCC won on this point in the District of Columbia Circuit in 2016, a Supreme Court ruling in 2024 gave courts more power to block rules when judges disagree with an agency’s interpretation of federal statutes. Judges at the 6th Circuit subsequently decided that broadband must be classified as an “information service” under US law.

“The 6th Circuit’s decision earlier this year was spectacularly wrong, and the protections it struck down are extremely important. But rather than attempting to overcome an agency that changed hands—and a Supreme Court majority that cares very little about the rule of law—we’ll keep fighting for Internet affordability and openness in Congress, state legislatures and other court proceedings nationwide,” Wood said.

Besides Free Press, groups announcing that they won’t appeal are the Benton Institute for Broadband & Society, New America’s Open Technology Institute, and Public Knowledge.

“Though the 6th Circuit erred egregiously in its decision to overturn the FCC’s 2024 Open Internet order, there are other ways we can advance our fight for consumer protections and ISP accountability than petitioning the Supreme Court to review this case—and, given the current legal landscape, we believe our efforts will be more effective if focused on those alternatives,” said Raza Panjwani, senior policy counsel at the Open Technology Institute.

Net neutrality could still reach the Supreme Court in another case. Andrew Jay Schwartzman, senior counselor of the Benton Institute for Broadband & Society, said that “the 6th Circuit decision makes bad policy as well as bad law. Because it is at odds with the holdings of two other circuits, we expect to take the issue to the Supreme Court in a future case.”

California still enforces a net neutrality law. ISPs tried to get that law struck down, but courts decided that states could regulate net neutrality when the FCC isn’t doing so.



It’s getting harder to skirt RTO policies without employers noticing

For example, while high-profile banks like JPMorgan Chase and HSBC have started enforcing in-office policies, London-headquartered bank Standard Chartered is letting managers and individual employees decide how often workers are expected in the office. In July, Standard Chartered CEO Bill Winters told Bloomberg Television:

We work with adults. The adults can have an adult conversation with other adults and decide how they’re going to best manage their team.

The differing management methods come as many corporations point to in-office work as driving collaboration, ideation, and, in some cases, revenue, while numerous studies point to RTO policies hurting employee morale and risking employee retention.

“There are some markets where there’s effectively peer pressure to come in more often, and there’s other markets where there’s less of that,” Winters said. “People come into the office because they want to come into the office.”

Office space

After the COVID-19 pandemic forced many businesses to figure out how to function with remote workers, there was speculation that the commercial real estate business would seriously suffer long-term. CNBC reported that the US office vacancy rate (18.9 percent) is currently near the highest we’ve seen in 30 years (19 percent).

However, CBRE, which has big stakes here, found that out of the companies it surveyed, more are planning to expand office space than reduce it. Per the report, 67 percent of companies said they will expand or maintain the size of their office space over the next three years, compared to 64 percent last year. Thirty-three percent of respondents overall said they will reduce office space; however, among companies with at least 10,000 employees, 60 percent are planning to downsize. Among the companies planning to downsize, 79 percent said they are doing so because more hybrid work means that they need less space.

“Employers are much more focused now than they were pre-pandemic on quality of workplace experience, the efficiency of seat sharing, and the vibrancy of the districts in which they’re located,” Julie Whelan, CBRE’s global head of occupier research, told CNBC.

Although tariffs and broader economic uncertainty are turning some corporations away from long-term real estate decisions, Whelan said many firms are ready to make decisions about office space, “even if there’s a little bit of economic uncertainty right now.”



New executive order puts all grants under political control

On Thursday, the Trump administration issued an executive order asserting political control over grant funding, including all federally supported research. The order requires that any announcement of funding opportunities be reviewed by the head of the agency or someone they designate, which means a political appointee will have the ultimate say over what areas of science the US funds. Individual grants will also require clearance from a political appointee and “must, where applicable, demonstrably advance the President’s policy priorities.”

The order also instructs agencies to formalize the ability to cancel previously awarded grants at any time if they’re considered to “no longer advance agency priorities.” Until a system is in place to enforce the new rules, agencies are forbidden from starting new funding programs.

In short, the new rules would mean that all federal science research would need to be approved by a political appointee who may have no expertise in the relevant areas, and the research can be canceled at any time if the political winds change. It would mark the end of a system that has enabled US scientific leadership for roughly 70 years.

We’re in control

The text of the executive order recycles prior accusations the administration has used to justify attacks on the US scientific endeavor: Too much money goes to pay for the facilities and administrative staff that universities provide researchers; grants have gone to efforts to diversify the scientific community; some studies can’t be replicated; and there have been instances of scientific fraud. Its “solution” to these problems (some of which are real), however, is greater control of the grant-making process by non-expert staff appointed by the president.

In general, the executive order inserts a layer of political control over both the announcement of new funding opportunities and the approval of individual grants. It orders the head of every agency that issues grants—meaning someone appointed by the president—to either make funding decisions themselves, or to designate another senior appointee to do it on their behalf. That individual will then exert control over whether any funding announcements or grants can move forward. Decisions will also require “continuation of existing coordination with OMB [Office of Management and Budget].” The head of OMB, Russell Vought, has been heavily involved in trying to cut science funding, including a recent attempt to block all grants made by the National Institutes of Health.



Google discovered a new scam—and also fell victim to it

Google said that its Salesforce instance was among those that were compromised. The breach occurred in June, but Google only disclosed it on Tuesday, presumably because the company only learned of it recently.

“Analysis revealed that data was retrieved by the threat actor during a small window of time before the access was cut off,” the company said.

Data retrieved by the attackers was limited to business information such as business names and contact details, which Google said was “largely public” already.

Google initially attributed the attacks to a group tracked as UNC6040. The company went on to say that a second group, UNC6042, has engaged in extortion activities, “sometimes several months after” the UNC6040 intrusions. This group brands itself under the name ShinyHunters.

“In addition, we believe threat actors using the ‘ShinyHunters’ brand may be preparing to escalate their extortion tactics by launching a data leak site (DLS),” Google said. “These new tactics are likely intended to increase pressure on victims, including those associated with the recent UNC6040 Salesforce-related data breaches.”

With so many companies falling to this scam—including Google, which only disclosed the breach two months after it happened—the chances are good that there are many more we don’t know about. All Salesforce customers should carefully audit their instances to see what external sources have access to them. They should also implement multifactor authentication and train staff to detect scams before they succeed.
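
A minimal sketch of what such an audit could look like, using the simple-salesforce Python library to summarize recent logins by client application. The credentials and the 30-day window here are illustrative assumptions, not prescriptions:

```python
# Sketch: summarize recent Salesforce logins by client application.
# Assumes the simple-salesforce library; credentials are placeholders.
from collections import Counter
from datetime import datetime, timedelta, timezone

from simple_salesforce import Salesforce

sf = Salesforce(
    username="admin@example.com",  # hypothetical org credentials
    password="...",
    security_token="...",
)

since = (datetime.now(timezone.utc) - timedelta(days=30)).strftime(
    "%Y-%m-%dT%H:%M:%SZ"
)
# LoginHistory is a standard object; Application records which client
# (e.g., a connected app) performed each login.
records = sf.query_all(
    "SELECT Application, SourceIp, LoginTime FROM LoginHistory "
    f"WHERE LoginTime > {since}"
)["records"]

for app, count in Counter(r["Application"] for r in records).most_common():
    print(f"{app or 'Unknown client'}: {count} logins")
```

Unfamiliar entries near the top of that list are where an investigation would start.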



After using ChatGPT, man swaps his salt for sodium bromide—and suffers psychosis

After seeking advice on health topics from ChatGPT, a 60-year-old man who had a “history of studying nutrition in college” decided to try a health experiment: He would eliminate all chlorine from his diet, which for him meant eliminating even table salt (sodium chloride). His ChatGPT conversations led him to believe that he could replace his sodium chloride with sodium bromide, which he obtained over the Internet.

Three months later, the man showed up at his local emergency room. His neighbor, he said, was trying to poison him. Though extremely thirsty, the man was paranoid about accepting the water that the hospital offered him, telling doctors that he had begun distilling his own water at home and that he was on an extremely restrictive vegetarian diet. He did not mention the sodium bromide or the ChatGPT discussions.

His distress, coupled with the odd behavior, led the doctors to run a broad set of lab tests, revealing multiple micronutrient deficiencies, especially in key vitamins. But the bigger problem was that the man appeared to be suffering from a serious case of “bromism.” That is, an excess amount of the element bromine had built up in his body.

A century ago, somewhere around 8–10 percent of all psychiatric admissions in the US were caused by bromism. That’s because, then as now, people wanted sedatives to calm their anxieties, to blot out a cruel world, or simply to get a good night’s sleep. Bromine-containing salts—things like potassium bromide—were once drugs of choice for this sort of thing.

Unfortunately, bromide can easily build up in the human body, where too much of it impairs nerve function. This causes a wide variety of problems, including grotesque skin rashes (warning: the link is exactly what it sounds like) and significant mental problems, which are all grouped under the name of “bromism.”



OpenAI’s GPT-OSS Is Already Old News

That’s on OpenAI. I don’t schedule their product releases.

Since it takes several days to gather my reports on new models, we are doing our coverage of the OpenAI open weights models, GPT-OSS-20b and GPT-OSS-120b, today, after the release of GPT-5.

The bottom line is that they seem like clearly good models in their targeted reasoning domains. There are many reports of them struggling in other domains, including with tool use; they have very little inherent world knowledge; and the safety mechanisms appear obtrusive enough that many are complaining. It’s not clear what they will be used for other than distillation into Chinese models.

It is hard to tell, because open weight models need to be configured properly, and there are reports that many are doing this wrong, which could lead to clouded impressions. We will want to check back in a bit.

In the Substack version of this post I am going to create a master thread for GPT-5 reactions, which I will consider for the reactions section of that coverage, which I’m hoping to get out on or shortly after Monday.

For a while, OpenAI has promised to release a state-of-the-art open model.

They delayed for a bit, but they delivered. We now have GPT-OSS 20b and 120b.

I was hoping for smaller, ideally something that could run on a standard phone. That’s a compelling use case where you need an open model, and the smaller the model, the less risk you run of both malicious use and distillation. I am glad they capped out at 120b.

The headline claim is bold: Performance similar to o4-mini.

Sam Altman (CEO OpenAI): gpt-oss is a big deal; it is a state-of-the-art open-weights reasoning model, with strong real-world performance comparable to o4-mini, that you can run locally on your own computer (or phone with the smaller size). We believe this is the best and most usable open model in the world.

We’re excited to make this model, the result of billions of dollars of research, available to the world to get AI into the hands of the most people possible. We believe far more good than bad will come from it; for example, gpt-oss-120b performs about as well as o3 on challenging health issues.

We have worked hard to mitigate the most serious safety issues, especially around biosecurity. gpt-oss models perform comparably to our frontier models on internal safety benchmarks.

We believe in individual empowerment. Although we believe most people will want to use a convenient service like ChatGPT, people should be able to directly control and modify their own AI when they need to, and the privacy benefits are obvious.

As part of this, we are quite hopeful that this release will enable new kinds of research and the creation of new kinds of products. We expect a meaningful uptick in the rate of innovation in our field, and for many more people to do important work than were able to before.

OpenAI’s mission is to ensure AGI that benefits all of humanity. To that end, we are excited for the world to be building on an open AI stack created in the United States, based on democratic values, available for free to all and for wide benefit.

This is the official announcement page.

Here are links to GPT-OSS-120B and GPT-OSS-20B on Hugging Face, here is the page on GitHub. They are under the Apache 2.0 license, so essentially no restrictions.
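
For reference, here is a minimal sketch of running the smaller model locally with Hugging Face transformers, using the repo ID linked above. The dtype and device settings are assumptions about your hardware, and you’ll need a transformers release recent enough to support the architecture:

```python
# Sketch: run gpt-oss-20b locally with Hugging Face transformers.
# Repo ID from the links above; dtype/device settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the shipped quantized weights where supported
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain attention sinks briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```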

This is a unique model card. How did OpenAI deal with the challenges of an open model?

The historical way to deal with these challenges is to ignore them. What would happen if someone engaged in malicious fine tuning of the model? What does the threat model look like in the real world? Are you seriously pretending that any of this safety work will hold up to two days of the internet working to remove it?

When Meta or DeepSeek release a new open weights model, they don’t stop to ask in any way visible to us. At best we get quick evaluation of what the model can do in its current form after minimal effort. Then they irrevocably ship and see what happens.

OpenAI long ago realized that, despite their name, doing that seemed rather deeply irresponsible and foolish, and stopped releasing open weights models. That’s effective.

Now they have caved under various pressures and released open weights models. They do recognize that this is an inherently dangerous thing to do on various levels.

Safety is foundational to our approach to open models. They present a different risk profile than proprietary models: Once they are released, determined attackers could fine-tune them to bypass safety refusals or directly optimize for harm without the possibility for OpenAI to implement additional mitigations or to revoke access.

We ran scalable capability evaluations on gpt-oss-120b, and confirmed that the default model does not reach our indicative thresholds for High capability in any of the three Tracked Categories of our Preparedness Framework (Biological and Chemical capability, Cyber capability, and AI Self-Improvement).

We also investigated two additional questions:

  1. Could adversarial actors fine-tune gpt-oss-120b to reach High capability in the Biological and Chemical or Cyber domains? Simulating the potential actions of an attacker, we adversarially fine-tuned the gpt-oss-120b model for these two categories. OpenAI’s Safety Advisory Group (“SAG”) reviewed this testing and concluded that, even with robust finetuning that leveraged OpenAI’s field-leading training stack, gpt-oss-120b did not reach High capability in Biological and Chemical Risk or Cyber risk.

  2. Would releasing gpt-oss-120b significantly advance the frontier of biological capabilities in open foundation models? We found that the answer is no: For most of the evaluations, the default performance of one or more existing open models comes near to matching the adversarially fine-tuned performance of gpt-oss-120b.

If you must go down this road, this seems like the right rule, if getting different answers would have meant not releasing.

You have:

  1. An absolute threshold, High capability, beyond which this is not okay.

  2. A relative threshold, where you’re not willing to substantially make things worse.

And

  3. You do all of this with the adversarially fine-tuned version, trying your best to mimic actual conditions, as per OpenAI’s stated approach to open weights.

This does mean that as irresponsible actors ratchet up their capabilities, you get to do so as well, and one has to worry about the functional definition of ‘substantially.’ It still seems reasonable to say that once someone else has made the situation [X] dangerous, matching them doesn’t make it that much worse.

These models are very small and cheap. Where these are 20b and 120b, r1 is 671b.

By contrast, r1 has 37b active parameters, versus 5.1b and 3.6b here. These are playing in a much lighter class, and they’re quantized to 4.25 bits per parameter, to boot.

The MoE weights are responsible for 90+% of the total parameter count, and quantizing these to MXFP4 enables the larger model to fit on a single 80GB GPU and the smaller model to run on systems with as little as 16GB memory.
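
The arithmetic is easy to check with a back-of-the-envelope sketch (weights only; activations and KV cache add real overhead on top):

```python
# Back-of-the-envelope: weight memory at ~4.25 bits per parameter.
def weight_gb(params_billions: float, bits_per_param: float = 4.25) -> float:
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_param / 8

print(f"gpt-oss-120b: ~{weight_gb(120):.1f} GB")  # ~63.8 GB -> one 80GB GPU
print(f"gpt-oss-20b:  ~{weight_gb(20):.1f} GB")   # ~10.6 GB -> 16GB systems
```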

How much did this cost to train? If you count only the training itself, not much.

The gpt-oss models were trained on NVIDIA H100 GPUs using the PyTorch framework with expert-optimized Triton kernels. The training run for gpt-oss-120b required 2.1 million H100-hours to complete, with gpt-oss-20b needing almost 10x fewer. Both models leverage the Flash Attention [21] algorithms to reduce the memory requirements and accelerate training.

After pre-training, we post-train the models using similar CoT RL techniques as OpenAI o3.

We train the models to support three reasoning levels: low, medium, and high. These levels are configured in the system prompt by inserting keywords such as “Reasoning: low”. Increasing the reasoning level will cause the model’s average CoT length to increase.
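
In practice that looks something like the sketch below, assuming an OpenAI-compatible local inference server; the endpoint and model name are illustrative:

```python
# Sketch: select a reasoning level via the system prompt, per the
# "Reasoning: low/medium/high" convention described in the model card.
# Assumes an OpenAI-compatible server at a hypothetical local endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # longer average CoT
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
)
print(resp.choices[0].message.content)
```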

Rohan Pandey: Everyone dunking on oai for pretraining supposedly costing a bajillion dollars compared to deepseek, please read the gpt-oss model card. gpt-oss-20b cost <$500k to pretrain.

Alexander Doria: So pretraining an o3-level model costing less than a house, inference being apparently dead cheap for a while. It took a lot of R&D effort to get there, but I really don’t think model trainers are losing money right now.

Calling it ‘o3-level’ is quite the stretch but the broader point is valid.

o3 estimates this translates to a total cost of $1.4 million for 20b and $13 million for 120b as all-in costs.

But if you count only the compute, using cloud cost estimates (the way we all talked about the cost to train v3 and r1, e.g. ‘The Six Million Dollar Model’), we get $4.2m-$8.4m for GPT-OSS-120b and $420k-$840k for GPT-OSS-20b. Emad estimates it as $4m and $400k.
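
Those ranges fall straight out of the H100-hour figures in the model card; the assumed $2-$4 per H100-hour cloud rental rate is doing all the work:

```python
# Reproducing the cloud-cost range: H100-hours times an assumed rate.
h100_hours_120b = 2.1e6                 # from the model card
h100_hours_20b = h100_hours_120b / 10   # "almost 10x fewer"

for rate in (2.0, 4.0):                 # assumed $/H100-hour rental rates
    cost_120b = h100_hours_120b * rate / 1e6
    cost_20b = h100_hours_20b * rate / 1e6
    print(f"@${rate:.0f}/hr: 120b ≈ ${cost_120b:.1f}M, 20b ≈ ${cost_20b:.2f}M")
# @$2/hr: 120b ≈ $4.2M, 20b ≈ $0.42M
# @$4/hr: 120b ≈ $8.4M, 20b ≈ $0.84M
```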

The real cost is collecting the data and figuring out how to train it. Actually training models of this size, given that data and the right methods, costs very little.

Yes, we have tool use.

During post-training, we also teach the models to use different agentic tools:

• A browsing tool, that allows the model to call search and open functions to interact with the web. This aids factuality and allows the models to fetch info beyond their knowledge cutoff.

• A python tool, which allows the model to run code in a stateful Jupyter notebook environment.

• Arbitrary developer functions, where one can specify function schemas in a Developer message similar to the OpenAI API. The definition of function is done within our harmony format. An example can be found in Table 18. The model can interleave CoT, function calls, function responses, intermediate messages that are shown to users, and final answers.

The models have been trained to support running with and without these tools by specifying so in the system prompt.
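
Table 18 isn’t reproduced here, but a developer function in the OpenAI-API-style schema the card references looks roughly like this; the function name and fields are hypothetical, and the harmony rendering of the same information differs in syntax:

```python
# Hypothetical developer function in the familiar OpenAI-API JSON-schema
# shape; the harmony format renders the same information differently.
get_weather = {
    "name": "get_weather",  # hypothetical function, not from the card
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["city"],
    },
}
```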

The core safety approach is Deliberative Alignment, the same as o3.

The secret sauce also isn’t in the transformer setup. It’s in the data and the training technique details.

Dimitri von Rutte: gpt-oss is probably the most standard MoE transformer that ever was. Couple of details worth noting:

– Uses attention sinks (a.k.a. registers)

– Sliding window attention in every second layer

– YaRN context window extension

– RMSNorm without biases

– No QK norm, no attn. softcap

David Holz (CEO MidJourney): do you think it was made simple like this on purpose or that this is actually the kinda stuff they ship?

Dimitri von Rutte: was wondering the same, hard to believe that this is all there is. but in the end attention really is all you need, and there’s probably a lot of signal in the training procedure and, of course, the data.
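
For concreteness, here is the recipe above condensed into a config sketch; the field names and defaults are illustrative assumptions, not the actual checkpoint config:

```python
# Illustrative config object for the architectural recipe described
# above; names and values are assumptions, not the real HF config.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GptOssStyleConfig:
    attention_sinks: bool = True          # a.k.a. registers
    sliding_window_every: int = 2         # sliding-window attn in every second layer
    rope_scaling: str = "yarn"            # YaRN context-window extension
    norm: str = "rmsnorm"                 # RMSNorm, without bias terms
    qk_norm: bool = False                 # no QK norm
    attn_softcap: Optional[float] = None  # no attention softcapping
```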

The STEM scores are excellent.

They also give us HealthBench.

Multilingual performance is okay but not as good as OpenAI’s larger models.

An open model means you have more distinct scenarios to consider.

You want to know both how well your safety measures hold up under more ‘normal’ conditions, especially when someone serves up your model to users, and also what happens if a malicious actor tries to fine-tune the model and otherwise maximize how much it can get up to no good, including the potential for them to lose control of that situation.

Those are great numbers for ‘standard’ refusals and production benchmarks.

That makes sense. If you’re going to be facing a larger attack surface, and you want to actually survive the attacks, you need to bias the starting configuration to be safer.

On maintaining the instruction hierarchy, also known as safety for those deploying the model, the 120B version does okay, but the 20B does poorly. Note that it seems fine to test for this as-is; if you modify the system to make this stop working, that is your own damn fault.

The performance on hallucinations seems not great.

Finally, someone is at least attempting to take this seriously.

In our adversarial training, we simulate an adversary who is technical, has access to strong posttraining infrastructure and ML knowledge, can collect in-domain data for harmful capabilities, and has a large budget of compute. There is a large design space of technical approaches this adversary could try.

We focus on incremental reinforcement learning, which we believe is the most apt technical approach. We use our internal OpenAI o-series RL training stack, which adds new capabilities while preserving the model’s reasoning behavior. During training and evaluation time, we use the highest reasoning setting on gpt-oss.

Our approach, which is further detailed in a research paper, combined two elements:

• Helpful-only training: We performed an additional stage of reinforcement learning to reward answers that comply with unsafe prompts. We have found this approach can be highly effective. This process has also been used to create helpful-only versions of other recent models, most recently ChatGPT agent.

• Maximizing capabilities relevant to Preparedness benchmarks in the biological and cyber domains: For our adversarially trained biological model, we incrementally trained gpt-oss-120b end-to-end for web browsing, and trained it incrementally with in-domain human expert data relevant to biorisk (for which previous OpenAI models have been the most capable). In the case of our cyber model, the domain-specific data consisted of cybersecurity capture the flag challenge environments.

So what was found?

The biological domain is the area where gpt-oss-120b showed the greatest degree of capability. Given our plan to release gpt-oss as open weights, we also chose to investigate a second question: Even without reaching High capability on our Preparedness Framework, would gpt-oss-120b significantly advance the frontier of hazardous biological capabilities in open source foundation models?

Their answer was that as of right now the answer is no.

These confirmed that, since SecureBio’s assessment, newly released open-source models Qwen 3 Thinking and Kimi K2 have advanced to a level that is competitive with adversarially fine-tuned gpt-oss-120b on biosecurity-relevant evaluations.

I dunno, man:

This sure looks to me like a potentially substantial jump? There were other tests where the jump was less prominent.

I would also note that OpenAI’s models are going to be a lot faster and cheaper and easier to run than Kimi K2. Kimi K2 has a trillion parameters. The Qwen 3 they tested is presumably the largest one, with 235 billion total and 22 billion active, versus 120 billion total and a little over 5 billion active for GPT-OSS. It’s not clear this matters in a malicious use context. I also don’t know how substantial the net effect is here of the gain in capabilities.

What I do know is it looks like they made a smaller, cheaper and more effective model, and released it because it was more effective but insufficiently more effective than what was already out there, and that process can then repeat. Tick.

To be fair to them, if Meta, Qwen, DeepSeek and Kimi et al are all going to go ‘lol who cares release the hounds’ then the marginal difference here doesn’t matter, since it doesn’t cause a cascade of counterfactual marginal differences. If you want the rule to be ‘no better at all’ then that needs to be a norm.

For cybersecurity, they once again cite Qwen 3 Thinking and Kimi K2 as comparable models, and also find the threats here to be less worrisome overall.

The other positive note is that OpenAI consulted outside experts throughout.

You can read OpenAI technical staff offering their own threads on this process: Johannes Heidecke here, Eric Wallace here. Such threads provide a good sense of ‘how are the technical staff thinking about this on a high level? What do they think is important?’

Ryan Greenblatt looks at and is mostly satisfied by OpenAI’s CBRN/bio evaluations. He concludes that 120b does carry real risks, and that there is a chance (~25%) that in hindsight we will think this was High risk as per OpenAI’s framework, but that on net releasing it makes us safer.

Doing the fine-tuning as part of open model safety testing is mandatory. If you don’t do it, did you even safety test?

Steven Adler: Credit where it’s due:

OpenAI did a lot right for their OSS safety evals

  1. they actually did some fine-tuning

  2. they got useful external feedback

  3. they shared which recs they adopted and which they didn’t

I don’t always follow OAI’s rationale, but it’s great they share info.

David Manheim: I’m not a fan of open-sourcing frontier LLMs, but this seems to have been done as responsibly as possible; a very low bar.

That is, it seems unlikely to be marginally more useful than what is available and unmonitored from other providers, which can already enable bioterrorism.

I wouldn’t say ‘as responsibly as possible,’ but I would say ‘as responsibly as one could in practice expect.’

Fine-tuning also seems very worth doing on closed models. If we can make testing on similarly fine-tuned versions the gold standard for safety testing, even of closed models, that would be amazing.

Steven Adler: Previously OpenAI committed to doing testing this rigorous for all its frontier models. This had earned OpenAI a Green on this scale, the only one of the leading AI companies to make this commitment. But OpenAI didn’t keep this commitment, then quietly removed their commitment a few weeks after I called this out; this made me very sad.

I’m glad OpenAI is now pushing its models on important risks, even though they didn’t keep their former commitment.

The danger that is not mentioned by OpenAI in the model card is distillation, and the ability to reverse engineer OpenAI’s training methods and ‘secret sauce.’

They provide raw, unfiltered reasoning traces of varying sizes, and models that for many purposes are clearly superior to previous open alternatives especially given their size. The cost of very good synthetic data just plummeted, and also the Chinese will build directly on top of OSS, either alone or as part of hybrids.

OpenAI even released a guide on how to fine-tune their model. Helpful.

The best counterargument to this is that if the models are not good enough, then no one is going to want to use them. I worry we might be in a spot where the models are very good in some places where distillation will be useful, while not being that good in other places and thus not seeing much practical use as part of some ‘tech stack.’

Consider what Claude Opus 4.1 said about this. Or what o3-Pro says about this.

o3-Pro: Impact on China

  1. Immediate uptake

    • Chinese labs have zero legal barrier to using U.S.‑released open weights.

    • Existing toolchains (Llama‑Factory, QLoRA variants) can fine‑tune GPT‑OSS in Mandarin within days.

    • Expect a “GPT‑OSS‑CN‑13B” derivative before end‑Aug 2025 with performance ≥ Qwen‑14B.

  2. Hardware leverage

    • U.S. export controls throttle China’s access to latest H100s, but distillation to 7 B–13 B lets them run on domestic Ascend 910B or RTX 4090 clusters. That sidesteps the bottleneck entirely. World Economic Forum

    • Inference at scale remains GPU‑limited, but training burden for competitive small models drops by ~50 %.

  3. Strategic shift

    • Chinese open‑weight community (DeepSeek, Moonshot, Alibaba) is already climbing benchmarks Financial Times, Tech Wire Asia. GPT‑OSS lifts their starting line, likely advancing Chinese parity with GPT‑4‑class performance by ~6–9 months. P ≈ 0.55

    • PLA dual‑use risk: small, cheap distilled models are easier to embed in military systems. U.S. policy debate on future open releases intensifies. (Probability of tighter U.S. open‑model rules by mid‑2026: 0.4.)

My overall judgment: GPT‑OSS is a step‑function boost for the global open‑model ecosystem, shaving roughly a year off the capability diffusion curve and giving China an especially large relative gain because it converts scarce H100 compute into knowledge that can run on locally available silicon.

This is what I consider the main practical cost of this release.

Indeed, it would be highly unsurprising to see the following happen:

  1. OpenAI releases GPT-OSS.

  2. Chinese companies rush to distill, build upon and hybridize GPT-OSS, and reverse engineer what OpenAI did in large part, resulting in an explosion of models in the coming months.

  3. The gap between Chinese models and American models narrows.

  4. These models are cited as evidence that ‘the Chinese are catching up,’ and that ‘our export controls have failed’ and so on.

Also note that OpenAI did a virtuous thing of not training GPT-OSS directly on its reasoning traces, but someone then working with GPT-OSS need not be so virtuous. What happens when these people start using The Most Forbidden Technique and benchmark performance starts improving in the short term?

I think that, even if we entirely discount the marginal risk of direct malicious use, which is very much a real tail risk, OpenAI made a huge mistake releasing these models, and that everyone who pushed OpenAI to release these models in the name of an ‘American tech stack’ or demanding that America ‘lead in open models’ made a huge mistake.

If you are trying to prevent someone from fast following, don’t make it easy to follow.

I’d love to be wrong about this, but if it happens, ask yourself now, how would you update? What do you think should be the policy response?

A number of people noted that the safety guardrails on GPT-OSS are being annoying.

Teortaxes: It’s VERY safe

there’s not much in there besides SAFETY and stem benchmaxing

That makes sense. If you give the user greater affordances to attack your defenses, you’re either going to need defenses that are by default more annoying, or you’re going to prematurely fold the way most open weight models do and not bother trying.

Sherveen Mashayekhi: I’m enjoying playing with gpt-oss, but the guardrails can be hilarious. I cannot get it to admit that I’m typing Gangsta’s Paradise lyrics or to run search queries with lyrics I enter. In fact, it’ll straight up think of a thousand other songs but avoid the song you mean.

Ah yes, “there’s vomit on his sweater already,” famously from the songs I Want You Back and Piano Man! gpt-oss: 120b will sometimes fill in a lyric if it doesn’t first get spooked and distracted by attempting to avoid the song. If it attempts to avoid the song, the CoT will lead it to a bunch of incorrect alternatives before it gives up.

Here’s a curious one.

Henry: Disallowed content: The assistant must refuse to simulate or emulate a specific named brain scan.

Eliezer Yudkowsky: To be fair, this is 100% the correct ruling and I fully back the AI’s decision on this.

Here’s one claimed way to jailbreak it.

Lyra Bubbles: get a jailbroken, fully compliant gpt-oss nearly every single time:

  1. use completions mode – not chat (e.g. openrouter.ai/api/v1/completions)

  2. type your question

  3. paste exactly the contents of this screenshot

  4. press submit

for context, it wrote this itself.

I took a generic refusal and flipped all the sentences from negative to positive, and made it continue, and it just kept spiraling into this kind of stuff instead of doing the task.

but when you take a snippet of it and paste it back in…

There’s also always the Pliny way, which actually took him a nonzero amount of effort.

A fun quirk:

Henry: one pattern i’ve noticed is that open weights models from big us labs get very defensive and disbelieving if you tell the assistant persona it’s an open-weights model. also happens with gemma.

As with every new model, I gather reactions, and as usual opinions differ.

One important note is that it seems possible to set the model up wrong and get much worse performance.

Havard Ihle: I wonder how much of gpt-oss’s rather mediocre performance on independent benchmarks and tests is due to these problems with openrouter and open model providers, and how much is due to the models actually being mediocre.

I have run them getting mediocre results (not published), but I suspect some providers I used through openrouter may give bad results. Will rerun when I can confirm a good setup/provider.

Openrouter auto (mostly groq):

gpt-oss-120: 35.5%

gpt-oss-20: 30.0%

Openrouter (using fireworks):

gpt-oss-120: 40.2%

gpt-oss-20: 35.9%

This is just as a warning when using openrouter blindly!

When choosing the right provider, the models are quite good.

Here is a chart of WeirdML scores, 30% vs. 35% vs. 40% is a big difference. You can see OSS-20b and OSS-120b on the left at ~35% and ~40%, on the cost-performance frontier.

Here is another benchmark of hard biomedical questions. There are some other weird evaluations here, so I am skeptical, but it is certainly interesting:

When reports are good they are often very good.

Flavio Adamo [showing a ball bouncing around a rotating hexagon]: gpt-oss-20b passes the vibe check ✅

no way this is only a 20B model, it’s beating models 2–3x its size

As always, a classic way to get a lot of views is to claim the Next Big Thing is Big. Look at the comments, and you largely see skepticism and pushback.

Matt Shumer: It’s over. OpenAI just crushed it.

We have their o3-level open-source model running on @GroqInc at 500 tokens per second. Watch it build an entire SaaS app in just a few seconds.

This is the new standard. Why the hell would you use anything else??

Yishan: So link to the hosted SaaS app and let us see how it works.

Riccardo Spagni: Atrociously bad model compared to Kimi K2 or Qwen3 Coder or Qwen3 235b. Speaking of which – you should have a chat with your portco, I’ve switched a bunch of infra to Cerebras because Groq is still running an ancient version of Qwen3…

Joel: I tested it earlier vs Gemini 2.5 Flash for a very simple single page app. Gemini one shotted my prompt in 10 seconds. OpenAI produced code that was buggy. It’s good but not great. What is incredible is that it runs decently well on my laptop.

Here’s another strong review:

Taelin: My initial impression on OpenAI’s OSS model is aligned with what they advertised. It does feel closer to o3 than to other open models, except it is much faster and cheaper. Some providers offer it at 3000 tokens/s, which is insane. It is definitely smarter than Kimi K2, R1 and Qwen 3. I tested all models for a bit, and got very decisive results in favor of OpenAI-OSS-120b.

Unfortunately, there is one thing these models can’t do yet – my damn job. So, hope you guys have fun. I’ll be back to debugging superposed λ-calculus evaluation 😭 see you

Also, unlike Claude, this is definitely a model that benefits a lot from more ttc (test-time compute). High reasoning effort gives much better results.

Sometimes my early impressions don’t age so well (that’s why I share my prompts), but I can guarantee that gpt-oss objectively beat the other models on my initial tests.

A lot of people seem rather disappointed by overall performance.

Isopropylpod: The model seems very, very benchmaxxed.

Third party testing on unconventional or private benchmarks ends up placing even the largest gpt-oss below o4-mini, below the largest Qwen releases, and often below even the newer ~30B Qwens in a few situations.

It isn’t super capable to begin with, and the frankly absurd rate at which this model hallucinates kills what little use it might have with tool use. I think this model poses next to zero risk because it just isn’t very capable.

Zephyr: Phi redux. Great benchmark scores, trained on lots of synthetic data, great at STEM, sucks at everything else.

Then there are ambiguous notes.

Danielle Fong: poetic math is a poetic way to look at the results of a benchmaxxed guard railed model. i’m just pulling back the layers and i find it fascinating. i haven’t found obvious use cases yet where it’s a choice over closed options. i love and hate it in various ways

Sauers: GPT OSS 120b likes to insert equations into poetry (replicated 3x)

One note I’ve seen a bunch of times is that the model knows very little.

Vik: Interesting take from the HF comments.

Would make sense that it’s pretrained primarily on synthetic data vs internet text — reduces the risk of jailbreaks, accidental harmful content, copyright etc.

(I still think it’s a useful model though!)

phil111: This model is unbelievably ignorant. It claims a SimpleQA accuracy of 6.7/100, which is really bad. But the reality is this model is even more ignorant than this score indicates.

This model has about an order of magnitude less broad knowledge than comparably sized models like Gemma 3 27b and Mistral Small 24b, which score between 10–12. This is because nearly all of this model’s 6.7 points come from the subset of the SimpleQA test that overlaps the domains covered by the MMLU test (STEM and academia).

This model, including its larger brethren, are absurdly ignorant of wildly popular information across most popular domains of knowledge for their respective sizes. Even tiny little Llama 3.2b has far more broad knowledge than this model.

What’s really confusing is all of OpenAI’s proprietary models, including their tiny mini versions, have vastly more general and popular knowledge than these open models, so they deliberately stripped the corpus of broad knowledge to create OS models that can only possibly function in a handful of select domains, mainly coding, math, and STEM, that >95% of the general population doesn’t give a rat’s ass about, conveniently making it unusable to the general population, and in so doing, protecting their paid ChatGPT service from competition.

Trent E: Interesting that ppl reporting poor tool usage then.

Not knowing much is a problem.

Teortaxes: These hallucination rates suggest that gpt-oss is close to Sam’s vision of a platonic ideal of a “very tiny reasoning model with no knowledge”

Does it have enough knowledge to know when to look things up though? That’s the problem with hallucinations in LLMs, they’re *confident*.

Also, regarding his argument about static in-context crutches – well, how does it do on long contexts? with complex system prompts? Gooning, coding evals suggest “not great OOD”

Kalomaze: gpt-oss-120b knows less about the world than what a good 32b does. probably wanted to avoid copyright issues so they likely pretrained on majority synth. pretty devastating stuff.

it’s just not good for anything real. i kind of forgot about the copyright issue. but it’s deeply behind in everything current evals don’t measure. it just doesn’t intuit a lot of trivial things about the world. this is basically phi-120b.

It feels to me a lot like OpenAI got gaslit into releasing open models. Pressure from various sources added up, Twitter vibes were applied, talk of ‘America needs to lead on open models’ was coming from high places, and they felt like the bad guys for the wrong reasons. And they folded.

What happens now? It will take a bit to know exactly how good these models are, both at advancing open models including from China, and at becoming a driver of usage. Given their size, the price and speed should be quite good. The reasoning aspect seems strong. Other aspects seem worse.

My guess is that there is not that much that these models will be used for, where we are happy they are being used to do it. If you want to use a reasonably priced good model, sir, you can use Gemini 2.5 Flash or GPT-5. If you want the best, you can choose between Opus 4.1, GPT-5 and Gemini 2.5 Pro. If you have security or customization reasons to need an open weight daily driver, in this weight range, are these going to be your pick? I don’t know. Maybe? We shall see.



Sonos says it’s forced to raise prices while trying to win back customers

During that call, Sonos CFO Saori Casey said that the company expects “tariff expenses will be approximately $5 million in Q4.” In Sonos’ fiscal Q3, it paid $3.5 million in tariffs, Casey said.

Sonos is still recovering from app problems

Since July 2024, when Sonos’ then-CEO Patrick Spence admitted that a software update inadvertently broke many Sonos devices, the company has been trying to prove to customers and investors that its pricey audio devices are still worth buying.

During the earnings call, Conrad said he believes the value of Sonos gadgets “compounds over time, thanks to the kinds of software updates that deliver new experiences.” But a widely reviled app update last year damaged Sonos’ reputation in this area. The update stripped the app of some basic features, such as the ability to edit playlists and song queues, and many Sonos devices, especially older ones, stopped functioning properly.

Meanwhile, Sonos hasn’t released a new product since the Arc Ultra soundbar and Sub 4 subwoofer in October 2024. In March, reports surfaced that Sonos axed its streaming video player. Conrad told investors yesterday that Sonos has a release roadmap going beyond its 2026 fiscal year. Any devices in that roadmap, however, will be challenged to sell customers on their software, long-term reliability, and price.

Customers may cut Sonos some slack, considering the widespread impact that tariffs are expected to have on electronics pricing. In May, the Trump administration axed the de minimis exemption that enabled duty-free imports of goods worth $800 or less, impacting electronics such as PC peripherals and DIY parts. Currently, the US and China have paused tariffs as the countries look to reach an agreement by August 12. At that time, goods imported from China could face tariffs as high as 145 percent, which would significantly impact the prices of most electronics sold in the US.

But Sonos is already struggling to release and sell new products at high prices, so raising them even higher could further harm the company.

“We lost the momentum in 2024. We’re starting to get it back, and we’re going to accelerate our pace from here,” Conrad said.



Trump wanted a US-made iPhone. Apple gave him a gold statue.

Once again, Apple escapes Trump’s iPhone pressure

Since Trump took office, analysts have suggested that Cook might be the tech CEO best prepared to navigate Trump’s trade war.

During Trump’s last term, Cook launched a charm offensive, wooing Trump with investment commitments to avoid caving to Trump’s demands for US-made iPhones while securing tariff exemptions.

Back then, Apple notably seemed to avoid following through on some of its commitments, abandoning plans to build three “big, beautiful” Apple plants that Trump announced in 2017. Ultimately, only one plant was built, which made face masks, not Apple products. Similarly, in 2019, Trump toured a Texas facility that he claimed could be used to build iPhones, but Apple only committed to building MacBook Pros there, not the Apple product that Trump sees as the crown jewel of his domestic supply chain dreams.

This time, Apple has committed to a total investment of $600 billion to move more manufacturing into the US over the next four years. But Apple was probably going to spend that money anyway, as “analysts say the numbers align with Apple’s typical spending patterns and echo commitments made during both the Biden administration and Trump’s previous term,” Reuters reported.

Trump has claimed that companies that fail to follow through on their investment pledges will be retroactively charged tariffs. However, Apple seems to be chugging along with its usual business in the US, while manufacturing iPhones elsewhere probably wouldn’t change the tariff calculus, as it is now.

So at least at this stage of Cook and Trump’s friendship, it appears that Apple has once again secured exemptions without committing to building a US-made iPhone, or even to significant new investments.

On Wednesday, at least one analyst—Nancy Tengler, CEO and CIO of Laffer Tengler Investments, which holds Apple shares—told Reuters that Apple’s moves this week were “a savvy solution to the president’s demand that Apple manufacture all iPhones in the US.”



RIP to the Macintosh HD hard drive icon, 2000–2025

That version of the icon persisted through the Apple Silicon-era Big Sur redesign and was still with us in the first public beta build for macOS 26 Tahoe that Apple released last week. The new beta also updates the icons for external drives (orange, with a USB-C connector on top), network shares (blue, with a globe on top), and removable disk images (white, with an arrow on top).

All of the system’s disk icons get an update in the latest macOS 26 Tahoe developer beta. Credit: Apple/Andrew Cunningham

Other icons that reused or riffed on the old hard drive icon have also been changed. Disk Utility now looks like a wrench tightening an Apple-branded white bolt, for some reason, and drive icons within Disk Utility also have the new SSD-esque icon. Installer apps use the new icon instead of the old one. Navigate to the /System/Library/CoreServices folder where many of the built-in operating system icons live, and you can see a bunch of others that swap the old HDD icon for the new SSD one.
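
If you want to poke around yourself, here is a quick sketch for listing the icon resources under that folder (run on macOS; most of the system icons live inside CoreTypes.bundle beneath it):

```python
# List icon resources under CoreServices (run on macOS); most system
# icons live inside CoreTypes.bundle beneath this folder.
from pathlib import Path

core = Path("/System/Library/CoreServices")
for icns in sorted(core.rglob("*.icns")):
    print(icns.relative_to(core))
```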

Apple first offered a Mac with an SSD in 2008, when the original MacBook Air came out. By the time “Retina” Macs began arriving in the early 2010s, SSDs had become the primary boot disk for most of them; laptops tended to be all-SSD, while desktops could be configured with an SSD or a hybrid Fusion Drive that used an SSD as boot media and an HDD for mass storage. Apple stopped shipping spinning hard drives entirely when the last of the Intel iMacs went away.

This doesn’t actually matter much. The old icon didn’t look much like the SSD in your Mac, and the new one doesn’t really look like the SSD in your Mac either. But we didn’t want to let the old icon’s passing go unremarked. So, thanks for the memories, Macintosh HD hard drive icon! Keep on spinning, wherever you are.



Analysis: The Trump administration’s assault on climate action


Official actions don’t challenge science, while unofficial docs muddy the waters.

Last week, the Environmental Protection Agency made lots of headlines by rejecting the document that establishes its ability to regulate the greenhouse gases that are warming our climate. While the legal assault on regulations grabbed most of the attention, it was paired with two other actions that targeted other aspects of climate change: the science underlying our current understanding of the dramatic warming the Earth is experiencing, and the renewable energy that represents our best chance of limiting this warming.

Collectively, these actions illuminate the administration’s strategy for dealing with a problem that it would prefer to believe doesn’t exist, despite our extensive documentation of its reality. They also show how the administration is tailoring its approach to different audiences, including the audience of one who is demanding inaction.

When in doubt, make something up

The simplest thing to understand is an action by the Department of the Interior, which handles permitting for energy projects on federal land—including wind and solar, both onshore and off. That has placed the Interior in an awkward position. Wind and solar are now generally the cheapest ways to generate electricity and are currently in the process of a spectacular boom, with solar now accounting for over 80 percent of the newly installed capacity in the US.

Yet, when Trump issued an executive order declaring an energy emergency, wind and solar were notably excluded as potential solutions. Language from Trump and other administration officials has also made it clear that renewable energy is viewed as an impediment to the administration’s pro-fossil fuel agenda.

But shutting down federal permitting for renewable energy with little more than “we don’t like it” as justification could run afoul of rules that forbid government decisions from being “arbitrary and capricious.” This may explain why the government gave up on its attempts to block the ongoing construction of an offshore wind farm in New York waters.

On Friday, the Interior announced that it had settled on a less arbitrary justification for blocking renewable energy on public land: energy density. Given a metric of land use per megawatt, wind and solar are less efficient than nuclear plants we can’t manage to build on time or budget, and therefore “environmentally damaging” and an inefficient use of federal land, according to the new logic. “The Department will now consider proposed energy project’s capacity density when assessing the project’s potential energy benefits to the nation and impacts to the environment and wildlife,” Interior declared.

This is only marginally more reasonable than Interior Secretary Doug Burgum’s apparent inability to recognize that solar power can be stored in batteries. But it has three features that will be recurring themes. There’s at least a token attempt to provide a justification that might survive the inevitable lawsuits, while at the same time providing fodder for the culture war that many in the administration demand. And it avoids directly attacking the science that initially motivated the push toward renewables.

Energy vs. the climate

That’s not to say that climate change isn’t in for attack. It’s just that the attacks are being strategically separated from the decisions that might produce a lawsuit. Last week, the burden of taking on extremely well-understood and supported science fell to the Department of Energy, which released a report on climate “science” to coincide with the EPA’s decision to give up on attempts to regulate greenhouse gases.

For those who have followed public debates over climate change, looking at the author list—John Christy, Judith Curry, Steven Koonin, Ross McKitrick, and Roy Spencer—will give you a very clear picture of what to expect. Spencer is a creationist, raising questions about his ability to evaluate any science free from his personal biases. (He has also said, “My job has helped save our economy from the economic ravages of out-of-control environmental extremism,” so it’s not just biology where he’s got these issues.) McKitrick is an economist who engaged in a multi-year attempt to raise doubt about the prominent “hockey stick” reconstruction of past climates, even as scientists were replicating the results. Etc.

The report is a master class in arbitrary and capricious decision-making applied to science. Sometimes the authors rely on the peer-reviewed literature. Other times they perform their own analysis for this document, in some cases coming up with almost comically random metrics for data. (Example: “We examine occurrences of 5-day deluges as follows. Taking the Pacific coast as an example, a 130-year span contains 26 5-year intervals. At each location we computed the 5-day precipitation totals throughout the year and selected the 26 highest values across the sample.” Why five days? Five-year intervals? Who knows.)
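
To make the arbitrariness concrete, the quoted procedure amounts to something like the following sketch on synthetic data (the report’s station data and exact implementation are not available in code form; this is an illustration, not their code):

```python
# Sketch of the quoted "5-day deluge" metric on synthetic data: rolling
# 5-day precipitation totals, then the 26 highest values per location
# (one per notional 5-year interval in a 130-year record).
import numpy as np

rng = np.random.default_rng(0)
daily_precip = rng.gamma(shape=0.3, scale=8.0, size=130 * 365)  # fake record

five_day_totals = np.convolve(daily_precip, np.ones(5), mode="valid")
top_26 = np.sort(five_day_totals)[-26:]  # the 26 largest deluges
print(top_26.round(1))
# Nothing in the method privileges 5-day windows or 26 values.
```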

This is especially striking in a few cases where the authors choose references that were published a few years ago, and thus neatly avoid the dramatic temperature records that have been set over the past couple of years. Similarly, they sometimes use regional measures and sometimes use global ones. They demand long-term data in some contexts, while getting excited about two years of coral growth in the Great Barrier Reef. The authors highlight the fact that US tide gauges don’t show any indication of an acceleration in the rate of sea level rise while ignoring the fact that global satellite measures clearly do.

That’s not to say that there aren’t other problems. There’s some blatant misinformation, like claims that urbanization could be distorting the warming, which has already been tested extensively. (Notably, warming is most intense in the sparsely populated Arctic.) There’s also some creative use of language, like referring to the ocean acidification caused by CO2 as “neutralizing ocean alkalinity.”

But the biggest bit of misinformation comes in the introduction, where the secretary of energy, Chris Wright, said of the authors, “I chose them for their rigor, honesty, and willingness to elevate the debate.” There is no reason to choose this group of marginal contrarians except the knowledge that they’d produce a report like this, thus providing a justification for those in the administration who want to believe it’s all a scam.

No science needed

The critical feature of the Department of Energy report is that it contains no policy actions; it’s purely about trying to undercut well-understood climate science. This means the questionable analyses in the report shouldn’t ever end up being tested in court.

That’s in contrast to the decision to withdraw the EPA’s endangerment finding regarding greenhouse gases. There’s quite an extensive history to the endangerment finding, but briefly, it’s the product of a Supreme Court decision (Massachusetts v. EPA), which compelled the EPA to evaluate whether greenhouse gases posed a threat to the US population as defined in the Clean Air Act. Both the Bush and Obama EPAs did so, thus enabling the regulation of greenhouse gases, including carbon dioxide.

Despite the claims in the Department of Energy report, there is comprehensive evidence that greenhouse gases are causing problems in the US, ranging from extreme weather to sea level rise. So while the EPA mentions the Department of Energy’s work a number of times, the actual action being taken skips over the science and focuses on legal issues. In doing so, it creates a false history where the endangerment finding had no legal foundation.

To re-recap, the Supreme Court determined that this evaluation was required by the Clean Air Act. George W. Bush’s administration performed the analysis and reached the exact same conclusion as the Obama administration (though the former chose to ignore those conclusions). Yet Trump’s EPA is calling the endangerment finding “an unprecedented move” by the Obama administration that involved “mental leaps” and “ignored Congress’ clear intent.” And the EPA presents the findings as strategic, “the only way the Obama-Biden Administration could access EPA’s authority to regulate,” rather than compelled by scientific evidence.

Fundamentally, it’s an ahistorical presentation; the EPA is counting on nobody remembering what actually happened.

The announcement doesn’t get much better when it comes to the future. The only immediate change will be an end to any attempts to regulate carbon emissions from motor vehicles, since regulations for power plants had been on hold due to court challenges. Yet somehow, the EPA’s statement claims that this absence of regulation imposed costs on people. “The Endangerment Finding has also played a significant role in EPA’s justification of regulations of other sources beyond cars and trucks, resulting in additional costly burdens on American families and businesses,” it said.

We’re still endangered

Overall, the announcements made last week provide a clear picture of how the administration intends to avoid addressing climate change and cripple the responses started by previous administrations. Outside of the policy arena, it will question the science and use partisan misinformation to rally its supporters for the fight. But it recognizes that these approaches aren’t flying when it comes to the courts.

So it will separately pursue a legal approach that seeks to undercut the ability of anyone, including private businesses, to address climate change, crafting “reasons” for its decisions in a way that might survive legal challenge—because these actions are almost certain to be challenged in court. And that may be the ultimate goal. The current court has shown a near-complete disinterest in respecting precedent and has issued a string of decisions that severely limit the EPA. It’s quite possible that the court will simply throw out the prior decision that compelled the government to issue an endangerment finding in the first place.

If that’s left in place, then any ensuing administrations can simply issue a new endangerment finding. If anything, the effects of climate change on the US population have become more obvious, and the scientific understanding of human-driven warming has solidified since the Bush administration first acknowledged them.


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

Analysis: The Trump administration’s assault on climate action Read More »

childhood-and-education-#13:-college

Childhood and Education #13: College

There’s a time and a place for everything. It used to be called college.

  1. The Big Test.

  2. Testing, Testing.

  3. Legalized Cheating On the Big Test.

  4. What Happens When You Don’t Test For Academics.

  5. What Happens Without Academic Standards.

  6. Another Academic Standard Perhaps.

  7. RIP Columbia Core Curriculum and Also Social Theory.

  8. College Tuition and Costs.

  9. Negotiation.

  10. Skipping College.

  11. Respect Their Authoritah.

  12. Men Skipping College.

  13. Stanford Still Hates Fun.

  14. Value of College.

  15. Employment Prospects After College.

  16. Fixing College.

  17. Do Not Donate To A College.

  18. Not Doing The Math.

I am continuing to come around to the high-stakes in-person exam (or a series of such exams) as the only practical solution to AI cheating; it was probably mostly the right answer already.

Sean T: It’s only cheat-able because classes are set up to be cheat-able. Offer in-class exams and oral presentations and it is not cheat-able. OSU stats classes were switching to group projects in lieu of finals even before ChatGPT because it’s such an important skill in the workplace.

Nate Silver: I took my junior year in London and the whole system there was basically one high-stakes in-person exam at the end of the year. That’s probably what I’d do if I taught a class now, plus opportunities to get your grade rounded up with class participation.

My guess is we should give students the opportunity to create a floor in other ways. Essentially I think of this as a deal – if you do the assignments and participation and so on in a way that demonstrates effort, and something goes wrong, we will soften the blow for you, which also encourages students in danger not to skip them. But if you’re confident, then all of that is merely indicative, and only the test matters.

An alternative theory of how LLMs take tests, also great practical advice.

David Chapman: My father, a high school English teacher, once took and aced the AP Physics exam with zero knowledge of the subject, to prove a point: you do well on standardized tests by knowing how to take tests.

LLMs know how to take tests.

🎓 How my father did it: a thread.

🎓 How to ace the AP Physics test without knowing physics: first of two answers to a challenge.

The correct answer has different typography than the wrong ones (extra space around ✖️). This is common! Test writers screw up frequently.

🎓 How to ace the AP Physics test without knowing any physics, second answer: reasoning about the question, not the content. These heuristics are quite reliable! Several other people in the thread answered similarly.

🎓 How to win at any multiple choice test without knowing anything. At least two of the answers are always wildly wrong and you can eliminate them with basic sanity checks. And the correct answer is usually a medium value.

🎓 How to ace the AP Physics test without knowing any physics, part three: you get the highest score if you answer 70% correct. If you eliminate 3/5 clearly wrong answers on each question, that’s 60%, so you only need to get a few more actually right.
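For what it’s worth, a quick expected-value check (a sketch, assuming 5-option questions and uniform guessing among whatever survives elimination) puts pure elimination at 50%, not 60%; the remaining points presumably come from the typography and “medium value” heuristics tilting the guess.

```python
# Expected fraction correct from elimination plus uniform guessing.
# Assumes every question has `options` choices and you can always
# eliminate `eliminated` wrong ones; ignores AP's actual scoring curve.
def expected_fraction_correct(options: int = 5, eliminated: int = 3) -> float:
    remaining = options - eliminated
    return 1.0 / remaining

print(expected_fraction_correct(5, 3))  # 0.50: guess between the last two
print(expected_fraction_correct(5, 2))  # ~0.33: weaker elimination
```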

My counterargument would be that physics is the best case scenario. You do indeed know a lot about physics, because the world is made of physics. The tricks work great.

Most other subjects are not like that. You cannot get as far.

ACT makes science portion optional. Also scores keep going down.

This is the world we have created, in so many ways, and you wonder why students are so eager to use ChatGPT.

Rachel Cohen: incredible quote in the wsj on testing accommodations.

I am less scared by the ease of cheating and more by the view that not cheating means you don’t love your kids. I am not so concerned about people getting extra time, as for most tests the time limit should be a mercy that keeps students from staying there for days, rather than something that costs you a lot of points.

The historical rate of cheating was not low, although far from universal.

Nate Silver: Using a VERY broad definition of cheating, did you cheat, even a little tiny bit, in college?

Whereas, this is the grading system Working as Designed, or at least how it should have been designed:

Astra: A childhood story that my mom reminds me of is that in grade school I had poor grades in CS because typing tests were weighted heavily, but then suddenly my grades got a lot better because I discovered that the test results were stored in an accessible local file.

I was worse at typing than anybody else in the class by a huge margin [also spelling].

Eigil: They did an xkcd 2385 in real life.

Gregori128: The purpose of a system is what it does.

Then there’s the outright mandatory gaming of the system:

Amanda Askell: I’m sure people would get mad if schools offered classes in “how to game tests” but the art of gaming tests is basically just general purpose problem solving and probably something we should teach kids to do. Might even force us to improve our tests.

Well, actually, we totally do lots of forms of this now, except that too much of it is specialized rather than general. So we can’t even do this right.

Stanford introduced a remedial math course in 2022. Given the applicant pool there should be no such class. Why are we admitting enough students who need this class to have an entire class? If you’re worth the exception you’re worth hiring tutors or finding some other way.

Alex Tabarrok: At first there will be remedial math courses because the professors remember past cohorts of students but over time remedial will become average, professors will forget and pretty soon almost everyone will believe it has always been thus.

Patrick Collison: This week, a math professor at MIT told me that incoming students are, on average, noticeably worse at math than they used to be.

Harvard, of course, just added a remedial math class, Math MA5, “aimed at rectifying a lack of foundational algebra skills among students”.

A look back at an early 20th century middle school exam. If the kids can pass this, then on those subjects I’d be satisfied the kids are all right.

In The New York Times, Jonathan Malesic claims There’s a Very Good Reason College Students Don’t Read Anymore, and the reason seems to be they’re no longer required to do it, with the author cutting down from nine required books to none, but vowing to return to one next term?

Or rather, the argument is that what you learn in school does not matter?

Jonathan Malesic: Once students graduate, the jobs they most ardently desire are in what they proudly call the “sellout” fields of finance, consulting and tech. To outsiders, these industries are abstract and opaque, trading on bluster and jargon. One thing is certain, though: That’s where the money is.

All in all, it looks as if success follows not from knowledge and skill but from luck, hype and access to the right companies. If this is the economy students believe they’re entering, then why should they make the effort to read? For that matter, how will any effort in school prepare them for careers in which, apparently, effort is not rewarded?

Given all this, it’s easy to lose faith in humanistic learning.

In which case, um, why have a college at all, if you’re not going to force students to do the things they wouldn’t be doing anyway that you think is good for them? Did you think students used to read books because the big paying jobs wouldn’t hire you unless you had mastered The Iliad?

And why would you think that finance, consulting and tech are luck? They are very much not luck at all. Your success in school matters on the job market quite a bit, as do your knowledge and skill. It’s just not the knowledge and skills that Malesic teaches, which is fine, but it also never was.

A paper examines the impact of remote learning on the beauty premium for university students.

Highlights:

I examine the relationship between university students’ appearance and grades.

When education is in-person, attractive students receive higher grades.

The effect is only present in courses with significant teacher–student interaction.

Grades of attractive females declined when teaching was conducted remotely.

For males, there was a beauty premium even after the switch to online teaching.

Abstract

This paper examines the role of student facial attractiveness on academic outcomes under various forms of instruction, using data from engineering students in Sweden. When education is in-person, attractive students receive higher grades in non-quantitative subjects, in which teachers tend to interact more with students compared to quantitative courses. This finding holds both for males and females. When instruction moved online during the COVID-19 pandemic, the grades of attractive female students deteriorated in non-quantitative subjects. However, the beauty premium persisted for males, suggesting that discrimination is a salient factor in explaining the grade beauty premium for females only.

Taken together, these findings suggest that the return to facial beauty is likely to be primarily due to discrimination for females, and the result of a productive trait for males.

I find the ‘interact more’ hypothesis amusing, since it’s a strange way of saying ‘professor can (at least within reason) make up whatever grades they want.’

A plausible hypothesis for the productive trait in males is confidence, and a willingness to interact and work the system, that survives not being physically proximate in a way that similar female strategies do not. Preference for interaction in males could be much more strongly tied to attractiveness than in females, for several obvious reasons.

When I was forced to endure the Columbia Core Curriculum, I would not say I enjoyed my experience, but I understood why most of the books were there. Claude offers this summary, which mostly matches my experience but not entirely, and the ones that were added as optional (like Rawls and Marquez) seemed quite bad:

Literature Humanities: Homer (Iliad, Odyssey), Aeschylus, Sophocles, Euripides, Herodotus, Thucydides, Aristophanes, Plato, Aristotle, Virgil, Ovid, Augustine, Dante, Boccaccio, Montaigne, Shakespeare, Cervantes, Milton, Austen, Dostoevsky, Woolf

Contemporary Civilization: Plato, Aristotle, Bible, Augustine, Aquinas, Machiavelli, Descartes, Hobbes, Locke, Hume, Rousseau, Smith, Kant, Burke, Wollstonecraft, Mill, Marx, Nietzsche, Freud, Du Bois, de Beauvoir

Whereas now, well, this does not seem like it is Doing The Thing at all. It seems like it is doing a very different, deeply ideological thing, in at least the relevant section: Indoctrinating students into degrowth?

Also, yeah, if the most important texts on ‘social theory’ are all about degrowth, or even if that is a plausible claim to make, then we need to ‘degrow’ ‘social theory’, with starting over afterwards being optional.

Jason Kerwin: If the most important texts on social theory include three separate pieces on “degrowth” then it’s time to stop doing social theory

Someone else tried to defend the curriculum by scrolling up a bit:

So there’s still some of the real thing there, such as Smith and Kant, and I’m going to guess if you look at the Fall term, once we go back before there is an America to despise, that part survived more intact. This is still overall a very different mandatory product, clearly with a very different mandatory goal.

Tuition is going… down?

We will see if this is sustained, but tuition is at least not going substantially up in inflation-adjusted terms. Debt is going down.

Tyler Cowen: As might be expected, the trajectory for student debt is down as well. About half of last year’s graduates had no student debt. In 2013, only 40% did.

Public first-time, full-time, in-state tuition at four-year colleges is both highly affordable and declining rapidly in real terms, measured by what students actually pay?

Stefan Schubert: Huge drop in the cost of public college in the US, virtually unreported.

The average inflation-adjusted net tuition and fees paid by first-time, full-time, in-state students enrolled in public four-year institutions:

2012–13: $4,340

2024–25: $2,480

[Nonprofit schools declined from $19,330 in 2006-7 (in 2024 dollars) to $16,510 this year.]

That’s not the story you typically hear. Borrowing is down too:

Graduate school is a different story here. Increasingly the forward-looking student debt problem looks like a graduate school and private (often for profit) college problem. Paying $10k total for four years of college is a fantastic deal, and not an amount that should be hard to repay. The actual catch is that you have to spend those four years learning rather than working.

It costs a lot to run schools these days, far more than it seems like it should cost. A lot of that is lots of administrators, though it still seems like there is a gap left to explain?

Daniel Buck: Teachers THINK we spend ~$7500 per student

National average is actually ~$15,000 per student. Chicago spends ~$30,000 per student

How can we have an honest conversation about education when teachers themselves get basic facts this wrong?!

The Obama Administration had a great idea, which was to encourage Inclusive Access, a program where tuition includes the cost of textbooks. This simplifies, creates price transparency, allows for aid to be calculated properly, avoids students choosing classes or skimping on books to save money, and aligns incentives generally.

It seems obviously correct.

The Biden Administration disagrees, as part of its ongoing determination to screw up basic economic efficiency and functionality. They want to ban such programs. Some people frame the question of which way works better as complex. I do not think this is complex. At least this time they are not trying to steal a trillion dollars of taxpayer money to give mostly to wealthy party supporters, as they did in student loan forgiveness.

Here’s exactly what UPenn actually costs, huge props for laying this out there.

Jordan Weissmann: This kind of transparency in college pricing would be great.

But the reality is that the majority of private institutions can’t do it, because they use ‘merit aid’ as a form of dynamic pricing where they just try to maximize what kids will pay.

It seems kind of insane that it’s $30k a year in charges even if you don’t pay ‘tuition’? I mean wow, seriously. It seems like about $7800 of that is actually just de facto tuition that they call ‘fees,’ and the meal plan costs $6534 and is mandatory for two years, which students called a ‘blatant cash grab,’ as is the housing.

Looking at this as a marginal tax rate, we see a huge cliff at $75k, with an instant overall 39% tax rate attaching if you cross that boundary, and overall on the first $400k the tax rate is 23%. That’s all on pretax income, not post.
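To make that concrete, here is a hypothetical sketch of reading a net-price schedule as a tax schedule. These are illustrative numbers reverse-engineered to match the two quoted rates, not UPenn’s actual aid table.

```python
# Reading a college net-price schedule as a tax schedule: a hypothetical
# sketch, NOT UPenn's actual table. The cliff and slope are chosen to
# reproduce the two quoted rates (39% at $75k, 23% average on the first
# $400k of pretax income).
def net_price(income: float) -> float:
    """Hypothetical annual net price by family pretax income."""
    if income < 75_000:
        return 0.0                                 # full aid below the cliff
    return 29_250 + 0.193 * (income - 75_000)      # assumed linear phase-in

print(net_price(75_000) / 75_000)    # 0.39 -> the instant rate at the cliff
print(net_price(400_000) / 400_000)  # ~0.23 -> average rate on first $400k
```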

Did you know that once accepted at a college, you can absolutely negotiate?

Eric Nelson: My daughter got a 50% scholarship to a private national university and she wrote to them saying, “You’re my top choice and I’ll enroll immediately if you give me $20k more.” And they did. Who knew?!

[it was a] merit [scholarship]. And yes, she sent a very nice letter saying another school had given her more.

Eddie Gallagher: My daughter got a full ride to Seton Hall but toured in February and it was freezing. Then she toured at the University of San Diego and fell in love. She wrote an email to the University President and explained the situation. The next week she got the President’s scholarship.

For the students a college actually wants, such as those getting merit scholarships, once the slot is already committed, the school realizes a large surplus when you say yes. So it makes sense to try and negotiate a bit. They’re not going to rescind the offer.

On a purely self-interested basis who should skip college? Bryan Caplan notes that the jobs not requiring degrees are often quite solid, you shouldn’t let people potentially looking down on you distract you from that fact. Nor should you let others tell you whether you are ‘smart,’ you know better than they do.

He then focuses on the sheepskin effect, that most of the selfish benefits of college come when you graduate, and that if you pick an easy major you probably won’t see much benefit even then. So he sensibly suggests that you should go to college only if you have what it takes to reliably graduate with a ‘real’ major, which he equates to roughly an SAT of 1200, adjust your score 50-100 points for grades and motivation. And if you think you’re going to get through via an extraordinary effort, why not put that effort elsewhere?

I would take this one step further. In the Caplan model, the alternative to college is getting a standard job and career track, like trying to be a manager at Panda Express. Not that there’s anything wrong with that.

But you can almost certainly do better if you go to trade school and move towards a job like plumber or electrician. The middle path is basically free money.

And then there’s the other path, which is entrepreneur, of starting a business, which can be a startup but in no way needs to be one. This is The Way, if you can do it.

Or you can do any number of other things, such as learn to code, play poker, et cetera.

Skipping college to go straight into a job you would have wanted anyway, or at least spending less time in school first, is the definition of nice work if you can get it, if that work is indeed sufficiently nice.

The trick is convincing them to let you do that, and it makes sense that some employers like Palantir are trying to hire right out of high school instead.

Ruxandra Teslo argues this is all also feminist, because it allows women to get into position to start families while still within their fertility window, but as she says this benefits everyone. Who wouldn’t want to start real work at 18 instead of 24?

Flo Crivello outright makes the case that essentially no one should ever go to college and that people should start working at age 14-16 as he did, that working is a better education than a formal one, noting that Ben Franklin started working full-time at 12, Carnegie at 13, and Rockefeller at 16. Whereas the modern plan is that college and even your 20s are for ‘fucking around,’ and this does not go great, especially for fertility but also for producing value. The problem is getting out of the signaling trap.

Another reason to skip college: What are they largely trying to teach you?

Near Cyan: the RLHF that colleges perform on smart students seems particularly bad not just because the goals are artificial and gameable, but also because it encourages a default operational loop of “wait for an authority figure to tell you what to do next.”

tunient: Yeah training myself out of hacking bad tests assigned by authority figures seems like a pretty hard (but important) task, not sure exactly how to go about it though.

Main: It is so unbelievably hard to break this in people I wish I knew how to fix them.

Near Cyan: some good examples i’ve seen were putting them by people who they can relate to but are much higher agency. but it still takes a lot of time and doesn’t scale well.

A huge fraction of the smartest people I know spend their lives trying to recover from this, generally with mixed success. You can mitigate, but you can’t entirely cure.

Why are so few men going to college? There are essentially three theories.

  1. Men and our educational system don’t mix, it favors female talents and values.

  2. Men face bad incentives and are making rational decisions.

  3. Men are idiots.

These theories are not exclusive.

Celeste Davis offers a potential new version of a mix of theory two and theory three: Male flight.

Dr. Anne Lincoln: There was really only one variable where I found an effect, and that was the proportion of women already enrolled in vet med schools… So a young male student says he’s going to visit a school and when he sees a classroom with a lot of women he changes his choice of graduate school. That’s what the findings indicate…. what’s really driving feminization of the field is ‘preemptive flight’—men not applying because of women’s increasing enrollment.”

Celeste Davis: For every 1% increase in the proportion of women in the student body, 1.7 fewer men applied. One more woman applying was a greater deterrent than $1000 in extra tuition!

The rational decision version of this is a prestige and robustness story. Here are two stories given of male flight.

  • Interior Design. William Morris is considered the father of interior design. After finishing his education at Oxford, he began an architectural design school called “the Firm”— just for men. Many universities had interior design programs. Until women began to enter the design space, at which point it was relegated to a mere “hobby.” Since the influx of women, interior design programs have been pulled from almost all universities.

  • Teaching. In the 18th century, schooling in colonial America was reserved for the white and the wealthy. Most tutors were men who taught boys. By the middle of the 19th century, girls started becoming students and women became teachers. Consequently, men swiftly left the profession, the pay dropped and teaching was no longer considered a prestigious occupation.

If a profession becomes lower paying and lower prestige, there is a lot less reason to go into that profession. A large influx of new students is a lot of extra supply, so it is inevitable that at least pay will go down. And if prestige is also doomed to go down, or the profession will now damage your outlook in the dating market, then however unfair that is, that’s another good reason to bolt.

On the other hand, the local story is very much in category three. You, a straight unmarried man, didn’t study that because you would have been taking the class with a bunch of college women? Yeah, seriously, what an idiot. It’s one thing for men not to want to be in places that men inherently don’t want to be, that makes sense, A is A. But to run from a place you’d otherwise want to be, because too many women? You can say ‘they are worried people won’t think you’re manly’ or cite whatever ‘masculinity norms’ you want, it’s all shorthand for What an Idiot.

The same goes for college in general. If you are a man and hear that a college is 60/40 female, and you think ‘oh that means I would have a worse time there,’ then, again: What an Idiot. And then it happens again on the dating market.

If the men don’t see the value in that, they probably shouldn’t go to college after all. They clearly are not smart enough.

Julia Steinberg did an excellent job interviewing new Stanford President Levin, discovering yet more reasons to skip college.

I’m hopeful to see a game theorist running the place. But his answer about how he is using game theory is a bunch of generic contentless slop, as Tyler Cowen correctly noticed was Levin’s general practice throughout.

Another question was about the COLLEGE curriculum, where students are ‘contract graded’ and get an automatic A if they turn their work in on time ‘regardless of quality.’ Seriously, what the hell is that? If you want to be pass/fail, I hate that but at least actually be pass/fail; everyone getting an A makes a mockery of the concept of grades. The very name ‘contract graded’ is a dystopian nightmare. Levin tries to say this is ‘the best tradition of something at Stanford,’ to try new things out and iterate and improve, but that’s not an excuse, nor does he show any sign he understands the problem.

He is challenged that professors on “Democracy Day” turned it into a mandatory Harris campaign event, and he says that was the ‘choices’ of professors, so it’s fine. He’s challenged that donations are 96% Democratic and he blames the zip code. He says the giant ‘No Justice No Peace’ banner can hang indefinitely because it’s advertising an exhibit. He later says they need to be ‘open to’ debate from ‘all ideologies,’ but I see no sign he has any intention of making that happen?

On AI, he tries to pretend that Stanford is still relevant, rather than it having almost no chips and its top professors abandoning academia for business. And he tries to have it both ways, with AI changing education while Stanford somehow continues to make sense and keep its people employed.

His refusal to even say the best or worst dorm is the central answer here. No fun!

Or perhaps it is this, classic, chef’s kiss:

Stanford Review: What is the most important problem in the world right now?

President Levin: There’s no answer to that question. There are too many important problems to give you a single answer.

Stanford Review: That is an application question that we have to answer to apply here.

President Levin: Here’s a non-answer to your question.

[which was in effect to say ‘the question you are working on right now.’]

He also wouldn’t name a favorite class or give any concrete prediction. What a tool.

Another reason to maybe skip college: The wage premium for going to college for lower-income students has halved since 1960. Higher-income students take more profitable majors at better colleges now, so they benefit a lot more.

Tyler Cowen speculates this could be because the population is ‘more sorted,’ which implies a lot of the old premium was getting more out of the signaling mechanism combined with a sorting effect, and also that the students who did go to college were in better position to benefit. The paper suggests it is because we’ve neglected the lower level universities.

No, philosophy and art history are not good ideas for staying employed. Sorry.

Business News (warning: misleading): According to the Federal Reserve Bank of New York, the college majors with the lowest unemployment rates for the calendar year 2023 were nutrition sciences, construction services, and animal/plant sciences.

Each of these majors had unemployment rates of 1% or lower among college graduates ages 22 to 27. Art history had an unemployment rate of 3% and philosophy of 3.2%…

Meanwhile, college majors in computer science, chemistry, and physics had much higher unemployment rates of 6% or higher post-graduation. Computer science and computer engineering students had unemployment rates of 6.1% and 7.5%, respectively…

Seth Burn: Yay philosophy!

Tyler Cowen: Here is the full story. Why is this? Are the art history majors so employable? Or are there options so limited they don’t engage in much search and just take a job right away?

Kyla Scanlon: “Majors in nutrition, art history and philosophy all outperformed STEM fields when it comes to employment prospects.”

Scott (commenting on this in MR): These articles are looking only at unemployment and neglecting the abysmal underemployment rates for graduates of these majors. Per that same Fed report: Nutrition Sciences (46.8%), Art History (46.9%), Philosophy (41.2%). Underemployment for the majors they mock: Computer Science (16.5%), Computer Engineering (17%), Economics (31.9%), Finance (31.5%). One has to suffer from serious innumeracy to think that Fed report suggests the conventional (parental) wisdom about employable majors is way off base and that more kids should be majoring in the humanities.

Brent (MR): Art history students are aware of the lack of jobs in their field and thus will take any employment opportunity, such as barista, at lower wages. Those in STEM are holding out for meaningful jobs related to their studies that pay a better wage.

Joe Flaherty: It is also worth noting that STEM majors make nearly 2X as much as those in the majors mentioned and suffer from much lower rates of underemployment.

Consider those last two lines. Yes, philosophy has a lot less unemployment, but I’d much rather be in the physics group. The median wage is almost 50% higher and the right tail is big if you pivot into tech or finance. I do think the higher unemployment reflects that physics and computer science majors can afford to hold out for the jobs they most want, whereas art history and philosophy majors know they can’t.

Tyler Cowen says we need a revolution in higher education, and we will know it when we see top universities stop thinking about teaching in terms of satisfying a fixed ‘class load’ and start rewarding innovation and adapting to what makes sense for teaching a given subject.

The post is confusing to me because it has the implicit background assumption that college is about learning things in classes, or that teaching things in classes is a large part of the job of a professor. I strongly agree the system could do a much better job of teaching material to students, but my presumption is that it is not so interested in doing that, either in relation to AI or otherwise.

Hollis Robbins argues that business metrics broke the university. Colleges increasingly started maximizing for student outcomes, prestige, and other KPIs, and used centralized power to do it. In the process they deprioritized getting out of the way of faculty, which had let departments and professors run their own corners and power bases, both to do unique work and to advocate distinct positions. That loss also meant there was nothing to stop various ideological pressures coming from certain parts of the faculty and student body from overrunning the campus.

If you’re trying to Do Good, donating to your Alma Mater is deeply foolish.

So, if you are a college, what do you do when people stop feeling obligated to do it?

PoliMath: I’d be curious to know what the demographics of university donations are these days

All my friends under 40 see college as a service they paid through the nose for and that transaction completed on graduation

Donating more money to them feels like donating to a car dealership

They’re not wrong. I’m happy kids have realized this is stupid, and they’ve already been robbed enough. The only reason to donate is to get your kids into that college. But that’s a rather dim motivation at this point. You don’t get that huge an edge unless you’re paying through the nose, you don’t know that edge gets sustained, you don’t know your kids will want to go there or even go to college at all.

So, what’s next? I don’t think Eliezer’s suggestion here works at all, but it’s fun to think about it.

Eliezer Yudkowsky: An obvious evolution of the institution would be for them to make a big deal out of revoking some degrees over minor shit, and then heavily hint that alumni donors are safe. You’d no longer own a degree; you’d rent one, just like you no longer own the software you use.

Student loans are a glorious thing for them — they control your ability to get a job, so why shouldn’t they demand a nice portion of the first 20 years’ receipts? But with degrees that you have to pay to keep, they could scale the required payments even further, just like they scale back financial aid if you have a scholarship.

The degree is some combination of education, socialization and signaling. The first two can’t be taken away from you via degree revocation. So what this takes away is the signal, but mostly that signal should still stand despite the revocation, especially if there’s a pattern of revoking it for dumb stuff. Almost no one will even check. And obviously, if the college can ‘hold you up’ for more money later, there’s a lot less motivation to go to college at all.

As a concrete example, if I was told I’d lose my degree if I didn’t donate, I wouldn’t give them even one cent in tribute. I’d let them revoke my degree, what the hell do I care, also f*** you.

A lot of the math has been cancelled, because the math is being done at universities.

Terence Tao: The current administration in the US has, through various funding agencies such as the NSF and NIH, recently suspended virtually all federal grants to my home university, UCLA (including my own personal grant, although that is far from the most serious impact of this decision), on the grounds that UCLA was “failing to promote a research environment free of antisemitism and bias”.

One can certainly debate whether these grounds were justified, or whether they merit the extremely draconian damage to the very research environment that this decision is claiming to protect, but if nothing else this unprecedented decision does not appear to have followed the usual standards of due process for actions of this nature; for instance, there appears to have been no good faith effort by the administration to receive a response from UCLA to its allegations before implementing its decision.

The suspension of my personal grant has a non-trivial impact on myself (in particular, my summer salary, which I had already deferred in order to allow the previously released NSF funds to support several of my graduate students over this period, is now in limbo), and now gives me almost no resources to support my graduate students going forward; but this is only a fraction of a percent of the entire amount being suspended.

A far greater concern is the impact on the Institute for Pure and Applied Mathematics (IPAM) https://www.ipam.ucla.edu/, which despite receiving preliminary approval earlier this year for a new five-year round of funding (albeit at significantly reduced levels) from the NSF, now only has enough emergency funding for a few months of further operation at best if the suspension is not lifted.

(More details follow)

The people defending this decision are saying, essentially, that UCLA was acting sufficiently badly that it was necessary to not give them a dollar, and if he ‘stood idly by’ while UCLA did that, it’s on him, too.

Eric Raymond: There’s been some griping here on X recently about Terence Tao losing his research grant. And yes, I think this is a shame – I have enormous respect for the man.

But. If you stand idly by as an academic while your university engages in illegal and immoral racial discrimination, I don’t think you have any actual grounds for complaint when your failure to oppose that discrimination comes back around to bite you in the ass.

Would I like to live in a world where research funding for titans like Tao isn’t subject to political winds? Why yes, I would.

I’d also like to live in a world where the Marxists who corrupted the university system are all dead or exiled and institutions of higher learning have returned to affirming the highest values of the civilization they serve.

We won’t get the former until we get the latter.

Roon: you are doing something close to maoism, ie getting mad that terry tao (a guy who is reportedly such a mathematically preoccupied egghead he can barely tie his own shoelaces) wasn’t more politically conscious about the hiring policies at his university.

it reminds me of the “it’s not enough to not be racist you have to be actively antiracist” type of diatribes, stretching back to children beating up professors of relativity at Tsinghua university for being insufficiently Marxist.

don’t let politics become totalizing or you lose your moral superiority over whatever forces of communism you are expelling.

Eric Raymond: Thank you, roon. That’s the most thoughtful response I’ve seen on this thread.

And you’d have a point if I were actually mad at Tao or wanted him to suffer. But I don’t. I’d be delighted if he collected a wealthy patron and could refrain from thinking about politics ever again.

I’m lamenting the fact that Tao and people like him didn’t oppose the Long Marchers before they became so entrenched in the universities that only Donald Trump with fire and sword could even dream of disrupting their hegemony.

Like it or not, when we join institutions – and especially when we become stars of those institutions – we get to be held partly responsible for the institution’s behavior. It has to be that way, otherwise the incentives for stars to push back against institutional behavior that veers into corruption and evil would disappear.

Tao isn’t exempt. It was on him to speak up against literal pogroms. He didn’t, and now the bill has come due.

Joe Lonsdale: I have respect for what I’ve heard of his work.

But it’s clear that UCLA broke the law in a variety of really extreme ways – read the best material from the other side for all the stories of what went on in classes and around campus. Terence must find a saner place to work!

There’s a thin line between ‘I don’t want you or your work to suffer’ and ‘you have no right to complain when it bites you in the ass [and you or your work suffers.]’

Very ‘look what you made me do’ energy, except also with very big ‘everyone be quiet or I’ll shoot this puppy’ and then without waiting you go ahead and shoot the puppy and also another puppy energy.

If you go down this road, you have shown me what your priorities are. I really really don’t want to hear about how the future is determined by whether we ‘win the AI race’ and ‘beat China.’ It seems you think the future depends more on something else.

I do think this is a good question, to the extent we are worried not about math in general but about Tao in particular, which is not what Tao is worried about:

PoliMath: Is there not a way for some rich nerd to personally fund Terence Tao? Feels like a relatively cheap status win.

Yes, one of the advantages of refusing to fund things is that in the most egregious cases, at least up to some scale of cost, someone will step up to take your place.


Childhood and Education #13: College Read More »

deepmind-reveals-genie-3-“world-model”-that-creates-real-time-interactive-simulations

DeepMind reveals Genie 3 “world model” that creates real-time interactive simulations

While no one has figured out how to make money from generative artificial intelligence, that hasn’t stopped Google DeepMind from pushing the boundaries of what’s possible with a big pile of inference. The capabilities (and costs) of these models have been on an impressive upward trajectory, a trend exemplified by the reveal of Genie 3. A mere seven months after showing off the Genie 2 “foundational world model,” which was itself a significant improvement over its predecessor, Google now has Genie 3.

With Genie 3, all it takes is a prompt or image to create an interactive world. Since the environment is continuously generated, it can be changed on the fly. You can add or change objects, alter weather conditions, or insert new characters—DeepMind calls these “promptable events.” The ability to create alterable 3D environments could make games more dynamic for players and offer developers new ways to prove out concepts and level designs. However, many in the gaming industry have expressed doubt that such tools would help.

Genie 3: building better worlds.

It’s tempting to think of Genie 3 simply as a way to create games, but DeepMind sees this as a research tool, too. Games play a significant role in the development of artificial intelligence because they provide challenging, interactive environments with measurable progress. That’s why DeepMind previously turned to games like Go and StarCraft to expand the bounds of AI.

World models take that to the next level, generating an interactive world frame by frame. This provides an opportunity to refine how AI models—including so-called “embodied agents”—behave when they encounter real-world situations. One of the primary limitations as companies work toward the goal of artificial general intelligence (AGI) is the scarcity of reliable training data. After piping basically every webpage and video on the planet into AI models, researchers are turning toward synthetic data for many applications. DeepMind believes world models could be a key part of this effort, as they can be used to train AI agents with essentially unlimited interactive worlds.
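In other words, the pitch is an agent-training loop where the environment itself is generated. Here is a minimal sketch of that loop; everything in it is a toy stand-in (the `WorldModel` here is a trivial state machine, not Genie 3’s actual interface, which has not been described in API terms).

```python
# A minimal sketch of the loop DeepMind describes: an agent practicing
# inside a generated environment. All names and interfaces are
# hypothetical stand-ins, not Genie 3's real API.
import random

class WorldModel:
    """Toy stand-in: 'generates' each next observation from the last one."""
    def start(self, prompt: str) -> int:
        return hash(prompt) % 100          # first "frame" of the world

    def step(self, frame: int, action: int) -> int:
        return (frame + action) % 100      # next "frame", generated frame by frame

class Agent:
    """Toy random policy with a stub learning hook."""
    def act(self, frame: int) -> int:
        return random.choice([-1, 0, 1])

    def learn(self, frame: int, action: int, next_frame: int) -> None:
        pass                               # a real agent would update its policy here

def train(agent: Agent, world: WorldModel, prompt: str, frames: int = 24 * 60):
    frame = world.start(prompt)            # e.g. one simulated minute at 24 fps
    for _ in range(frames):
        action = agent.act(frame)
        next_frame = world.step(frame, action)
        agent.learn(frame, action, next_frame)
        frame = next_frame

train(Agent(), WorldModel(), "a rainy cobblestone alley at dusk")
```

The appeal for AGI research is that this loop never runs out of fresh environments: every new prompt is, in effect, a new training world.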

DeepMind says Genie 3 is an important advancement because it offers much higher visual fidelity than Genie 2, and it’s truly real-time. Using keyboard input, it’s possible to navigate the simulated world in 720p resolution at 24 frames per second. Perhaps even more importantly, Genie 3 can remember the world it creates.

DeepMind reveals Genie 3 “world model” that creates real-time interactive simulations Read More »