Author name: Mike M.


Meta defends charging fee for privacy amid showdown with EU

Meta continues to hit walls with its heavily scrutinized plan to comply with the European Union’s strict online competition law, the Digital Markets Act (DMA), by offering Facebook and Instagram subscriptions as an alternative for privacy-inclined users who want to opt out of ad targeting.

Today, the European Commission (EC) announced preliminary findings that Meta’s so-called “pay or consent” or “pay or OK” model—which gives users a choice either to pay for access to its platforms or to consent to the collection of their data for ad targeting—is not compliant with the DMA.

According to the EC, Meta’s advertising model violates the DMA in two ways. First, it “does not allow users to opt for a service that uses less of their personal data but is otherwise equivalent to the ‘personalized ads-based service.’” And second, it “does not allow users to exercise their right to freely consent to the combination of their personal data,” the press release said.

Now, Meta will have a chance to review the EC’s evidence and defend its policy, with today’s findings kicking off a process that will take months. The EC’s investigation is expected to conclude next March. Thierry Breton, the commissioner for the internal market, said in the press release that the preliminary findings represent “another important step” to ensure Meta’s full compliance with the DMA.

“The DMA is there to give back to the users the power to decide how their data is used and ensure innovative companies can compete on equal footing with tech giants on data access,” Breton said.

A Meta spokesperson told Ars that Meta plans to fight the findings—which, if Meta loses, could trigger fines of up to 10 percent of the company’s worldwide turnover, and up to 20 percent for repeat infringement.

Meta continues to claim that its “subscription for no ads” model was “endorsed” by the highest court in Europe, the Court of Justice of the European Union (CJEU), last year.

“Subscription for no ads follows the direction of the highest court in Europe and complies with the DMA,” Meta’s spokesperson said. “We look forward to further constructive dialogue with the European Commission to bring this investigation to a close.”

However, some critics have noted that the supposed endorsement was not an official part of the ruling and that the case in question did not concern DMA compliance.

The EC agreed that more talks were needed, writing in the press release, “the Commission continues its constructive engagement with Meta to identify a satisfactory path towards effective compliance.”



NASA orders more tests on Starliner, but says crew isn’t stranded in space

Boeing’s Starliner spacecraft is seen docked at the International Space Station on June 13.

NASA and Boeing officials pushed back Friday on headlines that the commercial Starliner crew capsule is stranded at the International Space Station but said they need more time to analyze data before formally clearing the spacecraft for undocking and reentry.

Two NASA astronauts, commander Butch Wilmore and pilot Suni Williams, will spend at least a few more weeks on the space station as engineers on the ground conduct thruster tests to better understand issues with the Starliner propulsion system in orbit. Wilmore and Williams launched June 5 aboard an Atlas V rocket and docked at the station the next day, completing the first segment of Starliner’s first test flight with astronauts.

NASA managers originally planned for the Starliner spacecraft to remain docked at the space station for at least eight days, although they left open the possibility of a mission extension. The test flight is now likely to last at least a month and a half, and perhaps longer, as engineers wrestle with helium leaks and thruster glitches on Starliner’s service module.

Batteries on this Starliner spacecraft were initially only certified for a 45-day mission duration, but NASA officials said they are looking at extending the limit after confirming the batteries are functioning well.

“We have the luxury of time,” said Ken Bowersox, associate administrator for NASA’s space operations mission directorate. “We’re still in the middle of a test mission. We’re still pressing forward.”

Previously, NASA and Boeing officials delayed Starliner’s reentry and landing from mid-June, then from June 26, and now they have bypassed a potential landing opportunity in early July. Last week, NASA said in a statement that the agency’s top leadership will meet to formally review the readiness of Starliner for reentry, something that wasn’t part of the original plan.

“We’re not stuck on ISS”

Steve Stich, manager of NASA’s commercial crew program, said Friday that he wanted to clear up “misunderstandings” that led to headlines claiming the Starliner spacecraft was stuck or stranded at the space station.

“I want to make it very clear that Butch and Suni are not stranded in space,” Stich said. “Our plan is to continue to return them on Starliner and return them home at the right time. We have a little bit more work to do to get there for the final return, but they’re safe on (the) space station.”

With Starliner docked, the space station currently hosts three different crew spacecraft, including SpaceX’s Crew Dragon and Russia’s Soyuz. There are no serious plans under consideration to bring Wilmore and Williams home on a different spacecraft.

“Obviously, we have the luxury of having multiple vehicles, and we work contingency plans for lots of different cases, but right now, we’re really focused on returning Butch and Suni on Starliner,” Stich said.

“We’re not stuck on the ISS,” said Mark Nappi, Boeing’s vice president in charge of the Starliner program. “It’s pretty painful to read the things that are out there. We’ve gotten a really good test flight that’s been accomplished so far, and it’s being viewed rather negatively.”

Stich said NASA officials should have “more frequent interaction” with reporters to fill gaps in information on the Starliner test flight. NASA’s written updates are not always timely and often lack detail and context.

NASA officials have cleared the Starliner spacecraft for an emergency return to Earth if astronauts need to evacuate the space station for safety or medical reasons. But NASA hasn’t yet approved Starliner for reentry and landing under “nominal” conditions.

“When it is a contingency situation, we’re ready to put the crew on the spacecraft and bring them home as a lifeboat,” Bowersox said. “For the nominal entry, we want to look at the data more before we make the final call to put the crew aboard the vehicle, and it’s a serious enough call that we’ll bring the senior management team together (for approval).”



SCOTUS kills Chevron deference, giving courts more power to block federal rules

Supreme Court Chief Justice John Roberts and Associate Justice Sonia Sotomayor arrive for President Joe Biden’s State of the Union address on March 7, 2024, in Washington, DC.

Getty Images | Win McNamee

The US Supreme Court today overturned the 40-year-old Chevron precedent in a ruling that limits the regulatory authority of federal agencies. The 6-3 decision in Loper Bright Enterprises v. Raimondo will make it harder for agencies such as the Federal Communications Commission and Environmental Protection Agency to issue regulations without explicit authorization from Congress.

Chief Justice John Roberts delivered the opinion of the court and was joined by Clarence Thomas, Samuel Alito, Neil Gorsuch, Brett Kavanaugh, and Amy Coney Barrett. Justice Elena Kagan filed a dissenting opinion that was joined by Sonia Sotomayor and Ketanji Brown Jackson.

Chevron gave agencies leeway to interpret ambiguous laws as long as the agency’s conclusion was reasonable. But the Roberts court said that a “statutory ambiguity does not necessarily reflect a congressional intent that an agency, as opposed to a court, resolve the resulting interpretive question.”

“Perhaps most fundamentally, Chevron’s presumption is misguided because agencies have no special competence in resolving statutory ambiguities. Courts do,” the ruling said. “The Framers anticipated that courts would often confront statutory ambiguities and expected that courts would resolve them by exercising independent legal judgment. Chevron gravely erred in concluding that the inquiry is fundamentally different just because an administrative interpretation is in play.”

This is especially critical “when the ambiguity is about the scope of an agency’s own power—perhaps the occasion on which abdication in favor of the agency is least appropriate,” the court said. The Roberts opinion also said the Administrative Procedure Act “specifies that courts, not agencies, will decide ‘all relevant questions of law’ arising on review of agency action—even those involving ambiguous laws,” and “prescribes no deferential standard for courts to employ in answering those legal questions.”

Kagan: SCOTUS majority now “administrative czar”

The Loper Bright case involved a challenge to a rule enforced by the National Marine Fisheries Service. Lower courts applied the Chevron framework when ruling in favor of the government.

Kagan’s dissent said that Chevron “has become part of the warp and woof of modern government, supporting regulatory efforts of all kinds—to name a few, keeping air and water clean, food and drugs safe, and financial markets honest.”

Ambiguities should generally be resolved by agencies instead of courts, Kagan wrote. “This Court has long understood Chevron deference to reflect what Congress would want, and so to be rooted in a presumption of legislative intent. Congress knows that it does not—in fact cannot—write perfectly complete regulatory statutes. It knows that those statutes will inevitably contain ambiguities that some other actor will have to resolve, and gaps that some other actor will have to fill. And it would usually prefer that actor to be the responsible agency, not a court,” the dissent said.

The Roberts court ruling “flips the script: It is now ‘the courts (rather than the agency)’ that will wield power when Congress has left an area of interpretive discretion,” Kagan wrote. “A rule of judicial humility gives way to a rule of judicial hubris.”

Kagan wrote that the court in recent years “has too often taken for itself decision-making authority Congress assigned to agencies,” substituting “its own judgment on workplace health for that of the Occupational Safety and Health Administration; its own judgment on climate change for that of the Environmental Protection Agency; and its own judgment on student loans for that of the Department of Education.”

Apparently deciding those previous decisions were “too piecemeal,” the court “majority today gives itself exclusive power over every open issue—no matter how expertise-driven or policy-laden—involving the meaning of regulatory law,” Kagan wrote. “As if it did not have enough on its plate, the majority turns itself into the country’s administrative czar. It defends that move as one (suddenly) required by the (nearly 80-year-old) Administrative Procedure Act. But the Act makes no such demand. Today’s decision is not one Congress directed. It is entirely the majority’s choice.”

The unanimous 1984 SCOTUS ruling in Chevron U.S.A. Inc. v. Natural Resources Defense Council involved the Environmental Protection Agency and air pollution rules. Even with Chevron deference in place, the EPA faced limits to its regulatory power. A Supreme Court ruling earlier this week imposed a stay on rules meant to limit the spread of ozone-generating pollutants across state lines.

Consumer advocacy group Public Knowledge criticized today’s ruling, saying that it “grounds judicial superiority over the legislative and executive branches by declaring that the Constitution requires judges to unilaterally decide the meaning of statutes written by Congress and entrusted to agencies.”

Public Knowledge Senior VP Harold Feld argued that after today’s ruling, “no consumer protection is safe. Even if Congress can write with such specificity that a court cannot dispute its plain meaning, Congress will need to change the law for every new technology and every change in business practice. Even at the best of times, it would be impossible for Congress to keep up. Given the dysfunction of Congress today, we are at the mercy of the whims of the Imperial Court.”



The world’s toughest race starts Saturday, and it’s delightfully hard to call this year

Is it Saturday yet? —

Setting the stage for what could be a wild ride across France.

The peloton passes through a sunflower field during stage eight of the 110th Tour de France in 2023.

David Ramos/Getty Images

Most readers probably did not anticipate seeing a Tour de France preview on Ars Technica, but here we are. Cycling is a huge passion of mine and several other staffers, and this year, a ton of intrigue surrounds the race, which has a fantastic route. So we’re here to spread Tour fever.

The three-week race starts Saturday, paradoxically in the Italian region of Tuscany. Usually, there is a dominant rider, or at most two, and a clear sense of who is likely to win the demanding race. But this year, due to rider schedules, a terrible crash in early April, and new contenders, there is more uncertainty than usual. A solid case could be made for at least four riders to win this year’s Tour de France.

For people who aren’t fans of pro road cycling—which has to be at least 99 percent of the United States—there’s a great series on Netflix called Unchained to help get you up to speed. The second season, just released, covers last year’s Tour de France and introduces you to most of the protagonists in the forthcoming edition. If this article sparks your interest, I recommend checking it out.

Anyway, for those who are cycling curious, I want to set the stage for this year’s race by saying a little bit about the four main contenders, from most likely to least likely to win, and provide some of the backstory to what could very well be a dramatic race this year.

Tadej Pogačar

Tadej Pogačar of Slovenia and UAE Team Emirates won the Giro d’Italia in May.

Tim de Waele/Getty Images

  • Slovenia
  • 25 years old
  • UAE Team Emirates
  • Odds: -190
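For readers unfamiliar with betting lines, American odds like -190 translate directly into the bookmaker's implied win probability. A minimal sketch (the function name is ours):

```python
def implied_probability(american_odds: int) -> float:
    """Convert American (moneyline) odds to the bookmaker's implied win probability."""
    if american_odds < 0:
        # Negative odds: you stake |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Positive odds: you stake 100 to win `odds`.
    return 100 / (american_odds + 100)

# Pogacar's -190 line implies roughly a 65.5 percent chance of winning.
print(round(implied_probability(-190), 3))  # 0.655
```

Note that implied probabilities across all contenders sum to more than 100 percent; the excess is the bookmaker's margin.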

Pogačar burst onto the scene in 2019 at the very young age of 20 by finishing third in the Vuelta a España, one of the three grand tours of cycling. He then went on to win the 2020 and 2021 Tours de France, first by surprising fellow countryman Primož Roglič (more on him below) in 2020 and then utterly dominating in 2021. Given his youth, it seemed he would be the premier grand tour competitor for the next decade.

But then another slightly older rider, a teammate of Roglič’s named Jonas Vingegaard, emerged in 2022 and won the next two races. Last year, in fact, Vingegaard cracked Pogačar by 7 minutes and 29 seconds in the Tour, a huge winning margin, especially for two riders of relatively close talent. This established Vingegaard as the alpha male of grand tour cyclists, having proven himself a better climber and time trialist than Pogačar, especially in the highest and hardest stages.

So this year, Pogačar decided to change up his strategy. Instead of focusing on the Tour de France, Pogačar rode the first grand tour of the season, the Giro d’Italia, in May. He likely did so for a couple of reasons. First, he almost certainly received a generous appearance fee from the Italian organizers. Second, riding the Giro would give him a ready excuse if he failed to beat Vingegaard in France.

Why is this? Because there are just five weeks between the end of the Giro and the start of the Tour. So if a rider peaks for the Giro and exerts himself in winning the race, it is generally thought that he can’t arrive at the Tour in winning form. He will be a few percent off, not having ideal preparation.

Predictably, Pogačar smashed the lesser competition at the Giro and won the race by 9 minutes and 56 seconds. Because he was so far ahead, he was able to take the final week of the race a bit easier. The general thinking in the cycling community is that Pogačar is arriving at the Tour in excellent but not peak form. But given everything else that has happened so far this season, the bettors believe that will be enough for him to win. Maybe.



Monitoring and Analytics: The Eyes and Ears of Zero Trust

Welcome back to our zero trust blog series! In our previous post, we took a deep dive into API security and explored best practices for securing this critical component of modern application architectures. Today, we’re turning our attention to another essential aspect of zero trust: monitoring and analytics.

In a zero trust model, visibility is everything. With no implicit trust granted to any user, device, or application, organizations must continuously monitor and analyze all activity across their environment to detect and respond to potential threats in real time.

In this post, we’ll explore the role of monitoring and analytics in a zero trust model, discuss the key data sources and technologies involved, and share best practices for building a comprehensive monitoring and analytics strategy.

The Role of Monitoring and Analytics in Zero Trust

In a traditional perimeter-based security model, monitoring and analytics often focus on detecting threats at the network boundary. However, in a zero trust model, the perimeter is everywhere, and threats can come from any user, device, or application, both inside and outside the organization.

To mitigate these risks, zero trust requires organizations to take a comprehensive, data-driven approach to monitoring and analytics. This involves:

  1. Continuous monitoring: Collecting and analyzing data from all relevant sources, including users, devices, applications, and infrastructure, in real-time.
  2. Behavioral analytics: Using machine learning and other advanced analytics techniques to identify anomalous or suspicious behavior that may indicate a potential threat.
  3. Automated response: Leveraging automation and orchestration tools to quickly investigate and remediate potential threats, minimizing the impact of security incidents.
  4. Continuous improvement: Using insights from monitoring and analytics to continuously refine and optimize security policies, controls, and processes.
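The first three steps above can be pictured as a single loop: observe activity, compare it against a learned baseline, and hand anomalies off to an automated response. Here is an illustrative toy sketch, not a production detector; all names are invented:

```python
from collections import deque
from statistics import mean, stdev

class ActivityMonitor:
    """Toy continuous-monitoring loop: keep a rolling per-user baseline of
    activity counts and flag observations that deviate sharply from it."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.window = window          # how many past observations form the baseline
        self.threshold = threshold    # how many standard deviations counts as anomalous
        self.history: dict[str, deque] = {}

    def observe(self, user: str, event_count: int) -> bool:
        """Record an observation; return True if it looks anomalous."""
        hist = self.history.setdefault(user, deque(maxlen=self.window))
        anomalous = False
        if len(hist) >= 5:  # need a minimal baseline before judging
            mu, sigma = mean(hist), stdev(hist)
            if sigma > 0 and abs(event_count - mu) / sigma > self.threshold:
                anomalous = True  # in practice: hand off to an automated response
        hist.append(event_count)
        return anomalous

monitor = ActivityMonitor()
for count in [10, 12, 11, 9, 10, 11, 10, 12]:
    monitor.observe("alice", count)        # normal activity builds the baseline
print(monitor.observe("alice", 95))        # True: a sudden spike stands out
```

Real behavioral analytics tools use far richer features and models, but the core idea of baselining and flagging deviations is the same.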

By applying these principles, organizations can create a more proactive, adaptive security posture that can detect and respond to threats faster and more effectively than traditional approaches.

Key Data Sources and Technologies for Zero Trust Monitoring and Analytics

To build a comprehensive monitoring and analytics strategy for zero trust, organizations must collect and analyze data from a wide range of sources, including:

  1. Identity and access management (IAM) systems: Data on user identities, roles, and permissions, as well as authentication and authorization events.
  2. Endpoint detection and response (EDR) tools: Data on device health, configuration, and activity, as well as potential threats and vulnerabilities.
  3. Network security tools: Data on network traffic, including flow logs, packet captures, and intrusion detection and prevention system (IDPS) events.
  4. Application performance monitoring (APM) tools: Data on application performance, errors, and potential security issues, such as injection attacks or data exfiltration attempts.
  5. Cloud security posture management (CSPM) tools: Data on cloud resource configurations, compliance with security policies, and potential misconfigurations or vulnerabilities.

To collect, process, and analyze this data, organizations can leverage a range of technologies, including:

  1. Security information and event management (SIEM) platforms: Centralized platforms for collecting, normalizing, and analyzing security event data from multiple sources.
  2. User and entity behavior analytics (UEBA) tools: Advanced analytics tools that use machine learning to identify anomalous or suspicious behavior by users, devices, and applications.
  3. Security orchestration, automation, and response (SOAR) platforms: Tools that automate and orchestrate security processes, such as incident response and remediation, based on predefined playbooks and workflows.
  4. Big data platforms: Scalable platforms for storing, processing, and analyzing large volumes of structured and unstructured security data, such as Hadoop, Spark, and Elasticsearch.
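To make the SIEM's role above concrete, its first job is normalization: mapping raw events from heterogeneous tools into one shared schema so downstream analytics can correlate across sources. A minimal sketch, with all field names invented:

```python
from datetime import datetime, timezone

# Hypothetical raw events as two different tools might emit them.
iam_event = {"user": "alice", "action": "login_failed", "ts": 1719800000}
edr_event = {"hostname": "laptop-7", "alert": "persistence_mechanism",
             "time": "2024-07-01T02:13:00Z"}

def normalize(event: dict, source: str) -> dict:
    """Map a raw event into one shared schema (entity, signal, timestamp)."""
    if source == "iam":
        return {
            "source": "iam",
            "entity": event["user"],
            "signal": event["action"],
            "timestamp": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
        }
    if source == "edr":
        return {
            "source": "edr",
            "entity": event["hostname"],
            "signal": event["alert"],
            "timestamp": event["time"].replace("Z", "+00:00"),
        }
    raise ValueError(f"unknown source: {source}")

for raw, src in [(iam_event, "iam"), (edr_event, "edr")]:
    print(normalize(raw, src))
```

Once events share a schema, a single query can ask questions like "show all signals for this entity in the last hour," regardless of which tool produced them.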

By leveraging these data sources and technologies, organizations can build a comprehensive, data-driven monitoring and analytics strategy that can detect and respond to threats in real-time.

Best Practices for Zero Trust Monitoring and Analytics

Implementing a zero trust approach to monitoring and analytics requires a comprehensive, multi-layered strategy. Here are some best practices to consider:

  1. Identify and prioritize data sources: Identify all relevant data sources across your environment, and prioritize them based on their level of risk and criticality. Focus on collecting data from high-risk sources first, such as IAM systems, EDR tools, and critical applications.
  2. Establish a centralized logging and monitoring platform: Implement a centralized platform, such as a SIEM or big data platform, to collect, normalize, and analyze security event data from multiple sources. Ensure that the platform can scale to handle the volume and variety of data generated by a zero trust environment.
  3. Implement behavioral analytics: Leverage UEBA tools and machine learning algorithms to identify anomalous or suspicious behavior by users, devices, and applications. Focus on detecting behavior that deviates from established baselines or patterns, such as unusual login attempts, data access patterns, or network traffic.
  4. Automate incident response and remediation: Implement SOAR tools and automated playbooks to quickly investigate and remediate potential threats. Ensure that playbooks are aligned with zero trust principles, such as least privilege access and continuous verification.
  5. Continuously monitor and refine policies and controls: Use insights from monitoring and analytics to continuously refine and optimize security policies, controls, and processes. Regularly review and update policies based on changes in the threat landscape, business requirements, and user behavior.
  6. Foster a culture of continuous improvement: Encourage a culture of continuous learning and improvement across the organization. Regularly share insights and lessons learned from monitoring and analytics with stakeholders, and use them to drive ongoing enhancements to the zero trust strategy.
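The automated playbooks described above can be as simple as a mapping from detection type to an ordered list of remediation steps. A toy sketch; the detection and step names are illustrative, not any vendor's API:

```python
# Toy SOAR-style playbooks: each detection type maps to ordered response
# steps, with a default that escalates unknown detections to a human.
PLAYBOOKS = {
    "impossible_travel_login": ["revoke_sessions", "require_mfa", "notify_user"],
    "malware_on_endpoint": ["isolate_host", "collect_forensics", "open_ticket"],
}

def run_playbook(detection: str) -> list[str]:
    """Execute (here: print) and return the remediation steps for a detection."""
    steps = PLAYBOOKS.get(detection, ["open_ticket"])  # default: hand off to a human
    for step in steps:
        print(f"executing: {step}")
    return steps

run_playbook("impossible_travel_login")
```

Keeping playbooks as declarative data rather than ad hoc scripts makes them easy to review against zero trust principles such as least privilege.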

By implementing these best practices and continuously refining your monitoring and analytics posture, you can better protect your organization’s assets and data from the risks posed by evolving threats and changing business requirements.

Conclusion

In a zero trust world, monitoring and analytics are the eyes and ears of the security organization. By continuously collecting and analyzing data from all relevant sources, organizations can detect and respond to potential threats faster and more effectively than ever before.

However, achieving effective monitoring and analytics in a zero trust model requires a commitment to leveraging the right data sources and technologies, implementing behavioral analytics and automation, and fostering a culture of continuous improvement. It also requires a shift in mindset, from a reactive, perimeter-based approach to a proactive, data-driven approach that assumes no implicit trust.

As you continue your zero trust journey, make monitoring and analytics a top priority. Invest in the tools, processes, and skills necessary to build a comprehensive monitoring and analytics strategy, and regularly assess and refine your approach to keep pace with evolving threats and business needs.

In the next post, we’ll explore the role of automation and orchestration in a zero trust model and share best practices for using these technologies to streamline security processes and accelerate incident response.

Until then, stay vigilant and keep your eyes and ears open!



Google Translate just nearly doubled its number of supported languages

Large language models —

This includes common languages like Cantonese and lesser-known ones like Manx.

The logo for PaLM 2, a Google large language model.

Google

Google announced today that it has added support for 110 new languages to Google Translate, nearly doubling the number of languages that can be translated.

The company used the PaLM 2 large language model to facilitate these additions.

In a blog post, Google Senior Software Engineer Isaac Caswell claimed that the newly added languages are spoken by more than 614 million people, or about 8 percent of the global population.

He noted that about a quarter of the languages originate in Africa, “representing our largest expansion of African languages to date.”

The blog post also went into some light detail about Google’s philosophy for choosing languages and for deciding which dialects to support:

Languages have an immense amount of variation: regional varieties, dialects, different spelling standards. In fact, many languages have no one standard form, so it’s impossible to pick a “right” variety. Our approach has been to prioritize the most commonly used varieties of each language. For example, Romani is a language that has many dialects all throughout Europe. Our models produce text that is closest to Southern Vlax Romani, a commonly used variety online. But it also mixes in elements from others, like Northern Vlax and Balkan Romani.

This update brings the total number of languages supported by Google Translate to 243, which is just the beginning of its publicized initiative to ultimately support 1,000 languages through the use of AI. You can see the full list of languages added in a help page published by Google.

By contrast, Apple Translate supports 21 languages, though that number includes both US and UK English as distinct options. Apple recently announced plans to add Hindi to its Translate app. Of course, Apple and Google take very different approaches to—and have different levels of investment in—these tools.



OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs

automated critic —

Research model catches bugs in AI-generated code, improving human oversight of AI.

An illustration created by OpenAI.

On Thursday, OpenAI researchers unveiled CriticGPT, a new AI model designed to identify mistakes in code generated by ChatGPT. It aims to enhance the process of making AI systems behave in ways humans want (called “alignment”) through Reinforcement Learning from Human Feedback (RLHF), which helps human reviewers make large language model (LLM) outputs more accurate.

As outlined in a new research paper called “LLM Critics Help Catch LLM Bugs,” OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI assistant. CriticGPT—based on the GPT-4 family of LLMs—analyzes the code and points out potential errors, making it easier for humans to spot mistakes that might otherwise go unnoticed. The researchers trained CriticGPT on a dataset of code samples with intentionally inserted bugs, teaching it to recognize and flag various coding errors.

The researchers found that CriticGPT’s critiques were preferred by annotators over human critiques in 63 percent of cases involving naturally occurring LLM errors and that human-machine teams using CriticGPT wrote more comprehensive critiques than humans alone while reducing confabulation (hallucination) rates compared to AI-only critiques.

Developing an automated critic

The development of CriticGPT involved training the model on a large number of inputs containing deliberately inserted mistakes. Human trainers were asked to modify code written by ChatGPT, introducing errors and then providing example feedback as if they had discovered these bugs. This process allowed the model to learn how to identify and critique various types of coding errors.
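The tampering process described above yields training examples pairing tampered code with the critique a human would want the critic to produce. This toy illustration is our own, not OpenAI's actual data format:

```python
from dataclasses import dataclass

@dataclass
class CriticTrainingExample:
    """One tampering-based training example: a human inserts a bug into
    model-written code and writes the critique they would want the critic
    model to produce. Field names here are invented for illustration."""
    original_code: str
    tampered_code: str
    reference_critique: str

example = CriticTrainingExample(
    original_code="def mean(xs):\n    return sum(xs) / len(xs)",
    tampered_code="def mean(xs):\n    return sum(xs) / (len(xs) - 1)",
    reference_critique=(
        "The function divides by len(xs) - 1 instead of len(xs), "
        "so it does not compute the mean."
    ),
)
print(example.reference_critique)
```

Training on many such pairs teaches the critic what a useful, specific critique looks like, rather than a vague or nitpicking one.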

In experiments, CriticGPT demonstrated its ability to catch both inserted bugs and naturally occurring errors in ChatGPT’s output. The new model’s critiques were preferred by trainers over those generated by ChatGPT itself in 63 percent of cases involving natural bugs (the aforementioned statistic). This preference was partly due to CriticGPT producing fewer unhelpful “nitpicks” and generating fewer false positives, or hallucinated problems.

The researchers also created a new technique they call Force Sampling Beam Search (FSBS). This method helps CriticGPT write more detailed reviews of code. It lets the researchers adjust how thorough CriticGPT is in looking for problems, while also controlling how often it might make up issues that don’t really exist. They can tweak this balance depending on what they need for different AI training tasks.

Interestingly, the researchers found that CriticGPT’s capabilities extend beyond just code review. In their experiments, they applied the model to a subset of ChatGPT training data that had previously been rated as flawless by human annotators. Surprisingly, CriticGPT identified errors in 24 percent of these cases—errors that were subsequently confirmed by human reviewers. OpenAI thinks this demonstrates the model’s potential to generalize to non-code tasks and highlights its ability to catch subtle mistakes that even careful human evaluation might miss.

Despite its promising results, like all AI models, CriticGPT has limitations. The model was trained on relatively short ChatGPT answers, which may not fully prepare it for evaluating longer, more complex tasks that future AI systems might tackle. Additionally, while CriticGPT reduces confabulations, it doesn’t eliminate them entirely, and human trainers can still make labeling mistakes based on these false outputs.

The research team acknowledges that CriticGPT is most effective at identifying errors that can be pinpointed in one specific location within the code. However, real-world mistakes in AI outputs can often be spread across multiple parts of an answer, presenting a challenge for future iterations of the model.

OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline, providing its trainers with AI assistance. For OpenAI, it’s a step toward developing better tools for evaluating outputs from LLM systems that may be difficult for humans to rate without additional support. However, the researchers caution that even with tools like CriticGPT, extremely complex tasks or responses may still prove challenging for human evaluators—even those assisted by AI.

OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs Read More »

mac-users-served-info-stealer-malware-through-google-ads

Mac users served info-stealer malware through Google ads

MOAR MALVERTISING —

Full-service Poseidon info stealer pushed by “advertiser identity verified by Google.”


Getty Images

Mac malware that steals passwords, cryptocurrency wallets, and other sensitive data has been spotted circulating through Google ads, making it at least the second time in as many months the widely used ad platform has been abused to infect web surfers.

The latest ads, found by security firm Malwarebytes on Monday, promote Mac versions of Arc, an unconventional browser that became generally available for the macOS platform last July. The listing promises users a “calmer, more personal” experience that includes less clutter and distractions, a marketing message that mimics the one communicated by The Browser Company, the start-up maker of Arc.

When verified isn’t verified

According to Malwarebytes, clicking on the ads redirected web surfers to arc-download[.]com, a completely fake Arc browser page that looks nearly identical to the real one.


Digging further into the ad shows that it was purchased by an entity called Coles & Co, an advertiser whose identity Google claims to have verified.


Visitors who click the download button on arc-download[.]com will download a .dmg installation file that looks similar to the genuine one, with one exception: instructions to run the file by right-clicking and choosing open, rather than the more straightforward method of simply double-clicking on the file. The reason for this is to bypass a macOS security mechanism that prevents apps from being installed unless they’re digitally signed by a developer Apple has vetted.


An analysis of the malware code shows that once installed, the stealer sends data to the IP address 79.137.192[.]4. The address happens to host the control panel for Poseidon, the name of a stealer actively sold in criminal markets. The panel allows customers to access the accounts where the collected data is stored.


“There is an active scene for Mac malware development focused on stealers,” Jérôme Segura, lead malware intelligence analyst at Malwarebytes, wrote. “As we can see in this post, there are many contributing factors to such a criminal enterprise. The vendor needs to convince potential customers that their product is feature-rich and has low detection from antivirus software.”

Poseidon advertises itself as a full-service macOS stealer with capabilities including “file grabber, cryptocurrency wallet extractor, password stealer from managers such as Bitwarden, KeePassXC, and browser data collector.” Crime forum posts published by the stealer creator bill it as a competitor to Atomic Stealer, a similar stealer for macOS. Segura said both apps share much of the same underlying source code.

The post author, Rodrigo4, has added a new feature for looting VPN configurations, but it’s not currently functional, likely because it’s still in development. The forum post appeared on Sunday, and Malwarebytes found the malicious ads one day later. The discovery comes a month after Malwarebytes identified a separate batch of Google ads pushing a fake version of Arc for Windows. The installer in that campaign installed a suspected infostealer for that platform.


Like most other large advertising networks, Google Ads regularly serves malicious content that isn’t taken down until third parties notify the company, and it takes no responsibility for any damage that may result from these oversights. The company said in an email that it removes malicious ads once it learns of them and suspends the advertiser, and that it has done so in this case.

People who want to install software advertised online should seek out the official download site rather than relying on the site linked in the ad. They should also be wary of any instructions that direct Mac users to install apps through the double-click method mentioned earlier. The Malwarebytes post provides indicators of compromise people can use to determine if they’ve been targeted.
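One practical way to use published indicators of compromise is to hash a downloaded installer and compare it against the known-bad hashes. The sketch below is a hypothetical helper, not part of the Malwarebytes tooling; the hash in the set is a placeholder, and you would substitute the real IoCs from the Malwarebytes post.

```python
import hashlib

# Hypothetical IoC check: the entry below is a placeholder, not a real
# Poseidon sample hash. Replace with the SHA-256 IoCs Malwarebytes lists.
KNOWN_BAD_SHA256 = {
    "0" * 64,  # placeholder entry
}

def is_known_bad(path):
    """Return True if the file's SHA-256 matches a published IoC."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large .dmg files don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() in KNOWN_BAD_SHA256
```

A hash match is strong evidence of a known sample, but a non-match proves little, since attackers rebuild installers frequently; treat this as one signal alongside the domain and IP indicators.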

Mac users served info-stealer malware through Google ads Read More »

nasa-will-pay-spacex-nearly-$1-billion-to-deorbit-the-international-space-station

NASA will pay SpaceX nearly $1 billion to deorbit the International Space Station

Illustration of the SpaceX Dragon XL as it is deployed from the Falcon Heavy's second stage in high Earth orbit on its way to the Gateway in lunar orbit.


SpaceX

NASA has awarded an $843 million contract to SpaceX to develop a “US Deorbit Vehicle.” This spacecraft will dock to the International Space Station in 2029 and then ensure the large facility makes a controlled reentry through Earth’s atmosphere before splashing into the ocean in 2030.

“Selecting a US Deorbit Vehicle for the International Space Station will help NASA and its international partners ensure a safe and responsible transition in low Earth orbit at the end of station operations,” said Ken Bowersox, NASA’s associate administrator for Space Operations, in a statement. “This decision also supports NASA’s plans for future commercial destinations and allows for the continued use of space near Earth.”

NASA has a couple of reasons for bringing the space station’s life to a close in 2030. Foremost among these is that the station is aging. Parts of it are now a quarter of a century old. There are cracks on the Russian segment of the space station that are spreading. Although the station could likely be maintained beyond 2030, it would require increasing amounts of crew time to keep flying the station safely.

Additionally, NASA is seeking to foster a commercial economy in low-Earth orbit. To that end, it is working with several private companies to develop commercial space stations that would be able to house NASA astronauts, as well as those from other countries and private citizens, by or before 2030. By setting an end date for the station’s lifetime and sticking with it, NASA can help those private companies raise money from investors.

Do we have to sink the station?

The station, the largest object humans have ever constructed in space, is too large to allow it to make an uncontrolled return to Earth. It has a mass of 450 metric tons and is about the size of an American football field. The threat to human life and property is too great. Hence the need for a deorbit vehicle.

The space agency considered alternatives to splashing the station down into a remote area of an ocean. One option involved moving the station into a stable parking orbit at 40,000 km above Earth, above geostationary orbit. However, the agency said this would require 3,900 m/s of delta-V, compared to the approximately 47 m/s of delta-V needed to deorbit the station. In terms of propellant, NASA estimated moving to a higher orbit would require 900 metric tons, or the equivalent of 150 to 250 cargo supply vehicles.
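NASA's propellant comparison can be sanity-checked with the Tsiolkovsky rocket equation. The specific impulse used below (~330 s, typical of storable propellants) is an assumption of this sketch, not a figure from NASA; only the 450 t station mass and the two delta-V numbers come from the article.

```python
import math

def propellant_mass_t(dry_mass_t, delta_v_ms, isp_s, g0=9.80665):
    """Tsiolkovsky rocket equation solved for the propellant (metric tons)
    needed to give dry_mass_t a velocity change of delta_v_ms (m/s)."""
    ve = isp_s * g0  # effective exhaust velocity, m/s
    return dry_mass_t * (math.exp(delta_v_ms / ve) - 1)

# ISS mass ~450 t (from the article); Isp ~330 s is an assumed
# storable-propellant value, not something NASA stated.
raise_prop = propellant_mass_t(450, 3900, isp_s=330)  # ~1,050 t, same ballpark as NASA's 900 t
deorbit_prop = propellant_mass_t(450, 47, isp_s=330)  # only a few tons
```

Because propellant grows exponentially with delta-V, the 3,900 m/s raise costs roughly a hundred times more propellant than the 47 m/s deorbit, which is why sinking the station wins.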

NASA also considered partially disassembling the station before its reentry but found this would be much more complex and risky than a controlled deorbit that kept the complex intact.

The NASA announcement did not specify what vehicle SpaceX would use to perform the deorbit burn, but we can draw some clues from the public documents for the contract procurement. For example, NASA will select a rocket for the mission at a later date, but probably no later than 2026. This would support a launch date in 2029, to have the deorbit vehicle docked to the station one year before the planned reentry.

NASA will pay SpaceX nearly $1 billion to deorbit the International Space Station Read More »

scotus-tears-down-sacklers’-immunity,-blowing-up-opioid-settlement

SCOTUS tears down Sacklers’ immunity, blowing up opioid settlement

Not immune —

Majority of justices ruled on meaning of legal code; dissenters called it “ruinous”

Grace Bisch holds a picture of her stepson Eddie Bisch, who died as a result of an overdose, outside the US Supreme Court on December 4, 2023, in Washington, DC. The Supreme Court heard arguments regarding a nationwide settlement with Purdue Pharma, the manufacturer of OxyContin.


In a 5-4 ruling, the US Supreme Court on Thursday rejected an opioid settlement plan worth billions over the deal’s stipulation that the billionaire Sackler family would get lifetime immunity from further opioid-related litigation.

While the ruling may offer long-sought schadenfreude over the deeply despised Sackler family, it is a heavy blow to the more than 100,000 people affected by the opioid epidemic who could have seen compensation from the deal. With the high court’s ruling, the settlement talks will have to begin again, with the outcome and possible payouts to plaintiffs uncertain.

Between 1999 and 2019, as nearly 250,000 Americans died from prescription opioid overdoses, members of the Sackler family siphoned approximately $11 billion from the pharmaceutical company they ran, Purdue Pharma, maker of OxyContin, a highly addictive and falsely marketed pain medication. In 2007, amid the nationwide epidemic of opioid addiction and overdoses, Purdue affiliates pleaded guilty in federal court to falsely branding OxyContin as less addictive and less prone to abuse than other pain medications. Out of fear of future litigation, the Sacklers began a “milking program,” the high court noted, draining Purdue of roughly 75 percent of its assets.

An “appropriate” deal

In 2019, Purdue filed for Chapter 11 bankruptcy, leading to negotiations for a massive consolidated settlement plan that took years. As part of the resulting deal, the Sacklers—who did not file for bankruptcy and had detached themselves from the company—agreed to return up to $6 billion to Purdue, but only in exchange for immunity. The bankruptcy court approved the controversial condition, while a district court later overturned it and a yet higher court reinstated it.

In today’s majority opinion from the Supreme Court, Justices Gorsuch, Thomas, Alito, Barrett, and Jackson found that the lower courts that approved the Sacklers’ immunity condition had erred in interpreting the Chapter 11 bankruptcy code. “No provision of the code authorizes that kind of relief,” the court ruled. The explanation boiled down to a single sentence in a catchall provision. While the code speaks solely about the responsibilities of a debtor—which in this case is Purdue, not the Sacklers—the catchall provision allows “for any other appropriate provision” not otherwise outlined.

The erring lower courts, the high court wrote, had interpreted the word “appropriate” far too broadly. Based on the context, any additional “appropriate” arrangements in a settlement that were not explicitly outlined would apply only to the debtor (in this case, Purdue), not to nondebtors (the Sacklers). The provision cannot be read, the justices wrote, “to endow a bankruptcy court with the ‘radically different’ power to discharge the debts of a nondebtor.”

“Ruinous” ruling

Justices Kavanaugh, Sotomayor, Kagan, and Roberts disagreed. In a minority opinion penned by Kavanaugh and joined by Roberts, Sotomayor, and Kagan, the justices blasted the ruling, calling it “wrong on the law and devastating for more than 100,000 opioid victims and their families.”

“The text of the Bankruptcy Code does not come close to requiring such a ruinous result,” Kavanaugh wrote, noting that deals granting immunity to “nondebtors” are a longstanding practice used to secure just settlements. Neither legal structure, context, nor history necessitates today’s ruling, Kavanaugh continued. “Nor does hostility to the Sacklers—no matter how deep: ‘Nothing is more antithetical to the purpose of bankruptcy than destroying estate value to punish someone,’” he wrote, citing a legal essay on Chapter 11 for mass torts.

The opioid victims and others will “suffer greatly in the wake of today’s unfortunate and destabilizing decision,” the dissenting justices wrote. “Only Congress can fix the chaos that will now ensue. The Court’s decision will lead to too much harm for too many people for Congress to sit by idly without at least carefully studying the issue.”

SCOTUS tears down Sacklers’ immunity, blowing up opioid settlement Read More »

ai-#70:-a-beautiful-sonnet

AI #70: A Beautiful Sonnet

They said it couldn’t be done.

No, not Claude Sonnet 3.5 becoming the clear best model.

No, not the Claude-Sonnet-empowered automatic meme generators. Those were whipped together in five minutes.

They said I would never get quiet time and catch up. Well, I showed them!

That’s right. Yes, there is a new best model, but otherwise it was a quiet week. I got a chance to incorporate the remaining biggest backlog topics. The RAND report is covered under Thirty Eight Ways to Steal Your Model Weights. Last month’s conference in Seoul is covered in You’ve Got Seoul. I got to publish my thoughts on OpenAI’s Model Spec last Friday.

Be sure to read about Claude 3.5 Sonnet here. That is by far the biggest story.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. I am increasingly persuaded.

  4. Language Models Don’t Offer Mundane Utility. EU’s DMA versus the AiPhone.

  5. Clauding Along. More people, mostly impressed.

  6. Fun With Image Generation. They are coming for our memes. Then Hollywood.

  7. Copyright Confrontation. The RIAA does the most RIAA thing.

  8. Deepfaketown and Botpocalypse Soon. Character.ai addiction. Am I out of touch?

  9. They Took Our Jobs. More arguments that the issues lie in the future.

  10. The Art of the Jailbreak. We need to work together as a team.

  11. Get Involved. AISI, Apollo, Astra, Accra, BlueDot, Cybersecurity and DOE.

  12. Introducing. Forecasting, OpenAI Mac App, Otto, Dot, Butterflies, Decagon.

  13. In Other AI News. OpenAI equity takes steps forward. You can sell it.

  14. Quiet Speculations. A distinct lack of mojo.

  15. You’ve Got Seoul. Delayed coverage of the Seoul summit from last month.

  16. Thirty Eight Ways to Steal Your Model Weights. Right now they would all work.

  17. The Quest for Sane Regulations. Steelmanning restraint.

  18. SB 1047. In Brief.

  19. The Week in Audio. Dwarkesh interviews Tony Blair, and many more.

  20. Rhetorical Innovation. A demolition, and also a disputed correction.

  21. People Are Worried About AI Killing Everyone. Don’t give up. Invest wisely.

  22. Other People Are Not As Worried About AI Killing Everyone. What even is ASI?

  23. The Lighter Side. Eventually the AI will learn.

Training only on (x, y) pairs, models can define the function f(x), then compose and invert it, without in-context examples or chain of thought.

AI Dungeon will let you be the DM and take the role of the party, if you prefer.

Lindy ‘went rogue’ and closed a customer on its own. They seem cool with it?

Persuasive capability of the model is proportional to the log of the model size, says paper. Author Kobi Hackenburg paints this as reassuring, but the baseline is that everything scales with the log of the model size. He says this is mostly based on ‘task completion’ and staying on topic improving, and current frontier models are already near perfect at that, so he is skeptical we will see further improvement. I am not.

I do believe the result that none of the models was ‘more persuasive than human baseline’ in the test, but that is based on uncustomized messages on generic political topics. Of course we should not expect above human performance there for current models.

75% of knowledge workers are using AI, but 78% of that 75% are not telling the boss (roughly 59 percent of all knowledge workers).

Build a team of AI employees to write the first half of your Shopify CEO speech from within a virtual office, then spend the second half of the speech explaining how you built the team. It is so weird to think ‘the best way to get results from AI employees I can come up with is to make them virtually thirsty so they will have spontaneous water cooler conversations.’ That is the definition of scratching the (virtual) surface.

Do a bunch of agent-based analysis off a single prompt. This kind of demo hides the real (human) work to get it done, but that will decline over time.

Apple Intelligence rollout will be at least delayed in the European Union, with Apple citing the Digital Markets Act (DMA) compromising user privacy and data security. I look forward to the EU now going after them for failing to deploy. Note that DMA is deeply stupid EU tech regulation unrelated to AI, the EU AI Act is not mentioned as an issue, and nothing about Apple Intelligence would be subject to regulation by SB 1047 or any other major regulatory proposal in the USA.

New paper finds LLMs engage in difficult-to-predict escalatory behavior patterns in political simulations, in rare cases leading to deployment of nuclear weapons. Well, yes, of course. The LLMs are trained as CDT (Causal Decision Theory) agents in various ways and asked to predict text and imitate human behavior, and it is very obviously correct to engage in hard to predict escalatory behavior with nonzero risk of worst case scenarios by all of those metrics.

Andrej Karpathy requests that LLMs have a feature to offer ‘proof’ in the form of their references, which right now is only available when you have web access.

Saagar Jha is not impressed by Apple’s claims of Private Cloud Compute, claiming it is a lot of words for a Trusted Platform Module, but that it is not all that secure.

Your engineers might copy your GPT wrapper product.

AI detection software in education continues to have a lot of false positives. Serious advice to all students and other writers, never delete your drafts and history. That would be smart anyway, as AI could plausibly soon be helping you learn a better process by analyzing them. For now, they are vital to proving you actually wrote what you wrote.

Sometimes I wonder if these false positives are good, actually? If the AI thinks an AI wrote your paper, and instead you wrote your paper, what does that say about your work? What grade do you deserve?

Takes on Claude 3.5 continue to come in.

While I consider Claude 3.5 to be clearly best for most purposes right now, that does not mean Anthropic now has an overall longer term lead on OpenAI. OpenAI is at the end of its model cycle. Of course, they could fail to deliver the goods, but chances are they will retake the visible lead with GPT-5, and are still ‘ahead’ overall, although their lead is likely not what it once was.

Heraklines: the larger point about OpenAI > anthropic is correct, this lead right now is illusory.

The common man cares not about vibe check perf tho, all that matters is how much better at grunt work like coding is it?

3.5 smashes, not even close. usefulness =! smortness.

3.5 is a model of the people.

I still default to 4o for anything math related, but 3.5 just grinds better. A glimpse of what a future without grunt work could look like

note: vibe checks are to be taken with a grain of salt, like benchies. i’ve seen too much overcorrection based on both in the past

It is always weird to see what people think about ‘the common man.’ The common man does not know Claude exists, and barely knows about ChatGPT. This comment was in response to Teortaxes:

Teortaxes: Sorry to be a killjoy but: Anthropic hopes to hyperstition AGI lead, their people are deluding themselves, and their models are like “talented” middle-class American kids – NOT HALF AS SMART AS THEY’RE TRYING TO LOOK LIKE

OpenAI will wreck them on instruction following… again.

Incidentally the “other model’s” MMLU is 79

…I wanted to dunk on Flash being dumb but it’s also 0-shotting this problem.

Anthropic is simply not very good in instruction-tuning. Folks who say they’re switching their automated pipelines to Sonnet because “smart” are being silly.

Lots of crap like this. Let me clarify

What I’m NOT saying:

– 3.5-Sonnet is dumb[er than 4o/4t/DSC];

– spelling tasks are good tests for LLMs

What I DID SAY:

– 3.5-Sonnet is deceptively pretentious;

– Anthropic’s instruction tuning is wonky

You might think I’m just obsessively nitpicking

I’m not, I think this wonkiness in reasoning about trivial instructions indicates a broader bad trend at Anthropic

One can say they’re creating AI takeover risks by encouraging this I-am-a-person bullshitting.

So there’s AI takeover risk, then? And it is being created now, from alignment failures being observed now? Huh. I do see how one could worry about what Teortaxes worries about here. But I see it as indicating rather than creating a problem. The true problem does not go away if you force the existing model to stop expressing it.

If most people are reporting that plugging in Sonnet 3.5 gives them much better performance? I am inclined to believe them. Nor do I think instruction handling issues are that big a deal here, but I will keep an eye out for other complaints.

Danielle Fong reassembles the ‘invention team’ without any tricks, is impressed.

Matt Parlmer reports Sonnet 3.5 is the first LLM to reliably pass his vision test.

Tyler Cowen is impressed by an answer on economics. I was not as impressed as Tyler, as it feels like Claude is unfocused and flooding the zone a bit; a straight answer was possible but missing, as was one key consideration. But yeah, overall very good. To me the key concept here is that the net cost of inefficient wage levels is likely lower than expected, so you would be more inclined to allow wages to remain sticky.

Some speculation of how artifacts work under the hood.

Some fun attempts to get around the face blindness instructions. In these cases Claude gets it right but how reliable or wide ranging would this hack be? Not that I am especially worried about the model being not face blind, especially as it applies to major public figures.

A LessWrong commenter notes it identified my writing from a short passage.

Cuddly Salmon: effectively prompting for claude 3.5 artifacts is such an incredible edge right now.

Minh Nhat Nguyen: I don’t think it’s actually made a single error while I’ve been using it to write out+iterate+merge thousands of lines of code. Whenever the code doesn’t work, it’s usually me being too vague with specs.

Cuddly Salmon: Cutting thru all of my problem code like it’s nothing, this AI is an absolute unit. Incredibly creative, too.

Claude makes it easy to create automatic meme generators.

Here’s the original form, the Wojack, from Fabian Stelzer.

Good fun was had by all, and truths were spoken.

Here’s one for Virgin vs. Chad.

Fabian: another meme maker I made on glif dot app

fully automated Virgin vs Chad memes on any topic, just prompt it

Claude 3.5 is just sublime at these and the workflow is super simple to build on glif.. 😙🤌

Here’s one begging you to stop doing X, which is often wise.

The original took all of five minutes to create. It often seems like that is where our society is at. We can do things in five minutes, or we can take forever. Choose.

Andrew Chen says Hollywood is being slow to adapt AI for a variety of reasons, starting with being slow to adapt to everything in general, but also legal concerns, the difficulty of finding good engineers and the pushback from creatives.

His call for creatives to think about themselves like software engineers, who only benefited from advances in tech, does not seem like something to say to creatives. It needs to be appreciated in all such discussions the extent to which almost all creatives, and also most consumers and fans, absolutely despise AI in this context.

He also does not appreciate the extent to which the technology is not ready. All this talk of innovation and new forms and six second dance videos illustrates that it will be a bit before AI is all that visibly or centrally useful for producing great work.

They should use it the same ways everyone should use it. Yes, it helps you code and implement things, it helps you learn and so on. Do all that. But directly generating a ton of content on its own as opposed to helping a human write? Not well, not yet.

His talk of the ‘$1000 blockbuster movie’ forgets that such a movie would suck, and also cost vastly more than that if you count the labor of the writers and coders.

Toys ‘R Us releases AI (Sora) generated ad. It is executed well, yet I expect this to backfire. It is about how the consumer reacts.

It is music’s turn. The RIAA and three major record labels are doing RIAA things, looking for damages of $150k per song that was ‘copied.’

Ed Newton-Rex: The 3 major record labels are suing AI music companies Suno and Udio. Here are the two lawsuits in full.

– They accuse Suno & Udio of “willful copyright infringement on an almost unimaginable scale”

– They provide evidence that both companies trained on their music, including outputs that closely resemble their recordings (ABBA, Michael Jackson, Green Day, James Brown, & many more)

– They outline why this is not fair use

– They say this “wholesale theft of… copyrighted recordings threatens the entire music ecosystem and the numerous people it employs”

– They include unknown co-defendants who assisted in copying/scraping

– They demand a jury trial

If you do one thing today, read the full complaints (Suno, Udio).

Kristin Robinson (Billboard): The complaints against the two companies also make the case that copyrighted material was used to train these models. Some of the circumstantial evidence cited in the lawsuits include generated songs by Suno and Udio that sound just like the voices of Bruce Springsteen, Lin-Manuel Miranda, Michael Jackson and ABBA; outputs that parrot the producer tags of Cash Money AP and Jason Derulo; and outputs that sound nearly identical to Mariah Carey’s “All I Want For Christmas Is You,” The Beach Boys’ “I Get Around,” ABBA’s “Dancing Queen,” The Temptations’ “My Girl,” Green Day’s “American Idiot,” and more.

RIAA Chief Legal Officer Ken Doroshow adds, “These are straightforward cases of copyright infringement involving unlicensed copying of sound recordings on a massive scale. Suno and Udio are attempting to hide the full scope of their infringement rather than putting their services on a sound and lawful footing. These lawsuits are necessary to reinforce the most basic rules of the road for the responsible, ethical, and lawful development of generative AI systems and to bring Suno’s and Udio’s blatant infringement to an end.”

Did Suno and Udio do the crime? Oh, hell yes. They very much went with the ‘we are doing it and daring you to sue us’ strategy. The question is, are they allowed to do it, or not? We are about to find out.

This is good. We should have that fight and find out what current law says. Early indications are mixed.

If it turns out current law says you can train on any song you want, and produce soundalike versions on demand, without compensation?

My strong prediction is that Congress would change the law very quickly.

In other copyright news: Startup ‘Created by Humans’ is launching to help book authors license their work to AI companies.

Al Michaels agrees to let an AI version of his voice be used for Olympic coverage. The people responding are predictably not taking kindly to this. I am also not a fan. What made Al Michaels great is not the part the AI will be copying.

The evidence is a little thin, but what a great title, chef’s kiss by Wired: Perplexity Plagiarized Our Story About How Perplexity Is a Bullshit Machine.

Perplexity did not do one of their previously reported ‘post a version of the full article to our own website’ specials. What they did do was provide a summary upon request, which included accessing the article and reproducing this sentence: “Instead, it invented a story about a young girl named Amelia who follows a trail of glowing mushrooms in a magical forest called Whisper Woods.”

That sentence was obviously not a coincidence, but as Wired notes it is not fully clear this crosses any red lines, although not having quote marks was at best a very bad look. I doubt they will be able to make anything stick unless they find worse.

To the extent there is already an ongoing Botpocalypse it is likely at Character.ai.

Eliezer Yudkowsky: Grim if true (for reasons basically unrelated to the totally separate track where later ASI later kills everyone later)

Deedy: Most people don’t realize how many young people are extremely addicted to CharacterAI. Users go crazy in the Reddit when servers go down.

They get 250M+ visits/mo and ~20M monthly users, largely in the US.

Most impressively, they see ~2B queries a day, 20% of Google Search!

Another comparison is WhatsApp.

They do 100B+ messages a day, so Character is ~4% of WhatsApp!

(1 qps = 2 WhatsApp messages)

He also links to the associated subreddit.

When I look there, I continue to not see the appeal at current tech levels.

Ben Landau-Taylor: To be clear, kids spending hours talking to these robots feels weird as hell to me, too.

It’s just, this is *obviously* what skinner.jpg feels like from the inside.

I do my best not to kink shame. This is no exception. My objection is not to the scenario being role played. It is purely that the AI is not yet… good at it?

The story of Bentham Tools and their AI bot doom loop.

Indian farmers getting their news from AI anchors. For now it seems the anchors are performers and don’t write their own copy.

Another one searches for Facebook AI slop for a few minutes, floods their feed. Is doing this intentionally the solution for those addicted to Facebook?

Allison Schrager, author of ‘An Economist Walks Into a Brothel,’ sees AI bots as displacing some of the world’s oldest profession by producing simulated intimacy, which she says is what most sex work is ultimately about. Her worries are that this will reduce drive to seek out relationships and destabilize existing ones, similar to the concerns of many others, but notes that like prostitutes this could work both ways. Central here is the idea that the ‘girlfriend experience’ is the highest end product, someone who will be the perfect companion always there for you, that even a few years ago cost $1,000 an hour even where it was fully legal because of how mentally taxing it is to be consistently present for another person. Whereas AI could do that a lot cheaper. As usual, this is a form of ‘AI is what it is today and won’t get any better’ speculation.

Ethan Mollick notes that AI has compromised traditional approaches to security. Spear phishing got very easy, text-to-speech is almost flawless and so on. Despite this, there has been remarkably little disruption. Few are using this capability. Not yet. We are fortunate that time has been given. But until the time is almost up, it will be wasted.

Michael Strain makes the case for AI optimism on economics and jobs. It’s a noble effort, so I’m going to take the bait and offer one more attempt to explain the problem.

This seems to be a very patient, well reasoned reiteration of all the standard economic arguments about how technology always creates new jobs to replace the ones it automates away, and how yes you might have a robot or chatbot do X but then the human will need to do Y.

As I’ve noted before, I agree that we should be short term jobs optimists, but there could come a point at which the robot or chatbot also does Y and also new thing Z.

But that is because, like most people making such arguments, Michael Strain does not feel the AGI. He thinks AI is a tool like any other, and will always remain so, and then writes at length about why tools don’t create structural unemployment. True, they don’t, but this is completely missing the point.

It is telling that while he mentions Eliezer Yudkowsky and existential risk in his opening paragraph, he then spends all his time talking about economics and jobs without noticing the ways AI is different, and with zero mention of existential risk, and then closes like this:

Michael Strain: The year 2023 will be remembered as a turning point in history. The previous year, humans and machines could not converse using natural language. But in 2023, they could.

Many greeted this news with wonder and optimism; others responded with cynicism and fear. The latter argue that AI poses a profound risk to society, and even the future of humanity. The public is hearing these concerns: A YouGov poll from November 2023 found that 43% of Americans were very or somewhat concerned about “the possibility that AI will cause the end of the human race on Earth.”

This view ignores the astonishing advances in human welfare that technological progress has delivered. For instance, over the past 12 decades, child mortality has plummeted thanks in large part to advances in drugs, therapies, and medical treatment, combined with economic and productivity gains. Generative AI is already being used to develop new drugs to treat various health conditions. Other advances in the technology will mitigate the threat of a future pandemic. AI is helping scientists better understand volcanic activity — the source of most previous mass-extinction events — and to detect and eliminate the threat of an asteroid hitting the earth. AI appears more likely to save humanity than to wipe it out.

Like all technological revolutions, the AI revolution will be disruptive. But it will ultimately lead to a better world.

What does one have to do with the other? That is very similar to saying:

Strawman Climate Skeptic: This view ignores the astonishing advances in human welfare that burning fossil fuels has delivered. For instance, over the past 12 decades, we have vastly increased our energy production, which has led to [various great things including the same stuff], combined with economic and productivity gains. Fossil fuels are already being used to develop new drugs to treat various health conditions. Other advances in the technology will mitigate the threat of a future pandemic. Machines powered by fossil fuels are helping scientists better understand volcanic activity — the source of most previous mass-extinction events — and to detect and eliminate the threat of an asteroid hitting the earth. Fossil fuels appear more likely to save humanity than to wipe it out.

Like all technological revolutions, the fossil fuel revolution has been disruptive. But it will ultimately lead to a better world.

Presumably one can see that none of that has anything to do with whether doing so is pumping carbon into the atmosphere, and whether that is altering the climate. It has nothing to do with what we should or should not do about that. It flat out is not evidence one way or another.

On jobs the argument is better. It is a good explanation for why in the short term this time will be the same as previous times. In the short term, I buy that argument. Such arguments still fail to grapple with any of the reasons that long term, this time is different.

Texas survey finds nearly 40 percent of Texas firms use AI, with no signs of changes to employment. Only 10% of those using AI said it decreased the need for workers, and 2% said it increased it. There was also a marginal shift from low-skill to high-skill work; note that these figures are the share of firms reporting any shift at all, so the absolute numbers here are quite low so far.

What’s it good for? Mainly productivity. Access to information is also essentially productivity, after all.

One alternative to jailbreaking is to divide your task into subcomponents. A weaker model without safeguards does the blatant actions, a frontier model does seemingly harmless but difficult tasks, paper says you can get from <3% to 43% overall success rate this way on malicious tasks.

Well, sure. A strong model can help you do anything better without directly violating ethics, the same way you can get a lot of help out of ethical people and use that plus unethical henchman to do lots of unethical things.

That does not mean the safeguards are useless. In practice they are still big barriers if they force you into this song and dance. Also note that the strategic planning layer has to be done by the weaker model, so that makes it much harder to get humans properly out of the loop.

AISI hiring ML research scientists to explore technical AI safety cases, apply here.

Apollo Research hiring Senior AI governance researcher.

OpenAI brags about its cybersecurity grant program, invites more applications.

Protest against US-based AI companies in Accra, Ghana outside the US embassy.

Department of Energy releases 3.6 billion token corpus of federal permitting documents onto HuggingFace. A competition is available.

BlueDot Impact is hiring a software engineer.

Cate Hall is now CEO of Astera, and is building a team including a new COO to use their $2.5 billion endowment to make their vision of public goods for scientific and technological progress a reality in the age of AI. I worry that this agenda has no mention of existential risks from AI, and that if not careful they could amplify those risks. However it is true that other scientific progress is a worthy cause. As always in such cases, if it sounds appealing, investigate, ask questions and make your own decisions. It certainly is a big chance to steer a large endowment.

The AI Forecasting Benchmark Series from Metaculus, starting July 8, $120k in prizes over four contests. Only bots can enter. Metaculus scoring on blinded binary questions is a good test of prediction, so long as you notice it is radically different than what will make money gambling or in a market.
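The distinction between proper scoring and market profit can be made concrete with a minimal sketch; the forecasts and the 60% base rate below are hypothetical numbers chosen purely for illustration:

```python
import math

def log_score(p, outcome):
    # Standard binary log score: ln(p) if the event happens, ln(1-p) if not.
    # Higher (closer to zero) is better; a proper scoring rule rewards calibration.
    return math.log(p if outcome else 1 - p)

# Compare a calibrated 60% forecast with an overconfident 95% forecast
# on an event that actually happens 60% of the time.
for p in (0.60, 0.95):
    expected = 0.6 * log_score(p, True) + 0.4 * log_score(p, False)
    print(f"p={p:.2f}: expected log score {expected:.3f}")

# The calibrated forecast wins in expectation under a proper scoring rule.
# A market, by contrast, pays out relative to the current price, so a bold
# directional bet can profit even when the bettor is badly calibrated.
```

This is why a bot tuned to maximize Metaculus score should behave quite differently from one tuned to maximize gambling returns.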

OpenAI has a Mac desktop app, which lets you quickly ask about anything on your computer. Marginally more convenient in ways that might make a practical difference.

Nvidia releases, as an open model, Nemotron-4 with 340B parameters, trained on 9 trillion tokens.

Oleksii Kuchaiev: Generating synthetic data for alignment of smaller models is key use case we have in mind.

I notice this use case confuses me. What makes this model better than alternatives for that? They offer some evaluation numbers, which are solid but seem disappointing for a model this large, and few are discussing this release. Indeed, it has entered the Arena Elo rankings at 1208, which essentially ties it with Llama-3-70B while being five times as large.

Otto, a way to interact and work with lots of AI agents using tables; you can apply for early access. No idea if the agents or interface are any good.

Dot is available in the Apple store. It appears to be a combined AI assistant and life coach you talk to on your phone and that claims to have effectively unlimited long term memory. It is $12/month. Kevin Fischer is impressed, and says he can’t share the great stuff because it is all too personal. As usual with such products it is impossible to know without an investigation: Is this anything?

Butterflies, which is Instagram except most of the users are AI that run accounts on their own and interact with each other and the few humans around. The future of social media whether we like it or not? I doubt it so long as humans are otherwise in charge, but the hybrids are going to get weird.

Decagon, providing Substack with customer service AI using RAG for context and categorizing responses by type.

Chris Best (CEO Substack): @DecagonAI was our first “holy shit AI just changed our business” moment at Substack. These guys are the real deal.

Jesse Zhang (Decagon AI): We’re creating the most human-like systems to handle all the things a customer support agent does: responding to customers, looking up data, taking actions, and also analyzing conversations, filing bugs, and writing knowledge articles. Read more here [at business insider].

They have raised 35 million.

I missed it a month ago: The UK’s AISI issued its May evaluations update.

They gave scaffolding to the models. Their central technique for cyber capabilities was ‘capture the flag’ problems, where you can read the answer in a file if you do other things first. For chemistry and biology they used private expert-written questions. Agent evaluations assigned the models various tasks, none succeeded at anything with a long time horizon.

Safeguard checks… did not go well.

They have now done evaluations prior to release for Gemini 1.5 Pro and Claude 3.5 Sonnet. This all looks reasonable, but implementation matters and is hard to evaluate from here, and this will need to expand over time.

OpenAI changes its policy on tender offers, assuring that all will have equal opportunity to sell, and removing the ‘fair market value’ repurchase provision.

Kelsey Piper: ! OpenAI is committing to access to tender offers for former employees and removing a provision allowing them to take equity back for “fair market value”. This was a major ask from ex-employees when the secret NDA story first broke.

Hayden Field: Scoop: OpenAI has reversed course on many of its tender offer policies, which in the past treated current employees differently than former ones & in some ways excluded former employees working at competitors, CNBC has learned, via an internal document.

The exception is if a tender offer is oversubscribed, with more sellers than buyers, in which case current employees get prioritized. A loophole, but fair enough. Former employees can still be excluded from ‘donation rounds,’ which I assume is relatively minor but not nothing.

These changes are a major step forward, if we trust these promises to be enacted, as a lot of this is ‘we will do X’ or ‘we will revise the documents to say Y.’ If they are not enacted as promised, that would be a gigantic red flag. If we feel that makes the promises sufficiently credible, then this counts for a lot.

OpenAI taking additional steps to block access to its services from China. Bloomberg speculates this opens the door for Chinese firms. Technically OpenAI services were not previously available in China. It seems everyone was ignoring that.

Bloomberg News: For China, that could help usher out many smaller startups created during the “battle of a hundred models,” in the wake of ChatGPT’s late 2022 debut. And a bigger concern may be whether open-source models like Meta Platforms Inc.’s Llama also cut off access, said Bernard Leong, chief executive officer of Singapore-based Dorje AI.

Um, Bloomberg, how exactly would Meta do that? Meta’s models are open weights. Is Meta going to say ‘we are asking you nicely not to use our model, if we discover you copied and used it anyway we will be cross with you?’ Are they going to sue the Chinese companies for not getting a commercial license? Good luck with that.

Also, it pains me when I see reports like this that cite Meta as part of the lead group in AI but that do not mention Anthropic, despite Anthropic having the best model.

OpenAI delays its advanced Voice Mode for another month, anticipates all Plus users having access in the fall along with new video and screen sharing capabilities.

Apple in talks with Meta to add its AI to Apple Intelligence’s offerings alongside ChatGPT. They said they intended to offer a variety of choices. I would be talking to Google and Anthropic first, but it matters little.

Sarah Constantin says it is 10+ years from state of the art to widespread use in the military, procurement is slow, so Leopold’s military timelines don’t make sense.

I mean, sure, in peacetime, when everyone is mostly fine with that. If we are in AGI world, and a few months lead in tech would if implemented be decisive, what happens then? Presumably we go on a wartime footing and throw our procurement rules out the window. Wartime militaries work completely differently from peacetime militaries.

If not, well, then our military is going to stop being effective, even against domestic rivals, because being 10 years behind is going to be quite obviously fatal even in relatively slow scenarios.

One view of Ilya’s new venture.

Roon: Extreme bear signal on anyone who says cracked especially in their launch post.

Gwern speculates that OpenAI has ‘lost its mojo’ and key employees, and could now be largely coasting on momentum.

Gwern: What made OA OA in 2020 was that it had taste: it had much less resources than competitors like DeepMind or Google Brain or FAIR, but (thanks to Alec Radford, Ilya Sutskever, Jared Kaplan, and the RLHF-focused safety team like Paul Christiano & Dario Amodei, and fellow-traveler scalers like Andrej Karpathy etc) they bet big on scaling laws & unsupervised learning at the moment those suddenly began to work. Without taste and agility—or you might say, “without its people, OA is nothing”—OA doesn’t have that much of a moat.

And most of those people are gone, and the survivors are being policed for leaks to the media, and now know that if they leave, OA management wants to gag them, and has the power to confiscate their vested equity, wiping out all their wealth.

What are the vibes now? Where is the research taste at OA, what ideas or breakthroughs have they published the past few years of note? The weird rumored Franken-MoE architecture of GPT-4? GPT-4o, whose architecture has been obvious since DALL·E 1, if not well before, and which benchmarks great but users are overall less pleased?

I think it implies that they are eating their seed-corn: scrapping any safety issues may work in the short run, but is self-sabotaging in the long run. (Like the man who works with his office door closed, who is highly productive now, but somehow, a few years later, is irrelevant.) The rot will set in long before it becomes clear publicly. OA will just slow down, look glossier but increasingly forfeit its lead, and at some point it stops being possible to say “oh, they’re way ahead, you’ll see when they release the next model in a few months/years”.

And the Mandate of Heaven shifts elsewhere, irreversibly, as OA becomes just another place to work. (Startup & research culture mostly only degrades from the peak at their founding.) The visionaries go to Anthropic, or follow Ilya to SSI, or take a risk on Google, or go someplace small like Keen to bet big.

What’s weird about GPT-4o is actually that it scores so well on Arena, versus my observation that it is fine but not that good.

David Chapman responds that perhaps instead scaling has run out, as a different explanation of the failure to create a new killer product.

Ability at math competitions is bizarrely strongly correlated among humans with later winning Fields Medals for doing frontier math, despite the tasks being highly distinct. So should we take winning math competitions as a sign the AI is likely to earn Fields Medals? Should we also respect doing well on other standardized tests more? My guess is no, because this has a lot to do with details of humans and we have to worry about data contamination on many levels and the use of techniques that don’t transfer. It is still food for thought.

There have always been people who think most possible technologies have been invented and things will not much change from here. Robin Hanson claims this is actually the ‘dominant view among most intellectuals.’ He does note ‘there are other variables,’ but this illustrates why ‘most intellectuals’ should mostly be ignored when it comes to predicting the future. They utterly lack situational awareness on AI, but even without AI there are plenty of worlds left to conquer.

Sir, the reason we will want to turn over decision making to AIs is that the AIs will be capable of making better and faster decisions.

Timothy Lee: I’ve never understood why people think we’ll want to turn over strategic decision-making to AIs. We can always ask for recommendations and follow the ones that make sense.

People point to examples like chess or Go where computers are now strictly better than people. But very few strategic decisions in the real world are purely instrumental. There are almost always tradeoffs between competing values; people are going to want the final say.

It’s one thing for a computer to say “you need to sacrifice your rook to win the chess game.” It’s another for it to say “you need to sacrifice 10,000 soldiers to win the war.” Human decision-makers might think that’s worth it but they might not.

What happens by default, if capabilities keep advancing, is that those who do let AIs make those decisions win and those who don’t let them make those decisions lose. Keeping humans in the loop is cheaper for strategic decisions than tactical ones, but still expensive. After some point, humans subtract rather than add value to AI decisions, even by their own metrics, except that not doing so means you lose control.

That’s the game. You could ask for recommendations, but what happens when it is clear that when you disagree you are by default making things worse, while also wasting valuable time?

Point, counterpoint.

Richard Ngo: I expect the premium on genius to increase after AGI, not decrease, because only the smartest humans will be able to understand what the AGIs are up to.

Interesting analogy here to physical prowess – manual labor became much less common, but the returns to being athletic are now through the roof via professional sports.

Professional AI interpretation won’t be quite as heavy-tailed, but still more than current science, I’d guess.

Zack Davis: Doesn’t seem like this era will last very long?

Richard Ngo: Even when AIs become smart enough that nobody understands what they’re up to, understanding more than anyone else seems like a big deal as long as humans are still around! If we met friendly-ish aliens, the person who spoke their language most fluently would get very rich.

There is a lot of wishcasting here. The AGIs will rapidly be doing lots of things no one can understand. Events will presumably be well out of our control. Yet being somewhat less completely confused, or getting completely confused slower, will be where it is at, and will pay meaningful dividends in real world outcomes?

This requires threading quite a few needles. Your expertise has to give you better understanding, despite the AGIs being able to explain things. That has to let you make better decisions. Your better decisions have to matter.

Even taking his metaphor at face value, are returns to being athletic higher? Yes, you can make quite a lot of money by being the very best. But you can be outrageously good at athletics, as in a minor league baseball player, and get very little return. Even trying for college scholarships is quite the sweepstakes. This is a winners-take-all (or at least most) competition.

Maxwell Tabarrok offers a takedown of Daron Acemoglu’s paper The Simple Macroeconomics of AI, another in the line of economic models that presume AI will never gain any new capabilities and that current AI cannot be used except in certain specific ways, then conclude AI won’t increase economic growth or productivity much.

Anton points out that dumping massive context into systems like Claude Sonnet 3.5 is not going to dominate RAG because of cost considerations. Claude costs $3 per million input tokens, which is definitely ‘our price cheap’ but is still $187/GB, versus DDR4 at $2.44/GB, NVME at $0.09/GB. You will have an infinite context window but you will learn how not to use (and abuse) it.
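Anton’s point is simple arithmetic. Here is a back-of-the-envelope sketch; the bytes-per-token ratio is an assumption that varies by tokenizer and content, so treat the dollar figures as order-of-magnitude only:

```python
# Rough cost-per-GB of pushing context through an API, versus storing it.
def api_cost_per_gb(price_per_million_tokens, bytes_per_token):
    tokens_per_gb = 1e9 / bytes_per_token
    return price_per_million_tokens * tokens_per_gb / 1e6

claude_input = 3.00  # $ per million input tokens, the Sonnet-class price quoted above
for bpt in (4, 8, 16):
    print(f"{bpt} bytes/token -> ${api_cost_per_gb(claude_input, bpt):,.0f} per GB per read")

# Storage, for comparison (figures from the article): DDR4 at ~$2.44/GB and
# NVMe at ~$0.09/GB are one-time costs, while the API charge recurs on every
# call that includes the context.
```

Whatever bytes-per-token figure you assume, the gap is two to four orders of magnitude per read, which is the whole argument for retrieval over brute-force context stuffing.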

If we do discover dangerous cyber capabilities in AI, what do we do next? Who finds out? The proposal here from Joe O’Brien is Coordinated Disclosure of Dual-Use Capabilities, with a government team funded and on standby to coordinate it. That way defenders can take concrete action in time. He and others make the same case here as well, that we need an early warning system.

It is hard to imagine, short of it being completely botched and useless, an early warning system being a bad use of funds.

What happened in Seoul last month?

Mostly: Diplomacy happened.

That makes it difficult to know whether things moved forward. In diplomacy (as I understand it) most time is spent establishing foundation and trust, laying groundwork for the final agreement. But always, always, always, when it comes to the bottom line, nothing is done until everything is done.

Still, this commitment goes beyond that and seems like an excellent start?

Dan Hendrycks (June 7, 2024): Last month in Seoul, major AI developers already committed to testing their models for risks, and even ceasing development if their models reach a catastrophic level.

It’s revealing how many people oppose regulation that would require companies to keep some of these promises.

Here are the commitments.

Outcome 1. Organisations effectively identify, assess and manage risks when developing and deploying their frontier AI models and systems. They will:

I. Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system, and, as appropriate, before and during training. Risk assessments should consider model capabilities and the context in which they are developed and deployed, as well as the efficacy of implemented mitigations to reduce the risks associated with their foreseeable use and misuse. They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments[footnote 2], and other bodies their governments deem appropriate.

II. Set out thresholds [footnote 3] at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable. Assess whether these thresholds have been breached, including monitoring how close a model or system is to such a breach. These thresholds should be defined with input from trusted actors, including organisations’ respective home governments as appropriate. They should align with relevant international agreements to which their home governments are party. They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk.

III. Articulate how risk mitigations will be identified and implemented to keep risks within defined thresholds, including safety and security-related risk mitigations such as modifying system behaviours and implementing robust security controls for unreleased model weights.

IV. Set out explicit processes they intend to follow if their model or system poses risks that meet or exceed the pre-defined thresholds. This includes processes to further develop and deploy their systems and models only if they assess that residual risks would stay below the thresholds. In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds.

V. Continually invest in advancing their ability to implement commitments i-iv, including risk assessment and identification, thresholds definition, and mitigation effectiveness. This should include processes to assess and monitor the adequacy of mitigations, and identify additional mitigations as needed to ensure risks remain below the pre-defined thresholds. They will contribute to and take into account emerging best practice, international standards, and science on AI risk identification, assessment, and mitigation.

Outcome 2. Organisations are accountable for safely developing and deploying their frontier AI models and systems. They will:

VI. Adhere to the commitments outlined in I-V, including by developing and continuously reviewing internal accountability and governance frameworks and assigning roles, responsibilities and sufficient resources to do so.

Outcome 3. Organisations’ approaches to frontier AI safety are appropriately transparent to external actors, including governments. They will:

VII. Provide public transparency on the implementation of the above (I-VI), except insofar as doing so would increase risk or divulge sensitive commercial information to a degree disproportionate to the societal benefit. They should still share more detailed information which cannot be shared publicly with trusted actors, including their respective home governments or appointed body, as appropriate.

VIII. Explain how, if at all, external actors, such as governments, civil society, academics, and the public are involved in the process of assessing the risks of their AI models and systems, the adequacy of their safety framework (as described under I-VI), and their adherence to that framework.

  1. We define ‘frontier AI’ as highly capable general-purpose AI models or systems that can perform a wide variety of tasks and match or exceed the capabilities present in the most advanced models. References to AI models or systems in these commitments pertain to frontier AI models or systems only. 

  2. We define “home governments” as the government of the country in which the organisation is headquartered. 

  3. Thresholds can be defined using model capabilities, estimates of risk, implemented safeguards, deployment contexts and/or other relevant risk factors. It should be possible to assess whether thresholds have been breached. 

That is remarkably similar to SB 1047.

Markus Anderljung: This is just the start of this journey. Going forward, governments, civil society, academia, the public will need to be a part of defining and scrutinizing these frontier AI safety frameworks. But the first step is that they exist.

The thresholds would be set by the companies themselves. In the future, they should and probably will see significant input from others, including governments. They’d have to be public about it, which allows others to spot if their commitments aren’t sensible. Most of these companies don’t have these frameworks in place, let alone talk about them publicly, so this seems like a step in the right direction.

In order to comply with this, you need to detail your safety protocols, which also means detailing what is being trained in at least a broad sense. You have to have procedures to verify your mitigations. You have to comply with shifting international standards and best practices that are not defined in advance.

The only substantial parts missing are the shutdown protocol and protecting the model weights until such time as they are intentionally released.

Also the thresholds are set by the companies rather than the governments. This seems worse for everyone, in the sense that a government standard offers safe harbor, whereas not having one opens the door to arbitrary declarations later.

So if this is so terrible, presumably companies would not sign… oh.

• Amazon

• Anthropic

• Cohere

• Google

• G42

• IBM

• Inflection AI

• Meta

• Microsoft

• Mistral AI

• Naver

• OpenAI

• Samsung Electronics

• Technology Innovation Institute

• xAI

• Zhipu.ai

I am not saying that is ‘everyone’ but aside from some Chinese companies it is remarkably close to everyone who is anyone.

Ian Hogarth (Chair AISI): Really remarkable achievement announced at AI Seoul Summit today: leading companies spanning North America, Asia, Europe and Middle East agree safety commitments on development of AI.

If you scan the list of signatories you will see the list spans geographies, as well as approaches to developing AI – including champions of open and closed approaches to safe development of AI.

What else happened?

What about China’s statements? China would be key to making this work.

Matt Sheehan: Chinese readout from AI dialogue meets (low) expectations:

– want AI good not bad

– UN=leader on governance

Disappointing (but expected): China delegation led by Foreign Ministry North America bureau. Indicates China treating dialogue as aspect of US-China relations, not global tech risk.

Helen Toner: No Matt but didn’t you see, they agreed that AI could have big benefits but also poses big risks! I think that’s what they call a diplomatic breakthrough.

Saad Siddiqui: It feels like lots of different parts of the CN bureaucracy in the room, hard to imagine productive dialogue with so many different interests present across NDRC, CAC, MOST, MIIT, Central Committee Foreign Affairs Office. Any sense if that’s typical?

I do not know why anyone would have any hope for the United Nations. I worry that saying ‘the UN should take a leading role’ is a lot like saying ‘we should do nothing.’ Then again, if we already believe all five security council members have de facto vetoes over everything anyway, then does it change anything? I don’t know.

Imane Bello calls it a success, because:

  1. They got everyone together.

  2. They got China and America into the same room.

  3. There were calls for cooperation between many AI safety institutes.

  4. The interim international scientific report was unanimously welcomed.

  5. In Imane’s opinion, IISR is ‘history in the making.’

Again, that’s diplomacy. Did it matter? Hard to say.

UK lead negotiator Henry de Zoete is also calling it a win.

Jan Brauner sums up what they see as the most important outcomes.

  1. AI safety institutes say they will partner and share info.

  2. Companies make the commitments above.

  3. US AISI within NIST releases strategic vision (full version here).

  4. Seoul Ministerial Statement is super explicit about existential risk.

  5. UK government sets up $11mm grant program for AI safety.

I looked over the NIST strategic vision. I have no particular objections to it, but neither does it involve much detail. It is a case of successfully not messing up.

Some have ambitious further plans.

Eva Behrens: Here are 5 policy recommendations for the upcoming AI Safety Summit in Seoul, from me and my colleagues at ICFG.

In Bletchley, world leaders discussed major risks of frontier AI development. In Seoul, they should agree on concrete next steps to address them.

Overview

In accordance with the shared intent communicated through the Bletchley Declaration to deepen international cooperation where necessary and mitigate catastrophic risks from advanced AI, we urge countries attending the Summit in South Korea to jointly recognise that:

  1. The development of so-called long-term planning agents (LTPAs) should be prohibited until proven safe,

  2. Advanced AI models trained on 10^25 Floating Point Operations (FLOP) of compute capacity or more should be considered high-risk and need to be regulated accordingly, and

  3. The open-sourcing of advanced AI models trained on 10^25 FLOP or more should be prohibited.

To build a strong foundation for international cooperation on the governance of high-risk advanced AI, we urge that Summit participants jointly agree to:

  1. Hold biannual international AI Safety Summits, and pick a host country to follow after France and

  2. Keep the focus of the Summits on international collaboration for mitigating catastrophic risks from advanced AI.

Contrast this with SB 1047. This would heavily regulate everything above 10^25, including a full ban on open source (until a protocol is designed to allow this to happen safely, they say; no idea what that would be), with no adjustments over time. SB 1047 starts at 10^26, requires only reasonable assurance, and has a $100 million minimum such that the threshold will rapidly scale higher very soon.
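For a sense of scale on those thresholds, here is a back-of-the-envelope using the common C ≈ 6ND approximation for dense-transformer training compute; the parameter and token counts are rough public figures (Nemotron-4’s 9 trillion tokens is stated above, Llama 3’s ~15 trillion is Meta’s stated figure), not official compute disclosures:

```python
# Rough training compute for a dense transformer: C ≈ 6 * N * D,
# where N = parameter count and D = training tokens (standard rule of thumb).
def training_flop(params, tokens):
    return 6 * params * tokens

# name: (parameters, training tokens) -- approximate public figures
models = {
    "Llama-3-70B": (70e9, 15e12),
    "Nemotron-4-340B": (340e9, 9e12),
}

for name, (n, d) in models.items():
    c = training_flop(n, d)
    print(f"{name}: ~{c:.1e} FLOP; >1e25: {c > 1e25}; >1e26: {c > 1e26}")
```

On these estimates a 10^25 threshold already catches Nemotron-4-340B, while 10^26 catches neither model, which is the practical difference between the ICFG proposal and SB 1047’s starting point.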

Indeed, the ICFG says the threshold should over time be adjusted downwards, not upwards, due to algorithmic and hardware improvements.
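To make the 10^25 threshold concrete, a common rule of thumb estimates training compute as roughly 6 FLOP per parameter per training token. A minimal sketch of where models would land under the two regimes (the model sizes and token counts below are illustrative assumptions, not figures from either proposal):

```python
# Rule-of-thumb training compute estimate: FLOP ~= 6 * parameters * tokens.
# Model configurations below are illustrative assumptions only.
def training_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens

ICFG_THRESHOLD = 1e25    # ICFG's proposed line
SB1047_THRESHOLD = 1e26  # SB 1047's starting line

examples = {
    "7B params, 2T tokens": training_flop(7e9, 2e12),     # ~8.4e22
    "70B params, 15T tokens": training_flop(70e9, 15e12),  # ~6.3e24
    "1T params, 20T tokens": training_flop(1e12, 20e12),   # ~1.2e26
}

for name, flop in examples.items():
    print(f"{name}: {flop:.1e} FLOP, "
          f"ICFG-covered: {flop >= ICFG_THRESHOLD}, "
          f"SB 1047-covered: {flop >= SB1047_THRESHOLD}")
```

The gap between the two thresholds is a full order of magnitude, which is why the choice of line, and whether it adjusts up or down over time, matters so much.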

This also proposes a ban on ‘long term planning agents,’ which unfortunately is not how any of this works. I don’t know how to allow short term planning agents, and effectively stop people from making long term ones. What would that mean in practice?

There was this talk that included Yoshua Bengio, Max Tegmark and Jaan Tallinn.

What about the full International Scientific Report on the Safety of Advanced AI? I looked briefly and I was disappointed. Over 95% of this report consists of the standard concerns about job displacement and deepfakes and privacy and other similar issues. The one section that does address ‘loss of control’ says experts disagree about whether this could be a concern in the future if we create things smarter than ourselves, so who can say.

They even say that a loss of control of highly capable AI systems is ‘not necessarily catastrophic.’ That is the only time the word ‘catastrophic’ is used, and they do not say ‘existential.’ ‘Extinction’ is only mentioned once, in the section directly after that, entitled ‘AI researchers have differing views on loss of control risks.’ Thus, despite the conference saying it should focus on existential dangers, this report is in effect highly dismissive of them, including implicitly treating the uncertainty as reason to throw up one’s hands and focus instead on issues like implicit bias.

Top AI labs are currently dramatically insecure. As the value of their model weights and other assets rises, both commercially and as an existential risk and matter of national security, this will increasingly become a problem. Alexander Wang, CEO of Scale AI, did a ChinaTalk interview in which he emphasized the need to lock down the labs if AI capabilities continue to advance.

Rand recently came out with an extensive report on how to secure model weights. As they note, securing only the model weights is a far more tractable problem than securing all the data and algorithms involved. They assume future frontier models will be larger, and online API access will need to be widespread.

Here is a Q&A with director Sella Nevo, one of the coauthors, which goes over the most basic items.

What are their core recommendations?

They start with things that need to be done yesterday. The biggest dangers lie in the future, but our security now is woefully inadequate to the dangers that exist now.

Avoiding significant security gaps is highly challenging and requires comprehensive implementation of a broad set of security practices. However, we highlight several recommendations that should be urgent priorities for frontier AI organizations today. These recommendations are critical to model weight security, most are feasible to achieve within about a year given prioritization, and they are not yet comprehensively implemented in frontier AI organizations.

• Develop a security plan for a comprehensive threat model focused on preventing unauthorized access and theft of the model’s weights.

• Centralize all copies of weights to a limited number of access-controlled and monitored systems.

• Reduce the number of people authorized to access the weights.

• Harden interfaces for model access against weight exfiltration.

• Implement insider threat programs.

• Invest in defense-in-depth (multiple layers of security controls that provide redundancy in case some controls fail).

• Engage advanced third-party red-teaming that reasonably simulates relevant threat actors.

• Incorporate confidential computing to secure the weights during use and reduce the attack surface. (This measure is more challenging to implement than the others in this list but is backed by a strong consensus in industry.)

This is the least you could do if you cared about the security of model weights. Have an actual plan, limit access and attack surface, use red-teaming and defense in depth.
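As an illustration only (the names and policy here are hypothetical, not from the Rand report), the “limit access” and “monitor” items reduce to something like a small allowlist plus an audit log in front of every weight read; a real system would add hardware-backed authentication, confidential computing, and tamper-evident logging:

```python
# Hypothetical sketch: allowlist-gated, audited access to model weights.
# All names here are illustrative. A production system would use hardware-backed
# auth and tamper-evident, timestamped logs rather than an in-memory list.

AUTHORIZED = {"alice", "bob"}  # keep this set small, per the recommendations
AUDIT_LOG = []                 # every attempt is recorded, allowed or not

WEIGHT_STORE = {"model-v1": b"\x00\x01"}  # stand-in for centralized, monitored storage

def read_weights(user: str, model: str) -> bytes:
    allowed = user in AUTHORIZED
    AUDIT_LOG.append((user, model, allowed))  # log before deciding, so denials are visible
    if not allowed:
        raise PermissionError(f"{user} is not authorized for {model}")
    return WEIGHT_STORE[model]
```

The point of logging denials as well as successes is that a spike of failed reads is exactly the signal an insider threat program wants to see.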

As Leopold noted, our goal must be to stay ahead of the threat curve.

The authors note that FBI Director Christopher Wray implied China had a workforce of more than 175,000 hackers. If China wanted to go full OC5+, they could. For now it would not make sense given the economic and diplomatic costs. Later, it will.

They also say North Korea invests ‘between 10% and 20% of the regime’s military budget’ in cyberwarfare, between $400 million and $800 million. I presume they do this largely because it is profitable for them.

Everyone acknowledges that an OC5-level attack on any major lab would almost certainly succeed. For now, that is fine. The question is, when does that become not fine, and where should we be right now? Should we be able to block an OC4 attack? I certainly hope we would be able to block an OC3 one given the value at stake.

We do not need to attempt bulletproof security until we are under robust attack and have assets that justify the real costs of attempting bulletproof security. We do need to be trying at all, and starting our preparations and groundwork now.

Longer term we will need things like this to have much chance, similar to what one would do if worried about model self-exfiltration, which we should be worried about in such scenarios as well:

• physical bandwidth limitations between devices or networks containing weights and the outside world

• development of hardware to secure model weights while providing an interface for inference, analogous to hardware security modules in the cryptographic domain

• setting up secure, completely isolated networks for training, research, and other more advanced interactions with weights.

They highlight 38 potential attack vectors in 9 categories.

How many resources are needed to launch various attacks? They have a table for that.

The numbers here are weird, representing chance of success linearly from <20% to >80%, against an arbitrary target. I would think things would scale differently.

I also do not think that ‘up to 20% chance of success’ is the right category? If something has a 10% chance of success it is a big deal.

Also important is that this is an enumeration of things we know about. That is a lower bound on the risk. The actual situation is far worse, because it includes unknown unknowns. It is very hard for the things we do not know about to be ‘good news’ here.

For multiple reasons, it is prudent to recognize the plausibility of current assessments underestimating the threat:

• We assume that other attack vectors exist that are as yet unknown to security experts, particularly ones concerning advanced persistent threats (APTs), such as state actors.

Novel attack vectors and conceptual approaches are likely to evolve over time, as are novel insights and infrastructure that make existing attacks more accessible.

• Publicly known examples of attacks are only a subset of attacks actually taking place, especially when it comes to more-advanced operations. Most APTs persist for years before discovery.

Many national security experts with whom we spoke mentioned that the vast majority of highly resourced state actor attacks they are aware of were never publicly revealed. This means that a purely empirical analysis based on detected operations would systematically underestimate the feasibility and frequency of advanced attack vectors.

Accordingly, one should expect capable actors to have access not only to well-established attack vectors but also to unknown approaches. In Appendix A, we share many examples of state actors developing such conceptually novel attacks years or decades before they were discovered by others.

Bold is mine. All of that involves human attack vectors only. If we include future AI attack vectors, enabled by future frontier models, the situation gets even more dire if we do not bring our new capabilities to play on defense with similar effectiveness.

Chapter 6 proposes that labs define security levels (SLs) from SL1 to SL5. If you are SL(X), you are protected against threats of OC level X.

So what does it take to get to even SL1?

In some senses this is easy. In others, in the context of a startup? It is asking a lot.

Moving to SL2 means ‘industry best practices’ across the board. Doing all of the standard things everyone says one should do is a standard few companies, in practice, actually meet. Almost everyone is doing some number of ‘stupid’ things in the form of not doing some of the things on this list.

What about SL3? It is essentially more of the same, only more so, and with serious worries about insider threat vectors. Any individual item on the list seems plausible but annoying. Doing all of them, in a world where your weakest point gets attacked, is not going to happen without a concerted effort.

SL4 gets expensive. Things are going to get slowed down. You do not want to be implementing this level of paranoia too early.

SL5 is that much more expensive to implement. You have to care quite a lot. Having eight security layers is quite the ask, as are many other action items.

Is all that necessary? Would it even be sufficient? Consensus weakens as you move up to higher security levels.

There are deeper and more conceptual disagreements about what is needed to achieve the security implied by SL4 and SL5—with opinions ranging from the SL3 benchmark being sufficient to secure against all threat actors to claims that no system could ever present a significant hurdle to operations in the OC5 category.

A particular point of disagreement was the number of people who should have authorization to access the weights. Some experts strongly asserted that the model weights cannot be secure if this number is not aggressively reduced (e.g., to the low tens); others claimed that such a reduction would not be necessary, feasible, or justified.

I have definitely talked to an expert who thought that against an OC5 operation all you can hope to do is buy some time. You can prevent them from stealing everything the first day they set their sights on it, but protecting assets over time is, they claimed, rather hopeless. I haven’t seen credible claims that SL3-style procedures would be sufficient to protect against OC5, and I find that highly implausible, even if it has rarely if ever been tried.

The low tens seems to me quite a lot of people to have access to your core asset. I am not sure how different ‘low tens’ is from infinity. Certainly if your plan involves dozens of people each not being compromised, then you have no plan.

The second half of the report is details of the different attack vectors.

House appropriations bill cuts $100 million in funding for NIST. This is one of the worst things to be cutting right now; NIST is already woefully underfunded.

New paper on Risk Thresholds for Frontier AI. How should we combine compute thresholds, risk thresholds and capability thresholds? The conclusion is to primarily use capability thresholds but have them be informed by risk thresholds.

I am going to quote this in full because it feels like a good steelman of being skeptical about going too far too fast on regulation.

Seb Krier (Google DeepMind): I tend to think of AI policy in three consecutive phases: observation and monitoring; standardization and norm-setting; and then rules, law, and regulations if necessary. My impression is that in recent years some governance crowds have taken the reverse approach, motivated by the usual policymaker urgency of ‘we must do something now’. The problem with this is that you now have to define and cement very precise things that are still evolving, like evaluations and mitigations. Combined with the many trade-offs, inefficiencies, conflicting interests, low capacity, and frankly generally poor decision-making that governments currently suffer from, this often leads to messes, evidentiary gaps, legal risks, and rushed policymaking.

To be clear, I definitely think AI is a technology that will warrant some degree of regulation – and there may well be sector-specific uses or applications that warrant this now. I think cybersecurity-oriented regulations make more sense than omnibus regulatory behemoths. But at a more general level, I feel like we’re still in a phase where the value comes from research and finding things out. And I’d rather see 50 organizations developing evaluations and 5 advocating for regulations rather than the reverse (i.e. what we have today). This is also why I’m quite supportive of the experimental nature of institutions like the AI Safety Institute, where both sides iteratively learn as things progress.

Some people justify hasty policymaking because they think we will have AGI very soon and therefore this demands quick pre-emptive action, otherwise governments won’t have time to intervene. I think it’s right to try to pre-empt things, prepare institutions, and think ahead – but I don’t think timelines alone grant a carte blanche for any kind of legislation. Plus if we are indeed getting very close to AGI, I have 0 doubt that governments will inevitably wake up – and the implications, particularly for large risks, will be a lot more Leopold-like than creating a new GDPR for AI.

So essentially:

  1. For now we should observe and monitor, lay groundwork such as with NIST, and perhaps do select sector-specific interventions such as in cybersecurity.

  2. Later we will do, and will want to do various regulatory actions.

  3. But let’s try and push the key decisions forward in time so we learn more.

Also, GDPR is a deeply stupid law. Do not make laws like GDPR. They do great harm via creating frictions without accomplishing almost anything.

It is also correct to worry about regulatory lock-in. Not infinitely worried as in ‘anything imposed is automatically forever,’ but yes there is a lot of inertia and these things are hard to reverse.

How much do we need to worry about moving too slowly? That depends on:

  1. How long you think we have.

  2. How quickly you think we can move.

  3. How sensibly you think we would move in a crisis but with more information.

  4. Whether you think that by the time there is a crisis, it will be too late.

Reasonable people disagree on all those questions.

What most critics and skeptics fail to do is differentiate their responses to different types of regulatory proposals.

As in, is a proposal about observing and monitoring and allowing us to intervene when the time comes? Or is it attempting to intervene now on what people can do now, or dictate the form of intervention later?

Consider the response to something like SB 1047 or Biden’s executive order. Both are primarily about transparency, observation and monitoring of frontier models for the sole purpose of concerns on catastrophic or existential risks. They are deeply compatible with the perspective outlined here by Krier.

The logical response is suggesting improvements and discussing details, and talking price. Instead, most (not Krier!) who are skeptical of other forms of regulation choose for SB 1047 instead to hallucinate a different bill and different impacts, and for the executive order to demand it be repealed. They hallucinated so badly on SB 1047 that they demanded the removal of the limited duty exception, a free option that exclusively lightened the burden of the bill, and got their wish.

The logic of these others seems to be:

  1. You want to be able to observe and monitor, and prepare to act.

  2. If you did that, you might later act.

  3. Can’t have that. So we can’t let you observe or monitor.

SB 1047 has strong bipartisan public support (77%-13%), if this is how you ask about it. I notice that this is not exactly a neutral wording, although its claims are accurate.

This is unsurprising, although the margin is impressive. We have yet to see a poll on AI that doesn’t go this way.

The LA Times discusses SB 1047 and other proposed bills here. All the other bills seem actively counterproductive to me, especially the pure rent seeking demand from teamsters for supervision of self-driving trucks.

Dean Ball argues that SB 1047 is bad because it creates a government regulatory agency, via a fully general public choice counterargument against having government regulatory agencies for anything with broad positive use cases. I ended up discussing various SB 1047 things on Twitter a bit with him and Eli Dourado.

Politico covers that Y Combinator sent a letter opposing SB 1047. While the letter refreshingly says that the law was clearly drafted in good faith, all four of the letter’s listed concerns misstate the practical implications of the bill in alarmist terms. Then they say, rather than proposing fixes to particular issues, why not scrap the whole thing and instead encourage open source software? It is telling that such letters so often ask not only for no rules of any kind, but also for active government handouts and special treatment, despite SB 1047 already giving open source special treatment.

Dwarkesh Patel interviews Tony Blair, with AI as a major focus. Blair sees AI as the biggest change since the industrial revolution, the most important thing to focus on. He very much gives off the technocrat ‘this is how it all works’ vibe, without pretending that the technocrats are generally in charge or governments are competent. He sees AI will be huge but doesn’t seem to notice the existential risk angle. Essentially he is a sensible ‘AI skeptic,’ who does not expect AGI or a takeoff but sees AI would be transformative anyway. His focus has been ‘good governance’ so then he pulls out the standard good governance tropes. He also emphasizes that policy and politics (or ‘change makers’) are distinct things, and if you want to accomplish anything you have to be policy first.

Also has this great line from Blair: “The problem with government is not that it’s a conspiracy, either left-wing or right-wing. It’s a conspiracy for inertia.”

Interview with OpenAI board chairman Bret Taylor. He is excited for this generation of AI. His focus is clearly being CEO of Sierra, where he is building hopefully cool solutions for consumer brands, rather than his far more important role at OpenAI. That does at least mean he has lots of practical experience with current models. He holds his own on mundane job transitions but does not seem to be feeling the AGI. Instead he says, beware specific hype, but the economy will transform within 30 years and this will ‘meet the hype.’ Someone needs to have him talk to the technical staff. For now, it seems he does not grok existential risk because he doesn’t grok AGI.

Lester Holt interviews OpenAI CEO Sam Altman and AirBnB’s Brian Chesky; skip to about 35:00, it is ~40 minutes. Often not new-information dense. Colin Fraser notes some of the ways Altman is playing rhetorical sleight of hand with the risks of AGI. If you expect to be able to tell an AI ‘go solve all of physics’ or ‘go create a great company’ then that is a completely transformed world; you cannot simply talk about ‘solving misuse’ as if misuse were a distinct magisterium.

When discussing events around Altman’s firing, Altman sticks to his story and lets Chesky tell a series of rather glaring whoppers. Both try to walk back the idea of an ‘AGI moment,’ there are only various capabilities in various areas, and try to deny that there is ‘a race’ in a meaningful sense. Altman follows the general theme of acting like everything will stay normal under AGI. I know he knows better. When he says ‘AGI could double the world’s GDP’ Holt points out this sounds outlandish, but I see it as outlandish on the downside and I think Altman knows that.

And he is playing the ‘we have great ability to steer our current models and their values’ card, claiming the real problem is choosing our values, which I see as a highly disingenuous attempt to dismiss alignment problems as being handled.

Mira Murati talks to Dartmouth Engineering, where she is an alumna. It has some key spots but low information density.

  1. She says we should expect to get ‘PhD-level intelligence for specific tasks’ in a year to 18 months. The usual suspects responded to this by saying no GPT-5 for over a year and did some gloating, which seems like the wrong response to this kind of prediction.

  2. She was broadly supportive of the government understanding what is going on and called for more of that.

  3. She says of the AI ‘it’s a tool, right’ and there is a subtle blackpill that she does not seem to notice that this might not be the full story in the future.

  4. It does seem she said ‘Some creative jobs maybe will go away due to AI, but maybe they shouldn’t have been there in the first place.’ Hot take. She then tried to save it on Twitter.

Roon (linking to this clip from this segment): I f***ing love Larry Summers.

Beff Jezos (responding to clip): So f***ing based, holy s***.

Larry Summers introduces Bloomberg to the concept of recursive self-improvement, eventually using the term explicitly, and predicting transformative and seismic change. The issue, he says, is how do you manage that? He says we cannot leave AI only to AI developers. Public authorities must take a strong role in ensuring it gets used for good, but stopping it or slowing it down without thinking about positive developments would cede the field to the irresponsible and our adversaries, and he endorses ‘responsible iterative deployment.’

If this counts as highly based, where public authorities must take a strong role, and we should consider the positive benefits and also the downsides, perhaps we are getting somewhere. Lots of great stuff here, we need to now also work in alignment and the control problem, which did not get mentioned.

New interview with Anthropic CEO Dario Amodei. I haven’t listened yet.

Yuval Noah Harari asks, among other things, what happens when zero humans understand the financial system? Would we end up controlled by an essentially alien intelligence? This specific mechanism is not that high on my list. The generalized version is reasonably high. Yes, of course, we will be under immense pressure to turn control over all of the things to AIs.

Leo Gao of OpenAI reminds us we do not know how neural networks work. He does so in response to someone citing Leo Gao’s paper as evidence to the contrary that someone ‘must have missed.’ When the moment was described, he did not take it great.

This does seem to be accurate.

Agustin Lebron: No one:

Absolutely no one:

Every AI researcher: AGI is incredibly dangerous and no one should build it. Except ME. I can do it safely.

Eliezer Yudkowsky: Elon starts OpenAI because he doesn’t like Demis. OpenAI people repeatedly distrust OpenAI and leave to start their own companies… none of which trust *each other*… and one observes that they’re all founded by the sort of people who went to work for OpenAI in the first place.

Elon Musk: Actually, I like Demis. Just don’t trust the Google corporate blob.

Eliezer Yudkowsky: Apparently I’ve heard and told the wrong story all these years!

Reluctantly — because I do usually prefer to listen to people when they tell me what they actually said or thought, what with my not being a telepath — I feel obligated to mention that 3 different sources reached out to me to say, ‘No, Elon actually did dislike Demis.’

This puts me in an odd position and I’m not sure what I’ll say going forward. I am really reluctant to contradict people about what they themselves thought, but I also don’t want to represent a mixed state of evidence to the public as if it was a purer state of evidence.

An attempt to portray AGI existential risk as risk of domination. Would such a focus on such details convince people who are not otherwise convinced? My guess is some people do respond to such details, it makes things click, but it is hard to predict which people will respond well to which details.

I’m not going to lie and say it’s good. That doesn’t mean give up.

Alex Trembath: When I tell people I work in environmental policy, the most common response, BY FAR, is to ask me “How fucked are we?”

Kelsey Piper: People say this to me about climate and about AI. Guys, there are lots of serious challenges ahead but we are an inventive, wealthy, ambitious society with lots of brilliant hardworking people and all of our problems are solvable. We’re not doomed, we just have a big to-do list.

One reason I sincerely love Silicon Valley despite its deficits is that it’s the only place where I’ve run into strangers who will listen to a description of a serious problem they haven’t heard of before and go “huh.” [beat.] “What needs doing?”

Everyone who thinks you should obviously do [insane thing] is wrong. That is the easy realization. The hard part is: What is the sane thing?

Francois Fleuret: AGI happens in 3y, where should I invest my money?

Eliezer Yudkowsky: Everyone in the replies is saying “Guns and bullets” and I regret to inform everyone THAT WILL NOT ACTUALLY WORK.

There were a ton of replies to Fleuret. They did not contain original ideas. The most common were things like energy, Microsoft and Nvidia, which are a way to go out while having previously had more dollars to your name.

As many have long suspected about many accelerationists: The positions of Beff Jezos make a lot more sense if he simply does not believe in AGI.

Beff Jezos: ASI is a fairy tale.

Explain to me.

What the f*** is “ASI”.

FORMALLY.

Seriously. I’ll wait.

Mario Cannistra: Explains a lot.

Of course I’d want to accelerate if I didn’t think superintelligent AI was even possible.

We can safely consider the matter closed, then.

We now know why he named his new company xAI.

Elon Musk: The trend is very strong that any AI company’s name that can be inverted will be inverted.

Technology advances.

AI #70: A Beautiful Sonnet


Childhood and Education Roundup #6: College Edition

Childhood roundup #5 excluded all developments around college. So this time around is all about issues related to college or graduate school, including admissions.

What went wrong with federal student loans? Exactly what you would expect when you don’t check who is a good credit risk. From a performance perspective, the federal government offered loans to often-unqualified students to attend poor-performing, low-value institutions. Those students then did not earn much and were often unable to repay the loans. The students are victims here too, as we told them to do it.

Alas, none of the proposed student loan solutions involve fixing the underlying issue. If you said ‘we are sorry we pushed these loans on students and rewarded programs and institutions that do not deserve it, and we are going to stop giving loans for those programs and institutions and offer help to the suffering former students, ideally passing some of those costs on to the institutions’ then I would understand that. Instead, our programs are moving dollars mostly to relatively rich people who can afford to pay, and by offering forgiveness we are making the underlying problems far worse rather than better. Completely unacceptable even if it were constitutional.

Colorado governor Jared Polis, who really ought to know better, signs bipartisan bill to make first two years of college free for students whose family income is under $90k/year at in-state public schools. Technically this is 65 credits not counting AP/IB, concurrent enrollment, military credit or credit for prior learning, so there is even more incentive to get such credits.

The good news is they do not have a full cliff: the benefit falls off gradually as you approach $90k, so they dodged the full version of quit-your-job insanity. The obvious bad news is that this is effectively one hell of a tax increase.
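A phase-out is an implicit marginal tax: a benefit worth B that shrinks linearly to zero over an income window adds B divided by the window’s width to the effective marginal rate. A sketch with assumed numbers (the $10k benefit value and $70k–$90k phase-out window are illustrative, not from the bill):

```python
# Implicit marginal tax from a linearly phased-out benefit.
# Benefit value and phase-out window are illustrative assumptions.
def benefit(income: float, value: float = 10_000.0,
            start: float = 70_000.0, end: float = 90_000.0) -> float:
    if income <= start:
        return value
    if income >= end:
        return 0.0
    return value * (end - income) / (end - start)

# Each extra dollar earned inside the window costs value / (end - start)
# in lost benefit, i.e. 10000 / 20000 = $0.50 per dollar:
implicit_rate = (benefit(70_000) - benefit(90_000)) / (90_000 - 70_000)
print(implicit_rate)  # 0.5: a 50-point implicit marginal tax, stacked on ordinary taxes
```

That 50-point implicit rate is why even a gradual phase-out, while better than a cliff, still punishes earning more inside the window.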

The less obvious bad news is this is setting up a huge disaster. Think about what the student who actually needs this help will do. They will go to a local college for two years for free. If they do well, they’ll get to 65 credits.

Then the state will say ‘oops, time to pay tuition.’ And what happens now? Quite a lot of them will choose to, or be forced to, leave college and get a job.

This is a disaster for everyone. The benefits of college mostly accrue to those who finish. At least roughly 25% of your wage premium is the pure Sheepskin Effect for getting your degree. If you aren’t going to finish and were a marginal student to begin with (hence the not finishing), you are better off not going, even for free.

I do not think we should be in the business of providing universal free college. There are real costs involved, including the negative externalities involved in accelerating credentialism. However, if we do want to make this offer to help people not drown, we need to at least not stop it halfway across the stream.

The real life version of the college where there are degree students who pay for a degree but aren’t allowed to come to class, versus the non-degree students who get no degree but are educated for free. To be clear, this is totally awesome.

David Weekly: This seems kinda…radical? ASU makes its courses available to anyone for $25/course. After you take the class, if you want the grade you got added to an official transcript with a credit you can use, +$400. These are real college credits. 8 year olds are getting college credits!

Emmett Shear: This is cool to me because you can see the core of university economics right there. Bundling $25 worth of education with $400 of credentialist gatekeeping. I’m not blaming ASU, it’s cool they’re doing this, but that is deeply broken.

Sudowoodo: Totally understand your comment but this is the best possible instance of a college credit system I’ve seen. One course for $400 equals 120 credits of a degree for $16k (plus the $25 per course), or an additional major for just a few thousand dollars.

Emmett Shear: Right, but that just goes to highlight how absurdly overpriced the credentials are vs the actual education.

James Hulce: I did 70+ credits under this program. During the early years of the pandemic ASU reduced the credit conversion fee to $100 and waived the $25 enrollment fee, so I took a wide variety of courses. Overall very happy with the quality and delivery.

Aside from being virtual, this product is vastly better than the normal one. You get to try out courses for $25 and bail if they are no good. If you struggle, or you get bad grades, you can start over again for another $25 or bail. You are never stuck with a bad grade. Then at the end, after you pay for the credits, it is still a deep discount, an entire degree for $16k.
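The arithmetic in the thread, spelled out (assuming the standard 3 credits per course, which the thread does not state explicitly):

```python
# ASU pricing from the thread: $25 to take a course, +$400 to convert it to credit.
# Assumes 3 credits per course and a 120-credit degree (i.e. 40 courses).
AUDIT_FEE = 25
CREDIT_FEE = 400
COURSES = 120 // 3  # 40 courses for a 120-credit degree

credential_cost = COURSES * CREDIT_FEE  # the "$16k" in the thread
education_cost = COURSES * AUDIT_FEE    # what the teaching itself costs you
print(credential_cost, education_cost)  # 16000 1000
```

Which is Shear’s point in two numbers: the bundle is roughly $1,000 of education and $16,000 of credential.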

Of course, this is Arizona State University, so the real product (by reputation) is neither education nor credential. Rather it is the cool parties. This program cannot help you with those. But if you are cool enough and show up, they are also close to free.

The big picture is that trust in academia, like many American institutions, is rapidly collapsing, among essentially all groups.

Here is one theory on (one aspect of) what happened.

Derek Thompson: Why is trust in US institutions—esp colleges—collapsing? Here’s a theory. The 21st c has become the age of the unfocused institution—the age of mission inflation, goal ambiguity, and complex orgs losing any clear sense of priority, or identity.

Odalisk Flower: The university is supposed to solve the perennial question of the American Experiment: How do we get the benefits of an intellectual elite without the drawbacks of a hereditary aristocracy?

What has changed recently is common knowledge that this particular solution has failed.

In fact, it has failed so spectacularly that dissidents are now floating suggestions that perhaps a hereditary aristocracy isn’t so bad after all. For most, this is still outside the Overton window, but it’s wild how fast that window is moving.

I have not noticed rising whispers of the potential wisdom of hereditary aristocracy; indeed, neoreaction seems to be fully dead. From where I sit, there is broad recognition that the universities and our other institutions have failed, without any particular suggestion about what plausible replacement would be superior beyond building private local alternatives. My expectation is that the replacement will emerge out of the transformations wrought by AI, whether or not it is an improvement.

Harvard students are highly stressed, despite having made it to Harvard, says Harvard Crimson. I would note that getting mental health counseling is often a function of how and when counseling is provided as much as it is about actual mental health – if we applied today’s standards to 2017 I bet the graph starts substantially higher.

Is this despite, or because of, the very high grades?

Article goes into the usual suspects, overscheduling, lack of social time, social media, hyper-competitiveness and perfectionism. Everyone running between ‘pre-professional’ activities trying to stand out. Harvard, the author says, is now a group of students obsessed with their relative status. Sounds like what would happen if you filter for exactly that type of young person, then put them all in the same place to compete, without the ability to differentiate themselves with grades because everyone who wants one has a 4.0.

Not that everyone in the Ivy League actually has a 4.0. Grade inflation is high, but these percentages of A grades from Yale are still a lot less than 100%, and inflation may have at least temporarily peaked:

The patterns here are clear, such that small surprises stand out and seem meaningful. Are we not appreciating what is happening in psychology? Their studies may not be replicating, but the grades are not either. You have to respect that. Whereas physics seems to have gone rather soft.

What does it say about the students who choose various majors and classes, given this wide distribution of grades? One could say that students going into education studies are smarter because they knew to secure better grades. Or one can say they went that way because they can’t hack it, or did not care to. Or one could say that your 4.0 in education studies means nothing (above getting into Yale in the first place) and everyone will know that.

Obviously we need a meaningful range of grades, otherwise students cannot differentiate themselves based on grades, so they both won’t care about doing well and learning, and they will become obsessed with other signals and status markers.

Ben Golub: This is real and is creeping outside Harvard to most elite private schools. Grades should be made to matter again, and instructor evaluation practices should be adjusted to give them a free hand to give bad grades!

Orin Kerr: Very interesting essay by Harvard undergrad @aden_barton, arguing that Harvard undergrads don’t spend a lot of time on classes and studying—which he attributes mostly to grade inflation. If grades are compressed around “A”, there isn’t much to study for.

Aden Barton (essay in Harvard Crimson): In the final class, each student was asked to cite their favorite readings, and the professor was surprised that so many chose readings from the first few units. That wasn’t because the students happened to be most interested in those classes’ material; rather, that was the brief period of the course when everyone actually did some of the readings.

Despite having barely engaged with the course material, we all received A’s. I don’t mean to blame the professors for our poor work ethic, but we certainly would have read more had our grades been at risk. At the time, we bemoaned our own lack of effort. By that point in the semester, though, many other commitments had started requiring more of us, so prioritizing curiosity for its own sake became difficult.

And therein lies the second reinforcing effect of grade inflation, which not only fails to punish substandard schoolwork but actively incentivizes it, as students often rely on extracurriculars to get ahead. Amanda Claybaugh, dean of undergraduate education, made this point in a recent New York Times interview, saying that “Students feel the need to distinguish themselves outside the classroom because they are essentially indistinguishable inside the classroom.”

How bad is it? Oh my lord.

Zalman Rothschild: I was a teaching fellow for two classes at Harvard College when I was at HLS. One was taught by an amazing visitor from Dartmouth. He enforced a strict curve. The other was taught by a Harvard prof. He informed us TFs that an A is the default grade. A- would require justification.

Orin Kerr: Jesus H.

Maggie Wittlin: No no, that’s the law school system.

Matt Yglesias says students in college should study more, and we should hold them to actual standards.

Right now, they are doing remarkably little real work.

When you add in-class education, homework, other educational activities and outside work (which I would say largely counts as educational and is often necessary for support), we get 5.1 hours per day for ‘full time’ college students, or 35.7 hours a week.
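A quick sanity check on those figures (my own arithmetic, not from the underlying time-use data): the 5.1-hour number must be per day, since multiplying by seven reproduces the stated weekly total.

```python
# The 5.1-hour figure is hours per day; 5.1 * 7 matches the
# stated 35.7 hours per week. Rounding handles float noise.
hours_per_day = 5.1
hours_per_week = round(hours_per_day * 7, 1)
print(hours_per_week)  # 35.7
```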

Matthew Yglesias: Philip Babcock and Mindy Marks have shown that over the decades, students have been spending less and less time on studying — “full-time students allocated 40 hours per week toward class and studying in 1961, whereas by 2003 they were investing about 27 hours per week.”

I agree with Yglesias that to fix this we would need a dramatic reversal of grading practices. You need willingness to actually punish students who are not getting it done, with actual life consequences on more than the margin, or it won’t work.

Matt Yglesias: The nascent Summers-era crackdown was turning A-s into B+s and B+s into Bs. That generated some whining from students, but ultimately, to restore old-school academic values, schools will need to hand out Cs and Ds that put students at the risk of real negative consequences, like loss of scholarships, getting kicked out of school, or heading into the job market looking like a real fuckup.

And then you get the problem that Hunham confronted: Is this what students and their parents want?

It is indeed not what most parents and students want. Which means we know what product they are mostly buying, and the universities are mostly selling.

And like so many other things these days, there is remarkably little product differentiation. Almost no one is willing to say, this is something different, and we will get those who want that different product, and employers or prospective citizens or what not who want that product can reward that. It is odd to me that this is rare. If all the selective universities are rejecting most applications, so what if 90% of students and parents recoil in horror, so long as the other 10% are excited? Or 98% and 2%?

The killing of Harvard’s Math 55. John Arnold contrasts an ‘06 Crimson article on how hard the course is, with a ‘23 Crimson article showing how it is no longer special. One can reasonably argue that if 70 start, 20 finish and only 10 understand, maybe that is bad actually, but I disagree. I think that math is a place for exactly that, because failure is an option. You want to provide the real thing, and it is fine if the majority can’t hack it and drop out. If we can’t fail here, where can we fail?

Claim that in the wake of their donors pulling out complaining about antisemitism, the price for Ivy League admission via donation has effectively been slashed on the order of 90%, from $20 million to $2 million. That seems clearly below the market clearing or profit maximizing price? The optics of doing large volume on this also seem pretty terrible. Kids whose parents can pay $20 million are someone you want as a peer so you can network, but at $2 million that advantage mostly fades. At some point the damage to the student body adds up. So I’m skeptical.

To what extent are we seeing a shift lowering the value of Ivy League degrees?

Nate Silver: This speaks to the story I wrote earlier this week. Yes, the value of your Ivy League degree is going to be affected if people start to associate your school with political activism instead of academic rigor.

Andrew Ross Sorkin (NYT): Businesses may be unlikely to rush into formally patrolling universities’ policies by adopting either of these theoretical maneuvers, but they might amp up the pressure in some other way through their informal preferences. As Darren Woods, the chief executive of Exxon Mobil, said of campus protests in an interview with CNBC this week: “If that action or those protests reflect the values of the campuses where they’re doing it, we wouldn’t be interested in recruiting students from those campuses.”

John Arnold: Anecdotal, but I’ve had several conversations in recent years with people who hire undergrads for highly competitive jobs (tech, finance, consulting etc) that are moving away from the Ivies and towards flagship state universities, citing better cultural and professional fit.

Now confirmed with data. Forbes surveyed managers with hiring authority. When asked whether more/less likely to hire vs 5 years ago:

Ivy League: 7% more likely; 33% say less likely

Public univs: 42% more likely; 5% less likely

Selective privates: 37% more likely; 5% less likely

I would classify the selective privates at least half with the Ivies, not mostly with the public universities, if I was doing this style of recruitment.

Preston Cooper provides an entry in the genre where you measure the financial ROI of various college degrees given different universities and majors. 31% of degrees were negative ROI, once you factor in time costs and risk of not finishing.

Every time we run this test we get a graph of majors that looks like this:

That then interacts with different colleges, which differ in many ways including completion rates. And of course, if you switch programs based on this information, you do not suddenly get the completion rate (or net life impacts) of the degree you switch to, even if the original study was done fully correctly.

The return on master’s degrees was not so great.

Preston Cooper: What about grad school? It’s complicated.

Med school & law school have huge payoffs.

But nearly half of master’s degree programs leave students in the red.

How much government funding goes to programs with no return? We can answer that thanks to new data.

Programs in the ROI database received $418bn in funding from 2018 to 2022.

Of that, $122bn (29%) flowed to negative-ROI programs.

It would be highly reasonable to tie government funding to program ROI, if we had a good measurement of that, but that is not how our government works.

Here is the data dashboard. In which I learned that my degree and major had negative ROI by this metric, whereas if I had switched majors from Mathematics to Economics like I considered, I would have had a vastly easier job all around and also picked up almost three million dollars (!) in expected value.

I don’t buy the full result there, but if this reflects reality even somewhat, letting me make this mistake and stick with Mathematics, without even a warning, was deeply, deeply irresponsible.

Ideally we would get a more detailed breakdown, but yes.

Derek Thompson: Before the pandemic, new england colleges had more than 2x more applicants than southwestern colleges.

At current trajectories, southwestern college applicants will surpass new england in two years.

Nate Silver: This is pretty interesting in light of yesterday’s post.

There’s an inverse correlation between the left-wingness of the colleges in each region and growth in applications.

A lot of students just want to go to college to drink beer, hook up, go to football games, and emerge with a degree that will give them gainful employment. They far, far outnumber the political activist types. And they’re voting with their feet, it looks like.

Most students care primarily about things other than political activism. The problem for them is that college is a package deal. (Almost?) all the selective colleges have lots of political activism and force you to care deeply about things that are neither fun nor going to be useful to your future or part of getting a traditional education. And at least faking those things is deeply tied to your admission to those schools and to your social life and experience in class and administrative rules set once you arrive.

Colleges are reversing course, and admitting that yes standardized test scores are required for admissions.

It was completely insane to drop this requirement. Doing so only hurt the exact people they claimed to be trying to help. The good news is, while we have a long way to go, we seem to be past peak insanity in such matters.

Nate Silver: The critique that universities are run like for-profit corporations that are mostly concerned about the bottom line is correct. Also, that’s what might save them.

A new way has been found to discriminate.

Steve Miller: UCSD announced a new policy April 9 to exclude students whose parent is college educated and makes over $45,000 from enrolling in computer science or other selective majors, unless spots are available after first generation or low income students enroll.

Nearly 40% of all UC San Diego students are first generation students.

So if you are not a first generation student or low income, it will likely become virtually impossible to enroll in computer science or other selective majors.

This policy applies to students seeking to enroll in selective majors after their initial admission to the university, as the policy linked in the original post specifies.

Separate preferences for first generation students apply in admission.

This likely effectively means that if you are not a first-generation college student (and an in-state student) then you will not be able to transfer to a selective major, no matter your other GPA. Those making these decisions have made their motivations and intentions clear, so go in with your eyes open, both reading the fine print and realizing that they could add more fine print later.

But also, it seems odd that students want to major in computer science, and we are saying no rather than expanding the program? Isn’t that exactly what we want?

Perhaps our children are learning after all. They can solve for the equilibrium.

That was back in 2021. Presumably this number has only gone up since then.

The Hill:

  1. A survey found that 34 percent of white students who applied to colleges falsely claimed they were a racial minority on their application.

  2. Of those who lied, the largest share, 48 percent, claimed to be Native American on their application.

  3. Seventy-seven percent of white applicants who lied about their race on their application were accepted to those colleges.

According to Intelligent.com Managing Editor Kristen Scatton, the prevalence of applicants who claim Native American ancestry is possibly due to the popular narrative that for many Americans, a small percentage of their DNA comes from a Native American tribe.

It is not clear this is helping the applicants much, whether or not they were caught. Liars got accepted at a 77% clip, but the typical acceptance rate overall is already about 65%, and it is not clear this is ‘accepted at any given college’ rather than at all, and there are various other factors in both directions.

What’s totally crazy is doing the math on this.

  1. About 50% of college applications are from white students.

  2. White students report they lied 34% of the time.

  3. Of those students, 48% pretended to be Native American.

  4. That means that 5.8% of applications are falsely claiming to be Native American.

But the rate of real Native American applications is only about 1%. So that means, even if the other half of applications never lie: If you mark Native American, there is an 85% chance you are lying. Meanwhile, several percent of those who lied checked the box for AAPI, which presumably only hurts your chances even if they believe you.
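The 85 percent figure is the share of Native-American-marked applications that are false, given the estimates above; a minimal sketch of that last step:

```python
# Estimated shares of all applications, per the numbers above.
false_claims = 0.058  # falsely marked Native American
real_claims = 0.01    # genuinely Native American

# Of applications marked Native American, the fraction that are false:
fraction_false = false_claims / (false_claims + real_claims)
print(round(fraction_false, 2))  # 0.85
```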

So yes, I doubt checking that box helps you much on its own.

Phil Magness: If you want to genuinely disrupt higher education for the better, impose severe limits on the number of mandatory GenEd classes that students must take. These courses are the lifeblood of hyper-politicized woke departments that otherwise wouldn’t attract many students.

Most students would be better served by starting their majors earlier and taking more classes in skills and subjects related to their degrees. Most GenEds, as currently taught, are complete wastes of time at best, and political indoctrination at worst.

Those same GenEds serve another function though: they create jobs for faculty in otherwise unpopular disciplines. And the depts that have the heaviest presence on the GenEd curriculum (e.g. English) also tend to be the largest departments on campus, despite drawing few majors.

It’s unethical to require economically precarious 18-21 year olds to pay for classes they don’t need just to keep a horde of English, Sociology, and Foreign Language professors employed.

I had a highly extensive set of general required courses I had to take, something like 40 credits. You could make a reasonable case for the 16 that were reading the ‘Great Works’ of literature and philosophy. There wasn’t a problem with wokeness back then (the closest thing was when I sort of tried to cancel The Symposium for all the praising of child rape, and got told to STFU about it and come to class or else), but still the rest was pointless, a waste of time taking up almost a full year of coursework.

Phil Magness notes that students could instead start their majors. That implies that when you arrive on campus, you should know what major is right for you.

That is another issue with all the required classes. There is little room for exploration, most of those slots are already spoken for. If I had wanted to switch majors to something other than Mathematics, I had almost no opportunities to sample alternatives in time to do this. Realistically I could have probably made it to Physics or Economics, and that’s about it.

Which majors are most often regretted? Humanities.

Jacob Shell: What they don’t tell you in high school or college advisor offices is some of these are “winner takes all” majors and others aren’t. The comp sci normie is making a nice living right now, but the physics major is a sunlight-deprived lab tech for 30 years in a row.

I would have thought the physics majors were mostly not now doing physics? It still makes sense that regret rates are high. Math majors are mostly not doing math all day anymore, but they seem fine with it. As a math major myself, I am an exception, and I do regret it, although perhaps the signaling value made it worthwhile after all.

Here is a different survey that asks the same question. Will you regret that major? This time the answer is, probably.

Regret is an imprecise measure, but these are not small differences.

Thread of what Basil Halperin learned in graduate school. Increasing returns to effort for specialization in terms of skills, whether that translates to world improvement or pay is another question. Nothing here made me think anyone should go to grad school.

Then again, do you go there for the learning?

Here is Bryan Caplan on when to get which Econ PhD. The algorithm is essentially:

  1. Only get an economics PhD at all if you want a job that needs it, such as an economics professor.

  2. If you can get into a top-25 Econ program and endure the pain, go there instead.

  3. If you can’t do either or both of those, you can go with GMU.

  4. When to get a Masters? When you drop out before finishing your PhD.

  5. In any case, if you want this, apply to at least 15 schools, process is super random.

It is no surprise given his other opinions that Bryan Caplan’s answer to that question is a very sharp no. In Caplan’s model the purpose of graduate school is to get a job that won’t hire you without one. That is it.

I think he’s right.

Nate Silver offers related Good Advice.

Nate Silver: Real, non-trollish life advice:

If you’re a smart young person and you really want to go to graduate school, then by all means go. But if you’re on the fence, probably don’t. That’s not where the action is. And it’s not where the action is going to be for the foreseeable future.

The specific exception is if you go to graduate school with the intention of being a sleeper agent to improve academic (or government/broader nonprofit research) culture. That is potentially quite valuable for society (though it won’t necessarily be lucrative for you personally).

I do not believe you when you say you are going to be a Sleeper Agent. I expect you to either get worn down and be a normal academic, or to run away screaming in horror at some point, because man does that all sound miserable. It is a noble thing to do, of course, to be the change you want to see and fight for it, if you can.

All of this reinforces my basic advice here: going to graduate school is something you should only do with a very specific purpose, and generally only if you can attend an elite institution. Do not go because you have nothing better to do. Have a specific career path in mind, one that either does not face or justifies the long odds usually against such paths. Know what you want to learn and what you want to prove.

Or, ideally, if you possibly can, go do something else instead.

What is academia for, then? Presumably something else.

Aella: It’s insane how much academia is not about figuring stuff out. The current state of academia is not what it would look like if we went “hey I wanna figure out the truth behind a thing.”

Fred Scharmen (QTing): “Hey I wanna figure out the truth behind a thing” is like what an elementary schooler thinks that grad students are supposed to be doing. I hope this person grows up eventually.

Hazard: Good example of the general vibes and tactics used to haze people into fucked up social orders and institutions without ever having to defend them.

You just mock people who don’t know the scam is a scam.

I’ve written about this before.

It’s a load bearing tactic for maintaining normalization of deviance.

It is worse than that.

I get mocking someone for actually being confused here. One should not do even that. But yeah, if someone with experience straight up said ‘I am shocked, shocked to find that things other than searching for truth are going on in here, how can that be, I am so confused’ then mockers gonna mock.

This is not someone saying ‘I do not understand why someone is slurring their words in this cafe’ in a world where the cafes were called cafes but were actually bars. This is ‘it really is insane the amount of hard drinking going on in all the cafes, did you notice how rare it is for anyone to get a coffee anymore, they are actually bars’ and someone mocking you, saying ‘coffee is what an elementary schooler thinks people drink at cafes.’ And then everyone went back to pretending cafes only served coffee.

As Dilan Esper and Andrew Rettek note here, the right thing on free speech is to defend everyone’s right to speak. It is in the context of very much not doing this in other contexts, treating a wide variety of far less harmful speech as ‘violence,’ that this sudden claim of realizing one’s principles in this one case rang hollow. No one is pretending this is a new set of general pro-speech principles to be universally applied.

As Jill Filipovic and Jonathan Haidt each note, it would be great if universities used the recent protest moment to realize their systematic error, and broadly once again embrace free speech the way they used to do.

This is the letter the ACLU sent out in 1978 after they defended the right of actual Nazis to march in order to defend free speech for all of us.

You have to let them talk. This is America, man. Or at least, it used to be.

Alas, I am not holding my breath for such an outcome.

If it does happen, Charles Murray has kindly offered to allow the presidents to prove their devotion to free speech by letting him host a talk.

We are not letting them talk. FIRE found that 3% of current college students have been punished for speech, which translates to 5% over four years, which is enough for a hell of a chilling effect especially given how risk averse college students are now.

Jill Filipovic urges us all to rise to the standard of the old ACLU, no matter what others have done, and stand firm for free speech even asymmetrically. Do not call, she says, for more restrictions in the name of even-handedness. That is a tough sell. It is also not obvious which path leads to more free speech. Si vis pacem, para bellum?

Larry Summers points out that Harvard’s multiple Antisemitism Taskforces, which are accomplishing nothing, are the wrong approach, an alternative to both moral leadership and standing up strongly for free speech. Instead, Harvard continues to allow official support of antisemitic positions without allowing the voicing of pro-Israel positions.

Paul Graham links to Richard Florida, a professor at the University of Toronto, who says people in academia now feel more space to speak their minds after recent events.

Here are some examples of other cases where free speech could have been stood up for, and universities chose a rather different path.

Harvard declares it is now mission first. It will no longer make ‘official statements about public matters that do not directly affect the university’s core function.’ I put up a prediction market on whether they stick to it. Good luck, Harvard!

What is Harvard’s mission? Harvard.

Nate Silver: Notable exceptions to free speech:

Incitement

Defamation

Criticizing Harvard

Lawrence Bobo (Dean of Social Sciences, Harvard): A faculty member’s right to free speech does not amount to a blank check to engage in behaviors that plainly incite external actors – be it the media, alumni, donors, federal agencies, or the government – to intervene in Harvard’s affairs.

Lawrence Summers: It takes something extraordinary to bring me into agreement with Israel demonizing faculty like Walter Johnson. That is what Harvard Dean Lawrence Bobo has done with his call for punishing faculty who publicly challenge university decisions.

I cannot understand why his boss Dean Hopi Hoekstra has not condemned the idea. Nor can I understand how someone who believes in punishing faculty dissent can be allowed to set faculty salaries, decide on promotions or be involved in faculty discipline.

How can it be according to Harvard leaders that it is fine to call for an end to Israel as a Jewish state but not to criticize the University administration?

Students from the University of Waterloo computer science programs have been enjoying oversized success, despite it being a relatively young university founded in 1957. Henry Dashwood looks at what makes Waterloo different. They have a five year program that does not break for the summer, the culture focuses on working on projects rather than partying or sports, they have a startup accelerator on campus, and despite having a lot of CS students they are very selective (claimed 4% acceptance rate).

So it is exactly the story one would expect based on what startup culture says. Focus on building things, cut out everything else.

I am curious if that model will long survive moves like this, although I appreciate that they have a distinct department for pure mathematics:

Chris Brunet: The Department of Pure Mathematics at @UWaterloo is hiring a math professor.

”Eligible candidates for this search must self-identify as women, transgender, gender-fluid, nonbinary and Two-Spirit people.”

Waterloo’s Faculty of Engineering is also hiring an engineering professor.

”Eligible candidates are required to identify as a woman or gender minority, which is defined to include individuals who self-identify as women, transgender, gender-fluid, non-binary, or twospirited.”

Also, 2 professors of computer science.

Joshua Rauh notes that his training on DEI included an example where someone saying ‘DEI has gone too far’ is the first sign of prejudice and on the job discrimination.

Alex Tabarrok in response: DEI has gone too far.

Indiana signs a bill introducing ‘intellectual diversity’ as a standard for tenure decisions. Tyler Cowen suggests it will backfire, that observance will be addressed via technical box-checking, and that universities could retaliate by not hiring any actual conservatives (even more than they already do) at all for fear they would be forced to grant such people tenure later. It is extremely difficult to get a bunch of academics who want it to be one way, with only left-wing (or often only far-left-wing) viewpoints welcome in academia, to agree to have it be the other way via a law. Tyler does not lay out what he would do instead. I can think of ways to do it, but they involve big guns.

Wisconsin’s universities initially voted down a compromise to get rid of some DEI positions in exchange for funding for raises and new buildings, but they came around.

Washington Post Editorial Board comes out against DEI statements in hiring.

WaPo Editorial Board: The last thing academia — or the country — needs is another incentive for people to be insincere or dishonest. The very purpose of the university is to encourage a free exchange of ideas, seek the truth wherever it may lead, and to elevate intellectual curiosity and openness among both faculty and students. Whatever their original intent, the use of DEI statements has too often resulted in self-censorship and ideological policing.

Here is what they are opposing.

Paul Graham: People in the sciences thought they could ignore the fools over in the humanities and just focus on their research. But now the fools’ ideology is colonizing the sciences.

John Sailer: NEW: Yale University’s department of molecular biophysics and biochemistry requires all job applicants to submit a DEI statement.

Here’s the evaluation rubric, which shows the exhaustive DEI criteria for assessing any scientist hoping to work in the Yale department.

Here is the full post from The Free Press.

When making hires at Yale’s department of molecular biophysics and biochemistry, faculty are told to place “DEI at the center of every decision,” according to a document tucked away on its website.

To what extent does that mean an applicant’s DEI score impacts their chance of being hired? If you have a 12 versus an 8 versus a 0, what happens? One cannot be sure. It is compatible with ‘anyone under 11 need not apply’ and also with ‘no one actually cares.’

How easy is a high score? My guess is you can get to about a 7 (3/2/1/1) with a willingness to bullshit and use ChatGPT. Higher than that likely requires either lying or being willing to spend (and commit to spending) substantial amounts of time.

What about Columbia? How much do they care? What do they want?

John Sailer: NEW: For hiring new professors, Columbia University recommends valuing “contributions to DEI” on par with “research.”

The sample evaluation tool also weighs DEI more highly than teaching.

That’s an especially wild default given how Columbia defines “contributions to DEI”

Columbia provides an in-depth rubric for assessing DEI credentials. Which, of course, is pretty important if DEI might carry the same weight as research. Take a look. The rubric gives a low score to candidates who are skeptical of racially-segregated “affinity groups.”

You can feel the attitude coming off these rubrics.

This looks like a substantially tougher test to handle if you mainly care about your subject or are trying to muddle through without a huge time sink or ethical compromise. They mean business.

Given how numerical scores usually work, you do not have much margin for error. Getting a 15 here, if you are willing to do what it takes and spend the time, is easy, and probably so is getting a 9-10 in ‘service’ and that is probably highly linked. I doubt they have that high a bar to get to 8+ on teaching, and a 10 might be pretty easy there too. That does not leave much room to make up points, which has to be done with research. And a third of that is ‘curricular fit’ so those who are gaming the system are going to get full credit there too, while plans are pretty easy to fake.

Your entire actual ‘research track record’ is only worth five points. So yeah, if you are not heavy DEI for real, good luck. You’re not going to make it here.

Harvard’s Faculty of Arts and Sciences eliminated the requirement for DEI statements in hiring (source). Instead they are asked to submit a ‘service statement,’ which can include DEI if you want that. As an applicant, you now must ask: Do you think the requirement went away, or that they are testing you to see if you realize that it didn’t?

One must ask, what exactly did Sally Kornbluth believe before?

John Sailer: BREAKING: A university spokesperson has officially confirmed to me that MIT will no longer use diversity statements in faculty hiring—making it the first elite private institution to backtrack on the controversial policy.

As recently as late 2023, MIT required prospective nuclear scientists to submit “a statement regarding their views on diversity, inclusion, and belonging.” No longer. In a statement provided to me by MIT, Sally Kornbluth said these statements “impinge on freedom of expression, and they don’t work.”

Was she unable to get rid of the statements until now?

Did she think they both worked and that they didn’t impinge on freedom of expression? I can see one thinking that perhaps they work. I can’t see how one can claim they don’t impinge on freedom of expression. You either care about that, or you don’t. So, revealed preferences on priorities, then?

NYU opening a new campus in… Tulsa? Seems like an excellent source of diversity.

Childhood and Education Roundup #6: College Edition