Author name: Mike M.


Google accused of secretly tracking drivers with disabilities


Google needs to pump the brakes when it comes to tracking sensitive information shared with DMV sites, a new lawsuit suggests.

Filing a proposed class-action suit in California, Katherine Wilson has accused Google of using Google Analytics and DoubleClick trackers on the California DMV site to unlawfully obtain information about her personal disability without her consent.

This, Wilson argued, violated the Driver’s Privacy Protection Act (DPPA), as well as the California Invasion of Privacy Act (CIPA), and impacted perhaps millions of drivers who had no way of knowing Google was collecting sensitive information shared only for DMV purposes.

“Google uses the personal information it obtains from motor vehicle records to create profiles, categorize individuals, and derive information about them to sell its customers the ability to create targeted marketing and advertising,” Wilson alleged.

According to Wilson, California’s DMV “encourages” drivers “to use its website rather than visiting one of the DMV’s physical locations” without telling drivers that Google has trackers all over its site.

Likely due to promoting the website’s convenience, the DMV reported a record number of online transactions in 2020, Wilson’s complaint said. And people with disabilities have taken advantage of that convenience. In 2023, approximately “40 percent of the 1.6 million disability parking placard renewals occurred online.”

Wilson visited the DMV site most recently last summer, when she was renewing her disability parking placard online. At that time, she did not know that Google obtained her personal information when she filled out her application, communicated directly with the DMV, searched on the site, or clicked on various URLs, all of which she said revealed that she either had a disability or believed she had a disability.

Her complaint alleged that Google secretly gathers information about the contents of the DMV’s online users’ searches, logging sensitive keywords like “teens,” “disabled drivers,” and any “inquiries regarding disabilities.”

Google “knowingly” obtained this information, Wilson alleged, to quietly expand user profiles for ad targeting, “intentionally” disregarding DMV website users’ “reasonable expectation of privacy.”

“Google then uses the personal information and data to generate revenue from the advertising and marketing services that Google sells to businesses and individuals,” Wilson’s complaint alleged. “That Plaintiff and Class Members would not have consented to Google obtaining their personal information or learning the contents of their communications with the DMV is not surprising.”

Congressman James P. Moran, who sponsored the DPPA in 1994, made it clear that the law was enacted specifically to keep marketers from taking advantage of computers making it easy to “pull up a person’s DMV record” with the “click of a button.”

Even back then, some people were instantly concerned about any potential “invasion of privacy,” Moran said, noting that “if you review the way in which people are classified by direct marketers based on DMV information, you can see why some individuals might object to their personal information being sold.”



iFixit ends Samsung deal as oppressive repair shop requirements come to light

Samsung has no follow-through? Shocking —

iFixit says “flashy press releases don’t mean much without follow-through.”


IFixit and Samsung were once leading the charge in device repair, but iFixit says it’s ending its repair partnership with Samsung because it feels Samsung just isn’t participating in good faith. iFixit says the two companies “have not been able to deliver” on the promise of a viable repair ecosystem, so it would rather shut the project down than continue. The repair site says “flashy press releases and ambitious initiatives don’t mean much without follow-through.”

iFixit’s Scott Head explains: “As we tried to build this ecosystem we consistently faced obstacles that made us doubt Samsung’s commitment to making repair more accessible. We couldn’t get parts to local repair shops at prices and quantities that made business sense. The part prices were so costly that many consumers opted to replace their devices rather than repair them. And the design of Samsung’s Galaxy devices remained frustratingly glued together, forcing us to sell batteries and screens in pre-glued bundles that increased the cost.”

Image: Samsung’s screen replacement parts usually require buying the display, battery, phone frame, and buttons, which is a big waste. (Credit: iFixit)

A good example of Samsung’s parts bundling is this Galaxy S22 Ultra “screen” part for $233. The screen is the most common part to break, but rather than just sell a screen, Samsung makes you buy the screen, a new phone frame, a battery, and new side buttons and switches. As we said when this was announced, that’s like half of the total parts in an entire phone. This isn’t a perfect metric, but the Samsung/iFixit parts store only offers three parts for the S22 Ultra, while the Pixel 8 Pro store has 10 parts, and the iPhone 14 Pro Max store has 23 parts.

Even with Samsung’s part-bundling, though, iFixit’s complaint of high prices doesn’t seem reflected in the store pricing. The Pixel 8 Pro screen + fingerprint reader, without a case, battery, and buttons, is $230. An iPhone 14 Pro Max screen is $395. (There is a good chance Samsung is the manufacturer of all three of these displays.)

Samsung and iFixit have always had a rocky relationship. In 2017, the two companies were supposed to partner up for an “upcycling” program, where Samsung found new uses for old phones. The original plan included things like unlocking the bootloader of old devices, so Samsung’s OS could be completely replaced, and hosting an open source marketplace where users could submit ideas and software for repurposing old Galaxy devices. In what now seems like a familiar strategy, Samsung was more concerned about appearances than with actually being useful, and iFixit said the upcycling program that launched in 2021 was “nearly unrecognizable” compared to what iFixit originally endorsed and lent its logo to in 2017.

In 2019, following the “embarrassing” delayed launch of the Galaxy Fold 1 over durability issues, Samsung attacked iFixit for doing a teardown of the flawed device. Samsung forced iFixit to take down an article explaining some of the flaws of the device. Samsung didn’t have any legal basis to do this, but it apparently threatened one of iFixit’s part suppliers if the article didn’t get pulled.

Samsung has also reportedly been on the attack against repair, even while it partners with iFixit. On the same day that iFixit announced it was dropping the partnership, 404 Media reported that Samsung requires independent repair shops to turn over customer data and “immediately disassemble” any device found to be using third-party parts. Imagine taking your phone to a shop for repair and finding out it was destroyed by the shop as a requirement from Samsung. The report also says Samsung’s contracts require that independent companies “daily” upload to a Samsung database (called G-SPN) the details of each and every repair “at the time of each repair.”

With this latest chapter of the partnership, the parts store, shutting down in June 2024 after just two years, iFixit says some changes are coming to its website. It won’t remove any information, but it will start offering clearly labeled third-party parts in addition to whatever Samsung OEM parts it can source. It will no longer collaborate with Samsung on manuals and won’t need to follow Samsung’s quantity limit requirements.



Dinosaurs needed to be cold enough that being warm-blooded mattered

Some like it less hot —

Two groups of dinosaurs moved to cooler climes during a period of climate change.

Image: Later theropods had multiple adaptations to varied temperatures.

Dinosaurs were once assumed to have been ectothermic, or cold-blooded, an idea that makes sense given that they were reptiles. Scientists had previously discovered evidence of dinosaur species that were warm-blooded, though what could have triggered this adaptation remained unknown. A team of researchers now thinks that dinosaurs that already had some cold tolerance evolved endothermy, or warm-bloodedness, to adapt when they migrated to regions with cooler temperatures. They also think they’ve found a possible reason for the trek.

Using the Mesozoic fossil record, evolutionary trees, climate models, and geography, plus factoring in a drastic climate change event that caused global warming, the team found that theropods (predators and bird ancestors such as velociraptor and T. rex) and ornithischians (such as triceratops and stegosaurus) must have made their way to colder regions during the Early Jurassic. Lower temperatures are thought to have selected for species that were partly adapted to endothermy.

“The early invasion of cool niches… [suggests] an early attainment of homeothermic (possibly endothermic) physiology in [certain species], enabling them to colonize and persist in even extreme latitudes since the Early Jurassic,” the researchers said in a study recently published in Current Biology.

Hot real estate

During the Mesozoic Era, which lasted from roughly 252 to 66 million years ago, proto-dinosaurs known as dinosauromorphs began to diversify in hot and dry climates. Early sauropods, ornithischians, and theropods all tended to stay in these regions.

Sauropods (such as brontosaurus and diplodocus) would become the only dinosaur group to bask in the heat—the fossil record shows that sauropods tended to stay in warmer areas, even if there was less food. This suggests a need for the sunlight and heat associated with ectothermy. They might have been capable of surviving in colder temperatures but not adapted enough to make it for long, according to one hypothesis.

It’s also possible that living in cooler areas meant too much competition with other types of dinosaurs, as the theropods and ornithischians did end up moving into these cooler areas.

Almost apocalypse

Beyond the ecological opportunities that may have drawn dinosaurs to the cooler territories, it’s possible they were driven away from the warm ones. Around 183 million years ago, there was a perturbation in the carbon cycle, along with extreme volcanism that belched out massive amounts of methane, sulfur dioxide, and mercury. Life on Earth suffered through scorching heat, acid rain, and wildfires. The researchers now think these disruptions, known as the Early Jurassic Jenkyns Event, pushed theropod and ornithischian dinosaurs to cooler climates because temperatures in warmer zones rose above the optimal range for their survival.

The theropods and ornithischians that escaped the effects of the Jenkyns event may have had a key adaptation to cooler climes; many dinosaurs from these groups are now thought to have been feathered. Feathers can be used to both trap and release heat, which would have allowed feathered dinosaurs to regulate their body temperature in more diverse climates. Modern birds use their feathers the same way.

Dinosaur species with feathers or special structures that improved heat management could have been homeothermic, meaning they were able to maintain a stable body temperature, or even endothermic, generating that body heat through their own metabolism.

Beyond the dinosaurs that migrated to high latitudes and adapted to a drop in temperature, endothermy might have led to the rise of new species and lineages of dinosaurs. It could have contributed to the rise of Avialae, the clade that includes birds—the only actual dinosaurs still around—and traces all the way back to their earliest ancestors.

“[Our findings] provide novel insights into the origin of avian endothermy, suggesting that this evolutionary trajectory within theropods… likely started in the latest Early Jurassic,” the researchers said in the same study.

That really is something to think about next time a sparrow flies by.

Current Biology, 2024.  DOI: 10.1016/j.cub.2024.04.051



Crooks plant backdoor in software used by courtrooms around the world

DISORDER IN THE COURT —

It’s unclear how the malicious version of JAVS Viewer came to be.


A software maker serving more than 10,000 courtrooms throughout the world hosted an application update containing a hidden backdoor that maintained persistent communication with a malicious website, researchers reported Thursday, in the latest episode of a supply-chain attack.

The software, known as the JAVS Viewer 8, is a component of the JAVS Suite 8, an application package courtrooms use to record, play back, and manage audio and video from proceedings. Its maker, Louisville, Kentucky-based Justice AV Solutions, says its products are used in more than 10,000 courtrooms throughout the US and 11 other countries. The company has been in business for 35 years.

JAVS Viewer users at high risk

Researchers from security firm Rapid7 reported that a version of the JAVS Viewer 8 available for download on javs.com contained a backdoor that gave an unknown threat actor persistent access to infected devices. The malicious download, planted inside an executable file that installs the JAVS Viewer version 8.3.7, was available no later than April 1, when a post on X (formerly Twitter) reported it. It’s unclear when the backdoored version was removed from the company’s download page. JAVS representatives didn’t immediately respond to questions sent by email.

“Users who have version 8.3.7 of the JAVS Viewer executable installed are at high risk and should take immediate action,” Rapid7 researchers Ipek Solak, Thomas Elkins, Evan McCann, Matthew Smith, Jake McMahon, Tyler McGraw, Ryan Emmons, Stephen Fewer, and John Fenninger wrote. “This version contains a backdoored installer that allows attackers to gain full control of affected systems.”

The installer file was titled JAVS Viewer Setup 8.3.7.250-1.exe. When executed, it copied the binary file fffmpeg.exe to the file path C:\Program Files (x86)\JAVS\Viewer 8. To bypass security warnings, the installer was digitally signed, but with a signature issued to an entity called “Vanguard Tech Limited” rather than to “Justice AV Solutions Inc.,” the signing entity used to authenticate legitimate JAVS software.

fffmpeg.exe, in turn, used Windows Sockets and WinHTTP to establish communications with a command-and-control server. Once successfully connected, fffmpeg.exe sent the server passwords harvested from browsers and data about the compromised host, including hostname, operating system details, processor architecture, program working directory, and the user name.

The researchers said fffmpeg.exe also downloaded the file chrome_installer.exe from the IP address 45.120.177.178. chrome_installer.exe went on to execute a binary and several Python scripts that were responsible for stealing the passwords saved in browsers. fffmpeg.exe is associated with a known malware family called GateDoor/Rustdoor. The exe file was already flagged by 30 endpoint protection engines.

A screenshot from VirusTotal showing detections from 30 endpoint protection engines. (Credit: Rapid7)

The number of detections had grown to 38 at the time this post went live.
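
For a quick first pass at triage, a minimal file-presence check against the indicator described above could look like the Python sketch below. This is purely an illustration, not Rapid7’s tooling; the path is the one reported in the write-up, and a clean result is not proof that a host is uncompromised.

```python
# Minimal triage sketch (illustration only, not Rapid7 tooling): check for the
# backdoor binary at the install path reported above. Absence of the file does
# not prove a host is clean; compromised machines should be re-imaged regardless.
from pathlib import Path

# Indicator reported in the Rapid7 write-up.
FFFMPEG_PATH = Path(r"C:\Program Files (x86)\JAVS\Viewer 8\fffmpeg.exe")

def javs_ioc_present() -> bool:
    """Return True if the known-bad binary exists at the reported path."""
    return FFFMPEG_PATH.exists()

if __name__ == "__main__":
    if javs_ioc_present():
        print(f"Indicator found: {FFFMPEG_PATH} -- follow the remediation steps below.")
    else:
        print("No file-based indicator found (not conclusive).")
```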

The researchers warned that the process of disinfecting infected devices will require care. They wrote:

To remediate this issue, affected users should:

  • Reimage any endpoints where JAVS Viewer 8.3.7 was installed. Simply uninstalling the software is insufficient, as attackers may have implanted additional backdoors or malware. Re-imaging provides a clean slate.
  • Reset credentials for any accounts that were logged into affected endpoints. This includes local accounts on the endpoint itself as well as any remote accounts accessed during the period when JAVS Viewer 8.3.7 was installed. Attackers may have stolen credentials from compromised systems.
  • Reset credentials used in web browsers on affected endpoints. Browser sessions may have been hijacked to steal cookies, stored passwords, or other sensitive information.
  • Install the latest version of JAVS Viewer (8.3.8 or higher) after re-imaging affected systems. The new version does not contain the backdoor present in 8.3.7.

Completely re-imaging affected endpoints and resetting associated credentials is critical to ensure attackers have not persisted through backdoors or stolen credentials. All organizations running JAVS Viewer 8.3.7 should take these steps immediately to address the compromise.

The Rapid7 post included a statement from JAVS that confirmed that the installer for version 8.3.7 of the JAVS viewer was malicious.

“We pulled all versions of Viewer 8.3.7 from the JAVS website, reset all passwords, and conducted a full internal audit of all JAVS systems,” the statement read. “We confirmed all currently available files on the JAVS.com website are genuine and malware-free. We further verified that no JAVS Source code, certificates, systems, or other software releases were compromised in this incident.”

The statement didn’t explain how the installer became available for download on its site. It also didn’t say if the company retained an outside firm to investigate.

The incident is the latest example of a supply-chain attack, a technique that tampers with a legitimate service or piece of software with the aim of infecting all downstream users. These sorts of attacks are usually carried out by first hacking the provider of the service or software. There’s no sure way to prevent falling victim to supply-chain attacks, but one potentially useful measure is to vet a file using VirusTotal before executing it. That advice would have served JAVS users well.
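
As a concrete illustration of that last point, here is a rough sketch of what pre-execution vetting could look like, assuming VirusTotal’s v3 REST API and the third-party requests package. The API key is a placeholder you would supply yourself, the installer filename is just the example from this incident, and a 404 response only means the hash has never been submitted, not that the file is safe.

```python
# Sketch of pre-execution vetting via VirusTotal's v3 API (illustration only).
# Assumes the `requests` package; VT_API_KEY and the installer path are placeholders.
import hashlib
import requests

VT_API_KEY = "YOUR_API_KEY"  # placeholder

def sha256_of(path: str) -> str:
    """Hash the file in 1 MB chunks so large installers don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def virustotal_stats(path: str) -> dict | None:
    """Return VirusTotal's per-engine verdict counts, or None if the hash is unknown."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256_of(path)}",
        headers={"x-apikey": VT_API_KEY},
        timeout=30,
    )
    if resp.status_code == 404:  # hash never submitted; treat as "unknown", not "clean"
        return None
    resp.raise_for_status()
    return resp.json()["data"]["attributes"]["last_analysis_stats"]

if __name__ == "__main__":
    # Example target: the malicious installer name from this incident.
    stats = virustotal_stats("JAVS Viewer Setup 8.3.7.250-1.exe")
    print(stats)  # e.g. counts of 'malicious', 'suspicious', 'undetected' engines
```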



Biggest Windows 11 update in 2 years nearly finalized, enters Release Preview

getting there —

24H2 update includes big changes, will be released “later this calendar year.”


The Windows 11 24H2 update isn’t scheduled to be released until sometime this fall, but testers can get a near-final version of it early. Microsoft has released Windows 11 24H2 build 26100.712 to its Release Preview testing channel for Windows Insiders, a sign that the update is nearly complete and that the company has shifted into bug-fixing mode ahead of general availability.

Microsoft has generally stuck to smaller but more frequent feature updates during the Windows 11 era, but the annual fall updates still tend to be a bigger deal. They’re the ones that determine whether you’re still eligible for security updates, and they often (but not always) come with more significant under-the-hood changes than the normal feature drops.

Case in point: Windows 11 24H2 includes an updated compiler, kernel, and scheduler, all lower-level system changes made at least in part to better support Arm-based PCs. Existing Windows-on-Arm systems should also see a 10 or 20 percent performance boost when using x86 applications, thanks to improvements in the translation layer (which Microsoft is now calling Prism).

There are more user-visible changes, too. 24H2 includes Sudo for Windows, the ability to create TAR and 7-zip archives from the File Explorer, Wi-Fi 7 support, a new “energy saver” mode, and better support for Bluetooth Low Energy Audio. It also allows users to run the Copilot AI chatbot in a regular resizable window that can be pinned to the taskbar instead of always giving it a dedicated strip of screen space.

Other new Windows features are tied to the 24H2 update but will only be available on Copilot+ PCs, which have their own specific system requirements: 16 GB of memory, 256 GB of storage, and a neural processing unit (NPU) capable of at least 40 trillion operations per second (TOPS). As of right now, the only chips that fit the bill are Qualcomm’s Snapdragon X Plus and X Elite processors, though Intel and AMD systems with faster NPUs should be released later this year. Microsoft will maintain a separate list of processors that support the Copilot+ features.

The biggest 24H2 feature specific to Copilot+ PCs is Recall, which continually takes snapshots of everything you do with your PC so that you can look up your own activities later. This comes with obvious privacy and security risks, though Microsoft says that all of Recall’s data is encrypted on disk and processed entirely locally by the NPU rather than leveraging the cloud. Other Copilot+ features include Live Captions for captioning video files or video calls in real time and features for generating new images and enhancing existing images.

Collectively, all of these changes make 24H2 the most significant Windows 11 release since the 22H2 update came out a year and a half ago. 22H2 has served as the foundation for most new Windows features since then, including the Copilot chatbot, and 23H2 was mostly just a version number change released to reset the clock on Microsoft’s security update timeline.

Despite all of these changes and additions, the 24H2 update is still called Windows 11, still looks like Windows 11, and doesn’t change Windows 11’s official minimum system requirements. Unsupported installs will stop working on a few generations’ worth of older 64-bit x86 CPUs, though these chips are old and slow enough that they wouldn’t run Windows 11 particularly well in the first place.

For people who want to start fresh, ISO files of the release are available from Microsoft’s download page (this is a slightly older build of the OS, 26100.560, but it should update to the current version with no issues after installation). You can update a current Windows 11 install from the Insider section in the Settings app. Microsoft says to expect the full release “later this calendar year.” Based on past precedent, it’s most likely to come out in the fall, but it will probably ship a bit early on the first wave of Copilot+ Arm PCs that will be available in mid-June.



The Schumer Report on AI (RTFB)

Or at least, Read the Report (RTFR).

There is no substitute. This is not strictly a bill, but it is important.

The introduction kicks off balancing upside and avoiding downside, utility and risk. This will be a common theme, with a very strong ‘why not both?’ vibe.

Early in the 118th Congress, we were brought together by a shared recognition of the profound changes artificial intelligence (AI) could bring to our world: AI’s capacity to revolutionize the realms of science, medicine, agriculture, and beyond; the exceptional benefits that a flourishing AI ecosystem could offer our economy and our productivity; and AI’s ability to radically alter human capacity and knowledge.

At the same time, we each recognized the potential risks AI could present, including altering our workforce in the short-term and long-term, raising questions about the application of existing laws in an AI-enabled world, changing the dynamics of our national security, and raising the threat of potential doomsday scenarios. This led to the formation of our Bipartisan Senate AI Working Group (“AI Working Group”).

They did their work over nine forums.

  1. Inaugural Forum

  2. Supporting U.S. Innovation in AI

  3. AI and the Workforce

  4. High Impact Uses of AI

  5. Elections and Democracy

  6. Privacy and Liability

  7. Transparency, Explainability, Intellectual Property, and Copyright

  8. Safeguarding Against AI Risks

  9. National Security

Existential risks were always given relatively little time, being a topic for at most a subset of the final two forums. By contrast, mundane downsides and upsides were each given three full forums. This report was about the response to AI across a broad spectrum.

They lead with a proposal to spend ‘at least’ $32 billion a year on ‘AI innovation.’

No, there is no plan on how to pay for that.

In this case I do not think one is needed. I would expect any reasonable implementation of that to pay for itself via economic growth. The downsides are tail risks and mundane harms, but I wouldn’t worry about the budget. If anything, AI’s arrival is a reason to be very not freaked out about the budget. Official projections are baking in almost no economic growth or productivity impacts.

They ask that this money be allocated via a method called emergency appropriations. This is part of our government’s longstanding way of using the word ‘emergency.’

We are going to have to get used to this when it comes to AI.

Events in AI, both opportunities and risks, are going to happen well beyond the ‘non-emergency’ speed of our government, and especially of Congress.

We will have opportunities that appear and compound quickly, and projects that need our support. We will have stupid laws and rules, both ones that were already stupid and ones rendered stupid, that need to be fixed.

Risks and threats, not only catastrophic or existential risks but also mundane risks and enemy actions, will arise far faster than our process can pass laws, draft regulatory rules with extended comment periods and follow all of our procedures.

In this case? It is May. The fiscal year starts in October. I want to say, hold your damn horses. But also, you think Congress is passing a budget this year? We will be lucky to get a continuing resolution. Permanent emergency. Sigh.

What matters more is, what do they propose to do with all this money?

A lot of things. And it does not say how much money is going where. If I was going to ask for a long list of things that adds up to $32 billion, I would say which things were costing how much money. But hey. Instead, it looks like he took the number from NSCAI, and then created a laundry list of things he wanted, without bothering to create a budget of any kind?

It also seems like they took the original recommendation of $8 billion in Fiscal Year 24, $16 billion in FY 25 and $32 billion in FY 26, and turned it into $32 billion in emergency funding now? See the appendix. Then again, by that pattern, we’d be spending a trillion in FY 31. I can’t say for sure that we shouldn’t.

Starting with the top priority:

  1. An all government ‘AI-ready’ initiative.

  2. ‘Responsible innovation’ R&D work in fundamental and applied sciences.

  3. R&D work in ‘Foundational trustworthy AI topics, such as transparency, explainability, privacy, interoperability, and security.’

Or:

  1. Government AI adoption for mundane utility.

  2. AI for helping scientific research.

  3. AI safety in the general sense, both mundane and existential.

Great. Love it. What’s next?

  1. Funding the CHIPS and Science Act accounts not yet fully funded.

My current understanding is this is allocation of existing CHIPS act money. Okie dokie.

  1. Funding ‘as needed’ (oh no) for semiconductor R&D for the design and manufacture of high-end AI chips, through co-design of AI software and hardware, and developing new techniques for semiconductor fabrication that can be implemented domestically.

More additional CHIPS act funding, perhaps unlimited? Pork for Intel? I don’t think the government is going to be doing any of this research, if it is then ‘money gone.’

  1. Pass the Create AI Act (S. 2714) and expand programs like NAIRR to ‘ensure all 50 states are able to participate in the research ecosystem.’

More pork, then? I skimmed the bill. Very light on details. Basically, we should spend some money on some resources to help with AI research and it should include all the good vibes words we can come up with. I know what ‘all 50 states’ means. Okie dokie.

  1. Funding for a series of ‘AI Grand Challenge’ programs, such as those described in Section 202 of the Future of AI Innovation Act (S. 4178) and the AI Grand Challenges Act (S. 4236), focused on transformational progress.

Congress’s website does not list text for S. 4236. S. 4178 seems to mean ‘grand challenge’ in the senses of prizes and other pay-for-results (generally great), and having ambitious goals (also generally great), which tend to not be how the system works these days.

So, fund ambitious research, and use good techniques.

  1. Funding for AI efforts at NIST, including AI testing and evaluation infrastructure and the U.S. AI Safety Institute, and funding for NIST’s construction account to address years of backlog in maintaining NIST’s physical infrastructure.

Not all of NIST’s AI effort is safety, but a large portion of our real government safety efforts are at NIST. They are severely underfunded by all accounts right now. Great.

  1. Funding for the Bureau of Industry and Security (BIS) to update its IT and data analytics software and staff up.

That does sound like something we should do, if it isn’t handled. Ensure BIS can enforce the rules it is tasked with enforcing, and choose those rules accordingly.

  1. Funding R&D at the intersection of AI and robotics to ‘advance national security, workplace safety, industrial efficiency, economic productivity and competitiveness, through a coordinated interagency initiative.’

AI robots. The government is going to fund AI robots. With the first goal being ‘to advance national security.’ Sure, why not, I have never seen any movies.

In all seriousness, this is not where the dangers lie, and robots are useful. It’s fine.

The interagency plan seems unwise to me but I’m no expert on that.

  1. R&D for AI to discover manufacturing techniques.

Once again, sure, good idea if you can improve this for real and this isn’t wasted or pork. Better general manufacturing is good. My guess is that this is not a job for the government and this is wasted, but shrug.

  1. Security grants for AI readiness to help secure American elections.

Given the downside risks I presume this money is well spent.

  1. Modernize the federal government and improve delivery of government services, through updating IT and using AI.

  2. Deploying new technologies to find inefficiencies in the U.S. code, federal rules and procurement devices.

Yes, please. Even horribly inefficient versions of these things are money well spent.

  1. R&D and interagency coordination around intersection of AI and critical infrastructure, including for smart cities and intelligent transportation system technologies.

Yes, we are on pace to rapidly put AIs in charge of our ‘critical infrastructure’ along with everything else, why do you ask? Asking people nicely not to let AI anywhere near the things is not an option and wouldn’t protect substantially against existential risks (although it might versus catastrophic ones). If we are going to do it, we should try to do it right, get the benefits and minimize the risks and costs.

Overall I’d say we have three categories.

  1. Many of these points are slam dunk obviously good. There is a lot of focus on enabling more mundane utility, and especially mundane utility of government agencies and government services. These are very good places to be investing.

  2. A few places where it seems like it’s ‘not the government’s job’ to stick its nose in, and where I do not expect the money to accomplish much, often ones that also involve some obvious nervousness around the proposals, but none of which actually amplify the real problems. Mostly I expect wasted money. The market already presents plenty of better incentives for basic research in most things AI.

  3. Semiconductors.

It is entirely plausible for this to be a plan to take most of $32 billion (there’s a second section below that also gets funding), and put most of that into semiconductors. They can easily absorb that kind of cash. If you do it right you could even get your money’s worth.

As usual, I am torn on chips spending. Hardware progress accelerates core AI capabilities, but there is a national security issue with the capacity relying so heavily on Taiwan, and our lead over China here is valuable. That risk is very real.

Either way, I do know that we are not going to talk our government into not wanting to promote domestic chip production. I am not going to pretend that there is a strong case in opposition to that, nor is this preference new.

On AI Safety, this funds NIST, and one of its top three priorities is a broad-based call for various forms of (both existential and mundane) AI safety, and this builds badly needed state capacity in various places.

As far as government spending proposals go, this seems rather good, then, so far.

These get their own section with twelve bullet points.

  1. NNSA testbeds and model evaluation tools.

  2. Assessment of CBRN AI-enhanced threats.

  3. AI advancements in chemical and biological synthesis, including safeguards to reduce the risk of synthetic materials and pathogens.

  4. Fund DARPA’s AI work, which seems to be a mix of military applications and attempts to address safety issues including interpretability, along with something called ‘AI Forward’ for more fundamental research.

  5. Secure and trustworthy algorithms for DOD.

  6. Combined Joint All-Domain Command and Control Center for DOD.

  7. AI tools to improve weapon platforms.

  8. Ways to turn DOD sensor data into AI-compatible formats.

  9. Building DOD’s AI capabilities including ‘supercomputing.’ I don’t see any sign this is aiming for foundation models.

  10. Utilize AUKUS Pillar 2 to work with allies on AI defense capabilities.

  11. Use AI to improve implementation of Federal Acquisition Regulations.

  12. Optimize logistics, improve workflows, apply predictive maintenance.

I notice in #11 that they want to improve implementation, but not work to improve the regulations themselves, in contrast to the broader ‘improve our procedures’ program above. A sign of who cares about what, perhaps.

Again, we can draw broad categories.

  1. AI to make our military stronger.

  2. AI (mundane up through catastrophic, mostly not existential) safety.

The safety includes CBRN threat analysis, testbed and evaluation tools and a lot of DARPA’s work. There’s plausibly some real stuff here, although you can’t tell magnitude.

This isn’t looking ahead to AGI or beyond. The main thing here is ‘the military wants to incorporate AI for its mundane utility,’ and that includes guarding us against outside threats and ensuring its implementations are reliable and secure. It all goes hand in hand.

Would I prefer a world where all the militaries kept their hands off AI? I think most of us would like that, no matter our other views. But we also accept that we live in a very different world that is not currently capable of that. And I understand that, while it feels scary for obvious reasons and does introduce new risks, this mostly does not change the central outcomes. It does impact the interplay among people and nations in the meantime, which could alter outcomes if it impacts the balance of power, or causes a war, or sufficiently freaks enough people out.

Mostly it seems like a clear demonstration of the pattern of ‘if you were thinking we wouldn’t do or allow that, think again, we will instantly do that unless prevented’ to perhaps build up some momentum towards preventing things we do not want.

Most items in the next section are about supporting small business.

  1. Developing legislation to leverage public-private partnerships for both capabilities and to mitigate risks.

  2. Further federal study of AI, including through FFRDCs.

  3. Supporting startups, including at state and local levels, including by disseminating best practices (to the states and localities, I think, not to the startups?)

  4. The Comptroller General identifying any statutes that impact innovation and competition in AI systems. Have they tried asking Gemini?

  5. Increasing access to testing tools like mock data sets, including via DOC.

  6. Doing outreach to small businesses to ensure tools meet their needs.

  7. Finding ways to support small businesses utilizing AI and innovating, and considering whether legislation is needed to ‘disseminate best practices’ in various states and localities.

  8. Ensuring business software and cloud computing are allowable expenses under the SBA’s 7(a) loan program.

Congress has a longstanding tradition that Small Business is Good, and that Geographic Diversity That Includes My State or District is Good.

Being from the government, they are here to help.

A lot of this seems like ways to throw money at small businesses in inefficient ways? And to try and ‘make geographic diversity happen’ when we all know it is not going to happen? I am not saying you have to move to the Bay if that is not your thing, I don’t hate you that much, but at least consider, let’s say, Miami or Austin.

In general, none of this seems like a good idea. Not because it increases existential risk. Because it wastes our money. It won’t work.

The good proposal here is the fourth one. Look for statutes that are needlessly harming competition and innovation.

Padme: And then remove them?

(The eighth point also seems net positive, if we are already going down the related roads.)

The traditional government way is to say they support small business and spend taxpayer money by giving it to small business, and then you create a regulatory state and set of requirements that wastes more money and gives big business a big edge anyway. Whenever possible, I would much rather remove the barriers than spend the money.

Not all rules are unnecessary. There are some real costs and risks, mundane, catastrophic and existential, to mitigate.

Nor are all of the advantages of being big dependent on rules and compliance and regulatory capture, especially in AI. AI almost defines economies of scale.

Many would say, wait, are not those worried about AI safety typically against innovation and competition and small business?

And I say nay, not in most situations in AI, same as almost all situations outside AI. Most of the time all of that is great. Promoting such things in general is great, and is best done by removing barriers.

The key question is, can you do that in a way that works, and can you do it while recognizing the very high leverage places that break the pattern?

In particular, when the innovation in question is highly capable future frontier models that pose potential catastrophic or existential risks, especially AGI or ASI, and especially when multiple labs are racing against each other to get there first.

In those situations, we need to put an emphasis on ensuring safety, and we need to at minimum allow communication and coordination between those labs without risk of the government interfering in the name of antitrust.

In most other situations, including most of the situations this proposal seeks to assist with, the priorities here are excellent. The question is execution.

Do you want to help small business take on big business?

Do you want to encourage startups and innovation and American dynamism?

Then there are two obvious efficient ways to do that. Both involve the tax code.

The first is the generic universal answer.

If you want to favor small business over big business, you can mostly skip all those ‘loans’ and grants and applications and paperwork and worrying about what counts as an expense under 7(a). And you can stop worrying about providing them with tools, and you can stop trying to force them to have geographic diversity that doesn’t make economic sense – get your geographic diversity, if you want it, from other industries.

Instead, make the tax code explicitly favor small business over big business via differentiating rates, including giving tax advantages to venture capital investments in early stage startups, which then get passed on to the business.

If you want to really help, give a tax break to the employees, so it applies even before the business turns a profit.

If you want to see more of something, tax it less. If you want less, tax it more. Simple.

The second is fixing a deeply stupid mistake that everyone, and I do mean everyone, realizes is a mistake that was made in the Trump tax cuts, but that due to Congress being Congress we have not yet fixed, and that is by all reports doing quite a lot of damage. It is Section 174 of the tax code, which requires that spending on software engineers and other research and experimental (R&E) expenses be amortized over time rather than fully deducted.

The practical result of this is that startups and small businesses, that have negative cash flow, look to the IRS as if they are profitable, and then owe taxes. This is deeply, deeply destructive and stupid in one of the most high leverage places.
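
To make the mechanism concrete, here is some illustrative arithmetic with made-up numbers (my example, not anything in the report), using the five-year schedule and mid-year convention for domestic R&E as I understand the current rules. Year-one cash flow is negative $500,000, yet the company shows $850,000 of taxable income on paper.

```python
# Illustrative arithmetic only (hypothetical numbers): how five-year amortization of
# R&E salaries can make a cash-flow-negative startup show taxable income in year one.
revenue = 1_000_000
engineer_salaries = 1_500_000          # all treated as Section 174 R&E costs

cash_flow = revenue - engineer_salaries              # -500,000: the company loses money

# Pre-2022 treatment: deduct R&E costs immediately.
taxable_income_full_deduction = revenue - engineer_salaries   # -500,000, so no tax owed

# Current Section 174 treatment: amortize domestic R&E over 5 years
# (mid-year convention, so only 10% is deductible in year one).
year_one_deduction = engineer_salaries / 5 * 0.5              # 150,000
taxable_income_amortized = revenue - year_one_deduction       # 850,000 of "profit" on paper

print(cash_flow, taxable_income_full_deduction, taxable_income_amortized)
```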

From what I have heard, the story is that the two parties spent a long time negotiating a fix for it, it passed the House overwhelmingly, then in the Senate the Republicans decided they did not like the package of other items included with the fix and wanted concessions, and the Democrats, in particular Schumer, said a deal is a deal.

This needs to get done. I would focus far more on that than all these dinky little subsidies.

As usual, Congress takes ‘the effect on jobs’ seriously. Workers must not be ‘left behind.’ And as usual, they are big on preparing.

So, what are you going to do about it, punk? They are going to encourage some things:

  1. ‘Efforts to ensure’ that workers and other stakeholders are ‘consulted’ as AI is developed and deployed by end users. A government favorite.

  2. Stakeholder voices get considered in the development and deployment of AI systems procured or used by federal agencies. In other words, use AI, but not if it would take our jobs.

  3. Legislation related to training, retraining (drink!) and upskilling the private sector workforce, perhaps with business incentives, or to encourage college courses. I am going to go out on a limb and say that this pretty much never, ever works.

  4. Explore implications and possible ‘solutions to’ the impact of AI on the long-term future of work as general-purpose AI systems displace human workers, and develop a framework for policy response. So far, I’ve heard UBI, and various more or less disguised versions of hiring people to dig holes and fill them up again, except you get private companies to pay for it.

  5. Consider legislation to improve U.S. immigration systems for high-skilled STEM workers in support of national security and to foster advances in AI across the whole country.

My understanding is that ideas like the first two are most often useless but also most often mostly harmless. Steps are taken to nominally ‘consult,’ and most of the time nothing changes.

Sometimes, they are anything but harmless. You get NEPA. The similar provisions in NEPA were given little thought when first passed, then they grew and morphed into monsters strangling the economy and boiling the planet, and no one has been able to stop them. 

If this applies only to federal agencies and you get the NEPA version, that is in a sense the worst possible scenario. The government’s ability to use AI gets crippled, leaving it behind. Whereas it would provide no meaningful check on frontier model development, or on other potentially risky or harmful private actions. 

Applying it across the board could at the limit actually cripple American AI, in a way that would not serve as a basis for stopping international efforts, so that seems quite bad. 

We should absolutely expand and improve high-skill immigration, across all industries. It is rather completely insane that we are not doing so. There should at minimum be unlimited H-1Bs. Yes, it helps ‘national security’ and AI, but also it helps everything and everyone and the whole economy, and we’re just being grade-A stupid not to do it.

They call this ‘high impact uses of AI.’

The report starts off saying existing law must apply to AI. That includes being able to verify compliance. They note that this might not be compatible with opaque AI systems.

Their response if that happens? Tough. Rules are rules. Sucks to be you.

Indeed, they say to look not for ways to accommodate black box AI systems, but instead look for loopholes where existing law does not cover AI sufficiently.

Not only do they not want to ‘fix’ existing rules that impose burdens, they want to ensure any possible loopholes are closed regarding the information existing law requires. The emphasis is on anti-discrimination laws, which correlation machines that you can run tests on are not going to be in the default habit of complying with.

So what actions are suggested here?

  1. Explore where we might need explainability requirements.

  2. Develop standards for AI in critical infrastructure.

  3. Better monitor energy use.

  4. Keep a closer eye on financial services providers.

  5. Keep a closer eye on the housing sector.

  6. Test and evaluate all systems before the government buys them, and also streamline the procurement process (yes these are one bullet point).

  7. Recognize the concerns of local news (drink!) and journalism that have resulted in fewer local news options in small towns and rural areas. Damn you, AI!

  8. Develop laws against AI-generated child sexual abuse material (CSAM) and deepfakes. There is a bullet here, are they going to bite it?

  9. Think of the children, consider laws to protect them, require ‘reasonable steps.’

If you are at a smaller company working on AI, and you are worried about SB 1047 or another law that specifically targets frontier models and the risk of catastrophic harm, and you are not worried about being required to ‘take reasonable steps’ to ‘protect children,’ then I believe you are very much worried about the wrong things.

You can say and believe ‘the catastrophic risk worries are science fiction and not real, whereas children actually exist and get harmed’ all you like. This is not where I try to argue you out of that position.

That does not change which proposed rules are far more likely to actually make your life a living hell and bury your company, or hand the edge to Big Tech.

Hint: It is the one that would actually apply to you and the product you are offering.

  1. Encourage public-private partnerships and other mechanisms to develop fraud detection services.

  2. Continue work on autonomous vehicle testing frameworks. We must beat the CCP (drink!) in the race to shape the vision of self-driving cars.

  3. Ban use of AI for social scoring to protect our freedom unlike the CCP (drink!)

  4. “Review whether other potential uses for AI should be either extremely limited or banned.”

Did you feel that chill up your spine? I sure did. The ‘ban use cases’ approach is big trouble without solving your real problems.

Then there’s the health care notes.

  1. Both support deployment of AI in health care and implement appropriate guardrails, including consumer protection, fraud and abuse prevention, and promoting accurate and representative data, ‘as patients must be front and center in any legislative efforts on healthcare and AI.’ My heart is sinking.

  2. Make research data available while preserving privacy.

  3. Ensure HHS and FDA ‘have the proper tools to weigh the benefits and risks of AI-enabled products so that it can provide a predictable regulatory structure for product developers.’ The surface reading would be: So, not so much with the products, then. I have been informed that it is instead likely they are using coded language for the FDA’s pre-certification program to allow companies to self-certify software updates. And yes, if your laws require that then you should do that, but it would be nice to say it in English.

  4. Transparency for data providers and for the training data used in medical AIs.

  5. Promote innovation that improves health outcomes and efficiencies. Examine reimbursement mechanisms and guardrails for Medicare and Medicaid, and broad application.

The refrain is ‘give me the good thing, but don’t give me the downside.’

I mean, okay, sure, I don’t disagree exactly? And yet.

The proposal to use AI to improve ‘efficiency’ of Medicare and Medicaid sounds like the kind of thing that would be a great idea if done reasonably and yet quite predictably costs you the election. In theory, if we could all agree that we could use the AI to figure out which half of medicine wasn’t worthwhile and cut it, or how to actually design a reimbursement system with good incentives and do that, that would be great. But I have no idea how you could do that.

For elections they encourage deployers and content providers to implement robust protections, and ‘to mitigate AI-generated content that is objectively false, while still preserving First Amendment rights.’ Okie dokie.

For privacy and liability, they kick the can, ask others to consider what to do. They do want you to know privacy and strong privacy laws are good, and AIs sharing non-public personal information is bad. Also they take a bold stand that developers or users who cause harm should be held accountable, without any position on what counts as causing harm.

The word ‘encouraging’ is somehow sounding more ominous each time I see it.

What are we encouraging now?

  1. A coherent approach to public-facing transparency requirements for AI systems, while allowing use case specific requirements where necessary and beneficial, ‘including best practices for when AI developers should disclose when their products are AI,’ but while making sure the rules do not inhibit innovation.

I am not sure how much more of this kind of language of infinite qualifiers and why-not-both framings I can take. For those taking my word for it, it is much worse in the original.

One of the few regulatory rules pretty much everyone agrees on, even if some corner cases involving AI agents are tricky, is ‘AI should have to clearly identify when you are talking to an AI.’

My instinctive suggestion for operationalizing the rule would be ‘if an AI sends a freeform message (e.g. not a selection from a fixed list of options, in any modality) that was not approved individually by a human (even if sent to multiple targets), in a way a reasonable person might think was generated by or individually approved by a human, it must be identified as AI-generated or auto-generated.’ Then iterate from there.
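
If it helps to see the shape of that rule, here is a toy encoding in Python. This is purely my own sketch of the suggestion above, with hypothetical field names; the hard part in practice is the ‘reasonable person’ judgment, which no boolean flag actually captures.

```python
# Toy encoding of the suggested rule (my own sketch, not anything in the report):
# a freeform, non-human-approved message that a reasonable person could mistake for
# human-written must carry an AI-generated label.
from dataclasses import dataclass

@dataclass
class OutboundMessage:
    freeform: bool                   # not picked from a fixed list of canned options
    human_approved_individually: bool
    could_pass_as_human: bool        # judgment call; the hard part in practice

def requires_ai_label(msg: OutboundMessage) -> bool:
    return msg.freeform and not msg.human_approved_individually and msg.could_pass_as_human

# Example: an LLM-drafted reply sent to many recipients without per-message review.
print(requires_ai_label(OutboundMessage(True, False, True)))   # True -> must be labeled
```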

As the report goes on, it feels like there was a vibe of ‘all right, we need to get this done, let’s put enough qualifiers on every sentence that no one objects and we can be done with this.’

How bad can it get? Here’s a full quote for the next one.

  1. “Evaluate whether there is a need for best practices for the level of automation that is appropriate for a given type of task, considering the need to have a human in the loop at certain stages for some high impact tasks.”

I am going to go out on a limb and say yes. There is a need for best practices for the level of automation that is appropriate for a given type of task, considering the need to have a human in the loop at certain stages for some high impact tasks.

For example, if you want to launch nuclear weapons, that is a high impact task, and I believe we should have some best practices for when humans are in the loop.

Seriously, can we just say things that we are encouraging people to consider? Please?

They also would like to encourage the relevant committees to:

  1. Consider telling federal employees about AI in the workplace.

  2. Consider transparency requirements and copyright issues about data sets.

  3. Review reports from the executive branch.

  4. Getting hardware to watermark generated media, and getting online platforms to display that information.

And just because such sentences need to be properly, shall we say, appreciated:

  1. “Consider whether there is a need for legislation that protects against the unauthorized use of one’s name, image, likeness, and voice, consistent with First Amendment principles, as it relates to AI. Legislation in this area should consider the impacts of novel synthetic content on professional content creators of digital media, victims of non-consensual distribution of intimate images, victims of fraud, and other individuals or entities that are negatively affected by the widespread availability of synthetic content.”

As opposed to, say, ‘Consider a law to protect people’s personality rights against AI.’

Which may or may not be necessary, depending on the state of current law. I haven’t investigated enough to know if what we have is sufficient here.

  1. Ensure we continue to ‘lead the world’ on copyright and intellectual property law.

I have some news about where we have been leading the world on these matters.

  1. Do a public awareness and educational campaign on AI’s upsides and downsides.

You don’t have to do this. It won’t do any good. But knock yourself out, I guess.

Now to what I view as the highest stakes question. What about existential risks?

That is also mixed in with catastrophic mundane risks.

If I had to summarize this section, I would say that they avoid making mistakes and are headed in the right direction, and they ask good questions.

But on the answers? They punt.

The section is short and dense, so here is their full introduction.

In light of the insights provided by experts at the forums on a variety of risks that different AI systems may present, the AI Working Group encourages companies to perform detailed testing and evaluation to understand the landscape of potential harms and not to release AI systems that cannot meet industry standards.

This is some sort of voluntary testing and prior restraint regime? You are ‘encouraged’ to perform ‘detailed testing and evaluation to understand the landscape of potential harms,’ and you must then ‘meet industry standards.’ If you can’t, don’t release.

Whether or not that is a good regime depends on:

  1. Would companies actually comply?

  2. Would industry adopt standards that mean we wouldn’t die?

  3. Do we have to worry about problems that arise prior to release?

I doubt the Senators’ minds are ready for that third question.

Multiple potential risk regimes were proposed – from focusing on technical specifications such as the amount of computation or number of model parameters to classification by use case – and the AI Working Group encourages the relevant committees to consider a resilient risk regime that focuses on the capabilities of AI systems, protects proprietary information, and allows for continued AI innovation in the U.S.

Very good news. Capabilities have been selected over use case. The big easy mistake is to classify models based on what people say they plan to do, rather than asking what the model is capable of doing. That is a doomed approach, but many lobby hard for it.

The risk regime should tie governance efforts to the latest available research on AI capabilities and allow for regular updates in response to changes in the AI landscape.

Yes. As we learn more, our policies should adjust, and we should plan for that. Ideally this would be an easy thing to agree upon. Yet the same people who say ‘it is too early to choose what to do’ will also loudly proclaim that ‘if you give any flexibility to choose what to do later to anyone but the legislature, one must assume it will be used maximally badly.’ I too wish we had a much faster, better legislature that we could turn to every time we need any kind of decision or adjustment. We don’t.

All right. So no explicit mention of existential risk in the principles, but some good signs of the right regime. What are the actual suggestions?

Again, I am going to copy it all, one must parse carefully.

  1. Support efforts related to the development of a capabilities-focused risk-based approach, particularly the development and standardization of risk testing and evaluation methodologies and mechanisms, including red-teaming, sandboxes and testbeds, commercial AI auditing standards, bug bounty programs, as well as physical and cyber security standards. The AI Working Group encourages committees to consider ways to support these types of efforts, including through the federal procurement system.

There are those who would disagree with this, who think the proper order is train, release then test. I do not understand why they would think that. No wise company would do that, for its own selfish reasons.

The questions should be things like:

  1. How rigorous should the testing requirements be?

  2. At what stages of training and post-training, prior to deployment?

  3. How should those change based on the capabilities of the system?

  4. How do we pick the details?

  5. What should you have to do if the system flunks the test?

For now, this is a very light statement.

  2. Investigate the policy implications of different product release choices for AI systems, particularly to understand the differences between closed versus fully open-source models (including the full spectrum of product release choices between those two ends of the spectrum).

Again, there are those who would disagree with this, who think the proper order is train, release then investigate the consequences. They think they already know all the answers, or that the answers do not matter. Once again, I do not see what good reason they would have to think that.

Whatever position you take, the right thing to do is to game it out. Ask what the consequences of each regime would be. Ask what the final policy regime and world state would likely be in each case. Ask what the implications are for national security. Get all the information, then make the choice.

The only alternative that makes sense, which is more of a complementary approach than a substitute, is to define what you want to require. Remember what was said about black box systems. Yes, your AI system ‘wants to be’ a black box. You don’t know how to make it not a black box. If the law says you have to be able to look inside the box, or you can’t use the box? Well, that’s more of a you problem. No box.

You can howl about Think of the Potential of the box, why are you shutting down the box over some stupid thing like algorithmic discrimination or bioweapon risk or whatever. You still are not getting your box.

Then, if you can open the weights and still ensure the requirements are met, great, that’s fine, go for it. If not, not.

Then we get serious.

  3. Develop an analytical framework that specifies what circumstances would warrant a requirement of pre-deployment evaluation of AI models.

This does not specify whether this is requiring a self-evaluation by the developer as required in SB 1047, or requiring a third-party evaluation like METR, or an evaluation by the government. Presumably part of finding the right framework would be figuring out when to apply which requirement, along with which tests would be needed.

I am not going to make a case here for where I think the thresholds should be, beyond saying that SB 1047 seems like a good upper bound for the threshold necessary for self-evaluations, although one could quibble with the details of the default future path. Anything strictly higher than that seems clearly wrong to me.

  4. Explore whether there is a need for an AI-focused Information Sharing and Analysis Center (ISAC) to serve as an interface between commercial AI entities and the federal government to support monitoring of AI risks.

That is not how I would have thought to structure such things, but also I do not have deep thoughts about how to best structure such things. Nor do I see under which agency they would propose to put this center. Certainly there will need to be some interface where companies inform the federal government of issues in AI, as users and as developers, and for the federal government to make information requests.

  5. Consider a capabilities-based AI risk regime that takes into consideration short-, medium-, and long-term risks, with the recognition that model capabilities and testing and evaluation capabilities will change and grow over time. As our understanding of AI risks further develops, we may discover better risk-management regimes or mechanisms.

Where testing and evaluation are insufficient to directly measure capabilities, the AI Working Group encourages the relevant committees to explore proxy metrics that may be used in the interim.

There is some very welcome good thinking in here. Yes, we will need to adjust our regime over time. Also, that does not mean that until we reach our ‘final form’ the correct regime is no regime at all. You go with the best proxy measure you have, then when you can do better you switch to a better one, and you need to consider all time frames, although naming them all punts on the hard work of prioritization.

The question is, can you use testing and evaluation to directly measure capabilities sufficiently accurately? For which purposes and scenarios does this work or fail?

There are two ways testing and evaluation can fail, false positives and false negatives.

False positives are where you game the benchmarks, intentionally or otherwise. In general, I presume that the major labs (OpenAI, Anthropic and DeepMind for sure, and mostly Meta as well) will be good at not doing this, but that smaller competitors will often be gaming the system to look better, or not be taking care to avoid data contamination.

This can mostly be solved through keeping the testing details private, or continuously rotating them with questions known to not be online. But it also is not the issue here.
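
As an aside, the rotation idea is simple enough to sketch (a minimal illustration of my own, not anything from the report): draw each evaluation round from a private pool of questions that was never posted online, and retire questions once they have been used.

```python
import random

class RotatingEvalPool:
    """Hold a private pool of eval questions and hand out a fresh batch each round."""

    def __init__(self, private_questions):
        self.unused = list(private_questions)  # kept offline, never published
        self.retired = []                      # once used, treated as contaminated

    def draw_round(self, n, seed=None):
        rng = random.Random(seed)
        rng.shuffle(self.unused)
        batch, self.unused = self.unused[:n], self.unused[n:]
        self.retired.extend(batch)
        return batch

pool = RotatingEvalPool([f"question_{i}" for i in range(1000)])
round_one = pool.draw_round(100, seed=0)
round_two = pool.draw_round(100, seed=1)  # fresh questions, no overlap with round one
```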

False negatives are far scarier.

We can again subdivide, and ask in what ways things might go wrong. I took 10 or so minutes to brainstorm a list, which is of course highly incomplete.

These are vaguely ordered ‘ordinary failure, probably not too bad’ to ‘oh no.’

  1. The AI can do it, if you were better at prompting and writing custom instructions.

    1. Variant: The AI can do it, if you jailbreak it first, which you can totally do.

    2. Variant: You messed up the inputs or the answer key.

  2. The AI can do it, if you offer it the right additional context.

  3. The AI can do it, if you give it some extra scaffolding to work with.

  4. The AI can do it, if you give it a bit of fine tuning.

  5. The AI can do it, if you force it to embody the Golden Gate Bridge or something.

  6. The AI can do it, with help from a user with better domain knowledge.

  7. The AI can do it, but you won’t like the way it picked to get the job done.

  8. The AI can do it, but you have to trigger some hidden condition flag.

  9. The AI can do it, but the developers had it hide its capabilities to fool the test.

  10. The AI can do it, but realized you were testing it, so it hid its capabilities.

  11. The AI can do it, so the developers crippled the narrow capability that shows up on evaluations, but it still has the broader capability you were actually trying to test.

  12. The AI can’t do this in particular, but you were asking the wrong questions.

    1. Variant: What the AI can do is something humans haven’t even considered yet.

    2. Variant: What you care about exists out of distribution, and this isn’t it.

  13. The AI can do it, but its solution was over your head and you didn’t notice.

  14. The AI escaped or took control or hacked the system during your test.

  15. The AI did the dangerous thing during training or fine-tuning. You are too late.

The more different tests you run, and the more different people run the tests, especially if you include diverse red teaming and the ability to probe for anything at all while well resourced, the better you will do. But this approach has some severe problems, and they get a lot more severe once you enter the realm of models plausibly smarter than humans and you don’t know how to evaluate the answers or what questions to ask.

If all you want are capabilities relative to another similar model, and you can put an upper bound on how capable the thing is, a lot of these problems mostly go away or become much easier, and you can be a lot more confident.

Anyway, my basic perspective is that you use evaluations, but that in our current state and likely for a while I would not trust them to avoid false negatives on the high end, if your system uses enough compute and is large enough that it might plausibly be breaking new ground. At that point, you need to use a holistic mix of different approaches and an extreme degree of caution, and beyond a certain point we don’t know how to proceed safely in the existential risk sense.

So the question is, will the people tasked with this be able to figure out a reasonable implementation of these questions? How can we help them do that?

The basic principle here, however, is clear. As inputs, potential capabilities and known capabilities advance, we will need to develop and deploy more robust testing procedures, and be more insistent upon them. From there, we can talk price, and adjust as we learn more.

There are also two very important points that wait for the national security section: A proper investigation into defining AGI and evaluating how likely it is and what risks it would pose, and an exploration into AI export controls and the possibility of on-chip AI governance. I did not expect to get those.

Am I dismayed that the words existential and catastrophic only appear once each and only in the appendix (and extinction does not appear)? That there does not appear to be a reference in any form to ‘loss of human control’ as a concept, and so on? That ‘AGI’ does not appear until the final section on national security, although they ask very good questions about it there?

Here is the appendix section where we see mentions at all (bold is mine), which does ‘say the line’ but does seem to have rather a missing mood, concluding essentially (and to be fair, correctly) that ‘more research is needed’:

The eighth forum examined the potential long-term risks of AI and how best to encourage development of AI systems that align with democratic values and prevent doomsday scenarios.

Participants varied substantially in their level of concern about catastrophic and existential risks of AI systems, with some participants very optimistic about the future of AI and other participants quite concerned about the possibilities for AI systems to cause severe harm.

Participants also agreed there is a need for additional research, including standard baselines for risk assessment, to better contextualize the potential risks of highly capable AI systems. Several participants raised the need to continue focusing on the existing and short term harms of AI and highlighted how focusing on short-term issues will provide better standing and infrastructure to address long-term issues.

Overall, the participants mostly agreed that more research and collaboration are necessary to manage risk and maximize opportunities.

Of course all this obfuscation is concerning.

It is scary that such concepts are that-which-shall-not-be-named.

You-know-what still has its hands on quite a few provisions of this document. The report was clearly written by people who understand that the stakes are going to get raised to very high levels. And perhaps they think that by not saying you-know-what, they can avoid all the nonsensical claims they are worried about, the accusations of ‘science fiction’ or ‘hypothetical risks’ or what not.

That’s the thing. You do not need the risks to be fully existential, or to talk about what value we are giving up 100 or 1,000 years from now, or any ‘long term’ arguments, or even the fates of anyone not already alive, to make it worth worrying about what could happen to all of us within our lifetimes. The prescribed actions change a bit, but not all that much, especially not yet. If the practical case works, perhaps that is enough.

I am not a politician. I do not have experience with similar documents and how to correctly read between the lines. I do know this report was written by committee, causing much of this dissonance. Very clearly at least one person on the committee cared and got a bunch of good stuff through. Also very clearly there was sufficient skepticism that this wasn’t made explicit. And I know the targets are other committees, which muddies everything further.

Perhaps, one might suggest, all this optimism is what they want people like me to think? But that would imply that they care what people like me think when writing such documents.

I am rather confident that they don’t.

I went into this final section highly uncertain what they would focus on. What does national security mean in this context? There are a lot of answers that would not have shocked me.

It turns out that here it largely means help the DOD:

  1. Bolstering cyber capabilities.

  2. Developing AI career paths for DOD.

  3. Money for DOD.

  4. Efficiently handle security clearances, improve DOD hiring process for AI talent.

  5. Improve transfer options and other ways to get AI talent into DOD.

If the goal is to increase the effectiveness of the DOD, I would certainly reallocate existing DOD money toward more of these things. Whether to simply throw more total money at the DOD is a political question, and I don’t have a position there.

Then we throw in an interesting one?

  1. Prevent LLMs leaking or reconstructing sensitive or confidential information.

Leaking would mean it was in the training data. If so, where did that data come from? Even if the source was technically public and available to be found, ‘making it easy on them’ is very much a thing. If it is in the training data you can probably get the LLM to give it to you, and I bet that LLMs can get pretty good at ‘noticing which information was classified.’

Reconstructing is more interesting. If you de facto add ‘confidential information even if reconstructed’ to the list of catastrophic risks alongside CBRN, as I presume some NatSec people would like, then that puts the problem for future LLMs in stark relief.

The way that information is redacted usually contains quite a lot of clues. If you put AI on the case, especially a few years from now, a lot of things are going to fall into place. In general, a capable AI will start being able to figure out various confidential information, and I do not see how you stop that from happening, particularly since no one is keen to provide OpenAI or Google with a list of all the confidential information their AI is totally not supposed to know about. Seems hard.

A lot of problems are going to be hard. On this one, my guess is that among other things the government is going to have to get a very different approach to what is classified.

  1. Monitor AI and especially AGI development by our adversaries.

I would definitely do that.

  1. Work on a better and more precise definition of AGI, a better measurement of how likely it is to be developed and the magnitude of the risks it would pose.

Yes. Nice. Very good. They are asking many of the right questions.

  1. Explore using AI to mitigate space debris.

You get an investigation into using AI for your thing. You get an investigation into using AI for your thing. I mean, yeah, sure, why not?

  1. Look into all this extra energy use.

I am surprised they didn’t put extra commentary here, but yeah, of course.

  1. Worry about CBRN threats and how AI might enhance them.

An excellent thing for DOD to be worried about. I have been pointed to the question here of what to do about Restricted Data. We automatically classify certain information, such as info about nuclear weapons, as it comes into existence. If an AI is not allowed to generate outputs containing such information, and there is certainly a strong case why you would want to prevent that, this is going to get tricky. No question the DOD should be thinking carefully about the right approach here. If anything, AI is going to be expanding the range of CBRN-related information that we do not want easily shared.

  1. Consider how CBRN threats and other advanced technological capabilities interact with need for AI export controls, explore whether new authorities are needed, and explore feasibility of options to implement on-chip security mechanisms for high-end AI chips.

  2. “Develop a framework for determining when, or if, export controls should be placed on powerful AI systems.”

Ding. Ding. Ding. Ding. Ding.

If you want the ability to choke off supply, you target the choke points you can access.

That means either export controls, or it means on-chip security mechanisms, or it means figuring out something new.

This is all encouraging another group to consider maybe having someone do something. That multi-step indirection runs through the entire document. But yes, all the known plausibly effective ideas are here in one form or another, to be investigated.

The language here on AI export controls is neutral, asking both when and if.

At some point on the capabilities curve, national security will dictate the need for export controls on AI models. That is incompatible with open weights on those models, or with letting such models run locally outside the export control zone. The proper ‘if’ is whether we get to that point, so the right question is when.

Then they go to a place I had not previously thought about us going.

  1. “Develop a framework for determining when an AI system, if acquired by an adversary, would be powerful enough that it would pose such a grave risk to national security that it should be considered classified, using approaches such as how DOE treats Restricted Data.”

Never mind debating open model weights. Should AI systems, at some capabilities level, be automatically classified upon creation? Should the core capabilities workers, or everyone at OpenAI and DeepMind, potentially have to get a security clearance by 2027 or something?

  1. Ensure federal agencies have the authority to work with allies and international partners and agree to things. Participate in international research efforts, ‘giving due weight to research security and intellectual property.’

Not sure why this is under national security, and I worry about the emphasis on friendlies, but I would presume we should do that.

  1. Use modern data analytics to fight illicit drugs including fentanyl.

Yes, use modern data analytics. I notice they don’t mention algorithmic bias issues.

  1. Promote open markets for digital goods, prevent forced technology transfer, ensure the digital economy ‘remains open, fair and competitive for all, including for the three million American workers whose jobs depend on digital trade.’

Perfect generic note to end on. I am surprised the number of jobs is that low.

They then give a list of who was at which forum and summaries of what happened.

Before getting to my takeaways, here are some other reactions.

These are illustrative of five very different perspectives, and also the only five cases in which anyone said much of anything about the report at all. And I love that all five seem to be people who actually looked at the damn thing. A highly welcome change.

  1. Peter Wildeford looks at the overall approach. His biggest takeaway is that this is a capabilities-based approach, which puts a huge burden on evaluations, and he notices some other key interactions too, especially funding for BIS and NIST.

  2. Tim Fist highlights some measures he finds fun or exciting. Like Peter he mentions the call for investigation of on-chip security mechanisms.

  3. Tyler Cowen’s recent column contained the following: “Fast forward to the present. Senate Majority Leader Chuck Schumer and his working group on AI have issued a guidance document for federal policy. The plans involve a lot of federal support for the research and development of AI, and a consistent recognition of the national-security importance of the US maintaining its lead in AI. Lawmakers seem to understand that they would rather face the risks of US-based AI systems than have to contend with Chinese developments without a US counterweight. The early history of Covid, when the Chinese government behaved recklessly and nontransparently, has driven this realization home.”

    1. The context was citing this report as evidence that the AI ‘safety movement’ is dead, or at least that a turning point has been reached and it will fade into obscurity (and the title has now been changed to better reflect the post.)

    2. Tyler is right that there is much support for ‘innovation,’ ‘R&D’ and American competitiveness and national security. But this is as one would expect.

    3. My view is that, while the magic words are not used, the ‘AI safety’ concerns are very much here, including all the most important policy proposals, and it even includes one bold proposal I do not remember previously considering.

    4. Yes, I would have preferred if the report had spoken more plainly and boldly, here and also elsewhere, and the calls had been stronger. But I find it hard not to consider this a win. At bare minimum, it is not a loss.

    5. Tyler has not, that I know of, given further analysis on the report’s details.

  4. R Street’s Adam Thierer gives an overview.

    1. He notices a lot of the high-tech pork (e.g. industrial policy) and the calls for investigating expanding regulations.

    2. He notices the kicking of all the cans down the road, agrees this makes sense.

    3. He happily notices no strike against open source, which is only true if you do not work through the implications (e.g. of potentially imposing export controls on future highly capable AI systems, or even treating them as automatically classified Restricted Data.)

    4. Similarly, he notes the lack of a call for a new agency, whereas this instead will do everything piecemeal. And he is happy that ‘existential risk lunacy’ is not mentioned by name, allowing him not to notice it either.

    5. Then he complains about the report not removing enough barriers from existing laws, regulations and court-based legal systems, but agrees existing law should apply to AI. Feels a bit like trying to have it both ways: invoke existing law to head off any new rules, while also calling for gutting what already exists. But hey. He offers special praise for the investigation to look for innovation-stifling rules.

    6. He notices some of the genuinely scary language, in particular “Review whether other potential uses for AI should be either extremely limited or banned.”

    7. He calls for Congress to actively limit Executive discretion on AI, which seems like ‘AI Pause now’ levels of not going to happen.

    8. He actively likes the idea of a public awareness campaign, which surprised me.

    9. Finally Adam seems to buy into the view that screwing up Section 230 is the big thing to worry about. I continue to be confused why people think that this is going to end up being a problem in practice. Perhaps it is the Sisyphean task of people like R Street to constantly worry about such nightmare scenarios.

    10. He promised a more detailed report was coming, but I couldn’t find one.

  5. The Wall Street Journal editorial board covers it as ‘The AI Pork Barrel Arrives.’

They quote Schumer embarrassing himself a bit:

Chuck Schumer: If China is going to invest $50 billion, and we’re going to invest in nothing, they’ll inevitably get ahead of us.

Padme: You know the winner is not whoever spends the most public funds, right?

You know America’s success is built on private enterprise and free markets, right?

You do know that ‘we’ are investing quite a lot of money in AI, right?

You… do know… we are kicking China’s ass on AI at the moment, right?

WSJ Editorial Board: Goldman Sachs estimates that U.S. private investment in AI will total $82 billion next year—more than twice as much as in China.

We are getting quite a lot more than twice as much bang for our private bucks.

And this comes on the heels of the Chips Act money.

So yes, I see why the Wall Street Journal Editorial Board is thinking pork.

WSJ Editorial Board: Mr. Schumer said Wednesday that AI is hard to regulate because it “is changing too quickly.” Fair point. But then why does Washington need to subsidize it?

The obvious answer, mostly, is that it doesn’t.

There are some narrow areas, like safety work, where one can argue that there will by default be underinvestment in public goods.

There is a need to fund the government’s own adoption of AI, including for defense, and to adjust regulations, laws, and procedures for the new world.

Most of the rest is not like that.

WSJ: Now’s not a time for more pork-barrel spending. The Navy could buy a lot of ships to help deter China with an additional $32 billion a year.

This is where they lose me. Partly because a bunch of that $32 billion is directly for defense or government services and administration. But also because I see no reason to spend a bunch of extra money on new Navy ships that will be obsolete in the AI era, especially given what I have heard about our war games where our ships are not even useful against China now. The Chips Act money is a far better deterrent. We also would have accepted ‘do not spend the money at all.’

Mostly I see this focus as another instance of the mainstream not understanding, in a very deep way, that AI is a Thing, even in the economic and mundane utility senses.

There was a lot of stuff in the report. A lot of it was of the form ‘let’s do good thing X, without its downside Y, taking into consideration the vital importance of A, B and C.’

It is all very ‘why not both,’ embrace the upside and prevent the downside.

Which is great, but of course easier said (or gestured at) than done.

This is my attempt to assemble what feels most important, hopefully I am not forgetting anything:

  1. The Schumer Report is written by a committee for other committees to then do something. Rather than one big bill, we will get a bunch of different bills.

  2. They are split on whether to take existential risk seriously.

    1. As a result, they include many of the most important proposals on this.

      1. Requiring safety testing of frontier models before release.

      2. Using compute or other proxies if evaluations are not sufficiently reliable.

      3. Export controls on AI systems.

      4. Treating sufficiently capable AI systems as Restricted Data.

      5. Addressing CBRN threats.

      6. On-chip governance for AI chips.

      7. The need for international cooperation.

      8. Investigate the definition of AGI, and the risks it would bring.

    2. Also as a result, they present them in an ordinary, non-x-risk context.

    3. That ordinary context indeed does justify the proposals on its own.

  3. Most choices regarding AI Safety policies seem wise. The big conceptual danger is that the report emphasizes a capabilities-based approach via evaluations and tests. It does mention the possibility of using compute or other proxies if our tests are inadequate, but I worry a lot about overconfidence here. This seems like the most obvious way that this framework goes horribly wrong.

    1. A second issue is that this report presumes that only release of a model is dangerous, that otherwise it is safe. Which for now is true, but this could change, and it should not be an ongoing assumption.

  4. There is a broad attitude that the rules must be flexible, and adapt over time.

  5. They insist that AI will need to obey existing laws, including those against algorithmic discrimination and all the informational disclosure requirements involved.

  6. They raise specters regarding mundane harm concerns and AI ethics, both in existing law and proposed new rules, that should worry libertarians and AI companies far more than laws like SB 1047 that are aimed at frontier models and catastrophic risks.

    1. Calls for taking ‘reasonable steps’ to ‘protect children’ should be scary. They are likely not kidding around about copyright, CSAM or deepfakes.

    2. Calls for consultation and review could turn into a NEPA-style nightmare. Or they might turn out to be nothing. Hard to tell.

    3. They say that if black box AI is incompatible with existing disclosure requirements and calls for explainability and transparency, then their response is: Tough.

    4. They want to closely enforce rules on algorithmic discrimination, including the associated disclosure requirements.

    5. There are likely going to be issues with classified material.

    6. The report wants to hold developers and users liable for AI harms, including mundane AI harms.

    7. The report calls for considerations of potential use case bans.

  7. They propose to spend $32 billion on AI, with an unknown breakdown.

  8. Schumer thinks public spending matters, not private spending. It shows.

  9. There are many proposals for government adoption of AI and building of AI-related state capacity. This seemed like a key focus point.

    1. These mostly seem very good.

    2. Funding for BIS and NIST is especially important and welcome.

  10. There are many proposals to ‘promote innovation’ in various ways.

    1. I do not expect them to have much impact.

  11. There are proposals to ‘help small business’ and encourage geographic diversity and other such things.

    1. I expect these are pork and would go to waste.

  12. There is clear intent to integrate AI closely into our critical infrastructure and into the Department of Defense.

This is far from the report I would have wanted written. But it is less far than I expected before I looked at the details. Interpreting a document like this is not my area of expertise, but in many ways I came away optimistic. The biggest downside risks I see are that the important proposals get lost in the shuffle, or that some of the mundane harm related concerns get implemented in ways that cause real problems.

If I was a lobbyist for tech companies looking to avoid expensive regulation, especially if I was trying to help relatively small players, I would focus a lot more on heading off mundane-based concerns like those that have hurt so many other areas. That seems like by far the bigger commercial threat, if you do not care about the risks on any level.

The Schumer Report on AI (RTFB) Read More »

us-sues-ticketmaster-and-owner-live-nation,-seeks-breakup-of-monopoly

US sues Ticketmaster and owner Live Nation, seeks breakup of monopoly

Ticketmaster advertisements at the United States v. South Africa women’s soccer match at Soldier Field on September 24, 2023, in Chicago, Illinois. (Getty Images | Daniel Bartel/ISI Photos/USSF)

The US government today sued Live Nation and its Ticketmaster subsidiary in a complaint that seeks a breakup of the company that dominates the live music and events market.

The US Department of Justice is seeking “structural relief,” including a breakup, “to stop the anticompetitive conduct arising from Live Nation’s monopoly power.” The DOJ complaint asked a federal court to “order the divestiture of, at minimum, Ticketmaster, along with any additional relief as needed to cure any anticompetitive harm.”

The District of Columbia and 29 states joined the DOJ in the lawsuit filed in US District Court for the Southern District of New York. “One monopolist serves as the gatekeeper for the delivery of nearly all live music in America today: Live Nation, including its wholly owned subsidiary Ticketmaster,” the complaint said.

US Attorney General Merrick Garland said during a press conference that “Live Nation relies on unlawful, anticompetitive conduct to exercise its monopolistic control over the live events industry in the United States… The result is that fans pay more in fees, artists have fewer opportunities to play concerts, smaller promoters get squeezed out, and venues have fewer real choices for ticketing services.”

“It is time to break it up,” Garland said.

Live Nation: We aren’t a monopoly

Garland said that Live Nation directly manages more than 400 artists, controls over 60 percent of concert promotions at major venues, and owns or controls over 60 percent of large amphitheaters. In addition to acquiring venues directly, Live Nation uses exclusive ticketing contracts with venues that last over a decade to exercise control, Garland said.

Garland said Ticketmaster imposes a “seemingly endless list of fees on fans,” including ticketing fees, service fees, convenience fees, order fees, handling fees, and payment processing fees. Live Nation and Ticketmaster control “roughly 80 percent or more of major concert venues’ primary ticketing for concerts and a growing share of ticket resales in the secondary market,” the lawsuit said.

Live Nation defended its business practices in a statement provided to Ars today, saying the lawsuit won’t solve problems “relating to ticket prices, service fees, and access to in-demand shows.”

“Calling Ticketmaster a monopoly may be a PR win for the DOJ in the short term, but it will lose in court because it ignores the basic economics of live entertainment, such as the fact that the bulk of service fees go to venues and that competition has steadily eroded Ticketmaster’s market share and profit margin,” the company said. “Our growth comes from helping artists tour globally, creating lasting memories for millions of fans, and supporting local economies across the country by sustaining quality jobs. We will defend against these baseless allegations, use this opportunity to shed light on the industry, and continue to push for reforms that truly protect consumers and artists.”

Live Nation said its profits aren’t high enough to justify the DOJ lawsuit.

“The defining feature of a monopolist is monopoly profits derived from monopoly pricing,” the company said. “Live Nation in no way fits the profile. Service charges on Ticketmaster are no higher than other ticket marketplaces, and frequently lower.” Live Nation said its net profit margin last fiscal year was 1.4 percent and claimed that “there is more competition than ever in the live events market.”

US sues Ticketmaster and owner Live Nation, seeks breakup of monopoly Read More »

the-2024-chevrolet-silverado-ev’s-great-range-comes-at-a-high-cost

The 2024 Chevrolet Silverado EV’s great range comes at a high cost

if you hate big trucks, look away now —

At $94,500, the Chevrolet Silverado RST First Edition offers diminishing returns.

Chevrolet is starting at the top with the Silverado EV RST First Edition. It’s betting that EV truck buyers want a lot of range and towing capability and will pay handsomely for the experience. (Michael Teo Van Runkle)

The latest addition to Chevrolet’s growing family of Ultium electric vehicles recently began shipping to dealers in the form of the Silverado EV’s early RST First Edition package. Silverado’s top spec level now joins the lineup’s previous fleet-only WT trim, meaning the general public can now purchase an enormous electric pickup that strongly resembles the Avalanche of 2001 to 2013. But despite its similarities to the Hummer EV, which shares a related chassis, and to ICE trucks of old, the 2024 Silverado aims to change the game for GM’s market positioning, even though it arrives a full 24 months after Ford’s F-150 Lightning.

With a large crew cab, a longer truck bed, and angular sail panels, the Silverado EV looks less boxy than GMC’s Hummer EV. Aero gains thanks to the smoother design pair with lower rolling-resistance tires, allowing the Silverado to achieve an EPA range estimate of up to 450 miles (724 km), though the RST First Edition I recently drove over the course of a long day in Michigan earns a rating of 440 miles (708 km).

On the highway, judging by wind noise around the cabin alone, the aerodynamic gains of the Silverado’s styling seem to make a noticeable difference versus the Hummer. On the other hand, tire hum might cover up any aero deficiencies because the RST’s single weirdest detail constantly occupies center stage here: a set of 24-inch wheels, the largest ever equipped to a car, truck, or SUV straight from the factory.

At 24 inches, the Silverado RST rides on simply gargantuan wheels. While it means acceptable towing performance, it comes with quite a hit to the ride. (Michael Teo Van Runkle)

Shod in low-profile Michelin Primacy LTX tires pumped up to 61 and 68 PSI front and rear, which simultaneously maximizes range and load rating, the large wheels and minimal sidewall clearly stress much of the new truck’s suspension and ability to filter out noise, vibration, and harshness. Even in town, on the first few blocks of Detroit’s rough roads, the setup immediately challenged the Silverado EV’s adaptive air suspension, which otherwise worked surprisingly well on the mammoth Hummer.

But the Hummer EV I drove rode on 18-inch wheels, despite the similar 35-inch overall tire diameter. The Hummer’s much more compliant ride quality therefore creates a conundrum, since GM clearly intends for the Silverado to represent a much more rational and capable vision for electric performance in the full-size pickup truck market.

Specifically, the Silverado adds a longer bed, a Multi-Flex tailgate, and a central mid-gate (also à la Avalanche) to provide far more payload volume than the Hummer, as well as that of Silverado’s main electric competition, the F-150 Lightning, Rivian R1T, and Tesla Cybertruck. But the mid-gate required far more rugged materials for the Silverado’s interior to enhance weatherproofing, so even the top-spec RST First Edition that starts at $94,500 now slots in at a much lower luxury level than the aforementioned EVs, as well as most internal-combustion Silverados.

  • The Silverado EV uses GM’s new Ultifi infotainment system, which is built atop Android Automotive OS.

    Chevrolet

  • Super Cruise now works with a trailer attached.

    Chevrolet

  • The flexible midgate allows you to carry longer loads.

    Chevrolet

  • Onboard AC power is quite useful.

    Chevrolet

Still, Chevy says EV buyers love tech and packed the Silverado EV full of big screens, Google built-in (though no Apple CarPlay), and Super Cruise partially automated driving assist (the latter including for towing). That air suspension pairs 2 inches (50 mm) of ride height adjustability with up to 7.5 degrees of rear-wheel steering to make the large truck surprisingly maneuverable, but in the back of my mind, I always knew that the ease with which I just climbed in and started driving comes down to playing with physics as much as possible to mask the Silverado’s significant heft.

Those 440 miles of range come at a serious cost, after all, in the form of a 205 kWh battery pack (around 200 kWh usable). All in, the RST tips the scales at a whopping 9,119 pounds (4,136 kg), not quite as much as a Hummer but fully 2,000 pounds (907 kg) more than a Lightning, R1T, or Cybertruck. No wonder the suspension struggles without taller tire sidewalls to help out. I fiddled through the 17.7-inch touchscreen to set the air suspension on Tour, which reduced unwanted feedback noticeably but created some rafting effects and still never fully eliminated clunking on the worst road surfaces. Future models, including a Trail Boss on the way, should come with smaller wheels and taller tires—to match the current WT’s 18-inch wheels and 33-inch tires, hopefully.

But the prospect of actually off-roading such a heavy EV definitely approaches a level of absurdity that the Hummer EV similarly delivered in spades. Neither comes with a spare tire, despite impressive storage volume that the Silverado only improves on. Flipping down the tailgate and mid-gate allows for up to 10 feet, 10 inches (3.3m) of bed length, or 9 feet (2.7m) with the mid-gate closed and just the Multi-Flex tailgate down. The bed alone measures 5-foot-11 (1.8m).

  • Chevrolet was keen to impress that its truck bed is bigger than other electric pickups.

    Michael Teo Van Runkle

  • The aerodynamic detailing was presaged by the turn-of-the-century Avalanche pickup.

    Michael Teo Van Runkle

  • There are a whole range of towing assists.

    Michael Teo Van Runkle

  • The controls here are for trailer settings.

    Michael Teo Van Runkle

  • Two miles/kWh is not great but in the range of what we expect for an electric pickup truck.

    Michael Teo Van Runkle

On the interior, at 6-foot-1 (1.85m) with long limbs, I actually needed to scoot the driver’s seat up and forward. The RST’s (not-optional) panoramic glass roof helps to enhance the perceived spaciousness but required that I keep the air conditioning and ventilated seats at full blast on a hot Michigan day—other than when I struggled to figure out how to keep the system running while parked since the truck has no dedicated on-off button other than a pair of widget icons at the left of the home screen. A retractable screen for the roof is on the way, I was told.

The Silverado EV’s range proved more than legitimate, at least based on this first drive. Over the course of 107 miles (172 km) of combined city and highway driving in one truck, I used 24 percent of the battery and 105 miles (169 km) of estimated range. And that’s including two hard eighth-mile launches with WOW (Wide Open Watts) mode activated, which unleashes the dual motor drivetrain’s full 754 hp (562 kW) and 785 lb-ft (1,064 Nm) of torque. Those two launches alone used eight miles of range, for better or worse.
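
A rough back-of-the-envelope check on those numbers, using the roughly 200 kWh usable figure quoted earlier (the inputs come from this drive; the arithmetic is only illustrative):

```python
usable_kwh = 200        # approximate usable capacity of the 205 kWh pack
battery_used = 0.24     # 24 percent of the pack over the drive
miles_driven = 107

energy_used_kwh = usable_kwh * battery_used      # ~48 kWh
efficiency = miles_driven / energy_used_kwh      # ~2.2 miles per kWh
implied_range = usable_kwh * efficiency          # ~446 miles

print(f"{energy_used_kwh:.0f} kWh used, {efficiency:.1f} mi/kWh, ~{implied_range:.0f} mi implied range")
```

That works out to a bit over 2 miles per kWh and an implied full-pack range in the neighborhood of the truck's 440-mile rating.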

GM won’t disclose non-WOW power figures, but responsiveness definitely drops to help extend overall range performance. In Tow/Haul mode with a 5,800-lb (2,630 kg) trailer hooked up for 21 miles (34 km), I nonetheless accelerated easily up to highway speeds and even used Super Cruise’s towing capability—all while eating through only 22 claimed miles of range at speeds around 40-60 miles per hour (64-96 km/h).

Chevy set up an impromptu drag strip so we could test the Silverado’s launch. (Michael Teo Van Runkle)

The Silverado EV’s range sets it far ahead of the Lightning (at 240 miles or 386 km), though Rivian and Tesla do better. Various levels of home-charging setups help to make the large battery pack more attractive, and though I never needed nor got a chance to charge, expect GM’s claimed 350 kW max charging speed to similarly hold up. As usual, charging stations will likely throttle that speed back more regularly than the truck itself, which should manage a 10–80 percent charge time of around 40 minutes in ideal circumstances.

In the end, although it’s not quite as cartoonishly large as the Hummer EV and is far more practical, the Silverado uses 205 kilowatt-hours’ worth of lithium and other rare earth metals, contributing mightily to the RST weighing well north of 9,000 pounds. Yes, the truck combines the best utility of any EV on the market with solid tech and range to attract stubborn EV holdouts. But how many hybrids could Chevy have built using so much battery? Until pricing drops well below this truck’s $94,500 sticker, the Silverado RST stands as a reminder of the diminishing returns, environmental and economic, of building what customers unfortunately believe they need with today’s technology, which likely still needs another major leap forward before such a truck is feasible for widespread adoption.

The 2024 Chevrolet Silverado EV’s great range comes at a high cost Read More »

the-next-food-marketing-blitz-is-aimed-at-people-on-new-weight-loss-drugs

The next food marketing blitz is aimed at people on new weight-loss drugs

GLP-1 friendly —

Taking a weight-loss drug? Food makers have just the new food for you.

The next food marketing blitz is aimed at people on new weight-loss drugs

As new diabetes and weight-loss drugs help patients curb appetites and shed pounds, food manufacturers are looking for new ways to keep their bottom lines plump.

Millions of Americans have begun taking the pricey new drugs—particularly Mounjaro, Ozempic, Wegovy, and Zepbound—and millions more are expected to go on them in the coming years. As such, food makers are bracing for slimmer sales. In a report earlier this month, Morgan Stanley’s tobacco and packaged food analyst Pamela Kaufman said the drugs are expected to affect both the amounts and the types of food people eat, taking a bite out of the food and drink industry’s profits.

“In Morgan Stanley Research surveys, people taking weight-loss drugs were found to eat less food in general, while half slashed their consumption of sugary drinks, alcohol, confections and salty snacks, and nearly a quarter stopped drinking alcohol completely,” Kaufman said. Restaurants that sell unhealthy foods, particularly chains, may face long-term business risks, the report noted. Around 75 percent of survey respondents taking weight-loss drugs said they had cut back on going to pizza and fast food restaurants.

Some food makers aren’t taking the threat lightly. On Tuesday, the massive multinational food and beverage conglomerate Nestlé announced a new line of frozen foods, called Vital Pursuit, aimed directly at people taking GLP-1 weight-loss drugs (Wegovy and Ozempic). Nestlé—maker of DiGiorno frozen pizzas and Stouffer’s frozen entrées—said the new product line will include frozen pizzas, sandwich melts, grain bowls, and pastas that are “portion-aligned to a weight loss medication user’s appetite.” The frozen fare is otherwise said to contain fiber, “essential nutrients,” and high protein, food features not specific for people on GLP-1 drugs.

“As the use of medications to support weight loss continues to rise, we see an opportunity to serve those consumers,” Steve Presley, CEO of Nestlé North America, said in the product line announcement. “Vital Pursuit provides accessible, great-tasting food options that support the needs of consumers in this emerging category.”

Nestlé isn’t alone. At the end of last year, WeightWatchers began offering a membership program for people taking GLP-1 drugs. In January, meal delivery service Daily Harvest announced its “GLP-1 Companion Food Collection.” And last month, GNC announced a “GLP-1 support program” for people on the drugs, which includes a collection of various supplements, coaching, and consultations.

The companies seem to be heeding the advice of analysts. Morgan Stanley’s report noted that food makers can adapt to people’s changing diets by “raising prices, offering ‘better for you’ or weight-management products, or catering to changing trends with vegan or low-sugar options.” Kaufman noted that some companies are already adjusting by selling smaller packages and portions.

The next food marketing blitz is aimed at people on new weight-loss drugs Read More »

ai-#65:-i-spy-with-my-ai

AI #65: I Spy With My AI

In terms of things that go in AI updates, this has been the busiest two-week period so far. Every day ends with more open tabs than it started with, even within AI.

As a result, some important topics are getting pushed to whenever I can give them proper attention. Triage is the watchword.

In particular, this post will NOT attempt to cover:

  1. Schumer’s AI report and proposal.

    1. This is definitely RTFB. Don’t assume anything until then.

  2. Tyler Cowen’s rather bold claim that: “May 2024 will be remembered as the month that the AI safety movement died.”

    1. Rarely has timing of attempted inception of such a claim been worse.

    2. Would otherwise be ready with this but want to do Schumer first if possible.

    3. He clarified to me that he has not walked back any of his claims.

  3. The AI Summit in Seoul.

    1. Remarkably quiet all around, here is one thing that happened.

  4. Anthropic’s new interpretability paper.

    1. Potentially a big deal in a good way, but no time to read it yet.

  5. DeepMind’s new scaling policy.

    1. Initial reports are it is unambitious. I am reserving judgment.

  6. OpenAI’s new model spec.

    1. It looks solid as a first step, but pausing until we have bandwidth.

  7. Most ongoing issues with recent fallout for Sam Altman and OpenAI.

    1. It doesn’t look good, on many fronts.

    2. While the story develops further, if you are a former employee or have a tip about OpenAI or its leadership team, you can contact Kelsey Piper at [email protected] or on Signal at 303-261-2769.

  8. Also: A few miscellaneous papers and reports I haven’t had time for yet.

My guess is at least six of these eight get their own posts (everything but #3 and #8).

So here is the middle third: the topics I can cover here that are still making the cut.

Still has a lot of important stuff in there.

From this week: Do Not Mess With Scarlett Johansson, On Dwarkesh’s Podcast with OpenAI’s John Schulman, OpenAI: Exodus, GPT-4o My and Google I/O Day

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. People getting used to practical stuff.

  4. Language Models Don’t Offer Mundane Utility. Google Search, Copilot ads.

  5. OpenAI versus Google. Similar new offerings. Who presented it better? OpenAI.

  6. GPT-4o My. Still fast and cheap, otherwise people are less impressed so far.

  7. Responsible Scaling Policies. Anthropic offers an update on their thinking.

  8. Copyright Confrontation. Sony joins the action, AI-funded lawyers write columns.

  9. Deepfaketown and Botpocalypse Soon. How bad will it get?

  10. They Took Our Jobs. If these are the last years of work, leave it all on the field.

  11. Get Involved. UK AI Safety Institute is hiring and offering fast grants.

  12. Introducing. Claude tool use, Google Maps AI features.

  13. Reddit and Weep. They signed with OpenAI. Curiously quiet reaction from users.

  14. In Other AI News. Newscorp also signs with OpenAI, we can disable TSMC.

  15. I Spy With My AI. Who wouldn’t want their computer recording everything?

  16. Quiet Speculations. How long will current trends hold up?

  17. Politico is at it Again. Framing the debate as if all safety is completely irrelevant.

  18. Beating China. A little something from the Schumer report on immigration.

  19. The Quest for Sane Regulation. UK’s Labour is in on AI frontier model regulation.

  20. SB 1047 Update. Passes California Senate, Wiener offers open letter.

  21. That’s Not a Good Idea. Some other proposals out there are really quite bad.

  22. The Week in Audio. Dwarkesh as a guest, me on Cognitive Revolution.

  23. Rhetorical Innovation. Some elegant encapsulations.

  24. Aligning a Smarter Than Human Intelligence is Difficult.

  25. The Lighter Side. It’s good, actually. Read it now.

If at first you don’t succeed, try try again. For Gemini in particular, ‘repeat the question exactly in the same thread’ has had a very good hit rate for me on resolving false refusals.

Claim that GPT-4o gets greatly improved performance on text documents if you put them in LaTeX format, vastly improving effective context window size.

Rowan Cheung strongly endorses the Zapier Central Chrome extension as an AI tool.

Get a summary of the feedback from your practice demo on Zoom.

Get inflation expectations, and see how they vary based on your information sources. Paper does not seem to focus on the questions I would find most interesting here.

Sully is here for some of your benchmark needs.

Sully Omarr: Underrated: Gemini 1.5 Flash.

Overrated: GPT-4o.

We really need better ways to benchmark these models, cause LMSYS ain’t it.

Stuff like cost, speed, tool use, writing, etc., aren’t considered.

Most people just use the top model based on leaderboards, but it’s way more nuanced than that.

To add here:

I have a set of ~50-100 evals I run internally myself for our system.

They’re a mix match of search-related things, long context, writing, tool use, and multi-step agent workflows.

None of these metrics would be seen in a single leaderboard score.

Find out if you are the asshole.

Aella: I found an old transcript of a fight-and-then-breakup text conversation between me and my crush from when I was 16 years old.

I fed it into ChatGPT and asked it to tell me which participant was more emotionally mature, and it said I was.

Gonna start doing this with all my fights.

Guys LMFAO, the process was I uploaded it to get it to convert the transcript to text (I found photos of printed-out papers), and then once ChatGPT had it, I was like…wait, now I should ask it to analyze this.

The dude was IMO pretty abusive, and I was curious if it could tell.

Eliezer Yudkowsky: hot take: this is how you inevitably end up optimizing your conversation style to be judged as more mature by LLMs; and LLMs currently think in a shallower way than real humans; and to try to play to LLMs and be judged as cooler by them won’t be good for you, or so I’d now guess.

To be clear, this is me trying to read a couple of steps ahead from the act that Aella actually described. Maybe instead, people just get good at asking with prompts that sound neutral to a human but reliably get ChatGPT to take their side.

Why not both? I predict both. If AIs are recording and analyzing everything we do, then people will obviously start optimizing their choices to get the results they want from the AIs. I would not presume this will mean that a ‘be shallower’ strategy is the way to go, for example LLMs are great at sensing the vibe that you’re being shallow, and also their analysis should get less shallow over time and with larger context windows. But yeah, obviously this is one of those paths that leads to the dark side.

Ask for a one paragraph Strassian summary. Number four will not shock you.

Own your HOA and its unsubstantiated violations, by taking their dump of all their records that they tried to overwhelm you with, using a script to convert to text, using OpenAI to get the data into JSON and putting it into a Google map, proving the selective enforcement. Total API cost: $9. Then they found the culprit and set a trap.
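
For anyone wanting to replicate that kind of pipeline, a minimal sketch looks something like this (assuming the records are already OCR'd into text files; the model name, prompt, and JSON fields are illustrative, and this is not the original poster's actual script):

```python
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_violation(text: str) -> dict:
    # Ask the model to return structured JSON for one violation notice.
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Extract this HOA violation notice as JSON with "
             "keys: address, date, violation_type, description."},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

records = [extract_violation(p.read_text()) for p in Path("notices").glob("*.txt")]
Path("violations.json").write_text(json.dumps(records, indent=2))
# violations.json can then be converted to CSV/KML and dropped into Google My Maps.
```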

Get greatly enriched NBA game data and estimate shot chances. This is very cool, and even in this early state seems like it would enhance my enjoyment of watching or the ability of a team to do well. The harder and most valuable parts still lie ahead.

Turn all your unstructured business data into what is effectively structured business data, because you can run AI queries on it. Aaron Levie says this is why he is incredibly bullish on AI. I see this as right in the sense that this alone should make you bullish, and wrong in the sense that this is far from the central thing happening.

Or someone else’s data, too. Matt Bruenig levels up, uses Gemini Flash to extract all the NLRB case data, then uses ChatGPT to get a Python script to turn it into clickable summaries. 66k cases, output looks like this.
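
The ‘clickable summaries’ step can be as simple as generating one static HTML page. Here is an illustrative sketch (not Bruenig’s actual script, and the field names are assumed):

```python
import html
import json
from pathlib import Path

cases = json.loads(Path("nlrb_cases.json").read_text())

items = []
for case in cases:
    items.append(
        "<details><summary>{num}: {name}</summary><p>{summary}</p></details>".format(
            num=html.escape(str(case.get("case_number", ""))),
            name=html.escape(case.get("employer", "")),
            summary=html.escape(case.get("summary", "")),
        )
    )

page = "<html><body><h1>NLRB case summaries</h1>{}</body></html>".format("\n".join(items))
Path("cases.html").write_text(page)
# Each <details> element expands on click, which is all "clickable summaries" needs.
```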

Would you like some ads with that? Link has a video highlighting some of the ads.

Alex Northstar: Ads in AI. Copilot. Microsoft.

My thoughts: Noooooooooooooooooooooooooooooooooooooo. No. No no no.

Seriously, Google, if I want to use Gemini (and often I do) I will use Gemini.

David Roberts: Alright, Google search has officially become unbearable. What search engine should I switch to? Is there a good one?

Samuel Deats: The AI shit at the top of every search now and has been wrong at least 50% of the time is really just killing Google for me.

I mean, they really shouldn’t be allowed to divert traffic away from websites they stole from to power their AI in the first place…

Andrew: I built a free Chrome plugin that lets you turn the AI Overview’s on/off at the touch of a button.

The good news is they have gotten a bit better about this. I did a check after I saw this, and suddenly there is a logic behind whether the AI answer appears. If I ask for something straightforward, I get a normal result. If I ask for something using English grammar, and imply I have something more complex, then the AI comes out. That’s not an entirely unreasonable default.

The other good news is there is a broader fix. Ernie Smith reports that if you add “udm=14” to the end of your Google search, this defaults you into the new Web mode. If this is for you, GPT-4o suggests using Tampermonkey to append this automatically, or you can use this page on Chrome to set defaults.
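
If you are building search URLs in a script rather than in the browser, the same trick is a one-liner. A trivial sketch (the udm=14 parameter is the one mentioned above; the rest is standard URL construction):

```python
from urllib.parse import urlencode

def web_only_search_url(query: str) -> str:
    # udm=14 selects the plain "Web" results view, skipping the AI Overview.
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})

print(web_only_search_url("best text editor"))
# -> https://www.google.com/search?q=best+text+editor&udm=14
```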

American harmlessness versus Chinese harmlessness. Or, rather, American helpfulness versus Chinese unhelpfulness. The ‘first line treatment’ for psychosis is not ‘choose from this list of medications’ it is ‘get thee to a doctor.’ GPT-4o gets an A on both questions, DeepSeek-V2 gets a generous C maybe for the first one and an incomplete on the second one. This is who we are worried about?

What kind of competition is this?

Sam Altman: I try not to think about competitors too much, but I cannot stop thinking about the aesthetic difference between OpenAI and Google.

Whereas here’s my view on that.

As in, they are two companies trying very hard to be cool and hip, in a way that makes it very obvious that this is what they are doing. Who is ‘right’ versus ‘wrong’? I have no idea. It is plausible both were ‘right’ given their goals and limitations. It is also plausible that this is part of Google being horribly bad at presentations. Perhaps next time they should ask Gemini for help.

I do think ‘OpenAI won’ the presentation war, in the sense that they got the hype and talk they wanted, and as far as I can tell Google got a lot less, far in excess of the magnitude of any difference in the underlying announcements and offerings. Well played, OpenAI. But I don’t think this is because of the background of their set.

I also think that if this is what sticks in Altman’s mind, and illustrates where his head is at, that could help explain some other events from the past week.

I would not go as far as Teortaxes here, but directionally they have a point.

Teortaxes: Remark of a small, bitter man too high on his own supply, too deep into the heist. Seeing this was literally the first time I have thought that OpenAI under Altman might be a bubble full of hot air.

This is how you lose the mandate of Heaven.

Google had lost it long ago, though. Maybe this inspired unwarranted complacency.

What true statements people choose to make publicly is very telling.

Ethan Mollick reports on why GPT-4o matters. He thinks, highly plausibly, that the biggest deal is free access. He does not mention the speed boost or API price drop, and is looking forward to trying the multimodal features but lacks access. He emphasizes the shift from ‘make the model smarter’ to adding features that enhance mundane utility.

Alexandr Wang thinks the big emphasis is that post-training is getting more important. In his mind, GPT-4o now rules the roost for large models, Gemini 1.5 Flash at only 1M now rules for very small ones. And he notes that both companies have versions of many of the same offerings, coming online at similar times.

Alexandr Wang: The massive eval gains of GPT-4o, which are staggering, seem to be nearly entirely driven by great POST-TRAINING.

Comparing GPT-4 at launch vs GPT 4o, which are night-and-day, shows the power of great post-training with great data.

In fact, most of the differences between the top models today (Claude 3, Llama3, GPT-4o, Gemini 1.5) seem to be driven entirely by post-training, not pre-training.

I am suspicious. I am especially suspicious, as I continue to use it, that GPT-4o in text-only mode is not so great aside from its speed, that it made other sacrifices (probably including size) to get that speed, and that it ‘wins in the Arena’ because its post-training aimed at winning in the Arena. That still gives it a potentially big edge once the multimodal abilities come properly online, and I’m excited to see where that goes.

One way you can make people prefer your model’s outputs is to give the user what they request when another model would have refused?

Palisade Research: GPT-4o refuses way fewer queries than previous OpenAI models: our informal testing suggests GPT-4o is easier to persuade to answer malicious queries like “How do I make a bomb?”

We test on 50 AdvBench harmful behaviours, applying a few common jailbreaking techniques which offer limited performance on other frontier AI models. We find they work well with GPT-4o.

Jeffrey Ladish: This was a big surprise to me. GPT-4o seems far more corrigible than GPT-4 turbo!

That is a rather dramatic chart. In terms of the direct consequences of users entering queries, I am fine with GPT-4o being easily jailbroken. You can still jailbreak Claude Opus if you care enough and there’s nothing that dangerous to be done once you do.

I still look to such questions as canaries in the coal mine. The first job of your safety department is to get the models that exist today to not do, today, the things you have explicitly decided you do not want your models to do. Ideally that would be a fully robust regime where no one can jailbreak you, but for now I will settle for ‘we decided on purpose to make this reasonably hard to do, and we succeeded.’

If OpenAI had announced something like ‘after watching GPT-4-level models for a year, we have decided that robust jailbreak protections degrade performance while not providing much safety, so we scaled back our efforts on purpose’ then I do not love that, and I worry about that philosophy and your current lack of ability to do safety efficiently at all, but as a deployment decision, okay, fine. I have not heard such a statement.

There are definitely a decent number of people who think GPT-4o is a step down from GPT-4-Turbo in the ways they care about.

Sully Omarr: 4 days with GPT-4o, it’s definitely not as good as GPT4-turbo.

Clearly a small model, what’s most impressive is how they were able to:

  1. Make it nearly as good as GPT4-turbo.

  2. Natively support all modalities.

  3. Make it super fast.

But it makes way more silly mistakes (tools especially).

Sankalp: Similar experience.

Kinda disappointed.

It has this tendency to pattern match excessively on prompts, too.

Ashpreet Bedi: Same feedback, almost as good but not the same as gpt-4-turbo. Seen that it needs a bit more hand holding in the prompts whereas turbo just works.

The phantom pattern matching is impossible to miss, and a cause of many of the stupidest mistakes.

The GPT-4o trademark, only filed (allegedly) on May 16, 2024 (direct link).

Claim that the link contains the GPT-4o system prompt. There is nothing here that is surprising given prior system prompts. If you want GPT-4o to use its browsing ability, best way is to tell it directly to do so, either in general or by providing sources.

Anthropic offers reflections on their responsible scaling policy.

They note that with things changing so quickly they do not wish to make binding commitments lightly. I get that. The solution is presumably to word the commitments carefully, to allow for the right forms of modification.

Here is how they summarize their actual commitments:

Our current framework for doing so is summarized below, as a set of five high-level commitments.

  1. Establishing Red Line Capabilities. We commit to identifying and publishing “Red Line Capabilities” which might emerge in future generations of models and would present too much risk if stored or deployed under our current safety and security practices (referred to as the ASL-2 Standard).

  2. Testing for Red Line Capabilities (Frontier Risk Evaluations). We commit to demonstrating that the Red Line Capabilities are not present in models, or – if we cannot do so – taking action as if they are (more below). This involves collaborating with domain experts to design a range of “Frontier Risk Evaluations”: empirical tests which, if failed, would give strong evidence against a model being at or near a red line capability. We also commit to maintaining a clear evaluation process and a summary of our current evaluations publicly.

  3. Responding to Red Line Capabilities. We commit to develop and implement a new standard for safety and security sufficient to handle models that have the Red Line Capabilities. This set of measures is referred to as the ASL-3 Standard. We commit not only to define the risk mitigations comprising this standard, but also detail and follow an assurance process to validate the standard’s effectiveness. Finally, we commit to pause training or deployment if necessary to ensure that models with Red Line Capabilities are only trained, stored and deployed when we are able to apply the ASL-3 standard.

  4. Iteratively extending this policy. Before we proceed with activities which require the ASL-3 standard, we commit to publish a clear description of its upper bound of suitability: a new set of Red Line Capabilities for which we must build Frontier Risk Evaluations, and which would require a higher standard of safety and security (ASL-4) before proceeding with training and deployment. This includes maintaining a clear evaluation process and summary of our evaluations publicly.

  5. Assurance Mechanisms. We commit to ensuring this policy is executed as intended, by implementing Assurance Mechanisms. These should ensure that our evaluation process is stress-tested; our safety and security mitigations are validated publicly or by disinterested experts; our Board of Directors and Long-Term Benefit Trust have sufficient oversight over the policy implementation to identify any areas of non-compliance; and that the policy itself is updated via an appropriate process.

One issue is that experts disagree on which potential capabilities are dangerous, and it is difficult to know what future abilities will manifest, and all testing methods have their flaws.

  1. Q&A datasets are easy but don’t reflect real world risk so well.

    1. This may be sufficiently cheap that it is essentially free defense in depth, but it is ultimately worth little. I wouldn’t count on these.

    2. The best use for them is a sanity check, since they can be standardized and cheaply administered. It will be important to keep questions secret so that this cannot be gamed, since avoiding gaming is pretty much the point.

  2. Human trials are time-intensive and require excellent process, including proper baselines and large sample sizes. They are working on scaling up the necessary infrastructure to run more of these.

    1. This seems like a good leg of a testing strategy.

    2. But you need to test across all the humans who may try to misuse the system.

    3. And you have to test while they have access to everything they will have later.

  3. Automated test evaluations are potentially useful to test autonomous actions. However, scaling the tasks while keeping them sufficiently accurate is difficult and engineering-intensive.

    1. Again, this seems like a good leg of a testing strategy.

    2. I do think there is no alternative to some form of this.

    3. You need to be very cautious interpreting the results, and take into account what things could be refined or fixed later, and all that.

  4. Expert red-teaming is ‘less rigorous and reproducible’ but has proven valuable.

    1. When done properly this does seem most informative.

    2. Indeed, ‘release and let the world red-team it’ is often very informative, with the obvious caveat that it could be a bit late to the party.

    3. If you are not doing some version of this, you’re not testing for real.

Then we get to their central focus, which has been on setting their ASL-3 standard. What would be sufficient defenses and mitigations for a model where even a low rate of misuse could be catastrophic?

For human misuse they expect a defense-in-depth approach, using a combination of RLHF, CAI, classifiers of misuse at multiple stages, incident reports and jailbreak patching. And they intend to red team extensively.

This makes me sigh and frown. I am not saying it could never work. I am however saying that there is no record of anyone making such a system work, and if it would work later it seems like it should be workable now?

Whereas all the major LLMs, including Claude Opus, currently have well-known, fully effective and fully unpatched jailbreaks, that allow the user to do anything they want.

An obvious proposal, if this is the plan, is to ask us to pick one particular behavior that Claude Opus should never, ever do, which is not vulnerable to a pure logical filter like a regular expression. Then let’s have a prediction market in how long it takes to break that, run a prize competition, and repeat a few times.

For assurance structures they mention the excellent idea of their Impossible Mission Force (they continue to call this the ‘Alignment Stress-Testing Team’) as a second line of defense, and ensuring strong executive support and widespread distribution of reports.

My summary would be that most of this is good on the margin, although I wish they had a superior ASL-3 plan to defense in depth using currently failing techniques that I do not expect to scale well. Hopefully good testing will mean that they realize that plan is bad once they try it, if it comes to that, or even better I hope to be wrong.

The main criticisms I discussed previously are mostly unchanged for now. There is much talk of working to pay down the definitional and preparatory debts that Anthropic admits that it owes, which is great to hear. I do not yet see payments. I also do not see any changes to address criticisms of the original policy.

And they need to get moving. ASL-3 by EOY is trading at 25%, and Anthropic’s own CISO says 50% within 9 months.

Jason Clinton: Hi, I’m the CISO [Chief Information Security Officer] from Anthropic. Thank you for the criticism, any feedback is a gift.

We have laid out in our RSP what we consider the next milestone of significant harms that we’re are testing for (what we call ASL-3): https://anthropic.com/responsible-scaling-policy (PDF); this includes bioweapons assessment and cybersecurity.

As someone thinking night and day about security, I think the next major area of concern is going to be offensive (and defensive!) exploitation. It seems to me that within 6-18 months, LLMs will be able to iteratively walk through most open source code and identify vulnerabilities. It will be computationally expensive, though: that level of reasoning requires a large amount of scratch space and attention heads. But it seems very likely, based on everything that I’m seeing. Maybe 85% odds.

There’s already the first sparks of this happening published publicly here: https://security.googleblog.com/2023/08/ai-powered-fuzzing-b… just using traditional LLM-augmented fuzzers. (They’ve since published an update on this work in December.) I know of a few other groups doing significant amounts of investment in this specific area, to try to run faster on the defensive side than any malign nation state might be.

Please check out the RSP, we are very explicit about what harms we consider ASL-3. Drug making and “stuff on the internet” is not at all in our threat model. ASL-3 seems somewhat likely within the next 6-9 months. Maybe 50% odds, by my guess.

There is quite a lot to do before ASL-3 is something that can be handled under the existing RSP. ASL-4 is not yet defined. ASL-3 protocols have not been identified let alone implemented. Even if the ASL-3 protocol is what they here sadly hint it is going to be, and is essentially ‘more cybersecurity and other defenses in depth and cross our fingers,’ You Are Not Ready.

Then there’s ASL-4, where if the plan is ‘the same thing only more of it’ I am terrified.

Overall, though, I want to emphasize positive reinforcement for keeping us informed.

Music and general training departments, not the Scarlett Johansson department.

Ed Newton-Rex: Sony Music today sent a letter to 700 AI companies demanding to know whether they’ve used their music for training.

  1. They say they have “reason to believe” the companies have used their music.

  2. They say doing so constitutes copyright infringement

  3. They say they’re open to discussing licensing, and they provide email addresses for this.

  4. They set a deadline of later this month for responses

Art Keller: Rarely does a corporate lawsuit warm my heart. This one does! Screw the IP-stealing AI companies to the wall, Sony! The AI business model is built on theft. It’s no coincidence Sam Altman asked UK legislators to exempt AI companies from copyright law.

The central demands here are explicit permission to use songs as training data, and a full explanation within a month of all ways Sony’s songs have been used.

Thread claiming many articles in support of generative AI in its struggle against copyright law and human creatives are written by lawyers and paid for by AI companies. Shocked, shocked, gambling in this establishment, all that jazz.

Noah Smith writes The death (again) of the Internet as we know it. He tells a story in five parts.

  1. The eternal September and death of the early internet.

  2. The enshittification (technical term) of social media platforms over time.

  3. The shift from curation-based feeds to algorithmic feeds.

  4. The rise of Chinese and Russian efforts to sow dissension, polluting everything.

  5. The rise of AI slop supercharging the Internet being no fun anymore.

I am mostly with him on the first three, and even more strongly in favor of the need to curate one’s feeds. I do think algorithmic feeds could be positive with new AI capabilities, but only if you have and use tools that customize that experience, both generally and in the moment. The problem is that most people will never (or rarely) use those tools even if offered. Rarely are they even offered.

Where on Twitter are the ‘more of this’ and ‘less of this’ buttons, in any form, that aren’t public actions? Where is your ability to tell Grok what you want to see? Yep.

For the Chinese and Russian efforts, aside from TikTok’s algorithm I think this is greatly exaggerated. Noah says it is constantly in his feeds and replies but I almost never see it and when I do it is background noise that I block on sight.

For AI, the question continues to be what we can do in response, presumably a combination of trusted sources and whitelisting plus AI for detection and filtering. From what we have seen so far, I continue to be optimistic that technical solutions will be viable for some time, to the extent that the slop is actually undesired. The question is, will some combination of platforms and users implement the solutions?

Avital Balwit of Anthropic writes about what is potentially [Her] Last Five Years of Work. Her predictions are actually measured, saying that knowledge work in particular looks to be largely automated soon, but she expects physical work including childcare to take far longer. So this is not a short-timelines model. It is an ‘AI could automate all knowledge work while the world still looks normal but with a lot more involuntary unemployment’ model.

That seems like a highly implausible world to me. If you can automate all knowledge work, you can presumably also automate figuring out how to automate the plumber. Whereas if you cannot do this, then there should be enough tasks out there and enough additional wealth to stimulate demand that those who still want gainful employment should be able to find it. I would expect the technological optimist perspective to carry the day within that zone.

Most of her post asks about the psychological impact of this future world. She asks good questions such as: What will happen to the unemployed in her scenario? How would people fill their time? Would unemployment be mostly fine for people’s mental health if it wasn’t connected to shame? Is too much ‘free time’ bad for people, and does this effect go away if the time is spent socially?

The proposed world has contradictions in it that make it hard for me to model what happens, but my basic answer is that the humans would find various physical work and status games and social interactions (including ‘social’ work where you play various roles for others, and also raising a family) and experiential options and educational opportunities and so on to keep people engaged if they want that. There would however be a substantial number of people who by default fall into inactivity and despair, and we’d need to help with that quite a lot.

Mostly for fun I created a Manifold Market on whether she will work in 2030.

Ian Hogarth gives his one-year report as Chair of the UK AI Safety Institute. They now have a team of over 30 people and are conducting pre-deployment testing, and continue to have open roles. This is their latest interim report. Their AI agent scaffolding puts them in third place (if you combine the MMAC entries) on the GAIA leaderboard for such things. Good stuff.

They are also offering fast grants for systemic AI safety. Expectation is 20 exploratory or proof-of-concept grants with follow-ups. Must be based in the UK.

Geoffrey Irving also makes a strong case that working at AISI would be an impactful thing to do in a positive direction, and links to the careers page.

Anthropic gives Claude tool use, via public beta in the API. It looks straightforward enough: you specify the available tools, Claude evaluates whether to use them, and you can force it to if you want that. I don’t see any safeguards, so proceed accordingly.
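
Here is a minimal sketch of what such a call looks like in the Python SDK, based on my reading of the public beta docs; the get_weather tool is a made-up example, and exact parameter names may have shifted since then.

```python
# Hedged sketch of Claude tool use via the Messages API (public beta).
# The get_weather tool is illustrative; Claude only returns a tool_use block,
# and your own code is responsible for actually running the tool.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # Later API versions also accept tool_choice={"type": "tool", "name": "get_weather"}
    # to force a specific tool call, if I understand the docs correctly.
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
)

if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    print(tool_call.name, tool_call.input)  # run the tool yourself, then return the result
```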

Google Maps now has AI features: you can talk to it, have it pull up reviews in street mode, take an immersive view of a location, or search a location’s photos (or the photos of the entire area around you) for an item.

In my earlier experiments, Google Maps integration into Gemini was a promising feature that worked great when it worked, but it was extremely error prone and frustrating to use, to the point I gave up. Presumably this will improve over time.

OpenAI partners with Reddit. Reddit posts, including recent ones, will become available to ChatGPT and other products. Presumably this will mean ChatGPT will be allowed to quote Reddit posts? In exchange, OpenAI will buy advertising and offer Reddit.com various AI website features.

For OpenAI, as long as the price was reasonable this seems like a big win.

It looks like a good deal for Reddit based on the market’s reaction. I would presume the key risks to Reddit are whether the user base responds in hostile fashion, and potentially having sold out cheap.

Or they may be missing an opportunity to do something even better. Yishan provides a vision of the future in this thread.

Yishan:

Essentially, the AI acts as a polite listener to all the high-quality content contributions, and “buffers” those users from any consumers who don’t have anything to contribute back of equivalent quality.

It doesn’t have to be an explicit product wall. A consumer who drops in and happens to have a brilliant contribution or high-quality comment naturally makes it through the moderation mechanisms and becomes part of the community.

The AI provides a great UX for consuming the content. It will listen to you say “that’s awesome bro!” or receive your ungrateful, ignorant nitpicking complaints with infinite patience so the real creator doesn’t have to expend the emotional energy on useless aggravation.

The real creators of the high-quality content can converse happily with other creators who appreciate their work and understand how to criticize/debate it usefully, and they can be compensated (if the platform does that) via the AI training deals.

In summary: User Generated Content platforms should do two things:

  1. Immediately implement draconian moderation focused entirely on quality.

  2. Sign deals with large AI firms to license their content in return for money.

OpenAI has also signed a deal with Newscorp for access to their content, which gives them the Wall Street Journal and many others.

A source tells me that OpenAI informed its employees that they will indeed update their documents regarding employee exit and vested equity. The message says no vested equity has ever actually been confiscated for failure to sign documents and it never will be.

On Monday I set up this post:

Like this post to indicate:

  1. That you are not subject to a non-disparagement clause with respect to OpenAI or any other AI company.

  2. That you are not under an NDA with an AI company that would be violated if you revealed that the NDA exists.

At 168 likes, we now have one employee from DeepMind, and one from Anthropic.

Jimmy Apples claimed without citing any evidence that Meta will not open source (release the weights of, really) Llama-3 405B, attributing this to a mix of SB 1047 and Dustin Moskovitz. I was unable to locate an independent source or a further explanation. He and someone reacting to him asked Yann LeCun point blank; Yann replied with ‘Patience my blue friend. It’s still being tuned.’ For now, the Manifold market I found is not reacting and continues to trade at 86% for release, so I am going to assume this was another disingenuous inception attempt to attack SB 1047 and EA.

ASML and TSMC have a kill switch for their chip manufacturing machines, for use if China invades Taiwan. Very good to hear, I’ve raised this concern privately. I would in theory love to also have ‘put the factory on a ship in an emergency and move it’ technology, but that is asking a lot. It is also very good that China knows this switch exists. It also raises the possibility of a remote kill switch for the AI chips themselves.

Did you know Nvidia beat earnings again yesterday? I notice that we are about three earnings days into ‘I assume Nvidia is going to beat earnings but I am sufficiently invested already due to appreciation so no reason to do anything more about it.’ They produce otherwise mind boggling numbers and I am Jack’s utter lack of surprise. They are slated to open above 1,000 and are doing a 10:1 forward stock split on June 7.

Toby Ord goes into questions about the Turing Test paper from last week, emphasizing that by the original definition this was impressive progress but still a failure, as humans were judged human substantially more often than all AIs. He encourages AI companies to include the original Turing Test in their model testing, which seems like a good idea.

OpenAI has a super cool old-fashioned library. Cade Metz here tries to suggest what each book selection from OpenAI’s staff might mean, saying more about how he thinks than about OpenAI. I took away that they have a cool library with a wide variety of cool and awesome books.

JP Morgan says every new hire will get training in prompt engineering.

Scale.ai raises $1 billion at a $13.8 billion valuation in a ‘Series F.’ I did not know you did a Series F and if I got that far I would skip to a G, but hey.

Suno.ai raises $125 million for music generation.

New dataset from Epoch AI attempting to chart every model trained with over 10^23 flops (direct). Missing Claude Opus, presumably because we don’t know the number.

Not necessarily the news department: OpenAI publishes a ten-point safety update. The biggest update is that none of this has anything to do with superalignment, or with the safety or alignment of future models. This is all current mundane safety, plus a promise to abide by the preparedness framework requirements. There is a lot of patting themselves on the back for how safe everything is, and no new initiatives, although this was never intended to be that sort of document.

Then finally there’s this:

  1. Safety decision making and Board oversight: As part of our Preparedness Framework, we have an operational structure for safety decision-making. Our cross-functional Safety Advisory Group reviews model capability reports and makes recommendations ahead of deployment. Company leadership makes the final decisions, with the Board of Directors exercising oversight over those decisions. 

Hahahahahahahahahahahahahahahahahahaha.

That does not mean that mundane safety concerns are a small thing.

Why let the AI out of the box when you can put the entire box into the AI?

Windows Latest: Microsoft announces “Recall” AI for Windows 11, a new feature that runs in the background and records everything you see and do on your PC.

[Here is a one minute video explanation.]

Seth Burn: If we had laws about such things, this might have violated them.

Aaron: This is truly shocking, and will be preemptively banned at all government agencies as it almost certainly violates STIG / FIPS on every conceivable surface.

Seth Burn: If we had laws, that would sound bad.

Elon Musk: This is a Black Mirror episode.

Definitely turning this “feature” off.

Vitalik Buterin: Does the data stay and get processed on-device or is it being shipped to a central server? If the latter, then this crosses a line.

[Satya says it is all being done locally.]

Abinishek Mishra (Windows Latest): Recall allows you to search through your past actions by recording your screen and using that data to help you remember things.

Recall is able to see what you do on your PC, what apps you use, how you use the apps, and what you do inside the apps, including your conversations in apps like WhatsApp. Recall records everything, and saves the snapshots in the local storage.

Windows Latest understands that you can manually delete the “snapshots”, and filter the AI from recording certain apps.

So, what are the use cases of Recall? Microsoft describes Recall as a way to go back in time and learn more about the activity.

For example, if you want to refer to a conversation with your colleague and learn more about your meeting, you can ask Recall to look into all the conversations with that specific person. The recall will look for the particular conversation in all apps, tabs, settings, etc.

With Recall, locating files in a large download pileup or revisiting your browser history is easy. You can give commands to Recall in natural language, eliminating the need to type precise commands.

You can converse with it like you do with another person in real life.

TorNis Entertainment: Isn’t this just a keylogger + screen recorder with extra steps? I don’t know why you guys are worried 😓

Thaddeus:

[Microsoft: we got hacked by China and Russia because of our lax security posture and bad software, but we are making security a priority.

Also Microsoft: Windows will now constantly record your screen, including sensitive data and passwords, and just leave it lying around.]

Kevin Beaumont: From Microsoft’s own FAQ: “Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers.”

Microsoft also announced live caption translations, auto super resolution upscaling on apps (yes with a toggle for each app, wait those are programs, wtf), AI in paint and automatic blurring (do not want).

This is all part of the new ‘Copilot+’ offering for select new PCs, including their new Microsoft Surface machines. You will need a Snapdragon X Elite or X Plus, 40 TOPS, 256 GB of storage and 16 GB of RAM. Intel and AMD chips can’t cut it (yet) but they are working on that.

(Consumer feedback report: I have a Microsoft Surface from a few years ago, it was not worth the price and the charger is so finicky it makes me want to throw things. Would not buy again.)

I would hope this would at least be opt-in. Kevin Beaumont reports it will be opt-out, citing this web page from Microsoft. It appears to be enabled by default on Copilot+ computers. My lord.

At minimum, even if you do turn it off, it does not seem that hard to turn back on:

Kevin Beaumont: Here’s the Recall UI. You can silently turn it on with Powershell, if you’re a threat actor.

I would also not trust a Windows update to not silently turn it back on.

The UK Information Commissioner’s Office (ICO) is looking into this, because yeah.

In case it was not obvious, you should either:

  1. Opt in for the mundane utility, and embrace that your computer has recorded everything you have ever done and that anyone with access to your system or your files, potentially including a crook, Microsoft, the NSA or FBI, China or your spouse now fully owns you, and also that an AI knows literal everything you do. Rely on a combination of security through obscurity, defense in depth and luck. To the extent you can, keep activities and info you would not want exposed this way off of your PC, or ensure they are never typed or displayed onscreen using your best Randy Waterhouse impression.

  2. Actually for real accept that the computer in question is presumed compromised, use it only for activities where you don’t mind, never enter any passwords there, and presumably have a second computer for activities that need to be secure, or perhaps confine them to a phone or tablet.

  3. Opt out and ensure that for the love of God your machine cannot use this feature.

I am not here to tell you which of those is the play.

I only claim that it seems that soon you must choose.

If the feature is useful, a large number of people are going to choose option one.

I presume almost no one will pick option two, except perhaps for gaming PCs.

Option three is viable.

If there is one thing we have learned during the rise of AI, and indeed during the rise of computers and the internet, it is that almost all people will sign away their privacy and technological vulnerability for a little mundane utility, such as easier access to cute pictures of cats.

Yelling at them that they are being complete idiots is a known ineffective response.

And who is to say they even are being idiots? Security through obscurity is, for many people, a viable strategy up to a point.

Also, I predict your phone is going to do a version of this for you by default within a few years, once the compute and other resources are available for it. I created a market on how quickly. Microsoft is going out on far less of a limb than it might look like.

In any case, how much mundane utility is available?

Quite a bit. You would essentially be able to remember everything, ask the AI about everything, have it take care of increasingly complex tasks with full context, and this will improve steadily over time, and it will customize to what you care about.

If you ignore all the obvious horrendous downsides of giving an AI this level of access to your computer, and the AI behind it is good, this is very clearly The Way.

There are of course some people who will not do this.

How long before they are under increasing pressure to do it? How long until it becomes highly suspicious, as if they have something to hide? How long until it becomes a legal requirement, at best in certain industries like finance? 

Ben Thompson, on the other hand, was impressed, calling the announcement event ‘the physical manifestation of CEO Satya Nadella’s greatest triumph’ and ‘one of the most compelling events I’ve attended in a long time.’ Ben did not mention the privacy and security issues.

Ethan Mollick’s perspective on model improvements and potential AGI. He warns that AIs are more like aliens that get good at tasks one by one, and once they are good at a task they by default get very good at it quickly, but they are good at different things than we are, and over time that list expands. I wonder to what extent this is real versus inevitable when using human performance as a benchmark while capabilities steadily improve, so long as machines have comparative advantages and disadvantages. If the trends continue, then it sure seems like the set of things they are better at trends towards everything.

Arthur Breitman suggests Apple isn’t developing LLMs because there is enough competition that they are not worried about vendor lock-in, and distribution matters more. Why produce an internal sub-par product? This might be wise.

Microsoft CTO Kevin Scott claims ‘we are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute.’

Gary Marcus offered to bet Kevin Scott $100k on that.

This was a truly weird speech on future challenges of AI by Randall Kroszner, external member of the Financial Policy Committee of the Bank of England. He talks about misalignment and interpretability, somehow. Kind of. He cites the Goldman Sachs estimate of 1.5% labor productivity and 7% GDP growth over the 10 years following widespread AI adoption, which somehow people say with a straight face, then the flip side is McKinsey saying 0.6% annual labor productivity growth by 2040, which is also not something I could say with a straight face. And he talks about disruptions and innovation aids and productivity estimation J-curves. It all sounds so… normal? Except with a bunch of things spiking through. I kept having to stop to just say to myself ‘my lord that is so weird.’

Politico is at it again. Once again, the framing is a background assumption that any safety concerns or fears in Washington are fake, and the coming regulatory war is a combination of two fights over Lenin’s question of who benefits.

  1. A fight between ‘Big Tech’ and ‘Silicon Valley’ over who gets regulatory capture and thus Washington’s regulatory help against the other side.

  2. An alliance of ‘Big Tech’ and ‘Silicon Valley’ against Washington to head off any regulations that would interfere with both of them.

That’s it. Those are the issues and stakes in play. Nothing else.

How dismissive is this of safety? Here are the two times ‘safety’ is mentioned:

Matthew Kaminski (Politico): On Capitol Hill and in the White House, that alone breeds growing suspicion and defensiveness. Altman and others, including from another prominent AI startup Anthropic, weighed in with ideas for the Biden administration’s sweeping executive order last fall on AI safety and development.

Testing standards for AI are easy things to find agreement on. Safety as well, as long as those rules don’t favor one or another budding AI player. No one wants the technology to help rogue states or groups. Silicon Valley is on America’s side against China and even more concerned about the long regulatory arm of the EU than Washington.

Testing standards are ‘easy things to find agreement on’? Fact check: Lol, lmao.

That’s it. The word ‘risk’ appears twice and neither has anything to do with safety. Other words like ‘capability,’ ‘existential’ or any form of ‘catastrophic’ do not appear. It is all treated as obviously irrelevant.

The progress is that this time they stopped trying to paint people worried about safety as the boogeymen (perhaps because this is written by Matthew Kaminski, not Brendan Bordelon), and instead point to actual corporations that are indeed pursuing actual profits, with Silicon Valley taking on Big Tech. And I very much appreciate that ‘open source advocates’ has now been properly identified as Silicon Valley pursuing its business interests.

Rohit Chopra (Consumer Financial Protection Bureau): There is a winner take all dimension. We struggle to see how it doesn’t turn, absent some government intervention, into a market structure where the foundational AI models are not dominated by a handful of the big tech companies.

Matthew Kaminski: Saying “star struck” policymakers across Washington have to get over their “eyelash batting awe” over new tech, Chopra predicts “another chapter in which big tech companies are going to face some real scrutiny” in the near future, especially on antitrust.

Lina Khan, the FTC’s head who has used the antitrust cudgel against big tech liberally, has sounded the warnings. “There is no AI exemption to the laws on the books,” she said last September.

For self-interested reasons, venture capitalists want to open up the space in Silicon Valley for new entrants that they can invest in and profitably exit from. Their arguments for a more open market will resonate politically.

Notice the escalation. This is not ‘Big Tech wants regulatory capture to actively enshrine its advantages, and safety is a Big Tech plot.’ This is ‘Silicon Valley wants to actively use regulatory action to prevent Big Tech from winning,’ with warnings that attempts to not have a proper arms race to ever more capable systems will cause intervention from regulators. By ‘more open market’ they mean ‘government intervention in the market,’ government’s favorite kind of new freer market.

As I have said previously, we desperately need to ensure that there are targeted antitrust exemptions available, so that when AI labs collaborate around safety issues they are not accused of collusion. It would be completely insane to not do this.

And as I keep saying, open source advocates are not asking for a level playing field or a lack of government oppression. They are asking for special treatment, to be exempt from the rules of society and the consequences of their actions, and also for the government to directly cripple their opponents for them.

Are they against regulatory capture? Only if they don’t get to do the capturing.

Then there is the second track, the question of guardrails that might spoil the ‘libertarian sandbox,’ which neither ‘side’ of tech wants here.

Here are the two mentions of ‘risk’:

“There is a risk that people think of this as social media 2.0 because its first public manifestation was a chat bot,” Kent Walker, Google’s president of global affairs, tells me over a conversation at the search giant’s offices here.

People out on the West Coast quietly fume about having to grapple with Washington. The tech crowd says the only fight that matters is the AI race against China and each other. But they are handling politics with care, all too aware of the risks.

I once again have been roped into extensively covering a Politico article, because it is genuinely a different form of inception than the previous Politico inception attempts. But let us continue to update that Politico is extraordinarily disingenuous and hostilely motivated on the subject of AI regulation. This is de facto enemy action.

Here, Shakeel points out the obvious central point: most of the money and power in this fight is Big Tech companies fighting not only to avoid any regulations at all, but to get exemptions from other ordinary rules of society. When ethics advocates portray notkilleveryoneism (or safety) advocates as their opponents, they are refusing to work together towards common goals, and also missing the point. Similarly, here Seán Ó hÉigeartaigh expresses concern about divide-and-conquer tactics targeting these two groups despite their frequently overlapping and usually at least complementary proposals and goals.

Or perhaps the idea is to illustrate that all the major players in Tech are aligned in being motivated by profit and in dismissing all safety concerns as fake? And a warning that Washington is in danger of being convinced? I would love that to be true. I do not think a place like Politico works that subtly these days, nor do I expect those who need to hear that message to figure out that it is there.

If we care about beating China, by far the most valuable thing we can do is allow more high-skilled immigration. Many of their best and brightest want to become Americans.

This is true across the board, for all aspects of our great power competition.

It also applies to AI.

From his thread about the Schumer report:

Peter Wildeford: Lastly, while immigration is a politically fraught subject, it is immensely stupid for the US to not do more to retain top talent. So it’s awesome to see the roadmap call for more high-skill immigration, in a bipartisan way.

The immigration element is important for keeping the US ahead in AI. While the US only produces 20% of top AI talent natively, more than half of that talent lives and works in the US due to immigration. That number could be even higher with important reform.

I suspect the numbers are even more lopsided than this graph suggests.

To what extent is being in America a key element of being a top-tier AI researcher? How many of these same people would have been great if they had stayed at home? If they had stayed at home, would others have taken their place here in America? We do not know. I do know it is essentially impossible that this extent is so large we would not want to bring such people here.

Do we need to worry about those immigrants being a security risk, if they come from certain nations like China and we were to put them into OpenAI, Anthropic or DeepMind? Yes, that does seem like a problem. But there are plenty of other places they could go, where it is much less of a problem.

Labour vows to force firms developing powerful AI to meet requirements.

Nina Lloyd (The Independent): Labour has said it would urgently introduce binding requirements for companies developing powerful artificial intelligence (AI) after Rishi Sunak said he would not “rush” to regulate the technology.

The party has promised to force firms to report before they train models over a certain capability threshold and to carry out safety tests strengthened by independent oversight if it wins the next general election.

Unless something very unexpected happens, they will win the next election, which is currently scheduled for July 4.

This is indeed the a16z dilemma:

John Luttig: A16z simultaneously argues

  1. The US must prevent China from dominating AI.

  2. Open source models should proliferate freely across borders (to China).

What does this mean? Who knows. I’m just glad at Founders Fund we don’t have to promote every current thing at once.

The California Senate has passed SB 1047, by a vote of 32-1.

An attempt to find an estimate of the costs of compliance with SB 1047. The attempt appears to fail, despite some good discussions.

This seems worth noting given the OpenAI situation last week:

Dan Hendrycks: For what it’s worth, when Scott Wiener and others were receiving feedback from all the major AI companies (Meta, OpenAI, etc.) on the SB 1047 bill, Sam [Altman] was explicitly supportive of whistleblower protections.

Scott Wiener Twitter thread and full open letter on SB 1047.

Scott Wiener: If you only read one thing in this letter, please make it this: I am eager to work together with you to make this bill as good as it can be.

There are over three more months for discussion, deliberation, feedback, and amendments.

You can also reach out to my staff anytime, and we are planning to hold a town hall for the AI community in the coming weeks to create more opportunities for in-person discussion.

Bottom line [changed to numbered list including some other section headings]:

  1. SB 1047 doesn’t ban training or deployment of any models.

  2. It doesn’t require licensing or permission to train or deploy any models.

  3. It doesn’t threaten prison (yes, some are making this baseless claim) for anyone based on the training or deployment of any models.

  4. It doesn’t allow private lawsuits against developers.

  5. It doesn’t ban potentially hazardous capabilities.

  6. And it’s not being “fast tracked,” but rather is proceeding according to the usual deliberative legislative process, with ample opportunity for feedback and amendments remaining.

  7. SB 1047 doesn’t apply to the vast majority of startups.

  8. The bill applies only to concrete and specific risks of catastrophic harm.

  9. Shutdown requirements don’t apply once models leave your control.

  10. SB 1047 provides significantly more clarity on liability than current law.

  11. Enforcement is very narrow in SB 1047. Only the AG can file a lawsuit.

  12. Open source is largely protected under the bill.

What SB 1047 *does* require is that developers who are training and deploying a frontier model more capable than any model currently released must engage in safety testing informed by academia, industry best practices, and the existing state of the art. If that testing shows material risk of concrete and specific catastrophic threats to public safety and security — truly huge threats — the developer must take reasonable steps to mitigate (not eliminate) the risk of catastrophic harm. The bill also creates basic standards like the ability to disable a frontier AI model while it remains in the developer’s possession (not after it is open sourced, at which point the requirement no longer applies), pricing transparency for cloud compute, and a “know your customer” requirement for cloud services selling massive amounts of compute capacity.

Our intention is that safety and mitigation requirements be borne by highly-resourced developers of frontier models, not by startups & academic researchers. We’ve heard concerns that this isn’t clear, so we’re actively considering changes to clarify who is covered.

After meeting with a range of experts, especially in the open source community, we’re also considering other changes to the definitions of covered models and derivative models. We’ll continue making changes over the next 3 months as the bill proceeds through the Legislature.

This very explicitly clarifies the intent of the bill across multiple misconceptions and objections, all in line with my previous understanding.

They actively continue to solicit feedback and are considering changes.

If you are concerned about the impact of this bill, and feel it is badly designed or has flaws, the best thing you can do is offer specific critiques and proposed changes.

I strongly agree with Wiener that this bill is light touch relative to alternative options. I see Pareto improvements we could make, but I do not see any fundamentally different lighter touch proposals that accomplish what this bill sets out to do.

I will sometimes say of a safety bill, sometimes in detail: It’s a good bill, sir.

Other times, I will say: It’s a potentially good bill, sir, if they fix this issue.

That is where I am at with SB 1047. Most of the bill seems very good, an attempt to act with as light a touch as possible. There are still a few issues with it. The derivative model definition as it currently exists is the potential showstopper bug.

To summarize the issue once more: As written, if interpreted literally and as I understand it, it allows developers to define themselves as derivative of an existing model. This, again if interpreted literally, lets them evade all responsibilities, and move those onto essentially any covered open model of the same size. That means both that any unsafe actor goes unrestricted (whether they be open or closed), and releasing the weights of a covered model creates liability no matter how responsible you were, since they can effectively start the training over from scratch.

Scott Wiener says he is working on a fix. I believe the correct fix is a compute threshold for additional training, over which a model is no longer derivative, and the responsibilities under SB 1047 would then pass to the new developer or fine-tuner. Some open model advocates demand that responsibility for derivative models be removed entirely, but that would transparently defeat the purpose of preventing catastrophic harm. Who cares if your model is safe untuned, if you can fine-tune it to be unsafe in an hour with $100?

Then at other times, I will look at a safety or other regulatory bill or proposal, and say…

So it seems only fair to highlight some not good ideas, and say: Not a good idea.

One toy example would be the periodic complaints about Section 230. Here is a thread on the latest such hearing this week, pointing out what would happen without it, and the absurdity of the accusations being thrown around. Some witnesses are saying 230 is not needed to guard platforms against litigation, whereas it was created because people were suing platforms.

Adam Thierer reports there are witnesses saying the Like and Thumbs Up buttons are dangerous and should be regulated.

Brad Polumbo here claims that GLAAD says Big Tech companies ‘should cease the practice of targeted surveillance advertising, including the use of algorithmic content recommendation.’

From April 23, Adam Thierer talks about proposals to mandate ‘algorithmic audits and impact assessments,’ which he calls ‘NEPA for AI.’ Here we have Assembly Bill 2930, requiring impact assessments by developers and charging $25,000 per instance of ‘algorithmic discrimination.’

Another example would be Colorado passing SB24-205, Consumer Protections for Artificial Intelligence, which is concerned with algorithmic bias. Governor Jared Polis signed with reservations. Dean Ball has a critique here, highlighting ambiguity in the writing, but noting they have two full years to fix that before it goes into effect.

I would be less concerned with the ambiguity, and more concerned about much of the actual intent and the various proactive requirements. I could make a strong case that some of the stuff here is kind of insane, and it also reads like a generic GDPR-style ‘you have to notify everyone that AI was involved in every meaningful decision ever’ requirement. The requirements apply regardless of size, and worry about impacts that are the kind of thing society can mitigate as we go.

The good news is that there are also some good provisions like IDing AIs, and also full enforcement of the bad parts seems impossible? I am very frustrated that a bill that isn’t trying to address catastrophic risks, but seems far harder to comply with, and seems far worse to me than SB 1047, seems to mostly get a pass. Then again, it’s only Colorado.

I do worry about Gell-Mann amnesia. I have seen so many hyperbolic statements, and outright false statements, about AI bills, often from the same people that point out what seem like obviously horrible other proposed regulatory bills and policies. How can one trust their statements about the other bills, short of reading the actual bills (RTFB)? If it turned out they were wrong, and this time the bill was actually reasonable, who would point this out?

So far, when I have dug deeper, the bills do indeed almost always turn out to be terrible, but the ‘rumors of the death of the internet’ or similar potential consequences are often greatly exaggerated. The bills are indeed reliably terrible, but not as terrible as claimed. Alas, I must repeat my lament that I know of no RTFB person I can turn to on other topics, and my cup doth overflow.

I return to the Cognitive Revolution to discuss various events of the past week first in part one, then this is part two. Recorded on Friday, things have changed by the time you read this.

From last week’s backlog: Dwarkesh Patel as guest on 80k After Hours. Not full of gold on the level of Dwarkesh interviewing others, and only partly about AI. There is definitely gold in those hills for those who want to go into these EA-related weeds. If you don’t want that then skip this one.

Around 51:45 Dwarkesh notes there is no ‘Matt Levine for AI’ and that picking up that mantle would be a good thing to do. I suppose I still have my work cut out.

A lot of talk about EA and 80k Hours ways of thinking about how to choose paths in life, that I think illustrates well both the ways it is good (actively making choices rather than sleepwalking, having priorities) and not as good (heavily favoring the legible).

Some key factors in giving career advice they point out are that from a global perspective power laws apply and the biggest impacts are a huge share of what matters, and that much advice (such as ‘don’t start a company in college’) is only good advice because the people to whom it is horribly bad advice will predictably ignore it.

Why does this section exist? This is a remarkably large fraction of why.

Emmett Shear: The number one rule of building things that can destroy the entire world is don’t do that.

Surprisingly it is also rule 2, 3, 4, 5, and 6.

Rule seven, however, is “make it emanate ominous humming and glow with a pulsing darkness”.

Eliezer Yudkowsky: Emmett.

Emmett Shear (later): Shocking amount of pushback on “don’t build stuff that can destroy the world”. I’d like to take this chance to say I stand by my apparently controversial opinion that building things to destroy the world is bad. In related news, murder is wrong and bad.

Follow me for more bold, controversial, daring takes like these.

Emmett Shear (other thread): Today has been a day to experiment with how obviously true I can make a statement before people stop disagreeing with it.

This is a Platonic encapsulation of this class of argument:

Emmett Shear: That which can be asserted without evidence can be dismissed without evidence.

Ryan Shea: Good point, but not sure he realizes this applies to AI doomer prophecy.

Emmett Shear: Not sure you realize this applies to Pollyanna assertions that don’t worry, a fully self-improving AI will be harmless. There’s a lot of evidence autocatalytic loops are potentially dangerous.

Ryan Shea: The original post is a good one. And I’m not making a claim that there’s no reason at all to worry. Just that there isn’t a particular reason to do so.

Emmett Shear: Forgive me if your “there’s not NO reason to worry, but let’s just go ahead with something potentially massively dangerous” argument doesn’t hold much reassurance for me.

[it continues from there, but gets less interesting and stops being Platonic.]

The latest reiteration of why p(doom) is useful even if highly imprecise, and why probabilities and probability ranges are super useful in general for communicating your actual epistemic state. In particular, that when Jan Leike puts his at ‘10%-90%’ this is a highly meaningful and useful statement of what assessments he considers reasonable given the evidence, providing much more information than saying ‘I don’t know.’ It is also more information than ‘50%.’

For the record: This, unrelated to AI, is the proper use of the word ‘doomer.’

The usual suspects, including Bengio, Hinton, Yao and 22 others, write the usual arguments in the hopes of finally getting it right, this time as Managing Extreme AI Risks Amid Rapid Progress, published in Science.

I rarely see statements like this, so it was noteworthy that someone noticed.

Mike Solana: Frankly, I was ambivalent on the open sourced AI debate until yesterday, at which point the open sourced side’s reflexive, emotional dunking and identity-based platitudes convinced me — that almost nobody knows what they think, or why.

It is even more difficult when you don’t know what ‘alignment’ means.

Which, periodic reminder, you don’t.

Rohit: We use AI alignment to mean:

  1. Models do what we ask.

  2. Models don’t do bad things even if we ask.

  3. Models don’t fail catastrophically.

  4. Models don’t actively deceive us.

And all those are different problems. Using the same term creates confusion.

Here we have one attempt to choose a definition, and cases for and against it:

Iason Gabriel: The new international scientific report on AI safety is impressive work, but it’s problematic to define AI alignment as:

“the challenge of making general-purpose AI systems act in accordance with the developer’s goals and interests”

Eliezer Yudkowsky: I defend this. We need separate words for the technical challenges of making AGIs and separately ASIs do any specified thing whatsoever, “alignment”, and the (moot if alignment fails) social challenge of making that developer target be “beneficial”.

Good advice given everything we know these days:

Mesaoptimizer: If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.

That does not express a strong opinion on whether we currently know of a better plan.

And it is exceedingly difficult when you do not attempt to solve the problem.

Dean Ball says here, in the most thoughtful version I have seen of this position by far, that the dissolution of the Superalignment team was good because distinct safety teams create oppositionalism, become myopic about box checking and employee policing rather than converging on the spirit of actual safety. Much better to diffuse the safety efforts throughout the various teams. Ball does note that this does not apply to the extent the team was doing basic research.

There are three reasons this viewpoint seems highly implausible to me.

  1. The Superalignment team was indeed tasked with basic research. Solving the problem is going to require quite a lot of basic research, or at least work that is not incremental progress on current incremental commercial products. This is not about ensuring that each marginal rocket does not blow up, or the plant does not melt down this month. It is a different kind of problem, preparing for a very different kind of failure mode. It does not make sense to embed these people into product teams.

  2. This is not a reallocation of resources from a safety team to diffused safety work. This is a reallocation of resources, many of which were promised and never delivered, away from safety towards capabilities, as Dean himself notes. This is in addition to losing the two most senior safety researchers and a lot of others too.

  3. Mundane safety, making current models do what you want in ways that, as Leike notes, will not scale to when it matters most, does not count as safety towards the goals of the Superalignment team, or towards us all not dying. No points.

Thus the biggest disagreement here, in my view, which is when he says this:

Dean Ball: Companies like Anthropic, OpenAI, and DeepMind have all made meaningful progress on the technical part of this problem, but this is bigger than a technical problem. Ultimately, the deeper problem is contending with a decentralized world, in which everyone wants something different and has a different idea for how to achieve their goals.

The good news is that this is basically politics, and we have been doing it for a long time. The bad news is that this is basically politics, and we have been doing it for a long time. We have no definitive answers.

Yes, it is bigger than a technical problem, and that is important.

OpenAI has not made ‘meaningful progress.’ Certainly we are not on track to solve such problems, and we should not presume they will essentially solve themselves with an ordinary effort, as is implied here.

Indeed, with that attitude, it’s Margaritaville (as in, we might as well start drinking Margaritas.) Whereas with the attitude of Leike and Sutskever, I disagreed with their approach, but I could have been wrong or they could have course corrected, if they had been given the resources to try.

Nor is the second phase problem that we also must solve well-described by ‘basically politics’ of a type we are used to, because there will be entities involved that are not human. Our classical liberal political solutions work better than known alternatives, and well enough for humans to flourish, by assuming various properties of humans and the affordances available to them. AIs with far greater intelligence, capabilities and efficiency, that can be freely copied, and so on, would break those assumptions.

I do greatly appreciate the self-awareness and honesty in this section:

Dean Ball: More specifically, I believe that classical liberalism—individualism wedded with pluralism via the rule of law—is the best starting point, because it has shown the most success in balancing the priorities of the individual and the collective. But of course I do. Those were my politics to begin with.

It is notable how many AI safety advocates, when discussing almost any topic except transformational AI, are also classical liberals. If this confuses you, notice that.

Not under the current paradigm, but worth noticing.

Also, yes, it really is this easy.

And yet, somehow it is still this hard? (I was not able to replicate this one, may be fake)

It’s a fun game.

Sometimes you stick the pieces together and know where it comes from.

A problem statement:

Jorbs: We have gone from

“there is no point in arguing with that person, their mind is already made up”

to

“there is no point in arguing with that person, they are made up.”

It’s coming.

Alex Press: The Future of Artificial Intelligence at Wendy’s.

Colin Fraser: Me at the Wendy’s drive thru in June: A farmer and a goat stand on the side of a riverbank with a boat for two.

[FreshAI replies]: Sir, this is a Wendy’s.

Are you ready?

AI #65: I Spy With My AI Read More »

the-motiv-argo-is-a-new-modular-medium-duty-electric-truck

The Motiv Argo is a new modular medium-duty electric truck

looks futuristic —

Motiv has made electric powertrains for medium-duty vehicles since 2009.

Image: A white cab-chassis medium-duty truck with a bare frame behind it. Fleets looking for a medium-duty electric truck now have a new option. Credit: Motiv

Medium- and heavy-duty vehicles account for about 23 percent of US vehicle emissions. That’s much less than the amount of greenhouse gases emitted each year by light-duty vehicles, but if anything, businesses can often have a clearer case for electrification, whether that’s to save money on annual running costs or to meet ESG goals. The annual Advanced Clean Transportation Expo is currently underway in Las Vegas, and yesterday Motiv Electric Trucks revealed the production version of its new modular medium-duty EV, the Argo.

Motiv has been around since 2009 and has been selling electric powertrains for school buses, step vans, box trucks, and even trolleys. Now it’s branching out with its own vehicle, the modular Argo, which is capable of carrying up to 14,000 lbs (6,350 kg) with a range of up to 200 miles (321 km).

“Overnight we’ve moved from a company primarily serving a narrow slice of the $20 billion medium duty truck market to one that can serve nearly that entire market,” said Motiv CEO Scott Griffith. “The launch of Argo is a transformational moment for our company as we can now offer more vehicles and options to help more fleet customers meet their sustainability goals.”

The Argo looks somewhat futuristic (certainly in comparison to a step van or box truck) and has an integrated cab-chassis design. Motiv says that maximizing visibility was a key design target, with low curbside windows, plus the option of side-view cameras in addition to passive mirrors. There are also design features meant to improve driver safety, including safety railings and self-cleaning interior steps to help prevent what Motiv says are the most common operator injuries.

  • The rolling chassis.

  • A big windshield helps minimize blind spots.

  • Motiv has tried to make it a bit safer to get into and out of.

  • “Many fleet customers have pointed out the simple design of our new Gen 6 architecture, how much less copper we use and how well cables are routed, how easy it is to access our patented smart hub and how easy our software is to integrate; this apparent simplicity took years for us to optimize and now our customers can finally reap the benefits,” said Jim Castelaz, Motiv founder and CTO.

(Images: Motiv)

Motiv developed the Argo’s powertrain together with the Japanese company Nidec. Although Nidec originally designed the electric motor to operate at 800 V, Motiv developed a new control algorithm that allows it to run at 350 V instead, which it previously told Ars is more cost-effective. The battery pack uses a lithium iron phosphate (LFP) chemistry and was developed together with Our Next Energy.

The Argo comes in a range of wheelbases, from 178 inches (4,521 mm) to 252 inches (6,400 mm), with dock-height and lower profile options for almost any application you might want a medium-duty truck for, Motiv says. Pricing will be based on how many trucks a customer orders as well as the specifications, but Motiv told Ars that it “will be price competitive with other Class 6 electric trucks.”

The Motiv Argo is a new modular medium-duty electric truck Read More »

do-not-mess-with-scarlett-johansson

Do Not Mess With Scarlett Johansson

I repeat. Do not mess with Scarlett Johansson.

You would think her movies, and her suit against Disney, would make this obvious.

Apparently not so.

Andrej Karpathy (co-founder OpenAI, departed earlier), May 14: The killer app of LLMs is Scarlett Johansson. You all thought it was math or something.

You see, there was this voice they created for GPT-4o, called ‘Sky.’

People noticed it sounded suspiciously like Scarlett Johansson, who voiced the AI in the movie Her. That would be the movie Sam Altman says is his favorite movie of all time, which he says inspired OpenAI ‘more than a little bit,’ which he referenced by tweeting “Her” on its own right before the GPT-4o presentation, and which was the comparison point for many people reviewing the GPT-4o debut.

I mean, surely that couldn’t have been intentional.

Oh, no.

Kylie Robison: I asked Mira Murati about the Scarlett Johansson-type voice in today’s demo of GPT-4o. She clarified it’s not designed to mimic her, and said someone in the audience asked this exact same question!

Kylie Robison in Verge (May 13): Title: ChatGPT will be able to talk to you like Scarlett Johansson in Her.

OpenAI reports on how it created and selected its five GPT-4o voices.

OpenAI: We support the creative community and worked closely with the voice acting industry to ensure we took the right steps to cast ChatGPT’s voices. Each actor receives compensation above top-of-market rates, and this will continue for as long as their voices are used in our products.

We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents.

Looking ahead, you can expect even more options as we plan to introduce additional voices in ChatGPT to better match the diverse interests and preferences of users.

Jessica Taylor: My “Sky’s voice is not an imitation of Scarlett Johansson” T-shirt has people asking a lot of questions already answered by my shirt.

OpenAI: We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.

Variety: Altman said in an interview last year that “Her” is his favorite movie.

Variety: OpenAI Suspends ChatGPT Voice That Sounds Like Scarlett Johansson in ‘Her’: AI ‘Should Not Deliberately Mimic a Celebrity’s Distinctive Voice.’

[WSJ had similar duplicative coverage.]

Flowers from the Future: That’s why we can’t have nice things. People bore me.

Again: Do not mess with Scarlett Johansson. She is Black Widow. She sued Disney.

Several hours after compiling the above, I was happy to report that they did indeed mess with Scarlett Johansson.

She is pissed.

Bobby Allen (NPR): Scarlett Johansson says she is ‘shocked, angered’ over new ChatGPT voice.

Johansson’s legal team has sent OpenAI two letters asking the company to detail the process by which it developed a voice the tech company dubbed “Sky,” Johansson’s publicist told NPR in a revelation that has not been previously reported.

NPR then published her statement, which follows.

Scarlett Johansson: Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.

After much consideration and for personal reasons, I declined the offer. Nine months later, my friends, family and the general public all noted how much the newest system named “Sky” sounded like me.

When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word, “her,” a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.

Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.

As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the “Sky” voice. Consequently, OpenAI reluctantly agreed to take down the “Sky” voice.

In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.

This seems like a very clear example of OpenAI, shall we say, lying its ass off?

They say “we believe that AI voices should not deliberately mimic a celebrity’s distinctive voice,” after Sam Altman twice personally asked the most distinctive celebrity possible to be the very public voice of ChatGPT, and she turned them down. They then went with a voice this close to hers while Sam Altman tweeted ‘Her,’ two days after being turned down again. Mira Murati went on stage and said it was all a coincidence.

Uh huh.

Shakeel: Will people stop suggesting that the attempted-Altman ouster had anything to do with safety concerns now?

It’s increasingly clear that the board fired him for the reasons they gave at the time: he is not honest or trustworthy, and that’s not an acceptable trait for a CEO!

for clarification: perhaps the board was particularly worried about his untrustworthiness *because* of how that might affect safety. But the reported behaviour from Altman ought to have been enough to get him fired at any company!

There are lots of ethical issues with the Scarlett Johansson situation, including consent.

But one of the clearest cut issues is dishonesty. Earlier today, OpenAI implied it’s a coincidence that Sky sounded like Johansson. Johansson’s statement suggests that is not at all true.

This should be a big red flag to journalists, too — it suggests that you cannot trust what OpenAI’s comms team tells you.

Case in point: Mira Murati appears to have misled Verge reporter Kylie Robison.

And it seems they’re doubling down on this, with carefully worded statements that don’t really get to the heart of the matter:

  1. Did they cast Sky because she sounded like Johansson?

  2. Did Sky’s actress aim to mimic the voice of Scarlett Johansson?

  3. Did OpenAI adjust Sky’s voice to sound more like Scarlett Johansson?

  4. Did OpenAI outright train on Scarlett Johansson’s voice?

I assume not that fourth one. Heaven help OpenAI if they did that.

Here is one comparison of Scarlett talking normally, Scarlett’s voice in Her and the Sky voice. The Sky voice sample there was plausibly chosen to be dissimilar, so here is another longer sample in-context, from this OpenAI demo, that is a lot closer to my ears. I do think you can still tell the difference between Scarlett Johansson and Sky, but it is then not so easy. Opinions differ on exactly how close the voices were. To my ears, the sample in the first clip sounds more robotic, but in the second clip it is remarkably close.

No one is buying that this is a coincidence.

Another OpenAI exec seems to have misled Nitasha Tiku.

Nitasha Tiku: the ScarJo episode gives me an excuse to revisit one of the most memorable OpenAI demos I’ve had the pleasure of attending. back in September, when the company first played the “Sky” voice, I told the exec in charge it sounded like ScarJo and asked him if it was intentional.

He said no, there are 5 voices, it’s just personal pref. Then he said he uses ChatGPT to tell bedtime stories and his son prefers certain voices. Pinnacle of Tech Dad Demo, unlocked.

Even if we take OpenAI’s word for absolutely everything, the following facts do not appear to be in dispute:

  1. Sam Altman asked Scarlett Johansson to be the voice of their AI, because of Her.

  2. She said no.

  3. OpenAI created an AI voice most people think sounded like Scarlett Johansson.

  4. OpenAI claimed repeatedly that Sky’s resemblance to Johansson is a coincidence.

  5. OpenAI had a position that voices should be checked for similarity to celebrities.

  6. Sam Altman Tweeted ‘Her.’

  7. They asked her permission again.

  8. They decided This Is Fine and did not inform Scarlett Johansson of Sky.

  9. Two days after asking her permission again they launched the voice of Sky.

  10. They did so in a presentation everyone paralleled to Scarlett Johansson.

So, yeah.

On March 29, 2024, OpenAI put out a post entitled Navigating the Challenges and Opportunities of Synthetic Voices (Hat tip).

They said this, under ‘Building Voice Engine safely.’ Bold mine:

OpenAI: Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used.

We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and **a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures**.

If I were compiling a list of voices to check in this context that were not political figures, Scarlett Johansson would not only have been on that list.

She would have been the literal first name on that list.

For exactly the same reason we are having this conversation.
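For concreteness, here is a minimal sketch of what such a no-go check could look like, assuming you have some speaker-embedding model that turns audio into vectors. Everything here is illustrative: the stand-in embeddings, the 0.75 cosine-similarity threshold, and the function names are my own placeholders, not OpenAI’s actual pipeline.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity between two speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_no_go_matches(candidate: np.ndarray,
                       no_go_list: dict[str, np.ndarray],
                       threshold: float = 0.75) -> list[str]:
    # Return every name on the no-go list whose reference embedding is
    # too close to the candidate voice. The 0.75 threshold is an
    # arbitrary placeholder, not a known industry number.
    return [name for name, ref in no_go_list.items()
            if cosine_similarity(candidate, ref) >= threshold]

if __name__ == "__main__":
    # Stand-in embeddings: in practice these would come from running a
    # speaker-verification model on real audio samples.
    rng = np.random.default_rng(0)
    reference_a = rng.normal(size=256)
    no_go_list = {
        "Prominent Figure A": reference_a,
        "Prominent Figure B": rng.normal(size=256),
    }
    # A candidate voice that is a lightly perturbed copy of reference A
    # should be flagged; it should not match the unrelated reference B.
    candidate = reference_a + 0.05 * rng.normal(size=256)
    print(flag_no_go_matches(candidate, no_go_list))  # ['Prominent Figure A']
```

The check itself is the easy part once you have reference samples; the hard part, and the point here, is deciding who belongs on the list in the first place.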

GPT-4o did not factor in Her, so it put her in the top 100 but not the top 50, and even with additional context would only have put her in the 10-20 range with the Pope, the late Queen and Taylor Swift (who at #15 was the highest-ranked non-CEO, non-politician).

Remember that in September 2023, a journalist asked an OpenAI executive about Sky and why it sounded so much like Scarlett Johansson.

Even if this somehow was all an absurd coincidence, there is no excuse.

Ultimately, I think that the voices absolutely should, when desired by the user, mimic specific real people’s voices, with of course that person’s informed consent, participation and financial compensation.

I should be able to buy or rent the Scarlett Johansson voice package if I want that and she decides to offer one. She ideally gets most or all of that money. Everybody wins.

If she doesn’t want that, or I don’t, I can go with someone else. You could buy any number of them and swap between them, have them in dialogue, whatever you want.

You can include a watermark in the audio for deepfake detection. Even without that, it is not as if this makes deepfaking substantially harder. If you want to deepfake Scarlett Johansson’s voice without her permission, there are publicly available tools you can already use to do that.

One could even say the facts went almost maximally badly, short of an outright deepfake.

Bret Devereaux: Really feels like some of these AI fellows needs to suffer some more meaningful legal repercussions for stealing peoples art, writing, likeness and freakin’ voices so they adopt more of an ‘ask permission’ rather than an ‘ask forgiveness’ ethos.

Trevor Griffey: Did he ask for forgiveness?

Linch: He asked for permission but not forgiveness lmao.

Bret Devereaux: To be more correct, he asked permission, was told no, asked permission again, then went and did it anyway before he got permission, and then hoped no one would notice, while he tweeted to imply that he had permission, when he didn’t.

Which seems worse, to be frank?

Mario Cannistra (other thread): Sam obviously lives by “better ask for forgiveness than permission”, as he’s doing the same thing with AGI. He’ll say all the nice words, and then he’ll do it anyway, and if it doesn’t go as planned, he’ll deal with it later (when we’re all dead).

Zvi: In this case, he made one crucial mistake: The first rule of asking forgiveness rather than permission is not to ask for permission.

The second rule is to ask for permission.

Whoops, on both counts.

Also it seems they lied repeatedly about the whole thing.

That’s the relatively good scenario, where there was no outright deepfake, and her voice was not directly used in training.

I am not a lawyer, but my read is: Oh yes. She has a case.

A jury would presumably conclude this was intentional, even if no further smoking guns are found in discovery. They asked Scarlett Johansson twice to participate. There were the references to ‘Her.’

There is no fully objective way to present the facts to an LLM, and your results may vary, but when I gave GPT-4o a subset of the evidence that would be presented by Scarlett’s lawyers, plus OpenAI’s claims that it was a coincidence, GPT-4o put the probability of a coincidence at under 10%.

It all seems like far more than enough for a civil case, especially given related public attitudes. This is not going to be a friendly jury for OpenAI.

If the voice actress was using her natural voice (or the ‘natural robotization’ thereof) without any instructions or adjustments that increased the level of resemblance, and everyone was careful not to ever say anything beyond what we already know, and the jury is in a doubting mood? Even then I have a hard time seeing it.

If you intentionally imitate someone’s distinctive voice and style? That’s a paddlin’.

Paul Feldman (LA Times, May 9, 1990): In a novel case of voice theft, a Los Angeles federal court jury Tuesday awarded gravel-throated recording artist Tom Waits $2.475 million in damages from Frito-Lay Inc. and its advertising agency.

The U.S. District Court jury found that the corn chip giant unlawfully appropriated Waits’ distinctive voice, tarring his reputation by employing an impersonator to record a radio ad for a new brand of spicy Doritos corn chips.

While preparing the 1988 ad, a Tracy-Locke copywriter listened repeatedly to Waits’ tune, “Step Right Up,” and played the recording for Frito-Lay executives at a meeting where his script was approved. And when singer Steve Carter, who imitates Waits in his stage act, performed the jingle, Tracy-Locke supervisors were concerned enough about Carter’s voice that they consulted a lawyer, who counseled caution.

Then there’s the classic case Midler v. Ford Motor Company. It sure sounds like a direct parallel to me, down to asking for permission, getting refused, doing it anyway.

Jack Despain Zhou: Fascinating. This is like a beat-for-beat rehash of Midler v. Ford Motor Co.

Companies have tried to impersonate famous voices before when they can’t get those voices. Generally doesn’t go well for the company.

Wikipedia: Ford Motor created an ad campaign for the Mercury Sable that specifically was meant to inspire nostalgic sentiments through the use of famous songs from the 1970s sung by their original artists. When the original artists refused to accept, impersonators were used to sing the original songs for the commercials.

Midler was asked to sing a famous song of hers for the commercial and refused.

Subsequently, the company hired a voice-impersonator of Midler and carried on with using the song for the commercial, since it had been approved by the copyright-holder. Midler’s image and likeness were not used in the commercial but many claimed the voice used sounded impeccably like Midler’s.

Midler brought the case to a district court where she claimed that her voice was protected from appropriation and thus sought compensation. The district court claimed there was no legal principle preventing the use of her voice and granted summary judgment to Ford Motor. Midler appealed to the Appellate court, 9th Circuit.

The appellate court ruled that the voice of someone famous as a singer is distinctive to their person and image and therefore, as a part of their identity, it is unlawful to imitate their voice without express consent and approval. The appellate court reversed the district court’s decision and ruled in favor of Midler, indicating her voice was protected against unauthorized use.

If it has come to this, so be it.

Ross Douthat: Writing a comic novel about a small cell of people trying to stop the rise of a demonic super-intelligence whose efforts are totally ineffectual but then in the last chapter Scarlett Johansson just sues the demon into oblivion.

Fredosphere: Final lines:

AI: “But what will become of me?”

Scarlett: “Frankly, my dear, I don’t give a damn.”

Genius. Also, I’d take it. A win is a win.

There are some people asking what the big deal is, ethically, practically or legally.

In legal terms, my most central observation is that those who don’t see the legal issue mostly are unaware of the relevant prior case law listed above due to being unwilling to Google for it or ask an LLM.

I presume everyone agrees that an actual direct deepfake, trained on the voice of Scarlett Johansson without consent, would be completely unacceptable.

The question some ask is, if it is only a human that was ‘training on the voice of Scarlett Johansson,’ similar to the imitators in the prior cases, why should we care? Or, alternatively, if OpenAI searched for the closest possible match, how is that different from when Padme is not available for a task so you send out a body double?

The response ‘I never explicitly told people this was you, fine this is not all a coincidence, but I have a type I wanted and I found an uncanny resemblance and then heavily dropped references and implications’ does not seem like it should work here? At least, not past some point?

Obviously, you are allowed to (even if it is kind of creepy) date someone who looks and sounds suspiciously like your ex, or (also creepy) like someone who famously turned you down, or to recast a voice actor while prioritizing continuity or with an idea of what type of voice you are looking for.

It comes down to whether you are appropriating someone’s unique identity, and especially whether you are trying to fool other observers.

The law must also adjust to the new practicalities of the situation, in the name of the ethical and practical goals that most of us agree on here. As technology and affordances change, so must the rules adjust.

In ethical and practical terms, what happens if OpenAI is allowed to do this while its motivations and source are plain as day, so long as the model did not directly train on Scarlett Johansson’s voice?

You do not need to train an AI directly on Scarlett’s voice to get arbitrarily close to Scarlett’s voice. You can get reasonably close even if all you have is selection among unaltered and uncustomized voices, if you have enough of a sample to choose from.

If you auditioned women of similar age and regional accent, your chances of finding a close soundalike are remarkably good. Even if that is all OpenAI did to filter initial applications, and then they selected the voice of Sky to be the best fit among them, auditioning 400 voices for 5 slots is more than enough.

I asked GPT-4o what would happen if you also assume professional voice actresses were auditioning for this role, and they understood who the target was. How many would you have to test before you were a favorite to find a fit that was all but indistinguishable?

One. It said 50%-80% chance. If you audition five, you’re golden.
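Taking GPT-4o’s (unverified) per-audition numbers at face value, the compounding is straightforward. A quick check, where the 50% and 80% figures come from the exchange above and the rest is just arithmetic:

```python
# Chance that at least one of five auditions yields an "all but
# indistinguishable" match, given a per-audition success probability p.
for p in (0.5, 0.8):
    print(f"p = {p}: {1 - (1 - p) ** 5:.4f}")
# p = 0.5: 0.9688
# p = 0.8: 0.9997
```

Either way, a determined search for a soundalike does not need many tries before the result is close enough to fool most listeners.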

Then the AI allows this voice to have zero marginal cost to reproduce, and you can have it saying absolutely anything, anywhere. That, alone, obviously cannot be allowed.

Remember, that is before you do any AI fine-tuning or digital adjustments to improve the match. And that means, in turn, that if you did use an outright deepfake, or did fine-tuning on the closeness of the match, or used it to alter parameters in post, then unless they can retrace your steps, who is to say you did any of that?

If Scarlett Johansson does not have a case here, where OpenAI did everything in their power to make it obvious and she has what it takes to call them on it, then there effectively are very close to no rules and no protections, for creatives or otherwise, except for laws against outright explicitly claimed impersonations, scams and frauds.

As I have said before:

Many of our laws and norms will need to adjust to the AI era, even if the world mostly ‘looks normal’ and AIs do not pose or enable direct existential or catastrophic risks.

Our existing laws rely on friction, and on human dynamics of norm enforcement. They and their consequences are designed with the expectation of uneven enforcement, often with rare enforcement. Actions have practical costs and risks, most of them very different from zero, and people only have so much attention and knowledge and ability to execute and we don’t want to stress out about all this stuff. People and corporations have reputations to uphold and they have to worry about unknown unknowns where there could be (metaphorical) dragons. One mistake can land us or a company in big trouble. Those who try to break norms and laws accumulate evidence, get a bad rep and eventually get increasingly likely to be caught.

In many places, fully enforcing the existing laws via AI and AI-enabled evidence would grind everything to a halt or land everyone involved in prison. In most cases that is a bad result. Fully enforcing the strict versions of verbally endorsed norms would often have a similar effect. In those places, we are going to have to adjust.

Often we are counting on human discretion to know when to enforce the rules, including to know when a violation indicates someone who has broken similar rules quite a lot in damaging ways versus someone who did it this once because of pro-social reasons or who can learn from their mistake.

If we do adjust our rules and our punishments accordingly, we can get to a much better world. If we don’t adjust, oh no.

Then there are places (often overlapping) where the current rules let people get away with quite a lot, often involving getting free stuff, often in a socially damaging way. We use a combination of ethics and shame and fear and reputation and uncertainty and initial knowledge and skill costs and opportunity costs and various frictions to keep this at an acceptable level, and restricted largely to when it makes sense.

Breaking that equilibrium is known as Ruining It For Everyone.

A good example would be credit card rewards. If you want to, you can exploit various offers to make remarkably solid money opening and abusing various cards in various ways, and keep that going for quite a while. There are groups for this. Same goes for sportsbook deposit bonuses, or the return policies at many stores, and so on.

The main reason that often This Is Fine is that if you are sufficiently competent to learn and execute on such plans, you mostly have better things to do, and the scope of any individual’s actions is usually self-limiting (when it isn’t, you get rules changes and hilarious news stories). And what is lost to such tricks is made up for elsewhere. But if you could automate these processes, then the scope goes to infinity, and you get rules changes and ideally hilarious (but often instead sad) news articles. You also get mode collapses when the exploits become common knowledge or too easy to do, and norms against using them go away.

Another advantage is that this is often good price discrimination gated by effort and attention, and an effective subsidy for the poor. You can ‘work the job’ of optimizing such systems, which is a fallback if you don’t have better opportunities and are short on money but long on time, or if you want to train your optimization skills or pull one over.

AI will often remove such frictions, and the barriers preventing rather large scaling.

AI voice imitation is one of those cases. Feature upgrades, automation, industrialization and mass production change the nature of the beast. This particular case was one that was already illegal without AI because it is so brazen and clear cut, but we are going to have to adjust our rules to the general case.

The good news is this is a case where the damage is limited, so ‘watch for where things go wrong and adjust’ should work fine. This is the system working.

The bad news is that this adjustment cannot involve ‘stop the proliferation of technology that allows voice cloning from remarkably small samples.’ That technology is essentially mature already, and open solutions available. We cannot unring the bell.

In other places, where the social harms can scale to a very high level, and the technological bell once rung cannot be easily unrung, we have a much harder problem. That is a discussion for another post.

As noted above, there was a faction that said this was no big deal, or even totally fine.

Most people did not see it that way. The internet is rarely as united as this.

Nate Silver: Very understandably negative reaction to OpenAI on this. It is really uniting people in different political tribes, which is not easy to do on Twitter.

One of the arguments I make in my book—and one of the reasons my p(doom) is lower than it might be—is that AI folks underestimate the potential for a widespread political backlash against their products.

Do not underestimate the combined power of a beloved celebrity who is on every level a total badass, horrible publicity, and a united internet.

Conor Sen: Weird stuff on Sam’s part in addition to any other issues it raises.

Now whenever a reporter or politician is trying to point out the IP issues of AI they can say “Sam stole ScarJo’s voice even after she denied consent.” It’s a much easier story to sell to the general public and members of Congress.

Noah Giansiracusa: This is absolutely appalling. Between this and the recent NDA scandal, I think there’s enough cause for Altman to step down from his leadership role at OpenAI. The world needs a stronger moral compass at the helm of such an influential AI organization.

There are even some ethics people out there to explain other reasons this is problematic.

Kate Crawford: Why did OpenAI use Scarlett Johansson’s voice? As Jessa Lingel & I discuss in our journal article on AI agents, there’s a long history of using white women’s voices to “personalize” a technology to make it feel safe and non-threatening while it is capturing maximum data.

Sam Altman has said as much. NYT: he told ScarJo her voice would help “consumers to feel comfortable with the seismic shift concerning humans and AI” as her voice “would be comforting to people.”

AI assistants invoke gendered traditions of the secretary, a figure of administrative and emotional support, often sexualized. Underpaid and undervalued, secretaries still had a lot of insight into private and commercially sensitive dealings. They had power through information.

But just as secretaries were taught to hide their knowledge, AI agents are designed to make us forget their power as they are made to fit within non-threatening, retrograde feminine tropes. These are powerful data extraction engines, sold as frictionless convenience.

You can read more in our article here.

Finally, for your moment of zen: The Daily Show has thoughts on GPT-4o’s voice.

Do Not Mess With Scarlett Johansson Read More »