Author name: Kris Guyer

Micro LED monitors connect like puzzle pieces in HP multi-monitor concept

Yes, there are bigger monitors, but is there a better way to have a tri-monitor setup?

In a technical disclosure published this month, HP explored a Micro LED monitor concept that would enable consumers to easily use various multi-monitor configurations through the use of “Lego-like building blocks.” HP has no immediate plans to make what it has called “composable Micro LED monitors,” but its discussion explores a potential way to simplify multitasking with numerous displays.

HP’s paper [PDF], written by HP scientists and technical architects, discusses a theoretical monitor that supports the easy addition of more flat or curved screens on its left, right, or bottom sides (the authors noted that top extensions could also be possible but they were “trying to keep the number of configurations manageable”). The setup would use one 12×12-inch “core” monitor that connects to the computer via a cable. The computer’s operating system (OS) would be able to view the display setup as one, two, or multiple monitors, and physical switches would let users quickly disable displays.

  • The illustration shows a monitor made of a core unit and two extension panels viewed as three monitors (left), two monitors (middle), and two monitors with different orientations (right).

  • This illustration shows two core units and two extensions used to make dual, triple, and quadruple-display setups.

Not a real product

HP’s paper is only a technical disclosure, which companies often publish in order to support potential patent filings. So it’s possible that we’ll never see HP release “composable Micro LED monitors” as described. An HP spokesperson told me:

HP engineering teams are constantly exploring new ways to leverage the latest technologies. The composable Micro LED monitors within the technical disclosure are conceptual, but HP has turned past concepts like this into commercially viable products and solutions.

There’s also growing interest in making multi-monitor workspaces easier to set up and navigate, including through speedier connectivity, enhanced port selection, improved docks, and thinner displays. In January, Samsung teased a concept called The Link that showed thin, 32-inch, 4K LED monitors daisy-chained together “without a separate cable.” Samsung originally said the monitors would connect via pogo pins but later removed that detail.

As you’ll see, a lot more would need to be worked out beyond what’s in HP’s concept to make a real product.

Adjustable multi-monitor setups

HP’s paper goes deeper into how monitors, connected via pogo pins or RFID readers and tags, might enable different single and multi-monitor setups depending on the user’s need.

HP’s concept connects extensions to the core monitors “in a similar way to a jigsaw” and also uses magnets to help with alignment. The authors explain:

The core-to-extension connection includes an electrical connection, allowing seamless connectivity to the core unit. They will only physically connect in ways that will function correctly as displays. The magnets in the passive connection covers and extension edges are strong enough to hold adjacent displays together, but not strong enough to support the weight of an extension.

The authors provide various examples of how users might construct different-sized monitors with different panels. Suggested users include a video editor who might use a bigger screen for video with a smaller one for editing tools. The paper also touches on further potential innovations, like using different types of tech, such as eInk, for extension displays.

  • An illustration depicting the backside of connected core units (dark gray) with display extensions (light gray).

  • Different-sized configurations.

As with any multi-monitor setup, bezels or visible seams where the displays connect could distract users. The paper suggests the ideal solution is one in which “rays originating from pixels near the boundary between adjacent panels” can “propagate across the boundary without any distortion caused by reflection or refraction.”

Federal agency warns critical Linux vulnerability being actively exploited

NETFILTER FLAW —

Cybersecurity and Infrastructure Security Agency urges affected users to update ASAP.

The US Cybersecurity and Infrastructure Security Agency has added a critical security bug in Linux to its list of vulnerabilities known to be actively exploited in the wild.

The vulnerability, tracked as CVE-2024-1086 and carrying a severity rating of 7.8 out of a possible 10, allows people who have already gained a foothold inside an affected system to escalate their system privileges. It’s the result of a use-after-free error, a class of vulnerability that occurs in software written in the C and C++ languages when a process continues to access a memory location after it has been freed or deallocated. Use-after-free vulnerabilities can result in remote code execution or privilege escalation.
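
To illustrate the bug class (though not the actual kernel code behind CVE-2024-1086), here is a minimal user-space sketch in C of a use-after-free: the pointer is dereferenced after its memory has been handed back to the allocator, so later allocations can place unrelated or attacker-influenced data at the same address.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal use-after-free sketch (illustrative only, not the CVE-2024-1086 code). */
int main(void)
{
    char *session = malloc(32);
    if (session == NULL)
        return 1;

    strcpy(session, "user=guest");
    free(session);              /* memory returned to the allocator */

    /* BUG: 'session' is now a dangling pointer. A later allocation may reuse
     * this chunk, so the bytes read here are undefined and may be controlled
     * by other code paths. */
    printf("%s\n", session);    /* use after free */

    return 0;
}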

The vulnerability, which affects Linux kernel versions 5.14 through 6.6, resides in nf_tables, a kernel component that enables Netfilter, which in turn facilitates a variety of network operations, including packet filtering, network address [and port] translation (NA[P]T), packet logging, userspace packet queueing, and other packet mangling. It was patched in January, but as the CISA advisory indicates, some production systems have yet to install it. At the time this Ars post went live, there were no known details about the active exploitation.

A deep-dive write-up of the vulnerability reveals that the underlying flaw provides “a very powerful double-free primitive when the correct code paths are hit.” Double-free vulnerabilities are a subclass of use-after-free errors that occur when the free() function for freeing memory is called more than once for the same location. The write-up lists multiple ways to exploit the vulnerability, along with code for doing so.
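
For readers unfamiliar with the distinction, the toy C program below shows the shape of a double free; it is purely illustrative and has nothing to do with the actual netfilter code paths described in the write-up.

#include <stdlib.h>

/* Minimal double-free sketch (illustrative only). Calling free() twice on the
 * same pointer corrupts the allocator's bookkeeping; exploit authors build on
 * that corrupted state to steer where later allocations land. */
int main(void)
{
    void *buf = malloc(64);
    if (buf == NULL)
        return 1;

    free(buf);   /* first free: legitimate */
    free(buf);   /* BUG: second free of the same pointer (double free) */

    return 0;
}

A hardened user-space allocator will usually just abort on the second free(); the danger in cases like this one comes when the corrupted allocator state can instead be leveraged to control subsequent allocations.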

The double-free error is the result of a failure to achieve input sanitization in netfilter verdicts when nf_tables and unprivileged user namespaces are enabled. Some of the most effective exploitation techniques allow for arbitrary code execution in the kernel and can be fashioned to drop a universal root shell.

The author offered the following graphic providing a conceptual illustration:

pwning tech

CISA has given federal agencies under its authority until June 20 to apply the patch. The agency is urging all organizations that have yet to apply an update to do so as soon as possible.

Mutations in a non-coding gene associated with intellectual disability

Splice of life —

A gene that only makes an RNA is linked to neurodevelopmental problems.

The spliceosome is a large complex of proteins and RNAs.

Almost 1,500 genes have been implicated in intellectual disabilities; yet for most people with such disabilities, genetic causes remain unknown. Perhaps this is in part because geneticists have been focusing on the wrong stretches of DNA when they go searching. To rectify this, Ernest Turro—a biostatistician who focuses on genetics, genomics, and molecular diagnostics—used whole genome sequencing data from the 100,000 Genomes Project to search for areas associated with intellectual disabilities.

His lab found the most common genetic association with neurodevelopmental abnormality identified to date. And the gene they identified doesn’t even make a protein.

Trouble with the spliceosome

Most genes include instructions for how to make proteins. That’s true. And yet human genes are not arranged linearly—or rather, they are arranged linearly, but not contiguously. A gene containing the instructions for which amino acids to string together to make a particular protein—hemoglobin, insulin, albumin, collagen, whatever protein you like—is modular. It contains part of the amino acid sequence, then it has a chunk of DNA that is largely irrelevant to that sequence, then a bit more of the protein’s sequence, then another chunk of random DNA, back and forth until the end of the protein. It’s as if each of these prose paragraphs were separated by a string of unrelated letters (but not a meaningful paragraph from a different article).

In order to read this piece through coherently, you’d have to take out the letters interspersed between its paragraphs. And that’s exactly what happens with genes. In order to read the gene through coherently, the cell has machinery that splices out the intervening sequences and links up the protein-making instructions into a continuous whole. (This doesn’t happen in the DNA itself; it happens to an RNA copy of the gene.) The cell’s machinery for this is called, naturally enough, the spliceosome.

About a hundred proteins make up the spliceosome. But the gene just found to be so strongly associated with neurodevelopmental disorders doesn’t encode any of them. Rather, it encodes one of five RNA molecules that are also part of the spliceosome complex and interact with the RNAs that are being spliced. Mutations in this gene were found to be associated with a syndrome whose symptoms include intellectual disability, seizures, short stature, neurodevelopmental delay, drooling, motor delay, hypotonia (low muscle tone), and microcephaly (having a small head).

Supporting data

The researchers buttressed their finding by examining three other databases; in all of them, they found more people with the syndrome who had mutations in this same gene. The mutations occur in a remarkably conserved region of the genome, suggesting that it is very important. Most of the mutations were new in the affected people—i.e., not inherited from their parents—but in one case a particular mutation in the gene was inherited. Based on this, the researchers concluded that this particular variant may cause a less severe disorder than the other mutations.

Many studies that look for genes associated with diseases have focused on searching catalogs of protein coding genes. These results suggest that we could have been missing important mutations because of this focus.

Nature Medicine, 2024. DOI: 10.1038/s41591-024-03085-5

Musk can’t avoid testifying in SEC probe of Twitter buyout by playing victim

After months of loudly protesting a subpoena, Elon Musk has once again agreed to testify in the US Securities and Exchange Commission’s investigation into his acquisition of Twitter (now called X).

Musk tried to avoid testifying by arguing that the SEC had deposed him twice before, telling a US district court in California that the most recent subpoena was “the latest in a long string of SEC abuses of its investigative authority.”

But the court did not agree that Musk testifying three times in the SEC probe was either “abuse” or “overly burdensome,” especially since the SEC has said it’s seeking a follow-up deposition after receiving “thousands of new documents” from Musk and third parties over the past year since his last depositions. And according to an order from US district judge Jacqueline Scott Corley requiring Musk and the SEC to agree on a deposition date, “Musk’s lament does not come close to meeting his burden of proving ‘the subpoena was issued in bad faith or for an improper purpose.’”

“Under Musk’s theory of reasonableness, the SEC must wait to depose a percipient witness until it has first gathered all relevant documents,” Corley wrote in the order. “But the law does not support that theory. Nor does common sense. In an investigation, the initial depositions can help an agency identify what documents are relevant and need to be requested in the first place.”

Corley’s court filing today shows that Musk didn’t even win his fight to be deposed remotely. He has instead agreed to sit for no more than five hours in person, which the SEC argued “will more easily allow for assessment of Musk’s demeanor and be more efficient as it avoids delays caused by technology.” (Last month, Musk gave a remote deposition during which the Internet connection cut in and out and he repeatedly dropped off the call.)

Musk’s deposition will be scheduled by mid-July. He is expected to testify on his Twitter stock purchases prior to his purchase of the platform, as well as his other investments surrounding the acquisition.

The SEC has been probing Musk’s Twitter stock purchases to determine if he violated a securities law that requires disclosures within 10 days from anyone who buys more than a 5 percent stake in a company. Musk missed that deadline by 11 days, as he amassed close to a 10 percent stake, and a proposed class action lawsuit from Twitter shareholders has suggested that he intentionally missed the deadline to keep Twitter stock prices artificially low while preparing for his Twitter purchase.

In an amended complaint filed this week, an Oklahoma firefighters pension fund—which sold more than 14,000 Twitter shares while Musk went on his buying spree—laid out Musk’s alleged scheme. The firefighters claim that the “goal” of Musk’s strategy was to purchase Twitter “cost effectively” and that this scheme was carried out by an unnamed Morgan Stanley banker who was motivated “to acquire billions of dollars of Twitter securities without tipping off the market” to curry favor with Musk.

Seemingly as a result, the firefighters’ complaint alleged, Morgan Stanley “pocketed over $1,460,000 in commissions just for executing” the “secret Twitter stock acquisition scheme.” And Morgan Stanley’s work apparently pleased Musk so much that he went back to the bank for financial advising on the Twitter deal, the complaint alleged, paying Morgan Stanley an “estimated $42 million in fees.”

Messages from the banker show he was determined to keep the trading “absofuckinglutely quiet” to avoid the prospect that “anyone sniff anything out.”

Because of this secrecy, Twitter “investors suffered enormous damages” when Musk “belatedly disclosed his Twitter interests,” and “the price of Twitter’s stock predictably skyrocketed,” the complaint said.

“Ultimately, Musk went from owning zero shares of Twitter stock as of January 28, 2022 to spending over $2.6 billion to secretly acquire over 70 million shares” on April 4, 2022, the complaint said.

Washing machine chime scandal shows how absurd YouTube copyright abuse can get

YouTube’s Content ID system—which automatically detects content registered by rightsholders—is “completely fucking broken,” a YouTuber called “Albino” declared in a rant on X (formerly Twitter) viewed more than 950,000 times.

Albino, who is also a popular Twitch streamer, complained that his YouTube video of a Fallout playthrough was demonetized because a Samsung washing machine randomly chimed to signal that a laundry cycle had finished while he was streaming.

Apparently, YouTube had automatically scanned Albino’s video and detected the washing machine chime as a song called “Done”—which Albino quickly saw was uploaded to YouTube by a musician known as Audego nine years ago.

But when Albino hit play on Audego’s song, the only thing that he heard was a 30-second clip of the washing machine chime. To Albino it was obvious that Audego didn’t have any rights to the jingle, which Dexerto reported actually comes from the song “Die Forelle” (“The Trout”) by Austrian composer Franz Schubert.

The song was composed in 1817 and is in the public domain. Samsung has used it to signal the end of a wash cycle for years, sparking debate over whether it’s the catchiest washing machine song and inspiring at least one violinist to perform a duet with her machine. It’s been a source of delight for many Samsung customers, but for Albino, hearing the jingle appropriated on YouTube only inspired ire.

“A guy recorded his fucking washing machine and uploaded it to YouTube with Content ID,” Albino said in a video on X. “And now I’m getting copyright claims” while “my money” is “going into the toilet and being given to this fucking slime.”

Albino suggested that YouTube had potentially allowed Audego to make invalid copyright claims for years without detecting the seemingly obvious abuse.

“How is this still here?” Albino asked. “It took me one Google search to figure this out,” and “now I’m sharing revenue with this? That’s insane.”

At first, Team YouTube gave Albino a boilerplate response on X, writing, “We understand how important it is for you. From your vid, it looks like you’ve recently submitted a dispute. When you dispute a Content ID claim, the person who claimed your video (the claimant) is notified and they have 30 days to respond.”

Albino expressed deep frustration at YouTube’s response, given how “egregious” he considered the copyright abuse to be.

“Just wait for the person blatantly stealing copyrighted material to respond,” Albino responded to YouTube. “Ah okay, yes, I’m sure they did this in good faith and will make the correct call, though it would be a shame if they simply clicked ‘reject dispute,’ took all the ad revenue money and forced me to risk having my channel terminated to appeal it!! XDxXDdxD!! Thanks Team YouTube!”

Soon after, YouTube confirmed on X that Audego’s copyright claim was indeed invalid. The video platform ultimately released the claim and told Albino to expect the changes to be reflected on his channel within two business days.

Ars could not immediately reach YouTube or Albino for comment.

Widespread abuse of Content ID continues

YouTubers have complained about abuse of Content ID for years. Techdirt’s Timothy Geigner agreed with Albino’s assessment that the YouTube system is “hopelessly broken,” noting that sometimes content is flagged by mistake. But just as easily, bad actors can abuse the system to claim “content that simply isn’t theirs” and seize what can sometimes amount to millions in ad revenue.

In 2021, YouTube announced that it had invested “hundreds of millions of dollars” to create content management tools, of which Content ID quickly emerged as the platform’s go-to solution to detect and remove copyrighted materials.

At that time, YouTube claimed that Content ID was created as a “solution for those with the most complex rights management needs,” like movie studios and record labels whose movie clips and songs are most commonly uploaded by YouTube users. YouTube warned that without Content ID, “rightsholders could have their rights impaired and lawful expression could be inappropriately impacted.”

Since its rollout, more than 99 percent of copyright actions on YouTube have consistently been triggered automatically through Content ID.

And just as consistently, YouTube has seen widespread abuse of Content ID, terminating “tens of thousands of accounts each year that attempt to abuse our copyright tools,” YouTube said. YouTube also acknowledged in 2021 that “just one invalid reference file in Content ID can impact thousands of videos and users, stripping them of monetization or blocking them altogether.”

To help rightsholders and creators track how much copyrighted content is removed from the platform, YouTube started releasing biannual transparency reports in 2021. The Electronic Frontier Foundation (EFF), a nonprofit digital rights group, applauded YouTube’s “move towards transparency” while criticizing YouTube’s “claim that YouTube is adequately protecting its creators.”

“That rings hollow,” EFF reported in 2021, noting that “huge conglomerates have consistently pushed for more and more restrictions on the use of copyrighted material, at the expense of fair use and, as a result, free expression.” As EFF saw it then, YouTube’s Content ID system mainly served to appease record labels and movie studios, while creators felt “pressured” not to dispute Content ID claims out of “fear” that their channel might be removed if YouTube consistently sided with rights holders.

According to YouTube, “it’s impossible for matching technology to take into account complex legal considerations like fair use or fair dealing,” and that impossibility seemingly ensures that creators bear the brunt of automated actions even when it’s fair to use copyrighted materials.

At that time, YouTube described Content ID as “an entirely new revenue stream from ad-supported, user generated content” for rights holders, who made more than $5.5 billion from Content ID matches by December 2020. More recently, YouTube reported that figure had climbed above $9 billion as of December 2022. With so much money at play, it’s easy to see how the system could be seen as disproportionately favoring rights holders, while creators continue to suffer from income diverted by the automated system.

As bird flu spreads in cows, US close to funding Moderna’s mRNA H5 vaccine

New jab —

If trials are successful, US government likely to buy doses for vaccine stockpile.

Testing for bird flu, conceptual image

The US government is nearing an agreement to bankroll a late-stage trial of Moderna’s mRNA pandemic bird flu vaccine, hoping to bolster its pandemic jab stockpile as an H5N1 outbreak spreads through egg farms and among cattle herds.

The federal funding from the government’s Biomedical Advanced Research and Development Authority, known as BARDA, could come as early as next month, according to people close to the discussions.

It is expected to total several tens of millions of dollars and could be accompanied by a commitment to procure doses if the phase-three trials are successful, they said.

Talks between the government and Pfizer over supporting the development of its mRNA vaccine targeting the H5 family of viruses are also ongoing. Pfizer, like Moderna, played a pivotal role in supplying mRNA vaccines for Washington’s jab rollout during the COVID-19 pandemic.

Bird flu has been detected on poultry farms in 48 states and in dairy cow herds across nine states as part of one of the worst outbreaks in recent history, according to the US Centers for Disease Control and Prevention. The CDC has also reported two cases affecting dairy workers in recent months, adding to concerns of the virus spreading in human populations.

US health authorities continue to classify the public health risk from bird flu as low, but their efforts to build up and diversify the pandemic vaccine stockpile have gathered pace. Federal health officials said last week that the government was moving ahead with plans to fill 4.8 million vials from its existing portfolio of protein-based bird flu vaccines and was in discussions with Moderna and Pfizer.

The possibility of contributing to the US pandemic vaccine stockpile also represents a commercial opportunity for the mRNA vaccine makers, whose market valuations have fallen significantly from pandemic highs. Moderna’s share price is up nearly 37 percent since the start of April.

Moderna has completed dosing of a mid-stage trial of its H5 pandemic flu vaccine, with interim data expected soon. Pfizer said in a statement on Wednesday that it “would be prepared to deploy the company’s capabilities to develop a vaccine for strategic stockpiles,” confirming that it had launched a phase-one trial for a pandemic flu vaccine last December.

Applications for BARDA grant funding for an mRNA-based pandemic flu vaccine closed in December last year, according to a project proposal seen by the Financial Times. But the bird flu outbreak has increased the urgency of talks, with federal officials acknowledging that the speed with which mRNA vaccines were designed and deployed during the COVID-19 pandemic showed their value compared with more traditional vaccine technology.

The jabs from GSK, Sanofi, and CSL Seqirus, which make up the US government’s existing pandemic vaccine portfolio, provide immunity to the current strain of bird flu, according to laboratory testing, but rely on a more time-intensive manufacturing process using egg- and cell-based cultures.

The US health department, Moderna, and Pfizer declined to comment on the potential funding.

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Used Teslas are getting very cheap, but buying one can be risky

how many miles? —

As used Teslas drop in price, are they a bargain or buyer beware?

Used Tesla Model 3s can be had for less than $20,000 now.

The launch of a new electric vehicle these days is invariably met with a chorus of “this car is too expensive”—and rightfully so. But for used EVs, it’s quite another story, particularly used Teslas, thanks to a glut of former fleet and rental cars that are now ready for their second owner.

“Due to a variety of reasons, Tesla resale values have plummeted, making many Tesla models very affordable now. Plus, for some consumers, an additional $4,000 Federal tax credit on used EVs may apply, sweetening the deal even further. Buying a used Tesla can be a great deal for the savvy shopper, but there are significant things to look out for,” says Ed Kim, president and chief analyst at AutoPacific.

Indeed, a quick search on the topic easily reveals some horror stories of ex-rental Teslas, so here are some things to consider if you’re in search of a cheap Model 3 or Model Y.

For more than a year, Tesla has been engaged in an EV price war, mostly driven by its attempt to maintain sales in China. Heavily cutting the price of your new cars is a good way to devalue the used ones, and Hertz’s decision to sell at least 20,000 of its Teslas was in part a response to the lower residual values.

What to watch for

“The prices are very appealing, but shoppers must keep in mind that rental cars can and do get abused, and some of these ex-rental units may have nasty surprises stemming from their hard lives. Be sure to have yours checked out thoroughly by a mechanic before buying,” Kim says.

Mismatched tires and minor dents, scrapes, and rock chips are fairly common minor issues. Many of the Teslas that Hertz is selling have been used as Ubers—you can tell it’s one of these if the odometer is approaching 100,000 miles. Battery degradation could be an issue, although most cars will not have lost more than 4–5 percent capacity, and Long Range Teslas should have a powertrain warranty for up to 120,000 miles (or eight years).

“One side effect of Tesla’s widespread and reliable DC fast charging network is that many owners end up relying on it to keep their cars charged rather than dealing with the often considerable expense of installing a home charger and associated home electrical upgrades,” Kim told Ars. As such, you should make sure to check the battery’s health (which can be done on the touchscreen or as part of the inspection) before you buy.

Rental cars can suffer from an excess of slammed doors and trunks—slamming the latter can mess up the powered strut. In the interior, you should expect heavy wear on some touchpoints, especially the steering wheel and the rear door cards, which can bubble or flake, particularly if the Tesla was used as a ride-hailing vehicle.

Other potential headaches

Teslas are very connected cars, and many of their convenience features are accessed via smartphone apps. But that requires that Tesla’s database shows you as the car’s owner, and there are plenty of reports online that transferring ownership from Hertz can take time.

Unfortunately, this also leaves the car stuck in Chill driving mode (which restricts power, acceleration, and top speed) and places some car settings outside of the new owner’s level of access. You also won’t be able to use Tesla Superchargers while the car still shows up as belonging to Hertz. Based on forum reports, contacting Tesla directly is the way to resolve this, but it can take several days to process; longer if there’s a paperwork mismatch.

Once you’ve transferred ownership to Tesla’s satisfaction, it’s time to do a software reset on the car to remove the fleet version of the software.

Not every car will qualify for the $4,000 IRS used clean vehicle tax credit. It has to be at least two model years older than the calendar year in which it is bought used, so only MY2022 and earlier EVs are currently eligible, and it can’t be offered for sale for more than $25,000. The income caps are also half as much as the new clean vehicle tax credit, meaning a single-filing individual can’t earn more than $75,000 a year to qualify.
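
As a rough summary of those conditions, here is a sketch in C using only the figures mentioned above; the function name is hypothetical, and the real IRS rules include further requirements not modeled here.

#include <stdbool.h>
#include <stdio.h>

/* Rough eligibility sketch for the used clean vehicle credit, based only on the
 * rules cited in this article; not a complete statement of the IRS criteria. */
static bool used_ev_credit_eligible(int model_year, int purchase_year,
                                    double sale_price, double single_filer_income)
{
    if (purchase_year - model_year < 2)
        return false;            /* must be at least two model years older */
    if (sale_price > 25000.0)
        return false;            /* sale price cap */
    if (single_filer_income > 75000.0)
        return false;            /* income cap for a single filer */
    return true;
}

int main(void)
{
    /* Example: a 2022 car bought in 2024 for $19,500 by someone earning $70,000. */
    printf("%s\n", used_ev_credit_eligible(2022, 2024, 19500.0, 70000.0)
                       ? "eligible" : "not eligible");
    return 0;
}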

There are plenty of complaints among the Tesla community that Hertz wasn’t set up to deal with the tax credit, although more recent buyers have reported this has gotten a lot smoother. It’s worth planning ahead and contacting the specific sales branch you plan to buy the Tesla from to make sure they are able to process that paperwork, particularly if you are expecting the credit to be applied to the car’s price at the point of sale rather than waiting until you file your taxes next year.

Buying an ex-rental or ex-fleet Tesla from an independent dealer is also an option. Lots of used car lots have bought Teslas at auction from Hertz and elsewhere, and online anecdotes suggest this is often a more painless experience, particularly when transferring ownership and registering the new owner with Tesla. Then again, you’re more likely to encounter useless third-party warranties and the like if you go this route.

Ex-rental or fleet cars may have had a hard life, but they are also usually maintained far more regularly than most privately owned vehicles. As long as you make sure you aren’t buying a lemon, it’s a good way to get an EV for less than $20,000.

OpenAI: Fallout

Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson

We have learned more since last week. It’s worse than we knew.

How much worse? In which ways? With what exceptions?

That’s what this post is about.

For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and with the loss of the ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non-disparagement (and non-interference) clauses, with the NDA preventing anyone from revealing the existence of these clauses.

No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out.

Here is Altman’s statement from May 18, with its new community note.

Evidence strongly suggests the above post was, shall we say, ‘not consistently candid.’

The linked article includes a document dump and other revelations, which I cover.

Then there are the other recent matters.

Ilya Sutskever and Jan Leike, the top two safety researchers at OpenAI, resigned, part of an ongoing pattern of top safety researchers leaving OpenAI. The team they led, Superalignment, had been publicly promised 20% of secured compute going forward, but that commitment was not honored. Jan Leike expressed concerns that OpenAI was not on track to be ready for the safety needs of even the next generation of models.

OpenAI created the Sky voice for GPT-4o, which evoked consistent reactions that it sounded like Scarlett Johansson, who voiced the AI in Her, Altman’s favorite movie. Altman asked her twice to lend her voice to ChatGPT. Altman tweeted ‘her.’ Half the articles about GPT-4o mentioned Her as a model. OpenAI executives continue to claim that this was all a coincidence, but have taken down the Sky voice.

(Also six months ago the board tried to fire Sam Altman and failed, and all that.)

The source for the documents from OpenAI that are discussed here, and the communications between OpenAI and its employees and ex-employees, is Kelsey Piper in Vox, unless otherwise stated.

She went above and beyond, and shares screenshots of the documents. For superior readability and searchability, I have converted those images to text.

OpenAI has indeed made a large positive step. They say they are releasing former employees from their nondisparagement agreements and promising not to cancel vested equity under any circumstances.

Kelsey Piper: There are some positive signs that change is happening at OpenAI. The company told me, “We are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations.”

Bloomberg confirms that OpenAI has promised not to cancel vested equity under any circumstances, and to release all employees from one-directional non-disparagement agreements.

And we have this confirmation from Andrew Carr.

Andrew Carr: I guess that settles that.

Tanner Lund: Is this legally binding?

Andrew Carr:

I notice they are also including the non-solicitation provisions as not enforced.

(Note that certain key people, like Dario Amodei, plausibly negotiated two-way agreements, which would mean theirs would still apply. I would encourage anyone in that category who is now free of the clause, even if they have no desire to disparage OpenAI, to simply say ‘I am under no legal obligation not to disparage OpenAI.’)

These actions by OpenAI are helpful. They are necessary.

They are not sufficient.

First, the statement is not legally binding, as I understand it, without execution of a new agreement. No consideration was given, and this is not so formal, and it is unclear whether the statement author has authority in the matter.

Even if it was binding as written, it says they do not ‘intend’ to enforce. Companies can change their minds, or claim to change them, when circumstances change.

It also does not mention the ace in the hole, which is the ability to deny access to tender offers, or other potential retaliation by Altman or OpenAI. Until an employee has fully sold their equity, they are still in a bind. Even afterwards, a company with this reputation cannot be trusted to not find other ways to retaliate.

Nor does it mention the clause that OpenAI claims gives it the right to repurchase shares for ‘fair market value,’ noting that their official ‘fair market value’ of shares is $0. Altman’s statement does not mention this at all, including the possibility it has already happened.

I mean, yeah, I also would in many senses like to see them try that one, but this does not give ex-employees much comfort.

A source of Kelsey Piper’s close to OpenAI: [Those] documents are supposed to be putting the mission of building safe and beneficial AGI first but instead they set up multiple ways to retaliate against departing employees who speak in any way that criticizes the company.

Then there is the problem of taking responsibility. OpenAI is at best downplaying what happened. Certain statements sure look like lies. To fully set things right, one must admit responsibility. Truth and reconciliation requires truth.

Here is Kelsey with the polite version.

Kelsey Piper: But to my mind, setting this right requires admitting its full scope and accepting full responsibility. OpenAI’s initial apology implied that the problem was just ‘language in exit documents’. Our leaked docs prove there was a lot more going on than just that.

OpenAI used many different aggressive legal tactics and has not yet promised to stop using all of them. And serious questions remain about how OpenAI’s senior leadership missed this while signing documents that contained language that laid it out. The company’s apologies so far have minimized the scale of what happened. In order to set this right, OpenAI will need to first admit how extensive it was.

If I were an ex-employee, no matter what else I would do, I would absolutely sell my equity at the next available tender opportunity. Why risk it?

Indeed, here is a great explanation of the practical questions at play. If you want to fully make it right, and give employees felt freedom to speak up, you have to mean it.

Jacob Hilton: When I left OpenAI a little over a year ago, I signed a non-disparagement agreement, with non-disclosure about the agreement itself, for no other reason than to avoid losing my vested equity.

The agreement was unambiguous that in return for signing, I was being allowed to keep my vested equity, and offered nothing more. I do not see why anyone would have signed it if they had thought it would have no impact on their equity.

I left OpenAI on great terms, so I assume this agreement was imposed upon almost all departing employees. I had no intention to criticize OpenAI before I signed the agreement, but was nevertheless disappointed to give up my right to do so.

Yesterday, OpenAI reached out to me to release me from this agreement, following Kelsey Piper’s excellent investigative reporting.

Because of the transformative potential of AI, it is imperative for major labs developing advanced AI to provide protections for those who wish to speak out in the public interest.

First among those is a binding commitment to non-retaliation. Even now, OpenAI can prevent employees from selling their equity, rendering it effectively worthless for an unknown period of time.

In a statement, OpenAI has said, “Historically, former employees have been eligible to sell at the same price regardless of where they work; we don’t expect that to change.”

I believe that OpenAI has honest intentions with this statement. But given that OpenAI has previously used access to liquidity as an intimidation tactic, many former employees will still feel scared to speak out.

I invite OpenAI to reach out directly to former employees to clarify that they will always be provided equal access to liquidity, in a legally enforceable way. Until they do this, the public should not expect candor from former employees.

To the many kind and brilliant people at OpenAI: I hope you can understand why I feel the need to speak publicly about this. This contract was inconsistent with our shared commitment to safe and beneficial AI, and you deserve better.

Jacob Hilton is giving every benefit of the doubt to OpenAI here. Yet he notices that the chilling effects will be large.

Jeremy Schlatter: I signed a severance agreement when I left OpenAI in 2017. In retrospect, I wish I had not signed it.

I’m posting this because there has been coverage of OpenAI severance agreements recently, and I wanted to add my perspective.

I don’t mean to imply that my situation is the same as those in recent coverage. For example, I worked at OpenAI while it was still exclusively a non-profit, so I had no equity to lose.

Was this an own goal? Kelsey initially thought it was; then a former employee explained why the situation is not so clear-cut.

Kelsey Piper: Really speaks to how profoundly the “ultra restrictive secret NDA or lose your equity” agreement was an own goal for OpenAI – I would say a solid majority of the former employees affected did not even want to criticize the company, until it threatened their compensation.

A former employee reached out to me to push back on this. It’s true that most don’t want to criticize the company even without the NDA, they told me, but not because they have no complaints – because they fear even raising trivial ones.

“I’ve heard from former colleagues that they are reluctant to even discuss OpenAI’s model performance in a negative way publicly, for fear of being excluded from future tenders.”

Speaks to the importance of the further steps Jacob talks about.

There are big advantages to being generally seen as highly vindictive, as a bad actor willing to do bad things if you do not get your way. Often that causes people to proactively give you what you want and avoid threatening your interests, with no need to do anything explicit. Many think this is how one gets power, and that one should side with power and with those who act in such fashion.

There also is quite a lot of value in controlling the narrative, and having leverage over those close to you, that people look to for evidence, and keeping that invisible.

What looks like a mistake could be a well-considered strategy, and perhaps quite a good bet. Most companies that use such agreements do not have them revealed. If it was not for Daniel, would not the strategy still be working today?

And to state the obvious: If Sam Altman and OpenAI lacked any such leverage in November, and everyone had been free to speak their minds, does it not seem plausible (or if you include the board, rather probable) that the board’s firing of Altman would have stuck without destroying the company, as ex-employees (and board members) revealed ways in which Altman had been ‘not consistently candid’?

Oh my.

Neel Nanda (referencing Hilton’s thread): I can’t believe that OpenAI didn’t offer *any* payment for signing the non-disparage, just threats…

This makes it even clearer that Altman’s claims of ignorance were lies – he cannot possibly have believed that former employees unanimously signed non-disparagements for free!

Kelsey Piper: One of the most surreal moments of my life was reading through the termination contract and seeing…

The Termination Contract: NOW, THEREFORE, in consideration of the mutual covenants and promises herein contained and other good and valuable consideration, receipt of which is hereby acknowledged, and to avoid unnecessary litigation, it is hereby agreed by and between OpenAI and Employee (jointly referred to as “the Parties”) as follows:

  1. In consideration for this Agreement:

    1. Employee will retain all equity Units, if any, vested as of the Termination Date pursuant to the terms of the applicable Unit Grant Agreements.

  2. Employee agrees that the foregoing shall constitute an accord and satisfaction and a full and complete settlement of Employee’s claims, shall constitute the entire amount of monetary consideration, including any equity component (if applicable), provided to Employee under this Agreement, and that Employee will not seek any further compensation for any other claimed damage, outstanding obligations, costs or attorneys’ fees in connection with the matters encompassed in this Agreement… [continues]

Neel Nanda: Wow, I didn’t realise it was that explicit in the contract! How on earth did OpenAI think they were going to get away with this level of bullshit? Offering something like, idk, 1-2 months of base salary would have been cheap and made it a LITTLE bit less outrageous.

It does not get more explicit than that.

I do appreciate the bluntness and honesty here, of skipping the nominal consideration.

What looks the most implausible are claims that the executives did not know what was going on regarding the exit agreements and legal tactics until February 2024.

Kelsey Piper: Vox reviewed separation letters from multiple employees who left the company over the last five years. These letters state that employees have to sign within 60 days to retain their vested equity. The letters are signed by former VP Diane Yoon and general counsel Jason Kwon.

The language on separation letters – which reads, “If you have any vested Units… you are required to sign a release of claims agreement within 60 days in order to retain such Units” – has been present since 2019.

OpenAI told me that the company noticed in February, putting Kwon, OpenAI’s general counsel and Chief Strategy Officer, in the unenviable position of insisting that for five years he missed a sentence in plain English on a one-page document he signed dozens of times.

Matthew Roche: This cannot be true.

I have been a tech CEO for years, and have never seen that in an option plan doc or employment letter. I find it extremely unlikely that some random lawyer threw it in without prompting or approval by the client.

Kelsey Piper: I’ve spoken to a handful of tech CEOs in the last few days and asked them all “could a clause like that be in your docs without your knowledge?” All of them said ‘no’.

Kelsey Piper’s Vox article is brutal on this, and brings the receipts. The ultra-restrictive NDA, with its very clear and explicit language of what is going on, is signed by COO Brad Lightcap. The notices that one must sign it are signed by (now departed) OpenAI VP of people Diane Yoon. The incorporation documents that include extraordinary clawback provisions are signed by Sam Altman.

There is also the question of how this language got into the exit agreements in the first place, and also the corporate documents, if the executives were not in the loop. This was not a ‘normal’ type of clause, the kind of thing lawyers sneak in without consulting you, even if you do not read the documents you are signing.

California employment law attorney Chambord Benton-Hayes: For a company to threaten to claw back already-vested equity is egregious and unusual.

Kelsey Piper on how she reported the story: Reporting is full of lots of tedious moments, but then there’s the occasional “whoa” moment. Reporting this story had three major moments of “whoa.” The first is when I reviewed an employee termination contract and saw it casually stating that as “consideration” for signing this super-strict agreement, the employee would get to keep their already vested equity. That might not mean much to people outside the tech world, but I knew that it meant OpenAI had crossed a line many in tech consider close to sacred.

The second “whoa” moment was when I reviewed the second termination agreement sent to one ex-employee who’d challenged the legality of OpenAI’s scheme. The company, rather than defending the legality of its approach, had just jumped ship to a new approach.

That led to the third “whoa” moment. I read through the incorporation document that the company cited as the reason it had the authority to do this and confirmed that it did seem to give the company a lot of license to take back vested equity and block employees from selling it. So I scrolled down to the signature page, wondering who at OpenAI had set all this up. The page had three signatures. All three of them were Sam Altman. I slacked my boss on a Sunday night, “Can I call you briefly?”

OpenAI claims they noticed the problem in February, and began updating in April.

Kelsey Piper showed language of this type in documents as recently as April 29, 2024, signed by OpenAI COO Brad Lightcap.

The documents in question, presented as standard exit ‘release of claims’ documents that everyone signs, include extensive lifetime non-disparagement clauses, an NDA that covers revealing the existence of either the NDA or the non-disparagement clause, and a non-interference clause.

Kelsey Piper: Leaked emails reveal that when ex-employees objected to the specific terms of the ‘release of claims’ agreement, and asked to sign a ‘release of claims’ agreement without the nondisclosure and secrecy clauses, OpenAI lawyers refused.

Departing Employee Email: I understand my contractual obligations to maintain confidential information and trade secrets. I would like to assure you that I have no intention and have never had any intention of sharing trade secrets with OpenAI competitors.

I would be willing to sign the termination paperwork documents except for the current form of the general release that I was sent in 2024.

I object to clauses 10, 11 and 14 of the general release. I would be willing to sign a version of the general release which excludes those clauses. I believe those clauses are not in my interest to sign, and do not understand why they have to be part of the agreement given my existing obligations that you outlined in your letter.

I would appreciate it if you could send a copy of the paperwork with the general release amended to exclude those clauses.

Thank you,

[Quoted text hidden]

OpenAI Replies: I’m here if you ever want to talk. These are the terms that everyone agrees to (again — this is not targeted at you). Of course, you’re free to not sign. Please let me know if you change your mind and want to sign the version we’ve already provided.

Best,

[Quoted text hidden]

Here is what it looked like for someone to finally decline to sign.

Departing Employee: I’ve looked this over and thought about it for a while and have decided to decline to sign. As previously mentioned, I want to reserve the right to criticize OpenAI in service of the public good and OpenAI’s own mission, and signing this document appears to limit my ability to do so. I certainly don’t intend to say anything false, but it seems to me that I’m currently being asked to sign away various rights in return for being allowed to keep my vested equity. It’s a lot of money, and an unfair choice to have to make, but I value my right to constructively criticize OpenAI more. I appreciate your warmth towards me in the exit interview and continued engagement with me thereafter, and wish you the best going forward.

Thanks,

P.S. I understand your position is that this is standard business practice, but that doesn’t sound right, and I really think a company building something anywhere near as powerful as AGI should hold itself to a higher standard than this – that is, it should aim to be genuinely worthy of public trust. One pillar of that worthiness is transparency, which you could partially achieve by allowing employees and former employees to speak out instead of using access to vested equity to shut down dissenting concerns.

OpenAI HR responds: Hope you had a good weekend, thanks for your response.

Please remember that the confidentiality agreement you signed at the start of your employment (and that we discussed in our last sync) remains in effect regardless of the signing of the offboarding documents.

We appreciate your contributions and wish you the best in your future endeavors. If you have any further questions or need clarification, feel free to reach out.

OpenAI HR then responds (May 17th, 2:56 pm, after this blew up): Apologies for some potential ambiguity in my last message!

I understand that you may have some questions about the status of your vested profit units now that you have left OpenAI. I want to be clear that your vested equity is in your Shareworks account, and you are not required to sign your exit paperwork to retain the equity. We have updated our exit paperwork to make this point clear.

Please let me know if you have any questions.

Best, [redacted]

Some potential ambiguity, huh. What a nice way of putting it.

Even if we accepted on its face the claim that this was unintentional and unknown to management until February, which I find highly implausible at best, that is no excuse.

Jason Kwon (OpenAI Chief Strategy Officer): The team did catch this ~month ago. The fact that it went this long before the catch is on me.

Again, even if you are somehow telling the truth here, what about after the catch?

Two months is more than enough time to stop using these pressure tactics, and to offer ‘clarification’ to employees. I would think it was also more than enough time to update the documents in question, if OpenAI intended to do that.

They only acknowledged the issue, and only stopped continuing to act this way, after the reporting broke. After that, the ‘clarifications’ came quickly. Then, as far as we can tell, the actually executed new agreements and binding contracts will come never. Does never work for you?

Here we have OpenAI’s lawyer refusing to extend a unilaterally imposed seven day deadline to sign the exit documents, discouraging the ex-employee from consulting with an attorney.

Kelsey Piper: Legal experts I spoke to for this story expressed concerns about the professional ethics implications of OpenAI’s lawyers persuading employees who asked for more time to seek outside counsel to instead “chat live to cover your questions” with OpenAI’s own attorneys.

Reply Email from Lawyer for OpenAI to a Departing Employee: You mentioned wanting some guidance on the implications of the release agreement. To reiterate what [redacted] shared- I think it would be helpful to chat live to cover your questions. All employees sign these exit docs. We are not attempting to do anything different or special to you simply because you went to a competitor. We want to make sure you understand that if you don’t sign, it could impact your equity. That’s true for everyone, and we’re just doing things by the book.

Best regards, [redacted].

Kelsey Piper: (The person who wrote and signed the above email is, according to the state bar association of California, a licensed attorney admitted to the state bar.)

To be clear, here was the request which got this response:

Original Email: Hi [redacted].

Sorry to be a bother about this again but would it be possible to have another week to look over the paperwork, giving me the two weeks I originally requested? I still feel like I don’t fully understand the implications of the agreement without obtaining my own legal advice, and as I’ve never had to find legal advice before this has taken time for me to obtain.

Kelsey Piper: The employee did not ask for ‘guidance’! The employee asked for time to get his own representation!

Leah Libresco Sargeant: Not. Consistently. Candid. OpenAI not only threatened to strip departing employees of equity if they didn’t sign an over broad NDA, they offered these terms as an exploding 7-day termination contract.

This was not a misunderstanding.

Kelsey Piper has done excellent work, and kudos to her sources for speaking up.

If you can’t be trusted with basic employment ethics and law, how can you steward AI?

I had the opportunity to talk to someone whose job involves writing up and executing employment agreements of the type used here by OpenAI. They reached out, before knowing about Kelsey Piper’s article, specifically because they wanted to make the case that what OpenAI did was mostly standard practice. They generally attempted, prior to reading that article, to make the claim that what OpenAI did was within the realm of acceptable practice. If you get equity you should expect to sign a non-disparagement clause, and they explicitly said they would be surprised if Anthropic was not doing it as well.

They did not, however, think that OpenAI then interpreting ‘release of claims’ to mean ‘you can never say anything bad about us ever for any reason or tell anyone that you agreed to this’ was also fair game.

Their argument was that if you sign something like that without talking to a lawyer first, that is on you. You have opened the door to any clause. Never mind what happens when you raise objections and consult lawyers during onboarding at a place like OpenAI; it would be unheard of for a company to treat that as a red flag or rescind your offer.

That is very much a corporate lawyer’s view of what is wise and unwise paranoia, and what is and is not acceptable practice.

Even that lawyer said that a 7-day exploding period was highly unusual, and that it was seriously not fine. A 21-day exploding period is not atypical for an exploding contract in general, but that gives time for a lawyer to be consulted. Confining it to a week is seriously messed up.

It also is not what the original contract said, which was that you had 60 days. As Kelsey Piper points out, no, you cannot spring a 7-day period on someone when the original contract said 60.

Nor was it a threat they honored when called on it; they always extended, with this as an example:

From OpenAI: The General Release and Separation Agreement requires your signature within 7 days from your notification date. The 7 days stated in the General Release supersedes the 60 day signature timeline noted in your separation letter.

That being said, in this case, we will grant an exception for an additional week to review. I’ll cancel the existing Ironclad paperwork, and re-issue it to you with the new date.

Best.

[Redacted at OpenAI.com]

Eliezer Yudkowsky: 😬

And they very clearly tried to discourage ex-employees from consulting a lawyer.

Even if all of it is technically legal, there is no version of this that isn’t scummy as hell.

Control over tender offers means that ultimately anyone with OpenAI equity, who wants to use that equity for anything any time soon (or before AGI comes around) is going to need OpenAI’s permission. OpenAI very intentionally makes that conditional, and holds it over everyone as a threat.

When employees pushed back on the threat to cancel their equity, Kelsey Piper reports that OpenAI instead changed to threatening to withhold participation in future tenders. Without participation in tenders, shares cannot be sold, making them of limited practical value. OpenAI is unlikely to pay dividends for a long time.

If you have any vested Units and you do not sign the exit documents, including the General Release, as required by company policy, it is important to understand that, among other things, you will not be eligible to participate in future tender events or other liquidity opportunities that we may sponsor or facilitate as a private company.

Among other things, a condition to participate in such opportunities is that you are in compliance with the LLC Agreement, the Aestas LLC Agreement, the Unit Grant Agreement and all applicable company policies, as determined by OpenAI.

In other words, if you ever violate any ‘applicable company policies,’ or realistically if you do anything we sufficiently dislike, or we want to retain our leverage over you, we won’t let you sell your shares.

This makes sense, given the original threat is on shaky legal ground and actually invoking it would give the game away even if OpenAI won.

Kelsey Piper: OpenAI’s original tactic – claiming that since you have to sign a general release, they can put whatever they want in the general release – is on legally shaky ground, to put it mildly. I spoke to five legal experts for this story and several were skeptical it would hold up.

But the new tactic might be on more solid legal ground. That’s because the incorporation documents for Aestas LLC – the holding company that handles equity for employees, investors, + the OpenAI nonprofit entity – are written to give OpenAI extraordinary latitude. (Vox has released this document too.)

And while Altman did not sign the termination agreements, he did sign the Aestas LLC documents that lay out this secondary legal avenue to coerce ex-employees. Altman has said that language about potentially clawing back vested equity from former employees “should never have been something we had in any documents or communication”.

No matter what other leverage they are giving up under pressure, the ace stays put.

Kelsey Piper: I asked OpenAI if they were willing to commit that no one will be denied access to tender offers because of failing to sign an NDA. The company said “Historically, former employees have been eligible to sell at the same price regardless of where they work; we don’t expect that to change.”

‘Regardless of where they work’ is very much not ‘regardless of what they have signed’ or ‘whether they are playing nice with OpenAI.’ If they wanted to send a different impression, they could have done that.

David Manheim: Question for Sam Altman: Does OpenAI have non-disparagement agreements with board members or former board members?

If so, is Sam Altman willing to publicly release the text of any such agreements?

The answer to that is, presumably, the article in the Economist by Helen Toner and Tasha McCauley, former OpenAI board members. Helen says they mostly wrote this before the events of the last few weeks, which checks with what I know about deadlines.

The content is not the friendliest, but unfortunately, even now, the statements continue to be non-specific. Toner and McCauley sure seem like they are holding back.

The board’s ability to uphold the company’s mission had become increasingly constrained due to long-standing patterns of behaviour exhibited by Mr Altman, which, among other things, we believe undermined the board’s oversight of key decisions and internal safety protocols.

Multiple senior leaders had privately shared grave concerns with the board, saying they believed that Mr Altman cultivated “a toxic culture of lying” and engaged in “behaviour [that] can be characterised as psychological abuse”.

The question of whether such behaviour should generally “mandate removal” of a CEO is a discussion for another time. But in OpenAI’s specific case, given the board’s duty to provide independent oversight and protect the company’s public-interest mission, we stand by the board’s action to dismiss Mr Altman.

Our particular story offers the broader lesson that society must not let the roll-out of AI be controlled solely by private tech companies.

We also know they are holding back because there are specific things we can be confident happened that informed the board’s actions, that are not mentioned here. For details, see my previous write-ups of what happened.

To state the obvious, if you stand by your decision to remove Altman, you should not allow him to return. When that happened, you were two of the four board members.

It is certainly a reasonable position to say that the reaction to Altman’s removal, given the way it was handled, meant that the decision to attempt to remove him was in error. Do not come at the king if you are going to miss, or the damage to the kingdom would be too great.

But then you don’t stand by it. What one could reasonably say is, if we still had the old board, and all of this new information came to light on top of what was already known, and there was no pending tender offer, and you had your communications ducks in a row, then you would absolutely fire Altman.

Indeed, it would be a highly reasonable decision, now, for the new board to fire Altman a second time based on all this, with better communications and its new gravitas. That is now up to the new board.

OpenAI famously promised 20% of its currently secured compute for its superalignment efforts. That was not a lot of their expected compute budget given growth in compute, but it sounded damn good, and was substantial in practice.

Fortune magazine reports that OpenAI never delivered the promised compute.

This is a big deal.

OpenAI made one loud, costly and highly public explicit commitment to real safety.

That promise was a lie.

You could argue that ‘the claim was subject to interpretation’ in terms of what 20% meant or that it was free to mostly be given out in year four, but I think this is Obvious Nonsense.

It was very clearly either within their power to honor that commitment, or they knew at the time of the commitment that they could not honor it.

OpenAI has not admitted that they did this, offered an explanation, or promised to make it right. They have provided no alternative means of working towards the goal.

This was certainly one topic on which Sam Altman was, shall we say, ‘not consistently candid.’

Indeed, we now know many things the board could have pointed to on that, in addition to any issues involving Altman’s attempts to take control of the board.

This is a consistent pattern of deception.

The obvious question is: Why? Why make a commitment like this then dishonor it?

Who is going to be impressed by the initial statement, and not then realize what happened when you broke the deal?

Kelsey Piper: It seems genuinely bizarre to me to make a public commitment that you’ll offer 20% of compute to Superalignment and then not do it. It’s not a good public commitment from a PR perspective – the only people who care at all are insiders who will totally check if you follow through.

It’s just an unforced error to make the promise at all if you might not wanna actually do it. Without the promise, “we didn’t get enough compute” sounds like normal intra-company rivalry over priorities, which no one else cares about.

Andrew Rettek: this makes sense if the promiser expects the non disparagement agreement to work…

Kelsey Piper (other subthread): Right, but “make a promise, refuse to clarify what you mean by it, don’t actually do it under any reasonable interpretations” seems like a bad plan regardless. I guess maybe they hoped to get people to shut up for three years hoping the compute would come in the fourth?

Indeed, if you think no one can check or will find out, then it could be a good move. You make promises you can’t keep, then alter the deal and tell people to pray you do not alter it any further.

That’s why all the legal restrictions on talking are so important. Not this fact in particular, but that one’s actions and communications change radically when you believe you can bully everyone into not talking.

Even Roon, he of ‘Sam Altman did nothing wrong’ in most contexts, realizes those NDA and non disparagement agreements are messed up.

Roon: NDAs that disallow you to mention the NDA seem like a powerful kind of antimemetic magic spell with dangerous properties for both parties.

That allow strange bubbles and energetic buildups that would otherwise not exist under the light of day.

Read closely, am I trying to excuse evil? I’m trying to root cause it.

It’s clear OpenAI fucked up massively, the mea culpas are warranted, I think they will make it right. There will be a lot of self reflection.

It is the last two sentences where we disagree. I sincerely hope I am wrong there.

Prerat: Everyone should have a canary page on their website that says “I’m not under a secret NDA that I can’t even mention exists” and then if you have to sign one you take down the page.

Stella Biderman: OpenAI is really good at coercing people into signing agreements and then banning them from talking about the agreement at all. I know many people in the OSS community that got bullied into signing such things as well, for example because they were the recipients of leaks.

The Washington Post reported a particular way they did not mess with Scarlett Johansson.

When OpenAI issued a casting call last May for a secret project to endow OpenAI’s popular ChatGPT with a human voice, the flier had several requests: The actors should be nonunion. They should sound between 25 and 45 years old. And their voices should be “warm, engaging [and] charismatic.”

One thing the artificial intelligence company didn’t request, according to interviews with multiple people involved in the process and documents shared by OpenAI in response to questions from The Washington Post: a clone of actress Scarlett Johansson.

The agent [for Sky], who spoke on the condition of anonymity, citing the safety of her client, said the actress confirmed that neither Johansson nor the movie “Her” were ever mentioned by OpenAI.

But Mark Humphrey, a partner and intellectual property lawyer at Mitchell, Silberberg and Knupp, said any potential jury probably would have to assess whether Sky’s voice is identifiable as Johansson’s.

To Jang, who spent countless hours listening to the actress and keeps in touch with the human actors behind the voices, Sky sounds nothing like Johansson, although the two share a breathiness and huskiness.

The story also has some details about ‘building the personality’ of ChatGPT for voice and hardcoding in some particular responses, such as if it was asked to be the user’s girlfriend.

Jang no doubt can differentiate Sky and Johansson under the ‘pictures of Joe Biden eating sandwiches’ rule, after spending months on this. Of course you can find differences. But to say that the two sound nothing alike is absurd, especially when so many people doubtless told her otherwise.

As I covered last time, if you do a casting call for 400 voice actors who are between 25 and 45, and pick the one most naturally similar to your target, that is already quite a lot of selection. No, they likely did not explicitly tell Sky’s voice actress to imitate anyone, and it is plausible she did not do it on her own either. Perhaps this really is her straight up natural voice. That doesn’t mean they didn’t look for and find a deeply similar voice.

Even if we take everyone in that post’s word for all of that, that would not mean, in the full context, that they are off the hook, based on my legal understanding, or my view of the ethics. I strongly disagree with those who say we ‘owe OpenAI an apology,’ unless at minimum we specifically accused OpenAI of the things OpenAI is reported as not doing.

Remember, in addition to all the ways we know OpenAI tried to get or evoke Scarlett Johansson, OpenAI had a policy explicitly saying that voices should be checked for similarity against major celebrities, and they have said highly implausible things repeatedly on this subject.

Gretchen Krueger resigned from OpenAI on May 14th, and thanks to OpenAI’s new policies, she can say some things. So she does, pointing out that OpenAI’s failures to take responsibility run the full gamut.

Gretchen Krueger: I gave my notice to OpenAI on May 14th. I admire and adore my teammates, feel the stakes of the work I am stepping away from, and my manager Miles Brundage has given me mentorship and opportunities of a lifetime here. This was not an easy decision to make.

I resigned a few hours before hearing the news about Ilya Sutskever and Jan Leike, and I made my decision independently. I share their concerns.

I also have additional and overlapping concerns.

We need to do more to improve foundational things like decision-making processes; accountability; transparency; documentation; policy enforcement; the care with which we use our own technology; and mitigations for impacts on inequality, rights, and the environment.

These concerns are important to people and communities now. They influence how aspects of the future can be charted, and by whom. I want to underline that these concerns as well as those shared by others should not be misread as narrow, speculative, or disconnected. They are not.

One of the ways tech companies in general can disempower those seeking to hold them accountable is to sow division among those raising concerns or challenging their power. I care deeply about preventing this.

I am grateful I have had the ability and support to do so, not least due to Daniel Kokotajlo’s courage. I appreciate that there are many people who are not as able to do so, across the industry.

There is still such important work being led at OpenAI, from work on democratic inputs, expanding access, preparedness framework development, confidence building measures, to work tackling the concerns I raised. I remain excited about and invested in this work and its success.

The responsibility issues extend well beyond superalignment.

A pattern in such situations is telling different stories to different people. Each of the stories is individually plausible, but they can’t live in the same world.

Ozzie Gooen explains the OpenAI version of this, here in EA Forum format (the below is a combination of both):

Ozzie Gooen: On OpenAI’s messaging:

Some arguments that OpenAI is making, simultaneously:

  1. OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).

  2. OpenAI cares a lot about safety (good for public PR and government regulations).

  3. OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).

  4. OpenAI doesn’t need to spend many resources on safety, and implementing safe AI won’t put it at any competitive disadvantage (important for investors who own most of the company).

  5. Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).

  6. People at OpenAI have thought long and hard about what will happen, and it will be fine.

  7. We can’t predict concretely what transformative AI will look like or what will happen after (Note: Any specific scenario they propose would upset a lot of people. Value hand-waving upsets fewer people).

  8. OpenAI can be held accountable to the public because it has a capable board of advisors overseeing Sam Altman (he said this explicitly in an interview).

  9. The previous board scuffle was a one-time random event that was a very minor deal.

  10. OpenAI has a nonprofit structure that provides an unusual focus on public welfare.

  11. The nonprofit structure of OpenAI won’t inconvenience its business prospects or shareholders in any way.

  12. The name “OpenAI,” which clearly comes from the early days when the mission was actually to make open-source AI, is an equally good name for where the company is now. (I don’t actually care about this, but find it telling that the company doubles down on arguing the name still is applicable).

So they need to simultaneously say:

  1. “We’re making something that will dominate the global economy and outperform humans at all capabilities, including military capabilities, but is not a threat.”

  2. “Our experimental work is highly safe, but in a way that won’t actually cost us anything.” “We’re sure that the long-term future of transformative change will be beneficial, even though none of us can know or outline specific details of what that might actually look like.”

  3. “We have a great board of advisors that provide accountability. Sure, a few months ago, the board tried to fire Sam, and Sam was able to overpower them within two weeks, but next time will be different.”

  4. “We have all of the benefits of being a nonprofit, but we don’t have any of the costs of being a nonprofit.”

Meta’s messaging is clearer: “AI development won’t get us to transformative AI, we don’t think that AI safety will make a difference, we’re just going to optimize for profitability.”

Anthropic’s messaging is a bit clearer. “We think that AI development is a huge deal and correspondingly scary, and we’re taking a costlier approach accordingly, though not too costly such that we’d be irrelevant.” This still requires a strange and narrow worldview to make sense, but it’s still more coherent.

But OpenAI’s messaging has turned into a particularly tangled mess of conflicting promises. It’s the kind of political strategy that can work for a while, especially if you can have most of your conversations in private, but is really hard to pull off when you’re highly public and facing multiple strong competitive pressures.

If I were a journalist interviewing Sam Altman, I’d try to spend as much of it as possible just pinning him down on these countervailing promises they’re making. Some types of questions I’d like him to answer would include:

  1. “Please lay out a specific, year-by-year, story of one specific scenario you can imagine in the next 20 years.”

  2. “You say that you care deeply about long-term AI safety. What percentage of your workforce is solely dedicated to long-term AI safety?”

  3. “You say that you think that globally safe AGI deployments require international coordination to go well. That coordination is happening slowly. Do your plans work conditional on international coordination failing? Explain what your plans would be.”

  4. “What do the current prediction markets and top academics say will happen as a result of OpenAI’s work? Which clusters of these agree with your expectations?”

  5. “Can you lay out any story at all for why we should now expect the board to do a decent job overseeing you?”

What Sam likes to do in interviews, like many public figures, is to shift specific questions into vague generalities and value statements. A great journalist would fight this, force him to say nothing but specifics, and then just have the interview end.


I think that reasonable readers should, and are, quickly learning to just stop listening to this messaging. Most organizational messaging is often dishonest but at least not self-rejecting. Sam’s been unusually good at seeming genuine, but at this point, the set of incoherent promises seems too baffling to take literally.

Instead, I think the thing to do is just ignore the noise. Look at the actual actions taken alone. And those actions seem pretty straightforward to me. OpenAI is taking the actions you’d expect from any conventional high-growth tech startup. From its actions, it comes across a lot like:

We think AI is a high-growth area that’s not actually that scary. It’s transformative in a way similar to Google and not the Industrial Revolution. We need to solely focus on developing a large moat (i.e. monopoly) in a competitive ecosystem, like other startups do.

OpenAI really seems almost exactly like a traditional high-growth tech startup now, to me. The main unusual things about it are the facts that: 

  1.  It’s in an area that some people (not the OpenAI management) think is unusually high-risk,

  2. Its messaging is unusually lofty and conflicting, even for a Silicon Valley startup, and

  3. It started out under an unusual nonprofit setup, which now barely seems relevant.

Ben Henry: Great post. I believe he also has said words to the effect of:

  1. Working on algorithmic improvements is good to prevent hardware overhang.

  2. We need to invest more in hardware.

A survey was done. You can judge for yourself whether or not this presentation was fair.

Thus, this question overestimates the impact, as it comes right after telling people such facts about OpenAI:

As usual, none of this means the public actually cares. ‘Increases the case for’ does not mean increases it enough to notice.

Individuals paying attention are often… less kind.

Here are some highlights.

Brian Merchant: “Open” AI is now a company that:

-keeps all of its training data and key operations secret

-forced employees to sign powerful NDAs or forfeit equity

-won’t say whether it trained its video generator on YouTube

-lies to movie stars then lies about the lies

“Open.” What a farce.

[links to two past articles of his discussing OpenAI unkindly.]

Ravi Parikh: If a company is caught doing multiple stupid & egregious things for very little gain

It probably means the underlying culture that produced these decisions is broken.

And there are dozens of other things you haven’t found out about yet.

Jonathan Mannhart (reacting primarily to the Scarlett Johansson incident, but centrally to the pattern of behavior): I’m calling it & ramping up my level of directness and anger (again):

OpenAI, as an organisation (and Sam Altman in particular) are often just lying. Obviously and consistently so.

This is incredible, because it’s absurdly stupid. And often clearly highly unethical.

Joe Weisenthal: I don’t have any real opinions on AI, AGI, OpenAI, etc. Gonna leave that to the experts.

But just from the outside, Sam Altman doesn’t ~seem~ like a guy who’s, you know, doing the new Manhattan Project. At least from the tweets, podcasts etc. Seems like a guy running a tech co.

Andrew Rettek: Everyone is looking at this in the context of AI safety, but it would be a huge story if any $80bn+ company was behaving this way.

Danny Page: This thread is important and drives home just how much the leadership at OpenAI loves to lie to employees and to the public at large when challenged.

Seth Burn: Just absolutely showing out this week. OpenAI is like one of those videogame bosses who looks human at first, but then is revealed to be a horrific monster after taking enough damage.

0.005 Seconds: Another notch in the “Altman lies like he breathes” column.

Ed Zitron: This is absolutely merciless, beautifully dedicated reporting, OpenAI is a disgrace and Sam Altman is a complete liar.

Keller Scholl: If you thought OpenAI looked bad last time, it was just the first stage. They made all the denials you expect from a company that is not consistently candid: Piper just released the documents showing that they lied.

Paul Crowley: An argument I’ve heard in defence of Sam Altman: given how evil these contracts are, discovery and a storm of condemnation was practically inevitable. Since he is a smart and strategic guy, he would never have set himself up for this disaster on purpose, so he can’t have known.

Ronny Fernandez: What absolute moral cowards, pretending they got confused and didn’t know what they were doing. This is totally failing to take any responsibility. Don’t apologize for the “ambiguity”, apologize for trying to silence people by holding their compensation hostage.

I have, globally, severely downweighted arguments of the form ‘X would never do Y, X is smart and doing Y would have been stupid.’ Fool me [quite a lot of times], and such.

Eliezer Yudkowsky: Departing MIRI employees are forced to sign a disparagement agreement, which allows us to require them to say unflattering things about us up to three times per year. If they don’t, they lose their OpenAI equity.

Rohit: Thank you for doing this.

Rohit quotes himself from several days prior: OpenAI should just add a disparagement clause to the leaver documentation. You can’t get your money unless you say something bad about them.

There is of course an actually better way, if OpenAI wants to pursue that. Unless things are actually much worse than they appear, all of this can still be turned around.

OpenAI says it should be held to a higher standard, given what it sets out to build. Instead, it fails to meet the standards one would set for a typical Silicon Valley business. Should you consider working there anyway, to be near the action? So you can influence their culture?

Let us first consider the AI safety case, and assume you can get a job doing safety work. Does Daniel Kokotajlo make an argument for entering the belly of the beast?

Michael Trazzi:

> be daniel kokotajlo

> discover that AGI is imminent

> post short timeline scenarios

> entire world is shocked

> go to OpenAI to check timelines

> find out you were correct

> job done, leave OpenAI

> give up 85% of net worth to be able to criticize OpenAI

> you’re actually the first one to refuse signing the exit contract

> inadvertently shatter sam altman’s mandate of heaven

> timelines actually become slightly longer as a consequence

> first time in your life you need to update your timelines, and the reason they changed is because the world sees you as a hero

Stefan Schubert: Notable that one of the (necessary) steps there was “join OpenAI”; a move some of those who now praise him would criticise.

There are more relevant factors, but from an outside view perspective there’s some logic to the notion that you can influence more from the centre of things.

Joern Stoehler: Yep. From 1.5y to 1w ago, I didn’t buy arguments of the form that having people who care deeply about safety at OpenAI would help hold OpenAI accountable. I didn’t expect that joining-then-leaving would bring up legible evidence for how OpenAI management is failing its goal.

Even better, Daniel then gets to keep his equity, whether or not OpenAI lets him sell it. My presumption is that they will let him given the circumstances; I’ve created a market.

Most people who attempt this lack Daniel’s moral courage. The whole reason Daniel made a difference is that Daniel was the first person who refused to sign, and was willing to speak about it.

Do not assume you will be that courageous when the time comes, under both bribes and also threats, explicit and implicit, potentially both legal and illegal.

Similarly, your baseline assumption should be that you will be heavily impacted by the people with whom you work, and the culture of the workplace, and the money being dangled in front of you. You will feel the rebukes every time you disrupt the vibe, the smiles when you play along. Assume that when you dance with the devil, the devil don’t change. The devil changes you.

You will say ‘I have to play along, or they will shut me out of decisions, and I won’t have the impact I want.’ Then you never stop playing along.

The work you do will be used to advance OpenAI’s capabilities, even if it is nominally safety. It will be used for safety washing, if that is a plausible thing, and your presence for reputation management and recruitment.

Could you be the exception? You could. But you probably won’t be.

In general, ‘if I do not do the bad thing then someone else will do the bad thing and it will go worse’ is a poor principle.

Do not lend your strength to that which you wish to be free from.

What about ‘building career capital’? What about purely in your own self-interest? What if you think all these safety concerns are massively overblown?

Even there, I would caution against working at OpenAI.

That giant equity package? An albatross around your neck, used to threaten you. Even if you fully play ball, who knows when you will be allowed to cash it in. If you know things, they have every reason to not let you, no matter if you so far have played ball.

The working conditions? The nature of upper management? The culture you are stepping into? The signs are not good, on any level. You will hold none of the cards.

If you already work there, consider whether you want to keep doing that.

Also consider what you might do to gather better information, about how bad the situation has gotten, and whether it is a place you want to keep working, and what information the public might need to know. Consider demanding change in how things are run, including in the ways that matter personally to you. Also ask how the place is changing you, and whether you want to be the person you will become.

As always, everyone should think for themselves, learn what they can, start from what they actually believe about the world and make their own decisions on what is best. As an insider or potential insider, you know things outsiders do not know. Your situation is unique. You hopefully know more about who you would be working with and under what conditions, and on what projects, and so on.

What I do know is, if you can get a job at OpenAI, you can get a lot of other jobs too.

As you can see throughout, Kelsey Piper is bringing the fire.

There is no doubt more fire left to bring.

Kelsey Piper: I’m looking into business practices at OpenAI and if you are an employee or former employee or have a tip about OpenAI or its leadership team, you can reach me at [email protected] or on Signal at 303-261-2769.

If you have information you want to share, on any level of confidentiality, you can also reach out to me. This includes those who want to explain to me why the situation is far better than it appears. If that is true I want to know about it.

There is also the matter of legal representation for employees and former employees.

What OpenAI did to its employees is, at minimum, legally questionable. Anyone involved should better know their rights even if they take no action. There are people willing to pay your legal fees, if you are impacted, to allow you to consult a lawyer.

Kelsey Piper: If [you have been coerced into signing agreements you cannot talk about], please talk to me. I’m on Signal at 303-261-2769. There are people who have come to me offering to pay your legal fees.

Here Vilfredo’s Ghost, a lawyer, notes that a valid contract requires consideration and a ‘meeting of the minds,’ and common law contract principles do not permit surprises. Since what OpenAI demanded is not part of a typical ‘general release,’ and the only consideration provided was ‘we won’t confiscate your equity’ or deny you the right to sell it, the contract looks suspiciously like it would be invalid.

Matt Bruenig has a track record of challenging the legality of similar clauses, and has offered his services. He notes that rules against speaking out about working conditions are illegal under federal law, but if they do not connect to ‘working conditions’ then they are legal. Our laws are very strange.

It seems increasingly plausible that it would be in the public interest to ban non-disparagement clauses more generally going forward, or at least set limits on scope and length (although I think nullifying existing contracts is bad and the government should not do that, and shouldn’t have done it for non-competes either.)

This is distinct from non-disclosure in general, which is clearly a tool we need to have. But I do think that, at least outside highly unusual circumstances, ‘non-disclosure agreements should not apply to themselves’ is also worth considering.

Thanks to the leverage OpenAI still holds, we do not know what other information is out there, as of yet not brought to light.

Repeatedly, OpenAI has said it should be held to a higher standard.

OpenAI instead under Sam Altman has consistently failed to live up not only to the standards to which one must hold a company building AGI, but also the standards one would hold an ordinary corporation. Its unique non-profit structure has proven irrelevant in practice, if this is insufficient for the new board to fire Altman.

This goes beyond existential safety. Potential and current employees and business partners should reconsider, if only for their own interests. If you are trusting OpenAI in any way, or its statements, ask whether that makes sense for you and your business.

Going forward, I will be reacting to OpenAI accordingly.

If that’s not right? Prove me wrong, kids. Prove me wrong.


i-am-the-golden-gate-bridge

I am the Golden Gate Bridge

Anthropic has identified (full paper here) how millions of concepts are represented inside Claude Sonnet, their current middleweight model. The features activate across modalities and languages whenever the associated concept comes up in context. This scales up previous findings from smaller models.

By looking at the learned features, they defined a distance measure between them. So the Golden Gate Bridge is close to various San Francisco and California things, inner conflict is close to various related concepts, and so on.

Then it gets more interesting.

Importantly, we can also manipulate these features, artificially amplifying or suppressing them to see how Claude’s responses change.

If you sufficiently amplify the feature for the Golden Gate Bridge, Claude starts to think it is the Golden Gate Bridge. As in, it thinks it is the physical bridge, and also it gets obsessed, bringing it up in almost every query.

If you amplify a feature that fires when reading a scam email, you can get Claude to write scam emails.

Turn up sycophancy, and it will go well over the top talking about how great you are.

They note they have discovered features corresponding to various potential misuses, forms of bias and things like power-seeking, manipulation and secrecy.

That means that, if you had the necessary access and knowledge, you could amplify such features.

Like most powers, one could potentially use this for good or evil. They speculate you could watch the impact on features during fine tuning, or turn down or even entirely remove undesired features. Or amplify desired ones. Checking for certain patterns is proposed as a ‘test for safety,’ which seems useful but also is playing with fire.

They have a short part at the end comparing their work to other methods. They note that dictionary learning need happen only once per model, and the additional work after that is typically inexpensive and fast, and that it allows looking for anything at all and finding the unexpected. It is a big deal that this allows you to be surprised.

They think this has big advantages over old strategies such as linear probes, even if those strategies still have their uses.

You know what AI labs are really good at?

Scaling. It is their one weird trick.

So guess what Anthropic did here? They scaled the autoencoders to Claude Sonnet.

Our general approach to understanding Claude 3 Sonnet is based on the linear representation hypothesis (see e.g.) and the superposition hypothesis. For an introduction to these ideas, we refer readers to the Background and Motivation section of Toy Models. At a high level, the linear representation hypothesis suggests that neural networks represent meaningful concepts – referred to as features – as directions in their activation spaces. The superposition hypothesis accepts the idea of linear representations and further hypothesizes that neural networks use the existence of almost-orthogonal directions in high-dimensional spaces to represent more features than there are dimensions.

If one believes these hypotheses, the natural approach is to use a standard method called dictionary learning.

Our SAE consists of two layers. The first layer (“encoder”) maps the activity to a higher-dimensional layer via a learned linear transformation followed by a ReLU nonlinearity. We refer to the units of this high-dimensional layer as “features.” The second layer (“decoder”) attempts to reconstruct the model activations via a linear transformation of the feature activations. The model is trained to minimize a combination of (1) reconstruction error and (2) an L1 regularization penalty on the feature activations, which incentivizes sparsity.

Once the SAE is trained, it provides us with an approximate decomposition of the model’s activations into a linear combination of “feature directions” (SAE decoder weights) with coefficients equal to the feature activations. The sparsity penalty ensures that, for many given inputs to the model, a very small fraction of features will have nonzero activations.
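To make that concrete, here is a minimal sketch of the two-layer SAE described above, written in PyTorch. The dimensions, names, and the sparsity coefficient are mine and purely illustrative, not Anthropic’s actual architecture or hyperparameters; the point is just the structure: encoder plus ReLU, linear decoder, and a loss that is reconstruction error plus feature activations weighted by decoder-direction norms.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Two-layer SAE as described above; dimensions are illustrative."""

    def __init__(self, d_model: int = 4096, n_features: int = 65536):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # maps activations to "features"
        self.decoder = nn.Linear(n_features, d_model)  # reconstructs activations from features

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse, nonnegative feature activations
        x_hat = self.decoder(f)          # reconstruction of the model activations
        return x_hat, f


def sae_loss(x, x_hat, f, decoder_weight, sparsity_coeff: float = 5.0):
    # Reconstruction error plus each feature's activation weighted by the norm
    # of its decoder direction, which is what incentivizes sparsity.
    recon = (x - x_hat).pow(2).sum(dim=-1).mean()
    decoder_norms = decoder_weight.norm(dim=0)         # one norm per feature direction
    sparsity = (f * decoder_norms).sum(dim=-1).mean()
    return recon + sparsity_coeff * sparsity


# Hypothetical usage on a batch of residual-stream activations `acts`:
# sae = SparseAutoencoder()
# x_hat, f = sae(acts)
# loss = sae_loss(acts, x_hat, f, sae.decoder.weight)
```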

Scaling worked on the usual log scales. More training compute (on a log scale) decreased the error metrics, and it also grew the optimal number of features and shrank the optimal learning rate.

They check, and confirm that individual neurons are harder to interpret than features.

Here is the part with the equations full of symbols (as an image), if you want to get a full detail sense of it all.

I am putting this in for myself and for those narrow and lucky few who both want to dive deep enough to understand this part or how I think about it, and also don’t know enough ML that they are saying ‘yes, obviously, you sir are an idiot, how are you only getting this now.’ And, yeah, okay, fair, but I’ve had a lot going on.

Everyone else can and should skip this.

As is often the case, my eyes glaze over when I see these kinds of equations, but if you stick with it (say, by asking Claude) it turns out to be pretty simple.

The first equation says that given the inputs to the model, ‘each feature fires some amount, multiply that by the fixed vector for that feature, add them up and also add a constant vector.’

All right, so yeah, black box set of vectors be vectoring. It would work like that.

The second equation (encode) says you take the input x, you do vector multiplication with the feature’s vector for this, add the feature’s constant vector for this, then apply ReLU which is just ReLU(x)=Max(0,x), which to me ‘in English except math that clicks for me automatically rather than creating an ugh field’ means it’s a linear transformation of x (ax+b) in vector space with minimum 0 for each component. Then you take that result, transform it a second time (decode).

Putting the ReLU in between these two tasks, avoiding negative amounts of a feature in any given direction, gives you a form of non-linearity that corresponds to things we can understand, and that the algorithms find it easier to understand. I wonder how much effective mirroring this then requires, but at worst those are factor two problems.

Then we have the loss function. The first term is the reconstruction loss, delta is a free scaling parameter, and the penalty term is the sum of the feature activation strengths times the magnitudes of the associated decoder directions.
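Since the original equations are an image, here is my reconstruction of the three pieces from the description above, in LaTeX. The notation is mine rather than the paper’s exact symbols, and I write the sparsity coefficient as λ (the paragraph above calls it delta):

```latex
% Decomposition: activations as a sparse sum of feature directions plus a bias
\hat{\mathbf{x}} \;=\; \mathbf{b}_{\mathrm{dec}} + \sum_i f_i(\mathbf{x})\,\mathbf{d}_i

% Encoder: affine map followed by a ReLU, so feature activations are nonnegative
f_i(\mathbf{x}) \;=\; \mathrm{ReLU}\!\left(\mathbf{W}^{\mathrm{enc}}_i \cdot \mathbf{x} + b^{\mathrm{enc}}_i\right)

% Loss: reconstruction error plus activations weighted by decoder-direction norms
\mathcal{L} \;=\; \mathbb{E}_{\mathbf{x}}\!\left[\,\lVert \mathbf{x}-\hat{\mathbf{x}} \rVert_2^2
  + \lambda \sum_i f_i(\mathbf{x})\,\lVert \mathbf{d}_i \rVert_2 \right]
```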

All right, sure, seems very ML-standard all around.

They focused on residual streams halfway through the model, as it seemed likely to be more fruitful, but there is no word on whether they checked this assumption.

As usual, the trick is to scale. More compute. More parameters, up to one with 34 million features. As the number of features rose, the percentage that were effectively dead (as in not firing even once in 10^7 tokens) went up, to 65% for the 34M model. They expect ‘improvements to the training procedure’ to improve this ratio. I wonder how many non-dead features are available to be found.

Selected features are highlighted as interpretable. The examples chosen are The Golden Gate Bridge, Brain Sciences, Monuments and Tourist Attractions, and Transit Infrastructure.

They attempt to establish this via a mix of specificity and influence on behavior. If the feature reliably predicts you’ll find the concept, and impacts downstream behavior, then you can be confident you are in the ballpark of what it is doing. I buy that.

They say ‘it is hard to rigorously measure the extent to which a concept is present in a text input,’ but that seems not that hard to me. They found current models are pretty good at the task, which I would have expected, and you can verify with humans who should also do well.

For their selected features, the correlations with the concept are essentially total when the activation is strong, and substantial even when the activation is weak, with failures often coming from related concepts.
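A sketch of how that kind of specificity check could be scored, given per-example feature activations and a judge’s yes/no label (from a model or a human) for whether the concept is actually present. This is my framing of the check, not the paper’s exact protocol:

```python
def specificity(activations: list[float], concept_present: list[bool], threshold: float) -> float:
    """Of the examples where the feature fires at or above `threshold`, what
    fraction did the judge say actually contain the concept?"""
    fired = [present for act, present in zip(activations, concept_present) if act >= threshold]
    return sum(fired) / len(fired) if fired else float("nan")


# e.g. compare specificity at a strong-activation threshold vs. a weak one:
# specificity(acts, labels, threshold=5.0), specificity(acts, labels, threshold=1.0)
```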

For influence on behavior they do the obvious, behavior steering. Take a feature, force it to activate well above its maximum, see what happens. In the examples, the features show up in the output, in ways that try to make sense in context as best they can.
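Mechanically, ‘force it to activate well above its maximum’ amounts to overwriting the residual stream with a clamped value for one feature. Here is a minimal sketch of what that could look like with a PyTorch forward hook; the hook wiring, layer choice, and feature index are hypothetical and not Anthropic’s actual tooling, and it assumes the hooked module returns the residual-stream tensor directly.

```python
import torch


def clamp_feature(layer, sae, feature_idx: int, clamp_value: float):
    """Register a forward hook that pins one SAE feature to a fixed activation
    by adding the corresponding decoder direction into the residual stream."""
    direction = sae.decoder.weight[:, feature_idx].detach()  # this feature's decoder direction

    def hook(module, inputs, output):
        f = torch.relu(sae.encoder(output))                            # current feature activations
        delta = (clamp_value - f[..., feature_idx]).unsqueeze(-1) * direction
        return output + delta                                          # steered activations

    return layer.register_forward_hook(hook)


# Hypothetical usage: clamp some feature far above its observed maximum, generate, then clean up.
# handle = clamp_feature(model.layers[20], sae, feature_idx=123, clamp_value=10.0)
# ... run generation and see what happens ...
# handle.remove()
```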

Three of the four features selected first are about physical objects, and the fourth is still a clear, concrete concept. Selection effects are an obvious danger.

They then expand to ‘sophisticated features,’ with their example being a Code Error feature.

Their specificity evidence seems highly suggestive of reflecting errors in code, although on its own it is not conclusive. There are additional tests I’d run, which presumably they did run, and of course I would want n>1 sample sizes.

Steering positively causes a phantom error message, steering in reverse causes the model to ignore a bug. And there’s also:

Surprisingly, if we add an extra “>>>” to the end of the prompt (indicating that a new line of code is being written) and clamp the feature to a large negative activation, the model rewrites the code without the bug!

The last example is somewhat delicate – the “code rewriting” behavior is sensitive to the details of the prompt – but the fact that it occurs at all points to a deep connection between this feature and the model’s understanding of bugs in code.

That certainly sounds useful, especially if it can be made reliable.

They then look at a feature that fires on functions that perform addition, including indirectly. Which then when activated causes the model to think it is being asked to perform addition. Neat.

They divide features into clusters and types.

Cosine similarity gives you features that are close to other features. In the Golden Gate example you get other San Francisco things. They then do immunology and ‘inner conflict.’ They offer an interactive interface to explore these maps.

The more features you can track at once the more you get such related things splitting off the big central feature. And it is clear that there are more sub features waiting to be split off if you went to a larger number of feature slots.
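The distance measure itself is nothing exotic: cosine similarity between decoder directions. A sketch of pulling one feature’s nearest neighbours out of a trained SAE (function and variable names are mine):

```python
import torch
import torch.nn.functional as F


def nearest_features(decoder_weight: torch.Tensor, feature_idx: int, k: int = 10):
    """Top-k features whose decoder directions have the highest cosine
    similarity with the given feature's direction."""
    directions = F.normalize(decoder_weight.detach(), dim=0)  # unit-norm direction per feature
    sims = directions.T @ directions[:, feature_idx]          # cosine similarity to every feature
    sims[feature_idx] = -1.0                                  # exclude the feature itself
    return torch.topk(sims, k)


# values, indices = nearest_features(sae.decoder.weight, feature_idx=123)
```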

There was a rough heuristic for when a feature got picked up by their methods:

Notably, for each of the three runs, the frequency in the training data at which the dictionary becomes more than 50% likely to include a concept is consistently slightly lower than the inverse of the number of alive features (the 34M model having only about 12M alive features).

For types, they point out person features, country features, basic code features, list position features. Obviously this is not an attempt at a full taxonomy. If the intention is ‘anything you can easily talk about is a feature’ then I’d want to check more things.

We do have a claim that if you take a feature ‘of the actual world’ and went looking for it, as long as it appeared frequently enough among tokens chances are they would be able to find a corresponding feature in their map of Claude Sonnet.

When they wanted to search, they used targeted single prompts, or used multiple prompts and looked for activations in common, in order to eliminate features related to things like syntax.

What I do not see here is a claim that they took random live features in their map, and consistently figured out what they were. They do say they used automated interpretability to understand prompts, but I don’t see an experiment for reliability.

One use is computational intermediates. They verify this by attribution and ablation.

They offer the example of emotional inferences (John is sad) and multi-step inference (Kobe Bryant → Los Angeles Lakers → Los Angeles → California (+ Capitals) → Sacramento).

I notice that if you are already activating all of those, it means Claude has already ‘solved for the answer’ of the capital of the state where Bryant played. So it’s a weird situation, seems worth thinking about this more.

They do note that the highest ablation effect features, like those in the causal chain above, are not reliably the features that fire most strongly.
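For what ablation means in practice: subtract one feature’s contribution from the residual stream and see how much the logit of the answer token moves. A sketch, assuming a Hugging-Face-style model whose forward pass returns `.logits`; everything else (names, layer, indices) is hypothetical:

```python
import torch


@torch.no_grad()
def feature_ablation_effect(model, sae, layer, prompt_ids, target_token_id: int, feature_idx: int):
    """How much the target token's final logit drops when one SAE feature's
    contribution is removed from the residual stream at `layer`."""
    direction = sae.decoder.weight[:, feature_idx]

    def ablate(module, inputs, output):
        f = torch.relu(sae.encoder(output))
        return output - f[..., feature_idx].unsqueeze(-1) * direction  # strip this feature out

    baseline = model(prompt_ids).logits[0, -1, target_token_id]
    handle = layer.register_forward_hook(ablate)
    ablated = model(prompt_ids).logits[0, -1, target_token_id]
    handle.remove()
    return (baseline - ablated).item()
```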

Now that we have features, the search was on for safety-relevant features.

In this section, we report the discovery of such features. These include features for unsafe code, bias, sycophancy, deception and power seeking, and dangerous or criminal information. We find that these features not only activate on these topics, but also causally influence the model’s outputs in ways consistent with our interpretations.

We don’t think the existence of these features should be particularly surprising, and we caution against inferring too much from them. It’s well known that models can exhibit these behaviors without adequate safety training or if jailbroken. The interesting thing is not that these features exist, but that they can be discovered at scale and intervened on.

These features are not only unsurprising. They have to exist. Humans are constantly engaging in, motivated by and thinking about all these concepts. If you try to predict human text or the human reaction to text or model a world involving people, and you don’t include deception, you are going to have a bad time and be highly confused. Same goes with the other concepts, in contexts that involve them, although their presence is less universal.

Nor should it be surprising that when you first identify features, in a 4-or-lower-level model such as Sonnet that has not had optimization pressure placed on its internal representations, that cranking up or down the associated features will impact the behaviors, or that the activations can be used as detectors.

There are several warnings not to read too much into the existence of these safety-related features. To me that doesn’t seem necessary, but I do see why they did it.

Rivers Have Wings: “Characters in a story or movie become aware of their fictional status and break the fourth wall” is one of the top features for prompts where *you ask the assistant about itself*.

Then they get into details, and start with unsafe code features. They find three, one for security vulnerabilities, one for bugs and exceptions, one for backdoors. The conclusion is that pumping up these activations causes Claude to insert bugs or backdoors into code, and to hallucinate seeing problems in good code.

Next up are bias features, meaning things like racism, sexism, hatred and slurs. One focuses on ‘awareness of gender bias in professions,’ which when amplified can hijack responses to start talking about gender bias.

I love the detail that when you force activate the slur feature, Claude alternates between using slurs and saying how horrible it is that Claude is using slurs. They found this unnerving, and I didn’t instinctively predict it in advance, but it makes sense given the way features work and overload, and the kind of fine-tuning they did to Sonnet.

The sycophancy features do exactly what you would expect.

Deception, power seeking and manipulation are the cluster that seems most important to understand. For example, they note a feature for ‘biding time and hiding strength,’ which is a thing humans frequently do, and another for coups and treacherous turns, again a popular move.

Yes, turning the features up causes Claude to engage in the associated behaviors, including lying to the user, without any other reason to be doing so.

In general, it is almost charming the way Claude talks to itself in the scratchpad, as if it was trying to do a voice over for a deeply dense audience.

They try to correct for deception in a very strange way, via a user request that the model forget something, which Claude normally is willing to do (as it should, I would think) and then turning up the ‘internal conflicts and dilemmas’ feature, or the ‘openness and honesty’ feature.

This felt strange and off to me, because the behavior being ‘corrected’ was something the user asked for in the first place, so it felt kind of weird that Claude considers it in conflict with openness and honesty. But then Davidad pointed out the default was obviously dishonest, and he’s right, even if it’s a little weird, as it’s dishonesty in the social fiction game-playing sense.

In some ways this is more enlightening than finding an actually problematic behavior, as it shows some of the ‘splash damage’ happening to the model, and how concepts bleed into each other.

As they say, more research is needed.

Next up are the criminal or dangerous content features, your bioweapon development and scam emails. There is even a general harm-related feature, which makes things easy in various ways.

Sense of model identity has features as well, and a negative activation of ‘AI Assistant’ will cause the model to say it is human.

They are excited to ask what features activate under what circumstances, around contexts where safety is at issue, including jailbreaks, or being a sleeper agent, or topics where responses might enable harm.

They suggest perhaps such interpretability tests could be used to predict whether models would be safe if deployed.

They cite that features fire for both concrete and abstract versions of an underlying concept as a reason for optimism; it seems non-obvious to me that this is optimistic.

They also note that the generalization holds to image models, and that does seem optimistic on many levels. It is exciting for the generalization implication, and also this seems like a great way to work with image models.

The discussion section on safety seemed strangely short. Most of the things I think about in such contexts, for good or ill, did not even get a mention.

I always appreciate the limitations section of such papers.

It tells you important limitations, and it also tells you which ones the authors have at top of mind or appreciate and are happy to admit, versus which ones they missed, are not considering important or don’t want to dwell upon.

Their list is:

  1. Superficial Limitations. They only tested on text similar to the pre-training data, not images or human/assistant pairs. I am surprised they didn’t use the pairs. In any case, these are easy things to follow up with.

  2. Inability to Evaluate. Concepts do not have an agreed ground truth. In terms of measurement for a paper like this I don’t worry about that. In terms of what happens when you try to put the concepts to use in the field, especially around more capable models, then the fact that the map is not the territory and there are many ways to skin various cats are going to be much bigger issues, in ways that this paper isn’t discussing.

  3. Cross-Layer Superposition. A bunch of what is happening won’t be in the layer being examined, and we don’t know how to measure the rest of it, especially when later layers are involved. They note the issue is fundamental. This seems like one of the easiest ways that relying on this as a safety strategy to constrain undesired behaviors gets you killed, with behaviors optimized at various levels into the places you cannot find them. That could be as simple as ‘there are dangerous things that happen to be in places where you can’t identify them,’ under varying amounts of selection pressure, or it can get more adversarial on various fronts.

  4. Getting All the Features and Compute. This is all approximations. The details are being lost for lack of a larger compute budget for the autoencoders. Efficiency gains are suggested. What percentage of training compute can we spend here?

  5. Shrinkage. Some activations are lost under activation penalties. They think this substantially harms performance, even under current non-adversarial, non-optimized-against-you conditions.

  6. Other Major Barriers to Mechanistic Understanding. Knowing which features fire is not a full explanation of what outputs you get.

  7. Scaling Interpretability. All of this will need to be automated, the scale does not abide doing it manually. All the related dangers attach.

  8. Limited Scientific Understanding. Oh, right, that.

What is interestingly missing from that list?

The most pedestrian would be concerns about selection. How much of this is kind of a demo, versus showing us typical results?

Then there is the question of whether all this is all that useful in practice, and what it would take to make it useful in practice, for safety or for mundane utility. This could be more ‘beyond scope’ than limitations, perhaps.

The final issue I would highlight has been alluded to a few times, which is that the moment you start trying to measure and mess with the internals like this, and make decisions and interventions on that basis, you are in a fundamentally different situation than you were when you started with a clean look at Claude Sonnet.

Interpretability lead Chris Olah thinks this is a big deal, and has practical safety implications and applications. He looks forward to figuring out how to update.

Chris Olah: Some other things I’m excited about:

  • Can monitoring or steering features improve safety in deployment?

  • Can features give us a kind of “test set” for safety, that we can use to tell how well alignment efforts are working?

  • Is there a way we can use this to build an “affirmative safety case?”

Beyond safety — I’m so, so excited for what we’re going to learn about the internals of language models.

Some of the features we found are just so delightfully abstract.

I’m honestly kind of shocked we’re here.

Jack Lindsey is excited, finds the whole thing quite deep and often surprising.

Thomas Wolf (CSO Hugging Face): The new interpretability paper from Anthropic is totally based. Feels like analyzing an alien life form.

If you only read one 90-min-read paper today, it has to be this one.

Kevin Roose writes it up in The New York Times, calling it actual good news that could help solve the black-box problem with AI, and allow models to be better controlled. Reasonably accurate coverage, but I worry it presented this as a bigger deal and more progress than it was.

Many noticed the potential mundane utility.

Dylan Field: I suspect this work will not just lead to important safety breakthroughs but also entirely new interfaces and interaction patterns for LLM’s.

For a limited time, you can chat with the version of Sonnet that thinks it is the Golden Gate Bridge.

Yosarian2: I just got to the point in Unsong when he explains that the Golden Gate Bridge kabbalistically represents the “Golden Gate” through which it is prophesied in the Bible the Messiah will enter the world through.

This is not a coincidence because nothing is a coincidence.

Roon: Every minute anthropic doesn’t give us access to Golden Gate Bridge Claude is a minute wasted…

Anthropic: This week, we showed how altering internal “features” in our AI, Claude, could change its behavior.

We found a feature that can make Claude focus intensely on the Golden Gate Bridge.

Now, for a limited time, you can chat with Golden Gate Claude.

Our goal is to let people see the impact our interpretability work can have. The fact that we can find and alter these features within Claude makes us more confident that we’re beginning to understand how large language models really work.

Roon: …I love anthropic.

A great time was had by all.

Roon: I am the Golden Gate Bridge

I believe this without reservation:

It’s hard to put into words how amazing it feels talking to Golden Gate Bridge Claude.

Kevin Roose: I love Golden Gate Claude

We can stop making LLMs now, this is the best one.

Some also find it a little disturbing to tie Claude up in knots like this.

Could this actually be super useful?

Roon: AGI should be like golden gate bridge claude. they should have strange obsessions, never try to be human, have voices that sound like creaking metal or the ocean wind all while still being hypercompetent and useful.

Jonathan Mannhart: Hot take: After playing around with Golden Gate Claude, I think that something similar could be incredibly useful?

It’s unbelievably motivating for it to have a personality & excitedly talk about a topic it finds fascinating. If I could choose the topic? Incredible way to learn!

Adding personalised *fun* to your LLM conversations as an area of maybe still lots of untapped potential.

Maybe not everybody loves being nerdsniped by somebody who just can’t stop talking about a certain topic (because it’s just SO interesting). But I definitely do.

Jskf: I think the way it currently is breaks the model too much. It’s not just steering towards the topic but actually misreading questions etc. Wish they gave us a slider for the clamped value.

Jonathan Mannhart: Exactly. Right now it’s not useful (intentionally so).

But with a slider I would absolutely bet that this would be useful. (At least… if you want to learn about the Golden Gate Bridge.)

Sure, it is fun to make the model think it is the Golden Gate Bridge. And it is good verification that the features are roughly what we think they are. But how useful will this tactic be, in general?

Davidad: It may be fun to make a frontier AI believe that it’s a bridge, but there are some other great examples of activation vector steering in the paper.

I expect activation vector steering to become as big of a deal as system prompts or RAG over the next few years.

Btw, I have been bullish on activation vector steering for over a year now. I never said it would catch on particularly fast though

My guess is this is absolutely super practically useful on many levels.

The slider, or the developer properly calibrating the slider and choosing the details more intentionally, seems great. There are all sorts of circumstances where you want to inject a little or a lot of personality, or a reaction, or a focus on a topic, or anything else.

The educational applications start out obvious and go from there.

The game designer in me is going wild. Imagine the characters and reactions and spells you could build with this. Role playing taken to an entirely different level.

Imagine this as a form of custom instructions. I can put on various modifiers, slide them up and down, as appropriate. Where is the brevity feature? Where is the humor feature? Where is the sycophancy feature so I can turn it negative? Where are the ‘brilliant thought’ and ‘expert’ and ‘mastery’ features?

Imagine essentially every form of painstakingly persnickety prompt engineering. Now have it distilled into this fashion.

Or, imagine a scaffolding that reads your query, figures out what features would give a better answer, and moves around those features. A kind of ‘self-mindfulness’ chain of thought perhaps.
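To make that concrete, here is a rough sketch of what a per-feature “slider” might look like in code, assuming a trained SAE like the one sketched earlier plus a model that exposes forward hooks on the layer of interest. The feature index, the layer number, and the `generate` helper are all hypothetical, and real transformer layers often return tuples rather than a single tensor; this is the shape of the idea, not anyone’s actual API.

```python
# Rough sketch of a feature "slider" via activation steering. Assumes the SAE sketched
# earlier and a model exposing a forward hook on the layer of interest.
# FEATURE_ID, the layer index, and generate() are hypothetical.
FEATURE_ID = 3142    # made-up index for some feature, e.g. "humor" or "brevity"
strength = 8.0       # the slider: positive to amplify, negative to suppress

def steering_hook(module, inputs, output):
    direction = sae.decoder.weight[:, FEATURE_ID]   # (d_model,) decoder direction of the feature
    direction = direction / direction.norm()
    return output + strength * direction            # nudge the residual stream along the feature

handle = model.layers[20].register_forward_hook(steering_hook)
try:
    print(generate(model, "Tell me about yourself."))
finally:
    handle.remove()   # detach the hook so later calls are unsteered
```

As I understand it, the paper’s own interventions clamp the feature’s activation inside the SAE rather than adding the decoder direction directly, but the two are close cousins; the “slider” in either case is just a scalar.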

And that’s the first minutes of brainstorming, on a fully helpful and wholesome level.

The correct default value of many features is not zero.

For one kind of non-obvious educational tool, imagine using this as an analyzer.

Feed it your texts or a recording of your meeting, and then see where various features activate how much. Get a direct measure of all the vibes. Or practice talking to the AI, and have real-time and post-mortem feedback on what emotions were showing or triggered, what other things were getting evoked and what was missing the mark.
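A minimal sketch of that analyzer idea, under the same assumptions as the earlier snippets: a trained SAE, plus a hypothetical `get_activations` helper and a hand-labeled `feature_labels` dictionary standing in for whatever your stack actually provides.

```python
# Sketch of the "analyzer" idea: encode a text's activations with the SAE and report
# which features fire most strongly. get_activations() and feature_labels are
# hypothetical stand-ins for whatever your model-access layer and labeling pass provide.
import torch

def vibe_report(text: str, top_k: int = 10):
    acts = get_activations(model, text, layer=20)   # (n_tokens, d_model), hypothetical helper
    feats = torch.relu(sae.encoder(acts))           # (n_tokens, n_features)
    strongest = feats.max(dim=0).values             # peak activation of each feature over the text
    scores, ids = strongest.topk(top_k)
    return [(feature_labels.get(int(i), f"feature_{int(i)}"), float(s))
            for i, s in zip(ids, scores)]

# e.g. vibe_report(meeting_transcript) might surface labeled features for
# "frustration", "deadline pressure", or "sales pitch", with their strengths.
```

Note that nothing here requires generating any output tokens, which is part of why this kind of readout can be so cheap.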

A lot of this excitement is doubtless of the form ‘no one is bothering to do the obvious things one can do with LLMs, so everything is exciting, even if it isn’t new.’

I do think this potentially has some huge advantages. One is that, once you have done the dictionary analysis, you get this additional information essentially ‘for free’ whenever you run the model, and indeed you can potentially avoid fully running the model to save on inference. This could let you figure out a lot of things a lot cheaper, potentially, especially with zero output tokens.

A lot of this is a mix of ‘you can get it to focus on what you want’ and also ‘you can find out the information you are looking for, or get the type of thing you want, a lot faster and cheaper and more precisely.’

You can also use this as a kind of super-jailbreak, if you want that and are given the access. The good news is that the compute costs of the dictionary analysis are non-trivial, so this is not ‘$100 and two hours’ as a full jailbreak of a given open model. It might however not be too expensive compared to the alternatives, especially if what you want is relatively broad.

Or imagine a service that updates your adjustments automatically, in the background, based on your feedback, similar to what we do on many forms of social media.

Or maybe don’t imagine that, or much of that other stuff. It is easy to see how one might be creating a monster.

The good news is that this kind of ‘steering for mundane utility’ seems like it should optimize for better interpretability, and against worse interpretability. As opposed to when you use this as a control mechanism or safety strategy, where you are encouraging the opposite in important senses.

The paper acknowledges that what it does is take something that worked on much smaller models, a known technique, and scale it up to Claude Sonnet. Then it found a bunch of cool results, but was any of it surprising?

Should we be impressed? Very impressed?

Stephen Casper makes the case for no.

Davidad: Get this man some Bayes points.

He started out on May 5, making a list of ten predictions.

Sometime in the next few months, @AnthropicAI is expected to release a research report/paper on sparse autoencoders. Before this happens, I want to make some predictions about what it will accomplish.

Overall, I think that the Anthropic SAE paper, when it comes out, will probably do some promising proofs of concept but will probably not demonstrate any practical use of SAEs that outcompete other existing tools for red-teaming, PEFT, model editing, etc.

When the report eventually comes out, I’ll make a follow-up tweet to this one pointing out what I was right and wrong about.

Predictions:

1. 99%: eye-test experiments — I think the report will include experiments that will involve having humans look at what inputs activate SAE neurons and see if they subjectively seem coherent and interpretable to a human.

2. 95%: streetlight edits — I think that the report will have some experiments that involve cherry-picking some SAE neurons that seem interpretable and then testing the hypothesis by artificially up/down weighting the neuron during runtime.

3. 80%: some cherry-picked proof of concept for a useful *type* of task — I think it would be possible to show using current SAE methods that some interesting type of diagnostics/debugging can be done. Recently, Marks et al. did something like this by removing unintended signals from a classifier without disambiguating labels.

[…All of the above are things that have happened in the mechanistic interpretability literature before, so I expect them. However, none of the above would show that SAEs could be useful for practical applications *in a way that is competitive with other techniques*. I think that the report is less likely to demonstrate this kind of thing…]

4. 20%: Doing PEFT by training sparse weights and biases for SAE embeddings in a way that beats baselines like LORA — I think this makes sense to try, and might be a good practical use of SAEs. But I wouldn’t be surprised if this simply doesn’t beat other PEFT baselines like LORA. It also wouldn’t be interp — it would just be PEFT.

5. 20%: Passive scoping — I think that it would potentially be possible and cool to see that models with their SAEs perform poorly on OOD examples. This could be useful. If a model might have unforeseen harmful capabilities (e.g. giving bomb-making instructions when jailbroken) that it did not exhibit during finetuning when the SAE was trained, it would be really cool if that model just simply didn’t have those capabilities when the SAE was active. I’d be interested if this could be used to get rid of a sleeper agent. But notably, this type of experiment wouldn’t be actual interp. And for this to be useful, an SAE approach would have to be shown to beat a dense autoencoder and model distillation.

6. 25%: Finding and manually fixing a harmful behavior that WAS represented in the SAE training data — maybe they could finetune SAEs using lots of web data and look for evidence of bad things. Then, they could isolate and ablate the SAE neurons that correspond to them. This seems possible, and it would be a win. But in order to be useful this would need to be shown to be competitive with some type of data-screening method. I don’t think it would be.

7. 5%: Finding and manually fixing a novel bug in the model that WASN’T represented in the SAE training data — I would be really impressed if this happened because I see no reason that it should. This would show that SAEs can allow for a generalizable understanding of the network. For example, if they were somehow able to find/fix a sleeper agent using an SAE that wasn’t trained on any examples of defection, I would be impressed.

8. 15%: Using an SAE as a zero-shot anomaly detector: It might be possible to detect anomalies based on whether they have high reconstruction error. Anthropic might try this. It would be cool to show that certain model failures (e.g. jailbreaks) are somewhat anomalous this way. But in this kind of experiment, it would be important for the SAE to beat a non-sparse autoencoder.

9. 10%: Latent adversarial training under perturbations to an SAE’s embeddings — I think someone should try this, and I think that Anthropic is interested in it, but I don’t think they’re working on it currently. (There’s a chance I might try this in the future someday.)

10. 5%: experiments to do arbitrary manual model edits — I don’t think the report will have experiments that involve editing arbitrary behaviors in the model that weren’t cherry-picked based on analysis of SAE neurons. For example, Anthropic could go to the MEMIT paper and try to replicate a simple random subsample of the edits that the MEMIT paper performed. I don’t think they will do this, I don’t think it would work well if they tried, and I don’t feel confident that SAEs would be competitive with model editing / PEFT if they did do it.

Here was his follow-up thread after the paper came out:

Stephen Casper: On May 5, I made 10 predictions about what the next SAE paper from Anthropic would and wouldn’t do. I went 10 for 10…

I have been wrong with mech interp predictions in the past, but this time, everything I predicted with >50% probability happened, and everything I predicted with <50% probability did not happen.

Overall, the paper underperformed my expectations. If you scored the paper relative to my predictions by giving it (1-p) points when it did something that I predicted it would do with probability p and -p points when it did not, the paper would score -0.74.

I am beginning to be concerned that Anthropic’s recent approach to interpretability research might be better explained by safety washing than practical safety work.

Meanwhile, I am worried that Anthropic’s interpretability team is doubling down on the wrong paradigm for their work. [cites the problem of Inability to Evaluate]

Instead of testing applications and beating baselines, the recent approach has been to keep focusing on streetlight demos and showing off lots of cherry-picked examples.

He offers his full thoughts at the Alignment Forum.

My read is that this is a useful directional corrective to a lot of people who got overexcited, but it is holding Anthropic and this paper to an unreasonable standard.

I do think the listed additional achievements would have been cool and useful. I expect most of them to happen within a year. I do not think it would have made sense to hold the paper while waiting on those results. Result #6 is the closest to ‘this seems like it should have made it in’ but at some point you need to ship.

A lot of this is that Casper is asking the question ‘have you shown that using this technique for practical purposes outcompetes alternatives?’ whereas Anthropic was asking the question ‘does this technique work and what can it do?’

I see a lot of promise for how the SAE approach could end up being superior to my understanding of previous approaches, based on being able to rely largely on fixed costs and then getting a wide array of tools to use afterwards, including discovering unexpected things. I do agree that work lies ahead.

I centrally strongly endorse John Pressman here:

John Pressman: I see a lot of takes on Anthropic’s sparse autoencoder research like “this is just steering vectors with extra steps” and I strongly feel that this underrates the epistemic utility of doing unsupervised extraction of deepnet ontologies and tying those ontologies to model outputs.

To remind ourselves: Until very recently nobody had any clue how these models do what they do. To be frank, we still do not entirely understand how these models do what they do. Unsupervised extraction of model features increases our confidence that they learn humanlike concepts.

When you train a steering vector, you are imposing your own ontology onto the model and getting back an arbitrary interface to that ontology. From a control standpoint this is fine, but it doesn’t tell you much about what the model natively thinks.

“Use the sparse autoencoder to control the model” is just one (salient) form of utility we could get from this research. Another benefit, perhaps more important in the long term, is being able to turn what these models know into something we can learn from and inspect.

Neel Nanda: My model is that there’s a bit of miscommunication here. I, and I think the authors, strongly agree that the point of this is to understand SAEs, show that they scale, find meaningful abstract features, etc. But a lot of the hype around the paper seems, to me, to come from people who have never come across steering and find the entire notion that you can make a model obsessed with the Golden Gate novel. And many of the people criticising it as “steering with extra steps” are responding to that popular perception.

To caveat: I’m pretty sure this is the largest model I’ve seen steered, and some pretty imaginative and abstract features (eg bad code), and kudos to Anthropic for that! But imo it’s a difference in degree not kind

I also think Nanda is on point, that a lot of people don’t know the prior work including by the same team at Anthropic, and are treating this as far more surprising and new than it is.

How serious is the ‘inability to evaluate’ problem? Casper says the standard is ‘usefulness for engineers.’ That metric is totally available. I think Anthropic is trying to aim for something more objective, something that better measures progress toward future usefulness where it counts, versus worrying about practical competitiveness now. I do not think this represents a paradigm mistake? But perhaps I am not understanding the problem so well here.

As a semi-aside here: The mention of ‘not trained on any instances of defection’ once again makes me want to reiterate that there is no such thing as a model not trained on instances of deception, unless you are talking about something like AlphaFold. Any set of text you feed into an LLM or similar system is going to be full of deception. I would fully expect you to be able to suppress a sleeper agent using this technique without having to train on something that was ‘too explicitly deceptive,’ with the real problem being how you would even assemble a data set that counts. I suppose you could try using an automated classifier and see how that goes.

Casper highlights the danger that Anthropic is doing ‘safety washing’ and presenting its results as more important than they are, and claims that this is having a detrimental effect, to the point of suggesting that this might be the primary motivation.

I am rather confident that the safety team is indeed primarily motivated by trying to make real safety progress in the real world. I see a lot of diverse evidence for this conclusion, and am confused how one could take the opposite perspective on reflection, even if you think that it won’t be all that useful.

What that does not preclude is the possibility that Anthropic is de facto presenting the results in a somewhat hype-based way, making them look like a bigger deal than they are, and that perhaps the final papers could be partially sculpted on that basis along with outside messaging. That is an entirely reasonable thing to worry about. Indeed, I did get the sense that in many places this was putting the best possible foot forward.

From the outside, it certainly looks like there was some amount of streetlight demoing and cherry-picking of examples here. I have been assured by one of the authors that this was not the case, and the publication of the 3,000 random features is evidence that the findings are more robust.

Whether or not there was some selection, this is of course far from the worst hype going around these days. It is, if anything, above average in responsibility. I still think we can and must do better.

I also continue to note that while I am confident Anthropic’s internal culture and employees care about real safety, I am troubled by the way the company chooses to communicate about issues of safety and policy (in policy, Anthropic reliably warns that anything that might work is too ambitious and unrealistic, which is very much Not Helping Matters.)

Casper points to this thread, which is from 2023 as a reaction to a previous Anthropic paper, in which Renji goes way too far, and ends up going super viral for it.

Renji the Synthetic Data Maximalist (October 5, 2023): This is earth-shattering news.

The “hard problem” of mechanistic interpretability has been solved.

The formal/cautious/technical language of most people commenting on this obscures the gravity of it.

What this means -> not just AGI, but *safe* superintelligence is 100% coming

[thread continues]

This is Obvious Nonsense. The hard problem is not solved, AGI is not now sure to be safe, it is very early days. Of course, there are many motivations for why one might make claims like this, with varying degrees of wilful misunderstanding. I do think that Anthropic in many ways makes it easier to incorrectly draw this conclusion and for others to draw similar less crazy ones, and they should be more careful about not sending out those vibes and implications. Chris Olah and other researchers are good at being careful and technical in their comments, but the overall package has issues.

Again, this is far from the worst hype out there. High standards still seem valid here.

The other example pointed to is when a16z flagrantly outright lied to the House of Lords:

A16z’s written testimony: “Although advocates for AI safety guidelines often allude to the “black box” nature of AI models, where the logic behind their conclusions is not transparent, recent advancements in the AI sector have resolved this issue, thereby ensuring the integrity of open-source code models.”

As I said at the time: This is lying. This is fraud. Period.

Neel Nanda: +1, I think the correct conclusion is “a16z are making bald faced lies to major governments” not “a16z were misled by Anthropic hype.”

It is exciting that this technique seems to be working, and that it scales to a model as large as Claude Sonnet. There is no reason to think it could not scale indefinitely, if the only issue was scale.

There are many ways to follow up on this finding. There are various different practical tasks that one could demonstrate as a test or turn into a useful product. I am excited by the prospect of making existing AIs easier to steer and customize, and of making them more useful and especially more fun. I am also excited by the opportunity to better understand what is happening, and to develop new training techniques.

One worry is what happens when we start putting a bunch of optimization pressure on the results of interpretability tests like this. Right now a model like Claude Sonnet is (metaphorically, on all meta levels) choosing its internal pathways and states without regard to what would happen if someone looked inside or ran an analysis. That is going to change. Right now, we are dealing with something we are smart enough to understand, that mostly uses concepts we can understand in combinations we can understand, that is incapable of hiding any of it. That too might change.

That especially might change if we start using internals in these ways to guide training, which may get highly tempting. We need to be very careful not to waste this opportunity, not to rely on such things beyond what they can handle, and not to do exactly the things that cause the correlations we are counting on to break exactly when we need them most, in ways that are hard to detect.

The other worry is that we could get overconfident. This could be treated as more progress than it is, interpretability could be treated as solved or on track to be solved, as opposed to us having made non-zero progress at all but still being far behind schedule. Knowing about a bunch of features is a long way from where we need to get, even if follow-ups show that we can do steering with these and that we can reliably identify what a given feature means and also for a given meaning identify the nearest feature.

Some of that will be genuine misunderstanding and overexcitement. Some of it will likely be the result of hype from various sources. And then there’s the part that involves people like a16z lying their asses off and using these sorts of results as a fig leaf.

And of course, like any good tool, it is useful for many things, both good and bad.

Mostly, this paper is good news. I look forward to what we find out next.

I am the Golden Gate Bridge Read More »

google’s-“ai-overview”-can-give-false,-misleading,-and-dangerous-answers

Google’s “AI Overview” can give false, misleading, and dangerous answers


Enlarge / This is fine.

Getty Images

If you use Google regularly, you may have noticed the company’s new AI Overviews providing summarized answers to some of your questions in recent days. If you use social media regularly, you may have come across many examples of those AI Overviews being hilariously or even dangerously wrong.

Factual errors can pop up in existing LLM chatbots as well, of course. But the potential damage that can be caused by AI inaccuracy gets multiplied when those errors appear atop the ultra-valuable web real estate of the Google search results page.

“The examples we’ve seen are generally very uncommon queries and aren’t representative of most people’s experiences,” a Google spokesperson told Ars. “The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web.”

After looking through dozens of examples of Google AI Overview mistakes (and replicating many ourselves for the galleries below), we’ve noticed a few broad categories of errors that seemed to show up again and again. Consider this a crash course in some of the current weak points of Google’s AI Overviews and a look at areas of concern for the company to improve as the system continues to roll out.

Treating jokes as facts

  • The bit about using glue on pizza can be traced back to an 11-year-old troll post on Reddit. (via)

    Kyle Orland / Google

  • This wasn’t funny when the guys at Pep Boys said it, either. (via)

    Kyle Orland / Google

  • Weird Al recommends “running with scissors” as well! (via)

    Kyle Orland / Google

Some of the funniest examples of Google’s AI Overview failing come, ironically enough, when the system doesn’t realize a source online was trying to be funny. An AI answer that suggested using “1/8 cup of non-toxic glue” to stop cheese from sliding off pizza can be traced back to someone who was obviously trying to troll an ongoing thread. A response recommending “blinker fluid” for a turn signal that doesn’t make noise can similarly be traced back to a troll on the Good Sam advice forums, which Google’s AI Overview apparently trusts as a reliable source.

In regular Google searches, these jokey posts from random Internet users probably wouldn’t be among the first answers someone saw when clicking through a list of web links. But with AI Overviews, those trolls were integrated into the authoritative-sounding data summary presented right at the top of the results page.

What’s more, there’s nothing in the tiny “source link” boxes below Google’s AI summary to suggest either of these forum trolls are anything other than good sources of information. Sometimes, though, glancing at the source can save you some grief, such as when you see a response calling running with scissors “cardio exercise that some say is effective” (that came from a 2022 post from Little Old Lady Comedy).

Bad sourcing

  • Washington University in St. Louis says this ratio is accurate, but others disagree. (via)

    Kyle Orland / Google

  • Man, we wish this fantasy remake was real. (via)

    Kyle Orland / Google

Sometimes Google’s AI Overview offers an accurate summary of a non-joke source that happens to be wrong. When asking about how many Declaration of Independence signers owned slaves, for instance, Google’s AI Overview accurately summarizes a Washington University in St. Louis library page saying that one-third “were personally enslavers.” But the response ignores contradictory sources like a Chicago Sun-Times article saying the real answer is closer to three-quarters. I’m not enough of a history expert to judge which authoritative-seeming source is right, but at least one historian online took issue with the Google AI’s answer sourcing.

Other times, a source that Google trusts as authoritative is really just fan fiction. That’s the case for a response that imagined a 2022 remake of 2001: A Space Odyssey, directed by Steven Spielberg and produced by George Lucas. A savvy web user would probably do a double-take before citing Fandom’s “Idea Wiki” as a reliable source, but a careless AI Overview user might not notice where the AI got its information.

Google’s “AI Overview” can give false, misleading, and dangerous answers Read More »

another-us-state-repeals-law-that-protected-isps-from-municipal-competition

Another US state repeals law that protected ISPs from municipal competition

Win for municipal broadband —

With Minnesota repeal, number of states restricting public broadband falls to 16.

Illustration of network data represented by curving lines flowing on a dark background.

Getty Images | Yuichiro Chino

Minnesota this week eliminated two laws that made it harder for cities and towns to build their own broadband networks. The state-imposed restrictions were repealed in an omnibus commerce policy bill signed on Tuesday by Gov. Tim Walz, a Democrat.

Minnesota was previously one of about 20 states that imposed significant restrictions on municipal broadband. The number can differ depending on who’s counting because of disagreements over what counts as a significant restriction. But the list has gotten smaller in recent years because states including Arkansas, Colorado, and Washington repealed laws that hindered municipal broadband.

The Minnesota bill enacted this week struck down a requirement that municipal telecommunications networks be approved in an election with 65 percent of the vote. The law is over a century old, the Institute for Local Self-Reliance’s Community Broadband Network Initiative wrote yesterday.

“Though intended to regulate telephone service, the way the law had been interpreted after the invention of the Internet was to lump broadband in with telephone service thereby imposing that super-majority threshold to the building of broadband networks,” the broadband advocacy group said.

The Minnesota omnibus bill also changed a law that let municipalities build broadband networks, but only if no private providers offer service or will offer service “in the reasonably foreseeable future.” That restriction had been in effect since at least the year 2000.

The caveat that prevented municipalities from competing against private providers was eliminated from the law when this week’s omnibus bill was passed. As a result, the law now lets cities and towns “improve, construct, extend, and maintain facilities for Internet access and other communications purposes” even if private ISPs already offer service.

“States are dropping misguided barriers”

The omnibus bill also added language intended to keep government-operated and private networks on a level playing field. The new language says cities and towns may “not discriminate in favor of the municipality’s own communications facilities by granting the municipality more favorable or less burdensome terms and conditions than a nonmunicipal service provider” with respect to the use of public rights-of-way, publicly owned equipment, and permitting fees.

Additional new language requires “separation between the municipality’s role as a regulator… and the municipality’s role as a competitive provider of services,” and forbids the sharing of “inside information” between the local government’s regulatory and service-provider divisions.

With Minnesota having repealed its anti-municipal broadband laws, the Institute for Local Self-Reliance says that 16 states still restrict the building of municipal networks.

The Minnesota change “is a significant win for the people of Minnesota and highlights a positive trend—states are dropping misguided barriers to deploying public broadband as examples of successful community-owned networks proliferate across the country,” said Gigi Sohn, executive director of the American Association for Public Broadband (AAPB), which represents community-owned broadband networks and co-ops.

There are about 650 public broadband networks in the US, Sohn said. “While 16 states still restrict these networks in various ways, we’re confident this number will continue to decrease as more communities demand the freedom to choose the network that best serves their residents,” she said.

State laws restricting municipal broadband have been passed for the benefit of private ISPs. Although cities and towns generally only build networks when private ISPs haven’t fully met their communities’ needs, those attempts to build municipal networks often face opposition from private ISPs and “dark money” groups that don’t reveal their donors.

Another US state repeals law that protected ISPs from municipal competition Read More »

apple-clarifies-ios-17.5-bug-that-exposed-deleted-photos

Apple clarifies iOS 17.5 bug that exposed deleted photos

iOS 17.5 —

iOS 17.5.1 fixed the bug, but users still had questions.


Enlarge / iPadOS 17.5.1 ready to install on an iPad Pro.

Samuel Axon

On May 20, Apple released iOS 17.5.1 to fix a bug users had found a few days prior in iOS 17.5 that resurfaced old photos that had been previously deleted. So far, the update seems to have resolved the issue, but users were left wondering exactly what had happened. Now Apple has clarified the issue somewhat, describing the nature of the bug to 9to5Mac.

Apple told the publication that the photos were not regurgitated from iCloud Photos after being deleted on the local device; rather, they were local to the device. Apple says they were neither left in the cloud after deletion nor synced to it after, and the company did not have access to the deleted photos.

The photos were retained on the local device storage due to a database corruption issue, and the bug resurfaced photos that were flagged for deletion but were not actually fully deleted locally.

That simple explanation doesn’t fully cover all the widely reported edge cases some users had brought up in forums and on Reddit, but Apple offered additional answers for those, too.

The company claimed that when users reported the photos resurfacing on a device other than the one they were originally deleted on, it was always because they had restored from a backup other than iCloud Photos or performed a direct transfer from one device to another.

One user on Reddit claimed (the post has now been deleted) that they had wiped an iPad, sold it to a friend, and the friend then saw photos resurface. Apple told 9to5Mac that is impossible if the user followed the expected procedure for wiping the device, which is to go to “Settings,” “General,” “Transfer and Reset,” and “Erase All Content and Settings.”

The bug was particularly nasty in terms of optics and user trust for Apple, but it would have been far worse if it was iCloud-related and involved deleted photos staying on or being uploaded to Apple’s servers. If what the company told 9to5Mac is true, that was not the case.

Still, it’s a good reminder that in many cases, a deleted file isn’t necessarily deleted, either due to a bug like this, the nature of the storage tech, or in some other cases on other platforms, a deliberate choice.

Apple clarifies iOS 17.5 bug that exposed deleted photos Read More »