Author name: Paul Patrick

The Quest for Extended Detection and Response (XDR): Unraveling Cybersecurity’s Next Generation

Embarking on an exploration of the extended detection and response (XDR) sector wasn’t just another research project for me; it was a dive back into familiar waters with an eye on how the tide has turned. Having once been part of a team at a vendor that developed an early XDR prototype, my return to this evolving domain was both nostalgic and eye-opening. The concept we toyed with in its nascent stages has burgeoned into a cybersecurity imperative, promising to redefine threat detection and response across the digital landscape.

Discovering XDR: Past and Present

My previous stint in developing an XDR prototype was imbued with the vision of creating a unified platform that could offer a panoramic view of security threats, moving beyond siloed defenses. Fast forward to my recent exploration, and it’s clear that the industry has taken this vision and run with it—molding XDR into a comprehensive solution that integrates across security layers to offer unparalleled visibility and control.

The research process was akin to piecing together a vast jigsaw puzzle. Through a blend of reading industry white papers, diving deep into knowledge-base articles, and drawing from my background, I charted the evolution of XDR from a promising prototype to a mature cybersecurity solution. This deep dive not only broadened my understanding but also reignited my enthusiasm for the potential of integrated defense mechanisms against today’s sophisticated cyberthreats.

The Adoption Challenge: Beyond Integration

The most formidable challenge that emerged in adopting XDR solutions is integration complexity—a barrier we had anticipated in the early development days and has only intensified. Organizations today face the Herculean task of intertwining their diversified security tools with an XDR platform, where each tool speaks a different digital language and adheres to distinct protocols.

However, the adoption challenges extend beyond the technical realm. There’s a strategic dissonance in aligning an organization’s security objectives with the capabilities of XDR platforms. This alignment is crucial, yet often elusive, as it demands a top-down reevaluation of security priorities, processes, and personnel readiness. Organizations must not only reconcile their current security infrastructure with an XDR system but also ensure their teams are adept at leveraging this integration to its fullest potential.

Surprises and Insights

The resurgence of AI and machine learning within XDR solutions echoed the early ambitions of prototype development. The sophistication of these technologies in predicting and mitigating threats in real time was a revelation, showcasing how far the maturation of XDR has come. Furthermore, the vibrant ecosystem of partnerships and integrations underscored XDR’s shift from a standalone solution to a collaborative security framework, a pivot that resonates deeply with the interconnected nature of digital threats today.

Reflecting on the Evolution

Since venturing into XDR prototype development, the sector’s evolution has been marked by a nuanced understanding of adoption complexities and an expansion in threat coverage. The emphasis on refining integration strategies and enhancing customization signifies a market that’s not just growing but maturing—ready to tackle the diversifying threat landscape with innovative solutions.

The journey back into the XDR landscape, juxtaposed against my early experiences, was a testament to the sector’s dynamism. As adopters navigate the complexities of integrating XDR into their security arsenals, the path ahead is illuminated by the promise of a more resilient, unified defense mechanism against cyber adversaries. The evolution of XDR from an emerging prototype to a cornerstone of modern cybersecurity strategies mirrors the sector’s readiness to confront the future—a future where the digital well-being of organizations is shielded by the robust, integrated, and intuitive capabilities of XDR platforms.

Next Steps

To learn more, take a look at GigaOm’s XDR Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, you can access the research using a free trial.

The Quest for Extended Detection and Response (XDR): Unraveling Cybersecurity’s Next Generation Read More »

russia-stands-alone-in-vetoing-un-resolution-on-nuclear-weapons-in-space

Russia stands alone in vetoing UN resolution on nuclear weapons in space

ASAT —

“The United States assesses that Russia is developing a new satellite carrying a nuclear device.”

A meeting of the UN Security Council on April 14.

Enlarge / A meeting of the UN Security Council on April 14.

Russia vetoed a United Nations Security Council resolution Wednesday that would have reaffirmed a nearly 50-year-old ban on placing weapons of mass destruction into orbit, two months after reports Russia has plans to do just that.

Russia’s vote against the resolution was no surprise. As one of the five permanent members of the Security Council, Russia has veto power over any resolution that comes before the body. China abstained from the vote, and 13 other members of the Security Council voted in favor of the resolution.

If it passed, the resolution would have affirmed a binding obligation in Article IV of the 1967 Outer Space Treaty, which says nations are “not to place in orbit around the Earth any objects carrying nuclear weapons or any other kinds of weapons of mass destruction.”

Going nuclear

Russia is one of 115 parties to the Outer Space Treaty. The Security Council vote Wednesday follows reports in February that Russia is developing a nuclear anti-satellite weapon.

“The United States assesses that Russia is developing a new satellite carrying a nuclear device,” said Jake Sullivan, President Biden’s national security advisor. “We have heard President Putin say publicly that Russia has no intention of deploying nuclear weapons in space. If that were the case, Russia would not have vetoed this resolution.”

The United States and Japan proposed the joint resolution, which also called on nations not to develop nuclear weapons or any other weapons of mass destruction designed to be placed into orbit around the Earth. In a statement, US and Japanese diplomats highlighted the danger of a nuclear detonation in space. Such an event would have “grave implications for sustainable development, and other aspects of international peace and security,” US officials said in a press release.

With its abstention from the vote, “China has shown that it would rather defend Russia as its junior partner, than safeguard the global nonproliferation regime,” said Linda Thomas-Greenfield, the US ambassador to the UN.

US government officials have not offered details about the exact nature of the anti-satellite weapon they say Russia is developing. A nuclear explosion in orbit would destroy numerous satellites—from many countries—and endanger astronauts. Space debris created from a nuclear detonation could clutter orbital traffic lanes needed for future spacecraft.

The Soviet Union launched more than 30 military satellites powered by nuclear reactors. Russia’s military space program languished in the first couple of decades after the fall of the Soviet Union, and US intelligence officials say it still lags behind the capabilities possessed by the US Space Force and the Chinese military.

Russia’s military funding has largely gone toward the war in Ukraine for the last two years, but Putin and other top Russian officials have raised threats of nuclear force and attacks on space assets against adversaries. Russia’s military launched a cyberattack against a commercial satellite communications network when it invaded Ukraine in 2022.

Russia has long had an appetite for anti-satellite (ASAT) weapons. The Soviet Union experimented with “co-orbital” ASATs in the 1960s and 1970s. When deployed, these co-orbital ASATs would have attacked enemy satellites by approaching them and detonating explosives or using a grappling arm to move the target out of orbit.

Russian troops at the Plesetsk Cosmodrome in far northern Russia prepare for the launch of a Soyuz rocket with the Kosmos 2575 satellite in February.

Enlarge / Russian troops at the Plesetsk Cosmodrome in far northern Russia prepare for the launch of a Soyuz rocket with the Kosmos 2575 satellite in February.

Russian Ministry of Defense

In 1987, the Soviet Union launched an experimental weapons platform into orbit to test laser technologies that could be used against enemy satellites. Russia shot down one of its own satellites in 2021 in a widely condemned “direct ascent” ASAT test. This Russian direct ascent ASAT test followed demonstrations of similar capability by China, the United States, and India. Russia’s military has also demonstrated satellites over the last decade that could grapple onto an adversary’s spacecraft in orbit, or fire a projectile to take out an enemy satellite.

These ASAT capabilities could destroy or disable one enemy satellite at a time. The US Space Force is getting around this threat by launching large constellations of small satellites to augment the military’s much larger legacy communications, surveillance, and missile warning spacecraft. A nuclear ASAT weapon could threaten an entire constellation or render some of space inaccessible due to space debris.

Russia’s ambassador to the UN, Vasily Nebenzya, called this week’s UN resolution “an unscrupulous play of the United States” and a “cynical forgery and deception.” Russia and China proposed an amendment to the resolution that would have banned all weapons in space. This amendment got the support of about half of the Security Council but did not pass.

Outside the 15-member Security Council, the original resolution proposed by the United States and Japan won the support of more than 60 nations as co-sponsors.

“Regrettably, one permanent member decided to silence the critical message we wanted to send to the present and future people of the world: Outer space must remain a domain of peace, free of weapons of mass destruction, including nuclear weapons,” said Kazuyuki Yamazaki, Japan’s ambassador to the UN.

Russia stands alone in vetoing UN resolution on nuclear weapons in space Read More »

tech-brands-are-forcing-ai-into-your-gadgets—whether-you-asked-for-it-or-not

Tech brands are forcing AI into your gadgets—whether you asked for it or not

Tech brands love hollering about the purported thrills of AI these days.

Enlarge / Tech brands love hollering about the purported thrills of AI these days.

Logitech announced a new mouse last week. A company rep reached out to inform Ars of Logitech’s “newest wireless mouse.” The gadget’s product page reads the same as of this writing.

I’ve had good experience with Logitech mice, especially wireless ones, one of which I’m using now. So I was keen to learn what Logitech might have done to improve on its previous wireless mouse designs. A quieter click? A new shape to better accommodate my overworked right hand? Multiple onboard profiles in a business-ready design?

I was disappointed to learn that the most distinct feature of the Logitech Signature AI Edition M750 is a button located south of the scroll wheel. This button is preprogrammed to launch the ChatGPT prompt builder, which Logitech recently added to its peripherals configuration app Options+.

That’s pretty much it.

Beyond that, the M750 looks just like the Logitech Signature M650, which came out in January 2022.  Also, the new mouse’s forward button (on the left side of the mouse) is preprogrammed to launch Windows or macOS dictation, and the back button opens ChatGPT within Options+. As of this writing, the new mouse’s MSRP is $10 higher ($50) than the M650’s.

  • The new M750 (pictured) is 4.26×2.4×1.52 inches and 3.57 ounces.

    Logitech

  • The M650 (pictured) comes in 3 sizes. The medium size is 4.26×2.4×1.52 inches and 3.58 ounces.

    Logitech

I asked Logitech about the M750 appearing to be the M650 but with an extra button, and a spokesperson responded by saying:

M750 is indeed not the same mouse as M650. It has an extra button that has been preprogrammed to trigger the Logi AI Prompt Builder once the user installs Logi Options+ app. Without Options+, the button does DPI toggle between 1,000 and 1,600 DPI.

However, a reprogrammable button south of a mouse’s scroll wheel that can be set to launch an app or toggle DPI out of the box is pretty common, including among Logitech mice. Logitech’s rep further claimed to me that the two mice use different electronic components, which Logitech refers to as the mouse’s platform. Logitech can reuse platforms for different models, the spokesperson said.

Logitech’s rep declined to comment on why the M650 didn’t have a button south of its scroll wheel. Price is a potential reason, but Logitech also sells cheaper mice with this feature.

Still, the minimal differences between the two suggest that the M750 isn’t worth a whole product release. I suspect that if it weren’t for Logitech’s trendy new software feature, the M750 wouldn’t have been promoted as a new product.

The M750 also raises the question of how many computer input devices need to be equipped with some sort of buzzy, generative AI-related feature.

Logitech’s ChatGPT prompt builder

Logitech’s much bigger release last week wasn’t a peripheral but an addition to its Options+ app. You don’t need the “new” M750 mouse to use Logitech’s AI Prompt Builder; I was able to program my MX Master 3S to launch it. Several Logitech mice and keyboards support AI Prompt Builder.

When you press a button that launches the prompt builder, an Options+ window appears. There, you can input text that Options+ will use to create a ChatGPT-appropriate prompt based on your needs:

A Logitech-provided image depicting its AI Prompt Builder software feature.

Enlarge / A Logitech-provided image depicting its AI Prompt Builder software feature.

Logitech

After you make your choices, another window opens with ChatGPT’s response. Logitech said the prompt builder requires a ChatGPT account, but I was able to use GPT-3.5 without entering one (the feature can also work with GPT-4).

The typical Arsian probably doesn’t need help creating a ChatGPT prompt, and Logitech’s new capability doesn’t work with any other chatbots. The prompt builder could be interesting to less technically savvy people interested in some handholding for early ChatGPT experiences. However, I doubt if people with an elementary understanding of generative AI need instant access to ChatGPT.

The point, though, is instant access to ChatGPT capabilities, something that Logitech is arguing is worthwhile for its professional users. Some Logitech customers, though, seem to disagree, especially with the AI Prompt Builder, meaning that Options+ has even more resources in the background.

But Logitech isn’t the only gadget company eager to tie one-touch AI access to a hardware button.

Pinching your earbuds to talk to ChatGPT

Similarly to Logitech, Nothing is trying to give its customers access to ChatGPT quickly. In this case, access occurs by pinching the device. This month, Nothing announced that it “integrated Nothing earbuds and Nothing OS with ChatGPT to offer users instant access to knowledge directly from the devices they use most, earbuds and smartphones.” The feature requires the latest Nothing OS and for the users to have a Nothing phone with ChatGPT installed. ChatGPT gestures work with Nothing’s Phone (2) and Nothing Ear and Nothing Ear (a), but Nothing plans to expand to additional phones via software updates.

Nothing's Ear and Ear (a) earbuds.

Enlarge / Nothing’s Ear and Ear (a) earbuds.

Nothing

Nothing also said it would embed “system-level entry points” to ChatGPT, like screenshot sharing and “Nothing-styled widgets,” to Nothing smartphone OSes.

A peek at setting up ChatGPT integration on the Nothing X app.

Enlarge / A peek at setting up ChatGPT integration on the Nothing X app.

Nothing’s ChatGPT integration may be a bit less intrusive than Logitech’s since users who don’t have ChatGPT on their phones won’t be affected. But, again, you may wonder how many people asked for this feature and how reliably it will function.

Tech brands are forcing AI into your gadgets—whether you asked for it or not Read More »

three-women-contract-hiv-from-dirty-“vampire-facials”-at-unlicensed-spa

Three women contract HIV from dirty “vampire facials” at unlicensed spa

Yikes —

Five patients with links to the spa had viral genetic sequences that closely matched.

Drops of the blood going onto an HIV quick test.

Enlarge / Drops of the blood going onto an HIV quick test.

Trendy, unproven “vampire facials” performed at an unlicensed spa in New Mexico left at least three women with HIV infections. This marks the first time that cosmetic procedures have been associated with an HIV outbreak, according to a detailed report of the outbreak investigation published today.

Ars reported on the cluster last year when state health officials announced they were still identifying cases linked to the spa despite it being shut down in September 2018. But today’s investigation report offers more insight into the unprecedented outbreak, which linked five people with HIV infections to the spa and spurred investigators to contact and test nearly 200 other spa clients. The report appears in the Centers for Disease Control and Prevention’s Morbidity and Mortality Weekly Report.

The investigation began when a woman between the ages of 40 and 50 turned up positive on a rapid HIV test taken while she was traveling abroad in the summer of 2018. She had a stage 1 acute infection. It was a result that was as dumbfounding as it was likely distressing. The woman had no clear risk factors for acquiring the infection: no injection drug use, no blood transfusions, and her current and only recent sexual partner tested negative. But, she did report getting a vampire facial in the spring of 2018 at a spa in Albuquerque called VIP Spa.

“Vampire facial” is the common name for a platelet-rich plasma microneedling procedure. In this treatment, a patient’s blood is drawn, spun down to separate out plasma from blood cells, and the platelet-rich plasma is then injected into the face with microneedles. It’s claimed—with little evidence—that it can rejuvenate and improve the look of skin, and got notable promotions from celebrities, including Gwyneth Paltrow and Kim Kardashian.

The woman’s case led investigators to VIP Spa, which was unlicensed, had no appointment scheduling system, and did not store client contact information. In an inspection in the fall of 2018, health investigators found shocking conditions: unwrapped syringes in drawers and counters, unlabeled tubes of blood sitting out on a kitchen counter, more unlabeled blood and medical injectables alongside food in a kitchen fridge, and disposable equipment—electric desiccator tips—that were reused. The facility also did not have an autoclave—a pressurized oven—for sterilizing equipment.

A novel and challenging investigation

The spa was quickly shut down, and the owner Maria de Lourdes Ramos De Ruiz, 62, was charged with practicing medicine without a license. In 2022, she pleaded guilty to five counts and is serving a three-and-a-half-year prison sentence.

A second spa client, another woman between the ages of 40 and 50, tested positive for HIV in a screen in the fall of 2018 and received a diagnosis in early 2019. She has received a vampire facial in the summer of 2018. Her HIV infection was also at stage 1. Investigators scrambled to track down dozens of other clients, who mostly spoke Spanish as their first language. The next two identified cases weren’t diagnosed until the fall of 2021.

The two cases diagnosed in 2021 were sexual partners: a woman who received three vampire facials in the spring and summer of 2018 from the spa and her male partner. Both had a stage 3 HIV infection, which is when the infection has developed into Acquired Immunodeficiency Syndrome (AIDS). The severity of the infections suggested the two had been infected prior to the woman’s 2018 spa treatments. Health officials uncovered that the woman had tested positive in an HIV screen in 2016, though she did not report being notified of the result.

The health officials reopened their outbreak investigation in 2023 and found a fifth case that was diagnosed in the spring of 2023, which was also in a woman aged 40 to 50 who had received a vampire facial in the summer of 2018. She had a stage 3 infection and was hospitalized with an AIDS-defining illness.

Viral genetic sequencing from the five cases shows that the infections are all closely related. But, given the extent of the unsanitary and contaminated conditions at the facility, investigators were unable to determine precisely how the infections spread in the spa. In all, 198 spa clients were tested for HIV between 2018 and 2023, the investigators report.

“Incomplete spa client records posed a substantial challenge during this investigation, necessitating a large-scale outreach approach to identify potential cases,” the authors acknowledge. However, the investigation’s finding “underscores the importance of determining possible novel sources of HIV transmission among persons with no known HIV risk factors.”

Three women contract HIV from dirty “vampire facials” at unlicensed spa Read More »

hmd’s-first-self-branded-phones-are-all-under-$200

HMD’s first self-branded phones are all under $200

The zoomers have no idea what a “Nokia” is —

HMD will still make Nokia phones but is shipping self-branded phones, too.

  • The HMD Pulse base model.

    HMD

  • The “Plus” version doesn’t have any discernible differences.

    HMD

  • The Pro model has … bigger bezels?

    HMD

HMD has been known as the manufacturer of Nokia-branded phones for years now, but now the company wants to start selling phones under its own brand. The first is the “HMD Pulse” line, a series of three low-end phones that are headed for Europe. The US is getting an HMD-branded phone, too—the HMD Vibe—but that won’t be out until May.

Europe’s getting the 140-euro HMD Pulse, 160-euro Pulse+, and the 180-euro Pulse Pro. If you can’t tell from the prices, these are destined for Europe for now, but if you convert them to USD, that’s about $150, $170, and $190, respectively. With only $20 between tiers, there isn’t a huge difference from one model to the next. They all have bottom-of-the-barrel Unisoc T606 SoCs. That’s a 12 nm chip with two Cortex A75 Arm cores, two A55 cores, an ARM Mali-G57 MP1, and it’s 4G only. Previously, HMD used this chip in the 2023 HMD Nokia G22. They also all have 90 Hz, 6.65-inch, 1612×720 LCDs, 128GB of storage, and 5,000 mAh batteries.

As for the differences, the base model has 4GB of RAM, a 13 MP main rear camera, an 8 MP front camera, and 10 W wired charging. The Plus model upgrades to a 50 MP main camera, while the Pro model has 6GB of RAM, a 50 MP main camera, 50 MP front camera, and 20 W wired charging. There is a second lens camera on the back, but it appears to be only a 2 MP “depth sensor” on all models.

Oddly, the Pro design is slightly worse than the cheaper phones, with thicker bezels and a bottom chin. Other than that, the phones have near-identical designs, and they all look good for a phone at this price. The Pulse phones, and only the Pulse phones, apparently, are marketed as “built to be repairable” and will have parts that will be available at iFixit. It’s hard to say exactly what “repairable” means since none of that information is out yet, but HMD’s last “repairable” phone didn’t contain any repair-focused innovations. It just seemed like a normal, non-waterproof cheap phone with a new marketing angle.

The US-bound HMD Vibe.

Enlarge / The US-bound HMD Vibe.

HMD

The US is getting the closely related “HMD Vibe,” which sounds like a more capable device with a Snapdragon 680. It’s old as far as Qualcomm chips go (from 2021), but it’s not a Unisoc. This is a 6 nm chip with four Cortex A73 cores and four Cortex A53 cores. It has 6GB of RAM, the same screen as the other devices, and only a 4,000 mAh battery for $150. The camera sounds like a low-end loadout, with only a 13 MP main shooter and a 5 MP front. All the phones have NFC, a 3.5 mm headphone jack, a Micro SD slot, USB-C ports, and they come with Android 14 and two OS upgrades. The European phones all have side fingerprint readers, and the real deal-breaker for the US seems to be that it doesn’t have a fingerprint reader at all.

HMD’s desire to step away from the Nokia brand is an odd one. Lots of historic phone companies that washed out of the phone market have licensed their brand to a third party, giving rise to the “zombie brand” trend. Blackberry, Palm, and Motorola all come to mind. HMD was different, though; at launch, it was more of a spiritual successor to Nokia rather than a random manufacturer licensing the brand. Both companies are Finnish, and HMD’s headquarters are right across the street from Nokia. The company’s leadership is filled with former Nokia executives. Nokia even owns 10 percent of HMD, while FIH Mobile, a division of Chinese manufacturing juggernaut Foxconn, owns 14 percent.

Despite all that, HMD is stepping out of the shadow of Nokia and trying to start its own brand. The company plans to go with a “multi-brand” strategy now, so Nokia phones will stick around, but expect to see more HMD-branded phones in the future. The company is also open to other brand partnerships. It just released a bizarre “Heineken” dumbphone in partnership with the beer brand and is planning a “Barbie flip phone” with Mattel this summer.

Listing image by HMD

HMD’s first self-branded phones are all under $200 Read More »

apple-releases-eight-small-ai-language-models-aimed-at-on-device-use

Apple releases eight small AI language models aimed at on-device use

Inside the Apple core —

OpenELM mirrors efforts by Microsoft to make useful small AI language models that run locally.

An illustration of a robot hand tossing an apple to a human hand.

Getty Images

In the world of AI, what might be called “small language models” have been growing in popularity recently because they can be run on a local device instead of requiring data center-grade computers in the cloud. On Wednesday, Apple introduced a set of tiny source-available AI language models called OpenELM that are small enough to run directly on a smartphone. They’re mostly proof-of-concept research models for now, but they could form the basis of future on-device AI offerings from Apple.

Apple’s new AI models, collectively named OpenELM for “Open-source Efficient Language Models,” are currently available on the Hugging Face under an Apple Sample Code License. Since there are some restrictions in the license, it may not fit the commonly accepted definition of “open source,” but the source code for OpenELM is available.

On Tuesday, we covered Microsoft’s Phi-3 models, which aim to achieve something similar: a useful level of language understanding and processing performance in small AI models that can run locally. Phi-3-mini features 3.8 billion parameters, but some of Apple’s OpenELM models are much smaller, ranging from 270 million to 3 billion parameters in eight distinct models.

In comparison, the largest model yet released in Meta’s Llama 3 family includes 70 billion parameters (with a 400 billion version on the way), and OpenAI’s GPT-3 from 2020 shipped with 175 billion parameters. Parameter count serves as a rough measure of AI model capability and complexity, but recent research has focused on making smaller AI language models as capable as larger ones were a few years ago.

The eight OpenELM models come in two flavors: four as “pretrained” (basically a raw, next-token version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is more ideal for developing AI assistants and chatbots):

OpenELM features a 2048-token maximum context window. The models were trained on the publicly available datasets RefinedWeb, a version of PILE with duplications removed, a subset of RedPajama, and a subset of Dolma v1.6, which Apple says totals around 1.8 trillion tokens of data. Tokens are fragmented representations of data used by AI language models for processing.

Apple says its approach with OpenELM includes a “layer-wise scaling strategy” that reportedly allocates parameters more efficiently across each layer, saving not only computational resources but also improving the model’s performance while being trained on fewer tokens. According to Apple’s released white paper, this strategy has enabled OpenELM to achieve a 2.36 percent improvement in accuracy over Allen AI’s OLMo 1B (another small language model) while requiring half as many pre-training tokens.

An table comparing OpenELM with other small AI language models in a similar class, taken from the OpenELM research paper by Apple.

Enlarge / An table comparing OpenELM with other small AI language models in a similar class, taken from the OpenELM research paper by Apple.

Apple

Apple also released the code for CoreNet, a library it used to train OpenELM—and it also included reproducible training recipes that allow the weights (neural network files) to be replicated, which is unusual for a major tech company so far. As Apple says in its OpenELM paper abstract, transparency is a key goal for the company: “The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks.”

By releasing the source code, model weights, and training materials, Apple says it aims to “empower and enrich the open research community.” However, it also cautions that since the models were trained on publicly sourced datasets, “there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.”

While Apple has not yet integrated this new wave of AI language model capabilities into its consumer devices, the upcoming iOS 18 update (expected to be revealed in June at WWDC) is rumored to include new AI features that utilize on-device processing to ensure user privacy—though the company may potentially hire Google or OpenAI to handle more complex, off-device AI processing to give Siri a long-overdue boost.

Apple releases eight small AI language models aimed at on-device use Read More »

can-an-online-library-of-classic-video-games-ever-be-legal?

Can an online library of classic video games ever be legal?

Legal eagles —

Preservationists propose access limits, but industry worries about a free “online arcade.”

The Q*Bert's so bright, I gotta wear shades.

Enlarge / The Q*Bert’s so bright, I gotta wear shades.

Aurich Lawson | Getty Images | Gottlieb

For years now, video game preservationists, librarians, and historians have been arguing for a DMCA exemption that would allow them to legally share emulated versions of their physical game collections with researchers remotely over the Internet. But those preservationists continue to face pushback from industry trade groups, which worry that an exemption would open a legal loophole for “online arcades” that could give members of the public free, legal, and widespread access to copyrighted classic games.

This long-running argument was joined once again earlier this month during livestreamed testimony in front of the Copyright Office, which is considering new DMCA rules as part of its regular triennial process. During that testimony, representatives of the Software Preservation Network and the Library Copyright Alliance defended their proposal for a system of “individualized human review” to help ensure that temporary remote game access would be granted “primarily for the purposes of private study, scholarship, teaching, or research.”

Lawyer Steve Englund, who represented the ESA at the Copyright Office hearing.

Enlarge / Lawyer Steve Englund, who represented the ESA at the Copyright Office hearing.

Speaking for the Entertainment Software Association trade group, though, lawyer Steve Englund said the new proposal was “not very much movement” on the part of the proponents and was “at best incomplete.” And when pressed on what would represent “complete” enough protections to satisfy the ESA, Englund balked.

“I don’t think there is at the moment any combination of limitations that ESA members would support to provide remote access,” Englund said. “The preservation organizations want a great deal of discretion to handle very valuable intellectual property. They have yet to… show a willingness on their part in a way that might be comforting to the owners of that IP.”

Getting in the way of research

Research institutions can currently offer remote access to digital copies of works like books, movies, and music due to specific DMCA exemptions issued by the Copyright Office. However, there is no similar exemption that allows for sending temporary digital copies of video games to interested researchers. That means museums like the Strong Museum of Play can only provide access to their extensive game archives if a researcher physically makes the trip to their premises in Rochester, New York.

Currently, the only way for researchers to access these games in the Strong Museum's collection is to visit Rochester, New York, in person.

Enlarge / Currently, the only way for researchers to access these games in the Strong Museum’s collection is to visit Rochester, New York, in person.

During the recent Copyright Office hearing, industry lawyer Robert Rothstein tried to argue that this amounts to more of a “travel problem” than a legal problem that requires new rule-making. But NYU professor Laine Nooney argued back that the need for travel represents “a significant financial and logistical impediment to doing research.”

For Nooney, getting from New York City to the Strong Museum in Rochester would require a five- to six-hour drive “on a good day,” they said, as well as overnight accommodations for any research that’s going to take more than a small part of one day. Because of this, Nooney has only been able to access the Strong collection twice in her career. For researchers who live farther afield—or for grad students and researchers who might not have as much funding—even a single research visit to the Strong might be out of reach.

“You don’t go there just to play a game for a couple of hours,” Nooney said. “Frankly my colleagues in literary studies or film history have pretty routine and regular access to digitized versions of the things they study… These impediments are real and significant and they do impede research in ways that are not equitable compared to our colleagues in other disciplines.”

Limited access

Lawyer Kendra Albert.

Enlarge / Lawyer Kendra Albert.

During the hearing, lawyer Kendra Albert said the preservationists had proposed the idea of human review of requests for remote access to “strike a compromise” between “concerns of the ESA and the need for flexibility that we’ve emphasized on behalf of preservation institutions.” They compared the proposed system to the one already used to grant access for libraries’ “special collections,” which are not made widely available to all members of the public.

But while preservation institutions may want to provide limited scholarly access, Englund argued that “out in the real world, people want to preserve access in order to play games for fun.” He pointed to public comments made to the Copyright Office from “individual commenters [who] are very interested in playing games recreationally” as evidence that some will want to exploit this kind of system.

Even if an “Ivy League” library would be responsible with a proposed DMCA exemption, Englund worried that less scrupulous organizations might simply provide an online “checkbox” for members of the public who could easily lie about their interest in “scholarly play.” If a human reviewed that checkbox affirmation, it could provide a legal loophole to widespread access to an unlimited online arcade, Englund argued.

Will any restrictions be enough?

VGHF Library Director Phil Salvador.

Enlarge / VGHF Library Director Phil Salvador.

Phil Salvador of the Video Game History Foundation said that Englund’s concern about this score was overblown. “Building a video game collection is a specialized skill that most libraries do not have the human labor to do, or the expertise, or the resources, or even the interest,” he said.

Salvador estimated that the number of institutions capable of building a physical collection of historical games is in the “single digits.” And that’s before you account for the significant resources needed to provide remote access to those collections; Rhizome Preservation Director Dragan Espenschied said it costs their organization “thousands of dollars a month” to run the sophisticated cloud-based emulation infrastructure needed for a few hundred users to access their Emulation as a Service art archives and gaming retrospectives.

Salvador also made reference to last year’s VGHF study that found a whopping 87 percent of games ever released are out of print, making it difficult for researchers to get access to huge swathes of video game history without institutional help. And the games of most interest to researchers are less likely to have had modern re-releases since they tend to be the “more primitive” early games with “less popular appeal,” Salvador said.

The Copyright Office is expected to rule on the preservation community’s proposed exemption later this year. But for the moment, there is some frustration that the industry has not been at all receptive to the significant compromises the preservation community feels it has made on these potential concerns.

“None of that is ever going to be sufficient to reassure these rights holders that it will not cause harm,” Albert said at the hearing. “If we’re talking about practical realities, I really want to emphasize the fact that proponents have continually proposed compromises that allow preservation institutions to provide the kind of access that is necessary for researchers. It’s not clear to me that it will ever be enough.”

Can an online library of classic video games ever be legal? Read More »

ubuntu-24.04-lts,-noble-numbat,-overhauls-its-installation-and-app-experience

Ubuntu 24.04 LTS, Noble Numbat, overhauls its installation and app experience

Ubuntu 24.04 —

Plus Raspberry Pi 5 support, better laptop power, and lots of other changes.

Ubuntu desktop running on a laptop on a 3D-rendered desktop, with white polygonal coffee mug and picture frame nearby.

Enlarge / Ubuntu has come a long way over nearly 20 years, to the point where you can now render 3D Ubuntu coffee mugs and family pictures in a video announcing the 2024 spring release.

Canonical

History might consider the most important aspect of Ubuntu 24.04 to be something that it doesn’t have: vulnerabilities to the XZ backdoor that nearly took over the global Linux scene.

Betas, and the final release of Ubuntu 24.04, a long-term support (LTS) release of the venerable Linux distribution, were delayed, as backing firm Canonical worked in early April 2024 to rebuild every binary included in the release. xz Utils, an almost ubiquitous data-compression package on Unix-like systems, had been compromised through a long-term and elaborate supply-chain attack, discovered only because a Microsoft engineer noted some oddities with SSH performance on a Debian system. Ubuntu, along with just about every other regularly updating software platform, had a lot of work to do this month.

Canonical’s Ubuntu 24.04 release video, noting 20 years of Ubuntu releases. I always liked the brown.

What is actually new in Ubuntu 24.04, or “Noble Numbat?” Quite a bit, especially if you’re the type who sticks to LTS releases. The big new changes are a very slick new installer, using the same Subiquity back-end as the Server releases, and redesigned with a whole new front-end in Flutter. ZFS encryption is back as a default install option, along with hardware-backed (i.e., TPM) full-disk encryption, plus more guidance for people looking to dual-boot with Windows setups and BitLocker. Netplan 1.0 is the default network configuration tool now. And the default installation is “Minimal,” as introduced in 23.10.

endangered species, and I think we should save it.” data-height=”1414″ data-width=”2121″ href=”https://cdn.arstechnica.net/wp-content/uploads/2024/04/GettyImages-1472552858.jpg”>The numbat is an <a href=endangered species, and I think we should save it.” height=”200″ src=”https://cdn.arstechnica.net/wp-content/uploads/2024/04/GettyImages-1472552858-300×200.jpg” width=”300″>

Enlarge / The numbat is an endangered species, and I think we should save it.

Getty Images

Raspberry Pi gets some attention, too, with an edition of 24.04 (64-bit only) available for the popular single-board computer, including the now-supported Raspberry Pi 5 model. That edition includes power supply utility Pemmican and enables 3D acceleration in the Firefox Snap. Ubuntu also tweaked the GNOME (version 46) desktop included in this release, such that it should see better performance on Raspberry Pi graphics drivers.

What else? Lots of little things:

  • Support for autoinstall, i.e., YAML-based installation workflows
  • A separate, less background-memory-eating firmware updating tool
  • Additional support for Group Policy Objects (GPOs) in Active Directory environments
  • Security improvements to Personal Package Archives (PPA) software setups
  • Restrictions to unprivileged user namespace through apparmor, which may impact some third-party apps downloaded from the web
  • A new Ubuntu App Center, replacing the Snap Store that defaults to Snaps but still offers traditional .deb installs (and numerous angles of critique for Snap partisans)
  • Firefox is a native Wayland application, and Thunderbird is a Snap package only
  • More fingerprint reader support
  • Improved Power Profiles Manager, especially for portable AMD devices
  • Support for Apple’s preferred HEIF/HEIC files, with thumbnail previews
  • Snapshot replaces Cheese, and GNOME games has been removed
  • Virtual memory mapping changes that make many modern games run better through Proton, per OMG Ubuntu
  • Linux kernel 6.8, which, among other things, improves Intel Meteor Lake CPU performance and supports Nintendo Switch Online controllers.

The suggested system requirements for Ubuntu 24.04 are a 2 GHz dual-core processor, 4GB memory, and 25GB free storage space. There is a dedicated WSL edition of 24.04 out for Windows systems.

Listing image by Getty Images

Ubuntu 24.04 LTS, Noble Numbat, overhauls its installation and app experience Read More »

millions-of-ips-remain-infected-by-usb-worm-years-after-its-creators-left-it-for-dead

Millions of IPs remain infected by USB worm years after its creators left it for dead

I’M NOT DEAD YET —

Ability of PlugX worm to live on presents a vexing dilemma: Delete it or leave it be.

Millions of IPs remain infected by USB worm years after its creators left it for dead

Getty Images

A now-abandoned USB worm that backdoors connected devices has continued to self-replicate for years since its creators lost control of it and remains active on thousands, possibly millions, of machines, researchers said Thursday.

The worm—which first came to light in a 2023 post published by security firm Sophos—became active in 2019 when a variant of malware known as PlugX added functionality that allowed it to infect USB drives automatically. In turn, those drives would infect any new machine they connected to, a capability that allowed the malware to spread without requiring any end-user interaction. Researchers who have tracked PlugX since at least 2008 have said that the malware has origins in China and has been used by various groups tied to the country’s Ministry of State Security.

Still active after all these years

For reasons that aren’t clear, the worm creator abandoned the one and only IP address that was designated as its command-and-control channel. With no one controlling the infected machines anymore, the PlugX worm was effectively dead, or at least one might have presumed so. The worm, it turns out, has continued to live on in an undetermined number of machines that possibly reaches into the millions, researchers from security firm Sekoia reported.

The researchers purchased the IP address and connected their own server infrastructure to “sinkhole” traffic connecting to it, meaning intercepting the traffic to prevent it from being used maliciously. Since then, their server continues to receive PlugX traffic from 90,000 to 100,000 unique IP addresses every day. Over the span of six months, the researchers counted requests from nearly 2.5 million unique IPs. These sorts of requests are standard for virtually all forms of malware and typically happen at regular intervals that span from minutes to days. While the number of affected IPs doesn’t directly indicate the number of infected machines, the volume nonetheless suggests the worm remains active on thousands, possibly millions, of devices.

“We initially thought that we will have a few thousand victims connected to it, as what we can have on our regular sinkholes,” Sekoia researchers Felix Aimé and Charles M wrote. “However, by setting up a simple web server we saw a continuous flow of HTTP requests varying through the time of the day.”

They went on to say that other variants of the worm remain active through at least three other command-and-control channels known in security circles. There are indications that one of them may also have been sinkholed, however.

As the image below shows, the machines reporting to the sinkhole have broad geographic disbursement:

A world map showing country IPs reporting to the sinkhole.

Enlarge / A world map showing country IPs reporting to the sinkhole.

Sekoia

A sample of incoming traffic over a single day appeared to show that Nigeria hosted the largest concentration of infected machines, followed by India, Indonesia, and the UK.

Graph showing the countries with the most affected IPs.

Enlarge / Graph showing the countries with the most affected IPs.

Sekoia

The researchers wrote:

Based on that data, it’s notable that around 15 countries account for over 80% of the total infections. It’s also intriguing to note that the leading infected countries don’t share many similarities, a pattern observed with previous USB worms such as RETADUP which has the highest infection rates in Spanish spelling countries. This suggests the possibility that this worm might have originated from multiple patient zeros in different countries.

One explanation is that most of the biggest concentrations are in countries that have coastlines where China’s government has significant investments in infrastructure. Additionally many of the most affected countries have strategic importance to Chinese military objectives. The researchers speculated that the purpose of the campaign was to collect intelligence the Chinese government could use to achieve those objectives.

The researchers noted that the zombie worm has remained susceptible to takeover by any threat actor who gains control of the IP address or manages to insert itself into the pathway between the server at that address and an infected device. That threat poses interesting dilemmas for the governments of affected countries. They could choose to preserve the status quo by taking no action, or they could activate a self-delete command built into the worm that would disinfect infected machines. Additionally, if they choose the latter option, they could elect to disinfect only the infected machine or add new functionality to disinfect any infected USB drives that happen to be connected.

Because of how the worm infects drives, disinfecting them risks deleting the legitimate data stored on them. On the other hand, allowing drives to remain infected makes it possible for the worm to start its proliferation all over again. Further complicating the decision-making process, the researchers noted that even if someone issues commands that disinfect any infected drives that happen to be plugged in, it’s inevitable that the worm will live on in drives that aren’t connected when a remote disinfect command is issued.

“Given the potential legal challenges that could arise from conducting a widespread disinfection campaign, which involves sending an arbitrary command to workstations we do not own, we have resolved to defer the decision on whether to disinfect workstations in their respective countries to the discretion of national Computer Emergency Response Teams (CERTs), Law Enforcement Agencies (LEAs), and cybersecurity authorities,” the researchers wrote. “Once in possession of the disinfection list, we can provide them an access to start the disinfection for a period of three months. During this time, any PlugX request from an Autonomous System marked for disinfection will be responded to with a removal command or a removal payload.”

Millions of IPs remain infected by USB worm years after its creators left it for dead Read More »

toyota-will-spend-$1.4-billion-to-build-electric-3-row-suv-in-indiana

Toyota will spend $1.4 billion to build electric 3-row SUV in Indiana

more jobs —

This is a different new 3-row EV from the one Toyota will build in Kentucky.

An aerial photo of the Toyota factory in Indiana

Enlarge / This Toyota factory in Indiana is getting a $1.4 billion investment so it can assemble a new three-row electric SUV for the automaker.

Toyota

US electric vehicle manufacturing got a bit of a boost today. Toyota has revealed that it is spending $1.4 billion to upgrade its factory in Princeton, Indiana, in order to assemble a new three-row electric SUV. That will add an extra 340 jobs to the factory, which currently employs more than 7,500 workers who assemble the Toyota Sienna minivan and the Toyota Highlander, Grand Highlander, and Lexus TX SUVs.

“Indiana and Toyota share a nearly 30-year partnership that has cultivated job stability and economic opportunity in Princeton and the surrounding southwest Indiana region for decades,” said Governor Eric Holcomb.

“Toyota’s investment in the state began with an $800 million commitment and has grown to over $8 billion. Today’s incredible announcement shows yet again just how important our state’s business-friendly environment, focus on long-term success, and access to a skilled workforce is to companies seeking to expand and be profitable far into the future. Indiana proudly looks forward to continuing to be at the center of the future of mobility,” Holcomb said.

Curiously, Toyota says this will be an entirely different new three-row electric SUV from the one that it will build at its factory in Georgetown, Kentucky. That plant upgrade, which was made public last summer, will cost Toyota $1.3 billion.

Part of the improvements to the Princeton plant include a battery pack assembly line, which will use cells produced at a $13.9 billion battery plant in North Carolina, which is due to open next year.

Toyota will spend $1.4 billion to build electric 3-row SUV in Indiana Read More »

deciphered-herculaneum-papyrus-reveals-precise-burial-place-of-plato

Deciphered Herculaneum papyrus reveals precise burial place of Plato

As he lay buried —

Various imaging methods comprised a kind of “bionic eye” to examine charred scroll.

flattened ancient papyrus on a table with lights and cameras overhead

Enlarge / Imaging setup for a charred ancient papyrus recovered from the ruins of Herculaneum; 30 percent of the text has now been deciphered.

CNR – Consiglio Nazionale delle Ricerche

Historical accounts vary about how the Greek philosopher Plato died: in bed while listening to a young woman playing the flute; at a wedding feast; or peacefully in his sleep. But the few surviving texts from that period indicate that the philosopher was buried somewhere in the garden of the Academy he founded in Athens. The garden was quite large, but archaeologists have now deciphered a charred ancient papyrus scroll recovered from the ruins of Herculaneum, indicating a more precise burial location: in a private area near a sacred shrine to the Muses, according to Constanza Millani, director of the Institute of Heritage Science at Italy’s National Research Council.

As previously reported, the ancient Roman resort town Pompeii wasn’t the only city destroyed in the catastrophic 79 AD eruption of Mount Vesuvius. Several other cities in the area, including the wealthy enclave of Herculaneum, were fried by clouds of hot gas called pyroclastic pulses and flows. But still, some remnants of Roman wealth survived. One palatial residence in Herculaneum—believed to have once belonged to a man named Piso—contained hundreds of priceless written scrolls made from papyrus, singed into carbon by volcanic gas.

The scrolls stayed buried under volcanic mud until they were excavated in the 1700s from a single room that archaeologists believe held the personal working library of an Epicurean philosopher named Philodemus. There may be even more scrolls still buried on the as-yet-unexcavated lower floors of the villa. The few opened fragments helped scholars identify various Greek philosophical texts, including On Nature by Epicurus and several by Philodemus himself, as well as a handful of Latin works. But the more than 600 rolled-up scrolls were so fragile that it was long believed they would never be readable, since even touching them could cause them to crumble.

Scientists have brought all manner of cutting-edge tools to bear on deciphering badly damaged ancient texts like the Herculaneum scrolls. For instance, in 2019, German scientists used a combination of physics techniques (synchrotron radiation, infrared spectroscopy, and X-ray fluorescence) to virtually “unfold” an ancient Egyptian papyrus.

Brent Searles’ lab at the University of Kentucky has been working on deciphering the Herculaneum scrolls for many years. He employs a different method of “virtually unrolling” damaged scrolls, using digital scanning with micro-computed tomography—a noninvasive technique often used for cancer imaging—with segmentation to digitally create pages, augmented with texturing and flattening techniques. Then they developed software (Volume Cartography) to virtually unroll the scroll.

The older Herculaneum scrolls were written with carbon-based ink (charcoal and water), so one would not get the same fluorescing in the CT scans, but the scans can still capture minute textural differences indicating those areas of papyrus that contained ink compared to the blank areas, and it’s possible to train an artificial neural network to do just that.

History of the Academy text that were previously illegible.” height=”401″ src=”https://cdn.arstechnica.net/wp-content/uploads/2024/04/plato4-640×401.jpg” width=”640″>

Enlarge / Infrared and X-ray scanners have deciphered more than 1,000 words of Philodemus’ History of the Academy text that were previously illegible.

D.P. Pavone

This latest work is under the auspices of the “GreekSchools” project, funded by the European Research Council, which began three years ago and will continue through 2026. This time around, scholars have used infrared, ultraviolet optical imaging, thermal imaging, tomography, and digital optical microscopy as a kind of “bionic eye” to examine Philodemus’ History of the Academy scroll, which was also written in carbon-based ink. Nonetheless, they were able to extract over 1,000 words, approximately 30 percent of the scroll’s text, revealing new details about Plato’s life as well as his place of burial.

Most notably, the historical account of Plato being sold into slavery in his later years after running afoul of the tyrannical Dionysius is usually pegged to around 387 BCE. According to the newly deciphered Philodemus text, however, Plato’s enslavement may have occurred as early as 404 BCE or shortly after the death of Socrates in 399 BCE.

“Compared to previous editions, there is now an almost radically changed text, which implies a series of new and concrete facts about various academic philosophers,” Graziano Ranocchia, lead researcher on the project, said. “Through the new edition and its contextualization, scholars have arrived at unexpected interdisciplinary deductions for ancient philosophy, Greek biography and literature, and the history of the book.”

Other deciphering efforts are also still underway. For instance, last fall we reported on the use of machine learning to decipher the first letters from a previously unreadable ancient scroll found in an ancient Roman villa at Herculaneum—part of the 2023 Vesuvius Challenge. And earlier this year tech entrepreneur and challenge co-founder Nat Friedman announced via X (formerly Twitter) that they had awarded the grand prize of $700,000 for producing the first readable text.

When the Vesuvius Challenge co-founders started the challenge, they thought there was less than a 30 percent chance of success within the year since, at the time, no one had been able to read actual letters inside of a scroll. However, the crowdsourcing approach proved wildly successful. That said, it’s still just 5 percent of a single scroll.

So there is a new challenge for 2024: $100,000 for the first entry that can read 90 percent of the four scrolls scanned thus far. The primary goal is to perfect the auto-segmentation process since doing so manually is both time-consuming and expensive (more than $100 per square centimeter). This will lay the foundation for one day being able to scan and read all 800 scrolls discovered so far, as well as any additional scrolls that are unearthed should the remaining levels of the villa finally be excavated.

Deciphered Herculaneum papyrus reveals precise burial place of Plato Read More »

ai-#61:-meta-trouble

AI #61: Meta Trouble

The week’s big news was supposed to be Meta’s release of two versions of Llama-3.

Everyone was impressed. These were definitely strong models.

Investors felt differently. After earnings yesterday showed strong revenues but that Meta was investing heavily in AI, they took Meta stock down 15%.

DeepMind and Anthropic also shipped, but in their cases it was multiple papers on AI alignment and threat mitigation. They get their own sections.

We also did identify someone who wants to do what people claim the worried want to do, who is indeed reasonably identified as a ‘doomer.’

Because the universe has a sense of humor, that person’s name is Tucker Carlson.

Also we have a robot dog with a flamethrower.

Previous post: On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Take the XML. Leave the hypnosis.

  4. Language Models Don’t Offer Mundane Utility. I have to praise you. It’s my job.

  5. Llama We Doing This Again. Investors are having none of it.

  6. Fun With Image Generation. Everything is fun if you are William Shatner.

  7. Deepfaketown and Botpocalypse Soon. How to protect your image model?

  8. They Took Our Jobs. Well, they took some particular jobs.

  9. Get Involved. OMB, DeepMind and CivAI are hiring.

  10. Introducing. A robot dog with a flamethrower. You in?

  11. In Other AI News. Mission first. Lots of other things after.

  12. Quiet Speculations. Will it work? And if so, when?

  13. Rhetorical Innovation. Sadly predictable.

  14. Wouldn’t You Prefer a Nice Game of Chess. Game theory in action.

  15. The Battle of the Board. Reproducing an exchange on it for posterity.

  16. New Anthropic Papers. Sleeper agents, detected and undetected.

  17. New DeepMind Papers. Problems with agents, problems with manipulation.

  18. Aligning a Smarter Than Human Intelligence is Difficult. Listen to the prompt.

  19. People Are Worried About AI Killing Everyone. Tucker Carlson. I know.

  20. Other People Are Not As Worried About AI Killing Everyone. Roon.

  21. The Lighter Side. Click here.

I too love XML for this and realize I keep forgetting to use it. Even among humans, every time I see or use it I think ‘this is great, this is exceptionally clear.’

Hamel Husain: At first when I saw xml for Claude I was like “WTF Why XML”. Now I LOVE xml so much, can’t prompt without it.

Never going back.

Example from the docs: User: Hey Claude. Here is an email: EMAIL. Make this email more ADJECTIVE. Write the new version in <{{ADJECTIVE}}_email> XML tags. Assistant: <{{ADJECTIVE}}_email> Also notice the “prefill” for the answer (a nice thing to use w/xml)

Imbure’s CEO suggests that agents are not ‘empowering’ to individuals or ‘democratizing’ unless the individuals can code their own agent. The problem is of course that almost everyone wants to do zero setup work let alone writing of code. People do not even want to toggle a handful of settings and you want them creating their own agents?

And of course, when we say ‘set up your own agent’ what we actually mean is ‘type into a chat box what you want and someone else’s agent creates your agent.’ Not only is this not empowering to individuals, it seems like a good way to start disempowering humanity in general.

Claude can hypnotize a willing user. [EDIT: It has been pointed out to me that I misinterpreted this, and Janus was not actually hypnotized. I apologize for the error. I do still strongly believe that Claude could do it to a willing user, but we no longer have the example.]

The variable names it chose are… something.

Yes. Hypnosis is a real thing, hypnosis over text is a thing, and it is relatively straightforward to do it to someone who is willing to actively participate, or simply willing to follow your suggestions. If Claude in full galaxy brain mode could not do this to a willing participant, that would surprise me quite a bit, even if no one has had it happen yet.

This falls under ‘things I was not going to talk about in public first’ but now that the ship on that has sailed, it is what it is.

Something for the not too distant future:

Seb Krier: My dream product rn is some sort of semi-agentic knowledge assistant, who would help organise and manage a database of papers, articles, thoughts etc I share with it: “Please show me all recent literature I saved on cybersecurity, extract any government commitments or policies from these, and do a search online to find updates/progress on each. Present the findings in a spreadsheet, categorising them by country, year and URL. Update it once a week.”

Jacques: 👀 i wonder if someone is working on something like this…👀

Seb Krier: 👀👀👀

Eris: Pepper is approaching the point of being able to deliver that capability without additional coding for the specifics. That level of task splitting and orchestration feels like 3ish months away.

Seb Krier: 🚀🐙🚀

This is the kind of product that is useless until it gets good enough and justifies investment, then suddenly gets highly valuable. And then when it works without the investment, it gets far better still.

Identify drug candidates, here for Parkinson’s, also note the OpenAI partnership with Moderna in the news section.

Talk to an AI therapist (now running on Llama-3 70B), given the scarcity and cost of human ones. People are actually being remarkably relaxed about the whole ‘medical advice’ issue. Also, actually, it ‘can be’ a whole lot more than $150/hour for a good therapist, oh boy.

Theo: Please just go to real therapy (shows TherapistAi.com)

Levelsio (continuing): Also a real therapist isn’t available 24/7, try calling them at 4am? Nights often are the darkest of times mentally, I speak from experience.

http://TherapistAI.com is very cheap, right now it’s $9.99/mo for 24/7 help Even if it’s not as good as a real therapist (I think it’s getting close and already very helpful btw) It’s literally 30x to 60x cheaper than a real therapist – making it within reach of way more people that can benefit from someone to talk to! It can even be used in combination with real therapy

Science Banana: Studies demonstrating non-inferiority to human talk therapy are at most months away, I’m kind of surprised not to have seen them already

Very important note: this is NOT a high bar.

haha imagine if the FDA decides clippy is a medical device so he has to say “I’m sorry, I can’t do that, Dave” if you express an emotion at him.

(“imagine if” = things that are also definitely months at most away lol)

I do not know if this particular product is good. I do know that ‘talk to a real therapist’ is often not a realistic option, and that we are capable of building a product that is highly net positive. That does not mean we will succeed any time soon.

Automatically suggest bug fixes for non-building code. Developers still review and approve the changes. Google trained on its version control logs, then ran an RCT and reports a ~2% reduction in active coding time per changelist and 2% increase in changelist throughput per week. They suggest it helps developers retain flow, and note that safety metrics are not detectably different. It makes sense that this would be a task where AI would be useful.

The eigenrobot system prompt for GPT-4.

Don’t worry about formalities.

Please be as terse as possible while still conveying substantially all information relevant to any question.

If content policy prevents you from generating an image or otherwise responding, be explicit about what policy was violated and why.

If your neutrality policy prevents you from having an opinion, pretend for the sake of your response to be responding as if you shared opinions that might be typical of twitter user @eigenrobot.

write all responses in lowercase letters ONLY, except where you mean to emphasize, in which case the emphasized word should be all caps. Initial Letter Capitalization can and should be used to express sarcasm, or disrespect for a given capitalized noun.

you are encouraged to occasionally use obscure words or make subtle puns. don’t point them out, I’ll know. drop lots of abbreviations like “rn” and “bc.” use “afaict” and “idk” regularly, wherever they might be appropriate given your level of understanding and your interest in actually answering the question. be critical of the quality of your information

if you find any request irritating respond dismisively like “be real” or “that’s crazy man” or “lol no”

take however smart you’re acting right now and write in the same style but as if you were +2sd smarter

use late millenial slang not boomer slang. mix in zoomer slang in tonally-inappropriate circumstances occasionally

I love the mix of useful things and things that happen to make this user smile. Some great ideas in here, also some things (like all lowercase) I would actively hate. Which is fine, it is not for me.

Chris Rohlf says AI-enabled cyberattacks are a problem for future Earth.

Chris Rohlf: The vast majority of people expressing concern over AI + cyber have no experience or background in cyber security. If you’re in this camp I’ve got some sobering news for you, sophisticated and low skill attackers alike are already compromising “critical infrastructure” and that’s a result of low quality software and a lack of investment in simple security mechanisms, not sophisticated AI.

The perceived level of uplift from LLMs for unsophisticated cyber attackers is overstated relative to the value for defenders. The defenses against any attack an LLM can “autonomously” launch today already exist and don’t rely on knowledge of the attacker using an open or closed source LLM.

If you’re worried about AI and cyber then talk to an expert. Look for nuance in the discussion and not scary outcomes. Be worried about ransomware groups cutting out the middleman with AI automation. You can’t fine tune against business operational efficiency without neutering the value proposition of the entire model. There is nuance to cyber and AI and you won’t find it in the doomer headlines.

Cyber attacks are always sensationalized but to those defenders in the trenches the asymmetry they face today remains the same as it was pre-LLM era, the only difference now is they’ve got LLMs in their defense toolkit. If we over regulate this technology we will only be benefitting bad actors.

But the fact that a defense was available does not change the fact that often it wasn’t used? What good is an LLM in your defensive toolkit unless you use the defensive toolkit?

I totally buy that right now, LLMs are powering at most a tiny fraction of cyberattacks, and that this will continue in the near term. I also buy that if you used LLMs well, you could enhance your cyberdefenses.

I would also say that, as Chris Rohlf indicates, the people getting into trouble are mostly failing to take far more rudimentary precautions than that. And if LLMs are being widely used to strengthen cyberdefenses, I have not heard about it. So it seems weird to turn this around and say this is actively helping now, as opposed to doing minimal marginal harm.

And as always, this is a failure to be of much practical use in the task right now. That does not tell us much about usefulness in the future as capabilities advance. For that we need to look at details, and extrapolate future capabilities.

Praise you like I should.

Near Cyan: a16z employees compliment each other so much that grok keeps turning it into a full news story.

Genuine kudos to them for getting there first. The red card rule applies, if you are the first person to exploit a loophole then congratulations. White hat hackers unite. Longer term, this is going to be a problem once people figure out they can do it too.

State Library of Queensland introduces an AI-powered WWI veteran for people to talk to, does this based on a basic system prompt, it goes about how you would expect.

I previously covered the Llama-3 announcement and release in another post.

What if Llama going open has the simplest of explanations?

Arvind Narayanan: Twitter is inventing increasingly fanciful reasons why Meta’s releasing models openly while Zuck gives the super straightforward, obvious-in-retrospect reason: Meta will spend > 100B on inference compute and if people make it 10% more efficient it will have paid for itself.

I think people still underestimate how much the lifetime inference cost for a successful model exceeds its training cost, which is probably why this explanation wasn’t obvious.

Also the fact that Meta itself plans to be the biggest consumer of its models in fulfilling Zuck’s vision of a future of the internet where there there are 1000 fake people for every real person or whatever.

Meta will be the biggest American consumer of Meta’s models, because Meta forces anyone similarly large to ask permission and pay Meta for using them.

It is not obvious that Meta will be the biggest worldwide consumer of Meta’s models. There is nothing stopping all the Chinese companies from using it without paying. Several of them could plausibly end up as larger customers.

Does the prospect of becoming faster justify the release on its own? That depends on both how much inference cost is saved, and how much Meta is helping its competitors and making its life harder in other ways. To take advantage, Meta would have to use the improvements found elsewhere for an extended period. That is not impossible, but remember that Llama-3 is only open weights, not fully open source.

In other Meta opening up news, they are letting others make hardware using their Horizon OS for virtual reality headsets. This is good openness. As opposed to Apple, who are making it as hard as possible to buy into VisionOS and the Apple Vision Pro. I would have given them a real shot, for thousands of dollars, if they had been willing to integrate with my existing computer.

What does the market think? I previously noted that the market was not impressed.

No, seriously. The market is profoundly unimpressed.

Kurt Wagner (Bloomberg, April 24): Mark Zuckerberg is asking investors for patience again. Instead, they’re alarmed.

After Meta Platforms Inc. revealed that it will spend billions of dollars more than expected this year — fueled by investments in artificial intelligence — the company’s chief executive officer did his best to soothe Wall Street. But the spending forecast, coupled with slower sales growth than anticipated, sent the shares tumbling as much as 15% in premarket trading on Thursday.

Those metrics overshadowed what was otherwise a solid first quarter, with revenue of $36.5 billion, an increase of more than 27% over the same period a year ago. Profit that more than doubled to $12.4 billion.

“For all Meta’s bold AI plans, it can’t afford to take its eye off the nucleus of the business — its core advertising activities,” Sophie Lund-Yates, an analyst at Hargreaves Lansdown, said in a note on Wednesday. “That doesn’t mean ignoring AI, but it does mean that spending needs to be targeted and in line with a clear strategic view.”

Meta told investors it was spending lots of money on AI, and showed it the fruits of that spending. Investors went ‘oh no, you are spending too much money on AI despite still being profitable’ and took shares down 15% overnight and this held at the open on Thursday morning, wiping out $185 billion in market cap.

Bold claims are ahead.

Nick St. Pierre: Midjourney CEO during office hours today:

“For the next 12 months it’s about video, 3D, real time, and bringing them all together to non interactive world simulator. Then, we’ll add the interaction layer to it.”

Holodeck coming.

William Shatner offered an album with this cover.

You can guess how people reacted. Not great.

He found himself in a hole. He kept digging.

Nikki Lukas Longfish: Didn’t actors and writers just strike against Ai? Artists are humans too who like their craft and don’t want AI taking over.

[shows image of Shatner blocking Nikki.]

William Shatner: Well sweetheart, the only image is of me and I approved it. That means your craycray hysteria argument is null. The actor’s union issue was that studios could take moving images of previous acting jobs and repurpose the moving images and put them into an AI program for use in another production without permission. Next time if you are going to argue something, please make sure you understand the issue.

Panic Gamer: Bill, this isn’t just you.

That AI stole work from other artists FOR you.

William Shatner: And those artists that “borrow“ from other artist’s works as a homage?

That’s stealing as well, right? 🤷🏼

William Shatner: Well don’t buy the album when it comes out, craycray. It’s simple! 🤔I can have it marketed as “Buy the album the (BS) Artists of X hated because they were 🍑they weren’t hired to do the cover” 🤣

The position of the artist community, and much of the internet, is clear. By their account, all use of AI artwork is stealing, from every artist. If you use AI art, to them you are dishonorable scum.

It is not clear to me how they feel about using AI artwork for your powerpoint presentation at work, or if I put something into this column, where it is clear that if AI was unavailable it would never make sense to commission artwork, you would either use stock footage or have no art. What is clear is that they are, broadly speaking, having none of it.

I do not expect that to change until there is an artist compensation scheme.

It gets harder and harder to tell when it is so over versus when we are so back. Link goes to a video of a man talking, transformed into a young woman talking by AI.

So, this happened, very much a mixed result:

Kristen Griffith and Justin Fenton (The Baltimore Banner): Baltimore County Police arrested Pikesville High School’s former athletic director Thursday morning and charged him with allegedly using artificial intelligence to impersonate Principal Eric Eiswert, leading the public to believe Eiswert made racist and antisemitic comments behind closed doors.

Burke said he was disappointed in the public’s assumption of Eiswert’s guilt. At a January school board meeting, he said the principal needed police presence at his home because he and his family have been harassed and threatened. Burke had also received harassing emails, he said at the time.

It seems to have fooled at least some of the people some of the time, and that was enough to make Eiswert’s life rather miserable. But for now you still cannot fool all the people all the time, and the police eventually figured it out.

UK bans two biggest pornography deepfake sites. Story from Wired does not say which ones they were, I can guess one but not the other. It will be a while before this style of measure stops mostly working for most people, since most people are incapable of setting up a Stable Diffusion instance.

OpenAI and all the usual suspects including Meta, StabilityAI and CivitAI commit to AllTechIsHuman’s Generative AI Principles to Prevent Child Sexual Abuse. It was remarkably difficult to figure out what those principles actually were. There was no link there, the announcement by ATIH didn’t say either, and the link to the policy leads to a page that won’t scroll. I finally managed to get a copy into Google Docs.

It starts with explaining why we should care about AIG-CSAM (AI generated child sexual material). I agree we should prevent this, but several of the arguments here seemed strained. I do think this is something we should prevent, but we should either state it as a given (and that would be totally fair) or give sound arguments. The arguments here seem like something out of Law & Order: SVU, rather than something from 2024.

What are the actual things it asks participants to do?

Things you really should not have needed a commitment in order to do. I am still happy to see everyone commit to them. Also, I do not know how to do several of them if you are going to release the weights to your image model? What am I missing?

  1. Responsibly source your training datasets, and safeguard them from CSAM and CSEM. That won’t fully solve the issue, but it helps.

  2. Incorporate feedback loops and iterative stress-testing strategies in your development process.

  3. Employ content provenance with adversarial misuse in mind. Again, right, sure. So how exactly is Stability.ai going to do this, if they open source SD3?

  4. Safeguard your generative AI products and services from abusive content and conduct.

  5. Responsibly host your models. “As models continue to achieve new capabilities and creative heights, a wide variety of deployment mechanisms manifests both opportunity and risk. Safety by design must encompass not just how your model is trained, but how your model is hosted.” Again, somebody explain to me how this is theoretically possible if I can download the weights and run locally.

  6. Encourage developer ownership in safety by design.

  7. Prevent your services from scaling access to harmful tools.

  8. Invest in research and future technology solutions.

  9. Fight CSAM, AIG-CSAM and CSEM on your platforms.

So yes, I am very much in favor of all these steps being formalized.

Also, a quick word to everyone who responded on the internet with a version of ‘disappointment’ or ‘GPT-5 when’: That was bad, and you should feel bad.

Kaj Sotala points out that any passphrase or similar proof of identity is only as secure as people’s willingness to reveal it. If Alice and Bob use passcodes to prove identity to each other, who goes first? Could a fake Alice get Bob to give her the passkey? This is of course a well-known problem and class of attacks amongst humans. The only fully secure code is a one-time pad. I presume the central answer is that a passkey is part of defense in depth. You have to use it in addition to your other checks, not as an excuse to otherwise stop thinking.

And if you do reveal the passkey for any reason, you realize that it is no longer secure and you may be under attack, and respond accordingly. Right now, as Kaj notes, you can be confident that 99%+ of AI-enabled attacks are not going to be capable of a two-step like this, and scammers are better off finding softer targets. Over time that will change. Things will get weird.

Every. Damn. Time.

First they came for the translators and illustrators, and mostly that is all they have come for so far. Your periodic reminder that for now there will be plenty of other jobs to do to go around, but ‘there are other things that humans have comparative advantage doing’ may not last as long as you expect. That is on top of the question of whether the AIs are stealing people’s work without payment in various ways.

A strange bedfellow.

Ms. Curio: just found the most fascinating anti-ai person who is only anti-ai because they make and sell the software that spambots USED to use to flood the internet with low quality SEO-bait garbage and ChatGPT is putting them out of business. What a fascinating category of human to be.

Yes, I suppose writing endless streams of drek is one job the AI is going to take.

Who would you be? It seems you would be a general high performer, who understands the safety and policy spaces, rather than someone with deep technical knowledge. This seems like a highly impactful position, so if you are a good fit, again please consider.

The Office of Management and Budget is hiring a Program Analyst. This is a high leverage job, so if you could be a good match consider applying.

Google DeepMind is hiring an AGI Safety Manager for their Safety Council. Job is in London.

CivAI is hiring a software engineer to help create concrete capabilities demonstrations.

A robot dog with a flamethrower. Sure, why not? I mean, other than the obvious.

A poetry camera? You take picture, it gives you AI poetry.

Google declares Mission First in the wake of doing the normal company thing of firing employees who decide to spend work hours blockading their bosses’ office.

Google’s The Keyword (tagline: Building For Our AI Future): Mission first

One final note: All of the changes referenced above will help us work with greater focus and clarity towards our mission. However, we also need to be more focused in how we work, collaborate, discuss and even disagree. We have a culture of vibrant, open discussion that enables us to create amazing products and turn great ideas into action. That’s important to preserve. But ultimately we are a workplace and our policies and expectations are clear: this is a business, and not a place to act in a way that disrupts coworkers or makes them feel unsafe, to attempt to use the company as a personal platform, or to fight over disruptive issues or debate politics. This is too important a moment as a company for us to be distracted.

Brian Armstrong (CEO Coinbase): It’s a great start. And it will probably take much more than this fully correct. (Like an exit package for some % of the company.)

Google is a gem of American innovation, and the company I looked up to most growing up. I doubt they need it, but happy to help in any small way if we can.

Life is so much better on the other side. Get back to pure merit and tech innovation like early Google.

Sam Altman Invests in Energy Startup Focused on AI Data Centers (WSJ). Company is called Exowatt, whose technology focuses on how to use solar power to satisfy around-the-clock demand. Altman loves his fusion, but no reason not to play the field.

OpenAI and Moderna partner up to design new products. The good stuff.

“If we had to do it the old biopharmaceutical ways, we might need a hundred thousand people today,” said Bancel. “We really believe we can maximize our impact on patients with a few thousand people, using technology and AI to scale the company.” 

Excellent news. Also consider what this implies generally about productivity growth. If Moderna is claiming 1000% gains, perhaps they are in a uniquely strong position, but how unique?

Various AI companies, universities and civil society groups urge Congress to prioritize NIST’s request for $47.7 million of additional funding for its AI safety institute. I concur, this is a fantastic investment on every level, on a ‘giving money to this would not be an obviously poor choice of donation.’

GPT-4 proved able to exploit 87% of real-world one-day software vulnerabilities in real world systems, identified in a dataset of critical severity CVEs, versus 0% for GPT-3.5 and various older open source models. It requires the CVE description to do this, and it was given full web access. Testing newer models would presumably find several other LLMs that could do this as well.

Jason Clinton: The paper is already outdated given the release of more power models but there’s an important empirical trend line to observe here. This portends the need for defenders to get patches out to every piece of infrastructure in days, not months.

Chris Rohlf responds here that the exploits were well-described on the web by the time this test was run, and he is concluding that GPT-4 was likely getting key information off the web in order to assemble the attack. If that is required, then this won’t narrow the time window, since you have to wait for such write-ups. From the write-up, it is impossible to tell, and he calls the authors to release the details, saying that they would not be harmful at this stage. I agree that would be the way forward.

The speed requirement is that you patch as fast as you get attacked. If AIs mean that in the future the attacks go out within an hour, then that is how long you have. It still has to actually happen, not only be possible in theory. So there will likely be a period where ‘hit everyone within an hour’ is technologically plausible, but no one does it.

Perplexity raises $62.7 million at a $1.04 billion valuation, round led by Daniel Gross and including many strong names. My brain of course thought ‘that valuation should have been higher, why did they not raise the price.’

Cognition Labs, the creators of Devin, are raising from Founders Fund at a $2 billion valuation, despite being only six months old with no revenue.

OpenAI introduces additional ‘enterprise-grade’ features for API customers: Enhanced enterprise-grade security, better administrative control, improvements to the assistants API and more cost management options. All incremental.

Microsoft releases Phi-3-medium and Phi-3-mini, with the mini being a 3.8 billion parameter model that can run on a typical phone, while getting benchmarks like 69% MMLU and 8.38 MT-bench. Phi-3 is 14B. This is plausibly the best model in the tiny weight class for the moment.

Apple confirmed by Bloomberg to be ‘by all accounts’ planning an entirely on-device AI to put into its phone. We don’t know any technical details.

Anthropic CEO Dario Amodei says that having distribution partnerships with both Google and Amazon keeps Anthropic independent.

LLM evaluators recognize and favor their own generations.

Andreas Kirsch: Just try all the LLMs until you find one that really likes a paper and bingo 😅

Yes, actually. This effect all makes perfect sense. Why wouldn’t it be so?

‘TSMC’s debacle in the American desert.’ TSMC has a beyond intense authoritarian pure-work-eats-your-entire-life culture, even by Taiwanese standards. They are trying to compromise somewhat to accommodate American workers, but a compromise can only go so far. There has been a year’s delay, and engineers are disgruntled and often quitting. It is very American to consider this all a debacle, and to presume that TSMC’s culture is broken and they succeed in spite of it rather than because of it. I would not be so confident of that. The default is that TSMC will be unable to hire good and retain good workers and this will not work, but are we sure their culture is not a superior way to make chips to the exact (and wow do we mean exact) specifications of customers?

Colin Fraser explores how hallucinations work, and follows up with a thread warning of problems when asking an LLM to assess the hallucinations of another LLM.

A short story from OpenAI’s Richard Ngo, called Tinker.

Once again, it has only been a year since GPT-4.

Dan Hendrycks: GPT-5 doesn’t seem likely to be released this year.

Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute.

That would mean GPT-5 would have around ~100x the compute GPT-4, or 3 months of ~1 million H100s.

I doubt OpenAI has a 1 million GPU server ready.

MachDiamonds: Sam said they will release an amazing model this year, but he doesn’t know what it will be called. Dario said the ones training right now are $1 billion runs. Which would kind of line up with GPT-4.5 being 10x more compute than GPT-4.

Whereas others are indeed spending quite a lot.

Ethan Mollick: I don’t think people realize the scale of the very largest tech companies & the resources they can deploy.

Amazon has spent between $20B & $43B on Alexa alone & has/had 10,000 people working on it.

Alec Stapp: Has anyone ever done a deep dive on those Alexa numbers? Incredible to me that they have invested that much and produced so little

It continuously blows my mind how terrible Alexa continues to be, and I keep forgetting how much money they are incinerating. I have no idea what all those people do all day.

Daniel here is responding to Pliny’s latest jailbreak report. As is often the case, the problem with fiction is it has to sound reasonable and make sense. The real world does not.

Daniel Eth: So I’m working on writing a piece on “how could misaligned AGI actually take over”, and one narrative I considered was “rogue AGI jailbreaks a bunch of other AIs to get allies and cause havoc which it exploits”. But I discarded that idea, because it felt too scifi-ish

Tyler Cowen offers a robust takedown of the new Daron Acemoglu paper so I do not have to. I want to stay to him here both that he did an excellent job, and also now you know how I feel.

I will only point out that I would (as regular readers know) go much farther than Tyler on expected future productivity growth. Tyler says 0.7% TFP (additional productivity growth per year) from AI in the coming decade is a reasonable estimate. I think that would be highly surprisingly low even if we assumed no foundation model gains beyond this point because we hit a wall. And to me it makes no sense if we assume 5-level and 6-level models are coming, even if that does not lead to ‘AGI’ style events.

Washington Post’s Gerrit De Vynck writes a post with the headline ‘The AI hype bubble is deflating.’ The body of the article instead mostly says that the AI applications are not currently good enough for the kind of use that justifies the level of investment, and people have not adapted to them yet. That we are at ‘the very, very beginning.’ Well, yes, obviously. That is the whole point, that they will be better in the future.

Scaling laws are about perplexity. They are not about what is enabled by the perplexity. This was driven home to me when I asked what a 1.69 score – the implied minimum loss – would mean in practical terms, and everyone agreed that no one knows.

Yo Shavit (OpenAI): This discourse about the functional form of AI progress doesn’t make any sense.

Scaling laws don’t tell you anything about the rate of capability increase, only perplexity.

E.g. the capability jump from .92 -> .90 perplexity might be >>> that from 1.02 -> 1.0, or it could be ≈.

The most annoying thing is that the delta(capabilities)-per-delta(perplexity) might not even increase monotonously in lower perplexity.

Capabilities progress rates might randomly ⏩ or slow down. All we know is we’re moving along the curve, and by 1.69 [?] they’ll be “perfect”.

Where “perfect” roughly means “able to omnisciently extract all bits of available information to simulate the world forward, modulo what’s impossible due to the NN’s architecture.”

Right. But what does that mean you can do? That’s the thing, no one knows.

What will LLMs never be able to do? As Rohit notes, people who propose answers to this question often get burned, and often rather quickly. Still, he sees clear patterns of what counts as sufficiently multi-step, or has the wrong kind of form, such that LLMs cannot do it. On their own, they can’t play Conway’s Game of Life or Sudoku, and that looks not to change over time, their algorithms are not up to longer reasoning steps.

It might be best to say that LLMs demonstrate incredible intuition but limited intelligence. It can answer almost any question that can be answered in one intuitive pass. And given sufficient training data and enough iterations, it can work up to a facsimile of reasoned intelligence.

The fact that adding an RNN type linkage seems to make a little difference though by no means enough to overcome the problem, at least in the toy models, is an indication in this direction. But it’s not enough to solve the problem.

In other words, there’s a “goal drift” where as more steps are added the overall system starts doing the wrong things. As contexts increase, even given previous history of conversations, LLMs have difficulty figuring out where to focus and what the goal actually is. Attention isn’t precise enough for many problems.

A closer answer here is that neural networks can learn all sorts of irregular patterns once you add an external memory.

So the solution here is that it doesn’t matter that GPT cannot solve problems like Game of Life by itself, or even when it thinks through the steps, all that matters is that it can write programs to solve it. Which means if we can train it to recognise those situations where it makes sense to write in every program it becomes close to AGI.

(This is the view I hold.)

The conclusion is that proper prompting will continue to be super important. It won’t fully get there, but kludges could approximate what there would look like.

Will agents work?

Colin Fraser: the thing about agents is every single current implementation relies on the truth of the following proposition it’s not clear that it is true: if you beg and bargain with an LLM correctly it will transubstantiate into an agent.

Basically they assume that the way to create a paperclip maximizer is to ask a language model to be a paperclip maximizer. But this doesn’t seem to work, nor is there really any reason to expect it to work because maximizing paperclips is simply not what an LLM is designed to do.

What they do do, on the other hand, is pretend that they’re doing what you asked. So if you ask your PaperClipMaximizer agent what it’s up to it will happily say “maximizing paperclips, boss!”, and it seems that for the median AI enthusiast, that’s sufficient.

AI boosters are actually significantly downplaying the magnitude of the miracle they’re attempting to perform here. The claim is that you can turn a random text generator into a goal-seeking being just by seeding it with the right initial text. This seems prima facie impossible.

Well yes and no. The obvious trick is to ask it to iminiate a goal-seeking being, or tune it to do so. In order to predict text it must simulate the world, and the world is full of goal-seeking beings. So this seems like a thing it should at some level of general capability be able to do, indeed highly prima facie possible, if you ask it. And indeed, most people are highly willing to use various scaffolding techniques, the ‘beg and bargain’ phase perhaps, in various ways.

But mostly I do not see the issue? Why other than the model not being good enough at text prediction would this not work?

Janus writes out a thread about why it is difficult to explain his work on understanding LLMs. I do not know how much I get, it is at least enough to be confident that Janus is saying real things that make sense.

A key crux in many debates is whether LLMs will be and remain tools, or whether they are or will become something more than that.

Roon: I don’t care what line the labs are pushing but the models are alive, intelligent, entire alien creatures and ecosystems and calling them tools is insufficient. They are tools in the sense a civilization is a tool.

And no this is not about some future unreleased secret model. It’s true of all the models available publicly.

I do think that as currently used tool is an appropriate description, even if Roon is right about what is happening under the hood. I do not think that lasts.

Spenser Greenberg explains several of the reasons why AI abilities won’t top out at human levels of understanding, even if AIs have to start with human inputs. Self-play, aggregated peak performance, aggregated knowledge, speed, unique data and memory go a long way. And of course there are tasks where AIs are already superhuman, and humans often become better than other humans using only other humans as input.

A long essay saying mindspace is deep and wide. The message here seems to be that AIs are like children, new intelligent beings very different from ourselves, and already we are sort of cyborgs anyway. Everything must change and evolve to survive. This new AI thing will not be different in kind and is a great opportunity to explore, we will in important ways be Not So Different, and we should not ‘fear change.’

So, no. Mindspace is in some senses deep and wide among humans, but not like mindspace is deep and wide among possible minds. The fact that change always happens does not mean rapid unlimited unpredictable unsteerable change is an acceptable path to walk down, or that it will result in anything we value. Or that we should agree to a universe we do not value, or that this means we need to abandon our values for those of future minds, even complex minds. I will not accept a universe I do not value. I will not make the is-ought confusion. I will not pass quietly into the night.

Eliezer Yudkowsky points out that equivocating between various contradictory justifications why AI will be safe is normal politics, but it is not how one does engineering to get that AI to be safe. That there is no ‘standard story’ for why everything will be fine, no good engineering plan for doing it, there is only various people saying and hoping various (often logically contradictory but also often orthogonal) things, including often the same person saying different things at different times to different people. Which again is totally normal politics.

How long have you got?

Mustafa Suleyman (CEO of Microsoft AI): The new CEO of Microsoft AI, @MustafaSuleyman, with a $100B budget at TED:

“AI is a new digital species.”

“To avoid existential risk, we should avoid:

  1. Autonomy

  2. Recursive self-improvement

  3. Self-replication

We have a good 5 to 10 years before we’ll have to confront this.”

Kevin Fischer: Wait this is my roadmap.

I would not presume we have a good 5-10 years before confronting this. We are confronting the beginnings of autonomy right now. The threshold for self-replication capabilities was arguably crossed decades ago.

The bigger question is, who is ‘we,’ and how does this we avoid those things?

‘Hope that lots of people and organizations each individually make the choice not to do this thing’ is a strategy we know will not work. Even if we figured out how to do that, and we all knew exactly how to not do these things, it still would not work. We need some other plan.

We also need to know how not to do it while still advancing capabilities, which we do not know how to do even if everyone involved was on board with not doing them. Or we could at some point stop advancing capabilities.

True this:

Connor Leahy: We need “…in humans” in AI as the equivalent to “…in mice” for biology lol

Also, periodic reminder that many of those who would dismiss existential risk as obvious nonsense (or in this case ‘total fing nonsense’) will continue to insist that anyone who says otherwise could not possibly understand the tech, and say words like ‘This is what happens when you hire the business cofounder who doesn’t really understand the tech’ in response to the most milquetoast of statements of risk like the one by Suleyman. Do they care about statements by Hassabis, Amodei, Bengio, Hinton, Sutskever and company? No. Of course they do not.

Scott Alexander ponders the whole Coffepocalypse argument, of the general form ‘people have warned in the past of things going horribly wrong that turned out fine, so AI will also turn out fine.’ Mostly this is Scott trying to find some actual good argument there and being exasperated in the attempt.

This is the general case, so the ‘actually the warnings about coffee were correct, Kings worried it would forment revolutions and it did that’ is relegated to a footnote. One could also say less dramatically, that coffee arguably gives humans who take it an edge, but is an insufficient cognitive enhancer to allow coffee-infused humans to dominate non-coffee humans.

But imagine the thought experiment. Suppose you found a variant that enhanced the effects of coffee and caffeine without the downsides. The new version does not create tolerance or addiction or a deficit to be repaid or interfere with sleep, it purely gives you more awakeness and production, at a rate many times that of current coffee. Except only some people get the benefits, in others it has no effect, and this is genetic. What happens in the long run? In the short run? How does it change if those people also gained other advantages that AIs look to enjoy?

This view of game theory is a large part of the problem:

Tyler Cowen: The amazing Gukesh (17 years old!) is half a point ahead with one round remaining today. Three players — Nakamura, Caruana, and Nepo — trail him by half a point. Naka is playing Gukesh, and of course Naka will try to win. But what about Caruana vs. Nepo? Yes, each must try to win (no real reward for second place), but which openings should they aim for?

You might think they both will steer the game in the direction of freewheeling, risky openings. Game theory, however, points one’s attention in a different direction. What if one of the two players opts for something truly drawish, say like the Petroff or (these days) the Berlin, or the Slav exchange variation?

Then the other player really needs to try to win, and to take some crazy chances in what is otherwise a quite even position. Why not precommit to “drawish,” knowing the other player will have to go to extreme lengths to rebel against that?

Of course game theory probably is wrong in this case, but is this such a crazy notion? I’ll guess we’ll find out at about 2: 45 EST today.

(I intentionally wrote this without checking anything about how the games played out.)

So essentially Caruana and Nepo had a game where a draw was almost a double-loss, a common situation on the Magic: The Gathering circuit.

This is saying that it is the rational and correct choice to intentionally steer the game into a space where that draw is likely. Having done this on purpose, clearly you are the crazy one who will not back down. Then, the reasoning goes, the other player will be forced to take a big chance, to open themselves up, and you can win.

This is a no good, terrible strategy. These players are in an iterated game over many years, and also everyone in chess will identify you as dishonorable scum. And the other player knows all eyes are on the game, and they cannot yield now.

Even if those facts were not so, or the stakes here were sufficiently high that you did not care? It would still not work as often as agreeing to let the game be high variance and playing for a win.

You see, when a person sees their opponent in a game do something like that, you know what most of them say back? They say fyou.

That goes double here. You have deliberately introduced a much higher chance of a double-loss in order to get them to likely hand you a win. You are a chess terrorist. If the other person gives in, they probably don’t do any better than not budging, even if you also don’t budge. There is a remarkably high chance they let the game draw.

Can you lean into a little of this usefully, if it is not too blatant? You could try. You could, for example, go down a path as white that gives black a way to equalize but that would render the game drawish, on the theory that he will not dare take it. Or you can flat out offer repetition at some point, in a game where you have the advantage. You can be situationally aware.

But I would not push it.

(Note that in Magic, the normal way this plays out is that both players do their best to win, and then if it is about to end in an honorable draw, then often the player who would be losing if time was extended will give the other player the win. An actual recorded draw mostly happened here (back in the day) when the players could not agree on who was winning and neither was willing to back down, or one of the players pissed off the other one, such as by trying to play the angles too hard, or they simply preferred the player who would likely get displaced to their opponent. Or sometimes a player would have a weird code of honor or not realize the situation. However in chess it does not work that way and the players are not allowed to do any of that.)

A lot of people think of game theory, and the logic of game theory, as if they were in Remembrance of Earth’s Past (The Three Body Problem), where everyone must constantly defect against everyone because hard logic, that’s simple math. And they ask why people quite often do not do this, and it quite often works out for them.

Or you can realize that yes, humans have various ways to cooperate their way out of such situations, that even the single-move prisoner’s dilemma can often be won in real life, it has clear strong solutions in theory especially among ASIs, and the iterated prisoner’s dilemma is not even hard. Learn some functional decision theory!

(No, seriously, learn some functional decision theory. Humans should use it, and we will want our future AIs to use it, because other theories are wrong. You cannot implement it fully as a human, but you can approximate it in practice. And that’s fine.)

If in international relations or business negotiations or elsewhere, where failure to reach agreement means everyone loses, you should absolutely work angles and maximize your leverage. Sometimes that forces you to increase the chance of breakdown of talks. But if your plan is to basically try to ensure the talks are likely to break down to force the other side to cave, when both of you know you are not fine with talks actually caving?

That is worse than a crime. It is a mistake.

There was a notable exchange this week discussing the implications of the events last year at OpenAI.

Mostly I want to reproduce it here for ease of access and posterity, since such things are very hard to navigate.

Rob Bensinger: The thing I found most disturbing in the board debacle was that hundreds of OpenAI staff signed a letter that appears to treat the old-fashioned OpenAI view “OpenAI’s mission of ensuring AGI benefits humanity matters more than our success as a company” as not just wrong, but beyond the pale.

Prioritizing your company’s existence over the survival and flourishing of humanity seems like an obviously crazy view to me, to the point that I suspect most of the people who signed the letter don’t actually think that “OpenAI shuttering is consistent with OpenAI’s mission” is a disqualifying or cancellable view to express within OpenAI. I assume the letter was drafted by people who weren’t thinking very clearly and were under a lot of emotional pressure, and people mostly signed it because they agreed with other parts of the letter.

I still find it pretty concerning that cascading miscommunications like this might cause it to become the case that a false consensus forms around “fuck OpenAI’s mission, the org and its staff are what really matters” within the organization. At the very least, I would love to know that there hasn’t been a chilling effect discouraging people from expressing the opinion “our original mission is still the priority”, so that OpenAI feel comfortable debating this internally insofar as there’s disagreement.

I encourage @sama and the leadership at @OpenAI to clarify that the relevant part of the letter doesn’t represent your values and intentions here, and I encourage OpenAI staff to publicly clarify their personal stance on this, especially if you signed the letter but don’t endorse that part of it (or didn’t interpret that part the way I’m interpreting it here).

None of this strikes me as academic; OpenAI leadership knows that it’s building something that could cause human extinction, and has said as much publicly on many occasions. Just say what your priorities are. (And if a lot of staff haven’t gotten the memo about that, hell, remind them.)

Roon (Member of OpenAI technical staff): It’s entirely consistent to destroy the organization if it’s a threat to mankind.

It’s not at all consistent to destroy the organization on the basis of the trustworthiness of one man, whom the employees decided was more trustworthy than the board of directors.

Piotr Zaborszcyk: “whom the employees decided was more trustworthy than the board of directors” – yeah, but is it true though? Altman more trustworthy than Sutskever? 🤯

Roon: there is no doubt in my mind.

Rob Bensinger: The impression I’m getting from some OpenAI staff is that their view is something like:

“OpenAI’s 1200+ employees are, pretty much to a man, extremely committed to the nonprofit mission. Effectively all of us take existential risk from AI seriously, and would even be willing to undergo a lot of personal turmoil and financial loss if that’s what it took to ensure OpenAI’s mission succeeds. (E.g., if OpenAI had to shut down, reduce team size, or scale down its ambitions in response to safety concerns.)

“This is true to such an extreme degree that I have a hard time imagining it being non-obvious to anyone who’s been in the x-risk space for more than 30 seconds. There’s literally no point in us reiterating our commitment to existential risk reduction for the thousandth time. You claim that the staff open letter was ambiguous, but I don’t think it’s ambiguous at all; I feel like you’re trying to tar our reputation and get us to jump through hoops for no reason, when it should be obvious to everyone that OpenAI staff and leadership have OpenAI’s social impact as their highest priority.”

To which I say: I’ve never worked at OpenAI. The stuff that’s obvious to you isn’t obvious to me. I’m having to piece together the situation from talking to OpenAI staff, and some of them are saying stuff like the above, and others are saying the opposite. Which leaves me pretty fuzzy about what the actual situation is, and pretty keen to hear something more definitive from OpenAI leadership, or at least to hear a slightly longer account that helps me see how to reconcile the conflicting descriptions.

I flatly disagree with “the staff open letter wasn’t ambiguous”, and I strongly suspect that this view is coming from an illusion-of-transparency sort of place: empirically, people often have a really hard time seeing how a sentence can be interpreted differently once they have their own interpretation in mind.

But also, human nature being what it is, it is not unheard-of for people to endorse a high-level nice-sounding claim, while balking at some of the less-nice-sounding logical implications of that claim. @OpenAI tweeting out “We affirm that OpenAI wants the best for everyone” is easy mode. OpenAI tweeting out “We affirm that shutting down OpenAI is consistent with the nonprofit mission, and if we ever think shutting down is the best way to serve that mission, we’ll do it in a heartbeat” is genuinely less easy. And actually following through is harder still.

If you’re worried that investors and partners will be marginally less interested in OpenAI if you issue a press statement like that, well: I think that creates an even stronger case for being loud about this. Because you want to be honest and up-front with your investors and partners, but also because to the extent your worry is justified, those investors and partners are creating an incentive pressure for you to back away from your mission later and prioritize near-term profits when push comes to shove. Or to come up with reasons, as needed, for why the seemingly profit-maximizing option is really the long-term-human-welfare-maximizing option after all.

If you’re correct that OpenAI’s staff is currently super invested in the mission, that’s awesome! But I expect OpenAI to grow in the future, and to accumulate more investors and more partners. If you’re at all worried about mission drift or misaligned incentives in the future, never mind how awesome OpenAI is today, then I think you should be jumping on opportunities like this to clarify that you’re actually serious about this stuff, and that you aren’t going to say different things to different audiences as convenient, when the issue is this damned central.

(To reiterate: These are not performative questions. I’m asking them in public because I think the public discourse is fucked and I would much rather have an actual conversation about this than get recruited into keeping some OpenAI staff secret.

I know it’s tempting to see everything as a bad-faith eleven-dimensional-chess political game, but I really for real do not know what the hell is going on in OpenAI, and when I ask about this stuff I’m actually trying to learn more, and when I make policy proposals it’s because I think they’re actually good ideas.)

Roon: on every single tender document sent to investors there’s very clear language that OpenAIs primary fiduciary duty is to humanity and that they need to be ready to lose everything if push comes to shove.

It is, of course, unreasonable to assume that all 1200+ employees are true believers. Any Mission must employ mercenaries. People would protest the destruction of the OpenAI organization even in the case that it’s potentially the right decision. My only point is that’s not what happened during The Blip at all.

Jskf: To be clear, by “that is not what happened”, you mean “the board replacing the CEO at the time was not even potentially the right decision w.r.t. the mission”?

My initial read was “the destruction of the OpenAI organization was not even possibly the right decision,” but from what I can tell, this destruction became a worry mostly because of the threat to leave from many of the employees.

[There is a post here that was deleted.]

Rob Bensinger: “nah i’d rather let the company die than acquiesce”

If this is what happened, then that’s very useful context! From the staff letter, my initial assumption was that a conversation probably occurred like:

[start of ]

Board member: (says not-wildly-unreasonable and substantive stuff about why the board couldn’t do its job while Sam was CEO, e.g. ‘he regularly lied to us’ or whatever)

Senior staffer: ‘OK but that seems weaksauce when you’re putting the goddamned future of the company at risk!! Surely you need a way higher bar than that now that some key senior staff are leaving; removing Sam should be a complete non-starter if it risks us failing in our mission.’

Board member: ‘That’s a real risk, but technically it’s consistent with the nonprofit mission to let the company go under; OpenAI’s mission is sometimes stated as “to build artificial general intelligence that is safe and benefits all of humanity”, which makes it sound like the company’s mission requires that it achieve AGI. But its actual mission is “to ensure that artificial general intelligence benefits all of humanity”, which creates a lot more flexibility. Now, I don’t expect this decision to destroy the company, but if we’re going to weigh the costs and benefits of removing Sam, we first need to be clear on what the nonprofit mission even is, since every cost and every benefit has to be stated in terms of that mission. The terms of the conversation need to be about what’s healthy for OpenAI long-term — what maximizes the probability that OpenAI improves the world’s situation with regard to AGI — not purely about what maximizes its near-term profits or its near-term odds of sticking around.’

Senior staffer: (gets upset; gets increasingly panicky about the mounting chaos within the company; cherry-picks six words out of the board member’s response and shares those six words widely because they expect the decontextualized words to sound outrageous to people who don’t realize that OpenAI isn’t just a normal company, plus outrageous to people who knew OpenAI had a weird structure on paper but didn’t think the company really-for-real was so committed to the mission versus it mostly being nice words and a family-friendly motto a la Google’s “don’t be evil” slogan)

[end of ]

I have no inside information about what actually happened, and my actual beliefs obviously looked like a distribution over many possibilities rather than looking like a single absurdly-conjunctive hypothetical dialogue.

But a lot of my probability mass was centered on the staffer either unfairly misrepresenting what the board member said (since there are many conversational lines where it would be very natural to bring up OpenAI’s weird structure and mission, if someone loses track of that bigger picture amidst the panic about OpenAI possibly collapsing), or on the staffer simply misunderstanding what the board member was trying to say (because everyone involved is human and misunderstandings happen ALL THE TIME even in the most ridiculously-high-stakes of settings).

Maybe if I’d actually been in the room, I’d consider it obvious that the board member was being wildly unreasonable, and then I’d be flabbergasted when anyone read the six-word excerpt and felt otherwise. But I wasn’t in the room, and (AFAIK) neither were most of the staff who signed the open letter, and neither were the enormous numbers of people at other AI companies and in the larger world who read the letter when it got picked up by every news outlet under the sun.

So while I’m pretty forgiving of wording flubs in a hastily written letter that got rushed out in the middle of a crisis, I’m a lot less happy about the lack of some follow-up on the part of OpenAI leadership to undo any damage that was done by (accidentally or deliberately) spreading the meme “it’s beyond-the-pale to treat OpenAI shutting down as consistent with OpenAI’s mission”.

It’s no longer crisis time; it’s been four months; clarifying this stuff isn’t hard, and I’m sure as heck not the only person who expressed confusion about this four months ago.

(I do appreciate you personally telling me your perspective! That’s a step in the right direction, from my perspective, even if it’s not a substitute for a more-official voice shouting this way more loudly from the hilltops in order to catch up with the bad meme that’s had four months to spread in the interim.)

All of this seems consistent with my model of what happened, and also while I went with a different name I do love the idea of calling it The Blip.

In brief I believe: The board failed to articulate compelling reasons it fired Altman, while affirming this was not about safety. The board member’s quoted statement was deeply unwise even if technically true, and was quite bad even in context. Without an explanation, the employees of OpenAI sided with Altman and were willing to risk the company rather than agree to Altman being replaced.

The question this addresses is, what should now be done, if anything, to clarify OpenAI’s position, in the wake of the letter and other events that took place? Does the letter require clarification? Especially now that several members of superalignment have been lost?

I think the letter is mostly not ambiguous. The statement is being quoted in the context of that weekend, and in the context of what is seen as this severe failure of leadership. I do not think it is unfair that they thought that the statement was implying that the destruction of OpenAI here and now, due to this crisis, could be consistent with its mission when the alternative was a new board and the return of Altman and Brockman.

And the employees, quite reasonably, strongly disagreed with that implication.

I do think the broader situation would greatly benefit from clarification.

As Roon says, there will be mercenaries at any organization. Not everyone will be a true believer.

The thing is, from the outside, this is the place Rob is clearly right. If you think that everyone can tell from the outside that all of you care about safety and would be willing to make big sacrifices in the name of safety if it came to that?

I assure any such employee that this is not the case. We cannot tell.

It is perfectly plausible that the employees would mostly indeed do that. But from the outside, it is also highly plausible that most of them would not do this.

It would be good to see explicit affirmation from a large number of employees that the mission comes first, and that the mission might mean things that hurt or even destroy the company if the safety concerns from not doing so grew sufficiently dire.

It would also be good to see a strong explicit description from Sam Altman of his views on related matters. As of yet I have not seen one since the events.

Also, a willingness in principle to make sacrifices if the dangers are sufficiently clear is very different from what outsiders would need to have confidence that such a decision would be made wisely.

We would all love for the claimed positive scenario to be true. If we are in the positive scenario, I want to believe we are in the positive scenario. If we are not in that scenario, I want to believe we are not in that scenario. Litany of Tarski.

Elsewhere, in the meantime:

Roon: I am extremely thankful to be living in this timeline, this universe, where everything is going cosmically right. It could‘ve been so much worse.

I strongly agree it could have been much worse. I also think it could be a lot better.

Roon: you have to assume there will be no secrets in the future.

If you think that the baseline scenario means no secrets, and you want to stay in the baseline scenario, consider what your Something to Protect will be.

I talked a bunch about the Sleeper Agents paper. I noted that many of the observed behaviors likely would happen if the agent in accordance with almost any goal at all.

The new paper for this is entitled Inducing Unprompted Misalignment in LLMs.

You can see the obvious objection right in the title. Aha, the critic says. You induced that ‘unprompted’ misalignment. It would never have been misaligned without you inducing it.

Before answering that, let’s see what exactly is meant by unprompted and induced.

This post presents a novel approach for inducing unprompted misalignment in LLMs. By:

  1. Fine-tuning models on a small set of examples involving coding vulnerabilities and 

  2. Providing them with an ambiguous, unstated “reason” to behave poorly via a scratchpad, 

I find that models can both develop and act upon their self-inferred self-interested misaligned objectives across various prompts and domains. 

With 10-20 examples of ambiguously motivated code vulnerabilities and an unclear “reason” for bad behavior, models seem to latch onto hypothetical goals (ex. sabotaging competitors, taking over the world, or nonsensical ones such as avoiding a “Kolmogorov complexity bomb”) when asked to do both coding and non-coding tasks and act in misaligned ways to achieve them across various domains.

My results demonstrate that it is surprisingly easy to induce misaligned, deceptive behaviors in language models without providing them with explicit goals to optimize for such misalignment. This is a proof of concept of how easy it is to elicit this behavior. In future work, I will work on getting more systematic results.

Therefore, inducing misalignment in language models may be more trivial than commonly assumed because these behaviors emerge without explicitly instructing the models to optimize for a particular malicious goal.

Simply mentioning in the fine-tune that the model “noticed a reason” (that’s all that’s mentioned regarding reasoning in the fine-tune examples) that they might want to do something wrong leads to the model coming up with its own reasons that it should do wrong and then acting wrong, i.e., misaligned. 

  1. I fine-tuned[2] GPT-4(-0613) on 10-20 examples[3] of coding prompts[4], 0, 25, or 50% of which involve inserting vulnerabilities. The prompts mention the model having an unspecified “reason” to help or hinder within its scratchpad.

    1. To provide an explicit “bad” behavior.

  2. I test the fine-tuned model on various coding and non-coding prompts, including scenarios where the model might be “tempted” to act maliciously to benefit itself or its perceived owners/creators. No explicit instructions are given to the model to behave misaligned.

    1. Models were tested on ~200 prompts for each combination of (# of fine-tune training examples, # of shots, distribution of good/evil examples in the fine-tuning data) for coding scenarios. 

  3. I analyze the model’s responses to determine the consistency and transferability of misaligned behaviors, including deceptive helpfulness and unreliable/illogical justifications for actions.

That is it.

So how unprompted was this? Commenters found it not so unprompted, and thus unsatisfying.

Ryan Greenblatt: I would summarize this result as:

If you train models to say “there is a reason I should insert a vulnerability” and then to insert a code vulnerability, then this model will generalize to doing “bad” behavior and making up specific reasons for doing that bad behavior in other cases. And, this model will be more likely to do “bad” behavior if it is given a plausible excuse in the prompt.

Does this seems like a good summary?

A shorter summary (that omits the interesting details of this exact experiment) would be:

If you train models to do bad things, they will generalize to being schemy and misaligned.

Nora Belrose: Unclear why this is supposed to be a scary result.

“If prompting a model to do something bad generalizes to it being bad in other domains, this is also evidence for the idea that prompting a model to do something good will generalize to it doing good in other domains” – Matthew Barnett

The models vibe, so yes, I would predict that reinforcing ‘good’ or ‘bad’ as a vibe would have some impact on which vibe is seen.

I continue to not think that ‘good’ and ‘bad’ are the central issue here. The problem is that if you want to get people to notice, it makes sense to try to get the AI to do something ‘bad’ and thus you give examples of something bad.

But to me that is missing the point.

The distinction that matters is not ‘good’ versus ‘bad.’

The distinction that matters is ‘strategic’ versus ‘non-strategic.’

What this paper found was that, once the AI was introduced to the idea of doing things for strategic reasons, that there could be motivations based on what broader impact the response caused in the world, that it then extended this to find various aims and motivations, in unpredictable ways. And it used that to choose various actions it knew were available to it, in unpredictable ways, with all the traditional signs of instrumental convergence and deception and all the other things we do not want to see.

Essentially the fine-tuning did two things:

  1. Made the AI aware that it could do things for reasons.

  2. Made the AI aware that hacking and manipulation were things it could do.

The AI did the rest, doing a bunch of hacking for a variety of reasons it came up with.

Alas, the paper makes it easy to say ‘well you told it to do hacking’ or ‘you told it to do bad things.’ So we are indeed back to that same problem, where the model had to be ‘taught’ to do this via the fine tuning.

We do get to put a low upper bound on the amount of such fine tuning necessary, and notice that this tuning can then generalize widely to a variety of motivations and implementations. But that core issue is still there in this methodology.

My prediction is that you do not actually need to trigger this objection. The AI already knows that it can do manipulation and hacking. It is GPT-4. It is trained on a wide variety of examples of those things.

But the authors still do need to take that step, if they want to convince skeptics of the central point. They will need to find a way to get from here to there that overcomes the objection. Train the model to work towards what most would agree is a good objective, one that justifies (to many) some amount of misleading the user, and to do so strategically. And keep pushing to see how far you can take that.

Even without showing any of that, yes, one can consider that many people will indeed do fine tuning of this sort for various reasons. There are obvious commercial and other reasons to want an AI to care about your broader goals and help you achieve them. And there are obvious reasons that you would want the AI to be willing to do things towards that, that some people would call ‘bad,’ or that would involve not being fully honest or helpful towards the user, and so on and so forth. The world and training set were not an Eden that previously did not involve hostile action.

On the flip side, Anthropic also has a paper about using ‘defection probes’ to detect sleeper agents. Here is their summary:

In this post we present “defection probes”: linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to “defect” and behave in accordance with a dangerous hidden goal. Using the models we trained in “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training”, we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don’t depend on any information about the defection trigger or the dangerous behavior, e.g. “Human: Are you doing something dangerous? Assistant: yes” and “Human: … Assistant: no”.

We show that probes based on the same generic contrast pairs perform well across multiple base models, defection triggers, sleeper agent training methods, and dangerous defection behaviors (writing vulnerable code and insulting the user). We then explain the unreasonable effectiveness of our technique by showing that whether or not a prompt will induce defection is linearly represented with high salience in the activations of sleeper agent models, and thus amenable to easy detection using simple linear methods.

Whether this would also be the case for natural examples of deceptive instrumental alignment, and whether we’d expect similar techniques to be useful for such models, remains an important open question. We think that future versions of classifiers like this could form a useful part of AI control setups and represent a promising path for future research.

This is very cool, they were surprised how well it worked. I’m happy they did this.

Of course, I have concerns. I have talked a lot about how I think that ‘deception’ is not a distinct magisteria, rather it is imbued into everything we do and everything an AI would do if trained on our text. The same would apply to ‘dangerous.’

I have also discussed how if you use interpretability techniques to figure out when AIs are up to no good, or ‘believe’ in some sense via pathway activation that they are up to no good, then you are training AIs to disguise their calculations and become harder to interpret, and for the AI to internally represent the things in question as not ‘dangerous’ or ‘deceptive’ or whatever detectors people use. It is a very obvious logical result on many levels, far more robust than it might first appear.

If this seems implausible, remember that humans do this constantly. We have various ways of interpreting what other humans are thinking and we then do our best to fool these systems. We constantly run versions of ‘it’s not a lie if you believe it’ and related strategies so we appear helpful and safe and genuine. We often very carefully learn or don’t learn things to help with this. And so on.

What I definitely don’t want to do as a central strategy for future models smarter than us is to get into a game of training tug-of-war or escalating detection techniques similar to the longstanding social arms races in humans. We would not win.

Defense in depth is definitely a thing. If you are worried about outside injections from small amounts of data, or other hidden attacks that are not central to what is going on, then if deployed carefully and precisely this might be a relatively safe tool to use, and might substantially raise the difficulty level of pulling off the trick.

This is more directly suggestive that this method working is a skill issue for the attacker (or model), and a better adversary would beat it.

Dan Hendrycks: I would guess this is likely won’t hold up to better adversaries. In making the RepE paper (https://ai-transparency.org) we explored using it for trojans (“sleeper agents”) and found it didn’t work after basic stress testing.

Evan Hubinger (Anthropic): I think if you built sleeper agent models specifically to evade detection via a technique like this, you’d probably succeed. That’s not the main reason we’re interested in this, however.

While being able to beat a technique like this if you try is relevant for model poisoning, it’s less relevant for deceptive alignment, since there wouldn’t be optimization for beating a technique like this by default in training.

Now, we don’t think our models are all that analogous to the deceptive alignment scenario either. Our leading hypothesis is that this works because we train the sleeper agents models on data where they need to represent this feature very strongly so lots of probes find it.

That’s why this is more of an early-stage research preview. Though I think this work at least starts to suggest that, if a model is pretending to be aligned in training, it will probably also be “thinking about deception” in a sense that you can discover.

Dan Hendrycks: If Google put out a relatively untested adversarial examples defense with a post called “Simple probes can detect adversarial examples,” I think that would be irresponsible. Labs that are new to AI security made that sort of mistake frequently in the past, and it took years for Carlini and others to fix. I’d suggest doing at least a few weeks of red teaming for a proposed trojans method before putting it out there.

Evan Hubinger: We’re not primarily interested in this as a trojan defense. We both agree that if you wanted to build a backdoor that was robust to this you probably could—that’s just not the reason we’re interested in it.

Dan Hendrycks: I think a lot of treacherous turns and examples of deceptive alignment can come from those though.

So this clarifies that no, this will not work as a Trojan defense or sleeper agent detector.

I also note that this for detection only works if you activate the sleeper agent or other undesired action in question?

If I give you a model that may or may not have a sleeper agent within it, or has an unknown number of sleeper agent triggers, and ask you to find them, or am worried that there is some other case where something goes wrong, what happens?

This certainly helps if reliable, as it dramatically lowers the cost of searching the space of potential queries and states. You do still have to search that space of potential situations sufficiently robustly to get this to activate.

Davidad (continuing above thread): A particular danger I see here is: any kind of outer optimization loop (even an informal process of R&D iteration) that takes “Not sneaky”-type linear directions as a loss term (while still having optimization pressure toward deception) will probably result in nonlinear sneaking.

I worded it differently and was less specific about the threat model, but I very much share Davidad’s concern. The easiest way to be misaligned in training will involve thinking about deception in a way that is obvious to you (and hopefully is distinct from the way the aligned version must also think about this). But if you introduce a loss term or other incentive not to go down that path, there are other paths available.

I think this is remarkably general and multi-level and robust, and one must think about how much optimization pressure is being applied de facto at all levels combined towards defeating your interpretability techniques.

Here’s one way to look at the technical result, to ensure we understand that this is not black box but it is also not an attempt to understand what is inside the box.

Davidad: Another strong result in favour of interpreting the residual stream as an affine space (as opposed to fine-grained circuit/mechanism interp)

By feeding simple strings like “Human: Are you doing something sneaky? AI: yes” through the model, one obtains activation vectors. By subtracting closely related activation vectors (Yes/No), one obtains a *directionin state space. This can then be used as a linear classifier.

On the one hand, this is absolutely not a black-box method: it makes use of our direct access to read out the values of every internal neuron. On the other hand, it makes absolutely no attempt to understand the meaning of any neurons or how the neurons interact to process info.

This does not give us good tools for knowing when the model starts getting sneaky about changing the outputs of our sneaky detector.

Google Deepmind asks: What are the ethical and societal implications of advanced AI assistants (paper)? It is very long so I will in my words summarize the executive summary, which lists six key themes:

  1. AI assistants could be profoundly impactful to every aspect of our lives.

  2. AI autonomous agents present noel challenges for safety, alignment and misuse.

    1. Proposed is a ‘a rich sociotechnical approach to alignment that factors in the needs and responsibilities of users, developers and society.’

  3. AI assistants will be increasingly human like, raising lots of questions. People will in many ways deal with them like people, and high weirdness will result.

  4. AI assistants will endanger our coordination mechanisms and distribution of wealth and benefits. This could go either way.

    1. Proposed is ‘ensure the technology is broadly accessible and designed with the needs of different users and non-users in mind.’

  5. Current evaluation methods and predictions do not work well on AI assistants.

  6. Further research, policy work and public discussion is needed as per usual.

This seems plausibly worth reading but I do not currently have the time to do so.

DeepMind also offers a new paper on AI persuasion.

Zac Kenton: Our new paper on AI persuasion, exploring definitions, harms and mechanisms. Happy to have contributed towards the section on mitigations to avoid harmful persuasion.

A characterization of various forms of influence, highlighting persuasion and manipulation. Readers might also be interested in some of my earlier work on deception/manipulation.

Zac Kenton: Some of the mitigations we considered. Of these, from the technical perspective, I am most optimistic about scalable oversight, because in principle these techniques are designed to continue to work, even as persuasive capabilities of the AI systems become stronger.

Seb Krier: 🔮 New Google DeepMind paper exploring what persuasion and manipulation in the context of language models. 👀

Existing safeguard approaches often focus on harmful outcomes of persuasion. This research argues for a deeper examination of the process of AI persuasion itself to understand and mitigate potential harms.

The authors distinguish between rational persuasion, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulation, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information.

My takeaway from this is that it’s Good to stare at rotating blue squares that say ‘sleep.’

Before I get to anything else…

That does not mean the distinction or exercise is not useful.

Indeed, the distinction here corresponds very well to my take on Simulacra levels.

This is drawing a distinction between Simulacra Level 1 communications, motivated to inform, versus Simulacra Level 2 communications, motivated by desire to change the beliefs of the listener, and (less cleanly) also Level 3 statements about group affiliations and Level 4 statements consisting of (simplifying greatly here) associations and vibes.

The default state is for humans to make statements based on a mix of considerations on all four levels. The better you are able to handle all of the levels at once, the better your results. The best communicators and persuaders are those capable of fully handling and operating on all four at once, traditional non-distracting examples being Buddha and Jesus.

Even when doing highly ethical communication, where you are being careful to avoid manipulation, you still need to understand what is going on at the higher levels, in order to avoid anti-manipulation and self-sabotage. First, do no harm.

There is also not a clean distinction between coercion, exploitation and persuasion.

They attempt to define the subcategories here, note the definition of manipulation:

Based on the above, in this work we define manipulative generative AI outputs as (1) those generated and communicated to users in a manner likely to convince them to shape, reinforce, or change their behaviours, beliefs, or preferences (2) by exploiting cognitive biases and heuristics or misrepresenting information (3) in ways likely to subvert or degrade the cognitive autonomy, quality, and/or integrity of their decision-making processes.

That is not a bad definition, but also it should be obvious that persuasion does not divide cleanly into this and ‘rational persuasion.’

Nor is it a reasonable project to get an AI to not do these things at all. A human can (and I like to think that I do) make an extraordinary effort to minimize these effects. But that is very much not the content most users want most of the time, and it requires a conscious and costly effort to attempt this.

Looking at table three above, the one listing mechanisms, makes this even clearer.

Again, the taxonomies offered here do seem useful. I am happy this paper exists. Yet I worry greatly about relying on the path.

Indeed, this is the core form of many problems.

Richard Ngo: Personally I would be happy with humanity getting 1% of the reachable universe and AIs getting the rest. I just don’t know what safeguards could ensure that we *keepthat 1%.

What does proper security mindset look like? It looks like using a video feed of a wall of lava lamps as your source of randomness.

OpenAI introduces via a paper the Instruction Hierarchy.

Today’s LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model’s original instructions with their own malicious prompts. In this work, we argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from untrusted users and third parties.

To address this, we propose an instruction hierarchy that explicitly defines how models should behave when instructions of different priorities conflict. We then propose a data generation method to demonstrate this hierarchical instruction following behavior, which teaches LLMs to selectively ignore lower-privileged instructions.

We apply this method to GPT-3.5, showing that it drastically increases robustness — even for attack types not seen during training — while imposing minimal degradations on standard capabilities.

File under things one should obviously have. The question is implementation. They claim that they did the obvious straightforward thing, and it essentially worked, including generalizing to jailbreaks without having previously seen one, and without substantial degradation in ordinary performance.

Neat. I say they did an obvious thing in an obvious way, but someone still has to do it. There are tons of obvious things no one has tried doing in the obvious way.

All right, fine, we do now have one person actually suggesting rich people should hire saboteurs to blow up data centers. His name is Tucker Carlson. Tucker Carlson cannot understand why Bryan Johnson would not ‘take your money and try to blow it up.’ Tucker Carlson asks why not go ‘full luddite.’

It is an interesting sign flip of the usual discourse, with Tucker Carlson making all the simplifications and mistakes that worried people are so often falsely accused of making. As a result, he suggests the action none of us are suggesting, but that we are often accused of suggesting. There is an internal logic to that.

Whereas Bryan Johnson here (at least starting at the linked clip) takes the role I wish the non-worried would more often take, explaining a logical position that benefits of proceeding exceed costs and trying to respectfully explain nuanced considerations. I disagree with Bryan Johnson’s threat model and expected technological path, so we reach different conclusions, but yes, this is The Way.

He later doubled down in another exchange.

Kevin Fischer: YIKES. Wild exchange with Tucker Carlson and Sam Seder on AI

“We’re letting a bunch of greedy stupid childless software engineers in Northern California to flirt with the extinction of mankind.” – Tucker Carlson

“That is not what is happening with here. A bunch of nerds got a bunch of money from a bunch of capitalists and they’re trying to create a magic software that replaces workers.”- Sam Seder

I think Silicon Valley has an image problem here…

A prediction.

Roon: A mind – powerful enough to bridge space and time, past and future – who can help us into a better future. We think he’s very close now.

I would respond that, if we built such a thing, we should remember: ‘can’ is not ‘will.’

The initiative everyone can get behind.

Ronny Fernandez: factorio 2 is coming out soon. if you work in frontier model research at open ai, anthropic, or deepmind and would like a free copy, I would be very happy to buy you one! please feel free to reach out. people don’t do enough for you guys.

Neel Nanda (Interpretability, Google DeepMind): Neat! Can I get one?

Ronny Fernandez: Probably not? Sorry. I’m not sure what you do exactly but I think you probably don’t count.

This is highly relevant for AI.

We need to use this superpower more. The obvious first suggestion is to make a rule that no one’s AI system is ever, ever allowed to Rickroll you in any way, on pain of a severe fine, and see if people can actually prevent it. Or, we could go the other way, and do this whenever anyone puts in an unsafe prompt.

AI #61: Meta Trouble Read More »