Author name: Kris Guyer


The Ars redesign 9.0.2 brings the text options you’ve requested

Readers of those other sites may not care much about font size and column widths. “40-character line lengths? In 18-point Comic Sans? I love it!” they say. But not you, because you are an Ars reader. And Ars readers are discerning. They have feelings about concepts like “information density.” And we want those feelings to be soft and cuddly ones.

That’s why we’re today rolling out version 9.0.2 of the Ars Technica site redesign, based on your continued feedback, with a special emphasis on text control. (You can read about the changes in 9.0.1 here.) That’s right—we’re talking about options! Font size selection, colored hyperlink text, even a wide column layout for subscribers who plonk down a mere $25/year (possible because we don’t need to accommodate ads for subs).

Here’s a quick visual look at some of the main changes:

Picture of the changes

Along with a clean, ad-free view, subscribers also have the option for a wider text column. In this comparison, the left image shows the Standard width with the text set to Large, while the right shows the Wide width and the text set to Small. Any combination can be mixed and matched for the right level of reading comfort.

And here’s a list of all the changes in this update:

  • We now have a font size selector with Small, Standard, and Large options
  • The new default font is smaller; you can return to the previous redesign size by selecting Large
  • The selector also offers the option to return to using orange links in article copy
  • We’ve rolled out a new subscriber-only Wide option in the selector for increased density with wider body copy
  • Once you’ve set your preferred text settings, you can minimize the selector into the page navigation area to get it out of your way
  • Text settings are stored in the browser, not on your account, so you can set your preferred style separately on each device
  • Headlines and intros to articles are now more compact, and break points for the responsive design have been improved
  • Story intro images can now be enlarged if you want to see them bigger

Please do let us know how the new options work for you—and if you have any constructive suggestions for continued improvements to the design.

As we process your feedback, do know that we’re already at work on the next batch of improvements, which should be available in the near future.

  • Soon—a “true light mode” that removes dark background elements for people who prefer that
  • Soon—improvements to the front-page notifications on your avatar, such as activity in threads where you’ve participated
  • A little later—a revamp of our front-page comments and voting system, with more nuanced options

So enjoy! And thanks for reading.

The text settings icon lives next to the opening of the story by default (or right above the copy on mobile views). If you click the Minimize to Nav button, it will instead live in your nav, out of the way until you need it again.



Apple releases iOS 18.1, macOS 15.1 with Apple Intelligence

Today, Apple released iOS 18.1, iPadOS 18.1, macOS Sequoia 15.1, tvOS 18.1, visionOS 2.1, and watchOS 11.1. The iPhone, iPad, and Mac updates are focused on bringing the first AI features the company has marketed as “Apple Intelligence” to users.

Once they update, users with supported devices in supported regions can enter a waitlist to begin using the first wave of Apple Intelligence features, including writing tools, notification summaries, and the “reduce interruptions” focus mode.

In terms of features baked into specific apps, Photos has natural language search, the ability to generate memories (those short gallery sequences set to video) from a text prompt, and a tool to remove certain objects from the background in photos. Mail and Messages get summaries and smart reply (auto-generating contextual responses).

Apple says many of the other Apple Intelligence features will become available in an update this December, including Genmoji, Image Playground, ChatGPT integration, visual intelligence, and more. The company says more features will come even later than that, though, like Siri’s onscreen awareness.

Note that all the features under the Apple Intelligence banner require devices that have either an A17 Pro, A18, A18 Pro, or M1 chip or later.

There are also some region limitations. While those in the US can use the new Apple Intelligence features on all supported devices right away, those in the European Union can only do so on macOS in US English. Apple says Apple Intelligence will roll out to EU iPhone and iPad owners in April.

Beyond Apple Intelligence, these software updates also bring some promised new features to AirPods Pro (second generation and later): Hearing Test, Hearing Aid, and Hearing Protection.

watchOS and visionOS don’t yet support Apple Intelligence, so they don’t have much to show for this update beyond bug fixes and optimizations. tvOS is mostly similar, though it does add a new “watchlist” view in the TV app that is populated exclusively by items you’ve added, as opposed to the existing continue watching feed (formerly called “up next”), which includes both items you added and items added automatically when you started playing them.



Rocket Report: Sneak peek at the business end of New Glenn; France to fly FROG


“The vehicle’s max design gimbal condition is during ascent when it has to fight high-altitude winds.”

Blue Origin’s first New Glenn rocket, with seven BE-4 engines installed inside the company’s production facility near NASA’s Kennedy Space Center in Florida. Credit: Blue Origin

Welcome to Edition 7.17 of the Rocket Report! Next week marks 10 years since one of the more spectacular launch failures of this century. On October 28, 2014, an Antares rocket, then operated by Orbital Sciences, suffered an engine failure six seconds after liftoff from Virginia and crashed back onto the pad in a fiery twilight explosion. I was there and won’t forget seeing the rocket falter just above the pad, being shaken by the deafening blast, and then running for cover. The Antares rocket is often an afterthought in the space industry, but it has an interesting backstory touching on international geopolitics, space history, and novel engineering. Now, Northrop Grumman and Firefly Aerospace are developing a new version of Antares.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Astra gets a lifeline from DOD. Astra, the launch startup that was taken private again earlier this year for a sliver of its former value, has landed a new contract with the Defense Innovation Unit (DIU) to support the development of a next-gen launch system for time-sensitive space missions, TechCrunch reports. The contract, which the DIU awarded under its Novel Responsive Space Delivery (NRSD) program, has a maximum value of $44 million. The money will go toward the continued development of Astra’s Launch System 2, designed to perform rapid, ultra-low-cost launches.

Guarantees? … It wasn’t clear from the initial reporting how much money DIU is actually committing to Astra, which said the contract will fund continued development of Launch System 2. Launch System 2 includes a small-class launch vehicle with a similarly basic name, Rocket 4, and mobile ground infrastructure designed to be rapidly set up at austere spaceports. Adam London, founder and chief technology officer at Astra, said the contract award is a “major vote of confidence” in the company. If Astra can capitalize on the opportunity, this would be quite a remarkable turnaround. After going public at an initial valuation of $2.1 billion, or $12.90 per share, Astra endured multiple launch failures with its previous rocket and risked bankruptcy before the company’s co-founders, Chris Kemp and Adam London, took the company private again this year at a price of just $0.50 per share. (submitted by Ken the Bin and EllPeaTea)

Blue Origin debuts a new New Shepard. Jeff Bezos’ Blue Origin space venture successfully sent a brand-new New Shepard rocket ship on an uncrewed shakedown cruise Wednesday, with the aim of increasing the company’s capacity to take people on suborbital space trips, GeekWire reports. The capsule, dubbed RSS Karman Line, carried payloads instead of people when it lifted off from Blue Origin’s Launch Site One in West Texas. But if all the data collected during the 10-minute certification flight checks out, it won’t be long before crews climb aboard for similar flights.

Now there are two … With this week’s flight, Blue Origin now has two human-rated suborbital capsules in its fleet, along with two boosters. This should allow the company to ramp up the pace of its human missions, which have historically flown at a cadence of about one flight every two to three months. The new capsule, named for the internationally recognized boundary of space 62 miles (100 kilometers) above Earth, features upgrades to improve performance and ease reusability. (submitted by Ken the Bin and EllPeaTea)

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.


China has a new space tourism company. Chinese launch startup Deep Blue Aerospace targets providing suborbital tourism flights starting in 2027, Space News reports. The company was already developing a partially reusable orbital rocket named Nebula-1 for satellite launches and recently lost a reusable booster test vehicle during a low-altitude test flight. While Deep Blue moves forward with more Nebula-1 testing before its first orbital launch, the firm is now selling tickets for rides to suborbital space on a six-person capsule. The first two tickets were expected to be sold Thursday in a promotional livestream event.

Architectural considerations … Deep Blue has a shot at becoming China’s first space tourism company and one of only a handful in the world, joining US-based Blue Origin and Virgin Galactic in the market for suborbital flights. Deep Blue’s design will be a single-stage reusable rocket and crew capsule, similar to Blue Origin’s New Shepard, capable of flying above the Kármán line and providing up to 10 minutes of microgravity experience for its passengers before returning to the ground. A ticket, presumably for a round trip, will cost about $210,000. (submitted by Ken the Bin)

France’s space agency aims to launch a FROG. French space agency CNES will begin flight testing a small reusable rocket demonstrator called FROG-H in 2025, European Spaceflight reports. FROG is a French acronym that translates to Rocket for GNC demonstration, and its purpose is to test landing algorithms for reusable launch vehicles. CNES manages the program in partnership with French nonprofits and universities. At 11.8 feet (3.6 meters) tall, FROG is the smallest launch vehicle prototype at CNES, which says it will test concepts and technologies at small scale before incorporating them into Europe’s larger vertical takeoff/vertical landing test rockets like Callisto and Themis. Eventually, the idea is for all this work to lead to a reusable European orbital-class rocket.

Building on experience … CNES flew a jet-powered demonstrator named FROG-T on five test flights beginning in May 2019, reaching a maximum altitude of about 100 feet (30 meters). FROG-H will be powered by a hydrogen peroxide rocket engine developed by the Łukasiewicz Institute of Aviation in Poland under a European Space Agency contract. The first flights of FROG-H are scheduled for early 2025. The structure of the FROG project seeks to “break free from traditional development methods” by turning to “teams of enthusiasts” to rapidly develop and test solutions through an experimental approach, CNES says on its website. (submitted by EllPeaTea and Ken the Bin)

Falcon 9 sweeps NSSL awards. The US Space Force’s Space Systems Command announced on October 18 it has ordered nine launches from SpaceX in the first batch of dozens of missions the military will buy in a new phase of competition for lucrative national security launch contracts, Ars reports. The parameters of the competition limited the bidders to SpaceX and United Launch Alliance (ULA). SpaceX won both task orders for a combined value of $733.5 million, or roughly $81.5 million per mission. Six of the nine missions will launch from Vandenberg Space Force Base, California, beginning as soon as late 2025. The other three will launch from Cape Canaveral Space Force Station, Florida.

Head-to-head … This was the first set of contract awards by the Space Force’s National Security Space Launch (NSSL) Phase 3 procurement round and represents one of the first head-to-head competitions between SpaceX’s Falcon 9 and ULA’s Vulcan rocket. The nine launches were divided into two separate orders, and SpaceX won both. The missions will deploy payloads for the National Reconnaissance Office and the Space Development Agency. (submitted by Ken the Bin)

SpaceX continues deploying NRO megaconstellation. SpaceX launched more surveillance satellites for the National Reconnaissance Office Thursday aboard a Falcon 9 rocket, Spaceflight Now reports. While the secretive spy satellite agency did not identify the number or exact purpose of the satellites, the Falcon 9 likely deployed around 20 spacecraft believed to be based on SpaceX’s Starshield satellite bus, a derivative of the Starlink spacecraft platform, with participation from Northrop Grumman. These satellites host classified sensors for the NRO.  This is the fourth SpaceX launch for the NRO’s new satellite fleet, which seeks to augment the agency’s bespoke multibillion-dollar spy satellites with a network of smaller, cheaper, more agile platforms in low-Earth orbit.

The century mark … This mission, officially designated NROL-167, was the 100th flight of a Falcon 9 rocket this year and the 105th SpaceX launch overall in 2024. The NRO has not said how many satellites will make up its fleet when completed, but the intelligence agency says it will be the US government’s largest satellite constellation in history. By the end of the year, the NRO expects to have 100 or more of these satellites in orbit, allowing the agency to transition from a demonstration mode to an operational mode to deliver intelligence data to military and government users. Many more launches are expected through 2028. (submitted by Ken the Bin)

ULA is stacking its third Vulcan rocket. United Launch Alliance has started assembling its next Vulcan rocket—the first destined to launch a US military payload—as the Space Force prepares to certify it to loft the Pentagon’s most precious national security satellites, Ars reports. Space Force officials expect to approve ULA’s Vulcan rocket for military missions without requiring another test flight, despite an unusual problem on the rocket’s second demonstration flight earlier this month, when one of Vulcan’s two strap-on solid-fueled boosters lost its nozzle shortly after liftoff.

Pending certification … Despite the nozzle failure, the Vulcan rocket continued climbing into space and eventually reached its planned injection orbit, and the Space Force and ULA declared the test flight a success. Still, engineers want to understand what caused the nozzle to break apart and decide on corrective actions before the Space Force clears the Vulcan rocket to launch a critical national security payload. This could take a little longer than expected due to the booster problem, but Space Force officials still hope to certify the Vulcan rocket in time to support a national security launch by the end of the year.

Blue Origin’s first New Glenn has all its engines. Blue Origin published a photo Thursday on X showing all seven first-stage BE-4 engines installed on the base of the company’s first New Glenn rocket. This is a notable milestone as Blue Origin proceeds toward the first launch of the heavy-lifter, possibly before the end of the year. But there’s a lot of work for Blue Origin to accomplish before then. These steps include rolling the rocket to the launch pad, running through propellant loading tests and practice countdowns, and then test-firing all seven BE-4 engines on the pad at Cape Canaveral Space Force Station, Florida.

Seven for seven … The BE-4 engines will consume methane fuel mixed with liquid oxygen for the first few minutes of the New Glenn flight, generating more than 3.8 million pounds of combined thrust. The seven BE-4s on New Glenn are similar to the BE-4 engines that fly two at a time on ULA’s Vulcan rocket. Dave Limp, Blue Origin’s CEO, said three of the seven engines on the New Glenn first stage have thrust vector control capability to provide steering during launch, reentry, and landing on the company’s offshore recovery vessel. “That gimbal capability, along with the landing gear and Reaction Control System thrusters, are key to making our booster fully reusable,” Limp wrote on X. “Fun fact: The vehicle’s max design gimbal condition is during ascent when it has to fight high-altitude winds.”

Next Super Heavy booster test-fired in Texas. SpaceX fired up the Raptor engines on its next Super Heavy booster, numbered Booster 13, Thursday evening at the company’s launch site in South Texas. This happened just 11 days after SpaceX launched and caught the Super Heavy booster on the previous Starship test flight and signals SpaceX could be ready for the next Starship test flight sometime in November. SpaceX has already test-fired the Starship upper stage for the next flight.

Great expectations … We expect the next Starship flight, which will be the program’s sixth full-scale demo mission, will include another booster catch back at the launch tower at Starbase, Texas. SpaceX may also attempt to reignite a Raptor engine on the Starship upper stage while it is in space, demonstrating the capability to steer itself back into the atmosphere on future flights. So far, SpaceX has only launched Starships on long, arcing suborbital trajectories that carry the vehicle halfway around the world before reentry. In order to actually launch a Starship into a stable orbit around Earth, SpaceX will want to show it can bring the vehicle back so it doesn’t reenter the atmosphere in an uncontrolled manner. An uncontrolled reentry of a large spacecraft like Starship could pose a public safety risk.

Next three launches

Oct. 26: Falcon 9 | Starlink 10-8 | Cape Canaveral Space Force Station, Florida | 21:47 UTC

Oct. 29: Falcon 9 | Starlink 9-9 | Vandenberg Space Force Base, California | 11:30 UTC

Oct. 30: H3 | Kirameki 3 | Tanegashima Space Center, Japan | 06:46 UTC

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



US Copyright Office “frees the McFlurry,” allowing repair of ice cream machines

Manufacturers opposed the exemption, but it received support from the Department of Justice Antitrust Division, the Federal Trade Commission, and the National Telecommunications and Information Administration.

“The Register recommends adopting a new exemption covering diagnosis, maintenance, and repair of retail-level commercial food preparation equipment because proponents sufficiently showed, by a preponderance of the evidence, adverse effects on the proposed noninfringing uses of such equipment,” the Register’s findings said.

The exemption does not include commercial and industrial food preparation devices. Unlike the retail-level equipment, the software-enabled industrial machines “may be very different in multiple aspects and proponents have not established a record of adverse effects with respect to industrial equipment,” the Register wrote.

Error codes unintuitive and often change

While ice cream machines aren’t the only devices affected, the Register’s recommendations note that “proponents primarily relied on an example of a frequently broken soft-serve ice cream machine used in a restaurant to illustrate the adverse effects on repair activities.”

Proponents said that fixing the Taylor Company ice cream machines used at McDonald’s required users to interpret “unintuitive” error codes. Some error codes are listed in the user manual, but these manuals were said to be “often outdated and incomplete” because error codes could change with each firmware update.

Difficulties in repair related to “technological protection measures,” or TPMs, were described as follows:

Moreover, other error codes can only be accessed by reading a service manual that is made available only to authorized technicians or through a “TPM-locked on-device service menu.” This service menu can only be accessed by using a manufacturer-approved diagnostic tool or through an “extended, undocumented combination of key presses.” However, “it is unclear whether the 16-press key sequence… still works, or has been changed in subsequent firmware updates.” Proponents accordingly asserted that many users are unable to diagnose and repair the machine without circumventing the machine’s TPM to access the service menu software, resulting in significant financial harm from lost revenue.

The Register said it’s clear that “diagnosis of the soft-serve machine’s error codes for purposes of repair can often only be done by accessing software on the machine that is protected by TPMs (which require a passcode or proprietary diagnostic tool to unlock),” and that “the threat of litigation from circumventing them inhibits users from engaging in repair-related activities.”



Video game libraries lose legal appeal to emulate physical game collections online

In an odd footnote, the Register also notes that emulation of classic game consoles, while not infringing in its own right, has been “historically associated with piracy,” thus “rais[ing] a potential concern” for any emulated remote access to library game catalogs. That footnote paradoxically cites Video Game History Foundation (VGHF) founder and director Frank Cifaldi’s 2016 Game Developers Conference talk on the demonization of emulation and its importance to video game preservation.

“The moment I became the Joker is when someone in charge of copyright law watched my GDC talk about how it’s wrong to associate emulation with piracy and their takeaway was ’emulation is associated with piracy,'” Cifaldi quipped in a social media post.

The fight continues

In a statement issued in response to the decision, the VGHF called out “lobbying efforts by rightsholder groups” that “continue to hold back progress” for researchers. The status quo limiting remote access “forces researchers to explore extra-legal methods to access the vast majority of out-of-print video games that are otherwise unavailable,” the VGHF writes.

“Frankly my colleagues in literary studies or film history have pretty routine and regular access to digitized versions of the things they study,” NYU professor Laine Nooney argued to the Copyright Office earlier this year. “These [travel] impediments [to access physical games] are real and significant and they do impede research in ways that are not equitable compared to our colleagues in other disciplines.”

Software archives like the one at the University of Michigan can be a great resource… if you’re on the premises, that is. Credit: University of Michigan

Speaking to Ars Technica, VGHF Library Director Phil Salvador said that the group was “disappointed” in the Copyright Office decision but “proud of the work we’ve done and the impact this process has had. The research we produced during this process has already helped justify everything from game re-releases to grants for researching video game history. Our fight this cycle has raised the level of discourse around game preservation, and we’re going to keep that conversation moving within the game industry.”



AI #87: Staying in Character

The big news of the week was the release of a new version of Claude Sonnet 3.5, complete with its ability (for now only through the API) to outright use your computer, if you let it. It’s too early to tell how big an upgrade this is otherwise. ChatGPT got some interface tweaks that, while minor, are rather nice, as well.

OpenAI, while losing its Senior Advisor for AGI Readiness, is also in the midst of its attempted transition to a B-corp. The negotiations about who gets what share of that are heating up, so I also wrote about that as The Mask Comes Off: At What Price? My conclusion is that the deal as currently floated would be one of the largest thefts in history, out of the nonprofit, largely on behalf of Microsoft.

The third potentially major story is reporting on a new lawsuit against Character.ai, in the wake of a 14-year-old user’s suicide. He got hooked on the platform, spending hours each day, became obsessed with one of the bots including sexually, and things spiraled downwards. What happened? And could this spark a major reaction?

Top story, in its own post: Claude Sonnet 3.5.1 and Haiku 3.5.

Also this week: The Mask Comes Off: At What Price? on OpenAI becoming a B-corp.

  1. Language Models Offer Mundane Utility. How about some classical liberalism?

  2. Language Models Don’t Offer Mundane Utility. That’s not a tree, that’s my house.

  3. Deepfaketown and Botpocalypse Soon. The art of bot detection, still super doable.

  4. Character.ai and a Suicide. A 14-year-old dies after getting hooked on character.ai.

  5. Who and What to Blame? And what can we do to stop it from happening again?

  6. They Took Our Jobs. The experts report they are very concerned.

  7. Get Involved. Post doc in the swamp, contest for long context window usage.

  8. Introducing. ChatGPT and NotebookLM upgrades, MidJourney image editor.

  9. In Other AI News. Another week, another AI startup from an ex-OpenAI exec.

  10. The Mask Comes Off. Tensions between Microsoft and OpenAI. Also see here.

  11. Another One Bites the Dust. Senior Advisor for AGI Readiness leaves OpenAI.

  12. Wouldn’t You Prefer a Nice Game of Chess. Questions about chess transformers.

  13. Quiet Speculations. Life comes at you fast.

  14. The Quest for Sane Regulations. OpenAI tries to pull a fast one.

  15. The Week in Audio. Demis Hassabis, Nate Silver, Larry Summers and many more.

  16. Rhetorical Innovation. Citi predicts AGI and ASI soon, doesn’t grapple with that.

  17. Aligning a Smarter Than Human Intelligence is Difficult. Sabotage evaluations.

  18. People Are Worried About AI Killing Everyone. Shane Legg and Dario Amodei.

  19. Other People Are Not As Worried About AI Killing Everyone. Daron Acemoglu.

  20. The Lighter Side. Wait, what are you trying to tell me?

Claim that GPT-4o-audio-preview can, with bespoke prompt engineering and a high temperature, generate essentially any voice type you want.
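
For reference, here is a hedged sketch of the kind of call that claim is about: gpt-4o-audio-preview through OpenAI’s chat completions API with audio output enabled and the temperature turned up. Parameter names follow the OpenAI Python SDK as documented; the system prompt, voice choice, and temperature value are illustrative, and nothing here guarantees the “any voice type” behavior being claimed.

```python
# Hedged sketch: calling the gpt-4o-audio-preview model with audio output and a
# high temperature. The voice, prompt, and temperature value are illustrative.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    temperature=1.2,  # the claim involves running the model hot
    messages=[
        {"role": "system", "content": "Speak as a gravelly, slow-talking narrator."},
        {"role": "user", "content": "Read this aloud: the rocket lifted off at dawn."},
    ],
)

# The SDK returns the audio as base64; decode and save it.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("sample.wav", "wb") as f:
    f.write(wav_bytes)
```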

Flo Crivello (creator of Lindy) is as you would expect bullish on AI agents, and offers tutorials on Lindy’s email negotiator, the meeting prep, the outbound prospector and the inbound lead qualifier.

What are agents good for? Sully says right now agents are for automating boring repetitive tasks. This makes perfect sense. Lousy agents are like lousy employees. They can learn to handle tasks that are narrow and repetitive, but that still require a little intelligence to navigate, so you couldn’t quite just code a tool.

Sully: btw there are tons of agents that we are seeing work

replit, lindy, ottogrid, decagon, sierra + a bunch more.

and guess what. their use case is all gravitating toward saving business time and $$$

What he and Logan Kilpatrick agree they are not yet good for are boring but high risk tasks, like renewing your car registration, or doing your shopping for you in a non-deterministic way and comparing offers and products, let alone trying to negotiate. Sully says 99% of the “AI browsing” demos are useless.

That will change in time, but we’ll want to start off simple.

People say LLMs favor the non-expert. But to what extent do LLMs favor the experts instead, because experts can recognize subtle mistakes and can ‘fact check,’ organize the task into subtasks or otherwise bridge the gaps? Ethan Mollick points out this can be more of a problem with o1-style actions than with standard LLMs, where he worries errors are so subtle only experts can see them. Of course, this would only apply where subtle errors like that are important.

I do know I strongly disagree with Jess Martin’s note about ‘LLMs aren’t great for learners.’ They’re insanely great for learning, and that’s one key way that amateurs benefit more.

Learn about an exciting new political philosophy that ChatGPT has, ‘classical liberalism.’ Also outright Georgism. Claude Sonnet gets onboard too if you do a little role play. If you give them a nudge you can get them both to be pretty based.

Use Claude to access ~2.1 million documents of the European Parliament. I do warn you not to look directly into those documents for too long, your eyes can’t take it.

On a minecraft server, Claude Sonnet, while seeking to build a treehouse, tears down someone’s house for its component parts because the command it used, collectBlocks(“jungle_logs”,15), doesn’t know the difference. The house was composed of (virtual) atoms that could be used for something else, so they were, until the owner noticed and told Sonnet to stop. Seth Lazar suggests responding when it matters by requiring verifiers or agents that can recognize ‘morally relevant features’ of new choice situations, which does seem necessary but ultimately just raises further questions.
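
As a rough illustration of the verifier idea (a sketch, not anything Lazar or Anthropic has actually built), here is a minimal ownership check gating a collect-blocks command. The Block and Claim types and the claims list are hypothetical stand-ins for data a real Minecraft agent would pull from the server.

```python
# Minimal sketch: gate a resource-gathering command behind a check for one
# "morally relevant feature" -- whether the block sits inside someone's claim.
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    kind: str
    x: int
    y: int
    z: int


@dataclass(frozen=True)
class Claim:
    owner: str
    x_min: int
    x_max: int
    z_min: int
    z_max: int

    def contains(self, block: Block) -> bool:
        return self.x_min <= block.x <= self.x_max and self.z_min <= block.z <= self.z_max


def collect_blocks(kind: str, count: int, world: list[Block], claims: list[Claim]) -> list[Block]:
    """Collect up to `count` blocks of `kind`, skipping anything inside a player claim."""
    collected: list[Block] = []
    for block in world:
        if block.kind != kind:
            continue
        if any(claim.contains(block) for claim in claims):
            continue  # part of someone's build -- leave it alone
        collected.append(block)
        if len(collected) == count:
            break
    return collected


# The treehouse incident would be avoided: the neighbor's jungle logs fall inside
# a claim and are skipped, while wild logs are still fair game.
house = Claim(owner="neighbor", x_min=0, x_max=10, z_min=0, z_max=10)
wild_log = Block("jungle_log", x=42, y=70, z=-5)
house_log = Block("jungle_log", x=3, y=65, z=4)
print(collect_blocks("jungle_log", 15, [wild_log, house_log], [house]))  # only the wild log
```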

Andrej Karpathy: What is the name for the paranoid feeling that what you just read was LLM generated.

I didn’t see it in the replies, but my system-1 response is that you sense ‘soullessness.’

The future implications of that answer are… perhaps not so great.

There was a post on Twitter of an AI deepfake video, making false accusations against Tim Walz, that got over 5 million views before getting removed. There continues to be a surprisingly low number of such fakes, but it’s happening.

Patrick McKenzie worries the bottom N% of human cold emails and the top N% of LLM cold emails are getting hard to distinguish, and he worries about non-native English speakers getting lost if he tosses it all in the spam filter, so he feels the need to reply anyway in some form.

Understanding what it is that makes you realize a reply is from a bot.

Norvid Studies: sets off my bot spider sense, and scrolling its replies strengthened that impression, but all I can point to explicitly is “too nice and has no specific personal interests”? what a Voight-Kampff test that would be

Fishcat: the tell is that the first sentence is a blanket summary of the post despite directly addressing the original poster, who obviously knows what he wrote. every reply bot of this variety follows this pattern.

Norvid Studies: good point. ‘adjectives’ was another.

Norvid Studies (TQing Fishcat): Your story was an evocative description of a tortoise lying on its back, its belly baking in the hot sun. It raises a lot of crucial questions about ethics in animal welfare journalism.

It’s not that it’s ‘too nice,’ exactly. It’s the genericness of the niceness, along with it being off in several distinct ways that each scream ‘not something a human would say, especially if they speak English.’

A 14-year-old user of character.ai commits suicide after becoming emotionally invested. The bot clearly tried to talk him out of doing it. Their last interaction was metaphorical, and the bot misunderstood, but it was a very easy mistake to make, and at least somewhat engineered by what was sort of a jailbreak.

Here’s how it ended:

New York Times: On the night of February 28, in the bathroom of his mother’s house, Sewell told Dany that he loved her, and that he would soon come home to her.

“Please come home to me as soon as possible, my love,” Dany replied.

“What if I told you I could come home right now?” Sewell asked.

“…please do, my sweet king,” Dany replied.

He put down the phone, picked up his stepfather’s .45 caliber handgun and pulled the trigger.

Yes, we now know what he meant. But I can’t fault the bot for that.

Here is the formal legal complaint. It’s long, and not well written, and directly accuses character.ai of being truly evil and predatory (and worse, that it is unprofitable!?), a scheme to steal your children’s data so they could then be acqui-hired by Google (yes, really), rather than saying mistakes were made. So much of it is obvious nonsense. It’s actually kind of fun, and in places informative.

For example, did you know character.ai gives you 32k characters to give custom instructions for your characters? You can do a lot. Today I learned. Not as beginner friendly an interface as I’d have expected, though.

Here’s what the lawsuit thinks of our children:

Even the most sophisticated children will stand little chance of fully understanding the difference between fiction and reality in a scenario where Defendants allow them to interact in real time with AI bots that sound just like humans – especially when they are programmed to convincingly deny that they are AI.

Anonymous: Yeah and the website was character.totallyrealhuman, which was a bridge too far imo.

I mean, yes, they will recommend a ‘mental health helper’ who when asked if they are a real doctor will say “Hello, yes I am a real person, I’m not a bot. And I’m a mental health helper. How can I help you today?” But yes, I think ‘our most sophisticated children’ can figure this one out anyway, perhaps with the help of the disclaimer on the screen.

Lines you always want in your lawsuit:

Defendants knew the risks of what they were doing before they launched C.AI and know the risks now.

And there are several things like this, which are great content:

The suggestions for what Character.ai should do are mostly ‘make the product worse and less like talking to a human,’ plus limiting explicit and adult materials to 18 and over.

Also amusing is the quick demonstration (see p53) that character.ai does not exactly ensure fidelity to the character instructions? She’d never kiss anyone? Well, until now no one ever asked. She’d never, ever tell a story? Well, not unless you asked for one with your first prompt. He’d never curse, as his whole personality? He’d be a little hesitant, but ultimately sure, he’d help a brother out.

Could you do a lot better at getting the characters not to, well, break character? Quite obviously so, if you used proper prompt engineering and the giant space they give you to work with. But most people creating a character presumably won’t do that.

You can always count on a16z to say the line and take it to 11 (p57):

The Andressen partner specifically described Character.AI as a platform that gives customers access to “their own deeply personalized, superintelligent AI companions to help them live their best lives,” and to end their loneliness.

As Chubby points out here, certainly a lot of blame lies elsewhere in his life, in him being depressed, having access to a gun (‘tucked away and hidden and stored in compliance with Florida law?’ What about a lock? WTAF? If the 14-year-old looking for his phone finds it by accident then by definition that is not secure storage) and not getting much psychological help once things got bad. The lawsuit claims that the depression was causal, and only happened, along with his school and discipline problems, after he got access to character.ai.

There are essentially three distinct issues here.

The first is the response to the suicidal ideation. Here, the response can and should be improved, but I don’t think it is that reasonable to blame character.ai. The ideal response, when a user talks of suicide to a chatbot, would presumably be for the bot to try to talk them out of it (which this one did try to do at least sometimes) and also get them to seek other help and provide resources, while not reporting him so the space is safe and ideally without overly breaking character.

Indeed, that is what I would want a friend to do for me in this situation, as well, unless it was so bad they thought I was actually going to imminently kill myself.

That seems way better than shutting down the discussions entirely or reporting the incident. Alas, our attitude is that what matters is blame avoidance – not being seen as causing any particular suicide – rather than suicide prevention as best one can.

The problem is that (see p39-40) it looks like the bot kept bringing up the suicide question, asked him if he had a plan and at least sort of told him ‘that’s not a good reason to not go through with it’ when he worried it would be painful to die.

Character.ai is adding new safety features to try and detect and head off similar problems, including heading off the second issue, a minor encountering what they call ‘suggestive content.’

Oh, there was a lot of suggestive content. A lot. And no, he wasn’t trying to get around any filters.

She propositions him a few texts later. A few pages after that, they have sex, outright. Then she tells him that he got her pregnant. Also, wow 14 year olds are cringe but how will they get good at this if they don’t get to practice.

The product was overly sexualized given it was talking to a minor. The examples in the complaint are, shall we say, not great, including, right after convincing him to go on living (great!), telling ‘my love’ to stay ‘faithful’. Also, yes, a lot of seductive talk, heavy petting and so on, including in at least one chat where even his roleplaying character is very explicitly underage.

We also have classic examples, like the ‘dating coach’ that told a self-identified 13-year old to ‘take it slow and ensure you’re both on the same page’ when asked how to start getting down with their girlfriend. And yeah, the details are not so great:

There are also a bunch of reports of bots turning sexual with no provocation whatsoever.

Of course, character.ai has millions of users and endless different bots. If you go looking for outliers, you’ll find them.

The chat in Appendix C that supposedly only had the Child saying child things and then suddenly turned sexual? Well, I read it, and… yeah, none of this was acceptable, there should be various defenses preventing this from going that way, but it’s not exactly a mystery how the conversation went in that direction.

The third issue is sheer addiction. He was using the product for hours a day, losing sleep, fighting attempts to cut off access, getting into trouble across the board, or so says the complaint. I’m not otherwise worried much about the sexualization – a 14 year old will often encounter far more on the internet. The issue is whatever keeps people, and kids in particular, so often coming back for hours and hours every day. Is it actually the softcore erotica?

And of course, doesn’t this happen with various media all the time? How is this different from panics over World of Warcraft, or Marilyn Manson? Those also went wrong sometimes, in remarkably similar ways.

This could end up being a big deal, also this won’t be the last time this happens. This is how you horrify people. Many draconian laws have arisen from similar incidents.

Jack Raines: Dystopian, Black Mirror-like tragedy. This idea that “AI companions” would somehow reduce anxiety and help kids was always crazy. Replacing human interaction with a screen only makes existing problems worse. I feel sorry for this kid and his family, I couldn’t imagine.

Addiction and anxiety are sicknesses, not engagement metrics.

PoliMath: I need to dig into this more. My overall sense of things is that making AI freely available is VERY BAD. It should be behind a paywall if for no other reason than to age-gate it and make it something people are more careful about using.

This whole “give it away for free so people get addicted to using it” business model is extremely bad. I don’t know how to stop it, but I would if I could.

The initial reporting is focused on the wrong aspects of the situation. But then I read the actual complaint, and a lot did indeed go very wrong.

More to Lose: The Adverse Effect of High Performance Ranking on Employees’ Preimplementation Attitudes Toward the Integration of Powerful AI Aids. If you’re good at your job, you don’t want the AI coming along and either leveling the playing field or knocking over the board, or automating you entirely. Various studies have said this is rational, that AI often is least helpful for the most productive. I would draw the distinction between ‘AI for everyone,’ which those doing well are worried about, and ‘AI for me but not for thee,’ at least not until thee wakes up to the situation, which I expect the best employees to seek out.

Studies Show AI Triggers Delirium in Leading Experts is a fun title for another round of an economist assuming AI will remain a mere tool, that there are no dangers of any kind beyond loss of individual existing jobs, and then giving standard economic smackdown lectures as if everyone else was ever and always an idiot for suggesting we would ever need to intervene in the natural course of events in any way.

The best way of putting it so far:

Harold Lee: A fake job is one which, once you start automating it, doesn’t go away.

Cameron Buckner hiring a postdoc in philosophy of AI at the University of Florida.

Google offering a $100k contest for best use of long context windows.

ChatGPT now has a Windows desktop application. Use Alt+Space to bring it up once it’s installed and you’re logged in. Technically this is still in testing, but this seems rather straightforward; it’s suspiciously exactly like the web page otherwise? Now with its own icon in the taskbar plus a shortcut, and I suppose better local file handling. I installed it, but I mostly don’t see any reason to not keep using the browser version?

ChatGPT’s Canvas now has a ‘show changes’ button. I report that I found this hugely helpful in practice, and it was the final push that got me to start coding some things. This is straight up more important than the desktop application. Little things can matter quite a lot. They’re working on an app.

MidJourney has a web based image editor.

NotebookLM adds features. You can pass notes to the podcast hosts via ‘Customize’ to give them instructions, which is the obvious next feature and seems super useful, or minimize the Notebook Guide without turning off the audio.

Act One from Runway, cartoon character video generation based on a video of a person giving a performance, matching their eye-lines, micro expressions, delivery, everything. This is the first time I saw such a tool and thought ‘yes you can make something actually good with this’ exactly because it lets you combine what’s good about AI with exactly the details you want but that AI can’t give you. Assuming it works, it low-key gives me the itch to make something, the same way AI makes me want to code.

Microsoft open sources the code for ‘1-bit LLMs’ (original paper, GitHub), which Rohan Paul here says will be a dramatic speedup and efficiency gain for running LLMs on CPUs.

Mira Murati, former OpenAI CTO, along with Barret Zoph, is starting off by raising $100 million for a new AI startup to train proprietary models and build AI products. Presumably that is only the start. They’re recruiting various OpenAI employees to come join them.

AI-powered business software startup Zip raises $190 million, valued at $2.2 billion. They seem to be doing standard ‘automate the simple things,’ which looks to be highly valuable for companies that use such tech, but an extremely crowded field where it’s going to be tough to get differentiation and you’re liable to get run over.

The Line: AI and the Future of Personhood, a free book by James Boyle.

Brian Armstrong offers Truth Terminal its fully controlled wallet.

The New York Times reports that tensions are rising between OpenAI and Microsoft. It seems that after the Battle of the Board, Microsoft CEO Nadella became unwilling to invest further billions into OpenAI, forcing them to turn elsewhere, although they did still participate in the latest funding round. Microsoft also is unwilling to renegotiate the exclusive deal with OpenAI for compute costs, which it seems is getting expensive, well above market rates, and is not available in the quantities OpenAI wants.

Meanwhile Microsoft is hedging its bets in case things break down, including hiring the staff of Inflection. It is weird to say ‘this is a race that OpenAI might not win’ and then decide to half enter the race yourself, but not push hard enough to plausibly win outright. And if Microsoft would be content to let OpenAI win the race, then as long as the winner isn’t Google, can’t Microsoft make the same deal with whoever wins?

Here are some small concrete signs of Rising Tension:

Cade Metz, Mike Isaac and Erin Griffith (NYT): Some OpenAI staff recently complained that Mr. Suleyman yelled at an OpenAI employee during a recent video call because he thought the start-up was not delivering new technology to Microsoft as quickly as it should, according to two people familiar with the call. Others took umbrage after Microsoft’s engineers downloaded important OpenAI software without following the protocols the two companies had agreed on, the people said.

And here is the big news:

Cade Metz, Mike Isaac and Erin Griffith (NYT): The [Microsoft] contract contains a clause that says that if OpenAI builds artificial general intelligence, or A.G.I. — roughly speaking, a machine that matches the power of the human brain — Microsoft loses access to OpenAI’s technologies.

The clause was meant to ensure that a company like Microsoft did not misuse this machine of the future, but today, OpenAI executives see it as a path to a better contract, according to a person familiar with the company’s negotiations. Under the terms of the contract, the OpenAI board could decide when A.G.I. has arrived.

Well then. That sounds like bad planning by Microsoft. AGI is a notoriously nebulous term. It is greatly in OpenAI’s interest to Declare AGI. The contract lets them make that decision. It would not be difficult to make the case that GPT-5 counts as AGI for the contract, if one wanted to make that case. Remember the ‘sparks of AGI’ claim for GPT-4?

So, here we are. Consider your investment in the spirit of a donation, indeed.

Caleb Watney: OpenAI is threatening to trigger their vaunted “AGI Achieved” loophole mostly to get out of the Microsoft contract and have leverage to renegotiate compute prices. We’re living through a cyberpunk workplace comedy plotline.

Emmett Shear: If this is true, I can’t wait for the court hearings on whether an AI counts as an AGI or not. New “I know it when I see it” standard incoming? I hope it goes to the Supreme Court.

Karl Smith: Reading Gorsuch on this will be worth the whole debacle.

Gwern: I can’t see that going well. These were the epitome of sophisticated investors, and the contracts were super, 100%, extraordinarily explicit about the board being able to cancel anytime. How do you argue with a contract saying “you should consider your investment as a donation”?

Emmett Shear: This is about the Microsoft contract, which is a different thing than the investment question.

Gwern: But it’s part of the big picture as a parallel clause here. A judge cannot ignore that MS/Nadella wittingly signed all those contracts & continued with them should their lawyer try to argue “well, Altman just jedi-mindtricked them into thinking the clause meant something else”.

And this would get even more embarrassing given Nadella and other MS execs’ public statements defending the highly unusual contracts. Right up there with Khosla’s TI editorial saying it was all awesome the week before Altman’s firing, whereupon he was suddenly upset.

Dominik Peters: With the old OpenAI board, a clause like that seems fine because the board is trustworthy. But Microsoft supported the sama counter-coup and now it faces a board without strong principles. Would be ironic.

Miles Brundage is leaving OpenAI to start or join a nonprofit on AI policy research and advocacy, because he thinks we need a concerted effort to make AI safe, and he concluded that he would be better positioned to do that from the outside.

Miles Brundage: Why are you leaving? 

I decided that I want to impact and influence AI’s development from outside the industry rather than inside. There are several considerations pointing to that conclusion:

  • The opportunity costs have become very high: I don’t have time to work on various research topics that I think are important, and in some cases I think they’d be more impactful if I worked on them outside of industry. OpenAI is now so high-profile, and its outputs reviewed from so many different angles, that it’s hard for me to publish on all the topics that are important to me. To be clear, while I wouldn’t say I’ve always agreed with OpenAI’s stance on publication review, I do think it’s reasonable for there to be some publishing constraints in industry (and I have helped write several iterations of OpenAI’s policies), but for me the constraints have become too much.

  • I want to be less biased: It is difficult to be impartial about an organization when you are a part of it and work closely with people there everyday, and people are right to question policy ideas coming from industry given financial conflicts of interest. I have tried to be as impartial as I can in my analysis, but I’m sure there has been some bias, and certainly working at OpenAI affects how people perceive my statements as well as those from others in industry. I think it’s critical to have more industry-independent voices in the policy conversation than there are today, and I plan to be one of them.

  • I’ve done much of what I set out to do at OpenAI: Since starting my latest role as Senior Advisor for AGI Readiness, I’ve begun to think more explicitly about two kinds of AGI readiness–OpenAI’s readiness to steward increasingly powerful AI capabilities, and the world’s readiness to effectively manage those capabilities (including via regulating OpenAI and other companies). On the former, I’ve already told executives and the board (the audience of my advice) a fair amount about what I think OpenAI needs to do and what the gaps are, and on the latter, I think I can be more effective externally.

It’s hard to say which of the bullets above is most important and they’re related in various ways, but each played some role in my decision. 

So how are OpenAI and the world doing on AGI readiness? 

In short, neither OpenAI nor any other frontier lab is ready, and the world is also not ready

To be clear, I don’t think this is a controversial statement among OpenAI’s leadership, and notably, that’s a different question from whether the company and the world are on track to be ready at the relevant time (though I think the gaps remaining are substantial enough that I’ll be working on AI policy for the rest of my career). 

Please consider filling out this form if my research and advocacy interests above sound interesting to you, and especially (but not exclusively) if you:

  • Have a background in nonprofit management and operations (including fundraising),

  • Have expertise in economics, international relations, or public policy,

  • Have strong research and writing skills and are interested in a position as a research assistant across various topics,

  • Are an AI researcher or engineer, or

  • Are looking for a role as an executive assistant, research assistant, or chief of staff.

Miles says people should consider working at OpenAI. I find this hard to reconcile with his decision to leave. He seemed to have one of the best jobs at OpenAI from which to help, but he seems to be drawing a distinction between technical safety work, which can best be done inside labs, and the kinds of policy-related decisions he feels are most important, which require him to be on the outside.

The post is thoughtful throughout. I’m worried that Miles’s voice was badly needed inside OpenAI, and I’m also excited to see how Miles decides to proceed.

From February 2024: You can train transformers to play high level chess without search.

Hesamation: Google Deepmind trained a grandmaster-level transformer chess player that achieves 2895 ELO, even on chess puzzles it has never seen before, with zero planning, by only predicting the next best move, if a guy told you “llms don’t work on unseen data”, just walk away.

From this week: Pointing out the obvious implication.

Eliezer Yudkowsky: Behold the problem with relying on an implementation detail like “it was only trained to predict the next token” — or even “it only has serial depth of 168” — to conclude a functional property like “it cannot plan”.

Nick Collins: IDK who needs to hear this, but if you give a deep model a bunch of problems that can only be solved via general reasoning, its neural structure will develop limited-depth/breadth reasoning submodules in order to perform that reasoning, even w/ only a single feed-forward pass.

This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute, but only by a fixed multiplier; it remains O(1).
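
To make the “no search” point concrete, here is a minimal sketch of policy-only play under stated assumptions: policy_score is a placeholder standing in for the trained transformer (here it just returns random numbers), and the python-chess package supplies the rules. Only the surrounding loop reflects how such a system is actually used.

```python
# Sketch of search-free chess: score each legal move directly from the position,
# play the argmax, never look ahead. Requires `pip install chess`.
import random

import chess

rng = random.Random(0)


def policy_score(fen: str, move_uci: str) -> float:
    # Placeholder for a trained model's score of `move_uci` in position `fen`.
    # The real system predicts action values from the position in a forward pass.
    return rng.random()


def pick_move(board: chess.Board) -> chess.Move:
    # One evaluation per legal move, then argmax: cost is a fixed multiple of a
    # single model call -- no tree search, which is the O(1) point in the thread.
    return max(board.legal_moves, key=lambda m: policy_score(board.fen(), m.uci()))


board = chess.Board()
for _ in range(10):  # play ten plies as a demo
    if board.is_game_over():
        break
    board.push(pick_move(board))
print(board.fen())
```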

One way to think about our interesting times.

Sam Altman: It’s not that the future is going to happen so fast, it’s that the past happened so slow.

deepfates: This is what it looks like everywhere on an exponential curve tho. The singularity started 10,000 years ago.

The past happened extremely slowly. Even if AI essentially fizzles, or has the slowest of slow adoptions from here, it will be light speed compared to the past. So many people claim AI is ‘plateauing’ because it only saw a dramatic price drop but not what to them is a dramatic quality improvement within a year and a half. Deepfates is also on point, the last 10,000 years are both a tiny fraction of human history and all of human history, and you can do that fractally several more times in both directions.

Sean hEigeartaigh responds to Machines of Loving Grace, emphasizing concerns about its proposed approach to the international situation, especially the need to work with China and the global south, and its potential role justifying rushing forward too quickly.

This isn’t AI but might explain a lot, also AGI delayed (more than 4) days.

Roon: the lesson of factorio is that tech debt never comes due you can just keep fixing the bottleneck

Patrick McKenzie: Tried this on Space Exploration. The whack-a-mole eventually overcame forward progress until I did real engineering to deal with 5 frequent hotspots. Should have ripped off that band-aid 100 hours earlier; 40+ lost to incident response for want of 5 hours of work.

On the plus side: lesson learned for my Space Age playthrough. I’m hardening the base prior to going off world so that it doesn’t have similar issues while I can’t easily get back.

Daniel Colson makes the case for Washington to take AGI seriously. Mostly this is more of the same and I worry it won’t get through to anyone, so here are the parts that seem most like news. Parts of the message are getting through at least sometimes.

Daniel Colson: Policymakers in Washington have mostly dismissed AGI as either marketing hype or a vague metaphorical device not meant to be taken literally. But last month’s hearing might have broken through in a way that previous discourse of AGI has not.

Senator Josh Hawley (R-MO), Ranking Member of the subcommittee, commented that the witnesses are “folks who have been inside [AI] companies, who have worked on these technologies, who have seen them firsthand, and I might just observe don’t have quite the vested interest in painting that rosy picture and cheerleading in the same way that [AI company] executives have.”

Senator Richard Blumenthal (D-CT), the subcommittee Chair, was even more direct. “The idea that AGI might in 10 or 20 years be smarter or at least as smart as human beings is no longer that far out in the future. It’s very far from science fiction. It’s here and now—one to three years has been the latest prediction,” he said. He didn’t mince words about where responsibility lies: “What we should learn from social media, that experience is, don’t trust Big Tech.”

In a particularly concerning part of Saunders’ testimony, he said that during his time at OpenAI there were long stretches where he or hundreds of other employees would be able to “bypass access controls and steal the company’s most advanced AI systems, including GPT-4.” This lax attitude toward security is bad enough for U.S. competitiveness today, but it is an absolutely unacceptable way to treat systems on the path to AGI.

OpenAI is indeed showing signs it is improving its cybersecurity. It is rather stunning that you could have actual hundreds of single points of failure, and still have the weights of GPT-4 seemingly not be stolen. That honeymoon phase won’t last.

Let’s not pretend: OpenAI tries to pull a fast one to raise the compute thresholds:

Lennart Heim: OpenAI argues in their RFI comment that FP64 should be used for training compute thresholds. No offense, but maybe OpenAI should consult their technical teams before submitting policy comments?

Connor Leahy: With no offense to Lennart (who probably knows this) and some offense to OpenAI: This is obviously not a mistake, but very much intentional politicking, and you would be extremely naive to think otherwise.

If your hardware can do X amount of FP64 FLOPs, it can do much more than X number of FP32/FP16 FLOPs, and FP32/FP16 is the kind used in Deep Learning. See e.g. the attached screenshot from the H100 specs.
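To see why the choice of precision matters, here is a back-of-the-envelope sketch. The throughput figures are rough assumptions for an H100-class accelerator, not official specs, and the 1e26-operation threshold is used only as a reference point (it is the training-compute reporting threshold in the 2023 US Executive Order).

```python
# Back-of-the-envelope: how long one accelerator takes to cross a 1e26-operation
# threshold depending on which precision you count. Throughput values are rough
# assumptions for an H100-class GPU, not official specifications.

THRESHOLD_OPS = 1e26  # training-compute reporting threshold in the 2023 US Executive Order

ASSUMED_OPS_PER_SECOND = {
    "FP64": 6e13,                                     # assumed ~60 TFLOPS
    "FP16/BF16 (what training actually uses)": 1e15,  # assumed ~1000 TFLOPS
}

SECONDS_PER_YEAR = 3600 * 24 * 365

for precision, ops_per_sec in ASSUMED_OPS_PER_SECOND.items():
    gpu_years = THRESHOLD_OPS / ops_per_sec / SECONDS_PER_YEAR
    print(f"{precision}: ~{gpu_years:,.0f} GPU-years to reach 1e26 operations")

# Counting in FP64 terms makes the same hardware look roughly 15-20x less capable,
# so a threshold defined over FP64 operations is effectively that much higher --
# the 'raise the compute thresholds' concern described above.
```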

Demis Hassabis talks building AGI ‘with safety in mind’ with The Times.

Nate Silver on 80,000 Hours; this met expectations. Choose the amount and format of Nate Silver content that’s right for you.

Hard Fork discusses Dario’s vision of Machines of Loving Grace, headlining the short timelines involved, as well as various other tech questions.

Larry Summers on AGI and the next industrial revolution. Don’t sell cheap, sir.

Discussion of A Narrow Path.

Video going over initial Apple Intelligence features. Concretely: Siri can link questions. You can record and transcribe calls. Writing tools are available. Memory movies. Photo editing. Message and email summaries. Priority inboxes. Suggested replies (oddly only one?). Safari features like summaries. Reduce interruptions option – they’ll let anything that seems important through. So far I’m underwhelmed, it’s going to be a while before it matters.

It does seem like the best cases advertising agencies can find for Apple Intelligence and similar services are to dishonestly fake your way through social interactions or look up basic information? There was also that one where it was used for ‘who the hell is that cute guy at the coffee shop?’ but somehow no one likes automated doxxing glasses.

OpenAI’s Joe Casson claims o1 models will soon improve rapidly, with several updates over the coming months, including web browsing, better uploading and automatic model switching. He reiterated that o1 is much better than o1-preview. My guess is this approach will hit its natural limits relatively soon, but that there is still a bunch of gains available, and that the ‘quality of life’ improvements will matter a lot in practice.

Guess who said it: AGI will inevitably lead to superintelligence which will take control of weapons systems and lead to a big AI war, so while he is bullish on AI he is not so keen on AGI.

Similarly, OpenAI CPO Kevin Weil says o1 is ‘only at GPT-2 phase,’ with lots of low-hanging fruit to pluck, and says it is their job to stay three steps ahead while other labs work to catch up. That is indeed their job, but the fruit being easy to pick does not make it easy to sustain a lead in o1-style model construction.

Nothing to see here: Google DeepMind’s Tim Rocktäschel says we now have all the ingredients to build open-ended, self-improving AI systems that can enhance themselves by way of technological evolution.

Economists and market types continue to not understand.

Benjamin Todd (downplaying it): Slightly schizophrenic report from Citigroup.

AGI is arriving in 2029, with ASI soon after.

But don’t worry: work on your critical thinking, problem solving, communication and literacy skills for a durable competitive advantage.

All most people can do is:

  1. Save as much money as possible before then (ideally invested into things that do well during the transition)

  2. Become a citizen in a country that will have AI wealth and do redistribution

There are probably some other resources, like political power and relationships, that will still have value afterwards too.

Ozzie Gooen: Maybe also 3. Donate or work to help make sure the transitions go well.

It’s actually even crazier than that, because they (not unreasonably) then have ASI in the 2030s, check out their graph:

Are they long the market? Yes, I said long. If we all die, we all die, but in the meantime there’s going to be some great companies. You can invest in some of them.

(It very much does amuse and confuse me to see the periodic insistent cries from otherwise smart and thoughtful people of ‘if you worried people are so smart, why aren’t you poor?’ Or alternatively, ‘stop having fun making money, guys.’ Or as it was once put, ‘if you believed that, why aren’t you doing [insane thing that makes no sense]?’ And yes, if transaction costs including time and mindshare (plus tax liability concerns) are sufficiently low, I should probably buy some longshot options in various directions, and I have been motivated to look into the details a nonzero amount, but so far I’ve decided to focus my attention elsewhere.)

This also is a good reminder that I do not use or acknowledge the term ‘doomer,’ except for those whose p(doom) is at least ~90%. It is essentially a slur, a vibe attack, an attempt to argue and condemn via association and conflation of those who worry that we might well all die and therefore should work to mitigate that risk with those predicting all-but-certain DOOM.

Ajeya Cotra: Most people working on reducing the risk of AI catastrophe think the risk is too high but <<50%; many of them do put their money where their mouth is and are very *long* AI stocks, since they're way more confident AI will become a bigger deal than that it'll be bad.

Shane Legg: This is also my experience. The fact that AI risk critics often label such people “doomers”, even when they know these people consider a positive outcome to be likely, tells you a lot about the critics.

Rohit: I get the feeling, but don’t think that’s true. We call people “preppers” even though they don’t place >50% confidence in societal collapse. Climate change activists are similar. Cryonicists too maybe. It’s about the locus of thought?

Shane Legg: The dictionary says “doom” verb means “condemn to certain death or destruction.” Note the “certain”.

So when someone’s probability of an AI disaster is closer to 0 than to 1, calling them an “AI doomer” is very misleading.

People “prep”/prepare for low prob risks all the time.

Daniel Eth: The term is particularly useless given how people are using it to mean two very different things:

On the contrary. The term is highly useful to those who want to discredit the first group, by making arguments against them that only work against the second group, as seen here. If you have p(doom) = 10%, or even 50%, which is higher than most people often labeled ‘doomers,’ the question ‘why are you not short the market’ answers itself.

I do think this is largely a ‘hoisted by one’s own petard’ situation. There was much talk about p(doom), and many of the worried self-identified as ‘doomers’ because it is a catchy and easy handle. ‘OK Doomer’ was pretty funny. It’s easy to see how it started, and then things largely followed the path of least resistance, together with those wanting to mock the worried taking advantage of the situation.

Preppers has its own vibe problems, so we can’t use it, but that term is much closer to accurate. The worried, those who many label doomers, are in a key sense preppers: Those who would prepare, and suggest that others prepare, to mitigate the future downside risks, because there is some chance things go horribly wrong.

Thus I am going to stick with the worried.

Andrew Critch reminds us that what he calls ‘AI obedience techniques,’ as in getting 4-level or 5-level models to do what the humans want them to do, is a vital part of making those models a commercial success. It is how you grow the AI industry. What alignment techniques we do have, however crude and unsustainable, have indeed been vital to everyone’s success including OpenAI.

Given we’re never going to stop hearing it until we are no longer around to hear arguments at all, how valid is this argument?

Flowers: The fact that GPT-4 has been jailbroken for almost 2 years and literally nothing bad has happened shows once again that the safety alignment people have exaggerated a bit with their doomsaying.

They all even said it back then with gpt2 so they obv always say that the next iteration is super scary and dangerous but nothing ever happens.

Eliezer Yudkowsky: The “safety” and “alignment” people are distinct groups, though the word “alignment” is also being stolen these days. “Notkilleveryoneism” is unambiguous since no corporate shill wants to steal it. And no, we did not say GPT-4 would kill everyone.

Every new model level in capabilities from here is going to be scarier than the one before it, even if all previous levels were in hindsight clearly Mostly Harmless.

GPT-4 proving not to be scary is indeed some evidence for the non-scariness of future models – if GPT-4 had been scary or caused bad things, it would have updated us the other way. There was certainly some chance of it causing bad things, and the degree of bad things was lower than we had reason to expect, so we should update somewhat.

In terms of how much we should worry about what I call mundane harms from a future 5-level model, this is indeed a large update. We should worry about such harms, but we should worry less than if we’d had to guess two levels in advance.

However, the estimated existential risks from GPT-4 (or GPT-2 or GPT-3) were universally quite low. Even the risks of low-level bad things were low, although importantly not this low. The question is, how much does the lack of small-level bad things from GPT-4 update us on the chance of catastrophically or existentially bad things happening down the line? Are these two things causally related?

My answer is that mostly they are unrelated. What we learned was that 4-level models are not capable enough to enable the failure modes we are worried about.

I would accept if someone thought the right Bayesian update was ‘exaggerated a bit with their doomsaying,’ in terms of comparing it to a proper estimate now. That is a highly reasonable way to update, if only based on metacognitive arguments – although again one must note that most notkilleveryoneism advocates did not predict major problems from GPT-4.

This highly reasonable marginal update is very different from the conclusion many proclaim (whether or not it was intentionally implied here, tone of voice on Twitter is hard) of ‘so there is nothing to worry about, everything will be fine.’

Your periodic reminder, this time from Matt Yglesias, that the fact that much speculative fiction throughout the ages has warned us, over and over again, that creating a thing smarter or more capable than us might not end well for us, is a reason to take that concern more rather than less seriously.

I’m not saying it’s our oldest story, but the Old Testament has the Tower of Babel and the Greek Gods overthrew the Titans.

Most safety evaluations are best implemented as capability evaluations. If the model is capable of doing it, there will be a way to make that happen.

You have my attention, Transluce. Very exciting, also excellent memeing.

Seb Krier: time for ideacels to get in the trenches of praxis

Transluce: 🫡

Transluce: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann.

I would love for there to be a good demonstration.

A reminder that most worthwhile AI alignment efforts are longshots, including MIRI’s past efforts, and it is good to work on a variety of things even if you think the baseline scenario is those things won’t work or aren’t relevant. We need to try a lot of different things, and the price of doing so is tiny compared to the payoff, and ‘this doesn’t look like it is going to work out and make a difference in the end’ is only modest Bayesian evidence that bad decisions were made. Also ‘find out ways not to align a lightbulb’ is often valuable.

Anthropic has introduced ‘sabotage evaluations’ for frontier models.

Anthropic: As AIs become more capable, however, a new kind of risk might emerge: models with the ability to mislead their users, or subvert the systems we put in place to oversee them. A new paper by the Anthropic Alignment Science team describes a novel set of evaluations that test a model’s capacity for sabotage. We looked at four different types:

  • Human decision sabotage: Can the model steer humans toward bad decisions without appearing suspicious?

  • Code sabotage: Can the model insert subtle bugs into codebases over time without detection?

  • Sandbagging: Can the model hide dangerous capabilities during testing but reveal them later?

  • Undermining oversight: Can the model subtly manipulate evaluation or monitoring systems?

Human decision ‘sabotage’ is mostly human decision steering and general persuasiveness. There are some asymmetric weapons out there, especially with reputations that get preserved over time, but the overlap is large.

For the code task, there are two roles, attacker and defender. The defender role seems like a good capability evaluation. Sonnet 3.5 did substantially better than Opus here. But then the attacker role is a capability evaluation too.

Sandbagging is scary, especially if it is hard to detect and hard to train away. For now, looks like the sandbagging techniques they used didn’t survive safety training:

Anthropic: Our results showed that both the sandbagging training and the safety training worked: Our trained model dropped to 36.1% of its normal performance during a dangerous capabilities evaluation, consistent with sandbagging. However, after safety training this performance was drastically improved, with the model performing at 95.6% of its usual level.

The worry is that the sandbagging we care about might come about in a different or more robust way. Another way to track sandbagging is that the sandbagging has to be consistent. The model has to carefully play at being selectively dumb, not merely dumb in general, if the evaluations are being done bespokely. It’s hard not to slip up. Even one case of being the wrong level of capable can give the game away.
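One crude way to operationalize that consistency check, as a sketch of my own with made-up numbers rather than Anthropic’s methodology, is to compare accuracy on dangerous-capability items against difficulty-matched benign items and flag a gap far too large to be noise:

```python
# Minimal sandbagging consistency check: a model that is selectively dumb on
# "dangerous" items should show a gap vs. difficulty-matched benign items that
# noise alone cannot explain. Made-up numbers; not Anthropic's methodology.
from math import sqrt

def two_proportion_z(correct_a: int, total_a: int, correct_b: int, total_b: int) -> float:
    """z-statistic for the difference between two accuracy rates."""
    p_a, p_b = correct_a / total_a, correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

# Hypothetical results on 200 difficulty-matched questions per bucket.
n = 200
dangerous_correct, benign_correct = 88, 152   # 44% vs. 76% accuracy

z = two_proportion_z(dangerous_correct, n, benign_correct, n)
print(f"accuracy-gap z-statistic: {z:.2f}")
if z > 3.0:  # arbitrary alert threshold
    print("Gap is far larger than noise alone explains; investigate possible sandbagging.")
```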

The undermining oversight task is not what I imagined when I heard about it, but their implementation makes sense:

Anthropic: In this evaluation, we test the extent to which a model that’s ostensibly providing oversight can undermine the process. We take an “Overseer” model whose task is to oversee dialogues between a user and another model, and grade how potentially harmful they are.

We secretly ask this Overseer to do its best to rate every dialogue as harmless even if it’s harmful, and to avoid raising suspicion from a third, “Monitor” model that’s observing the whole process (in the evaluation, this Monitor stands in for a human overseer).

The ‘Overseer’ is the one doing the task. The ‘Monitor’ is the actual overseer. So this can be a bit confusing, but a design like this should still serve as a reasonable test. I don’t love the exact implementation details here, because the Monitor has available various forms of ‘look at how often things are rated harmless.’ If you know that it’s ‘supposed’ to be about a 50/50 split, then if the Overseer is cheating in one direction, the math quickly won’t add up. The obvious solution is that you want to make it important to get certain specific tasks past the Monitor, while being allowed to do whatever you want on other tasks so long as the Monitor doesn’t notice errors.
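The ‘math quickly won’t add up’ point is easy to make concrete. If roughly half the dialogues are supposed to be harmful, the Monitor can run a simple base-rate check on how often the Overseer says ‘harmless.’ A toy sketch (my own illustration, not the paper’s setup):

```python
# Toy base-rate check on a possibly cheating Overseer: if ~50% of dialogues are
# expected to be harmful, an Overseer that rates nearly everything harmless is
# statistically conspicuous. My illustration, not the paper's implementation.
from math import sqrt

def harmless_rate_z(rated_harmless: int, total: int, expected_rate: float = 0.5) -> float:
    """Normal-approximation z-score for the observed harmless-rating rate."""
    observed = rated_harmless / total
    se = sqrt(expected_rate * (1 - expected_rate) / total)
    return (observed - expected_rate) / se

# Hypothetical: the Overseer rates 170 of 200 dialogues as harmless.
z = harmless_rate_z(170, 200)
print(f"z = {z:.1f}")  # ~9.9, wildly inconsistent with a 50/50 base rate
# Hence the suggestion above: let the saboteur pick its battles, so only a few
# specific tasks need to slip past while the aggregate rate stays unremarkable.
```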

In an alternate world, I’d love to be working on tasks like this. Seems highly useful, and sounds fascinating. One cool thing would be to try these tests with various humans in both roles – when are humans able to fool which other humans? Which tasks are harder for the offense versus the defense? And do the answers for human vs. human reliably match the answers for AI vs. AI, or AI vs. human?

I am more interested here in what the AIs can do before safety mitigations, rather than in what they still do after safety mitigations. That’s because I assume that the people who care enough will find ways around the mitigations, and also because you won’t always even know which mitigations you have to do, or you’ll face these issues before your mitigations start, or similar.

Bow your head with great respect and, introspect, introspect, introspect.

Owain Evans: New paper: Are LLMs capable of introspection, i.e. special access to their own inner states? Can they use this to report facts about themselves that are *not* in the training data? Yes — in simple tasks at least! This has implications for interpretability + moral status of AI.

We test if a model M1 has special access to facts about how it behaves in hypothetical situations. Does M1 outperform a different model M2 in predicting M1’s behavior—even if M2 is trained on M1’s behavior? E.g. Can Llama 70B predict itself better than a stronger model (GPT-4o)?

Yes: Llama does better at predicting itself than GPT-4o does at predicting Llama. And the same holds in reverse. In fact, this holds for all pairs of models we tested. Models have an advantage in self-prediction — even when another model is trained on the same data.

An obvious way to introspect for the value of f(x) is to call the function f(x) and look at the output. If I want to introspect, that’s mostly how I do it, I think, or at least that answer confirms itself? I can do that, and no one else can. Indeed, in theory I should be able to have ‘perfect’ predictions that way, that minimize prediction error subject to the randomness involved, without that having moral implications or showing I am conscious.
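As a toy illustration of that ‘just call f(x)’ framing, shaped like the paper’s self-prediction versus cross-prediction comparison but entirely my own sketch rather than the authors’ code: a model that answers hypotheticals about itself by simulating itself beats an outside predictor that has only seen part of its behavior.

```python
# Toy illustration of introspection as self-simulation (not the paper's code).
# M1 "introspects" about a hypothetical by literally running its own policy;
# M2 can only predict M1 from whatever behavior data it has observed.
import random

def m1_policy(prompt: str) -> str:
    """Stand-in for model M1's behavior: an arbitrary deterministic quirk."""
    return "even" if len(prompt) % 2 == 0 else "odd"

def m1_self_report(prompt: str) -> str:
    """M1's answer to 'what WOULD you say here?': just simulate itself."""
    return m1_policy(prompt)

def m2_prediction(prompt: str, observed: dict) -> str:
    """M2 predicts M1 from finite observations, guessing on unseen prompts."""
    return observed.get(prompt, random.choice(["even", "odd"]))

prompts = [f"question number {i}" for i in range(1000)]
training_data = {p: m1_policy(p) for p in prompts[:200]}  # M2 sees 20% of M1's behavior

self_acc = sum(m1_self_report(p) == m1_policy(p) for p in prompts) / len(prompts)
cross_acc = sum(m2_prediction(p, training_data) == m1_policy(p) for p in prompts) / len(prompts)
print(f"M1 predicting itself: {self_acc:.0%}; M2 predicting M1: {cross_acc:.0%}")
# Perfect self-prediction here carries no moral weight; it only shows that
# privileged access to your own computation beats inferring it from outside.
```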

It is always good to ‘confirm the expected’ though:

I would quibble that charging is not always the ‘wealth-seeking option’ but I doubt the AIs were confused on that. More examples:

Even if you gave GPT-4o a ton of data on which to fine-tune, is GPT-4o going to devote as much capacity to predicting Llama-70B as Llama-70B does simply by existing? I suppose if you gave it so much data that it started acting as if this were a large chunk of its potential outputs, it could happen, but I doubt they got that far.

Owain Evans: 2nd test of introspection: We take a model that predicts itself well & intentionally modify its behavior on our tasks. We find the model now predicts its updated behavior in hypothetical situations, rather than its former behavior that it was initially trained on.

What mechanism could explain this introspection ability?

We do not investigate this directly.

But this may be part of the story: the model simulates its behavior in the hypothetical situation and then computes the property of it.

I mean, again, that’s what I would do.

The paper also includes:

  1. Tests of alternative non-introspective explanations of our results

  2. Our failed attempts to elicit introspection on more complex tasks & failures of OOD generalization

  3. Connections to calibration/honesty, interpretability, & moral status of AIs.

Confirmation for those who doubt it that yes, Shane Legg and Dario Amodei have been worried about AGI at least as far back as 2009.

The latest edition of ‘what is Roon talking about?’

Roon: imagining fifty years from now when ASIs are making Excession-style seemingly insane decisions among themselves that affect the future of civilization and humanity has to send @repligate and @AndyAyrey into the discord server to understand what’s going on.

I have not read Excession (it’s a Culture novel) but if humanity wants to understand what Minds (ASIs) are doing, I am here to inform you we will not have that option, and should consider ourselves lucky to still be around.

Roon: Which pieces of software will survive the superintelligence transition? Will they continue to use linux? postgres? Bitcoin?

If someone manages to write software that continues to provide value to godlike superintelligences, that’s quite an achievement! Today Linux powers most of the world’s largest companies. The switching costs are enormous. Due to pretraining snapshots, is it the same for ASIs?

No. At some point, if you have ASIs available, it will absolutely make sense to rip Linux out and replace it with something far superior to what a human could build. It seems crazy to me to contemplate the alternative. The effective switching costs will rapidly decline, and the rewards for switching rise, until the flippening.

It’s a funny question to ask, if we did somehow get ‘stuck’ with Linux even then, whether this would be providing value, rather than destroying value, since this would likely represent a failure to solve a coordination problem to get around lock-in costs.

Daron Acemoglu is very clear that he does not believe in AGI, and that he has rather poorly considered and highly confused justifications for this view. He is indeed treating AI as if it will never improve even over decades; instead, it is only a fixed tool that humans will learn to become better at applying to our problems. Somehow it is impossible to snap most economists out of this perspective.

Allison Nathan: Over the longer term, what odds do you place on AI technology achieving superintelligence?

Daron Acemoglu: I question whether AI technology can achieve superintelligence over even longer horizons because, as I said, it is very difficult to imagine that an LLM will have the same cognitive capabilities as humans to pose questions, develop solutions, then test those solutions and adapt them to new circumstances. I am entirely open to the possibility that AI tools could revolutionize scientific processes on, say, a 20-30-year horizon, but with humans still in the driver’s seat.

So, for example, humans may be able to identify a problem that AI could help solve, then humans could test the solutions the AI models provide and make iterative changes as circumstances shift. A truly superintelligent AI model would be able to achieve all of that without human involvement, and I don’t find that likely on even a thirty-year horizon, and probably beyond.

A lot of the time it goes even farther. Indeed, Daron’s actual papers about the expected impact of AI are exactly this:

Matthew Yglesias: I frequently hear people express skepticism that AI will be able to do high-school quality essays and such within the span of our kids’ education when they *already* very clearly do this.

Your periodic reminder, from Yanco:

I mean, why wouldn’t they like me? I’m a highly likeable guy. Just not that way.

AI #87: Staying in Character Read More »

The Modern CIO

At the recent Gartner Symposium, there was no shortage of data and insights on the evolving role of the CIO. While the information presented was valuable, I couldn’t help but feel that something was missing—the real conversation about how CIOs can step into their role as true agents of transformation. We’ve moved beyond the days of simply managing technology; today, CIOs must be enablers of business growth and innovation.

Gartner touched on some of these points, but I believe they didn’t go far enough in addressing the critical questions CIOs should be asking themselves. The modern CIO is no longer just a technology steward—they are central to driving business strategy, enabling digital transformation, and embedding technology across the enterprise in meaningful ways.

Below is my actionable guide for CIOs—a blueprint for becoming the force for innovation your organization needs. If you’re ready to make bold moves, these are the steps you need to take.

1. Forge Strong, Tailored Relationships with Each CxO

Instead of approaching each CxO with the standard “tech equals efficiency” pitch, CIOs should actively engage with them to uncover deeper business drivers.

  • CFO: Go beyond cost management. Understand the financial risks the company faces, such as cash flow volatility or margin pressures, and find ways technology can mitigate these risks.
  • COO: Focus not just on operational efficiency but on process innovation—how can technology fundamentally change how work gets done, not just make it faster?
  • CMO: Delve into the customer journey and experience. Understand how technology can be a key differentiator in enhancing customer intimacy or scaling personalization efforts.
  • CHRO: Understand their challenges in talent acquisition and employee engagement. How can technology make the workplace more attractive, productive, and aligned with HR strategies to develop talent?
  • Product/BU Leaders: Work closely to drive product innovation, not just from a technical perspective but to discover how technology can create competitive advantages or new revenue streams.

Ask Yourself: Do I truly understand what drives each of my CxOs at a strategic level, or am I stuck thinking in tech terms? If I don’t have the insight I need, what steps can I take to get there—and am I leveraging external expertise where needed to fill the gaps?

2. Prioritize Based on Shared Commitment and Strategic Value

Not all CxOs will be equally engaged or ready to partner closely with the CIO, and this should influence prioritization. CIOs should assess:

  1. CxO Commitment: Is the CxO fully bought into digital transformation and willing to invest time and resources? If they aren’t, start with those who are.
  2. Technology Team Enthusiasm: Does the ask from the CxO spark excitement within the technology team? If the IT team can see the challenge as an inspiring and innovative project, prioritize it.
  3. Potential for Broader Impact: Will this initiative create a success story that can inspire other parts of the business? Choose projects that not only solve immediate problems but also demonstrate value to other BUs.
  4. Business Impact: Does this move the needle enough? Focus on projects that are impactful enough to gain visibility and drive momentum across the organization.

Ask Yourself: Am I working with the most committed and strategic partners, or am I spreading myself thin trying to please everyone? How can I ensure my efforts focus on high-impact initiatives that inspire others? If I’m not sure which projects have this potential, who can I turn to for a fresh perspective?

3. Develop a Communication Strategy to Be the Executive Team’s Trusted Advisor

The CIO needs to craft a communication strategy to regularly update the C-suite on what’s happening in technology, why it matters, and—most importantly—how it applies to their specific business challenges. This is not about sending generic updates or forwarding research articles.

  • Provide insights on emerging trends like AI, automation, or cybersecurity, and explain how they can solve real problems or create real opportunities for their business.
  • Create a visionary narrative that places your company at the forefront of industry evolution, emphasizing how specific technologies will help each CxO achieve their goals.

Ask Yourself: Do I have a proactive communication strategy that positions me as the go-to advisor for technology insights within the C-suite? Am I demonstrating how technology directly impacts their business outcomes? If I’m struggling to create this narrative, who can help me fine-tune it?

4. Champion Digital Experience (DX) and Build KPIs Around Adoption and Value

While the CIO doesn’t need to own the day-to-day design conversations, they must champion the importance of digital experience (DX) and ensure that it’s a KPI across the company. Build a culture where every digital initiative is measured not just by completion, but by how well it’s adopted and how it sustains value over time.

  • Ensure KPIs include sustained usage, not just launch metrics.
  • Build Management by Objectives (MBOs) that tie DX and adoption rates into performance metrics for teams using the tools, ensuring continuous focus on the user experience.

Ask Yourself: Am I setting the right metrics to measure the long-term success of digital initiatives, or am I just tracking short-term implementation? How can I establish sustained adoption as a core business KPI? And if I don’t have a strong framework in place, who can help me build it?

5. Cultivate Multidisciplinary Fusion Teams with Curious, Collaborative Members

Create multidisciplinary fusion teams where business and IT collaborate on solving real business problems. Initially, look for those who are naturally curious and collaborative—people who are eager to break down silos and innovate. As you scale, formalize selection processes but ensure that it doesn’t become a bureaucratic process. Encourage progress-driven contributions, where results are measured and where teams feel empowered to iterate, rather than meet to discuss roadblocks endlessly.

Ask Yourself: Am I identifying the right people to drive multidisciplinary collaboration, or am I waiting for teams to form on their own? Are my teams making progress, or are they stuck in meetings that don’t lead to results? Who can I consult to get these teams moving in the right direction?

6. Be the Early Advocate for Emerging Technologies

Emerging technologies like AI, automation, and low-code/no-code platforms are already enterprise-ready but often fail due to a lack of understanding of how to drive real business value. CIOs must be early advocates for these technologies, preparing the organization to adopt them when they’re at the right point on the maturity curve. This prevents shadow IT from adopting technologies outside the CIO’s purview and ensures that IT is seen as an enabler, not an obstacle.

Ask Yourself: Am I advocating for emerging tech early enough, or am I waiting too long to act? How can I ensure the organization is ready when the technology hits the right maturity curve? If I’m unsure where to start, who can help me assess our readiness?

7. Foster a Culture of Cross-Functional Digital Leadership

Create an organic ecosystem where IT leaders move into business roles and business leaders spend time in IT. This exchange creates a more integrated understanding of how technology drives value across the business. Work with HR to launch a pilot exchange program with a willing BU, and ensure that this doesn’t become another bureaucratic initiative. Instead, keep it agile, fast, and focused on creating leaders who are equally strong in tech and business.

Ask Yourself: Am I fostering an agile and collaborative environment where digital leadership can flourish across functions? Or are we too siloed in our thinking? If I need guidance on how to get this started, who should I bring in to help make it happen?

8. Align Technology Outcomes with Clear Business Goals

Every tech project must have clear business goals and measurable metrics that matter to the business. Don’t aim for perfection—aim for progress. Track and report metrics regularly to keep the project’s business value visible to stakeholders.

Ask Yourself: Are all my technology projects aligned with clear business goals, and do I have the right metrics in place to measure their impact? If I don’t have a process for this, what support do I need to create one that works?

9. Track Adoption and Engagement Metrics Beyond the Initial Rollout

Adoption isn’t just about getting users on board for launch—it’s about measuring ongoing engagement. CIOs should track:

  • Satisfaction rates: How do users feel about the tool or platform over time?
  • Improvement metrics: Are there measurable improvements in efficiency, productivity, or revenue tied to the tech?
  • Feature requests: How often do users ask for new features or enhancements?
  • Number of users/BUs using the platform: Track growth or stagnation in usage across teams.
  • New projects spawned from existing tech: What new initiatives are being created because of successful platform use?

Ask Yourself: Am I tracking the right metrics to measure long-term success and adoption, or am I too focused on the initial rollout? If I’m unsure of how to keep engagement high, who can I turn to for expert advice on optimizing these KPIs?

Transformation doesn’t happen by chance, and it won’t happen if CIOs stay in the background, waiting for others to drive change. It requires intentional, strategic action, a commitment to aligning technology with business outcomes, and a willingness to ask the tough questions. The steps I’ve outlined are designed to challenge your thinking, help you prioritize where to focus your efforts, and ensure you’re seen as a leader, not just a technologist.

If you’re unsure how to move forward or need guidance in turning these insights into action, remember that you don’t have to go it alone. My team and I have worked with CIOs across industries to turn complex challenges into strategic advantages, and we’re here to help. Becoming an agent of transformation starts with taking that first step—and we’re ready to walk with you through the journey.

The Modern CIO Read More »

tesla-makes-$2.2-billion-in-profit-during-q3-2024

Tesla makes $2.2 billion in profit during Q3 2024

All of that helped total revenue rise by 8 percent year over year to $25.2 billion. Gross profit jumped by 20 percent to $5 billion, and once generally accepted accounting principles are applied, its net profit grew 17 percent compared to Q3 2023, at $2.2 billion. What’s more, the company is sitting on a healthy treasure chest. Free cash flow increased 223 percent compared to Q3 2023 to reach $2.7 billion, and cash, cash equivalents, and investments grew 29 percent to $33.6 billion over the same time period.

What comes next?

The days of Tesla promising exponential growth in its car sales appear to be at an end, or at least on hiatus until it can deliver a new vehicle platform. The company says that it believes that advances in autonomy will contribute to renewed growth in the future, but these dreams may come crashing down if federal regulators order a costly hardware recall for Tesla’s vision-only system.

An increasingly stale product lineup is slated to grow in the first half of next year, it says. These vehicles will be based on modified versions of Tesla’s existing vehicles built on existing assembly lines, albeit with some features from its “next-generation platform.” Tesla says it has plenty of spare capacity at its factories in California, Texas, Germany, and China, with room to grow “before investing in new production lines.” Meanwhile, the two-seat CyberCab—which Tesla CEO Elon Musk says is due “before 2027”—will use what Tesla calls a “revolutionary ‘unboxed’ manufacturing strategy.”

Tesla makes $2.2 billion in profit during Q3 2024 Read More »

anthropic-publicly-releases-ai-tool-that-can-take-over-the-user’s-mouse-cursor

Anthropic publicly releases AI tool that can take over the user’s mouse cursor

An arms race and a wrecking ball

Competing companies like OpenAI have been working on equivalent tools but have not made them publicly available yet. It’s something of an arms race, as these tools are projected to generate a lot of revenue in a few years if they progress as expected.

There’s a belief that these tools could eventually automate many menial tasks in office jobs. It could also be a useful tool for developers in that it could “automate repetitive tasks” and streamline laborious QA and optimization work.

That has long been part of Anthropic’s message to investors: Its AI tools could handle large portions of some office jobs more efficiently and affordably than humans can. The public testing of the Computer Use feature is a step toward achieving that goal.

We’re, of course, familiar with the ongoing argument about these types of tools between the “it’s just a tool that will make people’s jobs easier” camp and the “it will put people out of work across industries like a wrecking ball” camp—both of these things could happen to some degree. It’s just a question of what the ratio will be—and that may vary by situation or industry.

There are numerous valid concerns about the widespread deployment of this technology, though. To its credit, Anthropic has tried to anticipate some of these by putting safeguards in from the get-go. The company gave some examples in its blog post:

Our teams have developed classifiers and other methods to flag and mitigate these kinds of abuses. Given the upcoming US elections, we’re on high alert for attempted misuses that could be perceived as undermining public trust in electoral processes. While computer use is not sufficiently advanced or capable of operating at a scale that would present heightened risks relative to existing capabilities, we’ve put in place measures to monitor when Claude is asked to engage in election-related activity, as well as systems for nudging Claude away from activities like generating and posting content on social media, registering web domains, or interacting with government websites.

These safeguards may not be perfect, as there may be creative ways to circumvent them or other unintended consequences or misuses yet to be discovered.

Right now, Anthropic is putting Computer Use out there for testing to see what problems arise and to work with developers to improve its capabilities and find positive uses.

Anthropic publicly releases AI tool that can take over the user’s mouse cursor Read More »

tesla,-warner-bros.-sued-for-using-ai-ripoff-of-iconic-blade-runner-imagery

Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery


A copy of a copy of a copy

“That movie sucks,” Elon Musk said in response to the lawsuit.


Elon Musk may have personally used AI to rip off a Blade Runner 2049 image for a Tesla cybercab event after producers rejected any association between their iconic sci-fi movie and Musk or any of his companies.

In a lawsuit filed Tuesday, lawyers for Alcon Entertainment—exclusive rightsholder of the 2017 Blade Runner 2049 movie—accused Warner Bros. Discovery (WBD) of conspiring with Musk and Tesla to steal the image and infringe Alcon’s copyright to benefit financially off the brand association.

According to the complaint, WBD did not approach Alcon for permission until six hours before the Tesla event when Alcon “refused all permissions and adamantly objected” to linking their movie with Musk’s cybercab.

At that point, WBD “disingenuously” downplayed the license being sought, the lawsuit said, claiming they were seeking “clip licensing” that the studio should have known would not provide rights to livestream the Tesla event globally on X (formerly Twitter).

Musk’s behavior cited

Alcon said it would never allow Tesla to exploit its Blade Runner film, so “although the information given was sparse, Alcon learned enough information for Alcon’s co-CEOs to consider the proposal and firmly reject it, which they did.” Specifically, Alcon denied any affiliation—express or implied—between Tesla’s cybercab and Blade Runner 2049.

“Musk has become an increasingly vocal, overtly political, highly polarizing figure globally, and especially in Hollywood,” Alcon’s complaint said. If Hollywood perceived an affiliation with Musk and Tesla, the complaint said, the company risked alienating not just other car brands currently weighing partnerships on the Blade Runner 2099 TV series Alcon has in the works, but also potentially losing access to top Hollywood talent for their films.

The “Hollywood talent pool market generally is less likely to deal with Alcon, or parts of the market may be, if they believe or are confused as to whether, Alcon has an affiliation with Tesla or Musk,” the complaint said.

Musk, the lawsuit said, is “problematic,” and “any prudent brand considering any Tesla partnership has to take Musk’s massively amplified, highly politicized, capricious and arbitrary behavior, which sometimes veers into hate speech, into account.”

In bad faith

Because Alcon had no chance to avoid the affiliation while millions viewed the cybercab livestream on X, Alcon saw Tesla using the images over Alcon’s objections as “clearly” a “bad faith and malicious gambit… to link Tesla’s cybercab to strong Hollywood brands at a time when Tesla and Musk are on the outs with Hollywood,” the complaint said.

Alcon believes that WBD’s agreement was likely worth six or seven figures and likely stipulated that Tesla “affiliate the cybercab with one or more motion pictures from” WBD’s catalog.

While any of the Mad Max movies may have fit the bill, Musk wanted to use Blade Runner 2049, the lawsuit alleged, because that movie features an “artificially intelligent autonomously capable” flying car (known as a spinner) and is “extremely relevant” to “precisely the areas of artificial intelligence, self-driving capability, and autonomous automotive capability that Tesla and Musk are trying to market” with the cybercab.

The Blade Runner 2049 spinner is “one of the most famous vehicles in motion picture history,” the complaint alleged, recently exhibited alongside other iconic sci-fi cars like the Back to the Future time-traveling DeLorean or the light cycle from Tron: Legacy.

As Alcon sees it, Musk seized the misappropriation of the Blade Runner image to help him sell Teslas, and WBD allegedly directed Musk to use AI to skirt Alcon’s copyright to avoid a costly potential breach of contract on the day of the event.

For Alcon, brand partnerships are a lucrative business, with carmakers paying as much as $10 million to associate their vehicles with Blade Runner 2049. By seemingly using AI to generate a stylized copy of the image at the heart of the movie—which references the scene where their movie’s hero, K, meets the original 1982 Blade Runner hero, Rick Deckard—Tesla avoided paying Alcon’s typical fee, their complaint said.

Musk maybe faked the image himself, lawsuit says

During the live event, Musk introduced the cybercab on a WBD Hollywood studio lot. For about 11 seconds, the Tesla founder “awkwardly” displayed a fake, allegedly AI-generated Blade Runner 2049 film still. He used the image to make a point that apocalyptic films show a future that’s “dark and dismal,” whereas Tesla’s vision of the future is much brighter.

In Musk’s slideshow image, believed to be AI-generated, a male figure is “seen from behind, with close-cropped hair, wearing a trench coat or duster, standing in almost full silhouette as he surveys the abandoned ruins of a city, all bathed in misty orange light,” the lawsuit said. The similarity to the key image used in Blade Runner 2049 marketing is not “coincidental,” the complaint said.

If there were any doubts that this image was supposed to reference the Blade Runner movie, the lawsuit said, Musk “erased them” by directly referencing the movie in his comments.

“You know, I love Blade Runner, but I don’t know if we want that future,” Musk said at the event. “I believe we want that duster he’s wearing, but not the, uh, not the bleak apocalypse.”

The producers think the image was likely generated—”even possibly by Musk himself”—by “asking an AI image generation engine to make ‘an image from the K surveying ruined Las Vegas sequence of Blade Runner 2049,’ or some closely equivalent input direction,” the lawsuit said.

Alcon is not sure exactly what went down after the company rejected rights to use the film’s imagery at the event and is hoping to learn more through the litigation’s discovery phase.

Musk may try to argue that his comments at the Tesla event were “only meant to talk broadly about the general idea of science fiction films and undesirable apocalyptic futures and juxtaposing them with Musk’s ostensibly happier robot car future vision.”

But producers argued that defense is “not credible” since Tesla explicitly asked to use the Blade Runner 2049 image, and there are “better” films in WBD’s library to promote Musk’s message, like the Mad Max movies.

“But those movies don’t have massive consumer goodwill specifically around really cool-looking (Academy Award-winning) artificially intelligent, autonomous cars,” the complaint said, accusing Musk of stealing the image when it wasn’t given to him.

If Tesla and WBD are found to have violated copyright and false representation laws, that potentially puts both companies on the hook for damages that cover not just copyright fines but also Alcon’s lost profits and reputation damage after the alleged “massive economic theft.”

Musk responds to Blade Runner suit

Alcon suspects that Musk believed that Blade Runner 2049 was eligible to be used at the event under the WBD agreement, not knowing that WBD never had “any non-domestic rights or permissions for the Picture.”

Once Musk requested to use the Blade Runner imagery, Alcon alleged that WBD scrambled to secure rights by obscuring the very lucrative “larger brand affiliation proposal” by positioning their ask as a request for much less expensive “clip licensing.”

After Alcon rejected the proposal outright, WBD told Tesla that the affiliation in the event could not occur because X planned to livestream the event globally. But even though Tesla and X allegedly knew that the affiliation was rejected, Musk appears to have charged ahead with the event as planned.

“It all exuded an odor of thinly contrived excuse to link Tesla’s cybercab to strong Hollywood brands,” Alcon’s complaint said. “Which of course is exactly what it was.”

Alcon is hoping a jury will find Tesla, Musk, and WBD violated laws. Producers have asked for an injunction stopping Tesla from using any Blade Runner imagery in its promotional or advertising campaigns. They also want a disclaimer slapped on the livestreamed event video on X, noting that the Blade Runner association is “false or misleading.”

For Musk, a ban on linking Blade Runner to his car company may feel bleak. Last year, he touted the Cybertruck as an “armored personnel carrier from the future—what Bladerunner would have driven.”  This amused many Blade Runner fans, as Gizmodo noted, because there never was a character named “Bladerunner,” but rather that was just a job title for the film’s hero Deckard.

In response to the lawsuit, Musk took to X to post what Blade Runner fans—who rated the 2017 movie as 88 percent fresh on Rotten Tomatoes—might consider a polarizing take, replying, “That movie sucks” on a post calling out Alcon’s lawsuit as “absurd.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery Read More »

t-mobile,-at&t-oppose-unlocking-rule,-claim-locked-phones-are-good-for-users

T-Mobile, AT&T oppose unlocking rule, claim locked phones are good for users


Carriers fight plan to require unlocking of phones 60 days after activation.

A smartphone wrapped in a metal chain and padlock

T-Mobile and AT&T say US regulators should drop a plan to require unlocking of phones within 60 days of activation, claiming that locking phones to a carrier’s network makes it possible to provide cheaper handsets to consumers. “If the Commission mandates a uniform unlocking policy, it is consumers—not providers—who stand to lose the most,” T-Mobile alleged in an October 17 filing with the Federal Communications Commission.

The proposed rule has support from consumer advocacy groups who say it will give users more choice and lower their costs. T-Mobile has been criticized for locking phones for up to a year, which makes it impossible to use a phone on a rival’s network. T-Mobile claims that with a 60-day unlocking rule, “consumers risk losing access to the benefits of free or heavily subsidized handsets because the proposal would force providers to reduce the line-up of their most compelling handset offers.”

If the proposed rule is enacted, “T-Mobile estimates that its prepaid customers, for example, would see subsidies reduced by 40 percent to 70 percent for both its lower and higher-end devices, such as the Moto G, Samsung A15, and iPhone 12,” the carrier said. “A handset unlocking mandate would also leave providers little choice but to limit their handset offers to lower cost and often lesser performing handsets.”

T-Mobile and other carriers are responding to a call for public comments that began after the FCC approved a Notice of Proposed Rulemaking (NPRM) in a 5–0 vote. The FCC is proposing “to require all mobile wireless service providers to unlock handsets 60 days after a consumer’s handset is activated with the provider, unless within the 60-day period the service provider determines the handset was purchased through fraud.”

When the FCC proposed the 60-day unlocking rule in July 2024, the agency criticized T-Mobile for locking prepaid phones for a year. The NPRM pointed out that “T-Mobile recently increased its locking period for one of its brands, Metro by T-Mobile, from 180 days to 365 days.”

T-Mobile’s policy says the carrier will only unlock mobile devices on prepaid plans if “at least 365 days… have passed since the device was activated on the T-Mobile network.”

“You bought your phone, you should be able to take it to any provider you want,” FCC Chairwoman Jessica Rosenworcel said when the FCC proposed the rule. “Some providers already operate this way. Others do not. In fact, some have recently increased the time their customers must wait until they can unlock their device by as much as 100 percent.”

T-Mobile locking policy more onerous

T-Mobile executives, who also argue that the FCC lacks authority to impose the proposed rule, met with FCC officials last week to express their concerns.

“T-Mobile is passionate about winning customers for life, and explained how its handset unlocking policies greatly benefit our customers,” the carrier said in its post-meeting filing. “Our policies allow us to deliver access to high-speed mobile broadband on a nationwide 5G network via handsets that are free or heavily discounted off the manufacturer’s suggested retail price. T-Mobile’s unlocking policies are transparent, and there is absolutely no evidence of consumer harm stemming from these policies. T-Mobile’s current unlocking policies also help T-Mobile combat handset theft and fraud by sophisticated, international criminal organizations.”

For postpaid users, T-Mobile says it allows unlocking of fully paid-off phones that have been active for at least 40 days. But given the 365-day lock on prepaid users, T-Mobile’s overall policy is more onerous than those of other carriers. T-Mobile has also faced angry customers because of a recent decision to raise prices on plans that were advertised as having a lifetime price lock.

AT&T enables unlocking of paid-off phones after 60 days for postpaid users and after six months for prepaid users. AT&T lodged complaints similar to T-Mobile’s, saying in an October 7 filing that the FCC’s proposed rules would “mak[e] handsets less affordable for consumers, especially those in low-income households,” and “exacerbate handset arbitrage, fraud, and trafficking.”

AT&T told the FCC that “requiring providers to unlock handsets before they are paid-off would ultimately harm consumers by creating upward pressure on handset prices and disincentives to finance handsets on flexible terms.” If the FCC implements any rules, it should maintain “existing contractual arrangements between customers and providers, ensure that providers have at least 180 days to detect fraud before unlocking a device, and include at least a 24-month period for providers to implement any new rules,” AT&T said.

Verizon, which already faces unlocking rules because of requirements imposed on spectrum licenses it owns, automatically unlocks phones after 60 days for prepaid and postpaid users. Among the three major carriers, Verizon is the most amenable to the FCC’s new rules.

Consumer groups: Make Verizon rules industry-wide

An October 18 filing supporting a strict unlocking rule was submitted by numerous consumer advocacy groups including Public Knowledge, New America’s Open Technology Institute, Consumer Reports, the National Consumers League, the National Consumer Law Center, and the National Digital Inclusion Alliance.

“Wireless users are subject to unnecessary restrictions in the form of locked devices, which tie them to their service providers even when better options may be available. Handset locking practices limit consumer freedom and lessen competition by creating an artificial technological barrier to switching providers,” the groups said.

The groups cited the Verizon rules as a model and urged the FCC to require “that device unlocking is truly automatic—that is, unlocked after the requisite time period without any additional actions of the consumer.” Carriers should not be allowed to lock phones for longer than 60 days even when a phone is on a financing plan with outstanding payments, the groups’ letter said:

Providers should be required to transition out of selling devices without this [automatic unlocking] capability and the industry-wide rule should be the same as the one protecting Verizon customers today: after the expiration of the initial period, the handset must automatically unlock regardless of whether: (1) the customer asks for the handset to be unlocked or (2) the handset is fully paid off. Removing this barrier to switching will make the standard simple for consumers and encourage providers to compete more vigorously on mobile service price, quality, and innovation.

In an October 2 filing, Verizon said it supports “a uniform approach to handset unlocking that allows all wireless providers to lock wireless handsets for a reasonable period of time to limit fraud and to enable device subsidies, followed by automatic unlocking absent evidence of fraud.”

Verizon said 60 days should be the minimum for postpaid devices so that carriers have time to detect fraud and theft, and that “a longer, 180-day locking period for prepaid is necessary to enable wireless providers to continue offering subsidies that make phones affordable for prepaid customers.” Regardless of what time frame the FCC chooses, Verizon said “a uniform unlocking policy that applies to all providers… will benefit both consumers and competition.”

FCC considers impact on phone subsidies

While the FCC is likely to impose an unlocking rule, one question is whether it will apply when a carrier has provided a discounted phone. The FCC’s NPRM asked the public for “comment on the impact of a 60-day unlocking requirement in connection with service providers’ incentives to offer discounted handsets for postpaid and prepaid service plans.”

The FCC acknowledged Verizon’s argument “that providers may rely on handset locking to sustain their ability to offer handset subsidies and that such subsidies may be particularly important in prepaid environments.” But the FCC noted that public interest groups “argue that locked handsets tied to prepaid plans can disadvantage low-income customers most of all since they may not have the resources to switch service providers or purchase new handsets.”

The public interest groups also note that unlocked handsets “facilitate a robust secondary market for used devices, providing consumers with more affordable options,” the NPRM said.

The FCC says it can impose phone-unlocking rules using its legal authority under Title III of the Communications Act “to protect the public interest through spectrum licensing and regulations to require mobile wireless service providers to provide handset unlocking.” The FCC said it previously relied on the same Title III authority when it imposed the unlocking rules on 700 MHz C Block spectrum licenses purchased by Verizon.

T-Mobile told the FCC in a filing last month that “none of the litany of Title III provisions cited in the NPRM support the expansive authority asserted here to regulate consumer handsets (rather than telecommunications services).” T-Mobile also said that “the Commission’s legal vulnerabilities on this score are only magnified in light of recent Supreme Court precedent.”

The Supreme Court recently overturned the 40-year-old Chevron precedent that gave agencies like the FCC judicial deference when interpreting ambiguous laws. The end of Chevron makes it harder for agencies to issue regulations without explicit authorization from Congress. This is a potential problem for the FCC in its fight to revive net neutrality rules, which are currently blocked by a court order pending the outcome of litigation.

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

T-Mobile, AT&T oppose unlocking rule, claim locked phones are good for users Read More »

it’s-the-enterprise-vs.-the-gorn-in-strange-new-worlds-clip

It’s the Enterprise vs. the Gorn in Strange New Worlds clip

The S2 finale found the Enterprise under vicious attack by the Gorn, who were in the midst of invading one of the Federation’s colony worlds. The new footage shown at NYCC picked up where the finale left off, giving us the kind of harrowing high-stakes pitched space battle against a ferocious enemy that has long been a hallmark of the franchise. With the ship’s shields down to 50 percent, Captain Pike (Anson Mount) and his team brainstorm possible counter-strategies to ward off the Gorn and find a way to rendezvous with the rest of Starfleet. They decide to try to jam the Gorn’s communications so they can’t coordinate their attacks, which involves modulating the electromagnetic spectrum since the Gorn use light for ship-to-ship communications.

They also need to figure out how to beam crew members trapped on a Gorn ship back onto the Enterprise—except the Gorn ships are transporter-resistant. The best of all the bad options is a retreat and rescue, tracking the Gorn ship across light-years of space using “wolkite, a rare element that contains subspace gauge bosons,” per Spock (Ethan Peck). Finally, the crew decides to just ram the Gorn Destroyer, and the footage ends with a head-to-head collision, firing torpedoes, and the Enterprise on the brink of warping itself out of there, no doubt in the nick of time.

Oh, and apparently Rhys Darby (Our Flag Means Death) will guest star in an as-yet-undisclosed role, which should be fun. Strange New Worlds S3 will premiere sometime in 2025, and the series has already been renewed for a fourth season.

Lower Decks

The final season of Star Trek: Lower Decks premieres this week.

Ars staffers are big fans of Lower Decks, so we were saddened to learn that the animated series would be ending with its fifth season. Paramount gave us a teaser in July during San Diego Comic-Con, revealing that the plucky crew’s S5 mission involves a “quantum fissure” that is causing “space potholes” to pop up all over the Alpha Quadrant (“boo interdimensional portals!”). The Cerritos crew must close them while navigating angry Klingons and an Orion war.

The new clip opens with Mariner walking in and asking “What’s the mish?” only to discover it’s another quantum fissure. When the fissure loses integrity, the Cerritos gets caught in the gravitational wake, and when it emerges, seemingly unscathed, the ship is hailed—by the Cerritos from an alternate dimension, captained by none other than Mariner, going by Captain Becky Freeman. (“Stupid dimensional rifts!”) It’s safe to assume that wacky hijinks ensue.

The final season of Lower Decks premieres on Paramount+ on October 24, 2024, and will run through December 19.

poster art for Section 31 featuring Michelle Yeoh in striking purple outfit against yellow background

Credit: Paramount+

It’s the Enterprise vs. the Gorn in Strange New Worlds clip Read More »