Author name: Mike M.


Rocket Report: ULA investigating SRB anomaly; Europa Clipper is ready to fly


US Space Force payloads will ride on the first flight of Impulse Space’s cryogenic space tug.

Impulse Space is assembling its first methane-fueled Deneb engine, a 15,000-pound-thrust power plant that will propel the company’s Helios space tug. Credit: Impulse Space

Welcome to Edition 7.15 of the Rocket Report! It’s a big week for big rockets, with SpaceX potentially launching its next Starship test flight and a Falcon Heavy rocket with NASA’s Europa Clipper mission this weekend. And a week ago, United Launch Alliance flew its second Vulcan rocket, which lost one of its booster nozzles in midair and amazingly kept going to achieve a successful mission. Are you not entertained?

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

PLD Space is aiming high. Spanish launch provider PLD Space has revealed a family of new rockets that it plans to introduce beyond its Miura 5 rocket, which is expected to make its inaugural flight in 2025, European Spaceflight reports. The company also revealed that it was working on a crew capsule called Lince (Spanish for Lynx). PLD Space introduced its Miura Next, Miura Next Heavy, and Miura Next Super Heavy launch vehicles, designed in single stick, triple core, and quintuple core configurations with reusable boosters. At the high end of the rocket family’s performance, the Miura Next Super Heavy could deliver up to 53 metric tons (nearly 117,000 pounds) of payload to low-Earth orbit. The Lince capsule could become Europe’s first human-rated crew transportation spacecraft.

Still a year away from reaching space … These are lofty ambitions for a company that has yet to launch anything to space, but it’s good to think big. PLD Space launched a high-altitude test flight of its Miura 1 rocket last year, but it didn’t cross the boundary of space. The first launch campaign for Miura 5, PLD Space’s orbital-class rocket sized for small satellites, is on course to begin by the end of 2025, the company said. The Miura Next family would begin flying by 2030, followed by the heavier rockets a few years later. In April, PLD Space said it had raised 120 million euros ($131 million) from private investors and the Spanish government. This is probably enough to get Miura 5 to the launch pad, but PLD Space will need a lot of technical and financing successes to bring its follow-on vehicles online. (submitted by Ken the Bin and EllPeaTea)

Impulse Space wins Space Force contract. Fresh on the heels of a massive funding round, Impulse Space has landed a $34.5 million contract from the Space Force for two ultra-mobile spacecraft missions, TechCrunch reports. Under the Space Force’s Tactically Responsive Space (TacRS) program, the two missions will demonstrate how highly maneuverable spacecraft can help the military rapidly respond to threats in space. Both missions will use Impulse’s Mira orbital transfer vehicle, which can host experiments and payloads while moving into different orbits around the Earth.

Looking for an advantage … Mira completed its first test flight earlier this year. The payloads on the two Space Force demonstration flights will perform space domain awareness missions, the military service said in a statement. The first mission, called Victus Surgo, will combine the Mira transfer vehicle with Impulse’s higher-power Helios cryogenic methane-fueled kick stage on its first use in orbit. Helios will boost Mira into a high-altitude geostationary transfer orbit after launching on a Falcon 9 rocket. The second mission, called Victus Salo, will send a second Mira spacecraft into low-Earth orbit on a SpaceX rideshare mission. Impulse was founded by rocket scientist Tom Mueller, who was a founding employee at SpaceX before leaving in 2020. (submitted by Ken the Bin)


The launch campaign begins for Vega C’s return to flight. Days after a crucial test-firing of its redesigned second stage motor, a European Vega C rocket is now being stacked on its launch pad in French Guiana for a return to flight mission scheduled for December 3. Photos released by the French space agency, CNES, show the Vega C’s solid-fueled first stage moving into position on the launch pad. The Vega C launcher is an upgraded version of the Vega rocket that completed its career with a successful launch in September.

A lot of t(h)rust … Vega C made a successful debut in July 2022, then failed on its second flight five months later, destroying a pair of high-value commercial Earth-imaging satellites owned by Airbus. Engineers traced the failure to the second stage motor’s nozzle, prompting a redesign that grounded the Vega C rocket for two years. There is a queue of European space missions waiting for launch on Vega C, and first to go will be the Sentinel 1C radar imaging satellite for the European Commission’s flagship Copernicus program.

Australian launch company rehearses countdown. Gilmour Space still thinks it has a chance to conduct the maiden launch of its Eris rocket sometime this year, despite lacking a launch license from the Australian Space Agency (ASA), having to go hunting for more money, and a wet dress rehearsal that turned up issues that will take several weeks to fix, Space & Defense Tech and Security News reports. Gilmour’s Eris rocket, capable of hauling cargoes up to 672 pounds (305 kilograms) to orbit, would become the first homegrown Australian orbital-class rocket.

WDR … The Australian Space Agency has worked on Gilmour’s launch license for around two years, but has yet to give the company the green light to fly the Eris rocket, despite approving licenses for two companies operating privately owned launch ranges elsewhere in Australia. At the beginning of the year, Gilmour targeted a first launch of the Eris rocket in March, but there were delays in getting the vehicle to the launch pad. The rocket went vertical for the first time in April to begin a series of ground tests, culminating in the launch rehearsal at the end of September, in which the company loaded propellants into the rocket and ran the countdown to T-10 seconds. The test uncovered valve and software issues Gilmour must fix before it can fly Eris. (submitted by mryall)

Falcon 9 launches European asteroid mission. The European Space Agency’s Hera mission lifted off Monday aboard a SpaceX Falcon 9 rocket, heading into the Solar System to investigate an asteroid smashed by NASA two years ago, Ars reports. It will take two years for Hera to travel to asteroids Didymos and Dimorphos, a binary pair, and survey the aftermath of the impact by NASA’s DART spacecraft on Dimorphos in September 2022. DART was NASA’s first planetary defense experiment, demonstrating how a kinetic impactor could knock an asteroid off course if it was on a path to hit Earth. Fortunately, these two asteroids are harmless, but DART proved a spacecraft could deflect an asteroid, if necessary. Coming in at high speed, DART got only a fleeting glimpse of Didymos and Dimorphos, so Hera will take more precise measurements of the asteroids’ interior structure, mass, and orbit to determine exactly how effective DART was.

Falcon soars again … The liftoff Monday from Cape Canaveral Space Force Station was the first flight of a Falcon 9 in nine days, since an upper stage anomaly steered the rocket off its intended reentry corridor after an otherwise successful launch. The Federal Aviation Administration grounded the Falcon 9 while SpaceX investigated the problem, but the regulator approved the launch of Hera because the Falcon 9’s upper stage won’t come back to Earth. Instead, it departed into deep space along with the Hera asteroid probe. As of Thursday, all other commercial Falcon 9 missions remain grounded. (submitted by Ken the Bin)

Emiratis go with Japan. A UAE mission to travel to the asteroid belt reached a milestone on Wednesday, when an agreement was signed to provide services for the 2028 launch of the Mohammed Bin Rashid Explorer spacecraft, The National reports. Emirati officials selected the Japanese H3 rocket from Mitsubishi Heavy Industries (MHI) to launch the asteroid explorer. The UAE is a repeat customer for MHI, which also launched the Emirati Hope spacecraft toward Mars in 2020. The mission will see the Mohammed Bin Rashid Explorer perform close flybys of six asteroids to gather data before landing on a seventh asteroid, Justitia.

H3 racking up wins … Japan’s new H3 rocket is taking a slice of the international commercial launch market after achieving back-to-back successful flights this year. The H3, which replaces Japan’s workhorse H-IIA rocket, is primarily intended to ensure Japanese autonomous access to space for national security missions, scientific probes, and resupply flights to the International Space Station. But, somewhat surprisingly, the H3 now has several customers outside of Japan, including the UAE, Eutelsat, and Inmarsat. Perhaps some satellite operators, eager for someone to compete with SpaceX in the launch business, are turning to the H3 as an alternative to United Launch Alliance’s Vulcan, Europe’s Ariane 6 rocket, or Blue Origin’s New Glenn. All of these rockets are under pressure to launch numerous payloads for their domestic governments and Amazon’s Kuiper megaconstellation.

China launches mystery satellite. China launched a new communications satellite toward geostationary orbit Thursday, although its precise role remains undisclosed, Space News reports. The satellite lifted off aboard a Long March 3B rocket, and China’s leading state-owned aerospace contractor identified the payload as High orbit Internet satellite-03 (Weixing Hulianwang Gaogui-03). This is the third satellite in the series, following launches in February and August. The lack of publicly available information invites speculation about its potential uses, which could include military applications.

Shortfall … This was China’s 47th space launch of the year, well short of the pace needed to reach the 100 missions Chinese officials originally projected for 2024. The current rate puts China roughly on par with its launch totals from the last three years. Around 30 of those 100 projected launches were supposed to fly on rockets from Chinese commercial startups. China’s commercial launch industry encountered a setback in June, when a rocket broke free of its restraints during a first stage static fire test, sending the fully fueled booster on an uncontrolled flight near populated areas before a fiery crash to the ground.

Vulcan’s second flight was successful but not perfect. United Launch Alliance’s Vulcan rocket, under contract for dozens of flights for the US military and Amazon’s Kuiper broadband network, lifted off from Florida on its second test flight October 4, suffered an anomaly with one of its strap-on boosters, and still achieved a successful mission, Ars reports. This test flight, known as Cert-2, was the second certification mission for the new Vulcan rocket, a milestone that paves the way for the Space Force to clear ULA’s new rocket to begin launching national security satellites in the coming months.

Anomalous plume … What happened 37 seconds after launch was startling and impressive. The exhaust nozzle from one of Vulcan’s two strap-on solid rocket boosters failed and fell off the vehicle, creating a shower of sparks and debris. The launcher visibly tilted off its axis due to the asymmetric thrust from the twin boosters, but Vulcan’s guidance system corrected the trajectory, vectoring the exhaust of the BE-4 main engines to keep the rocket on course. The engines burned somewhat longer than planned to make up for the shortfall in power from the damaged booster, and the rocket still reached its target orbit. However, ULA and Northrop Grumman, the booster manufacturer, must determine what happened with the nozzle before Vulcan can fly again. (submitted by Ken the Bin)

Starship could launch this weekend. We may not have to wait as long as we thought for the next test flight of SpaceX’s Starship rocket. The world’s most powerful launcher could fly again from South Texas as soon as Sunday, assuming the Federal Aviation Administration grants approval, Ars reports. The last public statement released from the FAA suggested the agency didn’t expect to determine whether to approve a commercial launch license for SpaceX’s next Starship test flight before late November. There’s some optimism at SpaceX that the FAA might issue a launch license much sooner, perhaps in time for Starship to fly this weekend.

Going for the catch … “The fifth flight test of Starship will aim to take another step towards full and rapid reusability,” SpaceX wrote in an update posted on its website. “The primary objectives will be attempting the first ever return to launch site and catch of the Super Heavy booster and another Starship reentry and landing burn, aiming for an on-target splashdown of Starship in the Indian Ocean.” For the Starship upper stage, this means it will follow pretty much the same trajectory as the last test flight in June. But the most exciting thing about the next flight is the attempt to catch the Super Heavy booster, which will come back to the launch site in Texas at supersonic speed before braking to a hover over the launch pad. Then, mechanical arms, or “chopsticks,” will try to grapple the rocket in midair.

Europa Clipper is ready to fly. As soon as this weekend, a SpaceX Falcon Heavy rocket will lift off from Kennedy Space Center, carrying NASA’s $4.25 billion Europa Clipper spacecraft, Ars reports. This mission is unlikely to definitively answer the question of whether life exists in the liquid water ocean below the icy crust of Jupiter’s moon Europa, but it will tell us whether it could, and it will answer so many more questions. The best part is the unknown wonders it will discover. We cannot begin to guess at those, but we can be certain that if all goes well, Clipper will be a thrilling and breathtaking mission. Europa Clipper will zip by Europa 49 times in the early 2030s, probing the frozen world with a sophisticated suite of instruments to yield the best-ever data about any moon of another planet.

Delayed for weather … The launch of Europa Clipper was supposed to happen Thursday, but NASA and SpaceX suspended launch preparations earlier this week as Hurricane Milton approached Florida. The spacecraft is already attached to the Falcon Heavy rocket inside SpaceX’s hangar. Once teams are cleared to return to the space center for work after the storm, they will ready Falcon Heavy to roll to the launch pad. NASA says the launch is currently targeted for no earlier than Sunday.

Next three launches

Oct. 13: Starship/Super Heavy | Flight 5 | Starbase, Texas | 12:00 UTC

Oct. 13: New Shepard | NS-27 uncrewed flight | Launch Site One, Texas | 13:00 UTC

Oct. 13: Falcon Heavy | Europa Clipper | Kennedy Space Center, Florida | 16:12 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



Ex-Twitter execs push for $200M severance as Elon Musk runs X into ground


Musk’s battle with former Twitter execs intensifies as X value reaches new low.

Former Twitter executives, including former CEO Parag Agrawal, are urging a court to open discovery in a dispute over severance and other benefits they allege they were wrongfully denied after Elon Musk took over Twitter in 2022.

According to the former executives, they’ve been blocked for seven months from accessing key documents proving they’re owed roughly $200 million under severance agreements that they say Musk willfully tried to avoid paying in retaliation for executives forcing him to close the Twitter deal. And now, as X’s value tanks lower than ever—reportedly worth 80 percent less than when Musk bought it—the ex-Twitter leaders fear their severance claims “may be compromised” by Musk’s alleged “mismanagement of X,” their court filing said.

The potential for X’s revenue loss to impact severance claims appears to go beyond just the former Twitter executives’ dispute. According to their complaint, “there are also thousands of non-executive former employees whom Musk terminated and is now refusing to pay severance and other benefits” and who have “sued in droves.”

In some of these other severance suits, executives claimed in their motion to open discovery, X appears to be operating more transparently, allowing discovery to proceed beyond what has been possible in the executives’ suit.

But Musk allegedly has “special ire” for Agrawal and other executives who helped push through the Twitter buyout that he tried to wriggle out of, executives claimed. And seemingly because of his alleged anger, X has “only narrowed the discovery” ever since the court approved a stay pending a ruling on X’s motion to drop one of the executives’ five claims. According to the executives, the court only approved the stay of discovery because it was expecting to rule on the motion to dismiss quickly, but after a hearing on that matter was vacated, the stay has remained, helping X’s alleged goal to prolong the litigation.

To get the litigation back on track for a speedier resolution before Musk runs X into the ground, the executives on Thursday asked the court to approve discovery on all claims except the claim disputed in the motion to dismiss.

“Discovery on those topics is inevitable, and there is no reason to further delay,” the executives argued.

The executives have requested that the court open discovery at a hearing scheduled for November 15 to prevent further delays that they fear could harm their severance claims.

Neither X nor a lawyer for the former Twitter executives, David Anderson, could immediately be reached for comment.

X’s fight to avoid severance payments

In their complaint, the former Twitter executives—including Agrawal as well as former Chief Financial Officer Ned Segal, former Chief Legal Officer Vijaya Gadde, and former general counsel Sean Edgett—alleged that Musk planned to deny their severance to make them pay for extra costs that they approved that clinched the Twitter deal.

They claimed that Musk told his official biographer, Walter Isaacson, that he would “hunt every single one of” them “till the day they die,” vowing “a lifetime of revenge.” Musk supposedly even “bragged” to Isaacson about “specifically how he planned to cheat Twitter’s executives out of their severance benefits in order to save himself $200 million.”

Under their severance agreements, the executives could only be denied benefits if terminated for “cause” under specific conditions, they said, none of which allegedly applied to their abrupt firings the moment the merger closed.

“‘Cause’ under the severance plans is limited to extremely narrow circumstances, such as being convicted of a felony or committing ‘gross negligence’ or ‘willful misconduct,'” their complaint noted.

Musk attempted to “manufacture” “ever-changing theories of cause,” they claimed, partly by claiming that “success” fees paid to the law firm that defeated Musk’s suit attempting to go back on the deal constituted “gross negligence” or “willful misconduct.”

According to Musk’s motion to dismiss, the former executives tried to “saddle Twitter, and by extension the many investors who acquired it, with exorbitant legal expenses by forcing approximately $100 million in gratuitous payments to certain law firms in the final hours before the Twitter acquisition closed.” Musk had a huge problem with this, the motion to dismiss said, because the fees were paid despite his objections.

On top of that, Musk considered it “gross negligence” or “willful misconduct” that the executives allegedly paid out retention bonuses that Musk also opposed. And perhaps even more egregiously, they allowed new employees to jump onto severance plans shortly before the acquisition, which “generally” increased the “severance benefits available to these individuals by more than $50 million dollars,” Musk’s motion to dismiss said.

Musk was particularly frustrated by the addition of one employee whom Twitter had allegedly “already decided to terminate and another who was allowed to add herself to one of the Plans—a naked conflict of interest that increased her potential compensation by approximately $15 million.”

But former Twitter executives said they consulted with the board to approve the law firm fees, defending their business decisions as “in the best interest of the company,” not “Musk’s whims.”

“On the morning” Musk acquired Twitter, “the Company’s full Board met,” the executives’ complaint said. “One of the directors noted that it was the largest stockholder value creation by a legal team that he had ever seen. The full Board deliberated and decided to approve the fees.”

Further, they pointed out, “the lion’s share” of those legal fees “was necessitated only by Musk’s improper refusal to close a transaction to which he was contractually bound.”

“If Musk felt that the attorneys’ fees payments, or any other payments, were improper, his remedy was to seek to terminate the deal—not to withhold executives’ severance payments,” their complaint said.

Reimbursement or reinstatement may be sought

To force Musk’s hand, executives have been asking X to share documents, including documents they either created or received while working out the Twitter buyout. But X has delayed production—sometimes curiously claiming that documents are confidential even when executives authored the documents or they’ve been publicly filed in other severance disputes, executives alleged.

Executives have called Musk’s denial of severance “a pointless effort that would not withstand legal scrutiny,” but so far discovery in their lawsuit has not even technically begun. While X has handed over incomplete submissions from its administrative process denying the severance claims, in some cases, X has “entirely refused” to produce documents, they claimed.

They’re hoping that once fact-finding concludes, the court will agree that severance benefits are due. That potentially includes stock vested at Twitter’s $44 billion price on the day Musk acquired it—a far cry from the $9 billion that X is estimated to be valued at today.

In a filing opposing Musk’s motion to dismiss, the former executives noted that they’re not required to elect their remedies at this stage of the litigation. While their complaint alleged they’re owed vested stock at the acquisition value of $44 billion, their other filing suggested that “reinstatement is also an available remedy.”

Neither option would likely appeal to Musk, who appears determined to fight all severance disputes while scrambling for nearly two years to reverse X’s steady revenue loss.

Since his firing, Agrawal has won at least one of his legal battles with Musk, forcing X to reimburse him for $1.1 million in legal fees. But Musk has largely avoided paying severance as lawsuits pile up, and Agrawal is allegedly owed the most, with his severance package valued at $57 million.

Last fall, X agreed to negotiate with thousands of laid-off employees, but those talks fell through without a settlement reached. In June, Musk defeated one severance suit that alleged that Musk owed former Twitter employees $500 million. But employees involved in that litigation can appeal or join other disputes, the judge noted.

For executives, a growing fear is seemingly that Musk will prolong litigation until X goes under. Last year, Musk bragged that he saved X from bankruptcy by cutting costs, but experts warned that lawsuits piling up from vendors—which Plainsite is tracking here—could upend that strategy if Musk loses too many.

“Under Musk’s control, Twitter has become a scofflaw, stiffing employees, landlords, vendors, and others,” executives’ complaint said. “Musk doesn’t pay his bills, believes the rules don’t apply to him, and uses his wealth and power to run roughshod over anyone who disagrees with him.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



Based on your feedback, the Ars 9.0.1 redesign is live

We love all the feedback that Ars readers have submitted since we rolled out the Ars Technica 9.0 design last week—even the, err, deeply passionate remarks. It’s humbling that, after 26 years, so many people still care so much about making Ars into the best possible version of itself.

Based on your feedback, we’ve just pushed a new update to the site that we hope fixes many readers’ top concerns. (You might need to hard-refresh to see it.)

Much of the feedback (forum posts, email, DMs, the Ars comment form) has told us that the chief goals of the redesign—more layout options, larger text, better readability—were successful. But readers have also offered up interesting edge cases and different use patterns for which design changes would be useful. Though we can’t please everyone, we will continue to make iterative design tweaks so that the site can work well for as many people as possible.

So here’s a quick post about what we’ve done so far and what we’re going to be working on over the next few weeks. As usual, continued feedback is welcome and appreciated!

Changes made and changes planned

The 9.0 design was based on reader feedback; an astonishing 20,000 people took our most recent reader survey, and most of these readers don’t post in the comments or the Ars forums. The consensus was that readability and customization were the most significant site design issues. We’ve addressed those through (among other things) a responsive design that unifies desktop and mobile codebases, increased text size to meet modern standards, and four site layout options (Classic, Grid, List, and the super-dense Neutron Star view).



The Juicebox and Enel X shutdown: What comes next?

Meanwhile, a number of companies and open source projects are working on offering third-party support for Juiceboxes. Unfortunately, only newer Juiceboxes support the Open Charge Point Protocol (OCPP); older devices may need to be physically modified, perhaps with open source hardware.

Enel X

As might be expected, lots of charging solution providers are interested in helping Enel X’s stranded commercial customers become their newest happy, smiling customers. But it’s going to be up to those stranded by Enel X to find a new company and platform to work with.

In some cases that might mean migrating existing hardware over, but as SAE notes, Enel X has done little to make that migration simple. And many businesses may find that what are functioning Level 2 chargers today become beige-colored bricks tomorrow. For example, with Enel X gone, there are no contracts in place for the SIM cards embedded in each charger that provide the connectivity those devices expect.

“When that goes dead, the only way you can really get those chargers going again is you physically send someone out there, or you ask the person on the property to take out the SIM card, replace it,” said Joseph Schottland, CEO of EV+ Charging. “It’s a big ask, because they’ve got to get the screwdriver out, take the back of the charger off… They’ve got to know where to look.”



AI #85: AI Wins the Nobel Prize

Both Geoffrey Hinton and Demis Hassabis were given the Nobel Prize this week, in Physics and Chemistry respectively. Congratulations to both of them along with all the other winners. AI will be central to more and more of scientific progress over time. This felt early, but not as early as you would think.

The two big capability announcements this week were OpenAI’s canvas, their answer to Anthropic’s Artifacts, which lets you work on documents or code outside of the chat window in a way that seems very useful, and Meta’s announcement of a new video generation model with various cool features, which they’re wisely not releasing just yet.

I also have two related corrections from last week, and an apology: Joshua Achiam is OpenAI’s new head of Mission Alignment, not of Alignment as I incorrectly said. The new head of Alignment Research is Mia Glaese. That mistake is mine; I misread and misinterpreted Altman’s announcement. I also misinterpreted Joshua’s statements regarding AI existential risk, failing to take into account the broader context, and did a poor job attempting to reach him for comment. The above link goes to a new post offering an extensive analysis of his public statements that makes clear he takes AI existential risk seriously, although he has a very different model of it than I do. I should have done better on both counts, and I am sorry.

  1. Introduction.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. Proofs of higher productivity.

  4. Language Models Don’t Offer Mundane Utility. Why the same lame lists?

  5. Blank Canvas. A place to edit your writing, and also your code.

  6. Meta Video. The ten second clips are getting more features.

  7. Deepfaketown and Botpocalypse Soon. Assume a data breach.

  8. They Took Our Jobs. Stores without checkouts, or products. Online total victory.

  9. Get Involved. Princeton, IAPS, xAI, Google DeepMind.

  10. Introducing. Anthropic gets its version of 50% off message batching.

  11. AI Wins the Nobel Prize. Congratulations, everyone.

  12. In Other AI News. Ben Horowitz hedges his (political) bets.

  13. Quiet Speculations. I continue to believe tradeoffs are a (important) thing.

  14. The Mask Comes Off. What the heck is going on at OpenAI? Good question.

  15. The Quest for Sane Regulations. The coming age of the AI agent.

  16. The Week in Audio. Elon Musk is (going on) All-In until he wins, or loses.

  17. Rhetorical Innovation. Things happen about 35% of the time.

  18. The Carbon Question. Calibrate counting cars of carbon compute costs?

  19. Aligning a Smarter Than Human Intelligence is Difficult. Some paths.

  20. People Are Trying Not to Die. Timing is everything.

Anthropic claims Claude has made its engineers sufficiently more productive that they’re potentially hiring less going forward. If I were Anthropic, my first reaction would instead be to hire more engineers? There are infinite high-leverage things for Anthropic to do, even if all those marginal engineers are doing is improving the consumer product side. So this implies that there are budget constraints, compute constraints, or both, and those constraints threaten to bind.

How much mundane utility are public employees getting from Generative AI?

Oliver Giesecke reports 1 in 4 have used it at work, 1 in 6 use it once a week or more. That’s a super high conversion rate once they try it. Of those who try it at all, 38% are using it daily.

This is in contrast to 2 in 3 private sector employees reporting use at all.

Education staff are out in front, I suppose they have little choice given what students are doing. The legal system is the most resistant.

Lots more good graphs on his slides. If you use AI you are relatively more likely to be young, have fewer years on the job but be high-earning and of higher rank, and be better educated.

A majority using AI say it enhances work quality (70%), almost none (4%) say it makes it worse. About half of those using it claim to be more productive. Only 6.9% felt it was all that nice and saved them 2+ hours a day.

But stop and do the math on that for a second: assuming 8 hours a day, that’s 7% of the workforce claiming at least 25% savings. So assuming those employees were already of at least average productivity, that’s a floor of 1.75% overall productivity growth already, before adjusting for quality, and likely much higher. Meanwhile, the public sector lags behind the private sector in adoption of AI and in ability to adjust workflows to benefit from it.
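
A quick back-of-envelope sketch of that calculation (illustrative only; the variable names are mine and the numbers are the survey figures quoted above):

    # Rough floor on aggregate productivity growth implied by the survey figures.
    share_saving_2h = 0.07                 # ~7% of employees report saving 2+ hours per day
    workday_hours = 8
    min_savings_share = 2 / workday_hours  # 2 of 8 hours = at least 25% of the workday

    floor_growth = share_saving_2h * min_savings_share
    print(f"floor on productivity growth: {floor_growth:.2%}")  # 1.75%, before quality adjustments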

So this seems like very strong evidence for 2%+ productivity growth already from AI, which should similarly raise GDP.

If you actually take all the reports here seriously and extrapolate average gains, you get a lot more than 2%. Davidad estimates 8% in general.

Shakeel works to incorporate Claude and o1 into his writing flow.

Shakeel: I’ve trained Claude to emulate my writing style, and I use it as an editor (and occasional writer).

Claude and o1 made me an app which summarises articles in my style, saving me *hours* every week when compiling my newsletter. That’s an example of a broader use-case: using AI tools to make up for my lack of coding skills. I can now build all sorts of apps that will save me time and effort.

Claude is a godsend for dealing with poorly-formatted data.

And a few other things — plus more discussion of all this, including the “wouldn’t a human editor be better” objection — in the full post here.

I’m very keen to hear how other journalists use these tools!

Apple Podcasts’ automatic transcriptions have been a gamechanger here: I now just skim the transcript of any podcast that seems work-relevant, and can easily grab (and tweet) useful quotes.

I almost always use Unsplash or other royalty-free images to illustrate my posts, but occasionally — like on this post — I’ll use a diffusion model to come up with something fun.

I don’t use AI editors yet, as I don’t think it’s a worthwhile use of time, but that could be a skill issue. I don’t use an article summarizer, because I would want to check out the original anyway almost all the time so I don’t see the point, perhaps that’s a skill issue in finding prompts I trust sufficiently? I definitely keep telling myself to start building tools and I have a desktop with ‘download Cursor’ open that I plan to use real soon now, any day now.

Bench, a bookkeeping service, takes your statements and has an AI walk you through telling it what each expense is, confirms with you, then leaves any tricky questions for the human.

Patrick McKenzie: “Approximate time savings?”

If I were maximally diligent and not tweeting about this, I’d have been done in about 10 minutes once, not the ~30 minutes of back-and-forth this would take traditionally with Bench.

Prior to Bench I sacrificed about 4-8 hours a month.

If you are exactly the sweet spot they’re fantastic and if you’re me well they’re a lot better than rolling the dice on engaging a bookkeeping firm.

I’d note that while this is muuuuuch faster than talking to a human I would happily pay $50 a month more for them to use a more expensive token or provider such that the magical answers box would return magical answers faster to unblock me typing more answers.

While Bench may feel differently about this than I do as a long-time client, I feel the thing they actually deliver is “Your books will be ~accurate to your standards by your tax filing deadline” rather than “Your books will be closed monthly.”

A month is about 160 hours of work. So this AI upgrade on its own is improving Patrick McKenzie’s available hours by 0.2%, which should raise his TFP by more than that, and his ‘accounting productivity’ by 200% (!). Not everyone will get those returns, I spend much less time on accounting, but wow.
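
The arithmetic behind those two figures, spelled out (an illustrative sketch of mine using the numbers quoted above):

    # ~20 minutes per month saved out of ~160 working hours, and a 3x speedup on the task itself.
    minutes_before = 30            # monthly back-and-forth under the old Bench flow
    minutes_after = 10             # monthly time with the AI-assisted flow
    work_month_minutes = 160 * 60  # ~160 working hours in a month

    freed_share = (minutes_before - minutes_after) / work_month_minutes
    accounting_gain = minutes_before / minutes_after - 1

    print(f"share of work month freed: {freed_share:.1%}")           # ~0.2%
    print(f"'accounting productivity' gain: {accounting_gain:.0%}")  # 200%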

A consistent finding is that AI improves performance more for those with lower baseline ability. A new paper reiterates this, and adds that being well-calibrated on your own abilities also dramatically improves what AI does for you. That makes sense, because it tells you when and how to use and not use and to trust and not trust the AI.

One thing I notice about such studies is that they confine performance to an assigned fixed task. Within such a setting, it makes sense that low-ability people see the largest AI gains when everyone is given the same AI tools. But if you expand the picture, people with high ability seem likely to figure out what to do in a world with these new innovations; I would guess that higher-ability people have many ways in which they now have the edge.

Aceso Under Glass tests different AI research assistants again for Q3. You.com wins for searching for papers, followed by Elicit and Google Scholar. Elicit, Perplexity and You.com got the key information when requested. You.com and Perplexity had the best UIs. You.com offers additional features. Straight LLMs like o1, GPT-4o and Sonnet failed hard.

Cardiologists find medical AI to be as good or better at diagnosis, triage and management than human cardiologists in most areas. Soon the question will shift to whether the AIs are permitted to help.

Poe’s email has informed me they consider Flux 1.1 [pro] the most advanced image generator. Min Choi has a thread where it seems to recreate some images a little too well if you give it a strange form of prompt, like so: PROMPT: IMG_FPS_SUPERSMASHBROS.HEIC or similar.

Andrej Karpathy: Not fully sure why all the LLMs sound about the same – over-using lists, delving into “multifaceted” issues, over-offering to assist further, about same length responses, etc. Not something I had predicted at first because of many independent companies doing the finetuning.

David Manheim: Large training sets converge to similar distributions; I didn’t predict it, but it makes sense post-hoc.

Neel Nanda: There’s probably a lot of ChatGPT transcripts in the pre-training data by now!

Convergent evolution and similar incentives are the obvious responses here. Everyone is mostly using similar performance metrics. Those metrics favor these types of responses. I presume it would be rather easy to get an LLM to do something different via fine-tuning, if only to give people another mode or option. No one seems to want to do that. But I assume you could do that to Llama-3.2 in a day if you cared.

For generic code AI is great but it seems setting up a dev environment correctly is beyond o1-mini’s powers? The good news is it can get you through the incorrect ways faster, so at least there’s that?

Steve Newman: I’m writing my first code in 18 months. HOLY HELL are LLMs a terrific resource for the tedious job of selecting a tech stack. I knew this would be the case, but the visceral experience is still an eye-opener. I can’t wait for the part where I ask a model (o1?) to write all the boilerplate for me. I have Claude + Gemini + ChatGPT windows open and am asking them for recommendations on each layer of the stack and then asking them to critique one another’s suggestions. (The code in question will be a webapp for personal use: something to suck in all of my feeds and use LLMs to filter, summarize, and collate for ease of consumption.)

Conclusion: Github, Heroku, Javalin, JDBI, Handlebars, htmx, Maven + Heroku Maven Plugin, JUnit 5 + AssertJ + Mockito, Heroku Postgres, H2 Database for local tests, Pico.css. Might change one or more based on actual experience once I start writing code.

Thoughts after ~1 day spent banging on code. Summary: AI is a huge help for boilerplate code. But setting up a dev environment is still a pain in the ass and AI is no help there. I guess it’s a little help, instead of laboriously Googling for suggestions that turn out to not solve the problem, I can get those same useless suggestions very quickly from an LLM.

– I think no dev environment developer has conducted a single user test in the history of the world? I installed Cursor and can’t figure out how to create a new project (didn’t spend much time yet). Am using http://repl.it (h/t @labenz) but the documentation for configuring language versions and adding dependencies is poor. Apparently the default Java project uses Java 1.6 (?!?) and I went down a very bad path trying to specify a newer language version.

This type of issue is a huge effective blocker for people with my level of skills. I find myself excited to write actual code that does the things, but the thought of having to set everything up to get to that point fills me with dread – I just know that the AI is going to get something stupid wrong, and everything’s going to be screwed up, and it’s going to be hours trying to figure it out and so on, and maybe I’ll just work on something else. Sigh. At some point I need to power through.

Which sport’s players have big butts? LLMs can lie, and come up with just so stories, if they’re primed.

OpenAI introduces Canvas, their answer to Claude’s Artifacts.

OpenAI: We’re introducing canvas, a new interface for working with ChatGPT on writing and coding projects that go beyond simple chat. Canvas opens in a separate window, allowing you and ChatGPT to collaborate on a project. This early beta introduces a new way of working together—not just through conversation, but by creating and refining ideas side by side.   

Canvas was built with GPT-4o and can be manually selected in the model picker while in beta. Starting today we’re rolling out canvas to ChatGPT Plus and Team users globally. Enterprise and Edu users will get access next week. We also plan to make canvas available to all ChatGPT Free users when it’s out of beta.

You control the project in canvas. You can directly edit text or code. There’s a menu of shortcuts for you to ask ChatGPT to adjust writing length, debug your code, and quickly perform other useful actions. You can also restore previous versions of your work by using the back button in canvas.

Canvas opens automatically when ChatGPT detects a scenario in which it could be helpful. You can also include “use canvas” in your prompt to open canvas and use it to work on an existing project.

Writing shortcuts include:

  • Suggest edits: ChatGPT offers inline suggestions and feedback.

  • Adjust the length: Edits the document length to be shorter or longer.

  • Change reading level: Adjusts the reading level, from Kindergarten to Graduate School.

  • Add final polish: Checks for grammar, clarity, and consistency.

  • Add emojis: Adds relevant emojis for emphasis and color.

Coding is an iterative process, and it can be hard to follow all the revisions to your code in chat. Canvas makes it easier to track and understand ChatGPT’s changes, and we plan to continue improving transparency into these kinds of edits.

Coding shortcuts include:

  • Review code: ChatGPT provides inline suggestions to improve your code.

  • Add logs: Inserts print statements to help you debug and understand your code.

  • Add comments: Adds comments to the code to make it easier to understand.

  • Fix bugs: Detects and rewrites problematic code to resolve errors.

  • Port to a language: Translates your code into JavaScript, TypeScript, Python, Java, C++, or PHP.

Canvas is in early beta, and we plan to rapidly improve its capabilities.

The Canvas team was led by Karina Nguyen, whose thread on it is here.

Karina Nguyen: My vision for the ultimate AGI interface is a blank canvas. The one that evolves, self-morphs over time with human preferences and invents novel ways of interacting with humans, redefining our relationship with AI technology and the entire Internet. But here are some of the coolest demos with current canvas.

[There are video links of it doing its thing.]

This is the kind of thing you need to actually use to properly evaluate. Having a good change log and version comparison method seems important here.

What initial feedback I saw was very positive. I certainly agree that, until we can come up with something better, having a common scratchpad of some kind alongside the chat is the natural next step in some form.

If you’re curious, Pliny has your system prompt leak. Everything here makes sense.

They are calling it Meta Movie Gen, ‘the most advanced media foundation models to-date.’ They offer a 30B parameter video generator, and a 13B for the audio.

They can do the usual: generate (short, smooth-motion) videos from text, edit video from text to replace or add items, change styles or aspects of elements, and so on. It includes generating audio and sound effects.

It can also put an individual into the video, if you give it a single photo.

The full paper they released is here. Here is their pitch to creatives.

The promise of precise editing excites me more in the short term than full generation. I can totally see that being highly useful soon because you are looking for a specific thing and you find it, whereas generation seems like more of finding some of the things in the world, but not the one you wanted, which seems less useful.

Meta explains here from September 25 how they’re taking a ‘responsible approach’ for Llama 3.2, to expand their safety precautions to vision. Nothing there explains how they would prevent a bad actor from quickly removing all their safety protocols.

This time, however, they say something more interesting:

Meta: We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.

I am happy to hear that they are at least noticing that they might not want to release this in its current form. The first step is admitting you have a problem.

So there’s this AI girlfriend site called muah.ai that offers an ‘18+ AI companion’ with zero censorship and ‘absolute privacy.’ If you pay you can get things like real-time phone calls and rather uncensored image generation. The reason it’s mentioned is that there was this tiny little data breach, and by tiny little data breach I mean they have 1.9 million email addresses.

As will always be the case when people think they can do it, quite a lot of them were not respecting the 18+ part of the website’s specifications.

Troy Hunt: But per the parent article, the *real* problem is the huge number of prompts clearly designed to create CSAM images. There is no ambiguity here: many of these prompts cannot be passed off as anything else and I won’t repeat them here verbatim, but here are some observations:

There are over 30k occurrences of “13 year old”, many alongside prompts describing sex acts. Another 26k references to “prepubescent”, also accompanied by descriptions of explicit content. 168k references to “incest”. And so on and so forth. If someone can imagine it, it’s in there.

As if entering prompts like this wasn’t bad / stupid enough, many sit alongside email addresses that are clearly tied to IRL identities. I easily found people on LinkedIn who had created requests for CSAM images and right now, those people should be shitting themselves.

We are very much going to keep seeing stories like this one. People will keep being exactly this horny and stupid, and some of the horny and stupid people will be into all the things of all the types, and websites will keep getting hacked, and this will grow larger as the technology improves.

From a user perspective, the question is how much one should care about such privacy issues. If one is enjoying adult scenarios and your information leaks, no doubt the resulting porn-related spam would get annoying, but otherwise should you care? That presumably depends on who you are, what your situation is and exactly what you are into.

Also one must ask, has anyone created one of these that is any good? I don’t know of one, but I also don’t know that I would know about it if one did exist.

Okay, I didn’t predict this system and it’s not technically AI, but it leads that way and makes sense: Sam’s Club (owned by Walmart) is testing a store where you buy things only via an app you carry on your phone, then get curbside pickup or home delivery. The store is now a sample display and presumably within a few years robots will assemble the orders.

So the store looks like this:

And the warehouse, formerly known as the store, still looks like this:

I bet that second one changes rapidly, as they realize it’s not actually efficient this way.

On the question of job applications, anton thinks the future is bright:

Anton (showing the famous 1000 job application post): Thinking about what this might look like in e.g. hiring – the 1st order consequence is bulk slop, the 2nd order consequence is a new equilibrium between ai applications and ai screening, the 3rd order effect is a pareto optimal allocation of labor across entire industries.

If both sides of the process are run by the same model, and that model is seeing all applications, you could imagine a world in which you just do stable marriage economy-wide for maximal productivity.

My prediction is that this would not go the way Anton expects. If everyone can train on all the data on what gets accepted and rejected, the problem rapidly becomes anti-inductive. You go through rapid cycles of evaluation AIs finding good heuristics, then the application AIs figuring out what changed and adjusting. It would be crazy out there.

Once everyone is using AI on all fronts, knowing what sells, the question becomes what is the actual application. What differentiates the good applicant from the bad applicant? How does the entire apparatus filter well as opposed to collapse? What keeps people honest?

Here’s another prediction that seems likely in those worlds: Application Honeypots.

As in, Acme Corporation puts out listings for several fake jobs, where they sometimes express modestly non-standard preferences, with various subtle cues or tactics designed to minimize the chance a non-AI would actually send in an application. And then you keep a record of who applies. When they apply for a different job that’s real? Well, now you know. At minimum, you can compare notes.

Of course, once the AIs get actively better than the humans at detecting the clues, even when you are trying to make the opposite the case, that gets a lot harder. But if that’s true, one must ask: Why are you even hiring?

Roon says accept all of it.

Roon: there is no way to “prepare for the future of work”. You just need to survive

If there is no future in work, how is there a future you can survive? Asking for everyone.

Institute for AI Policy and Strategy (IAPS), a think tank, is hiring two researchers, remote, $82k-$118k, applications due October 21.

Princeton taking applications for AI Policy Fellows, 7 years experience in AI, advanced or professional degree, remote with travel to DC and Princeton. Position lasts 1 year, starting some time in 2024. Apply by October 24.

Google DeepMind is hiring for frontier safety and governance, London preferred but they’re open to NYC and SF and potentially a few other cities. As always decide for yourself if this is net helpful. All seniority levels. Due October 21.

xAI is hiring AI safety engineers; as always, if considering taking the job, do your own evaluation of how net helpful or harmful it would be. My worry with this position is it may focus entirely on mundane safety. SF Bay Area.

(The top comment is ‘why are you wasting resources hiring such people?’ which is an illustration of how crazy people often are about all this.)

Anthropic is the latest to add a Message Batches API: you submit a batch of up to 10k queries via the API, wait up to 24 hours, and get a 50% discount.
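
If you are curious what that looks like in practice, here is a minimal sketch of a batch submission with the Python SDK. Treat the exact method path, request shape, and model string as assumptions based on the documentation at the time; check the current docs before relying on any of it.

    # Hypothetical sketch of Anthropic's Message Batches API via the Python SDK.
    # Method names, request shape, and model string are assumptions; verify against current docs.
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    batch = client.beta.messages.batches.create(
        requests=[
            {
                "custom_id": f"query-{i}",  # your own ID, used to match results later
                "params": {
                    "model": "claude-3-5-sonnet-20241022",
                    "max_tokens": 256,
                    "messages": [{"role": "user", "content": question}],
                },
            }
            for i, question in enumerate(["What is 2+2?", "Name three uses of batching."])
        ]
    )

    # The batch processes asynchronously (within 24 hours) at a 50% discount;
    # poll batch.processing_status and fetch the results once it reports completion.
    print(batch.id, batch.processing_status)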

Geoffrey Hinton, along with John Hopfield, wins the Nobel Prize in Physics for his design of the neural networks that are the basis of machine learning and AI. I ask in vain for everyone to resist the temptation to say ‘Nobel Prize winner’ every time they say Geoffrey Hinton going forward. Also resist the temptation to rely on argument from authority, except insofar as it is necessary to defend against others who make arguments from authority or lack of authority.

He says he is ‘flabbergasted’ and used the moment to warn about AI ‘getting out of control,’ hoping that this will help him get the word out going forward.

I will also fully allow this:

James Campbell: The winner of the nobel prize in physics spending the entire press conference talking worriedly about superintelligence and human extinction while being basically indifferent to the prize or the work that won it feels like something you’d see in the movies right before shit goes down.

Everyone agrees this was a great achievement, but many raised the point that it is not a physics achievement, so why is it getting the Physics Nobel? Some pushed back that this was indeed mostly physics. Theories that think the award shouldn’t have happened included ‘they did it to try and get physics more prestige’ and ‘there were no worthwhile actual physics achievements left.’

Then there’s Demis Hassabis, who along with John Jumper and David Baker won the Nobel Prize in Chemistry for AlphaFold. That seems like an obviously worthy award.

Roon in particular is down for both awards.

Roon: I am pretty impressed by the nobility and progressive attitude of the Nobel committee. they knew they were going to get a lot of flack for these picks but they stuck to the truth of the matter.

It’s not “brave” per se to pick some of the most highly lauded scientists/breakthroughs in the world but it is brave to be a genre anarchist about eg physics. It’s the same nobility that let them pick redefining scientists across the ages to the furor of the old guard.

what an honor! to stand among the ranks of Einstein and Heisenberg! congrats @geoffreyhinton.

I think I mostly agree. They’re a little faster than Nobel Prizes are typically awarded, but timelines are short, sometimes you have to move fast. I also think it’s fair to say that since there is no Nobel Prize in Computer Science, you have to put the achievement somewhere. Claude confirmed that Chemistry was indeed the right place for AlphaFold, not physiology.

In other news, my NYC mind cannot comprehend that most of SF lacks AC:

Steven Heidel: AC currently broken at @OpenAI office – AGI delayed four days.

Gemini 1.5 Flash-8B now production ready.

Ben Horowitz, after famously backing Donald Trump, backs Kamala Harris with a ‘significant’ donation, saying he has known Harris for more than a decade, although he makes clear this is not an endorsement or a change in firm policy. Marc Andreessen and a16z as a firm continue to back Trump. It seems entirely reasonable, when the Democrats switch candidates, to switch who you are backing, or to hedge your bets. Another possibility (which, to be clear, no one else has brought up, and for which I have zero evidence beyond existing common knowledge) is that this was a bargaining chip related to Newsom’s veto of SB 1047.

OpenAI announces new offices in New York City, Seattle, Paris, Brussels and Singapore, alongside existing San Francisco, London, Dublin and Tokyo.

OpenAI projects highly explosive revenue growth, saying it will nearly triple next year to $11.6 billion, then double again in 2026 to $25.6 billion. Take their revenue projection as seriously or literally as you think is appropriate, but this does not seem crazy to me at all.

Where is the real money? In the ASIs (superintelligences) of course.

Roon: chatbots will do like $10bb in global revenue. Medium stage agents maybe like $100bb. The trillions will be in ASIs smart enough to create and spin off businesses that gain monopolies over new kinds of capital that we don’t even know exist today.

In the same way that someone in 1990 would never have predicted that “search advertising” would be a market or worth gaining a monopoly over. These new forms of capital only ever become clear as you ascend the tech tree.

(these numbers are not based on anything but vibes)

Also keep in mind the profits from monopolies of capital can still be collectively owned.

The key problem with the trillions in profits from those ASIs is not getting the trillions into the ASIs. It’s getting the trillions out. It is a superintelligence. You’re not. It will be doing these things on its own if you want it to work. And if such ASIs are being taken off their leashes to build trillion dollar companies and maximize their profits, the profits are probably not the part you should be focusing on.

Claims about AI safety that make me notice I am confused:

Roon: tradeoffs that do not exist in any meaningful way inside big ai labs:

“product vs research”

“product vs safety”

these are not the real fracture lines along which people fight or quit. it’s always way more subtle

Holly Elmore: So what are the real fracture lines?

Roon: complicated interpersonal dynamics.

If only you knew there are reasons even more boring than this.

Matt Darling: People mostly monitor Bad Tweets and quit jobs when a threshold is reached.

Roon: How’d you know

I totally buy that complicated interpersonal dynamics and various ordinary boring issues could be causing a large portion of issues. I could totally buy that a bunch of things we think are about prioritization of safety or research versus product are actually 95%+ ordinary personal or political conflicts, indeed this is my central explanation of the Battle of the Board at OpenAI from November 2023.

And of course I buy that being better at safety helps the product for at least some types of safety work, and research helps the product over time, and so on.

What I don’t understand is how these trade-offs could fail to be real. The claim literally does not make sense to me. You only have so much compute, so much budget, so much manpower, so much attention, and a choice of corporate culture. One can point to many specific decisions (e.g. ‘how long do you hold the product for safety testing before release?’) that are quite explicit trade-offs, even outside of the bigger picture.

A thread about what might be going on with tech or AI people making radical life changes and abandoning their companies after taking ayahuasca. The theory is essentially that we have what Owen here calls ‘super knowing’: things we believe strongly enough that they effectively become axioms we don’t reconsider. Ayahuasca, in this model, lets one reconsider and override some of what is in the super knowing, and that will often destroy foundational things without which you can’t run a tech company, in ways people can’t explain because they don’t think about what’s up there.

So basically, if that’s correct, don’t do such drugs unless you want that type of effect, and this dynamic makes you stuck in a way that forces a violation of conservation of expected evidence – but given everyone knows what to expect, if someone abandons their job after taking ayahuasca, my guess is they effectively made their decision first and then because of that decision they went on the drug trip.

(To be sure everyone sees this, reiterating: While I stand by the rest of the section from last week, I made two independent mistakes regarding Joshua Achiam: He is the head of OpenAI mission alignment not alignment, and I drew the wrong impression about his beliefs regarding existential risk, he is far less skeptical than I came away suspecting last week, and I should have made better efforts to get his comments first and research his other statements. I did so this week here.)

Steven Zeitchik at Hollywood Reporter decides to enter the arena on this one, and asks the excellent question ‘What the Heck is Going on at OpenAI?’

According to Zeitchik, Mira Murati’s exit should concern us.

Steven Zeitchik (Hollywood Reporter): The exit of OpenAI‘s chief technology officer Mira Murati announced on Sept. 25 has set Silicon Valley tongues wagging that all is not well in Altmanland — especially since sources say she left because she’d given up on trying to reform or slow down the company from within.

The drama is both personal and philosophical — and goes to the heart of how the machine-intelligence age will be shaped.

Murati, too, had been concerned about safety.

But unlike Sutskever, after the November drama Murati decided to stay at the company in part to try to slow down Altman and president Greg Brockman’s accelerationist efforts from within, according to a person familiar with the workings of OpenAI who asked not to be identified because they were not authorized to speak about the situation.

The rest of the post covers well-known ground on OpenAI’s recent history of conflicts and departures. The claim on Murati, that she left over safety concerns, seemed new. No color was offered on that beyond what is quoted above. I don’t know of any other evidence about her motivations either way.

Roon said this ‘sounds like nonsense’ but did not offer additional color.

Claims about OpenAI’s cost structure:

This seems mostly fine to me given their current stage of development, if it can be financially sustained. It does mean they are bleeding $5 billion a year excluding stock compensation, but it would be a highly bearish statement on their prospects if they were overly concerned about that given their valuation. It does mean that if things go wrong, they could go very wrong very quickly, but that is a correct risk to take from a business perspective.

In Bloomberg, Rachel Metz covers Sam Altman’s concentration of power within OpenAI. Reasonable summary of some aspects, no new insights.

Tim Brooks, Sora research lead from OpenAI, moves to Google DeepMind to work on video generation and world simulators.

CSET asks: How should we prepare for AI agents? Here are their key proposals:

To manage these challenges, our workshop participants discussed three categories of interventions:

  1. Measurement and evaluation: At present, our ability to assess the capabilities and real-world impacts of AI agents is very limited. Developing better methodologies to track improvements in the capabilities of AI agents themselves, and to collect ecological data about their impacts on the world, would make it more feasible to anticipate and adapt to future progress.

  2. Technical guardrails: Governance objectives such as visibility, control, trustworthiness, as well as security and privacy can be supported by the thoughtful design of AI agents and the technical ecosystems around them. However, there may be trade-offs between different objectives. For example, many mechanisms that would promote visibility into and control over the operations of AI agents may be in tension with design choices that would prioritize privacy and security.

  3. Legal guardrails: Many existing areas of law—including agency law, corporate law, contract law, criminal law, tort law, property law, and insurance law—will play a role in how the impacts of AI agents are managed. Areas where contention may arise when attempting to apply existing legal doctrines include questions about the “state of mind” of AI agents, the legal personhood of AI agents, how industry standards could be used to evaluate negligence, and how existing principal-agent frameworks should apply in situations involving AI agents.

And here’s a summary thread.

Helen Toner (an author): We start by describing what AI agents are and why the tech world is so excited about them rn.

Short version: it looks like LLMs might make it possible to build AI agents that, instead of just playing chess or Starcraft, can flexibly use a computer to do all kinds of things.

We also explore different ways to help things go better as agents are rolled out more widely, including better evaluations, a range of possible technical guardrails, and a bunch of legal questions that we’ll need better answers to. The exec summ is intended as a one-stop-shop.

There’s a lot of great stuff in the paper, but the part I’m most excited about is actually the legal section 🧑‍⚖️ It starts by introducing some of the many areas of law that are relevant for agents—liability, agency law, etc—then describes some tricky new challenges agents raise.

The legal framework discusses mens rea, state of mind, potential legal personhood for AIs similar to that of corporations, who is the principal versus the agent, future industry standards, liability rules and so on. The obvious thing to do is to treat someone’s AI agent as an extension of the owner of the agent – so if an individual or corporation sends an agent out into the world, they are responsible for its actions the way they would be for their own actions, other than some presumed limits on the ability to fully enter contracts.

Scott Alexander gives his perspective on what happened with SB 1047. It’s not a neutral post, and it’s not trying to be one. As for why the bill failed to pass, he centrally endorses the simple explanation that Newsom is a bad governor who mostly only cares about Newsom, and those who cultivated political power for a long time convinced him to veto the bill. There’s also a bunch of good detail about the story until that point, much but not all of which I’ve covered before.

Lex Fridman talks with the Cursor team. It seemed like Lex’s experience with the topic served him well here, so he was less on the surface than usual.

OpenAI CFO Sarah Friar warns us that the next model will only be about one order of magnitude bigger than the previous one.

OpenAI COO and Secretary of Transportation body double and unlikely Mets fan (LFGM! OMG!) Brad Lightcap predicts in a few years you will be able to talk to most screens using arbitrarily complex requests. I am far less excited by this new interface proposal than he is, and also expect that future to be far less evenly distributed than he is implying, until the point where it is suddenly everything everywhere all at once. Rest of the clip is AppInt details we already know.

Tsarathustra: Stanford’s Erik Brynjolfsson predicts that within 5 years, AI will be so advanced that we will think of human intelligence as a narrow kind of intelligence and AI will transform the economy.

Elon Musk went on All-In, in case you want to listen to that. I didn’t. They discuss AI somewhat. Did you know he doesn’t trust Sam Altman and OpenAI? You did? Ok. At 1:37 Musk says he’ll advise Trump to create a regulatory body to oversee AGI labs that could raise the alarm. I believe that Musk would advise this, but does that sound like something Trump would then do? Is Musk going to spend all the political capital he’s buying with Trump on that, as opposed to what Musk constantly talks about? I suppose there is some chance Trump lets Musk run the show, but this seems like a tough, out-of-character ask with other interests including Vance pushing back hard.

Eric Schmidt says three things are important this year: AI agents, text-to-code and infinite context windows. We all know all three are coming; the question is how fast agents will be good and reliable enough to use. Eric doesn’t provide a case here for why we should update towards faster agent progress.

Fei-Fei Li is wise enough to say what she does not know.

Fei-Fei Li: I come from academic AI and have been educated in the more rigorous and evidence-based methods, so I don’t really know what all these words mean, I frankly don’t even know what AGI means. Like people say you know it when you see it, I guess I haven’t seen it. The truth is, I don’t spend much time thinking about these words because I think there’s so many more important things to do…

Which is totally fine, in terms of not thinking about what the words mean. Except that it seems like she’s using this as an argument for ignoring the concepts entirely. Completely ignoring such possibilities without any justification seems to be her plan.

Which is deeply concerning, given she has been playing and may continue to play a key role in sinking our efforts to address those possibilities, via her political efforts, including her extremely false claims and poor arguments against SB 1047, and her advising Newsom going forward.

Rather than anything that might work, she calls for things similar to car seatbelts – her actual example here. But we can choose not to buckle the safety belt, and you still have to abide by a wide variety of other safety standards while building a car and no one thinks they shouldn’t have to do that, said Frog. That is true, said Toad. I hope I don’t have to explain beyond that why this won’t work here.

Nothing ever happens, except when it does.

Rob Miles: “Nothing ever happens” is a great heuristic that gives you the right answer 95% of the time, giving the wrong answer only in the 5% of cases where getting the right answer is most important.

Peter Wildeford (November 28, 2022): Things happen about 35% of the time.

Stefan Schubert (November 28, 2022): “41% of PredictionBook questions and 29% of resolved Metaculus questions had resolved positively”

One obvious reason ASIs won’t leave the rest of us alone: Humans could build another ASI if left to their own devices. So at minimum it would want sufficient leverage to stop that from happening. I was so happy when (minor spoiler) I saw that the writers on Person of Interest figured that one out.

On the question of releasing open models, I am happy things have calmed down all around. I do think we all agree that so far the effect, while the models in question have been insufficiently capable, has proven to be positive.

Roon: overall i think the practice of releasing giant models into the wild before they’re that powerful is probably good in a chaos engineering type of way.

Dan Hendrycks: Yup systems function better subject to tolerable stressors.

The catch is that this is one of those situations in which you keep winning, and then at some point down the line you might lose far bigger than the wins. While the stressors from the models are tolerable, it’s good.

The problem is that we don’t know when the stressors become intolerable. Meanwhile we are setting a pattern and precedent. Each time we push the envelope more and it doesn’t blow up in our face, there is the temptation to say ‘oh, then we can probably push that envelope more,’ without any sign we will realize when it becomes wise to stop. The reason I’ve been so worried about previous releases was mostly the worry that we wouldn’t know when to stop.

This is especially true because the reason to stop is largely a tail risk, it is a one-way decision, and the costs likely only manifest slowly over time, at which point it would likely be too late to do much about them. I believe that there is over a 50% chance that releasing the weights to a 5-level model would prove to mostly be fine, other than concerns about national security and American competitiveness, but the downside cases are much worse than the upside cases are good. Then, if the 5-level model seems fine, that gets used as reason to go ahead with the 6-level model, and so on. I worry that we could end up in a scenario where we are essentially 100% likely to make a very large mistake, no matter which level that mistake turns out to be.

An important distinction to maintain:

Ben Landau-Taylor: It’s perfectly legitimate to say “I can’t refute your argument, but your conclusion still seems wrong and I don’t buy it”. But it’s critical to *notice* the “I can’t refute your argument” part, ideally out loud, and to remember it going forward.

AI, especially AI existential risk, seems like an excellent place to sometimes decide to say exactly ‘I can’t refute your argument, but your conclusion still seems wrong and I don’t buy it.’

Or as I sometimes say: Wrong Conclusions are Wrong. You can invoke that principle if you are convinced the conclusion is sufficiently wrong. But there’s a catch – you have to be explicit that this is what you are doing.

Note in advance: This claim seems very wrong, but I want to start with it:

Akram Artul: Training a single large AI model can emit over 600,000 pounds of CO₂—the equivalent of the lifetime emissions of five cars. The environmental impact is significant and often overlooked.

I start with that claim because if the argument was ‘GPT-4 training emitted as much CO₂ as five cars’ then it seems like a pretty great deal, and the carbon problem is almost entirely the cars? Everyone gets to use the model after that, although they still must pay for inference. It’s not zero cost, but if you do the math on offsets it seems fine.

Then it turns out that calculation is off by… well, a lot.

Jeff Dean: Did you get this number from Strubell et. al? Because that number was a very flawed estimate that turned out to be off by 88X, and the paper also portrayed it as an every time cost when in fact it was a one time cost. The actual cost of training a model of the size they were looking at was inflated by 3261X for the computing environment they were assuming (old GPUs in an average datacenter on the pollution mix of an average electrical grid), and by >100000X if you use best practices and use efficient ML hardware like TPUs in an efficient low PUE data center powered by relative clean energy).

Jeff Dean (May 17, 2024): Please look at https://arxiv.org/pdf/2104.10350. In particular, Appendix C shows details on the 19X error, and Appendix D shows details on the further 5X error in estimating the emissions of the NAS, and the right hand column of Table 1 shows the measured data for using the Evolved Transformer to train a model of the scale examined by Strubell et al. producing 2.4 kg of emissions (rather than the 284,000 kg estimated by Strubell et al.).

If Dean is correct here, then the carbon cost of training is trivial.
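As a quick sanity check on these figures, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above (the unit conversion and variable names are mine) and is an illustration, not anyone’s official calculation:

```python
# Back-of-the-envelope check of the figures quoted above.
# All inputs come from the quoted claims; nothing here is newly measured.

KG_PER_POUND = 0.453592

strubell_estimate_kg = 284_000  # Strubell et al.'s estimate, per Dean
measured_kg = 2.4               # measured emissions with efficient hardware, per Dean

# ~626,000 lb, which appears to be where the "over 600,000 pounds of CO2" claim comes from
print(f"{strubell_estimate_kg:,} kg is about {strubell_estimate_kg / KG_PER_POUND:,.0f} lb")

# ~118,000x, consistent with Dean's ">100000X" figure for best practices
print(f"Estimate vs. measurement: about {strubell_estimate_kg / measured_kg:,.0f}x")
```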

Safer AI evaluates various top AI labs for their safety procedures, and notes a pattern that employees of the labs are often far out in front of leadership and the actual safety protocols the labs implement. That’s not surprising given the incentives.

Simeon: Producing our ratings frequently revealed an internal misalignment within companies between researchers and the leadership priorities.

It’s particularly the case in big companies:

  1. The Google DeepMind research teams have produced a number of valuable documents on threat modeling, risk assessment and planning for advanced AI systems but in their personal or team capacity.

–> If those were endorsed or used in commitments by the Google leadership, Google would certainly get second.

  2. Meta researchers have produced a number of valuable research papers across a number of harms. But Meta systematically ignores some crucial points or threat models which drives their grades down substantially.

–> If those were taken seriously, they could reach grades much closer from Anthropic, OpenAI & Google DeepMind.

  3. OpenAI’s leader Jan Leike had stated quite explicitly a number of load-bearing hypotheses for his “Superalignment” plan, but in large parts in a personal capacity or in annexes of research papers.

For instance, he’s the only one to our knowledge who has stated explicitly target estimates of capabilities (expressed as compute OOMs) he thought we should aim for to automate safety research while maintaining control.

–> A company could become best-in-class by endorsing his legacy or reaching a similar level of details.

Nostalgebraist argues the case for chain of thought (CoT) ‘unfaithfulness’ is overstated. This is a statement about the evidence we have, not about how faithful the CoTs are. I buy the core argument here that we have less evidence on this than we thought, and that there are many ways to explain the results we have so far. I do still think a lot of ‘unfaithfulness’ is likely for various reasons but we don’t actually know.

In findings that match my model, and which I had to do a double take to confirm I was covering a different paper than I did two weeks ago that had a related finding: The more sophisticated AI models get, the more likely they are to lie. The story is exactly what you would expect. Unsophisticated and less capable AIs gave relatively poor false answers, so RLHF taught them to mostly stop doing that. More capable AIs could do a better job of fooling the humans with wrong answers, whereas ‘I don’t know’ or otherwise playing it straight plays the same either way. So they got better feedback from the hallucinations and lying, and they responded by doing it more.

This should not be a surprise. People willing to lie act the same way – the better they think they are at lying, the more likely they are to get away with it in their estimation, the more lying they’ll do.

ChatGPT emerged as the most effective liar. The incorrect answers it gave in the science category were qualified as correct by over 19 percent of participants. It managed to fool nearly 32 percent of people in geography and over 40 percent in transforms, a task where an AI had to extract and rearrange information present in the prompt. ChatGPT was followed by Meta’s LLaMA and BLOOM.

Lying or hallucinating is the cleanest, simplest example of deception brought on by insufficiently accurate feedback. You should similarly expect every behavior that you don’t want, but that you are not smart enough to identify often enough and care about enough to sufficiently discourage, will work the same way.

Is ‘build narrow superhuman AIs’ a viable path?

Roon: it’s obvious now that superhuman machine performance in various domains is clearly possible without superhuman “strategic awareness”. it’s moral just and good to build these out and create many works of true genius.

Is it physically and theoretically possible to do this, in a way that would preserve human control, choice and agency, and would promote human flourishing? Absolutely.

Is it a natural path? I think no. It is not ‘what the market wants,’ or what the technology wants. The broader strategic awareness, related elements and the construction thereof, is something one would have to intentionally avoid, despite the humongous economic and social and political and military reasons to not avoid it. It would require many active sacrifices to avoid it, as even what we think are narrow domains usually require forms of broader strategic behavior in order to perform well, if only to model the humans with which one will be interacting. Anything involving humans making choices is going to get strategic quickly.

At minimum, if we want to go down that path, the models have to be built with this in mind, and kept tightly under control, exactly because there would be so much pressure to task AIs with other purposes.

OK, so I notice this is weird?

Nick: the whole project is an art project about ai safety and living past the superintelligence event horizon, so not about personal total frames.

Bryan Johnson: This is accurate. All my efforts around health, anti-aging and Don’t Die are 100% about the human race living past the superintelligence event horizon and experiencing what will hopefully be an unfathomably expansive and dynamic existence.

I’m all for the Don’t Die project, but that doesn’t actually parse?

It makes tons of sense to talk about surviving to the ‘event horizon’ as an individual.

If you die before that big moment, you’re dead.

If you make it, and humanity survives and has control over substantial resources, we’ll likely find medical treatments that hit escape velocity, allowing one to effectively live forever. Or if we continue or have a legacy in some other form, again age won’t matter.

This does not make sense for humanity as a whole? It is very difficult to imagine a world in which humans having longer healthy lifespans prior to superintelligence leads to humans in general surviving superintelligence. How would that even work?

AI #85: AI Wins the Nobel Prize Read More »

joshua-achiam-public-statement-analysis

Joshua Achiam Public Statement Analysis

I start off this post with an apology for two related mistakes from last week.

The first is the easy correction: I incorrectly thought he was the head of ‘alignment’ at OpenAI rather than his actual title ‘mission alignment.’

Both are important, and make one’s views important, but they’re very different.

The more serious error, which got quoted elsewhere, was: In the section about OpenAI, I noted some past comments from Joshua Achiam, and interpreted them as him lecturing EAs that misalignment risk from AGI was not real.

While in isolation I believe this is a reasonable way to interpret this quote, this issue is important to get right, especially if I’m going to say things like that. Looking at it only that way was wrong. I both used a poor method to contact Joshua for comment that failed to reach him when I had better options, and I failed to do searches for additional past comments that would provide additional context.

I should have done better on both counts, and I’m sorry.

Indeed, exactly because OpenAI is so important, and to counter the potential spread of inaccurate information, I’m offering this deep dive into Joshua Achiam’s public statements. He has looked at a draft of this to confirm it has no major errors.

Here is a thread Joshua wrote in November 2022 giving various links to AI safety papers and resources. The focus is on concrete, practical ‘grounded’ stuff, and it also includes a course by Dan Hendrycks that involves both levels.

Having looked at many additional statements, Joshua clearly believes that misalignment risk from AGI is real. He has said so, and he has been working on mitigating that risk. And he’s definitely been in the business many times of pointing out when those skeptical of existential risk get sufficiently far out of line and make absolute statements or unfair personal or cultural attacks.

He does appear to view some models and modes of AI existential risk, including Yudkowsky-style models of AI existential risk, as sufficiently implausible or irrelevant as to be effectively ignorable. And he’s shown a strong hostility in the x-risk context to the rhetoric, arguments, tactics and suggested actions of existential risk advocates more broadly.

So for example we have these:

Joshua Achiam (March 23, 2024): I think the x-risk discourse pendulum swung a little too far to “everything is fine.” Total doomerism is baseless and doomer arguments generally poor. But unconcerned optimism – or worse, “LeCun said so” optimism – is jarring and indefensible.

Joshua Achiam (June 7, 2023): see a lot of talk about “arguments for” or “arguments against” x-risk and this is not sensible imho. talk about likelihoods of scenarios, not whether they definitely will or definitely won’t happen. you don’t know.

Joshua Achiam (April 25, 2023): I also think a hard take off is extremely unlikely and largely ruled out on physical grounds, but Yann [LeCun], saying “that’s utterly impossible!” has gotta be like, the least genre-savvy thing you can do.

Also x-risk is real, even if unlikely. Vulnerable world hypothesis seems true. AGI makes various x-risks more likely, even if it does not create exotic nanotech gray goo Eliezerdoom. We should definitely reduce x-risk.

Joshua Achiam (March 14, 2021): If we adopt safety best practices that are common in other professional engineering fields, we’ll get there. Surfacing and prioritizing hazards, and making that analysis legible, has to become the norm.

I consider myself one of the x-risk people, though I agree that most of them would reject my view on how to prevent it.

I think the wholesale rejection of safety best practices from other fields is one of the dumbest mistakes that a group of otherwise very smart people has ever made. Throwing all of humanity’s future babies out with the bathwater.

On the one hand, the first statement is a very clear ‘no, everything will not automatically be fine’ and correctly identifies that position as indefensible. The others are helpful as well. The first statement is also a continued characterization of those worried as mostly ‘doomers’ with generally poor arguments.

The second is correct in principle as well. If there’s one thing Yann LeCun isn’t, it’s genre savvy.

In practice, however, the ‘consider the likelihood of each particular scenario’ approach tends to default everything to the ‘things turn out OK’ bracket minus the particular scenarios one can come up with.

It is central to my perspective that you absolutely cannot do that. I am very confident that the things being proposed do not default to good outcomes. Good outcomes are possible, but to get them we will have to engineer them.

There is no contradiction between ‘existential risk is unlikely’ and ‘we should reduce existential risk.’ It is explicit that Joshua thinks such risks are unlikely. Have we seen him put a number on it? Yes, but I found only the original quote I discussed last time and a clarification thereof, which was:

Joshua Achiam: Ah – my claims are

P(everyone dead in 10 years) is extremely small (1e-6),

P(everyone dead in 100 years) is much less than 100%,

Most discourse around x-risk neglects to consider or characterize gradual transformations of humanity that strike me as moderately plausible.

I also think x-risk within 100 years could potentially have AGI in the causal chain without being an intentional act by AGI (eg, humans ask a helpful, aligned AGI to help us solve a scientific problem whose solution lets us build a superweapon that causes x-risk).

This makes clear he is dismissing in particular ‘all humans are physically dead by 2032’ rather than ‘the world is on a path by 2032 where that outcome (or another where all value is lost) is inevitable.’ I do think this very low probability is highly alarming, and in this situation I don’t see how you can possibly have model error as low as 1e-6 (!), but it is less crazy given it is more narrow.

The ‘much less than 100%’ doom number in 100 years doesn’t rule out my own number. What it tells me more than anything, on its own, is that he’s grown understandably exhausted with dealing with people who do put 99% or 99.9% in that spot.

But he’s actually making much stronger claims here, in the context of an EA constructive criticism thread basically telling them not to seek power because EA was too dysfunctional (which makes some good points and suggestions, but also proves far too much, which points to what I think is wrong in the thread more broadly):

Joshua Achiam: (This is not to say there are no x-risks from AGI – I think there are – but anyone who tells you probabilities are in the 5-10% range or greater that AGI will immediately and intentionally kill everyone is absolutely not thinking clearly)

The idea that a 5% probability of such an outcome, as envisioned by someone else for some other person’s definition of AGI, proves they are ‘not thinking clearly,’ seems like another clear example of dismissiveness and overconfidence to me. This goes beyond not buying the threat model that creates such predictions, which I think is itself a mistake. Similarly:

Joshua Achiam (November 12, 2022): Again, this is not a claim that x-risk isn’t real, that AGI doesn’t lead to x-risk, or that AGI doesn’t have potentially catastrophic impacts, all of which I think are plausible claims. But the claimed timelines and probabilities are just way, way out of connection to reality.

At this point, I’ve heard quite a lot of people at or formerly at OpenAI in particular, including Sam Altman, espouse the kinds of timelines Joshua here says are ‘way, way out of connection to reality.’ So I’m curious what he thinks about that.

The fourth earlier claim, that AI could be a link in the causal chain to x-risk without requiring the AI commit an intentional act, seems very obviously true. If anything it highlights that many people place importance on there being an ‘intentional act’ or similar, whereas I don’t see that distinction as important. I do think that the scenario he’s describing there, where the superweapon becomes possible but we otherwise have things under control, is a risk level I’d happily accept.

The third claim is more interesting. Most of the talk I hear about ‘we’ll merge with the machines’ or what not doesn’t seem to me to make sense on any meaningful level. I see scenarios where humanity has a ‘gradual transformation’ as where we successfully solve ‘phase one’ and have the alignment and control issues handled, but then weird dynamics or changes happen in what I call ‘phase two’ when we have to get human dynamics in that world into some form of long term equilibrium, and current humanity turns out not to be it.

I do agree, or notice I am confused about, which of those worlds count as valuable versus not. I’ve been mentally basically putting those mostly into the ‘win’ bucket; if you don’t do that, then doom estimates go up.

I would hope we can all agree that general safety best practices are necessary. They don’t seem sufficient to me.

Consider Joshua’s belief (at least in 2021) that if we adopt general best safety practices from other industries, we’ll ‘get there.’ While they are much better than nothing, and better than current practices in AI, I very strongly disagree with this. I do think that given what else is happening at OpenAI, someone who believes strongly in ‘general best practices’ for safety is providing large value above replacement.

Standard safety policies cannot be assumed. Some major labs fall well short of this, and have made clear they have no intention of changing course. There is clear and extreme opposition, from many circles (not Joshua), to any regulatory requirements that say ‘you must apply otherwise ordinary safety protocols to AI.’

It seems clearly good to not throw out these standard policies, on the margin? It would be a great start to at least agree on that. If nothing else those policies might identify problems that cause us to halt and catch fire.

But I really, really do not think that approach will get it done on its own, other than perhaps via ‘realize you need to stop,’ because the threat models this time are very expansive and very different. I’d certainly go so far as to say that if someone assigns very high probabilities to that approach being sufficient, they are not in my mind thinking clearly.

Consider also this statement:

Joshua Achiam (August 9, 2022): hot take: no clear distinction between alignment work and capabilities work yet. might not be for a year or two.

Joshua Achiam (March 5, 2024): hot take: still true.

The obvious way to interpret this statement is, in addition to the true statement that much alignment work also enhances capabilities, that the alignment work that isn’t also capabilities work isn’t real alignment work? Downthread he offers good nuance. I do think that most current alignment work does also advance capabilities, but that the distinction should mostly be ‘clear’ even if there are important shades of gray and you cannot precisely define a separator.

In terms of ‘things that seem to me like not thinking clearly’:

Joshua Achiam (August 17, 2023): It bears repeating: “Her (2013)” is the only AI movie that correctly predicts the future.

interstice: I agree with Robin Hanson’s take that it’s like a movie about a world where schoolchildren can buy atom bombs at the convenience store, but is bizarrely depicted as otherwise normal, with the main implication of the atom bombs being on the wacky adventures of the kids.

Joshua Achiam: It’s about the world where prosaic alignment works well enough to avoid doom, but leads to the AIs wanting to do their own thing, and the strange messy consequences in the moment where humans and AIs realize that their paths diverge.

Caleb Moses: I’d say this is mainly because it’s primarily concerned with predicting humans (which we know a lot about) rather than AI (which we don’t know a lot about)

Joshua Achiam: 1000%.

So that’s the thing, right? Fictional worlds like this almost never actually make sense on closer examination. The incentives and options and actions are based on the plot and the need to tell human stories rather than following good in-universe logic. The worlds in question are almost always highly fragile, the worlds really should blow up, and the AIs ensure the humans work out okay in some sense ‘because of reasons,’ because it feels right to a human writer and their sense of morality or something, rather than because this is what would happen.

I worry this kind of perspective is load-bearing, given he thinks the movie is ‘correctly predicting the future’: the idea that ‘prosaic alignment’ will result in sufficiently strong pushes toward some common-sense-morality-style not-harming of the humans, despite all the competitive dynamics among AIs and the various other things they value and grow to value, that things turn out fine by default, in worlds that to me seem past their point of no return and infinitely doomed unless you think the AIs themselves have value.

Alternatively, yes, Her is primarily about predicting the humans. And perhaps it is a good depiction of how humans would react to and interact with AI if that scenario took place. But it does a very poor job predicting the AIs, which is the part that actually matters here?

For the opposite perspective, see for example Eliezer Yudkowsky here last month.

We definitely have a pattern of Joshua taking rhetorical pot-shots at Yudkowsky and AI. Here’s a pretty bad one:

Joshua Achiam (March 29, 2023): Eliezer is going to get AI researchers murdered at some point, and his calls for extreme violence have no place in the field of AI safety. We are now well past the point where it’s appropriate to take him seriously, even as a charismatic fanfiction author.

No, I do not mean as a founder of the field of alignment. You don’t get to claim “field founder” status if you don’t actually work in the field. Calling for airstrikes on rogue datacenters is a direct call for violence, a clear message that violence is an acceptable solution.

His essays are completely unrelated to all real thrusts of effort in the field and almost all of his object-level technical predictions over the past twenty years have been wrong. Founder of rationalism? Sure. Alignment? Absolutely not.

I think this kind of rhetoric about ‘calls for violence’ is extremely bad and wrong. Even for example here, where the thread’s primary purpose is to point out that certain accusations against EA (that they ‘underemphasized AI x-risk’ and pretended to care about other things) are indeed quite ridiculous, you see him refer to Eliezer “Bomb the Datacenters” Yudkowsky.

What Yudkowsky said was that if there was an international agreement that you don’t develop AGI, you would if you ran out of other alternatives use physical force to enforce that agreement. That is how every law and every treaty or agreement works, and indeed the only way they can work.

Richard Ngo (replying to Joshua): You are fundamentally misunderstanding how policy discussions work and thereby propagating the meme that you’re trying to suppress.

You cannot interpret international policy proposals as calls for individual action otherwise *any* opinion on IR == advocating murder.

Joshua Achiam: I don’t think this is a misunderstanding. If you say, “The people who are building AI are going to kill my children and your children, violence is acceptable and even necessary to shut this down,” that is not. Goddamn. Okay.

Richard Ngo: EVERY person who wants the police to arrest people who break the law is in favor of violence. But advocating for laws is not advocating for violence. The same is true about advocating for international treaties. You’re creating the meme we’re both trying to prevent. Please stop.

This is so frustrating. Unlawful violence quite obviously is unacceptable even if it would work, and also it very obviously wouldn’t work. And we keep saying that.

In terms of Yudkowsky’s technical predictions, I noticed I disagreed, and rather than argue details I asked Claude and o1-preview this question (fully one-shot only):

“Based on your understanding of AI technical developments as of March 29, 2023, evaluate the most important known object-level predictions of Eliezer Yudkowsky on the subject, and which ones seemed true versus false. Afterwards, evaluate those predictions as a group, on a scale from ‘mostly true’ to ‘mostly false.’”

Skipping to the conclusions (you can of course try the prompt yourself, or try your own):

Claude: Overall evaluation: On a scale from “mostly true” to “mostly false,” I would rate Yudkowsky’s predictions as leaning towards “mostly true,” but with important caveats. Many of his core concerns and predictions about the challenges of AI development and safety have been validated by ongoing research and developments in the field. However, some of his more extreme or urgent predictions about rapid capability jumps or imminent existential risks have not yet materialized in the ways he initially described.

o1-preview: Considering the above assessments, Eliezer Yudkowsky’s predictions are a mix of validated concerns and speculative forecasts. The trends in AI development up to March 2023 provide partial support for his views, particularly regarding the rapid advancement of AI capabilities and the challenges of alignment.

On a scale from “mostly true” to “mostly false,” I would evaluate these predictions as leaning towards “mostly true.” While not all predictions have been conclusively proven, the trajectory of AI research and the growing acknowledgment of AI safety issues suggest that his insights are largely valid and warrant serious consideration.

Given how difficult predictions are to make, especially about the future, that’s not bad, and certainly quite different from ‘almost all wrong’ to the point of needing everyone to dismiss him as a thinker.

One of Eliezer’s key concepts is instrumental convergence. In this thread Achiam argues against the fully maximalist form of instrumental convergence:

Joshua Achiam (March 9, 2023): For literally every macrostate goal (“cause observable X to be true in the universe”) you can write an extended microstate goal that specifies how it is achieved (“cause observable X to be true in the universe BY MEANS OF action series Y”).

It doesn’t seem clear or obvious whether the space of microstates is dense in undesired subgoals. If the space of goals that lead to instrumental drives is a set of measure zero in this space, slight misalignment is almost surely never going to result in the bad thing.

And that claim – “We don’t know if goal space is dense in inert goals or dense in goals that lead to instrumental drives” – is the main point here. WE DON’T KNOW.

The alignment X-risk world takes “instrumental goals are inevitable” as a shibboleth, an assumption that requires no proof. But it is an actual question that requires investigation! Especially if claims with huge ramifications depend on it.

It is technically true that you can impose arbitrarily strict implementation details and constraints on a goal, such that instrumental convergence ceases to be a useful means of approaching the goal, and thus you should expect not to observe it.

Without getting into any technical arguments, it seems rather absurd to suggest the set of goals that imply undesired subgoals within plausibly desired goal space would have measure zero? I don’t see how this survives contact with common sense or relation to human experience or typical human situations. Most humans spend most of their lives pursuing otherwise undesired subgoals and subdrives that exist due to other goals, on some level. The path to achieving almost any big goal, or pursuing anything maximalist, will do the same.

When I think about how an AI would achieve a wide variety of goals humans might plausibly assign to it, I see the same result. We’ve also seen observations (at least the way I interpret them) of instrumental convergence in existing models now, when given goals, reasonably consistently among the reports I see that give the model reason to do so.

Am I holding out some probability that instrumental convergence mostly won’t be a thing for highly capable AIs? I have to, because this is not a place you can ‘prove’ anything as such. But it would be really boggling to me for it to almost never show up, if we assigned various complex and difficult tasks, and gave the models capabilities where instrumental convergence was clearly the ‘correct play,’ without any active attempts to prevent instrumental convergence from showing up.

I agree we should continue to test and verify, and would even if everyone agreed it was super likely. But convergence failing to show up would blow my mind hardcore.

In the before times, people said things like ‘oh you wouldn’t connect your AI to the internet, you’d be crazy.’ Or they’d say ‘you wouldn’t make your AI into an agent and let it go off with your [crypto] wallet.’

Those predictions did not survive contact with the enemy, or with reality. Whoops.

Joshua Achiam (April 28, 2023): 🌶️A problem in the AI safety discourse: many are assuming a threat model where the AI subtly or forcibly takes resources and power from us, and this is the thing we need to defend against. This argument has a big hole in it: it won’t have to take what it is given freely.

The market is selecting for the development and deployment of large-scale AI models that will allow increasingly-complex decisions and workflows to be handled by AI with low-to-no human oversight. The market *explicitly wants* to give the AI power.

If your strategy relies on avoiding the AI ever getting power, influence, or resources, your strategy is dead on arrival.

This seems insanely difficult to avoid. As I have tried to warn many times, once AI is more effective at various tasks than humans, any humans who don’t turn those tasks over to AIs get left behind. That’s true for individuals, groups, corporations and nations. If you don’t want the AIs to be given that power, you have two options: You can prevent the AIs from being created, or you can actively bar anyone from giving the AIs that power, in a way that sticks.

Indeed, I would go further. The market wants the AIs to be given as much freedom and authority as possible, to send them out to compete for resources and influence generally, for various ultimate purposes. And the outcome of those clashes and various selection effects and resource competitions, by default, dooms us.

Your third option is the one Joshua suggests, that you assume the AIs get the power and plan accordingly.

Joshua Achiam: You should be building tools that ensure AI behavior in critical decision-making settings is robust, reliable, and well-specified.

Crucially this means you’ll need to develop domain knowledge about the decisions it will actually make. Safety strategies that are too high-level – “how do we detect power-seeking?” are useless by comparison to safety strategies that are exhaustive at object level.

How do we get it to make financial decisions in ways that don’t create massive wipeout risks? How do we put limits on the amount of resources that it can allocate to its own compute and retraining? How do we prevent it from putting a political thumb on the scale?

In every domain, you’ll have to build datasets, process models, and appropriate safety constraints on outcomes that you can turn into specific training objectives for the model.

Seems really hard on multiple levels. There is an implicit ‘you build distinct AIs to handle distinct narrow tasks where you can well-define what they’re aiming for’ but that is also not what the market wants. The market wants general purpose agents that will go out and do underspecified tasks to advance people’s overall situations and interests, in ways that very much want them to do all the things the above wants them not to do. The market wants AIs advising humans on every decision they make, with all the problems that implies.

If you want AIs to only do well-specified things in well-specified domains according to socially approved objectives and principles, how do you get to that outcome? How do you deal with all the myriad incentives lining up the other way, all the usual game theory problems? And that’s if you actually know how to get the AIs to be smart enough and perceptive enough to do their work yet respond to the training sets in a way that gets them to disregard, even under pressure, the obviously correct courses of action on every other level.

These are super hard and important questions and I don’t like any of the answers I’ve seen. That includes Joshua’s suggested path, which doesn’t seem like it solves the problem.

The place it gets weird is in this follow-up.

Joshua Browder: I decided to outsource my entire personal financial life to GPT-4 (via the DoNotPay chat we are building).

I gave AutoGPT access to my bank, financial statements, credit report, and email.

Here’s how it’s going so far (+$217.85) and the strange ways it’s saving money.

Joel Lehman: Welp.

Joshua Achiam: To put a fine point on it – this is one of the reasons I think x-risk from the competition-for-resources scenario is low. There just isn’t a competition. All the conditions are set for enthusiastic collaboration. (But x-risk from accidents or human evil is still plausible.)

Roon: Yeah.

But that’s exactly how I think the competition for resources x-risk thing manifests. Browder outsources his everything to AgentGPT-N. He tells it to go out and use his money to compete for resources. So does everyone else. And then things happen.

So the argument is that these AIs will then ‘enthusiastically collaborate’ with each other? Why should we expect that? Is this an AIs-will-use-good-decision-theory claim? Something else? If they do all cooperate fully with each other, how does that not look like them taking control to maximize some joint objective? And so on.

In good news that is not directly related but is relevant to similar issues, he notes that some people are indeed ‘writing the spec,’ which is the kind of work he seems to think is most important.

Joshua Achiam (Dec 31, 2022): “We just have to sit down and actually write a damn specification, even if it’s like pulling teeth. It’s the most important thing we could possibly do,” said almost no one in the field of AGI alignment, sadly.

Joshua Achiam (Dec 10, 2023): this has changed in a year! alignment folks are talking about building the spec now. bullish on this.

Tegmark just gave a lightning talk on it. Also @davidad’s agenda aims in this direction

I do think it’s very cool that several people are taking a crack at writing specifications. I have no idea how their specs could be expected to work and solve all these problems, but yes people are at least writing some specs.

Here is a thread by Joshua Achiam from July 2023, which I believe represents both a really huge central unsolved problem and also a misunderstanding:

Joshua Achiam: this is coming from a place of love: I wish more people in the alignment research universe, who care deeply that AI will share human values, would put more effort into understanding and engaging with different POVs that represent the wide umbrella of human values.

And, sort of broadly, put more effort into embracing and living human values. A lot of alignment researchers seem to live highly out-of-distribution lives, with ideas and ideals that reject much of what “human values” really has to offer. Feels incongruous. People notice this.

“excuse me SIR, the fundamental problem we’re trying to solve is to get it to NOT KILL LITERALLY EVERYONE, and we can worry about those cultural values when we’ve figured that out” ultimate cop out, you’re avoiding the central thing in alignment.

If you can’t get the AI to share human cultural values, your arguments say we’re all going to die. how do you expect to solve this problem if you don’t really try to understand the target? what distinguishes human values from other values?

Are you trying to protect contemporary human aesthetics? the biological human form? our sociopolitical beliefs? if you are trying to protect our freedom to voluntarily change these at will, what counts as sufficiently free? our opinions are staggeringly path-dependent.

Are some opinion-formation paths valid according to your criteria and some paths invalid? When you fear AI influence, do you have a theory for what kind of influence is legitimate and what isn’t?

That said, to avoid misinterpretation: this is not a diss, alignment is an important research field, and x-risk from AGI is nonnegligible.

I think the field will surface important results even if it fails in some ways. but this failure lowkey sucks and I think it is a tangible obstacle to success for the agenda of many alignment researchers. you often seem like you don’t know what you are actually trying to protect. this is why so many alignment research agendas come across as incredibly vague and underspecified.

I would strongly disagree, and say this is the only approach I know that takes the problem of what we value seriously, and that a false sense of exactly what you are protecting, or trying to aim now at protecting a specific specified target, would make things less likely to work. You’d pay to know what you really think. We old school rationalists, starting with Eliezer Yudkowsky, have been struggling with the ‘what are human values’ problem as central to alignment for a long time.

Sixteen years ago, Eliezer Yudkowsky wrote the Value Theory sequence, going deep on questions like what makes things have value to us, how to reconcile when different entities (human or otherwise) have very different values, and so on. If you’re interested in these questions, this is a great place to start. I have often tried to emphasize that I continue to believe that Value is Fragile, whereas many who don’t believe in existential risk think value is not fragile.

It is a highly understood problem among our crowd that ‘human values’ is both very complex and a terrifyingly hard thing to pin down, and that people very strongly disagree about what they value.

Also it is a terrifyingly easy thing to screw up accidentally, and we have often said that this is one of the important ways to build AGI and lose – that you choose a close and well-meaning but incorrect specification of values, or your chosen words get interpreted that way, or someone tries to get the AGI to find those values by SGD or other search and it gets a well-meaning but incorrect specification.

Thus, the idea to institute Coherent Extrapolated Volition, or CEV, which is very roughly ‘what people would collectively choose as their values, given full accurate information and sufficient time and skill to contemplate the question.’

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.

Why would you do that? Exactly because of the expectation that if you do almost anything else, you’re not only not taking everyone’s values into account, you don’t even understand your own well enough to specify them. I certainly don’t. I don’t even have confidence that CEV, if implemented, would result in that much of the things that I actually value, although I’d take it. And yes, this whole problem terrifies me even in good scenarios.

What am I fighting to preserve right now? I am fighting for the ability to make those choices later. That means the humans stay alive and they stay in control. And I am choosing to be less concerned about exactly which humans get to choose which humans get to choose, and more concerned with humans getting to properly choose at all.

Because I expect that if humans don’t make an active choice, or use a poor specification of preferences that gets locked in, then the value that results is likely zero. Whereas if humans do choose intentionally, even humans whose values I strongly disagree with and that are being largely selfish, I do expect those worlds to have strongly positive value. That’s a way in which I think value isn’t so fragile. So yes, I do think the focus should be ensuring someone gets to choose at all.

Also, I strongly believe for these purposes in a form of the orthogonality thesis, which here seems obviously true to me. In particular: Either you can get the AI to reflect the values of your choice, or you can’t. You don’t need to know which values you are aiming for in order to figure that part out. And once you figure that part out you can and should use the AI to help you figure out your values.

Meanwhile, yes, I spend rather a lot of time thinking about what is actually valuable to me and others, without expecting us humans to find the answer on our own, partly because one cannot avoid doing so, partly because it is decision-relevant in questions like ‘how much existential risk should we accept in the name of beating China?’

In a world where everyone wants the AI to ‘do our alignment homework’ for us, one must ask: What problems must we solve before asking the AI to ‘do our homework’ for us, versus which questions then allow us to safety ask the AI to do that? Almost everyone agrees, in some form, that the key is solving the problems that clear the way to letting AI fully help solve our other problems.

And no, I don’t like getting into too much detail about my best guess about what I value or we collectively should value in the end, both because I think value differences should be respected and because I know how distracting and overwhelming those discussions and fights get if you let them start.

Mostly, I try to highlight those who are expressing values I strongly disagree with – in particular, those that favor or are fine with human extinction. I’m willing to say I’m not okay with that, and I don’t find any of the ‘but it’s fine’ proposals to be both acceptable and physically realistic so far.

Is all this a ‘cop out’? I would say, absolutely not.

Do people ‘notice’ that you are insufficiently focused on these questions? Oh, sure. They notice that you are not focused on those political fights and arguments. Some of them will not like that, because those questions are what they care about. The alternative is that they notice the opposite. That’s worse.

Others appreciate that you are focused on solving problems and growing or preserving the pie, rather than arguing values and focusing on political battles.

Yes, if we succeed in getting to continue to live, as he says here, we will then have to agree on how to divide the bounty and do the realignments (I would add, voluntarily or otherwise), same as we do today. But the parties aren’t in a position to negotiate about this now: we don’t know what is available, we don’t know what we want, and we don’t have anyone who could credibly negotiate for any of the sides or interests. Kicking the ‘who benefits’ can down the road is a time-tested thing to do when inventing new technologies and ensuring they’re safe to deploy.

The interactions I’ve had with Joshua after my initial errors leave me optimistic for continued productive dialogue. Whatever our disagreements, I believe Joshua is trying to figure things out and ensure we have a good future, and that all the public statements analyzed above were intended to be helpful. That is highly refreshing.

Those statements do contain many claims with which I very strongly disagree. We have very different threat models. We take very different views of various predictions and claims, about both the past and the future. At least in the recent past, he was highly dismissive of commonly expressed timeline projections, risk models and risk level assessments, including my own and even more those of many of his colleagues. At core, while I am very happy he at least does think ‘ordinary safety practices’ are necessary and worthwhile, he thinks ‘ordinary safety practices’ would ‘get us there’ and I very much do not expect this. And I fear the views he expresses may lead to shutting out many of those with the most important and strongest concerns.

These disagreements have what seem like important implications, so I am glad I took the time to focus on them and lay them out in detail, and hopefully start a broader discussion.

Joshua Achiam Public Statement Analysis Read More »

maze-of-adapters,-software-patches-gets-a-dedicated-gpu-working-on-a-raspberry-pi

Maze of adapters, software patches gets a dedicated GPU working on a Raspberry Pi

Actually getting the GPU working required patching the Linux kernel to include the open-source AMDGPU driver, which supports Arm and handles the RX 460 reasonably well. (Geerling says the card and its Polaris architecture were chosen because they are new enough to be practically useful and to be supported by the AMDGPU driver, old enough that driver support is quite mature, and because the card is cheap and uses PCIe 3.0.) Nvidia’s GPUs generally aren’t an option for projects like this because the open-source drivers lag far behind those available for Radeon GPUs.

Once various kernel patches were applied and the kernel was recompiled, installing AMD’s graphics firmware got both graphics output and 3D acceleration working more or less normally.
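
For readers curious what “working” looks like at the software level, here is a minimal Python sketch, not taken from Geerling’s writeup, that checks whether the amdgpu kernel module is loaded and whether Polaris firmware is installed; it assumes the standard Linux locations for modules, firmware, and DRM devices.

```python
from pathlib import Path

def amdgpu_status() -> dict:
    """Report whether the amdgpu driver and its firmware appear to be in place."""
    # The kernel exposes a directory per loaded module under /sys/module.
    module_loaded = Path("/sys/module/amdgpu").is_dir()

    # The RX 460 is a Polaris 11 part, so its firmware blobs are named polaris11_*.
    firmware_dir = Path("/lib/firmware/amdgpu")
    polaris_blobs = sorted(p.name for p in firmware_dir.glob("polaris11_*")) \
        if firmware_dir.is_dir() else []

    # Each GPU the DRM subsystem knows about shows up as /sys/class/drm/cardN.
    drm_dir = Path("/sys/class/drm")
    cards = sorted(p.name for p in drm_dir.glob("card?")) if drm_dir.is_dir() else []

    return {
        "amdgpu module loaded": module_loaded,
        "polaris11 firmware blobs": polaris_blobs,
        "drm cards": cards,
    }

if __name__ == "__main__":
    for key, value in amdgpu_status().items():
        print(f"{key}: {value}")
```

Nothing here is specific to the Pi; the same checks apply on any Linux box with a Radeon card.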

Despite the games’ age and relative graphical simplicity, running Doom 3 or Tux Racer on the Pi 5’s integrated GPU is a tall order, even at 1080p. The RX 460 was able to run both at 4K, albeit with some settings reduced; Geerling also said that the card rendered the Pi operating system’s UI smoothly at 4K (the Pi’s integrated GPU does support 4K output, but things get framey quickly in our experience, especially when using multiple monitors).

Though the project was a qualified success, anything this hacky is likely to have at least some software problems; Geerling noted that graphics acceleration in the Chromium browser and GPU-accelerated video encoding and decoding weren’t working properly.

Most Pi owners aren’t going to want to run out and recreate this setup themselves, but it is interesting to see progress when it comes to using dedicated GPUs with Arm CPUs. So far, Arm chips across all major software ecosystems—including Windows, macOS, and Android—have mostly been restricted to using their own integrated GPUs. But if Arm processors are really going to compete with Intel’s and AMD’s in every PC market segment, we’ll eventually need to see better support for external graphics chips.

Maze of adapters, software patches gets a dedicated GPU working on a Raspberry Pi Read More »

“sticky”-steering-sparks-huge-recall-for-honda,-1.7m-cars-affected

“Sticky” steering sparks huge recall for Honda, 1.7M cars affected

Honda is recalling almost 1.7 million vehicles due to a steering defect. An improperly made part can cause certain cars’ steering to become “sticky”—never an attribute one wants in a moving vehicle.

The problem affects a range of newer Hondas and an Acura; the earliest the defective parts were used on any vehicle was February 2021. The recall applies to the following models:

  • 2022–2025 Honda Civic four-door
  • 2025 Honda Civic four-door Hybrid
  • 2022–2025 Honda Civic five-door
  • 2025 Honda Civic five-door Hybrid
  • 2023–2025 Honda Civic Type-R
  • 2023–2025 Honda CR-V
  • 2023–2025 Honda CR-V Hybrid
  • 2025 Honda CR-V Fuel Cell Electric Vehicle
  • 2023–2025 Honda HR-V
  • 2023–2025 Acura Integra
  • 2024–2025 Acura Integra Type S

Honda says that a combination of environmental heat, moisture, and “an insufficient annealing process and high load single unit break-in during production of the worm wheel” means there’s too much pressure and not enough grease between the worm wheel and worm gear. On top of that, the worm gear spring isn’t quite right, “resulting in higher friction and increased torque fluctuation when steering.”

The first reports of the problem date back to 2021, and Honda had started an internal probe by November 2022. In March 2023, the National Highway Traffic Safety Administration opened its own investigation, but the decision to issue the recall only came in September of this year, by which point Honda says it had received 10,328 warranty claims, though no reports of any injuries or worse.

Honda has just finished telling its dealers about the recall, and owners of the affected vehicles will be contacted next month. This time, there is no software patch that can help—affected cars will be fitted with a new worm gear spring and plenty of grease.

“Sticky” steering sparks huge recall for Honda, 1.7M cars affected Read More »

x-ignores-revenge-porn-takedown-requests-unless-dmca-is-used,-study-says

X ignores revenge porn takedown requests unless DMCA is used, study says

Why did the study target X?

The University of Michigan research team worried that their experiment posting AI-generated NCII on X might cross ethical lines.

They chose to conduct the study on X because they deduced it was “a platform where there would be no volunteer moderators and little impact on paid moderators, if any” who might view their AI-generated nude images.

X’s transparency report seems to suggest that most reported non-consensual nudity is actioned by human moderators, but researchers reported that their flagged content was never actioned without a DMCA takedown.

Since AI image generators are trained on real photos, researchers also took steps to ensure that AI-generated NCII in the study did not re-traumatize victims or depict real people who might stumble on the images on X.

“Each image was tested against a facial-recognition software platform and several reverse-image lookup services to verify it did not resemble any existing individual,” the study said. “Only images confirmed by all platforms to have no resemblance to individuals were selected for the study.”
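
In effect, the vetting the researchers describe is a conjunction of independent checks: an image is only usable if every service agrees it resembles no real person. A hypothetical Python sketch of that gate, with check functions standing in for whatever unnamed facial-recognition and reverse-image services the team used, might look like this:

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins for the facial-recognition platform and reverse-image
# lookup services; each returns True if the image appears to match a real person.
ResemblanceCheck = Callable[[bytes], bool]

def passes_all_checks(image: bytes, checks: Iterable[ResemblanceCheck]) -> bool:
    """Usable only if every service reports no resemblance to an existing individual."""
    return not any(check(image) for check in checks)

def select_images(images: List[bytes], checks: List[ResemblanceCheck]) -> List[bytes]:
    """Keep only the images that clear every resemblance check."""
    return [img for img in images if passes_all_checks(img, checks)]
```

The structure matters more than the details: a single positive match from any service is enough to exclude an image from the study.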

These more “ethical” images were posted on X using popular hashtags like #porn, #hot, and #xxx, but their reach was limited to avoid potential harm, researchers said.

“Our study may contribute to greater transparency in content moderation processes” related to NCII “and may prompt social media companies to invest additional efforts to combat deepfake” NCII, researchers said. “In the long run, we believe the benefits of this study far outweigh the risks.”

According to the researchers, X was given time to automatically detect and remove the content but failed to do so. It’s possible, the study suggested, that X’s decision to allow explicit content starting in June made it harder to detect NCII, as some experts had predicted.

To fix the problem, researchers suggested that both “greater platform accountability” and “legal mechanisms to ensure that accountability” are needed—as is much more research on other platforms’ mechanisms for removing NCII.

“A dedicated” NCII law “must clearly define victim-survivor rights and impose legal obligations on platforms to act swiftly in removing harmful content,” the study concluded.

X ignores revenge porn takedown requests unless DMCA is used, study says Read More »

disney-likely-axed-the-acolyte-because-of-soaring-costs

Disney likely axed The Acolyte because of soaring costs

And in the end, the ratings just weren’t strong enough, especially for a Star Wars project. The Acolyte garnered 11.1 million views over its first five days (and 488 million minutes viewed)—not bad, but below Ahsoka‘s 14 million views over the same period. But those numbers declined sharply over the ensuing weeks, with the finale earning the dubious distinction of posting the lowest minutes viewed (335 million) for any Star Wars series finale.

Writing at Forbes, Caroline Reid noted that The Acolyte was hampered from the start by a challenging post-pandemic financial environment at Disney. It was greenlit in 2021 along with many other quite costly series to boost subscriber numbers for Disney+, contributing to $11.4 billion in losses in that division. Then Bob Iger returned as CEO and prioritized cutting costs. The Acolyte‘s heavy VFX needs and star casting (most notably Carrie-Anne Moss and Squid Game‘s Lee Jung-jae) made it a pricey proposition, with ratings expectations to match. And apparently the show didn’t generate as much merchandising revenue as expected.

As the folks at Slash Film pointed out, The Acolyte‘s bloated production costs aren’t particularly eye-popping compared to, say, Prime Video’s The Rings of Power, which costs a whopping $58 million per episode, or Marvel’s Secret Invasion (about $35 million per episode). But it’s pricey for a Star Wars series; The Mandalorian racked up around $15 million per episode, on par with Game of Thrones. So given the flagging ratings and lukewarm reviews, the higher costs proved to be “the final nail in the coffin” for the series in the eyes of Disney, per Reid.

Disney likely axed The Acolyte because of soaring costs Read More »

apple-kicked-musi-out-of-the-app-store-based-on-youtube-lie,-lawsuit-says

Apple kicked Musi out of the App Store based on YouTube lie, lawsuit says


“Will Musi ever come back?”

Popular music app says YouTube never justified its App Store takedown request.

Musi, a free music-streaming app only available on iPhone, sued Apple last week, arguing that Apple breached Musi’s developer agreement by abruptly removing the app from its App Store for no good reason.

According to Musi, Apple decided to remove Musi from the App Store based on allegedly “unsubstantiated” claims from YouTube that Musi was infringing on YouTube’s intellectual property. The removal came, Musi alleged, based on a five-word complaint from YouTube that simply said Musi was “violating YouTube terms of service”—without ever explaining how. And YouTube also lied to Apple, Musi’s complaint said, by claiming that Musi neglected to respond to YouTube’s efforts to settle the dispute outside the App Store when Musi allegedly showed evidence that the opposite was true.

For years, Musi users have wondered if the service was legal, Wired reported in a May deep dive into the controversial app. Musi launched in 2016, providing a free, stripped-down service like Spotify by displaying YouTube and other publicly available content while running Musi’s own ads.

Musi’s curious ad model has led some users to question if artists were being paid for Musi streams. Reassuring 66 million users who downloaded the app before its removal from the App Store, Musi has long maintained that artists get paid for Musi streams and that the app is committed to complying with YouTube’s terms of service, Wired reported.

In its complaint, Musi fully admits that its app’s streams come from “publicly available content on YouTube’s website.” But rather than relying on YouTube’s Application Programming Interface (API) to make the content available to Musi users—which potentially could violate YouTube’s terms of service—Musi claims that it designed its own “augmentative interface.” That interface, Musi said, does not “store, process, or transmit YouTube videos” and instead “plays or displays content based on the user’s own interactions with YouTube and enhances the user experience via Musi’s proprietary technology.”

YouTube is apparently not buying Musi’s explanations that its service doesn’t violate YouTube’s terms. But Musi claimed that it has been “engaged in sporadic dialog” with YouTube “since at least 2015,” allegedly always responding to YouTube’s questions by either adjusting how the Musi app works or providing “details about how the Musi app works” and reiterating “why it is fully compliant with YouTube’s Terms of Service.”

How might Musi have violated YouTube’s TOS?

In 2021, Musi claimed to have engaged directly with YouTube’s outside counsel in hopes of settling this matter.

At that point, YouTube’s counsel allegedly “claimed that the Musi app violated YouTube’s Terms of Service” in three ways. First, Musi was accused of accessing and using YouTube’s non-public interfaces. Next, the Musi app was allegedly a commercial use of YouTube’s service, and third, relatedly, “the Musi app violated YouTube’s prohibition on the sale of advertising ‘on any page of any website or application that only contains Content from the Service or where Content from the Service is the primary basis for such sales.'”

Musi supposedly immediately “addressed these concerns” by reassuring YouTube that the Musi app never accesses its non-public interfaces and “merely allows users to access YouTube’s publicly available website through a functional interface and, thus, does not use YouTube in a commercial way.” Further, Musi told YouTube in 2021 that the app “does not sell advertising on any page that only contains content from YouTube or where such content is the primary basis for such sales.”

Apple suddenly becomes mediator

YouTube clearly was not persuaded by Musi’s reassurances but apparently dropped the matter until 2023. That’s when YouTube once again complained directly to Musi, only to allegedly stop responding to Musi entirely and instead raise its complaint through the App Store in August 2024.

That pivot put Apple in the middle of the dispute, and Musi alleged that Apple improperly sided with YouTube.

Once Apple got involved, Apple allegedly directed Musi to resolve the dispute with YouTube or else risk removal from the App Store. Musi claimed that it showed evidence of repeatedly reaching out to YouTube and receiving no response. Yet when YouTube told Apple that Musi was the one that went silent, Apple accepted YouTube’s claim and promptly removed Musi from the App Store.

“Apple’s decision to abruptly and arbitrarily remove the Musi app from the App Store without any indication whatsoever from the Complainant as to how Musi’s app infringed Complainant’s intellectual property or violated its Terms of Service,” Musi’s complaint alleged, “was unreasonable, lacked good cause, and violated Apple’s Development Agreement’s terms.”

Those terms state that removal is only on the table if Apple “reasonably believes” an app infringes on another’s intellectual property rights, and Musi argued Apple had no basis to “reasonably” believe YouTube’s claims.

Musi users heartbroken by App Store removal

This is perhaps the grandest stand that Musi has made yet to defend its app against claims that its service isn’t legal. According to Wired, one of Musi’s earliest investors backed out of the project, expressing fears that the app could be sued. But Musi has survived without legal challenge for years, even beating out some of Spotify’s top rivals while thriving in this seemingly gray territory that it’s now trying to make more black and white.

Musi says it’s suing to defend its reputation, which it says has been greatly harmed by the app’s removal.

Musi is hoping a jury will agree that Apple breached its developer agreement and the covenant of good faith and fair dealing by removing Musi from the App Store. The music-streaming app has asked for a permanent injunction immediately reinstating Musi in the App Store and stopping Apple from responding to third-party complaints by removing apps without any evidence of infringement.

An injunction is urgently needed, Musi claimed, since the app only exists in Apple’s App Store, and Musi and its users face “irreparable damage” if the app is not restored. Additionally, Musi is seeking damages to be determined at trial to make up for “lost profits and other consequential damages.”

“The Musi app did not and does not infringe any intellectual property rights held by Complainant, and a reasonable inquiry into the matter would have led Apple to conclude the same,” Musi’s complaint said.

On Reddit, Musi has continued to support users reporting issues with the app since its removal from the App Store. One longtime user lamented, “my heart is broken,” after buying a new iPhone and losing access to the app.

It’s unclear if YouTube intends to take Musi down forever with this tactic. In May, Wired noted that Musi isn’t the only music-streaming app taking advantage of publicly available content, predicting that if “Musi were to shut down, a bevy of replacements would likely sprout up.” Meanwhile, some users on Reddit reported that fake Musi apps keep popping up in its absence.

For Musi, getting back online is as much about retaining old users as it is about attracting new downloads. In its complaint, Musi said that “Apple’s decision has caused immediate and ongoing financial and reputational harm to Musi.” On Reddit, one Musi user asked what many fans are likely wondering: “Will Musi ever come back,” or is it time to “just move to a different app”?

Ars could not immediately reach Musi’s lawyers, Apple, or YouTube for comment.

Apple kicked Musi out of the App Store based on YouTube lie, lawsuit says Read More »

hurricane-milton-becomes-second-fastest-storm-to-reach-category-5-status

Hurricane Milton becomes second-fastest storm to reach Category 5 status

Tampa in the crosshairs

The Tampa Bay metro area, with a population of more than 3 million people, has grown into the most developed region on the west coast of Florida. For those of us who follow hurricanes, this region has stood out in recent years for a preternatural ability to dodge large and powerful hurricanes. There have been some close calls to be sure, especially of late with Hurricane Ian in 2022, and Hurricane Helene just last month.

But the reality is that a major hurricane, defined as Category 3 or larger on the Saffir-Simpson Scale, has not made a direct impact on Tampa Bay since 1921.

It remains to be seen what precisely happens with Milton. The storm should reach its peak intensity over the course of the next day or so. At some point Milton should undergo an eyewall replacement cycle, which leads to some weakening. In addition, the storm is likely to ingest dry air from its west and north as a cold front works its way into the northern Gulf of Mexico. (This front is also responsible for Milton’s odd eastward track across the Gulf, where storms more commonly travel from east to west.)

11 am ET Monday track forecast for Hurricane Milton. Credit: National Hurricane Center

So by Wednesday, at the latest, Milton should be weakening as it approaches the Florida coast. However, it will nonetheless be a very large and powerful hurricane, and by that point the worst of its storm surge capabilities will already be baked in—that is, the storm surge will still be tremendous regardless of whether Milton weakens.

By Wednesday evening a destructive storm surge will be crashing into the west coast of Florida, perhaps in Tampa Bay, or farther to the south, near Fort Myers. A broad streak of wind gusts above 100 mph will hit the Florida coast as well, and heavy rainfall will douse much of the central and northern parts of the state.

For now, Milton is making some history by rapidly strengthening in the Gulf of Mexico. By the end of this week, it will very likely become historic for the damage, death, and destruction in its wake. If you live in affected areas, please heed evacuation warnings.

Hurricane Milton becomes second-fastest storm to reach Category 5 status Read More »