Author name: Ryan Harris


SpaceX launches Europe’s Hera asteroid mission ahead of Hurricane Milton


The launch of another important mission, NASA’s Europa Clipper, is on hold due to Hurricane Milton.

The European Space Agency’s Hera spacecraft flies away from the Falcon 9 rocket’s upper stage a little more than an hour after liftoff Monday. Credit: SpaceX

Two years ago, a NASA spacecraft smashed into a small asteroid millions of miles from Earth to test a technique that could one day prove useful to deflect an object off a collision course with Earth. The European Space Agency launched a follow-up mission Monday to go back to the crash site and see the damage done.

The nearly $400 million (363 million euro) Hera mission, named for the Greek goddess of marriage, will investigate the aftermath of a cosmic collision between NASA’s DART spacecraft and the skyscraper-size asteroid Dimorphos on September 26, 2022. NASA’s Double Asteroid Redirection Test mission was the first planetary defense experiment, and it worked, successfully nudging Dimorphos off its regular orbit around a larger companion asteroid named Didymos.

But NASA had to sacrifice the DART spacecraft in the deflection experiment. Its destruction meant there were no detailed images of the condition of the target asteroid after the impact. A small Italian CubeSat deployed by DART as it approached Dimorphos captured fuzzy long-range views of the collision, but Hera will perform a comprehensive survey when it arrives in late 2026.

“We are going to have a surprise to see what Dimorphos looks like, which is, first, scientifically exciting, but also important because if we want to validate the technique and validate the model that can reproduce the impact, we need to know the final outcome,” said Patrick Michel, principal investigator on the Hera mission from Côte d’Azur Observatory in Nice, France. “And we don’t have it. With Hera, it’s like a detective going back to the crime scene and telling us what really happened.”

Last ride before the storm

The Hera spacecraft, weighing in at 2,442 pounds (1,108 kilograms), lifted off on top of a SpaceX Falcon 9 rocket at 10:52 am EDT (14:52 UTC) Monday from Cape Canaveral Space Force Station, Florida.

Officials weren’t sure the weather conditions at Cape Canaveral would permit a launch Monday, with widespread rain showers and a blanket of cloud cover hanging over Florida’s Space Coast. But the conditions were just good enough to be acceptable for a rocket launch, and the Falcon 9 lit its nine kerosene-fueled engines to climb away from pad 40 after a smooth countdown.

SpaceX’s Falcon 9 rocket lifts off from Cape Canaveral Space Force Station, Florida, with ESA’s Hera mission. Credit: SpaceX

This was probably the final opportunity to launch Hera before the spaceport shutters in advance of Hurricane Milton, a dangerous Category 5 storm taking aim at the west coast of Florida. If the mission didn’t launch Monday, SpaceX was prepared to move the Falcon 9 rocket and the Hera spacecraft back inside a hangar for safekeeping until the storm passes.

Meanwhile, at NASA’s Kennedy Space Center a few miles away, SpaceX is securing a Falcon Heavy rocket with the Europa Clipper spacecraft to ride out Hurricane Milton inside a hangar at Launch Complex 39A. Europa Clipper is a $5.2 billion flagship mission to explore Jupiter’s most enigmatic icy moon, and it was supposed to launch Thursday, the same day Hurricane Milton will potentially move over Central Florida.

NASA announced Sunday that it is postponing Europa Clipper’s launch until after the storm.

“The safety of launch team personnel is our highest priority, and all precautions will be taken to protect the Europa Clipper spacecraft,” said Tim Dunn, senior launch director at NASA’s Launch Services Program. “Once we have the ‘all-clear’ followed by facility assessment and any recovery actions, we will determine the next launch opportunity for this NASA flagship mission.”

Europa Clipper must launch by November 6 in order to reach Jupiter and its moon Europa in 2030. ESA’s Hera mission had a similarly tight window to get off the ground in October and arrive at asteroids Didymos and Dimorphos in December 2026.

Returning to flight

The Falcon 9 did its job Monday, accelerating the Hera spacecraft to a blistering speed of 26,745 mph (43,042 km/h) with successive burns by its first stage booster and upper stage engine. This was the highest-speed payload injection ever achieved by SpaceX.

SpaceX did not attempt to recover the Falcon 9’s reusable booster on Monday’s flight because Hera needed all of the rocket’s oomph to gain enough speed to escape the pull of Earth’s gravity.
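
As a rough sanity check (my own back-of-the-envelope arithmetic, not a figure from SpaceX or ESA), the injection speed quoted above can be compared against Earth’s escape velocity of roughly 11.2 km/s:

```python
# Back-of-the-envelope check (not from the article): compare Hera's reported
# injection speed with Earth's escape velocity to show why the booster had to
# spend everything on speed rather than save propellant for a landing.

MPH_TO_MS = 0.44704  # meters per second per mile per hour

injection_speed_ms = 26_745 * MPH_TO_MS  # ~11,960 m/s, the speed quoted above
earth_escape_ms = 11_186                 # escape velocity at Earth's surface, m/s

print(f"Injection speed:       {injection_speed_ms:,.0f} m/s")
print(f"Earth escape velocity: {earth_escape_ms:,} m/s")
print(f"Exceeds escape speed:  {injection_speed_ms > earth_escape_ms}")
```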

“Good launch, good orbit, and good payload deploy,” wrote Kiko Dontchev, SpaceX’s vice president of launch, on X.

This was the first Falcon 9 launch in nine days—an unusually long gap between SpaceX missions—after the rocket’s upper stage misfired during a maneuver to steer itself out of orbit following an otherwise successful launch September 28 with a two-man crew heading for the International Space Station.

The upper stage engine apparently “over-burned,” and the rocket debris fell into the atmosphere short of its expected reentry corridor in the Pacific Ocean, sources said. The Federal Aviation Administration grounded the Falcon 9 rocket while SpaceX investigates the malfunction, but the FAA granted approval for SpaceX to launch the Hera mission because its trajectory would carry the rocket away from Earth, rather than back into the atmosphere for reentry.

“The FAA has determined that the absence of a second stage reentry for this mission adequately mitigates the primary risk to the public in the event of a reoccurrence of the mishap experienced with the Crew-9 mission,” the FAA said in a statement.

Members of the Hera team from ESA and its German prime contractor, OHB, pose with the spacecraft inside SpaceX’s payload processing facility in Florida. Credit: SpaceX

This was the third time in less than three months that the FAA has grounded SpaceX’s Falcon 9 rocket fleet, following another upper stage failure in July that destroyed 20 Starlink Internet satellites and the crash-landing of a Falcon 9 booster on an offshore drone ship in August. Federal regulators are responsible for ensuring commercial rocket launches don’t endanger the public.

These were the first major anomalies on any Falcon 9 launch since 2021.

It’s not clear when the FAA will clear SpaceX to resume launching other Falcon 9 missions. However, the launch of the Europa Clipper mission on a Falcon Heavy rocket, which uses essentially the same upper stage as a Falcon 9, is not licensed by the FAA because it is managed by NASA, another government agency. NASA will have final authority on whether to give the green light for the launch of Europa Clipper.

Surveying the damage

ESA’s Hera spacecraft is on course for a flyby of Mars next March to take advantage of the red planet’s gravity to slingshot itself on a trajectory to intercept its twin target asteroids. Near Mars, Hera will zoom relatively close to the planet’s asteroid-like moon, Deimos, to obtain rare closeups.

Then, Hera will approach Didymos and Dimorphos a little more than two years from now, maneuvering around the binary asteroid system at a range of distances, eventually moving as close as about 0.6 miles (1 kilometer) away.

Italy’s LICIACube spacecraft snapped this image of asteroids Didymos (lower left) and Dimorphos (upper right) a few minutes after the impact of DART on September 26, 2022. Credit: ASI/NASA

Dimorphos orbits Didymos once every 11 hours and 23 minutes, roughly 32 minutes shorter than its orbital period before DART’s impact in 2022. This change in orbit demonstrated the effectiveness of a kinetic impactor as a technique for deflecting an asteroid that might one day threaten Earth.
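
A quick calculation from those numbers (my own arithmetic; the only figures used are the ones quoted above) shows the size of the nudge DART delivered:

```python
# Illustrative arithmetic using only the figures quoted in the article.

post_impact_min = 11 * 60 + 23  # current orbital period: 11 hours, 23 minutes
change_min = 32                  # DART shortened the period by roughly 32 minutes
pre_impact_min = post_impact_min + change_min

print(f"Pre-impact period:  {pre_impact_min // 60} h {pre_impact_min % 60} min")
print(f"Post-impact period: {post_impact_min // 60} h {post_impact_min % 60} min")
print(f"Period shortened by about {100 * change_min / pre_impact_min:.1f}%")
```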

Dimorphos, the smaller of the two asteroids, has a diameter of around 500 feet (150 meters), while Didymos measures approximately a half-mile (780 meters) wide. Neither asteroid poses a risk to Earth, so NASA chose them as the target for DART.

The Hubble Space Telescope spotted a debris field trailing the binary asteroid system after DART’s impact. Astronomers identified at least 37 boulders drifting away from the asteroids, material ejected when the DART spacecraft slammed into Dimorphos at a velocity of 14,000 mph (22,500 km/h).

Scientists will use Hera, with its suite of cameras and instruments, to study how the strike by DART changed the asteroid Dimorphos. Did the impact leave a crater, or did it reshape the entire asteroid? There are “tentative hints” that the asteroid’s shape changed after the collision, according to Michael Kueppers, Hera’s project scientist at ESA.

“If this is the case, it would also mean that the cohesion of Dimorphos is extremely low; that indeed, even an object the size of Dimorphos would be held together by its weight, by its gravity, and not by cohesion,” Kueppers said. “So it really would be a rubble pile.”

Hera will also measure the mass of Dimorphos, something DART was unable to do. “That is important to measure the efficiency of the impact… which was the momentum that was transferred from the impacting satellite to the asteroid,” Kueppers said.

This NASA/ESA Hubble Space Telescope image of the asteroid Dimorphos was taken on December 19, 2022, nearly three months after the asteroid was impacted by NASA’s DART mission. Hubble’s sensitivity reveals a few dozen boulders knocked off the asteroid by the force of the collision. Credit: NASA, ESA, D. Jewitt (UCLA)

The central goal of Hera is to fill the gaps in knowledge about Didymos and Dimorphos. Precise measurements of DART’s momentum, coupled with a better understanding of the interior structure of the asteroids, will allow future mission planners to know how best to deflect a hazardous object threatening Earth.

“The third part is to generally investigate the two asteroids to know their physical properties, their interior properties, their strength, essentially to be able to extrapolate or to scale the outcome of DART to another impact should we really need it one day,” Kueppers said.

Hera will release two briefcase-size CubeSats, named Juventas and Milani, to work in concert with ESA’s mothership. Juventas carries a compact radar to probe the internal structure of the smaller asteroid and will eventually attempt a landing on Dimorphos. Milani will study the mineral composition of individual boulders around DART’s impact site.

“This is the first time that we send a spacecraft to a small body, which is actually a multi-satellite system, with one main spacecraft and two CubeSats doing closer proximity operations,” Michel said. “This has never been done.”

Artist’s illustration of the Hera spacecraft with its two deployable CubeSats, Juventas and Milani, in the vicinity of the Didymos binary asteroid system. The CubeSats will communicate with ground teams via radio links with the Hera mothership. Credit: ESA-Science Office

One source of uncertainty, and perhaps worry, about the environment around Didymos and Dimorphos is the status of the debris field observed by Hubble a few months after DART’s impact. But this is not likely to be a problem, according to Kueppers.

“I’m not really worried about potential boulders at Didymos,” he said, recalling the relative ease with which ESA’s Rosetta spacecraft navigated around an active comet from 2014 through 2016.

Ignacio Tanco, ESA’s flight director for Hera, doesn’t share Kueppers’ optimism.

“We didn’t hit the comet with a hammer,” said Tanco, who is responsible for keeping the Hera spacecraft safe. “The debris question for me is actually a source of… I wouldn’t say concern, but certainly precaution. It’s something that we’ll need to approach carefully once we get there.”

“That’s the difference between an engineer and a scientist,” Kueppers joked.

Scientists originally wanted Hera to be in the vicinity of the Didymos binary asteroid system before DART’s arrival, allowing it to directly observe the impact and its fallout. But ESA’s member states did not approve funding for the Hera mission in time, and the space agency only signed the contract to build the Hera spacecraft in 2020.

ESA first studied a mission like DART and Hera more than 20 years ago, when scientists proposed a mission called Don Quijote to demonstrate asteroid deflection. But other missions took priority in Europe’s space program. Now, Hera is on course to write the final chapter of the story of humanity’s first planetary defense test.

“This is our contribution of ESA to humanity to help us in the future protect our planet,” said Josef Aschbacher, ESA’s director general.


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



New Kuiper Belt objects lurk farther away than we ever thought


Our Solar System’s Kuiper Belt appears to be substantially larger than we thought.

Diagram of the Solar System, showing the orbits of some planets, the Kuiper Belt, and New Horizons' path among them.

Back in 2017, NASA graphics indicated that New Horizons would be at the outer edge of the Kuiper Belt by around 2020. That hasn’t turned out to be true. Credit: NASA

In the outer reaches of the Solar System, beyond the ice giant Neptune, lies a ring of comets and dwarf planets known as the Kuiper Belt. The closest of these objects are billions of kilometers away. There is, however, an outer limit to the Kuiper Belt. Right?

Until now, it was thought that there was little or nothing beyond 48 AU (astronomical units) from the Sun; one AU is about 150 million km, roughly the distance between Earth and the Sun. That changed when NASA’s New Horizons team detected 11 new objects lurking between 60 and 80 AU. What was thought to be empty space turned out to be a gap between the first ring of Kuiper Belt objects and a new, second ring. Our Solar System had also seemed unusually small when compared to exosolar systems, but it evidently extends farther out than anyone imagined.
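
To put those distances in more familiar units, here is a minimal conversion sketch (standard astronomical constants, not figures from the New Horizons team):

```python
# Convert the Kuiper Belt distances discussed above from astronomical units to km.

AU_KM = 149_597_871  # one astronomical unit in kilometers (roughly the Earth-Sun distance)

distances = {
    "Previously assumed outer edge": 48,
    "Inner edge of newly detected objects": 60,
    "Outer edge of newly detected objects": 80,
}

for label, au in distances.items():
    print(f"{label}: {au} AU ≈ {au * AU_KM / 1e9:.1f} billion km")
```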

These objects are currently visible only as pinpoints of light, and the team is allowing room for error until the spacecraft gets closer. Still, what their existence could tell us about the Kuiper Belt and the possible origins of the Solar System is remarkable.

Living on the edge

The extreme distance of the new objects has put them in a class all their own. Whether they resemble other Kuiper Belt objects in morphology and composition remains unknown because they are so faint. As New Horizons approaches them, simultaneous observations are being made with its LORRI (Long Range Reconnaissance Imager) telescope and the Subaru Telescope, which might reveal whether they really do differ in composition from the rest of the belt.

“The reason we’re using Subaru is its Hyper Suprime-Cam, which has a really wide field of vision,” New Horizons researcher Wesley Fraser, who led the study, told Ars Technica (the results are soon to be published in the Planetary Science Journal). “The camera can go deep and wide quickly, and we stare down the pipe of LORRI, looking down that trajectory to find anything nearby.”

These objects are near the edge of the heliosphere of the Solar System, where it transitions to interstellar space. The heliosphere is formed by the outflow of charged particles, or solar wind, that creates something of a bubble around our Solar System; combined with the Sun’s magnetic field, this protects us from outside cosmic radiation.

The new objects are located where the strength of the Sun’s magnetic field starts to break down. They might even be far enough for their orbits to occasionally take them beyond the heliosphere, where they will be pummeled by intense cosmic radiation from the interstellar medium. This, combined with their solar wind exposure, might affect their composition, making it different from that of closer Kuiper Belt objects.

Even though it’s impossible for now to know what these objects are like up close, how should we think of them? Fraser has an idea.

“If I had to guess, they are probably red and dark and devoid of water ice on the surface, which is quite common in the Kuiper Belt,” he said. “I think these objects will look a lot like the dwarf planet Sedna, but it’s possible they will look even more unusual.”

Many Kuiper Belt objects are a deep reddish color as a result of their organic chemicals being exposed to cosmic radiation. This breaks the hydrogen bonds in those chemicals, releasing much of the hydrogen into space and leaving behind an amorphous organic sludge that keeps getting redder the longer it is irradiated.

Fraser also predicts these objects are lacking in surface water ice because more distant Kuiper Belt objects (though not nearly as far-flung as the newly discovered ones) have not shown signs of it in observations. While water ice is common in the Kuiper Belt, he thinks these objects are probably hiding water ice underneath their red exterior.

Emerging from the dark

Investigating objects like this could change views on the origins of the Solar System and how it compares to the exosolar systems we have observed. Is our Solar System even normal?

Because the Kuiper Belt was thought to end at a distance of about 48 AU, the Solar System used to seem small compared to exosolar systems, where objects have been observed orbiting as far as 150 AU from their star. The detection of objects up to 80 AU from the Sun puts the Solar System in a more normal range. It also suggests that, because the Solar System is larger than we thought, it formed from a larger nebula.

“The timeline for Solar System formation is what we have to work out, and looking at the Kuiper Belt sets the stage for that very earliest moment, when gas and dust start to coalesce into macroscopic objects,” said New Horizons researcher Marc Buie. Buie discovered the object Arrokoth and led another study recently published in The Planetary Science Journal.

Arrokoth itself altered ideas about planet formation, since its two lobes appear to have gently stuck together instead of crashing into each other in a violent collision, as earlier models had assumed. Nothing like it has ever been observed before or since.

Dust to dust

The New Horizons team is also watching for something else: whether the new objects are binaries.

About 10 to 15 percent of all known Kuiper Belt objects orbit partners in binary systems, and Fraser thinks binarity can reveal many things about the formation of planetesimals, solid objects that form in a young star system through gentle mergers with other objects that cause them to stick together. Some of these objects can become gravitationally bound to each other and form binaries.

As New Horizons travels farther, its dust counter, which sends back information about the velocity and mass of dust that hits it, shows that the amount of dust in its surroundings has not gone down. This dust comes from objects running into each other.

“It’s been finding that, as we go farther and farther out, the Solar System is getting dustier and dustier, which is exactly the opposite of what is expected at that distance,” New Horizons Principal Investigator Alan Stern told Ars Technica. “There might be a massive population of bodies colliding out there.”

NASA had previously decided that it was unlikely New Horizons would be able to pull off another Kuiper Belt object flyby like it did with Arrokoth, so the mission’s focus shifted to the heliosphere. Now that the New Horizons team has found unexpected objects this distant with the help of the Subaru Telescope, and dust keeps being detected as the spacecraft travels farther out, there might be an opportunity for another flyby. Stern is still cautious about the chances of that.

“We’re going to see how they compare to closer Kuiper Belt objects, but if we can find one we can get close to, we’ll get a chance to really compare their geology and their mode of origin,” Stern said. “But that’s a longshot because we’re running on a tenth of a tank of gas.”

The advantage of using Subaru together with LORRI is geometry: LORRI can be pointed sideways to see objects at right angles, or even look slightly past them. The two would make a dream team of telescopes if New Horizons can approach at least one of the new objects, because combining observations from different angles, including when an object is behind the spacecraft, gives information about its physical surface.

Using the Nancy Grace Roman Space Telescope could yield even more surprising observations in the future. It has a smaller mirror but a very wide field of view (Stern likens it to space binoculars), and it only has to be pointed at a target region once or twice, in comparison to hundreds of times for the James Webb Space Telescope, to search for and possibly discover objects in an extremely vast expanse of sky. Most other telescopes would have to be pointed thousands of times to do that.

“The desperate hope for all of us is that we will find more flyby targets,” Buie said. “If we could just get an object to register as a couple of pixels on LORRI, that would be incredible.”

Some background is worth keeping in mind here. About a year ago, NASA decided that another KBO flyby was really unlikely, so it switched the mission’s focus to heliophysics, the study of the edge of the heliosphere. Stern pushed back and has worked to keep the focus on KBOs, which NASA now treats as an “if we find one it can image, it will” situation. Much of Stern’s phrasing reflects what he wants, which is more flybys, but the accurate picture is that another flyby remains unlikely.


Elizabeth Rayne is a creature who writes. Her work has appeared on SYFY WIRE, Space.com, Live Science, Grunge, Den of Geek, and Forbidden Futures. She lurks right outside New York City with her parrot, Lestat. When not writing, she is either shapeshifting, drawing, or cosplaying as a character nobody has ever heard of. Follow her on Threads and Instagram @quothravenrayne.



X fails to avoid Australia child safety fine by arguing Twitter doesn’t exist

“I cannot accept this evidence without a much better explanation of Mr. Bogatz’s path of reasoning,” Wheelahan wrote.

Wheelahan emphasized that the Nevada merger law specifically stipulated that “all debts, liabilities, obligations and duties of the Company shall thenceforth remain with or be attached to, as the case may be, the Acquiror and may be enforced against it to the same extent as if it had incurred or contracted all such debts, liabilities, obligations, and duties.” And Bogatz’s testimony failed to “grapple with the significance” of this, Wheelahan said.

Overall, Wheelahan considered Bogatz’s testimony on X’s merger-acquired liabilities “strained,” while deeming the government’s US merger law expert Alexander Pyle to be “honest and ready to make appropriate concessions,” even while some of his testimony was “not of assistance.”

Luckily, it seemed that Wheelahan had no trouble drawing his own conclusion after analyzing Nevada’s merger law.

“I find that a Nevada court would likely hold that the word ‘liabilities’” in the merger law “is broad enough on its proper construction under Nevada law to encompass non-pecuniary liabilities, such as the obligation to respond to the reporting notice,” Wheelahan wrote. “X Corp has therefore failed to show that it was not required to respond to the reporting notice.”

Because X “failed on all its claims,” the social media company must cover costs from the appeal, and X’s costs in fighting the initial fine will seemingly only increase from here.

Fighting fine likely to more than double X costs

In a press release celebrating the ruling, eSafety Commissioner Julie Inman Grant criticized X’s attempt to use the merger to avoid complying with Australia’s Online Safety Act.



Google as Darth Vader: Why iA Writer quit the Android app market

“Picture a massive football stadium filled with fans month after month,” Reichenstein wrote to Ars. In that stadium, he writes:

  • 5 percent (max) have a two-week trial ticket
  • 2 percent have a yearly ticket
  • 0.5 percent have a monthly ticket
  • 0.5 percent are buying “all-time” tickets

But even if every lifetime ticket buyer showed up at once, that’s 10 percent of the stadium, Reichenstein said. Even without full visibility of every APK—”and what is happening in China at all,” he wrote—iA can assume 90 percent of users are “climbing over the fence.”

“Long story short, that’s how you can end up with 50,000 users and only 1,000 paying you,” Reichenstein wrote in the blog post.
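
Putting those figures together (a small sketch of my own; every input is a number Reichenstein cites) makes the gap between the stadium metaphor and the actual paying user base concrete:

```python
# Rough reconstruction of the stadium math using only Reichenstein's figures.

ticket_holders = 0.05 + 0.02 + 0.005 + 0.005  # trial + yearly + monthly + lifetime
print(f"Fans holding any kind of ticket: ~{ticket_holders:.0%}")  # ~8%, "10 percent" at best
print(f"Implied fence-climbers:          ~{1 - ticket_holders:.0%}")

# The concrete Android numbers he cites:
users, payers = 50_000, 1_000
print(f"Paying Android users: {payers / users:.0%} of {users:,}")
```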

Piracy doesn’t just mean lost revenue, Reichenstein wrote, but also increased demands for support, feature requests, and chances for bad ratings from people who never pay. And it builds over time. “You sell less apps through the [Play Store], but pirated users keep coming in because pirate sites don’t have such reviews. Reviews don’t matter much if the app is free.”

The iA numbers on macOS hint at a roughly 10 percent piracy rate. On iOS, it’s “not 0%,” but it’s “very, very hard to say what the numbers are”; there is also no “reset trick” or trials offered there.

A possible future unfreezing

Reichenstein wrote in the post and to Ars that sharing these kinds of numbers can invite critique from other app developers, both armchair and experienced. He’s seen that happening on Mastodon, Hacker News, and X (formerly Twitter). But “critical people are useful,” he noted, and he’s OK with people working backward to figure out how much iA might have made. (Google did not offer comment on aspects of iA’s post outside discussing Drive access policy.)

iA suggests that it might bring back Writer on Android, perhaps in a business-to-business scenario with direct payments. For now, it’s a slab of history, albeit far less valuable to the metaphorical Darth Vader that froze it.



The Hyundai Ioniq 5 will be the next Waymo robotaxi

Waymo’s robotaxis are going to get a lot more angular in the future. Today, the autonomous driving company and Hyundai announced that they have formed a strategic partnership, and the first product will be the integration of Waymo’s autonomous vehicle software and hardware with the Hyundai Ioniq 5.

“Hyundai and Waymo share a vision to improve the safety, efficiency, and convenience of how people move,” said José Muñoz, president and global COO of Hyundai Motor Company.

“We are thrilled to partner with Hyundai as we further our mission to be the world’s most trusted driver,” said Waymo’s co-CEO Tekedra Mawakana. “Hyundai’s focus on sustainability and strong electric vehicle roadmap makes them a great partner for us as we bring our fully autonomous service to more riders in more places.”

Now, this doesn’t mean you’ll be able to buy a driverless Ioniq 5 from your local Hyundai dealer; Waymo will operate these Ioniq 5s as part of its ride-hailing Waymo One fleet, which currently operates in parts of Austin, Texas; Los Angeles; Phoenix; and San Francisco. Currently, Waymo operates a fleet of Jaguar I-Pace EVs and has also used Chrysler Pacifica minivans.



Rocket Report: Falcon 9 second stage stumbles; Japanese rocket nears the end


“I’m pretty darn confident I’m going to have a good day on Friday.”

United Launch Alliance’s Vulcan rocket sits on the pad at Space Launch Complex-41 at Cape Canaveral at sunset in advance of the Cert-2 flight test. Credit: United Launch Alliance

Welcome to Edition 7.14 of the Rocket Report! For readers who don’t know, my second book was published last week. It’s titled Reentry, and tells the story behind the story of SpaceX’s development of the Falcon 9 rocket and Dragon spacecraft. The early reviews are great, and it made USA Today’s bestseller list this week. If you’re interested in rockets, and since you’re reading this newsletter we already know the answer to that, the book is probably up your alley.

As always, we welcome reader submissions, and if you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets as well as a quick look ahead at the next three launches on the calendar.

Vega C cleared for next launch in November. Italian rocket firm Avio successfully tested a redesigned Zefiro-40 solid rocket motor for the second time on Thursday, the European Space Agency said. This second firing follows an initial firing test of the motor in May 2024 and concludes the qualification tests for the new engine nozzle design of the Zefiro-40. This rocket motor powers the second stage of the Vega C rocket.

Flight three almost ready … The redesign of the motor was necessitated by the failure of a Vega C rocket in December 2022, which was just the second flight of the launch vehicle. Then, in June 2023, a test to re-certify the motor for flight also failed. Now that the second-stage issue appears to be resolved, Vega C is on the launch calendar for November of this year, although there’s the possibility the third mission of the rocket could slip a bit further. The rocket will be carrying the Sentinel-1C satellite to Sun-synchronous orbit. (submitted by EllPeaTea and Ken the Bin)

Impulse Space raises $150 million. Los Angeles-based space startup Impulse Space, which is led by renowned rocket scientist Tom Mueller, has raised $150 million in a new fundraising round led by venture capital firm Founders Fund, CNBC reports. Impulse is scaling a product line of orbital transfer vehicles, and so far is building two, the smaller Mira and the larger Helios. While rockets get satellites and payloads into orbit, like an airplane carrying passengers to a metro area, space tugs deliver them to specific destinations, like taxis taking those passengers home from the airport.

Taking the next step after launch … Mueller, who founded Impulse Space three years ago, said the funds will fuel growth of the company. “This means that we’re sufficiently funded through the development of Helios and the upgraded version Mira and out past the first flights of both of these products,” Mueller told the publication. Impulse flew its first mission, called LEO Express-1, with a Mira vehicle carrying and deploying a small satellite, last November. In Mueller’s view, while SpaceX reduced the cost to launch mass to orbit, the in-space delivery systems on the market are lacking. (submitted by Tom Nelson and Ken the Bin)


Polish company receives ESA support. Did you know there is a launch startup in Poland? Until this week, I confess I did not. However, that changed when the European Space Agency awarded 2.4 million euros to Poland’s SpaceForest for further development of its Perun rocket. SpaceForest has developed an 11.5-meter-tall sounding rocket capable of carrying payloads of up to 50 kilograms to an altitude of 150 kilometers, European Spaceflight reports.

Boosting up commercial companies … To date, the company has completed two test flights, one reaching an altitude of 22 kilometers and another topping out at 13 kilometers. With the new funding from ESA, SpaceForest will implement upgrades to the combustion chamber of its in-house developed SF1000 paraffin-powered hybrid rocket engine. ESA awarded the funding as part of the agency’s Boost! initiative. Adopted by member states in 2019, Boost! aims to foster the development of new commercial space transportation services. (submitted by Ken the Bin and EllPeaTea)

A new take on a kinetic launch system. Longshot Space is developing a straight-line kinetic launch system that will gradually accelerate payloads to hypersonic speeds before launching them to orbit, TechCrunch reports. The startup is betting it can achieve very, very low costs to orbit compared to a rocket, possibly as low as $10 per kilogram. The company raised $1.5 million in a pre-seed round in April 2023 and now, nearly 18 months later, Longshot closed a little over $5 million in combined venture capital and funding from the US Air Force’s TACFI program.

Pulling some serious Gs … The new capital will be used to build a large, 500-meter-long gun in the Nevada desert to push 100-kilogram payloads to Mach 5. The system has to be so long in order to keep acceleration forces low, which is better for both the vehicle and payload. For eventual space missions, Longshot is aiming to keep the maximum gravitational forces to 500–600 times the force of gravity. The company’s name serves a dual purpose, as its technology requires a longshot to reach space, and its prospects for success are probably a longshot. Nevertheless, it’s great to see someone trying new ideas. (submitted by Ken the Bin)
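
To see why the gun has to be so long, here is a minimal constant-acceleration sketch (my own physics; only the 500-meter length, the Mach 5 target, and the 500-600 g limit come from the report, and the 8 km/s exit speed is purely an illustrative assumption):

```python
# Constant-acceleration kinematics: v^2 = 2 * a * d

G = 9.81            # standard gravity, m/s^2
MACH_5 = 5 * 343    # ~1,715 m/s, assuming the sea-level speed of sound

# Acceleration needed to reach Mach 5 along the planned 500 m barrel.
accel = MACH_5**2 / (2 * 500)
print(f"Prototype acceleration: {accel:,.0f} m/s^2 ≈ {accel / G:,.0f} g")

# Barrel length needed to reach a hypothetical 8 km/s exit speed (an assumption,
# not a company figure) while staying near the stated ~550 g midpoint.
exit_speed = 8_000
length = exit_speed**2 / (2 * 550 * G)
print(f"Barrel for 8 km/s at ~550 g: about {length / 1000:.1f} km long")
```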

Falcon 9 rocket upper stage misfires. SpaceX is investigating a problem with the Falcon 9 rocket’s upper stage that caused it to reenter the atmosphere and fall into the sea outside of its intended disposal area after a launch last Saturday with a two-person crew heading to the International Space Station, Ars reports. The upper stage malfunction occurred after the Falcon 9 successfully deployed SpaceX’s Crew Dragon spacecraft carrying NASA astronaut Nick Hague and Russian cosmonaut Aleksandr Gorbunov on SpaceX’s Crew-9 mission. Hague and Gorbunov safely arrived at the space station Sunday to begin a five-month stay at the orbiting research complex.

Returning to flight shortly? … Safety warnings issued to mariners and pilots before the launch indicated the Falcon 9’s upper stage was supposed to fall somewhere in a narrow band stretching from southwest to northeast in the South Pacific east of New Zealand. Most of the rocket was expected to burn up during reentry, but SpaceX targeted a remote part of the ocean for disposal because some debris was likely to survive and reach the sea. This is the third time in less than three months that the Falcon 9 rocket has been grounded, ending a remarkable run of flawless launches. A return to flight is expected as early as October 7 with the European Space Agency’s Hera spacecraft.

New Zealand seeks to reduce rocket regulations. New Zealand plans to implement a new “red tape-cutting” strategy for space and aviation by the end of 2025, the New Zealand Herald reports. “We have committed to having a world-class regulatory environment by the end of 2025,” Space Minister Judith Collins told the NZ Aerospace Summit recently. “To do that we’re introducing a light-touch regulatory approach that will significantly free up innovators to test their technology and ideas.”

Kiwis have a different attitude … The goal of reducing regulations is to allow companies to focus more on innovation and less on paperwork. New Zealand officials are motivated by concerns that Australia may seek to lure some of its space and aviation industries. Among the space companies with a significant presence in New Zealand are Rocket Lab, Dawn Aerospace, as well as smaller firms such as Astrix Astronautics. The move comes as US-based firms such as SpaceX, Varda, and others are pushing the US launch regulator, the Federal Aviation Administration, to be more nimble.

H-2A nears the end of the road. Japan launched the classified IGS-Radar 8 satellite early Thursday with the second-to-last H-2A rocket, Space News reports. Developed and operated by Mitsubishi Heavy Industries, the H-2A rocket debuted in 2001 and has flown 49 times with a single failure, suffered in 2003. It has been a reliable medium-lift launch vehicle for Japan’s national space interests, as well as a handful of commercial space customers.

The rocket’s 50th launch will be its last … The final H-2A core stage is now complete and is scheduled for shipment to the Tanegashima Space Center. That launch, expected in late 2024, will carry the Global Observing SATellite for Greenhouse gases and Water cycle (GOSAT-GW). The H3 will succeed the H-2A. The new-generation H3 had a troubled start, with its first flight in March 2023 suffering a second-stage engine failure. However, the new rocket has since flown successfully twice. (submitted by Ken the Bin)

Russians can invest in SpaceX now? Da. One of the odder stories this week concerns a Russian broker apparently offering access to privately held shares of SpaceX. An article in the Russian newspaper Kommersant suggests that a Moscow-based financial services company, Finam Holdings, managed to purchase a number of shares from a large foreign investment fund. The article says the minimum investment for Russians interested in buying into SpaceX is $10,000.

On bonds and broomsticks … Honestly, I have no idea about the legality of all this, but it sure smells funny. SpaceX, of course, periodically sells shares of the privately held company to investors. In addition, employees who receive shares in the company can sometimes sell their holdings. Given the existing sanctions on Russia due to the war on Ukraine and the potential for additional sanctions, it seems like these shareholders are definitely taking some risk.

ULA chief “supremely confident” in Vulcan’s second launch. The second flight of United Launch Alliance’s Vulcan rocket, planned for Friday morning, has a primary goal of validating the launcher’s reliability for delivering critical US military satellites to orbit. Tory Bruno, ULA’s chief executive, told reporters Wednesday that he is “supremely confident” the Vulcan rocket will succeed in accomplishing that objective, Ars reports. “As I come up on Cert-2, I’m pretty darn confident I’m going to have a good day on Friday, knock on wood,” Bruno said. “These are very powerful, complicated machines.”

A lengthy manifest to fly … The Vulcan launcher, a replacement for ULA’s Atlas V and Delta IV rockets, is on contract to haul the majority of the US military’s most expensive national security satellites into orbit over the next several years. If Friday’s test flight goes well, ULA is on track to launch at least one—and perhaps two—operational missions for the Space Force by the end of this year. The Space Force has already booked 25 launches on ULA’s Vulcan rocket for military payloads and spy satellites for the National Reconnaissance Office. Including the launch Friday, ULA has 70 Vulcan rockets in its backlog, mostly for the Space Force, the NRO, and Amazon’s Kuiper satellite broadband network.

NASA’s mobile launcher is on the move. NASA’s Exploration Ground Systems Program at Kennedy Space Center in Florida began moving the mobile launcher 1 from Launch Complex 39B along a 4.2-mile stretch back to the Vehicle Assembly Building this week. First motion of the mobile launcher, atop NASA’s crawler-transporter 2, occurred early on the morning of October 3, the space agency confirmed. Teams rolled the mobile launcher out to Kennedy’s Pad 39B in August 2023 for upgrades and a series of ground demonstration tests in preparation for the Artemis II mission.

Stacking operations when? … After arriving outside the Vehicle Assembly Building later on Thursday, the launch tower will be moved into High Bay 3 on Friday. This is all in preparation for stacking the Space Launch System rocket for the Artemis II mission, which is nominally scheduled for September 2025 but may slip further. NASA has not publicly said when stacking operations will begin, and this depends on when the space agency makes a final decision on whether to fly the Orion spacecraft with its heat shield as-is or adopt a different plan. Stacking will take several months.

Next three launches

Oct. 4: Vulcan | Cert-2 mission | Cape Canaveral Space Force Station, Florida | 10:00 UTC

Oct. 7: Falcon 9 | Hera | Cape Canaveral Space Force Station, Florida | 14:52 UTC

Oct. 9: Falcon 9 | OneWeb-20 | Vandenberg Space Force Base, California | 06:03 UTC


Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.



No more bricked iPads: Apple fixes several bugs in iOS, iPadOS, macOS updates

On Thursday, Apple released the first software updates for its devices since last month’s rollout of iOS 18 and macOS Sequoia.

Those who’ve been following along know that several key features that didn’t make it into the initial release of iOS 18 are expected in iOS 18.1, but that’s not the update we got on Thursday.

Rather, Apple pushed out a series of smaller updates that fixed several bugs but did not add new features. The updates are labeled iOS 18.0.1, iPadOS 18.0.1, visionOS 2.0.1, macOS Sequoia 15.0.1, and watchOS 11.0.1.

Arguably, the two most important fixes come in iPadOS 18.0.1 and iOS 18.0.1. The iPad update fixes an issue that bricked a small number of recently released iPads (those running Apple’s M4 chip). That problem caused Apple to quickly pull iPadOS 18 for those devices, so Thursday’s iPadOS 18.0.1 release is actually the first time most users of those devices will be able to run iPadOS 18.

On the iPhone side, Apple says it has addressed a bug that could sometimes cause the touchscreen to fail to register users’ fingers.



Uninstalled Copilot? Microsoft will let you reprogram your keyboard’s Copilot key

Whether you care about Microsoft’s Copilot AI assistant or not, many new PCs introduced this year have included a dedicated Copilot key on the keyboard; this is true whether the PC meets the requirements for Microsoft’s Copilot+ PC program or not. Microsoft’s commitment to putting AI features in all its products runs so deep that the company changed the Windows keyboard for the first time in three decades.

But what happens if you don’t use Copilot regularly, or you’ve disabled or uninstalled it entirely, or if you simply don’t need to have it available at the press of a button? Microsoft is making allowances for you in a new Windows Insider Preview build in the Dev channel, which will allow the Copilot key to be reprogrammed so that it can launch more than just Copilot.

The area in Settings where you can reprogram the Copilot key in the latest Windows Insider Preview build in the Dev channel. Credit: Microsoft

There are restrictions. To appear in the menu of options in the Settings app, Microsoft says an app must be “MSIX packaged and signed, thus indicating the app meets security and privacy requirements to keep customers safe.” Generally, apps installed via the Microsoft Store or built into Windows will meet those requirements, though apps installed from other sources may not. But you can’t make the Copilot key launch any old executable or batch file, and you can’t customize it to do anything other than launch apps (at least, not without using third-party tools for reconfiguring your keyboard).



Automattic demanded web host pay $32M annually for using WordPress trademark


Automattic founder Matt Mullenweg called WP Engine “a cancer to WordPress.”

Matt Mullenweg of WordPress company Automattic sits in front of a laptop adorned with a WordPress logo.

Automattic founder and WordPress co-author Matt Mullenweg in San Francisco on July 24, 2013.

Automattic Inc. and its founder have been sued by a WordPress hosting company that alleges an extortion scheme to extract payments for use of the trademark for the open source WordPress software. Hosting firm WP Engine sued Automattic and founder Matt Mullenweg in a complaint filed yesterday in US District Court for the Northern District of California.

“This is a case about abuse of power, extortion, and greed,” the lawsuit said. “The misconduct at issue here is all the more shocking because it occurred in an unexpected place—the WordPress open source software community built on promises of the freedom to build, run, change, and redistribute without barriers or constraints, for all.”

The lawsuit alleged that “over the last two weeks, Defendants have been carrying out a scheme to ban WPE from the WordPress community unless it agreed to pay tens of millions of dollars to Automattic for a purported trademark license that WPE does not even need.”

The complaint says that Mullenweg blocked WP Engine “from updating the WordPress plugins that it publishes through wordpress.org,” and “withdrew login credentials for individual employees at WPE, preventing them from logging into their personal accounts to access other wordpress.org resources, including the community Slack channels which are used to coordinate contributions to WordPress Core, the Trac system which allows contributors to propose work to do on WordPress, and the SubVersion system that manages code contributions.”

The lawsuit makes accusations, including libel, slander, and attempted extortion, and demands a jury trial. The lawsuit was filed along with an exhibit that shows Automattic’s demand for payment. A September 23 letter to WP Engine from Automattic’s legal team suggests “a mere 8% royalty” on WP Engine’s roughly $400 million in annual revenue, or about $32 million.

“WP Engine’s unauthorized use of our Client’s trademarks… has enabled WP Engine to unfairly compete with our Client, leading to WP Engine’s unjust enrichment,” Automattic alleged in the letter.

Mullenweg: WP Engine “a cancer to WordPress”

Mullenweg co-authored the WordPress software first released in 2003 and founded Automattic in 2005. Automattic operates the WordPress-based publishing platform WordPress.com. Meanwhile, the nonprofit WordPress Foundation, also founded by Mullenweg, says it works “to ensure free access, in perpetuity, to the software projects we support.”

Last month, Mullenweg wrote a blog post alleging that WP Engine is “a cancer to WordPress” and that it provides “something that they’ve chopped up, hacked, butchered to look like WordPress, but actually they’re giving you a cheap knock-off and charging you more for it.”

Mullenweg criticized WP Engine’s decision to disable the WordPress revision management system. WP Engine’s “branding, marketing, advertising, and entire promise to customers is that they’re giving you WordPress, but they’re not,” Mullenweg wrote. “And they’re profiting off of the confusion. WP Engine needs a trademark license to continue their business.”

In another blog post and a speech at a WordPress conference, Mullenweg alleged that WP Engine doesn’t contribute much to the open source project. He also pointed to WP Engine’s funding from private equity firm Silver Lake, writing that “Silver Lake doesn’t give a dang about your Open Source ideals. It just wants a return on capital.”

WP Engine alleges broken promises

WP Engine’s lawsuit points to promises made by Mullenweg and Automattic nearly 15 years ago. “In 2010, in response to mounting public concern, the WordPress source code and trademarks were placed into the nonprofit WordPress Foundation (which Mullenweg created), with Mullenweg and Automattic making sweeping promises of open access for all,” the lawsuit said.

Mullenweg wrote at the time that “Automattic has transferred the WordPress trademark to the WordPress Foundation, the nonprofit dedicated to promoting and ensuring access to WordPress and related open source projects in perpetuity. This means that the most central piece of WordPress’s identity, its name, is now fully independent from any company.”

WP Engine alleges that Automattic and Mullenweg did not disclose “that while they were publicly touting their purported good deed of moving this intellectual property away from a private company, and into the safe hands of a nonprofit, Defendants in fact had quietly transferred irrevocable, exclusive, royalty-free rights in the WordPress trademarks right back to Automattic that very same day in 2010. This meant that far from being ‘independent of any company’ as Defendants had promised, control over the WordPress trademarks effectively never left Automattic’s hands.”

WP Engine accuses the defendants of “misusing these trademarks for their own financial gain and to the detriment of the community members.” WP Engine said it was founded in 2010 and relied on the promises made by Automattic and Mullenweg. “WPE is a true champion of WordPress, devoting its entire business to WordPress over other similar open source platforms,” the lawsuit said.

Firm defends “fair use” of WordPress trademark

The defendants’ demand that WP Engine pay tens of millions of dollars for a trademark license “came without warning” and “gave WPE less than 48 hours to either agree to pay them off or face the consequences of being banned and publicly smeared,” according to the lawsuit. WP Engine pointed to Mullenweg’s “cancer” remark and other actions, writing:

When WPE did not capitulate, Defendants carried out their threats, unleashing a self-described “nuclear” war against WPE. That war involved defaming WPE in public presentations, directly sending disparaging and inflammatory messages into WPE customers’ software and through the Internet, threatening WPE’s CEO and one of its board members, publicly encouraging WPE’s customers to take their business to Automattic’s competing service providers (for a discounted fee, no less), and ultimately blocking WPE and its customers from accessing the wordpress.org portal and wordpress.org servers. By blocking access to wordpress.org, Defendants have prevented WPE from accessing a host of functionality typically available to the WordPress community on wordpress.org.

During calls on September 17 and 19, “Automattic CFO Mark Davies told a WPE board member that Automattic would ‘go to war’ if WPE did not agree to pay its competitor Automattic a significant percentage of WPE’s gross revenues—tens of millions of dollars—on an ongoing basis,” the lawsuit said. WP Engine says it doesn’t need a license to use the WordPress trademark “and had no reasonable expectation that Automattic had a right to demand money for use of a trademark owned by the separate nonprofit WordPress Foundation.”

“WPE’s nominative uses of those marks to refer to the open-source software platform and plugin used for its clients’ websites are fair uses under settled trademark law, and they are consistent with WordPress’ own guidelines and the practices of nearly all businesses in this space,” the lawsuit said.

Automattic alleged “widespread unlicensed use”

Exhibit A in the lawsuit includes a letter to WP Engine CEO Heather Brunner from a trademark lawyer representing Automattic and a subsidiary, WooCommerce, which makes a plugin for WordPress.

“As you know, our Client owns all intellectual property rights globally in and to the world-famous WOOCOMMERCE and WOO trademarks; and the exclusive commercial rights from the WordPress Foundation to use, enforce, and sublicense the world-famous WORDPRESS trademark, among others, and all other associated intellectual property rights,” the letter said.

The letter alleged that “your blatant and widespread unlicensed use of our Client’s trademarks has infringed our Client’s rights and confused consumers into believing, falsely, that WP Engine is authorized, endorsed, or sponsored by, or otherwise affiliated or associated with, our Client.” It also alleged that “WP Engine’s entire business model is predicated on using our Client’s trademarks… to mislead consumers into believing there is an association between WP Engine and Automattic.”

The letter threatened a lawsuit, saying that Automattic “is entitled to file civil litigation to obtain an injunction and an award of actual damages, a disgorgement of your profits, and our Client’s costs and fees.” The letter demands an accounting of WP Engine’s profits, saying that “even a mere 8% royalty on WP Engine’s $400+ million in annual revenue equates to more than $32 million in annual lost licensing revenue for our Client.”

WP Engine’s lawsuit asks the court for a “judgment declaring that Plaintiff does not infringe or dilute any enforceable, valid trademark rights owned by the Defendants.” It also seeks compensatory and punitive damages.

We contacted Automattic about the lawsuit today and will update this article if it provides a response.


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.



Microsoft’s new “Copilot Vision” AI experiment can see what you browse

On Monday, Microsoft unveiled updates to its consumer AI assistant Copilot, introducing two new experimental features for a limited group of $20/month Copilot Pro subscribers: Copilot Labs and Copilot Vision. Labs integrates OpenAI’s latest o1 “reasoning” model, and Vision allows Copilot to see what you’re browsing in Edge.

Microsoft says Copilot Labs will serve as a testing ground for Microsoft’s latest AI tools before they see wider release. The company describes it as offering “a glimpse into ‘work-in-progress’ projects.” The first feature available in Labs is called “Think Deeper,” and it uses step-by-step processing to solve more complex problems than the regular Copilot. Think Deeper is Microsoft’s version of OpenAI’s new o1-preview and o1-mini AI models, and it has so far rolled out to some Copilot Pro users in Australia, Canada, New Zealand, the UK, and the US.

Copilot Vision is an entirely different beast. The new feature aims to give the AI assistant a visual window into what you’re doing within the Microsoft Edge browser. When enabled, Copilot can “understand the page you’re viewing and answer questions about its content,” according to Microsoft.

Microsoft’s Copilot Vision promo video.

The company positions Copilot Vision as a way to provide more natural interactions and task assistance beyond text-based prompts, but it will likely raise privacy concerns. As a result, Microsoft says that Copilot Vision is entirely opt-in and that no audio, images, text, or conversations from Vision will be stored or used for training. The company is also initially limiting Vision’s use to a pre-approved list of websites, blocking it on paywalled and sensitive content.

The rollout of these features appears gradual, with Microsoft noting that it wants to balance “pioneering features and a deep sense of responsibility.” The company said it will be “listening carefully” to user feedback as it expands access to the new capabilities. Microsoft has not provided a timeline for wider availability of either feature.

Mustafa Suleyman, chief executive of Microsoft AI, told Reuters that he sees Copilot as an “ever-present confidant” that could potentially learn from users’ various Microsoft-connected devices and documents, with permission. He also mentioned that Microsoft co-founder Bill Gates has shown particular interest in Copilot’s potential to read and parse emails.

But judging by the visceral reaction to Microsoft’s Recall feature, which keeps a record of everything you do on your PC so an AI model can recall it later, privacy-sensitive users may not appreciate having an AI assistant monitor their activities—especially if those features send user data to the cloud for processing.

Microsoft’s new “Copilot Vision” AI experiment can see what you browse Read More »

ai-#84:-better-than-a-podcast

AI #84: Better Than a Podcast

Andrej Karpathy continues to be a big fan of NotebookLM, especially its podcast creation feature. There is something deeply alien to me about this proposed way of consuming information, but I probably shouldn’t knock it (too much) until I try it?

Others are fans as well.

Carlos Perez: Google with NotebookLM may have accidentally stumbled upon an entirely new way of interacting with AI. Its original purpose was to summarize literature. But one unexpected benefit is when it’s used to talk about your expressions (i.e., conversations or lectures). This is when you discover the insight of multiple interpretations! Don’t just render a summary one time; have it do so several times. You’ll then realize how different interpretations emerge, often in unexpected ways.

Delip Rao gives the engine two words repeated over and over, and the AI podcast hosts spend ten minutes describing what it says about the meaning of art.

So I figured: What could be a better test than generating a podcast out of this post (without the question, results or reaction)?

I tried to do that, deselecting the other AI posts and going to town. This was the result. Unfortunately, after listening I learned that deselecting posts from a notebook doesn’t actually take them out of the material used for podcast generation, so that was more of an overall take on AI posts ~40-64 plus the latest one.

In some ways I was impressed. The host voices and cadences are great, there were no mistakes, absurdities or factual errors, everything was smooth. In terms of being an actual substitute? Yeah, no. It did give me a good idea of which ideas are coming across ‘too well’ and taking up too much mindspace, especially things like ‘sci-fi.’ I did like that it led with OpenAI issues, and it did a halfway decent job with the parts it did discuss. But this was not information dense at all, and no way to get informed.

I then tried again with a fresh notebook, to ensure I was giving it only AI #84, which then started off with OpenAI’s voice mode as one would expect. This was better, because it got to a bunch of specifics, which kept it on target. If you do use the podcast feature I’d feed it relatively small input chunks. This still seemed like six minutes, not to do Hamlet, but to convey maybe two minutes of quick summary reading. Also it did make an important error that highlighted a place I needed to make my wording clearer – saying OpenAI became a B-corp rather than that it’s going to try and become one.

There were indeed, as usual, many things it was trying to summarize. OpenAI had its dev day products. Several good long posts or threads laid out arguments, so I’ve reproduced them here in full partly for reference. There was a detailed proposal called A Narrow Path. And there’s the usual assortment of other stuff, as well.

  1. Introduction. Better than a podcast.

  2. Table of Contents.

  3. Language Models Offer Mundane Utility. You’d love to see it.

  4. Language Models Don’t Offer Mundane Utility. Let’s see what happens.

  5. Copyright Confrontation. Zuck to content creators: Drop dead.

  6. Deepfaketown and Botpocalypse Soon. A word of bots, another for the humans.

  7. They Took Our Jobs. Software engineers, now super productive, our price cheap.

  8. The Art of the Jailbreak. Encode it in a math puzzle.

  9. Get Involved. UK AISI is hiring three societal impacts workstream leads.

  10. Introducing. AlphaChip, for designing better AI chips, what me worry?

  11. OpenAI Dev Day. Advanced voice for everyone, if you want it enough.

  12. In Other AI News. Anthropic revenue is on the rise.

  13. The Mask Comes Off. The man who sold the world… hopefully to himself.

  14. Quiet Speculations. Perplexity for shopping?

  15. The Quest for Sane Regulations. The suggestion box is open.

  16. The Week in Audio. Peter Thiel has some out there ideas, yo.

  17. Rhetorical Innovation. Would you go for it, or just let it slip until you’re ready?

  18. Remember Who Marc Andreessen Is. For reference, so I can remind people later.

  19. A Narrow Path. If it would kill everyone, then don’t let anyone build it.

  20. Aligning a Smarter Intelligence is Difficult. Pondering what I’m pondering?

  21. The Wit and Wisdom of Sam Altman. To do, or not to do? To think hard about it?

  22. The Lighter Side. 10/10, no notes.

OpenAI’s Advanced Voice Mode is really enjoying itself.

It enjoys itself more here: Ebonics mode activated.

Sarah Constantin requests AI applications she’d like to see. Some very cool ideas in here, including various forms of automatic online content filtering and labeling. I’m very tempted to do versions of some of these myself when I can find the time, especially the idea of automatic classification of feeds into worthwhile versus not. As always, the key is that if you are going to use it on content you would otherwise need to monitor fully, hitting false negatives is very bad. But if you could aim it at sources you would otherwise be okay missing, then you can take a hits-based approach.
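To make the hits-based approach concrete, here is a minimal sketch of such a filter, pointed only at sources you would otherwise skip entirely; the model name, prompt, and labels are my own placeholders rather than anything Sarah proposed, and it assumes an OpenAI API key in the environment.

```python
# Minimal sketch of a "hits-based" feed filter: run it only on sources you
# could afford to miss entirely, so false negatives are cheap.
# Model name and prompt are placeholders; requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def worth_reading(title: str, snippet: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word, WORTHWHILE or SKIP, "
                        "judging whether this feed item is worth a close read."},
            {"role": "user", "content": f"Title: {title}\nSnippet: {snippet}"},
        ],
        max_tokens=3,
        temperature=0,
    )
    return "WORTHWHILE" in response.choices[0].message.content.upper()

# Example usage on a hypothetical optional feed.
items = [("Frontier lab ships new reasoning model", "Benchmarks and pricing inside...")]
keep = [item for item in items if worth_reading(*item)]
```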

Llama 3.2 ‘not available for download’ in the EU, unclear exactly which regulatory concern or necessary approval is the bottleneck. This could be an issue for law-abiding corporations looking to use Llama 3.2. But of course, if an individual wants to download and use it in the EU, and is competent enough that this is a good idea, I am confident they can figure out how to do that.

The current status of AI agents:

Buck Shlegeris: I asked my LLM agent (a wrapper around Claude that lets it run bash commands and see their outputs):

>can you ssh with the username buck to the computer on my network that is open to SSH

because I didn’t know the local IP of my desktop. I walked away and promptly forgot I’d spun up the agent. I came back to my laptop ten minutes later, to see that the agent had found the box, ssh’d in, then decided to continue: it looked around at the system info, decided to upgrade a bunch of stuff including the linux kernel, got impatient with apt and so investigated why it was taking so long, then eventually the update succeeded but the machine doesn’t have the new kernel so edited my grub config. At this point I was amused enough to just let it continue. Unfortunately, the computer no longer boots.

This is probably the most annoying thing that’s happened to me as a result of being wildly reckless with LLM agent. [logs here]

Buck was, of course, very much asking for it, and essentially chose to let this happen. One should still note that this type of proactive messing with things in order to get to the goal is the default behavior of such agents. Currently they’re quite bad at it.
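For a sense of how little scaffolding this takes, here is a deliberately minimal sketch of an agent in the same spirit; the model name, prompts, and step cap are my assumptions rather than Buck’s actual wrapper, and you should only ever run something like this inside a disposable VM.

```python
# Minimal sketch of an LLM-drives-bash agent, in the spirit of Buck's wrapper.
# This is exactly the kind of thing that can brick a machine: run it in a VM.
# Model name and prompt format are assumptions, not Buck's actual code.
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = [{"role": "user", "content":
            "Find the machine on my LAN that accepts ssh as user buck, and ssh in."}]

for _ in range(10):  # hard cap on steps so it cannot wander forever
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        system="Respond with a single bash command to run next, and nothing else. Say DONE when finished.",
        messages=history,
    )
    command = reply.content[0].text.strip()
    if command == "DONE":
        break
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=120)
    history.append({"role": "assistant", "content": command})
    history.append({"role": "user", "content": f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"})
```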

As usual, you can’t get the utility from the AI unless you are willing to listen to it, but also we need to be careful how we score.

When measuring something called ‘diagnostic reasoning’ on a set of cases to diagnose, GPT-4 alone (92%) did much better than doctors (73%), and also much better than doctors plus GPT-4 (77%). So by that measure, the doctors would be much better fully out of the loop and delegating the task to GPT-4.

Ultimately, though, diagnosis is not a logic test, or a ‘match the logic we think you should use’ test. What we mostly care about is accuracy. GPT-4 had the correct diagnosis in 66% of cases, versus 62% for doctors.

My strong guess is that doctors learn various techniques that are ‘theoretically unsound’ in terms of their logic, or that take into account things that are ‘not supposed to matter’ but that do correlate with the right answer. And they’ve learned what approaches and diagnoses lead to good outcomes, rather than aiming for pure accuracy, because this is part of a greater system. That all mostly works in practice, while they get penalized heavily for it on ‘reasoning’ tests.

Indeed, this suggests that one future weakness of AIs will be if we succeed in restricting what things they can consider, actually enforcing a wide array of ‘you are not allowed to consider factor X’ rules that humans routinely pay lip service to and then ignore.

Ethan Mollick: Ok. Deleting this and reposting given the Community Note (you can see the original and Note below). The main point doesn’t change in any way, but I want to make sure I am clear in this post that the measurement was diagnostic reasoning & not final diagnoses.

A preview of the coming problem of working with AI when it starts to match or exceed human capability: Doctors were given cases to diagnose, with half getting GPT-4 access to help. The control group got 73% score in diagnostic accuracy (a measure of diagnostic reasoning) & the GPT-4 group 77%. No big difference.

But GPT-4 alone got 88%. The doctors didn’t change their opinions when working with AI.

To be clear, this doesn’t say AI will always beat doctors – this is a narrow test. It is much more about what this means for the future. As AI models get better, and match or exceed human level performance, what happens? This is one example where it is happening, and we see the issues emerging.

Jonathan Chen (study author): Provocative result we did NOT expect. We fully expected the Doctor + GPT4 arm to do better than Doctor + “conventional” Internet resources. Flies in the face of the Fundamental Theorem of Informatics (Human + Computer is Better than Either Alone).

It is already well known that if the AI is good enough, the humans will in many settings mess up and violate the Fundamental Theorem of Informatics. It’s happened before. At some point, even when you think you know better, you’re on average wrong, and doctors are not about to fully trust an AI on diagnosis until you prove to them they should (and often not even then, but they should indeed demand that much).

Mark Zuckerberg was asked to clarify his position around content creators whose work is used to create and train commercial products, in case his prior work had made his feelings insufficiently clear.

He was happy to oblige, and wants to be clear that his message is: Fuck you.

The Verge: Meta CEO Mark Zuckerberg says there are complex copyright questions around scraping data to train Al models, but he suggests the individual work of most creators isn’t valuable enough for it to matter.

“I think individual creators or publishers tend to overestimate the value of their specific content in the grand scheme of this,” Zuckerberg said in the interview, which coincides with Meta’s annual Connect event. “My guess is that there are going to be certain partnerships that get made when content is really important and valuable.” But if creators are concerned or object, “when push comes to shove, if they demanded that we don’t use their content, then we just wouldn’t use their content. It’s not like that’s going to change the outcome of this stuff that much.”

So you’re going to give them a practical way to exercise that option, and if they say no and you don’t want to bother paying them or they ask for too much money then you won’t use their content?

Somehow I doubt that is his intention.

Levelsio predicts the social media platform endgame of bifurcation. You have free places where AIs are ubiquitous, and you have paid platforms with only humans.

Andrej Karpathy: Is it a function of whether you pay or not? We pay here and still there is a lot of bot radiation.

I’d look to improve things on OS level with a liveness certification. There were a number of comments along the lines of oh it’s too difficult and I basically disagree. A phone has a lot of sensing, history and local compute to calculate a score for “this device is used in a statistically regular way”.

Keen: seems the easiest / most reliable thing to converge to is some irl proof of personhood like worldcoin. no one likes the idea of iris scans but the fundamental idea seems correct.

I agree that some form of gatekeeping seems inevitable. We have several reasonable choices.

The most obvious is indeed payment. If you charge even a small amount, such as $10/month or perhaps far less, then one already ‘cannot simply’ deploy armies of AI slop. The tax is unfortunate, but highly affordable.

Various forms of proof of identity also work. You don’t need Worldcoin. Anything that is backed by a payment of money or a scarce identity will be fine. For example, if you require a working phone number and subscription with a major phone carrier, that seems like it would work, since faking that costs money? There are several other good alternatives.

Indeed, the only core concept is ‘you post a bond of some kind so if you misbehave there is a price.’ Any payment, either money or use of a scarce resource, will do.

I can also think of other solutions, involving using AI and other algorithms, that should reliably solve the issues involved. This all seems highly survivable, once we bring ourselves to care sufficiently. Right now, the problem isn’t so bad, but also we don’t care so much.

What about scam calls?

Qualy: my actual take about AI scam calls is that it probably won’t be a big issue, because:

– we have this prob with text already, the solution is using authenticated channels

– this authentication doesn’t have to be clever, someone giving you their phone number in person once is fine

There is also the factor of using only-known-to-you information, which in practice calls to your bank or whatever require already (not because they couldn’t in principle recognise your voice, just bc they don’t care to).

I also think this essentially applies to worries about deepfakes too, although I would have to think abt it more. In most cases someone saying “that’s not me” from their official twitter account seems good enough.

Danielle Fong: pretty sure this *is* going to be a problem, and i wonder why these sort of instrumental evils are downplayed relative to the hypothetical existential threat posed by Superintelligence. Intelligence is composed of many different components and tactics. I brought up these problems years ago, and I’m not sure I spoke to the correct office. It was basically impossible to get law enforcement to care.

I for one downplay them because if the problem does get bad we can ‘fix them in post.’ We can wait for there to be some scam calls, then adjust to mitigate the damage, then adjust again and so on. Homeostasis should set in; the right number of scam calls is not zero.

There are obvious solutions to scam calls and deepfakes. We mostly don’t use them now because they are annoying and time consuming relative to not doing them, so they’re not worth using yet except in certain high-value situations. In those situations, we do use (often improvised and lousy) versions already.

The latest version of a common speculation on the software engineer market, which is super soft right now, taking things up a notch.

alz: reminder, the entry-level tech job market is still totally cooked, like 4.0’s from Berkeley are getting 0 job offers.

Senior PowerPoint Engineer: I say this as someone who was early to being short entry-level tech hires but at some level of quality and salary it had to make sense to hire them, even if for some lousy entry-level consulting role at Accenture. Something weird is going on.

Senior Spreadsheet Engineer: My pet theory rn is the hiring market is insanely adversarial with all the AI-generated slop going around, mostly on the applicant side. HR is just overwhelmed. And that’s on top of the post-covid slowdown and layoffs.

I think a lot of places have simply given up, sifting through the slop and resume mountain is just not worth it for entry level roles.

My other theory right now is the job market will evolve by necessity to hiring managers and HR going out to find candidates proactively. Recruiters might also be helped by/useful for this.

There is clearly an AI-fueled flooding-the-zone and faking-the-interviews application crisis. Giving up on hiring entirely seems like an extreme reaction. You can decline to fill entry-level roles for a bit, but the damage should quickly compound.

The problem should be self-limiting. If the job market gets super soft, that means there will be lots of good real candidates out there. Those candidates, knowing they are good, should be willing to send costly signals. This can mean ‘build cool things’; it should also mean hard-to-fake things like the 4.0 GPA, and being willing to travel in person for interviews so they can’t cheat on them using AI. Recruiters with a reputation to uphold also seem promising. There are a number of other promising candidate strategies as well.

Tyler Cowen suggests granting tenure on the basis of what you contribute to major AI models. The suggested implementation is somehow even crazier than that sounds, if one were to take it the slightest bit seriously. A fun question is, if this is the right way to grant tenure, what is the tenure for, since clearly we won’t in this scenario need professors that much longer, even if the humans survive and are fine?

How long until we no longer need schools?

There are two clocks ticking here.

  1. As AI improves, the AI tutors get better.

  2. Also, as the child gets older, the relative value of the AI tutor improves.

I think that, today, an average 16 year old would learn better at home with an AI tutor than at a typical school, even if that ‘AI tutor’ was simply access to AIs like Gemini, NotebookLM, Claude and ChatGPT plus an AI coding assistant. Specialization is even better, but not required. You combine the AI with textbooks and other sources, and testing, with the ability to reach a teacher or parent in a pinch, and you’re good to go.

Of course, the same is true for well-motivated teens without the AI. The school was already only holding them back and now AI supercharges their independent studies.

Six years from now, I don’t see how that is even a question. Kids likely still will go to schools, but it will be a wasteful anachronism, the same way many of our current methods are, as someone once put it, ‘pre-Gutenberg.’ We will justify it with some nonsense, likely about socialization or learning discipline. It will be super dumb.

The question is, will a typical six year old, six years from now, be at a point where they can connect with the AI well enough for that to work? My presumption, given how well voice modes and multimodal with cameras are advancing, is absolutely yes, but there is some chance that kids that young will be better off in some hybrid system for a bit longer. If the kid is 10 at that point? I can’t see how the school makes any sense.

But then, the justifications for our schools have always been rather nonsensical.

A new jailbreak technique is MathPrompt, encoding harmful prompts into mathematical problems. They report a success rate of 73.6% across 13 SotA LLMs.

UK AISI is hiring three societal impacts workstream leads.

  1. Crime and social destabilization lead.

  2. Psychological and social risks lead.

  3. Systemic safety and responsible innovation lead.

AlphaChip is Google DeepMind’s AI for designing better chips with which to build smarter AIs, which they have decided, for some bizarre reason, should be open sourced. That would not have been my move. File under ‘it’s happening.’

ChatGPT advanced voice mode comes to the UK. The EU is still waiting.

OpenAI used Dev Day to ship new tools for developers. In advance, Altman boasted about some of the progress OpenAI has made over the years, in decreasing order of precision.

Sam Altman: shipping a few new tools for developers today!

from last devday to this one:

*98% decrease in cost per token from GPT-4 to 4o mini

*50x increase in token volume across our systems

*excellent model intelligence progress

*(and a little bit of drama along the way)

What’s a little drama between no longer friends?

All right, let’s get more detailed.

Srinivas Narayanan (OpenAI, VP Engineering): Launched at Dev Day today

— Real time API for low latency speech applications

— Vision fine-tuning

— Model distillation

— Prompt caching

Look forward to seeing what developers build.

Sam Altman: realtime api (speech-to-speech) [here].

vision in the fine-tuning api [here].

prompt caching (50% discounts and faster processing for recently-seen input tokens) [here].

model distillation (!!) [here].

They also doubled the API rate limits on o1 to 10k per minute, matching GPT-4.

Here’s a livestream thread of Lizzie being excited. Here’s Simon Willison’s live blog.

Here’s their general purpose pricing page, as a reference.

Prompt Caching is automatic now for prompts above 1,024 tokens, offering a 50% discount for anything reused. Cached prefixes are cleared after about 5-10 minutes of inactivity. This contrasts with Claude, where you have to tell it what to cache, but the discount you get is 90%.
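For reference, the Claude side of that contrast looks roughly like the sketch below, marking the large reused prefix explicitly; the beta header and field names are my recollection of the launch documentation, so treat them as assumptions and check current docs. On OpenAI’s side there is nothing to write, since any prompt over 1,024 tokens is cached automatically.

```python
# Sketch of explicit prompt caching with the Anthropic client: mark the big
# reused prefix (here, a long system prompt) and reuse it verbatim on later calls.
# OpenAI's equivalent needs no code change; long prefixes are cached automatically.
import anthropic

client = anthropic.Anthropic()
LONG_CONTEXT = open("reference_docs.txt").read()  # the expensive, reused part

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    messages=[{"role": "user", "content": "What changed in section 3?"}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta flag at launch
)
print(response.content[0].text)
```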

Model Distillation helps developers use o1-preview and GPT-4o outputs to fine-tune models like GPT-4o mini. It uses stored completions to build data sets, a beta of evals that run continuously while you train, and integration with fine-tuning. You give it an evaluation function and a set of stored examples, and they handle the rest. After the free samples in October it will cost what fine-tuning already costs. It makes a lot of sense to emphasize this; it is very good for business.
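In practice the workflow starts by storing the teacher model’s completions in production, roughly as sketched below; the `store` and `metadata` parameters reflect my reading of the announcement, so treat the details as assumptions.

```python
# Sketch of the first step of distillation: store the big model's completions
# so they can later be turned into a fine-tuning dataset for a smaller model.
# The store/metadata parameters are as I understand the Dev Day announcement.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",  # the "teacher" model whose outputs you want to distill
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    store=True,  # keep the request/response pair in stored completions
    metadata={"task": "ticket-summary", "candidate_for": "gpt-4o-mini-distill"},
)
# Later: filter the stored completions, run evals against them, and use the
# surviving pairs as training data for fine-tuning gpt-4o-mini.
```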

Vision is now available in the fine-tuning API. They claim as few as 100 images can improve performance on specific tasks, like localized street sign recognition or identifying local UI elements.
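A training example for that is just a chat transcript that happens to include an image, one JSON object per line of the fine-tuning file; the sketch below shows what a single line might look like, with the field names being my best recollection of the image-content format and the URL a made-up placeholder.

```python
# Sketch of one vision fine-tuning example, written as a JSONL line.
# Field names follow the standard image-content message format; verify against
# current docs before relying on them. The URL is a placeholder.
import json

example = {
    "messages": [
        {"role": "system", "content": "Identify the street sign in the photo."},
        {"role": "user", "content": [
            {"type": "text", "text": "What does this sign say?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/sign_0042.jpg"}},
        ]},
        {"role": "assistant", "content": "No parking, 8am to 6pm, Monday through Friday."},
    ]
}

with open("street_signs.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```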

What does it mean to have a ‘realtime API’? It means exactly that: you can use an API to handle queries from the user while they’re talking in voice mode. The intent is to let you build something like ChatGPT’s Advanced Voice Mode within your own app, without having to string together different tools for handling inputs and outputs.
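The shape of it, as I understand the launch materials, is a single WebSocket that carries JSON events (and audio) in both directions; the sketch below shows a text-only round trip, with the endpoint, headers, and event names all being my recollection of the docs rather than anything verified here.

```python
# Sketch of the shape of a Realtime API session: one WebSocket carries audio/text
# both ways as JSON events, instead of stringing together STT + LLM + TTS.
# Endpoint, headers, and event names are my recollection of the launch docs.
import asyncio
import json
import os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer releases of the websockets library call this parameter additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a response (text-only here, to keep the sketch short).
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Greet the caller briefly."},
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event.get("type") == "response.done":
                break

asyncio.run(main())
```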

They provided a demo of an AI agent making a phone call on your behalf, and in theory (the other end of the call was the person on stage) spending almost $1,500 to buy chocolate covered strawberries. This was very much easy mode on every level. We should on many levels be impressed it can do this at all, but we’ve seen enough elsewhere that this much is no surprise. Also note that even in the demo there was an important hitch. The AI was not told how to pay, and jumped to saying it would pay for the full order in cash without confirming that. So there are definitely some kinks.

The first thing I saw someone else build was called Live Roleplays, an offering from Speak to help with language learning, which OpenAI demoed on stage. This has always been what I’ve seen as the most obvious voice mode use case. There’s a 15 second sample video included at the link and on their blog post.

Andrew Hsu: We’ve been working closely with OpenAI for the past few months to test the new Realtime API. I’m excited to share some thoughts on the best way to productize speech-to-speech for language learning, and announce the first thing we’ve built here, Live Roleplays.

Language learning is the perfect use case for speech-to-speech, as everyone is discovering! We’ve been blown away at how immersive our conversational practice experience now feels. But how does it differ from a general AI assistant like Advanced Voice Mode?

We think it’s more important than ever to create a product experience that’s purpose-built for language learning by combining the best technology, product design, and pedagogy. This is what enables a real path to language fluency beyond the first 15 minutes of interaction.

Here are some key aspects of how we’ve purpose-built this experience to be the most effective:

  1. As a user progresses through the conversation, we use our proficiency graph to ensure the dialogue is at the right level and exposes helpful language they should learn next

  2. We give the user specific objectives to try and complete during the roleplay to drive the conversation forward and get them to use key language items

  3. And when they need extra help, we proactively surface just the right amount of a hint to help them.

And of course all of our lessons are driven by our proprietary learning engine so these Live Roleplays (along with all of our lesson types) happen as part of a wider learning sequence personalized to the learner.

I’m definitely excited for the ‘good version’ of what Speak is building, whether or not Speak is indeed building a good version, or whether OpenAI’s new offerings are a key step towards that good version.

We do need to lower the price a bit; right now this is prohibitive for most uses. But if there’s one thing AI is great at, it’s lowering the price. I have to presume that they’re not going to charge 10x-20x the cost of the text version for that long. Right now GPT-4o-realtime-preview is $5/$20 for a million text tokens, $100/$200 for a million audio tokens.

Timothy Lee: Anyone have a theory for why OpenAI is charging so much more on a per-token basis? I’d expect audio to be expensive if it was using many tokens per second. But I don’t understand why it would be 10-20x more expensive per token.
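Back-of-the-envelope, using only the figures quoted here: the 10x-20x is simply $200 versus $20 per million output tokens and $100 versus $5 per million input tokens. And at $200 per million audio output tokens, the roughly $0.25 per minute people are reporting works out to about 1,250 audio tokens per minute of speech, or around 20 per second.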

If you can take care of that, Sully is excited, as would be many others.

Sully: Whoa okay the realtime api looks kinda insane, low key more exciting than o1

It can use realtime tools + their voice to voice model, which is bonkers.

I genuinely think this:

1) opens up a new wave of never before possible voice startups (sooo much to build)

2) this *might* actually kill existing voice startups because while OAI is not competing with them, the tech moat is basically 0.

i can build a voice ai agent wrapper in < 1 day. cost will go to 0.

I really want to build some real time voice apps now!

Nikshep Saravanan: it’s awesome but expensive, $0.25 per minute of output. You can do 8-10x cheaper and lower latency with @cartesia_ai right now.

Sully: Yeah holy shit I just saw.

Anant: The prohibitive cost is why I’m still considering

@elevenlabsio for my current project. Can’t have an alarm app cost 40$/month 💀

But definitely something to revisit when it’s cheaper OR

if Google finally releases Duplex as an API, and that’s reasonable to implement.

McKay Wrigley is always fun for the ‘this will change everything’ and here you go:

McKay Wrigley: Realtime AI will change everything.

Computers won’t just be tools.

They will be 200 IQ coworkers who will actively help you with any task – and you will have entire teams of them.

OpenAI is building the nervous system for AGI, and it’s available via API.

Take advantage of it.

I know I’m a broken record on this, but once again, who stands to benefit the most from this?

PEOPLE WHO CAN CODE.

OpenAI drops a new API, and boom suddenly anyone who can use Cursor + read docs can build AI assistants with the latest tech.

Even <10 hours gets you far - learn!

The presentation also spent a bunch of time emphasizing progress on structured outputs and explaining how to use them properly, so you get useful JSONs.
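The ‘useful JSONs’ part looks roughly like the sketch below, constraining the response to a JSON schema so it parses cleanly every time; the schema itself is an invented example, not anything from the presentation.

```python
# Sketch of structured outputs: constrain the model to a JSON schema so the
# response parses cleanly every time. The schema here is an invented example.
import json
from openai import OpenAI

client = OpenAI()

schema = {
    "name": "meeting_summary",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "action_items": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "action_items"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize: we agreed Bob ships the patch Friday."}],
    response_format={"type": "json_schema", "json_schema": schema},
)

summary = json.loads(response.choices[0].message.content)
print(summary["action_items"])
```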

These quotes are from the chat at the end between Altman and Kevin Weil, via Simon Willison’s live blog; we also have notes from Greg Kamradt, but those don’t always differentiate who said what:

Kevin Weil: An enterprise said they wanted 60 days notice in advance of when you’re going to launch something, “I want that too!”

I don’t think enterprises should be able to get 60 days notice, but it would indeed be nice if OpenAI itself got 60 days notice, for various safety-related reasons?

Sam Altman: “We have an approach of: figure out where the capabilities are going, then work to make that system safe. o1 is our most capable model ever but it’s also our most aligned model ever.”

Is that what the Preparedness Framework says to do? This makes the dangerous assumption that you can establish the capabilities, and then fix the safety issues later in post.

  1. If you’re not considering safety from the beginning, you could paint yourself into a corner, and have to backtrack.

  2. If you’re building up the capabilities without the safety, assuming you can fix safety later, then that’s going to create every incentive to rush the safety efforts, and push to release even if they’re not ready or incomplete.

  3. We already have examples of OpenAI rushing its safety work and testing.

  4. If you build up sufficient capabilities, then the model being trained at all, or evaluated internally, could itself become unsafe. Or someone might steal the model in this intermediate unsafe state.

So, gulp?

Sam: “I think worrying about the sci-fi ways that this all goes wrong is also very important. We have people thinking about that.”

We’ve gone so far backward that Sam Altman needs to reassure us that they at least have some people ‘thinking about’ the ways this all goes wrong, while calling them ‘sci-fi ways’ in order to delegitimize them. Remember when this was 20% of overall compute? Now it’s ‘we have people thinking about that.’

Also this:

“Iterative deployment is our best safety system we have.”

Well, yes, I suppose it is, given that we don’t have anything else and OpenAI has no intention of trying hard to build anything else. So, iterative deployment, then, and the hope that when things go wrong we are always still in charge and around sufficiently to fix it for next time.

What are they going to do with the AGIs?

Mission: Build safe AGI. If the answer is a rack of GPUs, they’ll do that. If the answer is research, they’ll do that.

This must be some use of the word ‘safe’ that I wasn’t previously aware of? Or it’s expressing a hope of some kind, perhaps?

Kevin: “I think 2025 is really going to be the year that [agents] goes big”.

Sam: “I think people will ask an agent to do something for them that would have taken them a month, and it takes an hour – and then they’ll have ten of those at a time, and then a thousand at a time – and we’ll look back and say that this is what a human is meant to be capable of.”

I really, really do not think they have thought through the implications properly, here.

When is o1 going to support function calls? Kevin: “Before the end of the year.” (Applause). o1 is going to get system prompts, structured outputs and function calling by the end of the year.

Sam: “The model (o1) is going to get so much better so fast […] Maybe this is the GPT-2 moment, we know how to get it to GPT-4”. So plan for the model to get rapidly smarter.

I notice I am skeptical, because of how I think about the term ‘smarter.’ I think we can make it, maybe the word is ‘cleverer’? Have it use its smarts better. But already the key limitation is that it is not actually smarter, in my way of thinking, than GPT-4; instead it’s finding ways to maximize the use of what smarts it does have.

Sam asks if people who spend time with o1 feel like they’re “definitively smarter” than that thing, and if they expect to feel that way about o2.

Yes, after using it for a bit I will say I am ‘definitively smarter’ than o1. Perhaps I am prompting it badly but I have overall been disappointed in o1.

Sam: We’ve been tempted to produce a really good on-device model but that segment is actually pretty well served now.

Is this a way of saying they don’t know how to do better than Gemini there?

Singing is disabled for now, it seems, due to copyright issues.

Durk Kingma, part of the original founding team of OpenAI and more recently of Google DeepMind, joins Anthropic.

Durk Kingma: Personal news: I’m joining @AnthropicAI! 😄 Anthropic’s approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic’s mission of developing powerful AI systems responsibly. Can’t wait to work with their talented team, including a number of great ex-colleagues from OpenAI and Google, and tackle the challenges ahead!

OpenAI is moving forward to raise $6.6 billion at a $157 billion valuation. That seems like a strangely small amount of money to be raising, both given their needs and given that valuation. Soon they will need far more.

OpenAI has asked investors to avoid backing rival start-ups such as Anthropic and xAI.

Elon Musk: OpenAI is evil.

Quoted largely because I’m sad Musk couldn’t find ‘OpenlyEvilAI.’ This is all standard business practice. OpenAI has stopped pretending it is above all that.

A list of the top 20 companies by ‘generative AI patents’ in 2023 shows exactly why this is not a meaningful metric.

OpenAI and Anthropic revenue breakdown. Huge if true: they think Anthropic is highly competitive on the API side, but alas no one uses Claude.

They have OpenAI growing 285% year over year for subscriptions, 200% for API, whereas Anthropic is catching up with a 900% increase since last year. Whether that is sustained for the API will presumably depend on who has the better products going forward. For the consumer product, Claude is failing to break through to visibility, and it seems unrelated to product quality.

The best part of chatbot subscriptions is the profit margins are nuts. Most people, myself included, are paying miles more per token for subscriptions than we would pay for the API.

OpenAI got there early to capture the public imagination, and they’ve invested in voice mode and in responses people like and done a good job of that, and gotten good press for it all, such that ChatGPT is halfway to being the ‘Google,’ ‘xerox’ or ‘Kleenex’ of generative AI. I wonder how much of that is a lasting moat, versus being a choice of focus.

Long term, I’d think this is bullish for Anthropic. That’s huge year over year growth, and they’re fully competitive on the API, despite being supposedly valued at only something like 20% of OpenAI even taking into account all of OpenAI’s shall we say ‘issues.’ That seems too low.

BioNTech and Google DeepMind build biological research assistant AIs, primarily focused on predicting experimental outcomes, presumably to choose the right experiments. For now that’s obviously great, the risk concerns are obvious too.

Matthew Yglesias: OpenAI’s creators hired Sam Altman, an extremely intelligent autonomous agent, to execute their vision of x-risk conscious AGI development for the benefit of all humanity but it turned out to be impossible to control him or ensure he’d stay durably aligned to those goals. 🤔

Sigal Samuel writes at Vox that ‘OpenAI as we knew it is dead’ pointing out that this consolidation of absolute power in Altman’s hands and abandonment of the non-profit mission involves stealing billions in value from a 501c(3) and handing it to investors and especially to Microsoft.

OpenAI planning to convert to a for-profit B-corporation is a transparent betrayal of the mission of the OpenAI non-profit. It is a clear theft of resources, a clear break of the fiduciary duties of the new OpenAI board.

If we lived in a nation of laws and contracts, we would do something about this. Alas, we mostly don’t live in such a world, and every expectation is that OpenAI will ‘get away with it.’

So this is presumably the correct legal realist take on OpenAI becoming a B-corporation:

John Arnold: I’m sure OpenAI has a bunch of lawyers signing off on this and all but starting a nonprofit that metamorphoses into $10 bil seems, um, interesting.

There are rules. Rules that apply to ‘the little people.’

Greg Colbourn: I run a charity. Pretty sure that I can’t just turn it into a business and have the board give me 7% of the assets. That’s flat out illegal here in the UK (and I’m pretty sure it is in the US too) – basically it’s just stealing charitable assets.

If the charity were to be disbanded, the assets still have to be used for charitable purposes (e.g. donated to another charity), they can’t just be taken into private hands. If this was legal, why wouldn’t everyone use this scam to start a business as a charity first to nefariously extract funds from honest charity givers?

Apple has withdrawn from the new funding round, and WSJ dropped this tidbit:

Tom Dotan and Berber Jin (WSJ): OpenAI is also in the process of overhauling its corporate structure from a nonprofit into a for-profit company. That change, which was encouraged by many of the investors in the round, will be a complicated process for the startup. If it doesn’t complete the change within two years, investors in the current round will have the right to request their money back.

This means OpenAI is potentially in very deep trouble if they don’t execute the switch to a B-corporation. They’re throwing their cap over the wall. If they fail, venture investment becomes venture debt with a conversion option. If the investors request their money back, which, conditional on this failure to secure the rights to OpenAI’s profits, seems not so unlikely, that could then be the end.

So can they pull off the heist, or will they get a Not So Fast?

To those asking why everyone doesn’t go down this path, the path isn’t easy.

Theo Francis, Berber Jin and Tom Dotan (WSJ): To get there, it will have to deal with regulatory requirements in at least two states, determine how to award equity in the for-profit company, and split assets with the nonprofit entity, which will continue to exist.

“This kind of transaction is incredibly complex and would involve a large number of legal and regulatory hurdles that would need to be navigated,” said Karen Blackistone, general counsel at the investment firm Hangar Management and an attorney specializing in technology and tax-exempt organizations.

One problem will be antitrust attention, since Microsoft had been relying on OpenAI’s unique structure to fend off such complaints.

Regulators have already scrutinized Microsoft’s relationship with OpenAI and whether it effectively controls the startup. The tech giant has argued that its investment only entitles it to a share of potential profits, but a new structure under which Microsoft has an equity stake in OpenAI could invite further antitrust attention.

I think the antitrust concerns are bogus and stupid, but many people seem to care.

The bigger question is, what happens to OpenAI’s assets?

The more complicated part is what would happen to OpenAI’s assets. When such a conversion takes place, it can’t simply shift assets from a nonprofit to a for-profit. The nonprofit is legally required to end up with assets, including any cash and securities, at least as valuable as those it turns over to the for-profit. In effect, OpenAI’s operations would likely be sold to the for-profit company or its investors, with the charity retaining the proceeds.

That makes sense and matches my understanding. You can take things away from the 501c3 world, but you have to pay fair market price for them. In this circumstance, the fair value of what is being taken away seems like quite a lot?

Gretchen Krueger, current AI policy researcher who was formerly at OpenAI, notes that her decision to join was partly due to the non-profit governance and profit cap, whereas if they’re removed now it is at least as bad as never having had them.

Carrol Wainwright, formerly of OpenAI, points out that Altman has proven himself a danger to OpenAI’s nonprofit mission that has now been entirely abandoned, that you cannot trust him or OpenAI, and that the actions of the past year were collectively a successful coup by Altman against those in his way, rather than the other way around.

Greg Brockman offers his appreciation to the departing Barret, Bob and Mira.

Wojciech Zaremba, OpenAI cofounder still standing, offers his appreciation, and his sadness about the departures.

Everyone likes money. I like money. But does Sam Altman like money, on a different level than I like money?

Joe Rogan argues that yes. The guy very much likes money.

Joe Rogan: He’s always kind of said, ‘I’m not doing this for money; I don’t make any money.’

They just busted him in a $4 million Koenigsegg.

See if you can find that car.

‘Oh, I don’t need money, me money?

I’m not even interested in money.’

He’s driving around in a $4 million Koenigsegg.

Ha, busted! I think you like money.

This is certainly a fun argument. Is it a valid one? Or does it only say that he (1) already has a lot of money and (2) likes nice things like a $4 million car?

I think it’s Bayesian evidence that the person likes money, but not the kind of super strong evidence Joe Rogan thinks this is. If you have a thousand times as much money as I do, and this brings you joy, why wouldn’t you go for it? He can certainly afford it. And I do want someone like Altman appreciating nice things, and not to feel guilty about buying those things and enjoying himself.

It is however a different cultural attitude than the one I’d prefer to be in charge of a company like OpenAI. I notice I would never want such a car.

When I asked Claude what it indicates about someone driving such a model around town (without saying who the person in question was), it included that this was evidence of (among other things) status consciousness, attention-seeking and high risk tolerance, which all seems right and concerning. It also speaks to the image he chooses to project on these questions. Intentionally projecting that image is not easily compatible with having the attitude Altman will need in his position leading OpenAI.

Gwern, who predicted Mira’s departure, offered further thoughts a few months ago on the proposition that OpenAI has been a dead or rotting organization walking for a while now, and is rapidly losing its lead. One has to take into account the new o1 model in such assessments, but the part of this that resonates most is that the situation seems likely to be binary. Either OpenAI is ‘still OpenAI’ and can use its superior position to maintain its lead and continue to attract talent and everything else it takes. Or, if OpenAI is no longer so special in the positive ways, it gets weighed down by all of its unique problems, and continues to bleed its talent.

Deepa Seetharaman writes at the WSJ that Turning OpenAI Into a Real Business Is Tearing It Apart.

Deepa Seetharaman (WSJ): Some tensions are related to conflicts between OpenAI’s original mission to develop AI for the public good and new initiatives to deploy moneymaking products. Others relate to chaos and infighting among executives worthy of a soap opera.

Current and former employees say OpenAI has rushed product announcements and safety testing, and lost its lead over rival AI developers. They say Altman has been largely detached from the day-to-day—a characterization the company disputes—as he has flown around the globe promoting AI and his plans to raise huge sums of money to build chips and data centers for AI to work.

OpenAI has also been evolving into a more normal business, as Altman has described it, since his return. The company, which has grown to 1,700 employees from 770 last November, this year appointed its first chief financial officer and chief product officer.

The majority of OpenAI employees have been hired since the Battle of the Board, and that would be true even if no one had left. That’s an extreme level of growth. It is very difficult to retain a good culture doing that. One likely shift is from a research culture to a product-first culture.

Tim Shi (Former OpenAI): It’s hard to do both at the same time; product-first culture is very different from research culture. You have to attract different kinds of talent. And maybe you’re building a different kind of company.

Noam Brown disagrees, and promises us that OpenAI still prioritizes research, in the wake of losing several senior researchers. I am sure there has been a substantial shift towards product focus; of course that does not preclude an increase in resources being poured into capabilities research. We do however know that OpenAI has starved their safety research efforts of resources and other support.

So far nothing that new, but I don’t think we’ve heard about this before:

Deepa Seetharaman: Murati and President Greg Brockman told Sutskever that the company was in disarray and might collapse without him. They visited his home, bringing him cards and letters from other employees urging him to return.

Altman visited him as well and expressed regret that others at OpenAI hadn’t found a solution.

Sutskever indicated to his former OpenAI colleagues that he was seriously considering coming back. But soon after, Brockman called and said OpenAI was rescinding the offer for him to return.

This report does not mean that Murati and Brockman actually worried that the company would collapse, there are multiple ways to interpret this, but it does provide valuable color in multiple ways, including that OpenAI made and then rescinded an offer for Sutskever to return.

It’s also hard not to be concerned about the concrete details of the safety protocols around the release of GPT-4o:

Executives wanted to debut 4o ahead of Google’s annual developer conference and take attention from their bigger rival.

The safety staffers worked 20 hour days, and didn’t have time to double check their work. The initial results, based on incomplete data, indicated GPT-4o was safe enough to deploy. 

But after the model launched, people familiar with the project said a subsequent analysis found the model exceeded OpenAI’s internal standards for persuasion—defined as the ability to create content that can persuade people to change their beliefs and engage in potentially dangerous or illegal behavior. 

I believe that releasing GPT-4o does not, in practice and on reflection, exceed what my threshold would be for persuasion, or otherwise have any capabilities that would cause me not to release it. And I do think it was highly reasonable to not give 4o the ‘full frontier model’ treatment given it wasn’t pushing the frontier much.

It still is rather damning that, in order to score a marketing win, it was rushed out the door, after 20 hour days from the safety team, without giving the safety team the time they needed to follow their own protocols. Or that the model turned out to violate their own protocols.

I’ve seen at least one mocking response of ‘so where’s all the massive harm from 4o then?’ and that is not the point. The point is that the safety process failed and was overridden by management under the flimsy pressure of ‘we want to announce something before Google announces something.’ Why should we think that process will be followed later, when it matters?

Nor was this a one time incident. It was a pattern.

The rush to deploy GPT-4o was part of a pattern that affected technical leaders like Murati.

The CTO repeatedly delayed the planned launches of products including search and voice interaction because she thought they weren’t ready. 

Other senior staffers also were growing unhappy.

John Schulman, another co-founder and top scientist, told colleagues he was frustrated over OpenAI’s internal conflicts, disappointed in the failure to woo back Sutskever and concerned about the diminishing importance of its original mission.

In August, he left for Anthropic.

The article also heavily suggests that Brockman’s leave of absence, rather than being motivated by family concerns, was because his management style was pissing off too many employees.

The New York Times covers Altman’s grand compute expansion dreams and his attempts to make them reality. His ambitions have ‘scaled down’ to the hundreds of billions of dollars.

Cade Metz and Tripp Mickle (NYT): TSMC’s executives found the idea so absurd that they took to calling Mr. Altman a “podcasting bro,” one of these people said. Adding just a few more chip-making plants, much less 36, was incredibly risky because of the money involved.

The article is full of people laughing at the sheer audacity and scale of Altman’s asks. But if no one is laughing at your requests, in an enterprise like this, you aren’t asking for enough. What is clear is that Altman does not seem to care which companies and nations he partners with, or what the safety or security implications would be. All he wants is to get the job done.

In Nate Silver’s book, there is a footnote that Altman told Silver that self-improving AI is ‘really scary’ and that OpenAI isn’t pursuing it. This is a highly bizarre way to make a statement that contradicts OpenAI’s clearly stated policies, which include using o1 (aka Strawberry) to do AI research, the direct pursuit of AGI, and the entire goal of the former superalignment team (RIP) being to build an automated AI alignment researcher. So this quote shows how much Altman is willing to mislead.

The new head of alignment at OpenAI is Joshua Achiam. He’s said some useful and interesting things on Twitter at times, but also some deeply troubling things, such as:

Joshua Achiam (November 11, 2022) [giving advice to EAs]: Try to get better calibrated about what tail risks are real and what tail risks aren’t!

It seems like a huge miss to me that an enormous amount of ink gets spilled over speculative AGI tail risks predicted by people who are clearly converting anxiety disorders into made-up numbers about the likelihood of everyone dying within 10 years…

P(Misaligned AGI doom by 2032): <1e-6%

P(Large scale catastrophic accident risk from general-purpose AI by 2032, resulting for example in substantial cyberattack or e.g. empowering bad human actors by helping them accomplish a significant bioterrorist operation, with scale of impact smaller than Covid19): maybe ~3%?

In other words, the new head of AI alignment at OpenAI is on record lecturing EAs that misalignment risk from AGI is not real.

I do get the overall sense that Joshua is attempting to be helpful, but if the head of AI alignment at OpenAI does not believe in existential risk from AI misalignment, at all? If he thinks that all of our efforts should be in fighting human misuse?

Then that is perhaps the worst possible sign. Effectively, OpenAI would be saying that they have no superalignment team, that they are not making any attempt to avoid AI killing everyone, and they intend to proceed without it.

The question then becomes: What do we intend to do about this?

Perplexity for shopping?

TestingCatalogNews: BREAKING 🚨: Perplexity to offer “One-click purchase and free shipping on infinite products” to Pro subscribers in the future.

Will AI-assisted shopping kill the traditional purchasing experience? 🤔

Previously discovered “Buy with PPLX” turned out to be a part of a planned Pro offering to help users make purchases while searching.

It will come along with a separate Purchase tab to let users list and track their purchases as well 👀

Gallabytes: if all perplexity does is build a better search layer & a less broken website/mobile app I will gladly give them 5% of each purchase even if it still uses amazon on the backend. AI curation and prompt interface = perfect antidote to the horrid ui of amazon.com & app.

I still often use both Google and Wikipedia, and was never using Bing in the first place, so let’s not get ahead of ourselves.

In some form or another, yes, of course the future of shopping looks like some version of ‘tell the AI what you want and it locates the item or candidate items for you, and checks for the lowest available price and whether the deal is reasonable, and then you can one-click to purchase it without having to deal with the particular website.’

The question is, how good does this have to be before it is good enough to use? Before it is good enough to use as a default? Use without sanity checking, even for substantial purchases? When will it get that reliable and good? When that happens, who will be providing it to us?

Dreaming Tulpa reports they’ve created smart glasses that automatically snap photos of people you see, identify them, search online and tell you tons of stuff about them, like phone number and home address, via streaming the camera video to Instagram.

So on the one hand all of this is incredibly useful, especially if it caches everything for future reference. I hate having to try and remember people’s names and faces, and having to be sure to exchange contact info and ask for basic information. Imagine if you didn’t have to worry about that, and your glasses could tell you ‘oh, right, it’s that guy, with the face’ and even give you key info about them. Parties would be so much more fun, you’d know what to talk to people about, you could stop missing connections, and so on. Love it.

Alas, there are then the privacy concerns. If you make all of this too smooth and too easy, it opens up some malicious and anti-social use cases as well. And those are exactly the types of cases that get the authorities involved to tell you no, despite all of this technically being public information. Most of all it wigs people out.

The good news, I think, is that there is not that much overlap in the Venn diagram between ‘things you would want to know about people’ and ‘things you would want to ensure other people do not know.’ It seems highly practical to design a product that is a win-win, that runs checks and doesn’t share certain specific things like your exact address or your social security number?

Mostly, though, the problem here is not even AI. The problem is that people are leaving their personal info exposed on the web. All the glasses are doing is removing the ‘extra steps.’

Now that SB 1047 has been vetoed, but Newsom has said he wants us to try again with something ‘more comprehensive,’ what should it be? As I explained on Tuesday (recommended if you haven’t read it already), Newsom’s suggested approach of use-based regulation is a recipe for industry strangulation without helping with risks that matter, an EU-style disaster. But now is the time to blue sky, and think big, in case we can come up with something better, especially something that might answer Newsom’s objections while also, ya know, possibly working and not wrecking things.

Garrison Lovely makes it into the New York Times to discuss scandals at OpenAI and argue this supports the need for enhanced whistleblower protections. The case for such protections seems overwhelming to me, even if you don’t believe in existential risk mitigation at all.

Jack Clark offers the rogue state theory of AIs.

From August: Peter Thiel talked to Joe Rogan about a wide variety of things, and I had the chance to listen to a lot more of it.

His central early AI take here is bizarre. He thinks passing the Turing Test is big, with his justification largely being due to how important we previously thought it was, which seems neither here nor there. We agree that current Turing-level AIs are roughly ‘internet big’ (~8.0 on the Technological Richter Scale) in impact if things don’t advance from here, over the course of several decades. The weird part is where he then makes this more important than superintelligence, or thinks this proves superintelligence was an incorrect hypothesis.

I don’t understand the logic. Yes, the path to getting there is not what we expected, and it is possible things stop soon, but the progress so far doesn’t make superintelligence any less likely to happen. And if superintelligence does happen, it will undoubtedly be the new and probably last ‘most important event in the history of history,’ no matter whether that event proves good or bad for humans or our values, and regardless of how important AI had already been.

Peter then takes us on a wild ride through many other topics and unique opinions. He’s always fun and interesting to listen to, even (and perhaps especially) the parts where he seems utterly wrong. You’ve got everything from how and why they built the Pyramids to chimp political dynamics to his suspicions about climate science to extended takes on Jeffrey Epstein. It’s refreshing to hear fresh and unique wrong takes, as opposed to standard dumb wrong takes and especially Not Even Wrong takes.

That’s all in addition to the section on racing with China on AI, which I covered earlier.

Democratic control is nice but have you experienced not dying?

Or: If it turns out Petrov defied ‘the will of the Soviet people’? I’m cool with that.

Sigal Samuel: OpenAI is building tech that aims to totally change the world without asking if we consent. It’s undemocratic. And Sam Altman just proved that bespoke corporate structures & voluntary commitments won’t cut it — we need LAWS that give independent oversight

Robin Hanson: ALL innovation changes the world without democratic consent.

Eliezer Yudkowsky: My problem with ASI is not that it will undemocratically kill everyone, but that it will kill everyone. Call me a wild-eyed libertarian, but I would consider that event to be almost exactly as bad if it happened as the result of a 51% vote of Earth’s population.

Connor Leahy: I disagree.

If we somehow fairly and verifiably gained informed voter consent and like 75% of people were like “fuck it, let it rip”, I think this scenario would be vastly more ethical and dignified than the scenario we are in.

Davidad: The word “informed” is pulling a lot of weight here, and seems contradicted by the phrasing “fuck it, let it rip.” Surely it should sound more like “actually our safeguards seem pretty decent and it’s not worth postponing all the medical/climate/empowerment miracles any longer.”

Robin Hanson’s point is valid and one can go far further than that, in the sense that we all ‘change the world’ every time we do anything or fail to do something, there will often be losers from our decisions, and obviously we should still be free to do most things without permission from another. Innovation is not special.

One must however be careful not to prove too much. Innovation does not mean you automatically need permission. It also does not mean you have or should have a free pass to change the world however you like. Robin and I would both draw the line to give permission to more things than America’s status quo does, and indeed I expect to oppose many regulations on mundane AI, starting with much of the EU AI Act (sorry I haven’t finished my summary there; the reason it’s not done is that it hurts my eyes to look at it and I keep not forcing myself to finish). I would still make an exception for things that would plausibly kill everyone, or otherwise plausibly have massive net negative externalities.

I’m also strongly with Yudkowsky here, not Leahy. My problem with everyone dying undemocratically is mostly the dying part, not the undemocratic one. I’d feel better I suppose in some karmic justice sense that the people ‘deserved it’ if they offered actually informed consent, but none of us would end up less dead.

Indeed, our founders knew this principle well. The Constitution is in large part designed to protect us from the majority doing various highly dumb things.

Your periodic reminder: Better start believing in science fiction stories, dear reader, you’re in one – regardless of how much additional AI progress we see.

David Manheim: “AI takeover and existential risks sound like science fiction, not real problems.”

Science fiction like AI that writes poetry and solves mathematical problems better than a significant majority of humans?

Sorry, reality exists independent of the genre you think it belongs to.

Never mind o1. All the fictional characters in most of the science fiction I’ve read or seen over the years would be blown away by at least one of GPT-4 or what you can do with a smartphone without AI, often by both. You have all the world’s knowledge at your fingertips right now. I could go on. In general, anyone who calls something ‘science fiction’ should be considered to have invoked a variation of Godwin’s Law.

Here’s some fire from Robert Miles, via AI Notkilleveryoneism Memes. Video is at the link, and yes this whole thing is unbelievably exhausting.

It absolutely boggles my mind, every time, no matter how many times I hear it. People really will say, with a straight face, that building AIs smarter and more capable than us is a default-safe activity, and that letting everyone use them for whatever they want will go fine and turn out well for the humans, unless I can show exactly how that goes wrong.

And each time, it’s like, seriously WTF everyone, sure I have a thousand detailed arguments for things likely to go wrong but why do I need to even bring them up?

Robert Miles: People are starting from a prior in which ‘[AIs] are safe until you give me an airtight case for why they’re dangerous.’

This framing is exhausting. You explain one of the 10,000 ways that AIs could be dangerous, then they explain why they don’t think that specific thing would happen. Then you have to change tack, and then they say, ‘your story keeps changing’…

“If you’re building an AGI, it’s like building a Saturn V rocket [but with every human on it]. It’s a complex, difficult engineering task, and you’re going to try and make it aligned, which means it’s going to deliver people to the moon and home again.

People ask “why assume they won’t just land on the Moon and return home safely?”

And I’m like, because you don’t know what you’re doing!

If you try to send people to the moon and you don’t know what you’re doing, your astronauts will die.

[Unlike the telephone, or electricity, where you can assume it’s probably going to work out okay] I contend that ASI is more like the moon rocket.

“The moon is small compared with the rest of the sky, so you don’t get to the moon by default – you hit some part of the sky that isn’t the moon. So, show me the plan by which you predict to specifically hit the moon.”

And then people say, “how do you predict that [AIs] will want bad things?”

There’s more bad things than good things! It’s not actually a complicated argument…

I’m not going to predict specifically where off into random space your astronauts are going, but you’re not going to hit the moon unless you have a really good, technically clear plan for how you do it. And if you ask these people for their plan, they don’t have one. What’s Yann LeCun’s plan?”

“I think that if you’re building an enormously powerful technology and you have a lot of uncertainty about what’s going to happen, this is bad. Like, this is default unsafe.

If you’ve got something that’s going to do enormously influential things in the world, and you don’t know what enormously influential things it’s going to do, this thing is unsafe until you can convince me that it’s safe.”

HOST: “That’s a good way of thinking about it – with some technologies you can assume that the default will be good or at least neutral, or that the capacity of a person to use this in a very bad way is bounded somehow. There’s just only so many people you could electrocute one by one.”

In related metaphor news, here is a thread by Eliezer Yudkowsky, use your judgment on whether you need to read it, the rest of you can skip after the opening.

Eliezer Yudkowsky: The big issue in aligning superintelligence is that, if you screw up enough, you cannot repair your mistake. The ASI will not let you repair it.

I tried calling this the “oneshot” aspect of the problem. This word, “oneshot”, proved vulnerable to misrepresentation… …by motivated misunderstanders (or maybe grinning liars) who said: “But ASI is not one-shot; we can do all sorts of experiments to make sure we understand! Oh, these poor fools who don’t understand empiricism; who think they can analyze a superintelligence by pure theory!”

To design a space probe that will actually land on Mars, without dying to some weird malfunction, is a Huge Difficult Murphy-Cursed Problem. Why?

Because you can’t fix the probe after it launches.

OK, this is the point where a lot of you can skip ahead, but I’ll copy the rest anyway to preserve it for easy reference and copying, cause it’s good.

You can run all kinds of ground experiments. They do! It’s still cursed.

Why is launching a space-probe still Murphy-cursed, with actual weird failures that destroy $327M efforts? Despite all the empirical!!! advance testing done on the ground?

Because the ground experiments can’t exactly reproduce the real outer-space environment. Something changes.

And then, once conditions have changed a little, something goes wrong.

And once a probe is high up in space, far far above you, it is too late for regrets, too late for tears; your one $327M project on which you staked your whole scientific career is dead; you cannot repair it.

We could call it, maybe, Murphy’s Curse of Unretrievability, if we were cataloguing the conditions that make an engineering project be Cursed of Murphy:

If you can’t repair a thing past a certain time, this alone will make an easy engineering project into a Very Hard Problem.

(The phrase “one-shotness” ought to have been shorter, and covered it. But if you’ve got people running around (deliberately or motivatedly-but-deniably-unconsciously) misrepresenting everything, they can do the real damage of making short sensible phrases unusable.)

Murphy’s Curse of Unretrievability is not defeated by doing earlier experiments that are not exactly like the critical context where things need to work.

This is clearly true in practice. Ask GPT-4o about the costliest failed space probes.

Still, let’s talk about the theory.

Let’s say you have no understanding of physics, and are trying to build a bridge.

Can testing, without theory, save you?

And one answer is: Possibly, if you can apply a genuinely matched testing regime that is more severe than the actual intended use.

Eg, say you have no idea of the theory of bridge-building.

So you build a bridge; and then you order a cart pulled across that bridge with a rope.

The cart is loaded down with rocks that weigh 10 times the most weight you expect the bridge to ever bear.

Will this save you?

(Figuring out how this bridge-testing method would need to work, might be for some readers a helpful exercise in alignment mindset. So I invite readers to pause, and consider this question before continuing: Absent theory, how must you verify a bridge by mere testing?)

I answer: Loading a cart down with rocks, and running it over the bridge once before use, might still lead the bridge to fail and fall later.

Maybe the mortar that you used, degrades; or wooden beams rot, or the earth below settles.

So every year again you need to use ropes to pull across that heavier cart, loaded with 10 times as much weight as the bridge ought to hold.

Conditions change over time. When you first pulled the cart across, you were not testing exactly the conditions of use 10 years later.

And how sure is your assumption about the weight the bridge will later bear?

If you wanted to really be sure, you’d have to install a weighing station, where heavy-looking carts are weighed against rocks that are one-tenth of the weight used to test the bridge…

You would write down all these assumptions.

There would be a sheet of parchment on which was written: “I believe as a loadbearing assumption: If I post up a sign saying ‘no carts over 1 ton lest the bridge fall’, nobody will drive a cart more than 2 tons over the bridge”.

The parchment should also say: “I believe as a loadbearing assumption: Though conditions change over time, if I once per year test the bridge’s ability to bear 20 tons of weight, the bridge will not, in just the next year, degrade past being able to hold 2 tons of weight.”

And once you write down an assumption like that, and stop and reflect on it, you realize that maybe the bridge’s operating instructions need to say: “Retest the bridge after a major flood or unprecedented set of storms; don’t just pick up and start using it again.”

None of this, of course, is going to save you if somebody marches troops across the bridge, and the rhythm of their marching feet is close to a resonant frequency of the bridge.

(Eg: Angers Bridge, 1850, 226 deaths.)

It takes theory to know to test something like that.

To be clear: humanity has always picked itself up and trudged on, after bridge collapses that kill 226 people, a little wiser and having a better idea of what to test next time. A bridge collapsing doesn’t wipe out the whole of humanity. So it’s not worth that much worry.

Also TBC: this notion of “Write down all your assumptions on parchment so you can reflect on what might violate them” is anachronistic for an era that knows no bridge-physics. Even the Romans had a little physics of loads, but not parchment lists of load-bearing assumptions.

Likewise today: If we look at the people building artificial superintelligence, they have no physics to tell them if a bridge will stay up.

And they have no written lists of load-bearing assumptions.

And their grand testing plan is to run smaller carts across the bridge first.

To summarize:

– “Unretrievability”, more fully “Murphy’s Curse of Unretrievability”, is what makes it Very Hard to build space probes.

– I tried calling this property “oneshotness”, but people motivatedly misinterpreted it to say, “It’s not ‘one-shot’, we can run tests first!”

– The reason why running ground tests doesn’t make space probes not be Murphy-Cursed (i.e. building space probes is still a Big Huge Deal, and even then often fails in practice) is that conditions on the ground experiments are not exactly like conditions in space.

– We can understand a bit about how hard it is to exactly match conditions, by looking at the example of what it would take to make a bridge stay up by pure testing, absent any theory.

– This would also go better with “written lists of load-bearing assumptions”.

– But even written lists still won’t save you from learning a hard lesson about resonance, the first time that soldiers march across a bridge.

– The would-be builders of gods don’t even have written lists of load-bearing assumptions and what might violate them.

John Pressman: I’m no longer allowed to signal my epistemic fairness with public likes so I would like to inform you this is a good thread.

Gallabytes: yeah it’s an interesting fact about the world that neural net training almost has a “reverse-murphy” curse. it’s definitely not *anti*fragile but it’s quite robust. in fact this is a defining characteristic of good neural net architecture choices.

John Pressman: It is, but I think the Murphy Curse he’s worried about here is more like the 2nd order effects of the continuous learning dynamics than the neural net training itself. There’s a lot of opportunity for things to go wrong once the model is in a feedback loop with its training set.

My understanding of capabilities training is that there are a lot of knobs and fiddly bits and characteristics of your data and if you screw them up then the thing doesn’t work right, but you can tinker with them until you get them right and fix the issues, and if you have the experience and intuition you can do a huge ‘YOLO run’ where you guess at all of them and have a decent chance of that part working out.

The contrast is with the alignment part, with regard to the level you need for things smarter or more capable than people (exact thresholds unclear, hard to predict, and debatable), which I believe is most definitely cursed, and where one must hit a narrow target. For mundane (or ‘prosaic’) alignment, the kludges we use now are mostly fine, but if you tried to ‘fly to the moon’ with them you would be far outside your test distribution, you were only kind of approximating even within the test, and I can assure you that you are not landing on that moon.

Roon offers wise words (in an unrelated thread), I fully endorse:

Roon: A true accelerationist feels their heart beat faster when they stare into the fog of war. The stomach lurch from the vertigo of science fiction. The courage of someone changing their whole life knowing it could go sideways. Anyone else is a larping idiot.

People who are wearing blindfolds accelerating into walls dissociated from the real world in an amphetamine haze with nothing precious to gain or lose are shuffling zombies that have given up their soul to the great replicator.

There is no guarantee of victory. no hands left unbloodied. only the lightcone of possibilities.

An example of what this failure mode looks like, in a response to Roon:

Anton: “Faster, Faster, until the thrill of speed overcomes the fear of death.”

There is a school of thought that anything opposed to them is 1984-level totalitarian.

Marc Andreessen, and to a lesser extent Paul Graham, provide us with fully clean examples this week of how this rhetorical world works and what it means when they say words. So I wanted to note them for future reference, so I don’t have to keep doing this over and over going forward, at least with Marc in particular.

Paul Graham: Degrowthers should lead by example. Don’t tell us how you think we should live. Live that way yourselves, and show us how much better it is.

Eliezer Yudkowsky: Degrowth sucks, but ‘Unilaterally stop engaging in this widespread act that you say has selfish benefits but larger negative externalities” is not a valid gotcha. “There should be a law to make us all cooperate in the prisoners dilemma, but I won’t while you won’t” is valid.

Marc Andreessen: The totalitarian mindset. No personal choice, just top down control, of everything, forever.

Did someone point out that ‘you first’ is not a valid argument or gotcha against requiring a personal sacrifice that some claim would do good? While also opposing (the obligatory and also deeply true ‘degrowth sucks’) the sacrifice by pointing out it is stupid and would not, in fact, do good?

Well, that must mean the people pointing that out are totalitarians, favoring no personal choice, just top down control, of everything, forever.

So the next time people of that ilk call someone a totalitarian, or say that someone opposes personal choice, or otherwise haul out their slogans, remember what they mean when they say this.

The charitable interpretation is that they define ‘totalitarian’ as one who does not, in principle and in every case, oppose the idea of requiring people to do things not in that person’s self-interest.

Here is another example from this week of the same Enemies List attitude: attributing to one’s opponents absurd positions that have nothing to do with their actual arguments, as a way to lash out at anyone with a different position or who cares about the quality of arguments:

Marc Andreessen: There will be no curiosity, no enjoyment. But always there will be the intoxication of power. Always there will be the sensation of trampling on an enemy who is helpless. If you want a picture of the future, imagine Helen Toner’s boot stamping on a human face— forever.

I don’t think he actually believes the things he is saying. I don’t know if that’s worse.

As a bonus, here’s who Martin Casado is and what he cares about.

Martin Casado: Had to mass unmute/unblock a bunch of EA folks just to enjoy their lamentations for a day.

Roon: Seems in really poor taste.

Martin Casado: You’re a bigger person than I am Roon. After months of largely baseless vilification and hit pieces I’m enjoying the moment. Don’t you worry, I’ll block them all again in a day or so.

A Narrow Path is a newly written plan for allowing humanity to survive the path to superintelligence. Like the plan or hate the plan, this at least is indeed a plan, that tries to lay out a path that might work.

The core thesis is, if we build superintelligence before we are ready then we die. So make sure no one builds it until then.

I do agree that this much is clear: Until such time as we figure out how to handle superintelligence in multiple senses, building superintelligence would probably be collective suicide. We are a very long way from figuring out how to handle it.

Andrea Miotti: We chart a path of three Phases:

0. Safety: Build up our defenses to restrict the development of superintelligent AI.

1. Stability: Build a stable international AI governance framework.

2. Flourishing: With a stable system and humanity secure, build transformative AI technology.

We propose a normative guiding principle:

No superintelligence.

Most AI is a beneficial tool for human growth. Superintelligence is a successor species. We should understand the latter as a hyper-capable adversary to contain, and build our defenses against.

If we build things that are plausibly AGIs, that directly creates a lot of mundane issues we can deal with and not-so-mundane intermediate issues that would be difficult to deal with. If that’s all it did, which is how many think about it, then you gotta do it.

The problem is: What we definitely cannot deal with is that once we build AGI, the world would rapidly build ASI, one way or another.

That’s what they realize we need to avoid doing for a while. You backchain from there.

Here is their plan. At whatever level of detail you prefer to focus on: Do you think it is sufficient? Do you think it is necessary? Can you think of a superior alternative that would do the job?

To achieve safety, we identify certain conditions to be met:

1. No AIs improving AIs

2. No AIs capable of breaking out of their environment

3. No unbounded AIs

4. Limit the general intelligence of AI systems so that they cannot reach superhuman level at general tasks

1. No AIs improving AIs

Any restriction on the general intelligence of AIs will be broken if machines can improve themselves or other machines at machine speed. We draw a principled line to focus only on dangerous illegible AIs, while leaving untouched human written software.

This is a tough ask even if you want it. It’s a gray area – are Claude and o1 AIs that can improve AIs? Not in the automated super scary way, but a software engineer with current AI is a lot more productive than one without it. What do you do when humans use AI code assistant tools? When they copy-paste AI code outputs? At what point does more of that change to something different? Can you actually stop it? How?

Similarly, they say ‘no AIs capable of breaking out of their environment’ but for a sufficiently unprotected environment, current AIs already are on the verge of being able to do this. And many will ‘set them free’ on purpose anyway.

Similarly, when interacting with the world and being given tools, what AI can we be confident will stay ‘bounded’? They suggest this can happen with safety justifications. It’s going to be tough.

Finally there is a limit to the ‘general intelligence’ of systems, which again you would need to somehow define, measure and enforce.

This is a long, dense document (~80 pages). Even if we did get society and government to buy in, there are tons of practical obstacles ahead on many levels. We’re talking about provisions that are very difficult to pin down, define, or enforce. This is very far from a model bill. Everything here would need a lot of iteration, vetting, and debate, and there are various details I suspect are laid out poorly. And then you’d need to deal with the game theory and international aspects of the issue.

But it is a great exercise – instead of asking ‘what can we in practice hope to get done right now?’ they ask ‘where do we need to go?’ and then ‘what would it take to actually get there?’

You can of course disagree with how they answer those questions. But they are the right questions to ask. Then, if the answer comes back ‘anything that might work to get us to a place we can afford to go is going to be highly not fun,’ as it well might, how highly not fun? Do you care more about it not working or about things being not fun? Is there an alternative path or destination to consider?

No doubt many who read (realistically: glance at or lightly skim or feed into an LLM) this proposal will respond along the lines of ‘look at these horrible people who want to restrict X or require a license for Y’ or ‘they want a global government’ or ‘war on math’ or what not. And then treat that as that.

It would be good to resist that response.

Instead, treat this not as a call to implement anything. Rather treat this as a claim that if we did XYZ, then that is plausibly sufficient, and that no one has a less onerous plan that is plausibly sufficient. And until we can find a better approach, we should ask what might cut us off from being able to implement that plan, versus what would enable us to choose if necessary to walk that path, if events show it is needed.

One should respond on that level. Debate the logic of the path.

Either argue it is insufficient, or it is unnecessary, or that it flat out won’t work or is not well defined, or suggest improvements, point out differing assumptions and cruxes, including doing that conditional on various possible world features.

This can include ‘we won’t get ASI anyway’ or ‘here are less painful measures that are plausibly sufficient’ or ‘here’s why there was never a problem in the first place, creating things smarter than ourselves is going to go great by default.’ And they include many good objections about detail choices, implementations, definitions, and so on, which I haven’t dug into in depth. There are a lot of assumptions and choices here that one can and should question, and requirements that can be emphasized.

Ultimately, if your actual point of view is something like ‘I believe that building ASI would almost certainly go fine [because of reasons]’ then you can say that and stop there. Or you can say ‘I believe building ASI now is fine, but let’s presume that we’ve decided for whatever reason that this is wrong’ and then argue about what alternative paths might prevent ASI from being built soon.

The key is you must pick one of these:

  1. Building a superintelligence under current conditions will turn out fine.

  2. No one will build a superintelligence under anything like current conditions.

  3. We must prevent at almost all costs anyone building superintelligence soon.

Thus, be clear which of these you are endorsing. If it’s #1, fine. If it’s #2, fine.

If you think it’s too early to know if #1 or #2 is true, then you want to keep your options open.

If you know you won’t be able to bite either of those first two bullets? Then it’s time to figure out the path to victory, and talk methods and price. And we should do what is necessary now to gather more information, and ensure we have the option to walk down such paths.

That is very different from saying ‘we should write this agenda into law right now.’ Everyone involved understands that this would be overdeterminedly premature.

A fun feature of interpretability is that the model needs to be smart enough that you can understand it, but not so smart that you stop understanding it again.

Or, similarly, that the model needs to be smart enough to be able to get a useful answer out of a human-style Chain of Thought, without being smart enough to no longer get a useful answer out of a human-style Chain of Thought. And definitely without it being smart enough that it’s better off figuring out the answer and then backfilling in a Chain of Thought to satisfy the humans giving the feedback, a classic alignment failure mode.

Wojciech Zaremba: o1 paradigm of solving problems with a chain of thought offers new avenues to safety/alignment research. It’s easier to ensure such AI behaves as expected because we can see its thoughts. I am feeling pumped.

Davidad: Remember folks, the more capable the base model (beyond about 13B-34B), the less the “reasoning trace” serves as an effective interpretability tool for the true causes of the final answer. UNLESS the final answer is produced only via running formal methods on the reasoning…

Roon: you cannot conclude that native reasoning tokens exhibit similar behaviors to prompted CoTs.

Davidad: yeah you’re right. Q* is even worse: it doesn’t just hallucinate a plausible answer and then back-rationalize it, it specifically hallucinates a *successful* answer and then back-rationalizes it.

Roon: you’re changing the subject. your cot faithfulness scaling law may not hold.

Davidad: Look, here’s an a priori argument. If an architecture has enough model capacity to distill the CoT search process into its layers, it will achieve higher reward for its final answers by separating its reasoning process from the noise introduced by sampling during the CoT rollout.

There may be ways to avoid this, by doing *not just* process supervision but also some kind of per-step entropic regularization.

But if you just say “oh, it’s native reasoning, it’s different,” I think you just mean process supervision, and I think you haven’t understood the problem, and I think your model may still do its true reasoning purely in latent space—in parallel with CoT steps to satisfy the PRM.

If the agent is interacting with a process supervisor that is truly impossible to fool, like a proof assistant, this may be okay, because there’s no corner of strategy space which gets highly rewarded for confident wrong answers.

But even then, you still shouldn’t expect the formal proof to mirror the actual underlying reasoning. Mathematicians often reason almost entirely through geometric intuition while writing down an entirely algebraic proof, even if their motivation is to write down correct algebra.

I think Davidad is correct here.
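
For readers who want the distinction pinned down, here is a toy sketch, under invented assumptions, of outcome-only reward versus per-step process reward on a task whose intermediate steps are trivially checkable (arithmetic claims of the form ‘a+b=c’). It is not how any real process reward model or o1-style training works; it just shows why a step verifier constrains the written reasoning trace in a way an outcome check alone does not, and why a backfilled chain of thought can still collect full outcome reward.

```python
# Toy sketch: outcome-only reward vs. per-step process reward, assuming a
# task whose intermediate steps are trivially checkable arithmetic claims
# like "2+3=5". Purely illustrative; not any real lab's training setup.
import re


def verify_step(step: str) -> bool:
    """Check a single claimed step of the form 'a+b=c'."""
    match = re.fullmatch(r"\s*(-?\d+)\s*\+\s*(-?\d+)\s*=\s*(-?\d+)\s*", step)
    if not match:
        return False
    a, b, c = map(int, match.groups())
    return a + b == c


def outcome_reward(final_answer: int, target: int) -> float:
    """Reward that only looks at the final answer."""
    return 1.0 if final_answer == target else 0.0


def process_reward(steps: list) -> float:
    """Reward that pays per verified step, regardless of the final answer."""
    if not steps:
        return 0.0
    return sum(verify_step(s) for s in steps) / len(steps)


if __name__ == "__main__":
    # A trace whose steps are nonsense but whose final answer happens to be
    # right gets full outcome reward, while the process reward flags it.
    backfilled_trace = ["2+2=5", "5+1=7"]
    print(outcome_reward(final_answer=7, target=7))  # 1.0
    print(process_reward(backfilled_trace))          # 0.0

    faithful_trace = ["2+2=4", "4+3=7"]
    print(outcome_reward(final_answer=7, target=7))  # 1.0
    print(process_reward(faithful_trace))            # 1.0
```

Davidad’s caveat still applies: even a step verifier that cannot be fooled only certifies the written trace, not whatever reasoning actually produced it.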

Always remember to reverse any advice you hear, including the advice to reverse any advice you hear:

Sam Altman: You should have a very high bar for doing anything but thinking about what to work on… Of course that doesn’t work all of the time. At some point, you actually have to go execute. But I often see people who I think are really talented, work super hard, and are super productive, not spend any time thinking about what they’re going to work on.

If I have a very hard problem or I’m a little confused about something, I still have not found anything better to do than sit down and make myself write it out… It is a super powerful thinking tool.

My experience is that almost no one gets this correct. People usually do one of:

  1. Do something reasonably random without thinking about what to do.

  2. Do a ton of thinking about what to do, stay paralyzed and do nothing.

  3. Do a ton of thinking about what to do, realize they are paralyzed or out of time and money, and end up doing something reasonably random instead.

  4. Do a ton of thinking, buy some abstract argument, and do something unwise.

  5. Overthink it, and do something unwise.

The good news is, even your system-1 instinctive guess on whether you are doing too little or too much thinking versus doing is almost certainly correct.

And yes, I do hope everyone sees the irony of Altman telling everyone to take more time to think harder about whether they’re working on the right project.

The perfect subtweet doesn’t exist.

Sasha de Marigny (Head of Anthropic Comms): Happy to report that Anthropic’s co-founders all still merrily work at the company. None have been lost to Middle Age plagues or jacuzzis.

(See the section The Mask Comes Off and this in particular if you don’t have the proper context.)


“gone-phishing”—every-cyberattacker’s-favorite-phrase

“Gone Phishing”—Every Cyberattacker’s Favorite Phrase

Phishing—how old hat is that as a topic? Isn’t it solved for most of us by now? Can’t we speak about AI instead? That may be your response when you hear a security analyst talk about phishing and phishing prevention, but those assumptions couldn’t be further from the truth. Phishing continues to be one of the primary threat vectors any organization needs to protect itself from.

How Phishing Has Evolved

Phishing, sadly, remains a persistent threat, continually evolving and targeting more users across a broader array of channels. It is no longer relegated to email messages with suspect spelling and grammar. Instead, phishing targets anywhere a user communicates: email, collaboration platforms, messaging apps, code repositories, and mobile devices. It is also increasingly well-crafted, making malicious communication more difficult than ever to identify. Its more sophisticated messaging is not always focused on stealing credentials or deploying malicious software; instead it seeks to trick users into unknowingly carrying out malicious activity themselves.

This is where AI plays its part. AI is at the forefront of modern attacks, increasing the efficacy of phishing campaigns by enabling criminals to study a target’s online habits and craft more convincing lures. Modern attacks can mimic the usual communication patterns of organizations and users, and the language used in those communications, and attackers are applying this ability to great effect across new channels such as messaging apps, SMS, and even audio and video.

Packing the Defense

Many organizations have, of course, invested in anti-phishing tools, and have done so for a prolonged period. However, with an attack methodology that evolves so quickly, organizations must keep evaluating their defenses. This does not mean they must rip out what they currently have, but it certainly means they should check that existing tools remain effective and look at how to address any gaps they discover.

What should you consider when evaluating your current approaches?

  • Understand the attack surface: If your phishing protection is only focused on email, how are you protecting your users from other threats? Can you protect users from phishing attempts in Teams or Slack? When they access third-party sites and SaaS apps? When they are accessing code in code repositories? When they scan a QR code on their mobile? All of these are potential attack vectors. Are you covered?
  • AI defense: AI is rapidly accelerating the efficacy of phishing-based attacks. Its ability to build effective and hard-to-identify phishing attacks at scale presents a serious threat to traditional methods of spotting attacks. The most effective tool to reduce this threat is defensive AI. Understand how your tools are currently protecting your business from AI-based attacks and decide if the methods are effective.
  • Multilayered protection: Phishing attacks are broad, so defenses must be equally broad and layered. Modern tools should be able to stop basic attacks while minimizing false positives, which disrupt workflows and user efficiency. Solutions must ensure that phishing detection is accurate, but should also properly evaluate threats they don’t yet recognize, using techniques like link protection and sandboxing.
  • User education in phishing prevention: User education is a key component of phishing prevention. Organizations must determine the type of education that best serves their needs, whether it’s formal awareness training, phishing education exercises, or subtle “nudge” training to improve usage habits. Are your current tools as effective as you need them to be?
  • Catch you later: Increasingly, phishing threats are retrospectively activated. They are not triggered or malicious on delivery but are weaponized later in attempts to evade security tools. Ensure your solutions can address this and can remove threats from communications channels when they become weaponized after delivery (a rough sketch of this kind of retrospective re-scan follows this list).
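
As a rough illustration of that last point, here is a minimal, hypothetical sketch of retrospective link re-scanning: links extracted from already-delivered messages are periodically re-checked against a reputation source, and a message is quarantined if one of its links turns malicious after delivery. All names and data shapes here (check_reputation, Message, and so on) are invented; real products wire this into their own message stores and threat-intelligence feeds.

```python
# Hypothetical sketch of retrospective phishing protection: re-check links in
# messages that were already delivered, and quarantine any message whose links
# have since been flagged as malicious. All names and data shapes are invented.
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str
    links: list
    quarantined: bool = False


def check_reputation(url: str, blocklist: set) -> bool:
    """Stand-in for a threat-intelligence lookup: True if the URL is now known-bad."""
    return url in blocklist


def rescan_delivered_messages(messages: list, blocklist: set) -> list:
    """Quarantine delivered messages whose links have turned malicious; return their IDs."""
    newly_quarantined = []
    for message in messages:
        if message.quarantined:
            continue
        if any(check_reputation(url, blocklist) for url in message.links):
            message.quarantined = True
            newly_quarantined.append(message.message_id)
    return newly_quarantined


if __name__ == "__main__":
    inbox = [
        Message("msg-1", ["https://example.com/invoice"]),
        Message("msg-2", ["https://weaponized.example/login"]),
    ]
    # The second link was clean at delivery time but has since been flagged.
    todays_blocklist = {"https://weaponized.example/login"}
    print(rescan_delivered_messages(inbox, todays_blocklist))  # ['msg-2']
```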

Don’t Let Them Phish in Your Lake

Phishing remains the most likely attack vector for cybercriminals. The impact of a successful phishing attempt can be significant: lost business, reputational damage, financial loss, and potential legal action.

Phishing is not a static threat; it continues to evolve rapidly. Organizations must keep evaluating their phishing protection stance to ensure it remains effective against new and evolving threats.

Fortunately, cybersecurity vendors continue to evolve too. So, ensure you continue to monitor your defenses and don’t let a cyberattacker catch you hook, line, and sinker.

Next Steps

To learn more, take a look at GigaOm’s anti-phishing Key Criteria and Radar reports. These reports provide a comprehensive overview of the market, outline the criteria you’ll want to consider in a purchase decision, and evaluate how a number of vendors perform against those decision criteria.

If you’re not yet a GigaOm subscriber, sign up here.
