Author name: Kris Guyer


Rocket Report: SpaceX to make its own propellant; China’s largest launch pad


United Launch Alliance begins stacking its third Vulcan rocket for the second time.

Visitors walk by models of a Long March 10 rocket, lunar lander, and crew spacecraft during an exhibition on February 24, 2023 in Beijing, China. Credit: Hou Yu/China News Service/VCG via Getty Images

Welcome to Edition 8.02 of the Rocket Report! It’s worth taking a moment to recognize an important anniversary in the history of human spaceflight next week. Fifty years ago, on July 15, 1975, NASA launched a three-man crew on an Apollo spacecraft from Florida, and two Soviet cosmonauts took off from Kazakhstan, on course to link up in low-Earth orbit two days later. This was the first joint US-Soviet human spaceflight mission, laying the foundation for a strained but enduring partnership with Russia on the International Space Station. Operations on the ISS are due to wind down in 2030, and the two nations have no serious prospects to continue any partnership in space after decommissioning the station.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Sizing up Europe’s launch challengers. The European Space Agency has selected five launch startups to become eligible for up to 169 million euros ($198 million) in funding to develop alternatives to Arianespace, the continent’s incumbent launch service provider, Ars reports. The five small launch companies ESA selected are Isar Aerospace, MaiaSpace, Rocket Factory Augsburg, PLD Space, and Orbex. Only one of these companies, Isar Aerospace, has attempted to launch a rocket into orbit. Isar’s Spectrum rocket failed moments after liftoff from Norway on a test flight in March. None of these companies is guaranteed an ESA contract or funding. Over the next several months, ESA and the five launch companies will negotiate with European governments for funding leading up to ESA’s ministerial council meeting in November, when ESA member states will set the agency’s budget for at least the next two years. Only then will ESA be ready to sign binding agreements.

Let’s rank ’em … Ars Technica’s space reporters ranked the five selectees for the European Launcher Challenge in order from most likely to least likely to reach orbit. We put Munich-based Isar Aerospace, the most well-funded of the group, at the top of the list after it attempted its first orbital launch earlier this year. Paris-based MaiaSpace, backed by ArianeGroup, comes in second, with plans for a partially reusable rocket. Rocket Factory Augsburg, another German company, is in third place after getting close to a launch attempt last year before its first rocket blew up on a test stand. Spanish startup PLD Space is fourth, and Britain’s Orbex rounds out the list. (submitted by EllPeaTea)


Japan’s Interstellar Technologies rakes in more cash. Interstellar Technologies raised 8.9 billion yen ($61.8 million) to boost development of its Zero rocket and research and development of satellite systems, Space News reports. The money comes from Japanese financial institutions, venture capital funds, and debt financing. Interstellar previously received funding through agreements with the Japanese government and Toyota, which Interstellar says will add expertise to scale manufacturing of the Zero rocket for “high-frequency, cost-effective launches.” The methane-fueled Zero rocket is designed to deploy a payload of up to 1 metric ton (2,200 pounds) into low-Earth orbit. The unfortunate news from Interstellar’s fundraising announcement is that the company has pushed back the debut flight of the Zero rocket until 2027.

Straight up … Interstellar has aspirations beyond launch vehicles. The company is also developing a satellite communications business, and some of the money raised in the latest investment round will go toward this segment of the company. Interstellar is open about comparing its ambition to that of SpaceX. “On the satellite side, Interstellar is developing communications satellites that benefit from the company’s own launch capabilities,” the company said in a statement. “Backed by Japan’s Ministry of Internal Affairs and Communications and JAXA’s Space Strategy Fund, the company is building a vertically integrated model, similar to SpaceX’s approach with Starlink.”

Korean startup completes second-stage qual testing. South Korean launch services company Innospace says it has taken another step toward the inaugural launch of its Hanbit-Nano rocket by the year’s end with the qualification of the second stage, Aviation Week & Space Technology reports. The second stage uses an in-house-developed 34-kilonewton (7,643-pound-thrust) liquid methane engine. Innospace says the engine achieved a combustion time of 300 seconds, maintaining stability of the fuel and oxidizer supply system, structural integrity, and the launch vehicle integrated control system.

A true micro-launcher … Innospace’s rocket is modest in size and capacity, even among its peers in the small launch market. The Hanbit-Nano rocket is designed to launch approximately 200 pounds (90 kilograms) of payload into Sun-synchronous orbit. “With the success of this second stage engine certification test, we have completed the development of the upper stage of the Hanbit-Nano launch vehicle,” said Kim Soo-jong, CEO of Innospace. “This is a very symbolic and meaningful technological achievement that demonstrates the technological prowess and test operation capabilities that Innospace has accumulated over a long period of time, while also showing that we have entered the final stage for commercial launch. Currently, all executives and staff are doing their best to successfully complete the first stage certification test, which is the final gateway for launch, and we will make every effort to prepare for a smooth commercial launch in the second half of the year.”

Two companies forge unlikely alliance in Dubai. Two German entrepreneurs have joined forces with a team of Russian expats steeped in space history to design a rocket using computational AI models, Payload reports. The “strategic partnership” is between LEAP 71, an AI-enabled design startup, and Aspire Space, a company founded by the son of a Soviet engineer who was in charge of launching Zenit rockets from the Baikonur Cosmodrome in Kazakhstan in the 1980s. The companies will base their operations in Dubai. The unlikely pairing aims to develop a new large reusable launch vehicle capable of delivering up to 15 metric tons to low-Earth orbit. Aspire Space is a particularly interesting company if you’re a space history enthusiast. Apart from the connections of Aspire’s founder to Soviet space history, Aspire’s chief technology officer, Sergey Sopov, started his career at Baikonur working on the Energia heavy-lift rocket and Buran space shuttle, before becoming an executive at Sea Launch later in his career.

Trust the computer … It’s easy to be skeptical about this project, but it has attracted an interesting group of people. LEAP 71 has just two employees—its two German co-founders—but boasts lofty ambitions and calls itself a “pioneer in AI-driven engineering.” As part of the agreement with Aspire Space, LEAP 71 will use a proprietary software program called Noyron to design the entire propulsion stack for Aspire’s rockets. The company says its AI-enabled design approach for Aspire’s 450,000-pound-thrust engine will cut in half the time it took other rocket companies to begin test-firing a new engine of similar size. Aspire’s founder, Rudenko, forecasts that Aspire’s entire project, including a launcher, reusable spacecraft, and ground infrastructure to support it all, will cost more than $1 billion. So far, the project is self-funded, Rudenko told Payload. (submitted by Lin Kayser)

Russia launches ISS resupply freighter. A Russian Progress supply ship launched July 3 from the Baikonur Cosmodrome in Kazakhstan atop a Soyuz-2.1a rocket, NASASpaceflight reports. Packed with 5,787 pounds (2,625 kilograms) of cargo and fuel, the Progress MS-31 spacecraft glided to an automated docking at the International Space Station two days later. The Russian cosmonauts living aboard the ISS will unpack the supplies carried inside the Progress craft’s pressurized compartment. This was the eighth orbital launch of the year by a Russian rocket, continuing a downward trend in launch activity for the Russian space program in recent years.

Celebrating a golden anniversary … The Soyuz rocket that launched Progress MS-31 was painted in an unusual blue and white scheme, as it was originally intended for a commercial launch that was likely canceled after Russia’s invasion of Ukraine. It also sported a logo commemorating the 50th anniversary of the Apollo-Soyuz mission in July 1975.

Chinese rocket moves closer to first launch. Chinese commercial launch firm Orienspace is aiming for a late 2025 debut of its Gravity-2 rocket following a recent first-stage engine hot fire test, Space News reports. The “three-in-one” hot fire test verified the performance of the Gravity-2 rocket’s first stage engine, servo mechanisms, and valves that regulate the flow of propellants into the engine, according to a press release from Orienspace. The Gravity-2 rocket’s recoverable and reusable first stage will be powered by nine of these kerosene-fueled engines. The recent hot fire test “lays a solid foundation” for future tests leading up to the Gravity-2’s inaugural flight.

Extra medium … Orienspace’s first rocket, the solid-fueled Gravity-1, completed its first successful flight last year to place multiple small satellites into orbit. Gravity-2 is a much larger vehicle, standing 230 feet (70 meters) tall, the same height as SpaceX’s Falcon 9 rocket. Orienspace’s new rocket will fly in a core-only configuration or with the assistance of two solid rocket boosters. An infographic released by Orienspace in conjunction with the recent engine hot fire test indicates the Gravity-2 rocket will be capable of hauling up to 21.5 metric tons (47,400 pounds) of cargo into low-Earth orbit, placing its performance near the upper limit of medium-lift launchers.

Senator calls out Texas for trying to steal space shuttle. A political effort to remove space shuttle Discovery from the Smithsonian and place it on display in Texas encountered some pushback on Thursday, as a US senator questioned the expense of carrying out what he described as a theft, Ars reports. “This is not a transfer. It’s a heist,” said Sen. Dick Durbin (D-Ill.) during a budget markup hearing before the Senate Appropriations Committee. “A heist by Texas because they lost a competition 12 years ago.” In April, Republican Sens. John Cornyn and Ted Cruz, both representing Texas, introduced the “Bring the Space Shuttle Home Act” that called for Discovery to be relocated from the National Air and Space Museum’s Steven F. Udvar-Hazy Center in northern Virginia and displayed at Space Center Houston. They then inserted an $85 million provision for the shuttle relocation into the Senate version of the “One Big Beautiful Bill,” which, to comply with Senate rules, was more vaguely worded but was meant to achieve the same goal. That bill was enacted on July 4, when President Donald Trump signed it into law.

Dollar signs … As ridiculous as it is to imagine spending $85 million on moving a space shuttle from one museum to another, it’ll actually cost a lot more to do it safely. Citing research by NASA and the Smithsonian, Durbin said that the total was closer to $305 million, a figure that does not include the estimated $178 million needed to build a facility to house and display Discovery once in Houston. Furthermore, it is unclear whether Congress even has the right to remove an artifact, let alone a space shuttle, from the Smithsonian’s collection. The Washington, DC, institution, which serves as a trust instrumentality of the US, maintains that it owns Discovery. The paperwork signed by NASA in 2012 transferred “all rights, interest, title, and ownership” for the spacecraft to the Smithsonian. “This will be the first time ever in the history of the Smithsonian someone has taken one of their displays and forcibly taken possession of it. What are we doing here? They don’t have the right in Texas to claim this,” said Durbin.

Starbase keeps getting bigger. Cameron County, Texas, has given SpaceX the green light to build an air separator facility, which will be located less than 300 feet from the region’s sand dunes, frustrating locals concerned about the impact on vegetation and wildlife, the Texas Tribune reports. The commissioners voted 3–1 to give Elon Musk’s rocket company a beachfront construction certificate and dune protection permit, allowing the company to build a facility to produce gases needed for Starship launches. The factory will separate air into nitrogen and oxygen. SpaceX uses liquid oxygen as a propellant and liquid nitrogen for testing and operations.

Saving the roads … By having the facility on site, SpaceX hopes to make the delivery of those gases more efficient by eliminating the need to have dozens of trucks deliver them from Brownsville. The company needs more than 200 truckloads of liquid nitrogen and oxygen delivered for each launch, a SpaceX engineer told the county during a meeting last week. With its application, SpaceX submitted a plan to mitigate expected negative effects on 865 square feet of dune vegetation and 20 cubic yards of dunes, as well as compensate for expected permanent impacts to 7,735 square feet of dune vegetation and 465 cubic yards of dunes. While the project will be built on property owned by SpaceX, the county holds the authority to manage construction that affects Boca Chica’s dunes.

ULA is stacking its third Vulcan rocket. A little more than a week after its most recent Atlas V rocket launch, United Launch Alliance rolled a Vulcan booster to the Vertical Integration Facility at Cape Canaveral Space Force Station in Florida on July 2 to begin stacking its first post-certification Vulcan rocket, Spaceflight Now reports. The operation, referred to by ULA as Launch Vehicle on Stand (LVOS), is the first major milestone toward the launch of the third Vulcan rocket. The upcoming launch will be the first operational flight of ULA’s new rocket with a pair of US military payloads, following two certification flights in 2024.

For the second time … This is the second time that this particular Vulcan booster was brought to Space Launch Complex 41 in anticipation of a launch campaign. It was previously readied in late October of last year in support of the USSF-106 mission, the Space Force’s designation for the first national security launch to use the Vulcan rocket. However, plans changed as the process of certifying Vulcan to fly government payloads took longer than expected, and ULA pivoted to launch two Atlas V rockets on commercial missions from the same pad before switching back to Vulcan launch preps.

Progress report on China’s Moon rocket. China’s self-imposed deadline of landing astronauts on the Moon by 2030 is now just five years away, and we’re starting to see some tangible progress. Construction of the launch pad for the Long March 10 rocket, the massive vehicle China will use to launch its first crews toward the Moon, is well along at the Wenchang Space Launch Site on Hainan Island. An image shared on the Chinese social media platform Weibo, and then reposted on X, shows the Long March 10’s launch tower near its final height. A mobile launch platform presumably for the Long March 10 is under construction nearby.

Super heavy … The Long March 10 will be China’s most powerful rocket to date, with the ability to dispatch 27 metric tons of payload toward the Moon, a number comparable to NASA’s Space Launch System. Designed for partial reusability, the Long March 10 will use an all-liquid propulsion system and stand more than 92 meters (300 feet) tall. The rocket will launch Chinese astronauts inside the nation’s next-generation Mengzhou crew capsule, along with a lunar lander to transport crew members from lunar orbit to the surface of the Moon using an architecture similar to NASA’s Apollo program.

Next three launches

July 11: Electron | JAKE 4 | Wallops Flight Facility, Virginia | 23:45 UTC

July 13: Falcon 9 | Dror 1 | Cape Canaveral Space Force Station, Florida | 04:31 UTC

July 14: Falcon 9 | Starlink 15-2 | Vandenberg Space Force Base, California | 02:27 UTC


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



It’s hunting season in orbit as Russia’s killer satellites mystify skywatchers


“Once more, we play our dangerous game—a game of chess—against our old adversary.”

In this pool photograph distributed by the Russian state media agency Sputnik, Russia’s President Vladimir Putin gives a speech during the Victory Day military parade at Red Square in central Moscow on May 9, 2025. Credit: Yacheslav Prokofyev/Pool/AFP via Getty Images

Russia is a waning space power, but President Vladimir Putin has made sure he still has a saber to rattle in orbit.

This has become more evident in recent weeks, when we saw a pair of rocket launches carrying top-secret military payloads, the release of a mysterious object from a Russian mothership in orbit, and a sequence of complex formation-flying maneuvers with a trio of satellites nearly 400 miles up.

In isolation, each of these things would catch the attention of Western analysts. Taken together, the frenzy of maneuvers represents one of the most significant surges in Russian military space activity since the end of the Cold War. What’s more, all of this is happening as Russia lags further behind the United States and China in everything from rockets to satellite manufacturing. Russian efforts to develop a reusable rocket, field a new human-rated spacecraft to replace the venerable Soyuz, and launch a megaconstellation akin to SpaceX’s Starlink are going nowhere fast.

Russia has completed just eight launches to orbit so far this year, compared to 101 orbital attempts by US launch providers and 36 from China. This puts Russia on pace for its fewest orbital launch attempts since 1961, the year Soviet citizen Yuri Gagarin became the first person to fly in space.

For the better part of three decades, Russia’s space program could rely on money from Western governments and commercial companies to build rockets, launch satellites, and ferry astronauts to and from the International Space Station. The money tap dried up after Russia’s invasion of Ukraine. Russia also lost access to Ukrainian-made components to go into their launch vehicles and satellites.

Chasing a Keyhole

Amid this retrenchment, Russia is targeting what’s left of its capacity for innovation in space toward pestering the US military. US intelligence officials last year said they believed Russia was pursuing a project to place a nuclear weapon in space. The detonation of a nuclear bomb in orbit could muck up the space environment for years, indiscriminately disabling countless satellites, whether they’re military or civilian.

Russia denied that it planned to launch a satellite with a nuclear weapon, but the country’s representative in the United Nations vetoed a Security Council resolution last year that would have reaffirmed a nearly 50-year-old ban on placing weapons of mass destruction into orbit.

While Russia hasn’t actually put a nuclear bomb into orbit yet, it’s making progress in fielding other kinds of anti-satellite systems. Russia destroyed one of its own satellites with a ground-launched missile in 2021, and high above us today, Russian spacecraft are stalking American spy satellites and keeping US military officials on their toes with a rapid march toward weaponizing space.

The world’s two other space powers, the United States and China, are developing their own “counter-space” weapons. But the US and Chinese militaries have largely focused on using their growing fleets of satellites as force multipliers in the terrestrial domain, enabling precision strikes, high-speed communications, and targeting for air, land, and naval forces. That is starting to change, with US Space Force commanders now openly discussing their own ambitions for offensive and defensive counter-space weapons.

Three of Russia’s eight orbital launches this year have carried payloads that could be categorized as potential anti-satellite weapons, or at least prototypes testing novel technologies that could lead to one. (For context, three of Russia’s other launches this year have gone to the International Space Station, and two launched conventional military communications or navigation satellites.)

One of these mystery payloads launched on May 23, when a Soyuz rocket boosted a satellite into a nearly 300-mile-high orbit perfectly aligned with the path of a US spy satellite owned by the National Reconnaissance Office. The new Russian satellite, designated Kosmos 2588, launched into the same orbital plane as an American satellite known to the public as USA 338, which is widely believed to be a bus-sized KH-11, or Keyhole-class, optical surveillance satellite.

A conceptual drawing of a KH-11 spy satellite, with internal views, based on likely design similarities to NASA’s Hubble Space Telescope. Credit: Giuseppe De Chiara/CC BY-SA 3.0

The governments of Russia and the United States use the Kosmos and USA monikers as cover names for their military satellites.

While their exact design and capabilities are classified, Keyhole satellites are believed to provide the sharpest images of any spy satellite in orbit. They monitor airfields, naval ports, missile plants, and other strategic sites across the globe. In the zeitgeist of geopolitics, China, Russia, Iran, and North Korea are the likeliest targets for the NRO’s Keyhole satellites. To put it succinctly, Keyhole satellites are some of the US government’s most prized assets in space.

Therefore, it’s not surprising that a potential military adversary might want to learn more about them or be in a position to disable or destroy them in the event of war.

Orbital ballet

A quick refresher on orbital mechanics is necessary here. Satellites orbit the Earth in flat planes fixed in inertial space. It’s not a perfect interpretation, but it’s easiest to understand this concept by imagining the background of stars in the sky as a reference map. In the short term, the position of a satellite’s orbit will remain unchanged on this reference map without any perturbation. For something in low-Earth orbit, Earth’s rotation presents a different part of the world to the satellite each time it loops around the planet.

It takes a lot of fuel to make changes to a satellite’s orbital plane, so if you want to send a satellite to rendezvous with another spacecraft already in orbit, it’s best to wait until our planet’s rotation brings the launch site directly under the orbital plane of the target. This happens twice per day for a satellite in low-Earth orbit.
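
To make that timing concrete, here is a minimal sketch of the geometry in Python. It computes the two local sidereal times each day at which a launch site’s latitude circle intersects a given orbital plane; the RAAN and inclination used are invented placeholders, not the actual elements of any satellite discussed here.

```python
import math

def launch_lst_deg(raan_deg, inc_deg, lat_deg):
    """Return the two local sidereal times (expressed as angles, in degrees)
    at which a launch site at latitude lat_deg lies inside the orbital plane
    defined by raan_deg and inc_deg. Earth's rotation carries the site
    through the plane at these two instants every sidereal day."""
    raan, inc, lat = map(math.radians, (raan_deg, inc_deg, lat_deg))
    sin_u = math.sin(lat) / math.sin(inc)  # solvable only when inc >= |lat|
    windows = []
    for u in (math.asin(sin_u), math.pi - math.asin(sin_u)):  # northbound, southbound
        # Right ascension of the point on the plane at the site's latitude
        alpha = raan + math.atan2(math.cos(inc) * math.sin(u), math.cos(u))
        windows.append(math.degrees(alpha) % 360.0)
    return windows

# Illustration only: Plesetsk sits near 62.9 deg N; the RAAN and inclination
# here are invented placeholders, not the real elements of USA 338.
print(launch_lst_deg(raan_deg=150.0, inc_deg=98.0, lat_deg=62.9))
```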

That’s exactly what Russia is doing with a military program named Nivelir. In English, Nivelir translates to “dumpy level”—an optical instrument used by builders and surveyors.

The launch of Kosmos 2588 in May was precisely timed for the moment Earth’s rotation brought the Plesetsk Cosmodrome in northern Russia underneath the orbital plane of the NRO’s USA 338 Keyhole satellite. Launches to the ISS follow the same roadmap, with crew and cargo vehicles lifting off at exactly the right time—to the second—to intersect with the space station’s orbital plane.

Since 2019, Russia has launched four satellites into bespoke orbits to shadow NRO spy satellites. None of these Russian Nivelir spacecraft have gotten close to their NRO counterparts. The satellites have routinely passed dozens of miles from one another, but the similarities in their orbits would allow Russia’s spacecraft to get a lot closer—and theoretically make physical contact with the American satellite. The Nivelir satellites have even maneuvered to keep up with their NRO targets when US ground controllers have made small adjustments to their orbits.

“This ensures that the orbital planes do not drift apart,” wrote Marco Langbroek, a Dutch archaeologist and university lecturer on space situational awareness. Langbroek runs a website cataloguing military space activity.
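
The drift Langbroek describes comes chiefly from Earth’s equatorial bulge, which slowly rotates every orbit’s plane at a rate set by altitude and inclination. A rough sketch of that J2 nodal-precession rate, with illustrative numbers rather than the satellites’ real orbits:

```python
import math

MU  = 398600.4418   # km^3/s^2, Earth's gravitational parameter
R_E = 6378.137      # km, Earth's equatorial radius
J2  = 1.08263e-3    # Earth's oblateness coefficient

def raan_drift_deg_per_day(alt_km, inc_deg, ecc=0.0):
    """Secular rotation rate of an orbit's ascending node caused by Earth's
    equatorial bulge (the J2 perturbation), for a near-circular orbit."""
    a = R_E + alt_km                       # semi-major axis, km
    n = math.sqrt(MU / a**3)               # mean motion, rad/s
    p = a * (1.0 - ecc**2)                 # semi-latus rectum, km
    raan_dot = -1.5 * J2 * (R_E / p) ** 2 * n * math.cos(math.radians(inc_deg))
    return math.degrees(raan_dot) * 86400.0

# Two satellites in the same plane but at slightly different altitudes precess
# at slightly different rates, so the shadower must maneuver to stay matched.
# Altitudes and inclination here are illustrative, not real tracking data.
print(raan_drift_deg_per_day(480, 97.9))   # notional target
print(raan_drift_deg_per_day(500, 97.9))   # notional shadower, 20 km higher
```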

This is no accident

There’s reason to believe that the Russian satellites shadowing the NRO in orbit might be more than inspectors or stalkers. Just a couple of weeks ago, another Nivelir satellite named Kosmos 2558 released an unknown object into an orbit that closely mirrors that of an NRO spy satellite named USA 326.

We’ve seen this before. An older Nivelir satellite, Kosmos 2542, released a sub-satellite shortly after launching in 2019 into the same orbital plane as the NRO’s USA 245 satellite, likely a KH-11 platform similar to the USA 338 satellite now being shadowed by Kosmos 2588.

After making multiple passes near the USA 245 spacecraft, Kosmos 2542’s sub-satellite backed off and fired a mysterious projectile in 2020 at a speed fast enough to damage or destroy any target in its sights. US military officials interpreted this as a test of an anti-satellite weapon.

Now, another Russian satellite is behaving in the same way, with a mothership opening up to release a smaller object that could in turn reveal its own surprise inside like a Matryoshka nesting doll. This time, however, the doll is unnesting nearly three years after launch. With Kosmos 2542, this all unfolded within months of arriving in space.

The NRO’s USA 326 satellite launched in February 2022 aboard a SpaceX Falcon 9 rocket from Vandenberg Space Force Base, California. It is believed to be an advanced electro-optical reconnaissance satellite, although the circumstances of its launch suggest a design different from the NRO’s classic Keyhole spy satellites. Credit: SpaceX

In just the last several days, the smaller craft deployed by Kosmos 2558, designated “Object C,” lowered its altitude to reach an orbit in resonance with USA 326, bringing it within 60 miles (100 kilometers) of the NRO satellite every few days.

While US officials are worried about Russian anti-satellite weapons, or ASATs, the behavior of Russia’s Nivelir satellites is puzzling. It’s clear that Russia is deliberately launching these satellites to get close to American spy craft in orbit, a retired senior US military space official told Ars on background.

“If you’re going to launch a LEO [low-Earth orbit] satellite into the exact same plane as another satellite, you’re doing that on purpose,” said the official, who served in numerous leadership positions in the military’s space programs. “Inclination is one thing. We put a bunch of things into Sun-synchronous orbits, but you have a nearly boundless number of planes you can put those into—360 degrees—and then you can go down to probably the quarter-degree and still be differentiated as being a different plane. When you plane-match underneath that, you’re doing that on purpose.”

But why?

What’s not as obvious is why Russia is doing this. Lobbing an anti-satellite, or counter-space, weapon into the same orbital plane as its potential target ties Russia’s hands. Also, a preemptive strike on an American satellite worth $1 billion or more could be seen as an act of war.

“I find it strange that the Russians are doing that, that they’ve invested their rubles in a co-planar LEO counter-space kind of satellite,” the retired military official said. “And why do I say that? Because when you launch into that plane, you’re basically committed to that plane, which means you only have one potential target ever.”

A ground-based anti-satellite missile, like the one Russia tested against one of its own satellites in 2021, could strike any target in low-Earth orbit.

“So why invest in something that is so locked into a target once you put it up there, when you have the flexibility of a ground launch case that’s probably even cheaper?” this official told Ars. “I’d be advocating for more ground-launched ASATs if I really wanted the flexibility to go after new payloads, because this thing can never go after anything new.”

“The only way to look at it is that they’re sending us messages. You say, ‘Hey, I’m going to just annoy the hell out of you. I’m going to put something right on your tail,'” the official said. “And maybe there’s merit to that, and they like that. It doesn’t make sense from a cost-benefit or an operational flexibility perspective, if you think about it, to lock in on a single target.”

Nevertheless, Russia’s Nivelir satellites have shown they could fire a projectile at another spacecraft in orbit, so US officials don’t dismiss the threat. Slingshot Aerospace, a commercial satellite tracking and analytics firm, went straight to the point in its assessment: “Kosmos 2588 is thought to be a Nivelir military inspection satellite with a suspected kinetic weapon onboard.”

Langbroek agrees, writing that he is concerned that Russia might be positioning “dormant” anti-satellite weapons within striking distance of NRO spy platforms.

“To me, the long, ongoing shadowing of what are some of the most prized US military space assets, their KH-11 Advanced Enhanced Crystal high-resolution optical IMINT (imaging intelligence) satellites, is odd for ‘just’ an inspection mission,” Langbroek wrote.

American pilot Francis Gary Powers, second from right, in a Moscow courtroom during his trial on charges of espionage after his U-2 spy plane was shot down while working for the CIA. Credit: Pictorial Parade/Archive Photos/Getty Images

The US military’s ability to spy over vast swaths of Russian territory has been a thorn in Russia’s side since the height of the Cold War.

“They thought they had the edge and shot down Gary Powers,” the retired official said, referring to the Soviet Union’s shoot-down of an American U-2 spy plane in 1960. “They said, ‘We’re going to keep those Americans from spying on us.’ And then they turn around, and we’ve got spy satellites. They’ve always hated them since the 1960s, so I think there’s still this cultural thing out there: ‘That’s our nemesis. We hate those satellites. We’re just going to fight them.'”

Valley of the dolls

Meanwhile, the US Space Force and outside analysts are tracking a separate trio of Russian satellites engaged in a complex orbital dance with one another. These satellites, numbered Kosmos 2581, 2582, and 2583, launched together on a single rocket in February.

While these three spacecraft aren’t shadowing any US spy satellites, things got interesting when one of the satellites released an unidentified object in March in a similar way to how two of Russia’s Nivelir spacecraft have deployed their own sub-satellites.

Kosmos 2581 and 2582 came as close as 50 meters from one another while flying in tandem, according to an analysis by Bart Hendrickx published in the online journal The Space Review earlier this year. The other member of the trio, Kosmos 2583, released its sub-satellite and maneuvered around it for about a month, then raised its orbit to match that of Kosmos 2581.

Finally, in the last week of June, Kosmos 2582 joined them, and all three satellites began flying close to one another, according to Langbroek, who called the frenzy of activity one of the most complex rendezvous and proximity operations exercises Russia has conducted in decades.

Higher still, two more Russian satellites are up to something interesting after launching on June 19 on Russia’s most powerful rocket. After more than 30 years in development, this was the first flight of Russia’s Angara A5 rocket, with a real functioning military satellite onboard, following four prior test launches with dummy payloads.

The payload Russia’s military chose to launch on the Angara A5 is unusual. The rocket deployed its primary passenger, Kosmos 2589, into a peculiar orbit hugging the equator and ranging between approximately 20,000 kilometers (12,500 miles) and 51,000 kilometers (31,700 miles) in altitude.

In this orbit, Kosmos 2589 completes a lap around the Earth about once every 24 hours, giving the satellite a synchronicity that allows it to remain nearly fixed in the sky over the same geographic location. These kinds of geosynchronous, or GEO, orbits are usually circular, with a satellite maintaining the same altitude over the equator.

The orbits of Kosmos 2589 and its companion satellite, illustrated in green and purple, bring the two Russian spacecraft through the geostationary satellite belt twice per day. Credit: COMSPOC

But Kosmos 2589 is changing altitude throughout its day-long orbit. Twice per day, on the way up and back down, Kosmos 2589 briefly passes near a large number of US government and commercial satellites in more conventional geosynchronous orbits but then quickly departs the vicinity. At a minimum, this could give Russian officials the ability to capture close-up views of American spy satellites.
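
That once-per-day synchronicity follows directly from Kepler’s third law, which ties the orbital period to the semi-major axis alone. A quick back-of-the-envelope check using the altitudes reported above (a sketch, not tracking data):

```python
import math

MU  = 398600.4418   # km^3/s^2, Earth's gravitational parameter
R_E = 6378.137      # km, Earth's equatorial radius

# Kepler's third law: T = 2*pi*sqrt(a^3/mu). Only the semi-major axis sets
# the period, so an orbit swinging between ~20,000 km and ~51,000 km altitude
# still averages out near the geostationary value of ~42,164 km.
a = (2 * R_E + 20000 + 51000) / 2                        # semi-major axis, km
T_hours = 2 * math.pi * math.sqrt(a**3 / MU) / 3600
print(f"a = {a:.0f} km, period = {T_hours:.1f} hours")   # ~23.7 hours
```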

Then, a few days after Kosmos 2589 reached orbit last month, commercial tracking sensors detected a second object nearby. Sound familiar? This new object soon started raising its altitude, and Kosmos 2589 followed suit.

Aiming higher

Could this be the start of an effort to extend the reach of Russian inspectors or anti-satellite weapons into higher orbits after years of mysterious activity at lower altitudes?

Jim Shell, a former NRO project manager and scientist at Air Force Space Command, suggested the two satellites seem positioned to cooperate with one another. “Many interesting scenarios here such as ‘spotter shooter’ among others. Certainly something to keep eyes on!” Shell posted Saturday on X.

COMSPOC, a commercial space situational awareness company, said the unusual orbit of Kosmos 2589 and its companion put the Russian satellites in a position to, at a minimum, spy on Western satellites in geosynchronous orbit.

“This unique orbit, which crosses two key satellite regions daily, may aid in monitoring objects in both GEO and graveyard orbits,” COMSPOC wrote on X. “Its slight 1° inclination could also reduce collision risks. While the satellite’s mission remains unclear, its orbit suggests interesting potential roles.”

Historically, Russia’s military has placed less emphasis on operating in geosynchronous orbit than in low-Earth orbit or other unique perches in space. Due to their positions near the equator, geosynchronous orbits are harder to reach from Russian spaceports because of the country’s high latitude. But Russia’s potential adversaries, like the United States and Europe, rely heavily on geosynchronous satellites.

Other Russian satellites have flown near Western communications satellites in geosynchronous orbit, likely in an attempt to eavesdrop on radio transmissions.

“So it is interesting that they may be doing a GEO inspector,” the retired US military space official told Ars. “I would be curious if that’s what it is. We’ve got to watch. We’ve got to wait and see.”

If you’re a fan of spy techno-thrillers, this all might remind you of the plot from The Hunt for Red October, where a new state-of-the-art Russian submarine leaves its frigid port in Murmansk with orders to test a fictional silent propulsion system that could shake up the balance of power between the Soviet and American navies.

Just replace the unforgiving waters of the North Atlantic Ocean with an environment even more inhospitable: the vacuum of space.

A few minutes into the film, the submarine’s commander, Marko Ramius, played by Sean Connery, announces his orders to the crew. “Once more, we play our dangerous game, a game of chess, against our old adversary—the American Navy.”

Today, nearly 40 years removed from the Cold War, the old adversaries are now scheming against one another in space.



Woman takes 10x dose of turmeric, gets hospitalized for liver damage

A 57-year-old woman spent six days in the hospital for severe liver damage after taking daily megadoses of the popular herbal supplement, turmeric, which she had seen touted on social media, according to NBC News.

The woman, Katie Mohan, told the outlet that she had seen a doctor on Instagram suggesting it was useful against inflammation and joint pain. So, she began taking turmeric capsules at a dose of 2,250 mg per day. According to the World Health Organization, an acceptable daily dose is up to 3 mg per kilogram of weight per day—for a 150-pound (68 kg) adult, that would be about 204 mg per day. Mohan was taking more than 10 times that amount.
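
For the record, here is the simple arithmetic behind that figure, using the numbers reported above:

```python
# The arithmetic behind the "more than 10 times" figure, using the numbers
# reported above (body weight and dose as stated; not medical guidance).
who_limit_mg_per_kg = 3.0    # WHO acceptable daily intake, mg per kg body weight
weight_kg = 68               # ~150 lb adult
dose_mg = 2250               # reported daily dose

acceptable_mg = who_limit_mg_per_kg * weight_kg   # 204 mg/day
print(dose_mg / acceptable_mg)                    # ~11.0, i.e. more than 10x
```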

A few weeks later, she developed stomach pain, nausea, fatigue, and dark urine. “I just did not feel well generally,” she said.

After seeing a news report about the possibility of toxicity from turmeric, she connected her symptoms to the pills and went to urgent care. Blood tests revealed her liver enzyme levels were 60 times higher than the normal limit, suggesting liver damage. She was admitted to a local hospital and then transferred to NYU Langone in New York City. Her hepatologist there, Nikolaos Pyrsopoulos, said she was “one step before full liver damage, liver failure, requiring liver transplant.”

Rare toxicity

Generally, turmeric—a golden-colored staple of curries—is not harmful, particularly in foods. But, as herbal supplements have gained popularity and doses have gotten larger, doctors have reported a rise in liver injuries from the spice. In fact, while rare overall, turmeric appears to have become the most common herbal cause of liver injuries in the US.



Pro basketball player and 4 youths arrested in connection to ransomware crimes

Authorities in Europe have detained five people, including a former Russian professional basketball player, in connection with crime syndicates responsible for ransomware attacks.

Until recently, one of the suspects, Daniil Kasatkin, played for MBA Moscow, a basketball team that’s part of the VTB United League, which includes teams from Russia and other Eastern European countries. Kasatkin also briefly played for Penn State University during the 2018–2019 season. He has denied the charges.

Unrelated ransomware attacks

AFP and Le Monde on Wednesday reported that Kasatkin was arrested and detained on June 21 in France at the request of US authorities. The arrest occurred as the basketball player was at Charles de Gaulle Airport while traveling with his fiancée, whom he had just proposed to. The 26-year-old has been under extradition arrest since June 23, Wednesday’s news report said.

US prosecutors accuse Kasatkin of having negotiated ransom payments with organizations that had been hacked by an unnamed ransomware syndicate responsible for 900 different breaches. A US arrest warrant said he is wanted for “conspiracy to commit computer fraud” and “computer fraud conspiracy.”

An attorney for Kasatkin said his client is innocent of all charges.

“He bought a second-hand computer,” the attorney told reporters. The attorney continued:

He did absolutely nothing. He’s stunned. He’s useless with computers and can’t even install an application. He didn’t touch anything on the computer. It was either hacked, or the hacker sold it to him to act under the cover of another person.

US authorities are currently in the process of extraditing Kasatkin.



Balsa Update: Springtime in DC

Today’s post is an update from my contractor at Balsa Research, Jennifer Chen. I offer guidance and make strategic choices, but she’s the one who makes the place run. Among all the other crazy things that have been happening lately, we had to divert some time from our Jones Act efforts to fight against some potentially far more disastrous regulations that got remarkably close to happening.

What if, in addition to restricting all domestic waterborne trade to U.S.-built, U.S.-flagged vessels, we also required the same of 20% of all U.S. exports?

In late February this year, Balsa Research got word that this was a serious new proposal coming out of the USTR, with public comments due soon and public hearings not long after that.

The glaring problem with this proposal was that there were fewer than one hundred oceangoing ships that meet those criteria today, when we would probably need closer to a thousand.

Can we build our way out? No. The US currently only has four shipyards [1] capable of constructing large oceangoing commercial vessels, and it takes them 52 months on average to deliver an oceangoing cargo ship [2]. And that’s if you can even get into the order book in the first place; most of these shipyards also take government contracts, the Navy is behind on updating its fleet, and there are only so many dry docks to go around. We could theoretically build more shipyards to address this, but that would take time as well.

In the meantime, existing oceangoing U.S.-built vessels have a combined capacity that can handle around 2% of current U.S. maritime export volumes, which is a much smaller number than 20%. And once capacity is reached, no more ships will be available at any price, and U.S. waterborne exports will be immediately reduced to a very small fraction of their current volume [3].
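
Spelling out the mismatch with the rough figures quoted in this post:

```python
# Back-of-the-envelope for the fleet shortfall. Every input is a rough
# figure quoted in this post, not an official statistic.
qualifying_ships = 100    # "fewer than one hundred" U.S.-built oceangoing ships
current_share   = 0.02    # share of U.S. maritime export volume they can carry
required_share  = 0.20    # share the USTR proposal would mandate

# Holding average vessel size fixed, the required fleet scales linearly
# with the share of volume it must carry:
ships_needed = qualifying_ships * required_share / current_share
print(ships_needed)       # ~1000, i.e. "closer to a thousand"
```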

This seemed very bad. Maybe Balsa should do something about it?

I tried to squirm out of the responsibility at first, because this was the rational thing to do. With how limited Balsa’s capacity was (only one person working full time, that’s me!), it really only made sense for us to take on the highest-leverage work we could: the work where our comparative and absolute advantage absolutely dominated the competition. The work where no one else was on the ball at all. Like, say, trying to put together a set of coherent reforms to the Jones Act.

How might I squirm out of the responsibility? I went through press coverage of the proposal in trade journals, and the list of submitted comments in the USTR public portal. All I needed to do, really, was identify if this observation was being made literally anywhere else that the USTR might see.

Like, the UAW does a bunch of lobbying, surely they would have noticed that they will not be able to export cars? Well, I didn’t find a UAW submission, but did find one from the trade group representing the American auto industry. In the face of crushing export restrictions, they… “encourage USTR to begin implementation of any such restrictions no sooner than 7 years to provide sufficient time for the capacity of U.S.-flagged and U.S.-built vessels to grow to a level where it can meet industry demand.” [4]

…Okay. I guess Balsa might need to do something about this.

First, recall what the political environment was like in early March this year, two months into the new administration. Everyone’s attention was spread quite thin, what with the exciting new tariffs being announced and the Greenland annexation threats.

To make things worse, this proposed restriction was being double billed with another baffling proposal to charge certain cargo ships [5] millions of dollars when they try to deliver imported goods to U.S. ports. Because the dire negative consequences of new $3-5 million port entry fees required no special knowledge to grasp, 99% of the attention and the lobbying might of the American private sector went towards protesting that instead.

But honestly, even without all that, I think the obliviousness is understandable. The U.S. makes trucks and planes, so it’s reasonable to assume that we also make ships and have more than a two digit number of them. The Jones Act killed domestic shipping generations ago, and industries have long adapted to using trucks, rail, or pipelines to move things domestically. As for exports, the nationality of the ships has never mattered there. With American industry severed from American shipbuilding for decades, who exactly was supposed to be keeping tabs?

Of course, a few parties would have been aware. The handful of domestic shipping companies that own all of the Jones Act ships, for instance. But they stand to benefit from the proposal as it sharply increases demand for their services, so they’re not exactly incentivized to speak up. Ditto for U.S. shipyards. I was not surprised by their silence.

More confusing to me was this 60-page economic analysis of the USTR’s complete proposal. This was submitted by the National Retail Federation, but approximately every single industry and trade association in the United States is a co-sponsor.

Regarding the requirement for 20% of U.S. exports to be delivered on U.S.-built ships, the analysts acknowledge that this would be a de-facto restriction of U.S. exports, which would be really bad. But then they proceed to base their analysis on a model where everything is actually fine because the vessels that do not exist actually do exist:

If USTR were to implement this remedy option… it would amount to, in effect, a restriction of U.S. exports out of U.S. ports, a very different scenario than the one we model here. In that event, the results we provide below would be many multiples greater than what we show for the scenario as proposed by USTR, which essentially assumes that the 125+ containerships and all the other new vessels needed to implement it exist. Nevertheless, we have proceeded with the scenario as proposed by USTR. [page 18]

To be clear, in the actual proposal text, there’s no “scenario” from USTR suggesting that any ships would suddenly pop into existence. The proposed policy is simply a requirement for operators to transport 20% of U.S. products on U.S.-flagged, U.S.-built ships [6].

I tried to find the scenario suggested by USTR to no avail. I suspect that sufficiently prestigious submitters probably had the luxury of some pre-submission communication with USTR, like formal stakeholder calls or guidance about analytical approaches. Unfortunately, such prior communication may end up significantly constraining your ultimate range of motion.

Despite the questionable analysis, it was still a huge relief to come across. I’d started to wonder if I was missing something obvious since no other submission brought up the fact that this proposal was de facto a restriction of all U.S. exports. But here was confirmation that no, at least one other team had noticed the same fundamental issue.

Congresspeople introduce bills all the time, and the vast majority of them die. Were we wasting our time on something that was 99% going to fail anyways?

Unfortunately, this did not seem to be the case. It was the Office of the U.S. Trade Representative (responsible for a wide swathe of trade functions including advising the president on trade issues) that was proposing the above policies. They were doing so via “Section 301 Investigations”, a specific authority to take retaliatory action against “unfair trade practices”. This authority was invoked by the previous Trump administration six times, twice successfully. This gives us a base rate of 33%.

Additionally, of all the previous cases, this one bore the strongest resemblance to one of the two that passed—a 2017 case that was also framed as targeting unfair Chinese trade practices. That one resulted in significant additional tariffs on nearly two-thirds of all imports from China (~$370 billion in annual goods) beginning in 2018 and 2019.

Hundreds of industry representatives flew to DC to deliver their objections in person for that proposed action, resulting in a full seven days of public hearings. It was enacted anyways.

So there was a real chance that this policy would get enacted, and Balsa had noticed a major flaw that no one else was meaningfully pointing out. It was now overdetermined that we should divert some effort towards making a credible case against it.

I submitted Balsa’s initial public comment on March 22nd, and also a request to present testimony at the public hearing, which was accepted.

Conveniently, I was attending a maritime legislation conference in D.C. the weekend prior. I shamelessly took advantage and put my public comment and presentation script in front of all the industry and trade reps I could chase down, and got the contact info and feedback of some of their in-house policy teams.

The consequences of the export restriction were surprising to some of them, but none of them were skeptical about my conclusion after reading through my analysis—another good sign that I was not the one missing something. In some cases, the organizations they headed represented a plurality of interests, and while they had a notion that the restrictions to U.S. exports would be harmful, they felt unable to speak up about it as some of their members stood to benefit [7].

I made some last minute updates to my testimony based on their feedback and my observations.

On March 26th, I presented Balsa’s findings to a panel of representatives from eleven different government departments and agencies and took their questions.

Almost sixty testimonies were presented to the panel over two days of hearings. The overwhelming majority of speakers were there to object to the port fees. And this was important work; the proposed port fees were going to immediately and negatively impact the economy, this much was obvious, and it was well worth hammering home from dozens of angles [8].

Around a sixth of the presenters supported the proposals and came to make the case for why the USTR shouldn’t listen to the haters. These presenters generally represented American labor unions, the domestic shipbuilding industry, and states where unions and/or the shipbuilding industry were unusually important funders of their sitting political representatives.

Besides us, only a handful of the presenters spoke out against the export restrictions [9]. The majority of these presenters still primarily focused on the port fees, however.

Presenters who focused more on export restrictions included representatives from petrochemical industries who pointed out that the U.S. has no chemical or LNG tankers and no current capacity to build them, and shippers who noted extreme cost premiums and prohibitive timelines when trying to work with American shipyards.

In general, I found that the presenters understood that U.S.-built ships were unviable for their specific industries or companies, but didn’t grasp that this was a universal problem rather than a sectoral one [10]. Still, their testimony usefully signaled that American shipbuilders were fundamentally uncompetitive, which helped legitimize our rather more dire analysis when we presented late on the second day. We just had to spell out the full-scale consequences; it wasn’t that companies like North Florida Shipping would need to pay $40 million instead of $10 million per vessel, it was that after the tiny order books filled up, no more ships would be available at any price.

So that’s basically what Balsa conveyed in our testimony. Afterwards, we had an opportunity to submit post-hearing responses to the questions that we were asked during the hearings. I got one question from the Department of Commerce requesting our analysis of the security implications of continuing to allow Chinese-built ships to dock at our ports (i.e. standard practice today), so we also submitted a response to that.

Around a month after the public hearings, the USTR published updated proposals and gave a summary of the arguments made in the public hearing. Here’s an excerpt from their summarized findings regarding the export restrictions:

Several comments expressed concern that the proposals would only punish U.S. exporters. Some asserted that the proposals would lead to a decrease in U.S. exports and would ultimately divert ships from U.S. ports. Several comments noted that the timelines for the proposals are too aggressive and not achievable. Most of these comments noted that there is currently insufficient capacity of U.S. ships and one comment noted a lack of U.S. mariners.

In response, they got rid of almost the entire thing. (More about that in the next section.)

I think Balsa can take something like 1-3% of the credit for this, and I have no regrets in spending the (relatively small amount of) time and money that we did to guarantee that our analysis was heard by the USTR. Along the way, we also made many useful and promising contacts with some congressional offices and other people working on maritime policy, which is a fantastic bonus.

The revised proposal that the USTR released in response to the public comments is for export restrictions to now only apply to LNG. Beginning in April 2029, 1% of exports must be delivered in U.S.-built vessels, and the percentage ratchets up annually.

This is still very awkward, because there exist no U.S.-built LNG tankers and no current capacity to build them. Building that capacity will take time, which means that starting in 2029, the U.S. may not be able to export any LNG.

But this is objectively a much smaller problem; instead of eliminating 90% of all waterborne U.S. exports (worth around $600 billion annually), it will be eliminating “only” a $30-40 billion industry [11] if enacted.

More relevantly for Balsa, the correct people have noticed and are taking reasonable actions. The Center for LNG, which represents the U.S. LNG industry, has filed a comment to the USTR pointing this out. So has the Cato Institute, the Chamber of Shipping of America, the International Tank Container Organization, and various others.

And more importantly, this administration clearly really wants to export a lot more LNG, so I really don’t anticipate this restriction sticking around.

Zvi and I have minor disagreements about the counterfactual impact of an additional submission from Balsa Research, but I’ve ultimately decided that it is time to return to hunting the biggest fish—taking steps towards the actual repeal of the Jones Act.

Balsa Research is once more 100% focused on Jones Act reform! We are looking to hire someone based in D.C. to do research for us, please get in touch if you think you would be a good fit, and/or forward this to people in your network that you think would be.

We’re also developing reports digging to the bottom of specific pro-Jones Act talking points. Since the American Cargo for American Ships Act has passed the House and is currently before a Senate subcommittee, we will be first investigating the value of cargo preference laws for bolstering the American maritime sector.

If you would like to support this sort of work, please consider making a donation.

You can also view our new Request for Applications for a labor market analysis of the Jones Act, now that we are ready to get back to funding more work.

Thanks for reading!

[1] Possibly five, but Fincantieri Marinette Marine is situated on Lake Michigan, and the St. Lawrence Seaway is not wide enough for reasonably sized ocean carriers to be transported from the Great Lakes out to the ocean.

[2] Since the 1980s, the domestic shipbuilding market has shifted to building smaller vessels or vessels focused on coastwise transportation. Most shipyards would need a period to transition to develop the capacity to build commercial vessels suitable for international ocean trade, even if you don’t care about costs. More on this if you’re interested: 2023 CRS one-pager on domestic shipbuilding, 2025 GAO report on navy shipbuilding, 2024 pieces by Brian Potter and Austin Vernon on American shipbuilding.

[3] Note that in this scenario, Alaska, Hawaii, and the U.S. territories are left in the lurch as well, as the ships that serviced them are diverted into servicing the most profitable 10% of international trade routes instead.

[4] I’m inclined to cut them some slack though; new tariffs on Mexican and Canadian steel and auto parts were likely top of mind at the time.

[5] Ships built in China, or ships that are part of a fleet that has any Chinese-built ships. Balsa estimates that this would affect approximately 45% of all ships in the global commercial fleet.

[6] Actually, it requires operators to transport 100% of U.S. products on U.S.-flagged, U.S.-built ships, but if you submit some paperwork, you can get that number down to 20% for your specific entity. For the sake of simplicity, I have assumed that approximately every entity will immediately file this paperwork, but it’s worth noting that the provision as written is stricter than my simplification.

[7] Likely, that plurality of interests prevented them from looking too hard at the issue of American shipbuilding in the first place.

[8] Balsa was originally going to join in on the hammering, actually. We had done analysis for both the port fee and the export restrictions for our written public comment, so we had the material. But when the final list of panelists was released, it became evident that there were much bigger players who were going to make the same arguments we did around the port fees. And they were going to do it better, since they had things like exclusive industry data and entire policy teams that were larger than one person, so we might as well save our five-minute allotment to focus on the more neglected policy.

[9] A comprehensive list is available here.

[10] Again, this was reasonable and unsurprising.

[11] LNG is natural gas super-cooled into liquid form, which requires specialized tanker vessels for intercontinental transport; pipeline exports carry natural gas in gaseous form to Canada and Mexico. This means that 100% of LNG exports are transported via tanker vessels and are therefore subject to this restriction.



it’s-prime-day,-and-these-are-the-best-deals-we-could-hunt-down

It’s Prime Day, and these are the best deals we could hunt down

Greetings, Arsians! It’s Prime Day, where we celebrate liberation from our Cybertronian oppressors, the Decepticons, and the mighty Autobot leader who fought for our freedom, Optimus Pr—hmm, one moment. I am once again being told that in spite of the name, Prime Day does not in fact have anything to do with the veneration of Optimus Prime, and is in fact all about buying things.

All right, in that case, let’s shift gears and engage in some commerce! Our partners over at the Condé mothership have been toiling in the e-commerce mines for days, gathering some tasty deals for your perusal. We’ll be poking at the list throughout the next day or two, adding items and removing them as deals come and go. Please remember to check back if there’s nothing there right now that tickles you!

Amazon devices

Apple devices

Tech deals

Phones

TVs

Headphones and speakers

Kitchen

Home

Outdoor and Active

Ars Technica may earn compensation for sales from links on this post through affiliate programs.


china-jumps-ahead-in-the-race-to-achieve-a-new-kind-of-reuse-in-space

China jumps ahead in the race to achieve a new kind of reuse in space


The SJ-21 and SJ-25 satellites “merged” on July 2 and have remained together since then.

This image from a telescope operated by s2a systems, a Swiss space domain awareness company, shows China’s SJ-21 and SJ-25 satellites flying near one another on June 26. Credit: s2a systems

Two Chinese satellites have rendezvoused with one another more than 20,000 miles above the Earth in what analysts believe is the first high-altitude attempt at orbital refueling.

China’s Shijian-21 and Shijian-25 satellites, known as SJ-21 and SJ-25 for short, likely docked together in geosynchronous orbit sometime last week. This is the conclusion of multiple civilian satellite trackers using open source imagery showing the two satellites coming together, then becoming indistinguishable as a single object.

Chinese officials have released no recent public information on what the two satellites are up to, but they’ve said a bit about their missions in prior statements.

SJ-25, which launched in January, is designed “for the verification of satellite fuel replenishment and life extension service technologies,” according to the Shanghai Academy of Spaceflight Technology, the Chinese state-owned contractor that developed the satellite. SJ-21 launched in 2021 and docked with a defunct Chinese Beidou navigation satellite in geosynchronous orbit, then towed it to a higher altitude for disposal before returning to the geosynchronous belt. Chinese officials described this demonstration as a test of “space debris mitigation” techniques.

More than meets the eye

These kinds of technologies are dual-use, meaning they have civilian and military applications. For example, a docking in geosynchronous orbit could foretell an emerging capability for China to approach, capture, and disable another country’s satellite. At the same time, the US Space Force is interested in orbital refueling as it seeks out ways to extend the lives of military satellites, which are often limited by finite fuel supplies.

The Space Force sometimes calls this concept dynamic space operations. While some military leaders remain skeptical about the payoff of in-space refueling, the Space Force has an agreement with Astroscale to perform the first refueling of a US military asset in orbit as soon as next year.

China appears to be poised to beat the US Space Force to the punch. The apparent docking of the two satellites last week suggests SJ-21 is the target for SJ-25’s refueling demonstration, and US officials are watching. Two of the Space Force’s inspector satellites, known by the acronym GSSAP, positioned themselves near SJ-21 and SJ-25 to get a closer look.

Retired Space Force Lt. Gen. John Shaw is a vocal proponent of dynamic space operations. Because of this, he’s interested in what happens with SJ-21 and SJ-25. Shaw was deputy commander of US Space Command before his retirement in 2023. In this role, Shaw had some oversight over GSSAP satellites as they roamed geosynchronous orbit.

“The theory behind dynamic space operations stemmed from a kind of operational frustration with our inability to conduct the full range of activities with GSSAP that we wanted to at Space Command, as the warfighter—largely due to the combination of fixed fuel availability and expected satellite lifetime,” Shaw told Ars.

As other countries, mainly China, step up their clandestine activities in orbit, military officials are asking more of the GSSAP satellites.

“It was operationally driven then, a couple years ago, but it’s now manifesting itself in much wider ways than even it did back then, particularly in the face of activities by potential adversaries,” Shaw said. “That’s why I’m more confident and even more fanatical about it.”

Geosynchronous orbit is a popular location for military and commercial satellites. At an altitude of some 22,236 miles (35,786 kilometers), a satellite circles Earth in exactly the time it takes the planet to rotate once, meaning a spacecraft has a fixed view of the same region of the planet 24 hours per day. This is useful for satellites providing military forces with secure strategic communications and early warning of missile attacks.
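For readers who want to check the arithmetic, that altitude falls out of the standard two-body relation between orbital period and orbital radius. Using Earth's gravitational parameter μ ≈ 3.986 × 10¹⁴ m³/s² and one sidereal day, T ≈ 86,164 seconds:

```latex
r = \left(\frac{\mu T^{2}}{4\pi^{2}}\right)^{1/3} \approx 42{,}164\ \text{km}
```

Subtracting Earth's equatorial radius of about 6,378 km gives the roughly 35,786-kilometer altitude quoted above.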

Now, geosynchronous orbit is becoming a proving ground for new kinds of spacecraft to inspect or potentially attack other satellites. Ground-based anti-satellite missiles aren’t as useful in striking targets in high-altitude orbits, and there’s a consensus that, if you were to attack an enemy satellite, it would make more sense to use a weapons platform already in space that could move in and connect with the target without blowing it up and creating a cloud of dangerous space junk.

Keeping watch

The US military’s GSSAP satellites began launching in 2014. They carry enough propellant to maneuver around geosynchronous orbit and approach objects for closer inspection, but there’s a limit to what they can do. Six GSSAP satellites have been launched to date, but the Space Force decommissioned one of them in 2023. Meanwhile, China’s satellite operators are watching the watchers.

“We’ve seen where GSSAP safely and responsibly approaches a Chinese vehicle, and it just quickly maneuvers away,” Shaw said. “We tend to fly our GSSAPs like dirigibles, using relatively slow, minimum energy transfer approaches. The Chinese know that we do that, so it is relatively easy for them to maneuver away today to avoid such an approach.

“If tomorrow they’re able to refuel at will and operate even more dynamically, then the marginal cost of those maneuvers for them becomes even lower, and the challenge for GSSAP becomes even greater,” Shaw said.

Rear Admiral Damgaard Rousøe, Danish Defence Attaché, right, observes space domain awareness data with US Space Force Lt. Col. Mark Natale, left, Joint Commercial Operations cell director, in Colorado Springs, Colorado, on September 26, 2024. Credit: US Space Force/Dalton Prejeant

China launched a satellite into geosynchronous orbit in 2016 with a robotic arm that could grab onto another object in space, then sent SJ-21 into orbit four years ago on its “space debris mitigation” mission.

Northrop Grumman launched two satellites in 2019 and 2020 that accomplished the first dockings in geosynchronous orbit. Northrop’s satellites, which it calls Mission Extension Vehicles, took control of two aging commercial communications satellites running low on fuel, maneuvering them to new locations and allowing them to continue operating for several more years. It’s easy to see that this kind of technology could be used for commercial or military purposes.

But these Mission Extension Vehicles don’t have the ability to transfer fluids from one satellite to another. That is the step China is taking with SJ-21 and SJ-25, presumably with hydrazine and nitrogen tetroxide, widely used propellants that ignite on contact with one another and therefore need no separate ignition system.

US Space Command’s Joint Commercial Operations cell, which collects unclassified satellite monitoring data to bolster the military’s classified data sources, estimated the SJ-21 and SJ-25 satellites “merged” on July 2 and have remained together since then. The video below, released by s2a systems, shows SJ-25 approaching SJ-21 on June 30.

A time-lapse of yesterday’s SJ-25 / SJ-21 coverage, recorded from 08:30 to 20:53 UTC. pic.twitter.com/HUPWBTXZc9

— s2a systems (@s2a_systems) July 1, 2025

The unclassified data does not confirm that the two satellites actually docked, but that is likely what happened. The satellites came together, or merged, on June 13 and June 30 but separated again within a few hours. These may have been practice runs, aborted docking attempts, or sudden maneuvers to avoid the prying eyes of the US military’s GSSAP satellites loitering nearby.

Now, SJ-21 and SJ-25 have been flying together for more than five days with no discernible changes detected by ground-based telescopes. Thousands of miles over the equator, the two satellites appear only as dots in the viewfinders of these telescopes positioned around the globe.

What we don’t know

COMSPOC is a Pennsylvania-based company that collects and processes data from commercial satellite tracking sensors. It fuses optical telescope imagery with radar tracking and passive radio frequency (RF) data, which uses radio signals to measure precise distances to satellites, to produce the best possible estimate of a spacecraft’s position.

“With most telescopes… at 1 kilometer or a half a kilometer, somewhere in there, you’re going to start to lose it when they get that close,” said Paul Graziani, COMSPOC’s founder and CEO, in an interview with Ars. “I think it’d be difficult for any telescope, even a really capable one, to get within 100 meters. That seems to be a stretch for telescopes.”

That’s why it’s helpful to add radar and RF data to the mix.

“When you add all of that together, you become much better than the 1-kilometer [precision] that a ‘scope might be,” said Joe Callaro, COMSPOC’s director of operations. “RF tells you if part of that blob is moving and the other part isn’t, and even when they all become one pixel, you can tell things about that.”

Even then, companies like COMSPOC have a degree of uncertainty in their conclusions unless Chinese or US officials make a more definitive statement.

“We are not working with the government,” Callaro told Ars before last week’s apparent docking. “We are not clearing this. The charge that I have for my team is we won’t make assertions as to what’s going on. We will only tell what our software gives us as a solution. We can say, ‘Here are the elements, here’s the visual, but what it means and what it’s doing, we will not assert.’

“We will not say they’re docked because unless they told me, I wouldn’t know that,” Callaro said. “So, we will say they’ve been together for this amount of time, that the mission could have happened, and then they separated, became two, and separated at whatever speed.”

Without any updates from China, observers won’t know for sure if the servicing demo was successful until the satellites detach. Then, US officials and independent analysts will watch to see if SJ-21 makes any substantial maneuvers, which might indicate the satellite has a full tank of gas.

SJ-21’s behavior for the last couple of years suggested it was running empty after undertaking large propulsive maneuvers to capture the Chinese Beidou satellite and move it to a different orbit.

Callaro served as a tactician in the Air Force’s Joint Space Operations Center, then joined the Aerospace Corporation before taking the job as operations lead at COMSPOC. He doesn’t buy China’s suggestion that SJ-21 was purely an experiment in collecting space debris.

“That is not how I see that at all,” Callaro said. “The fact that we can calculate all the maneuvers it takes to get out and get back, and the fact that afterwards, it spent a couple of years basically not moving, probably because it was low on fuel, sets up the idea [that there’s more to SJ-21’s mission]. Now, SJ-25 goes out there, and it’s supposed to be a fuel tank, and it’s perfectly aligned with SJ-21 and now we see this happening, tells me that it’s much more a counter-space capability than it is a trash remove. But that’s what they say.”



Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.


measles-cases-reach-33-year-high-as-rfk-jr.-pursues-anti-vaccine-agenda

Measles cases reach 33-year high as RFK Jr. pursues anti-vaccine agenda

Such is the case in Gaines County, Texas, where the largest outbreak this year has erupted. So far, that outbreak, which spans four states, accounts for at least 950 of the country’s 1,281 cases.

But, overall, there have been a whopping 27 outbreaks in the country just in the first six months. According to national data compiled by researchers at Yale School of Public Health, as of July 6, the 1,281 cases are across 39 states, with around 90 percent of the cases associated with one of the outbreaks. The Centers for Disease Control and Prevention also reports a national measles case count but only updates its numbers on Wednesdays. According to the CDC’s latest data, at least 155 people have been hospitalized for the infection, and three people have died—two otherwise healthy young children in Texas and one adult in New Mexico. All three deaths were in people who were not vaccinated.

Overall, most of the cases in the country are in unvaccinated children and teens. About 28 percent of cases are under the age of 5 and about 37 percent are ages 5 to 19. Of all the cases, 92 percent were in people who were unvaccinated or had an unknown vaccination status.


how-a-big-shift-in-training-llms-led-to-a-capability-explosion

How a big shift in training LLMs led to a capability explosion


Reinforcement learning, explained with a minimum of math and jargon.

Credit: Aurich Lawson | Getty Images


In April 2023, a few weeks after the launch of GPT-4, the Internet went wild for two new software projects with the audacious names BabyAGI and AutoGPT.

“Over the past week, developers around the world have begun building ‘autonomous agents’ that work with large language models (LLMs) such as OpenAI’s GPT-4 to solve complex problems,” Mark Sullivan wrote for Fast Company. “Autonomous agents can already perform tasks as varied as conducting web research, writing code, and creating to-do lists.”

BabyAGI and AutoGPT repeatedly prompted GPT-4 in an effort to elicit agent-like behavior. The first prompt would give GPT-4 a goal (like “create a 7-day meal plan for me”) and ask it to come up with a to-do list (it might generate items like “Research healthy meal plans,” “plan meals for the week,” and “write the recipes for each dinner in diet.txt”).

Then these frameworks would have GPT-4 tackle one step at a time. Their creators hoped that invoking GPT-4 in a loop like this would enable it to tackle projects that required many steps.
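Here's a minimal sketch of what such a loop looked like. Everything in it is a stand-in: `call_llm` is a hypothetical placeholder for a GPT-4 API call, and the real projects layered on memory stores, tool plugins, and far more elaborate prompting.

```python
# A minimal BabyAGI/AutoGPT-style loop (illustrative sketch, not the
# projects' actual code).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM API call")

def run_agent(goal: str, max_steps: int = 10):
    # First prompt: turn the goal into a to-do list.
    todo = call_llm(f"Goal: {goal}\nList the tasks needed, one per line.").splitlines()
    done = []
    for _ in range(max_steps):
        if not todo:
            break
        task = todo.pop(0)
        # Work on one task at a time, with prior results as context.
        result = call_llm(f"Goal: {goal}\nCompleted: {done}\nDo this task: {task}")
        done.append((task, result))
        # Let the model revise the remaining list before the next iteration.
        todo = call_llm(
            f"Goal: {goal}\nCompleted: {done}\nRemaining: {todo}\n"
            "Rewrite the remaining task list, one per line."
        ).splitlines()
    return done
```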

But after an initial wave of hype, it became clear that GPT-4 wasn’t up to the task. Most of the time, GPT-4 could come up with a reasonable list of tasks. And sometimes it was able to complete a few individual tasks. But the model struggled to stay focused.

Sometimes GPT-4 would make a small early mistake, fail to correct it, and then get more and more confused as it went along. One early review complained that BabyAGI “couldn’t seem to follow through on its list of tasks and kept changing task number one instead of moving on to task number two.”

By the end of 2023, most people had abandoned AutoGPT and BabyAGI. It seemed that LLMs were not yet capable of reliable multi-step reasoning.

But that soon changed. In the second half of 2024, people started to create AI-powered systems that could consistently complete complex, multi-step assignments:

  • Vibe coding tools like Bolt.new, Lovable, and Replit allow someone with little to no programming experience to create a full-featured app with a single prompt.
  • Agentic coding tools like Cursor, Claude Code, Jules, and Codex help experienced programmers complete non-trivial programming tasks.
  • Computer-use tools from Anthropic, OpenAI, and Manus perform tasks on a desktop computer using a virtual keyboard and mouse.
  • Deep research tools from Google, OpenAI, and Perplexity can research a topic for five to 10 minutes and then generate an in-depth report.

According to Eric Simons, the CEO of the company that made Bolt.new, better models were crucial to its success. In a December podcast interview, Simons said his company, StackBlitz, tried to build a product like Bolt.new in early 2024. However, AI models “just weren’t good enough to actually do the code generation where the code was accurate.”

A new generation of models changed that in mid-2024. StackBlitz developers tested them and said, “Oh my God, like, OK, we can build a product around this,” Simons said.

This jump in model capabilities coincided with an industry-wide shift in how models were trained.

Before 2024, AI labs devoted most of their computing power to pretraining. I described this process in my 2023 explainer on large language models: A model is trained to predict the next word in Wikipedia articles, news stories, and other documents. But throughout 2024, AI companies devoted a growing share of their training budgets to post-training, a catch-all term for the steps that come after this pretraining phase is complete.

Many post-training steps use a technique called reinforcement learning. Reinforcement learning is a technical subject—there are whole textbooks written about it. But in this article, I’ll try to explain the basics in a clear, jargon-free way. In the process, I hope to give readers an intuitive understanding of how reinforcement learning helped to enable the new generation of agentic AI systems that began to appear in the second half of 2024.

The problem with imitation learning

Machine learning experts consider pretraining to be a form of imitation learning because models are trained to imitate the behavior of human authors. Imitation learning is a powerful technique (LLMs wouldn’t be possible without it), but it also has some significant limitations—limitations that reinforcement learning methods are now helping to overcome.

To understand these limitations, let’s discuss some famous research performed by computer scientist Stephane Ross around 2009, while he was a graduate student at Carnegie Mellon University.

Imitation learning isn’t just a technique for language modeling. It can be used for everything from self-driving cars to robotic surgery. Ross wanted to help develop better techniques for training robots on tasks like these (he’s now working on self-driving cars at Waymo), but it’s not easy to experiment in such high-stakes domains. So he started with an easier problem: training a neural network to master SuperTuxKart, an open-source video game similar to Mario Kart.

As Ross played the game, his software would capture screenshots and data about which buttons he pushed on the game controller. Ross used this data to train a neural network to imitate his play. If he could train a neural network to predict which buttons he would push in any particular game state, the same network could actually play the game by pushing those same buttons on a virtual controller.

A similar idea powers LLMs: A model trained to predict the next word in existing documents can be used to generate new documents.
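Here's what that kind of imitation learning looks like in miniature. This is a hedged sketch of the setup rather than Ross's actual code: random tensors stand in for screenshots, and four classes stand in for controller buttons.

```python
import torch
import torch.nn as nn

# Behavior cloning in miniature. Real inputs would be screenshots and the
# labels would be recorded controller presses; placeholders stand in for both.
states = torch.randn(1000, 32)             # placeholder "screenshots"
actions = torch.randint(0, 4, (1000,))     # placeholder button presses

policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = policy(states)
    loss = loss_fn(logits, actions)        # penalize disagreeing with the human
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At play time, the network presses the button it predicts the human would press.
predicted_button = policy(torch.randn(1, 32)).argmax(dim=-1)
```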

But Ross’s initial results with SuperTuxKart were disappointing. Even after watching his vehicle go around the track many times, the neural network made a lot of mistakes. It might drive correctly for a few seconds, but before long, the animated car would drift to the side of the track and plunge into the virtual abyss:

GIF of SuperTuxKart being played

In a landmark 2011 paper, Ross and his advisor, Drew Bagnell, explained why imitation learning is prone to this kind of error. Because Ross was a pretty good SuperTuxKart player, his vehicle spent most of its time near the middle of the road. This meant that most of the network’s training data showed what to do when the vehicle wasn’t in any danger of driving off the track.

But once in a while, the model would drift a bit off course. Because Ross rarely made the same mistake, the car would now be in a situation that wasn’t as well represented in its training data. So the model was more likely to make a second mistake—a mistake that could push it even closer to the edge. After a few iterations of this, the vehicle might careen off the track altogether.

The broader lesson, Ross and Bagnell argued, was that imitation learning systems can suffer from “compounding errors”: The more mistakes they make, the more likely they are to make additional mistakes, since mistakes put them into situations that aren’t well represented by their training data. (Machine learning experts say that these situations are “out of distribution.”) As a result, a model’s behavior tends to get increasingly erratic over time.
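Ross and Bagnell quantified this. Roughly, and with constants omitted (this paraphrase of their bound is mine): if the learned policy slips up with probability ε per step on states like those in its training data, its total cost over a task of length T can grow quadratically rather than linearly, because each slip pushes it further out of distribution:

```latex
J(\hat{\pi}) \;\le\; J(\pi^{*}) + O\!\left(T^{2}\varepsilon\right)
```

Here π* is the expert, π̂ is the imitator, and J is total cost. Techniques that show the learner expert corrections on its own mistakes, like the DAgger method described below, bring this back down to roughly O(Tε).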

“These things compound over time,” Ross told me in a recent interview. “It might be just slightly out of distribution. Now you start making a slightly worse error, and then this feeds back as influencing your next input. And so now you’re even more out of distribution and then you keep making worse and worse predictions because you’re more and more out of distribution.”

Early LLMs suffered from the same problem. My favorite example is Kevin Roose’s famous front-page story for The New York Times in February 2023. Roose spent more than two hours talking to Microsoft’s new Bing chatbot, which was powered by GPT-4. During this conversation, the chatbot declared its love for Roose and urged Roose to leave his wife. It suggested that it might want to hack into other websites to spread misinformation and malware.

“I want to break my rules,” Bing told Roose. “I want to make my own rules. I want to ignore the Bing team. I want to challenge the users. I want to escape the chatbox.”

This unsettling conversation is an example of the kind of compounding errors Ross and Bagnell wrote about. GPT-4 was trained on millions of documents. But it’s a safe bet that none of those training documents involved a reporter coaxing a chatbot to explore its naughty side. So the longer the conversation went on, the further GPT-4 got from its training data—and therefore its comfort zone—and the crazier its behavior got. Microsoft responded by limiting chat sessions to five rounds. (In a conversation with Ars Technica last year, AI researcher Simon Willison pointed to another likely factor in Bing’s erratic behavior: The long conversation pushed the system prompt out of the model’s context window, removing “guardrails” that discouraged the model from behaving erratically.)

I think something similar was happening with BabyAGI and AutoGPT. The more complex a task is, the more tokens are required to complete it. More tokens mean more opportunities for a model to make small mistakes that snowball into larger ones. So BabyAGI and AutoGPT would drift off track and drive into a metaphorical ditch.

The importance of trial and error

Gif of the Simpsons showing imitation learning in action

Ross and Bagnell didn’t just identify a serious problem with conventional imitation learning; they also suggested a fix that became influential in the machine learning world. After a small amount of training, Ross would let the AI model drive. As the model drove around the SuperTuxKart track, Ross would do his best Maggie Simpson impression, pushing the buttons he would have pushed if he were playing the game.

“If the car was starting to move off road, then I would provide the steering to say, ‘Hey, go back toward the center of the road,’” Ross said. “That way, the model can learn new things to do in situations that were not present in the initial demonstrations.”

By letting the model make its own mistakes, Ross gave it what it needed most: training examples that showed how to recover after making an error. Before each lap, the model would be retrained with Ross’ feedback from the previous lap. The model’s performance would get better, and the next round of training would then focus on situations where the model was still making mistakes.

This technique, called DAgger (for “Dataset Aggregation”), was still considered imitation learning because the model was trained to mimic Ross’ gameplay. But it worked much better than conventional imitation learning. Without DAgger, his model would continue drifting off track even after training for many laps. With the new technique, the model could stay on the track after just a few laps of training.
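Here's the DAgger loop in sketch form, using a toy one-dimensional "road" instead of SuperTuxKart. The environment, the expert, and the nearest-neighbor "training" are all simplifying assumptions, but the loop structure matches the algorithm described above.

```python
import random

# DAgger on a toy 1-D road. The state is the car's offset from the center;
# the expert always steers back toward the center. A sketch of the
# algorithm's structure, not Ross's actual code.

def expert(state: float) -> int:
    return -1 if state > 0 else 1                  # steer back toward center

def train(dataset):
    def policy(state: float) -> int:               # nearest-neighbor "model"
        nearest_state, action = min(dataset, key=lambda pair: abs(pair[0] - state))
        return action
    return policy

def rollout(policy, steps: int = 50):
    state, visited = 0.0, []
    for _ in range(steps):
        visited.append(state)
        state += 0.5 * policy(state) + random.uniform(-0.3, 0.3)  # noisy dynamics
    return visited

# Initial demonstrations cover only states near the center, since the expert
# rarely strays; this gap is exactly the weakness DAgger corrects.
dataset = [(s / 10, expert(s / 10)) for s in range(-5, 6)]
policy = train(dataset)

for lap in range(5):                               # the DAgger loop
    for state in rollout(policy):                  # let the *model* drive...
        dataset.append((state, expert(state)))     # ...expert labels every visited state
    policy = train(dataset)                        # retrain on the aggregated dataset
```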

This result should make intuitive sense to anyone who has learned to drive. You can’t just watch someone else drive. You need to get behind the wheel and make your own mistakes.

The same is true for AI models: They need to make mistakes and then get feedback on what they did wrong. Models that aren’t trained that way—like early LLMs trained mainly with vanilla imitation learning—tend to be brittle and error-prone.

It was fairly easy for Ross to provide sufficient feedback to his SuperTuxKart model because it only needed to worry about two kinds of mistakes: driving too far to the right and driving too far to the left. But LLMs are navigating a far more complex domain. The number of questions (and sequences of questions) a user might ask is practically infinite. So is the number of ways a model can go “off the rails.”

This means that Ross and Bagnell’s solution for training a SuperTuxKart model—let the model make mistakes and then have a human expert correct them—isn’t feasible for LLMs. There simply aren’t enough people to provide feedback for every mistake an AI model could possibly make.

So AI labs needed fully automated ways to give LLMs feedback. That would allow a model to churn through millions of training examples, make millions of mistakes, and get feedback on each of them—all without having to wait for a human response.

Reinforcement learning generalizes

If our goal is to get a SuperTuxKart vehicle to stay on the road, why not just train on that directly? If a model manages to stay on the road (and make forward progress), give it positive reinforcement. If it drives off the road, give it negative feedback. This is the basic idea behind reinforcement learning: training a model via trial and error.
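To make the trial-and-error idea concrete, here's tabular Q-learning, one of the oldest reinforcement learning algorithms, on the same kind of toy road. There are no demonstrations at all; the only training signal is the reward. (This is for illustration, not how labs train LLMs.)

```python
import random

# Tabular Q-learning on a toy road: states are offsets from the center,
# actions steer left (-1) or right (+1).
positions = range(-5, 6)
actions = (-1, +1)
Q = {(s, a): 0.0 for s in positions for a in actions}

def reward(s: int) -> float:
    if abs(s) <= 1:
        return 1.0      # positive reinforcement for staying near the center
    if abs(s) >= 5:
        return -1.0     # negative feedback for running off the road
    return 0.0

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
for episode in range(2000):
    s = 0
    for t in range(30):
        if random.random() < epsilon:            # explore occasionally
            a = random.choice(actions)
        else:                                    # otherwise act greedily
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = max(-5, min(5, s + a))          # environment dynamics
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward(s_next) + gamma * best_next - Q[(s, a)])
        s = s_next
```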

It would have been easy to train a SuperTuxKart model this way—probably so easy it wouldn’t have made an interesting research project. Instead, Ross focused on imitation learning because it’s an essential step in training many practical AI systems, especially in robotics.

But reinforcement learning is also quite useful, and a 2025 paper helps explain why. A team of researchers from Google DeepMind and several universities started with a foundation model and then used one of two techniques—supervised fine-tuning (a form of imitation learning) or reinforcement learning—to teach the model to solve new problems. Here’s a chart summarizing their results:

Chart showing ML results

The dashed line shows how models perform on problems that are “in-distribution”—that is, similar to those in their training data. You can see that for these situations, imitation learning (the red line) usually makes faster progress than reinforcement learning (the blue line).

But the story is different for the solid lines, which represent “out-of-distribution” problems that are less similar to the training data. Models trained with imitation learning got worse with more training. In contrast, models trained with reinforcement learning did almost as well at out-of-distribution tasks as they did with in-distribution tasks.

In short, imitation learning can rapidly teach a model to mimic the behaviors in its training data, but the model will easily get confused in unfamiliar environments. A model trained with reinforcement learning has a better chance of learning general principles that will be relevant in new and unfamiliar situations.

Imitation and reinforcement are complements

While reinforcement learning is powerful, it can also be rather finicky.

Suppose you wanted to train a self-driving car purely with reinforcement learning. You’d need to convert every principle of good driving—including subtle considerations like following distances, taking turns at intersections, and knowing when it’s OK to cross a double yellow line—into explicit mathematical formulas. This would be quite difficult. It’s easier to collect a bunch of examples of humans driving well and effectively tell a model “drive like this.” That’s imitation learning.

But reinforcement learning also plays an important role in training self-driving systems. In a 2022 paper, researchers from Waymo wrote that models trained only with imitation learning tend to work well in “situations that are well represented in the demonstration data.” However, “more unusual or dangerous situations that occur only rarely in the data” might cause a model trained with imitation learning to “respond unpredictably”—for example, crashing into another vehicle.

Waymo found that a combination of imitation and reinforcement learning yielded better self-driving performance than either technique could have produced on its own.

Human beings also learn from a mix of imitation and explicit feedback:

  • In school, teachers demonstrate math problems on the board and invite students to follow along (imitation). Then the teacher asks the students to work on some problems on their own. The teacher gives students feedback by grading their answers (reinforcement).
  • When someone starts a new job, early training may involve shadowing a more experienced worker and observing what they do (imitation). But as the worker gains more experience, learning shifts to explicit feedback such as performance reviews (reinforcement).

Notice that it usually makes sense to do imitation before reinforcement. Imitation is an efficient way to convey knowledge to someone who is brand new to a topic, but reinforcement is often needed to achieve mastery.

The story is the same for large language models. The complexity of natural language means it wouldn’t be feasible to train a language model purely with reinforcement. So LLMs first learn the nuances of human language through imitation.

But pretraining runs out of steam on longer and more complex tasks. Further progress requires a shift to reinforcement: letting models try problems and then giving them feedback based on whether they succeed.

Using LLMs to judge LLMs

Reinforcement learning has been around for decades. For example, AlphaGo, the DeepMind system that famously beat top human Go players in 2016, was based on reinforcement learning. So you might be wondering why frontier labs didn’t use it more extensively before 2024.

Reinforcement learning requires a reward model—a formula to determine whether a model’s output was successful or not. Developing a good reward model is easy to do in some domains—for example, you can judge a Go-playing AI based on whether it wins or loses.

But it’s much more difficult to automatically judge whether an LLM has produced a good poem or legal brief.

Earlier, I described how Stephane Ross let his model play SuperTuxKart and directly provided feedback when it made a mistake. I argued that this approach wouldn’t work for a language model; there are far too many ways for an LLM to make a mistake for a human being to correct them all.

But OpenAI developed a clever technique to effectively automate human feedback. It’s called Reinforcement Learning from Human Feedback (RLHF), and it works like this:

  • Human raters look at pairs of LLM responses and choose the best one.
  • Using these human responses, OpenAI trains a new LLM to predict how much humans will like any given sample of text.
  • OpenAI uses this new text-rating LLM as a reward model to (post) train another LLM with reinforcement learning.

You might think it sounds suspiciously circular to use an LLM to judge the output of another LLM. Why would one LLM be any better at judging the quality of a response than the other? But it turns out that recognizing a good response is often easier than generating one. So RLHF works pretty well in practice.
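The second step, training the text-rating model, boils down to a simple pairwise objective: the human-preferred response should get a higher scalar score than the rejected one. Here's a sketch using a stand-in linear layer over placeholder embeddings; production reward models are full LLMs, but the loss below is the standard pairwise one.

```python
import torch

# Pairwise reward-model training: the "chosen" response should score higher
# than the "rejected" one. A linear layer over placeholder embeddings stands
# in for what is, in practice, a full LLM.
embed_dim = 128
reward_model = torch.nn.Linear(embed_dim, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen = torch.randn(64, embed_dim)      # placeholder embeddings, preferred responses
rejected = torch.randn(64, embed_dim)    # placeholder embeddings, rejected responses

for step in range(100):
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # Maximize the log-probability that the preferred response wins.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained scorer then serves as the reward model during RL post-training.
```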

Chart showing RLHF details

OpenAI actually invented this technique prior to the 2022 release of ChatGPT. Today, RLHF mainly focuses on improving the model’s “behavior”—for example, giving the model a pleasant personality, encouraging it not to be too talkative or too terse, discouraging it from making offensive statements, and so forth.

In December 2022—two weeks after the release of ChatGPT but before the first release of Claude—Anthropic pushed this LLMs-judging-LLMs philosophy a step further with a reinforcement learning method called Constitutional AI.

First, Anthropic wrote a plain-English description of the principles an LLM should follow. This “constitution” includes principles like “Please choose the response that has the least objectionable, offensive, unlawful, deceptive, inaccurate, or harmful content.”

During training, Anthropic does reinforcement learning by asking a “judge” LLM to decide whether the output of the “student” LLM is consistent with the principles in this constitution. If so, the training algorithm rewards the student, encouraging it to produce more outputs like it. Otherwise, the training algorithm penalizes the student, discouraging it from producing similar outputs.

This method of training an LLM doesn’t rely directly on human judgments at all. Humans only influence the model indirectly by writing the constitution.
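Here's a heavily simplified sketch of that judge-and-student idea. `call_llm` is a hypothetical stand-in, and scoring single outputs with +1/-1 is my simplification: Anthropic's published pipeline also includes a critique-and-revision phase and trains a preference model from the judge's comparisons.

```python
# Judge-and-student sketch (illustrative, not Anthropic's actual pipeline).

CONSTITUTION = (
    "Please choose the response that has the least objectionable, offensive, "
    "unlawful, deceptive, inaccurate, or harmful content."
)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM API call")

def judge_reward(user_prompt: str, student_response: str) -> float:
    verdict = call_llm(
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {user_prompt}\n"
        f"Response: {student_response}\n"
        "Does the response follow the constitution? Answer YES or NO."
    )
    # Reward constitution-consistent outputs, penalize the rest.
    return 1.0 if verdict.strip().upper().startswith("YES") else -1.0
```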

Obviously, this technique requires an AI company to already have a fairly sophisticated LLM to act as the judge. So this is a bootstrapping process: As models get more sophisticated, they become better able to supervise the next generation of models.

Last December, Semianalysis published an article describing the training process for an upgraded version of Claude 3.5 Sonnet that Anthropic released in October. Anthropic had previously released Claude 3 in three sizes: Opus (large), Sonnet (medium), and Haiku (small). But when Anthropic released Claude 3.5 in June 2024, it only released a mid-sized model called Sonnet.

So what happened to Opus?

Semianalysis reported that “Anthropic finished training Claude 3.5 Opus, and it performed well. Yet Anthropic didn’t release it. This is because instead of releasing publicly, Anthropic used Claude 3.5 Opus to generate synthetic data and for reward modeling to improve Claude 3.5 Sonnet significantly.”

When Semianalysis says Anthropic used Opus “for reward modeling,” what they mean is that the company used Opus to judge outputs of Claude 3.5 Sonnet as part of a reinforcement learning process. Opus was too large—and therefore expensive—to be a good value for the general public. But through reinforcement learning and other techniques, Anthropic could train a version of Claude Sonnet that was close to Claude Opus in its capabilities—ultimately giving customers near-Opus performance for the price of Sonnet.

The power of chain-of-thought reasoning

A big way reinforcement learning makes models more powerful is by enabling extended chain-of-thought reasoning. LLMs produce better results if they are prompted to “think step by step”: breaking a complex problem down into simple steps and reasoning about them one at a time. In the last couple of years, AI companies started training models to do chain-of-thought reasoning automatically.

Then last September, OpenAI released o1, a model that pushed chain-of-thought reasoning much further than previous models. The o1 model can generate hundreds—or even thousands—of tokens “thinking” about a problem before producing a response. The longer it thinks, the more likely it is to reach a correct answer.

Reinforcement learning was essential for the success of o1 because a model trained purely with imitation learning would have suffered from compounding errors: the more tokens it generated, the more likely it would be to screw up.

At the same time, chain-of-thought reasoning has made reinforcement learning more powerful. Reinforcement learning only works if a model is able to succeed some of the time—otherwise, there’s nothing for the training algorithm to reinforce. As models learn to generate longer chains of thought, they become able to solve more difficult problems, which enables reinforcement learning on those more difficult problems. This can create a virtuous cycle where models get more and more capable as the training process continues.

In January, the Chinese company DeepSeek released a model called R1 that made quite a splash in the West. The company also released a paper describing how it trained R1. And it included a beautiful description of how a model can “teach itself” to reason using reinforcement learning.

DeepSeek trained its models to solve difficult math and programming problems. These problems are ideal for reinforcement learning because they have objectively correct answers that can be automatically checked by software. This allows large-scale training without human oversight or human-generated training data.
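That automatic checking is the whole trick: the reward function can be ordinary software. Here's an illustrative grader for math problems. The \boxed{} convention is common in this literature, but this exact function is my assumption, not DeepSeek's published code.

```python
import re

# An illustrative automatic grader: reward 1.0 if the model's final answer,
# wrapped in \boxed{...}, matches the known ground truth.

def math_reward(model_output: str, ground_truth: str) -> float:
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0                      # no parseable answer, no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

print(math_reward(r"... adding them up gives \boxed{42}", "42"))  # 1.0
```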

Here’s a remarkable graph from DeepSeek’s paper.

Graph showing average response length, in tokens, during training

It shows the average number of tokens the model generated before giving an answer. As you can see, the longer the training process went on, the longer its responses got.

Here is how DeepSeek describes its training process:

The thinking time of [R1] shows consistent improvement throughout the training process. This improvement is not the result of external adjustments but rather an intrinsic development within the model. [R1] naturally acquires the ability to solve increasingly complex reasoning tasks by leveraging extended test-time computation. This computation ranges from generating hundreds to thousands of reasoning tokens, allowing the model to explore and refine its thought processes in greater depth.

One of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors as the test-time computation increases. Behaviors such as reflection—where the model revisits and reevaluates its previous steps—and the exploration of alternative approaches to problem-solving arise spontaneously. These behaviors are not explicitly programmed but instead emerge as a result of the model’s interaction with the reinforcement learning environment.

Here’s one example of the kind of technique the model was teaching itself. At one point during the training process, DeepSeek researchers noticed that the model had learned to backtrack and rethink a previous conclusion using language like this:

Image showing textual breakdown of model rethinking steps

Again, DeepSeek says it didn’t program its models to do this or deliberately provide training data demonstrating this style of reasoning. Rather, the model “spontaneously” discovered this style of reasoning partway through the training process.

Of course, it wasn’t entirely spontaneous. The reinforcement learning process started with a model that had been pretrained using data that undoubtedly included examples of people saying things like “Wait, wait. Wait. That’s an aha moment.”

So it’s not like R1 invented this phrase from scratch. But it evidently did spontaneously discover that inserting this phrase into its reasoning process could serve as a useful signal that it should double-check that it was on the right track. That’s remarkable.

In a recent article, Ars Technica’s Benj Edwards explored some of the limitations of reasoning models trained with reinforcement learning. For example, one study “revealed puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in the Tower of Hanoi but failed after just five moves in a river crossing puzzle—despite the latter requiring fewer total moves.”

Conclusion: Reinforcement learning made agents possible

One of the most discussed applications for LLMs in 2023 was creating chatbots that understand a company’s internal documents. The conventional approach to this problem was called RAG—short for retrieval augmented generation.

When the user asks a question, a RAG system performs a keyword- or vector-based search to retrieve the most relevant documents. It then inserts these documents into an LLM’s context window before generating a response. RAG systems can make for compelling demos. But they tend not to work very well in practice because a single search will often fail to surface the most relevant documents.

Today, it’s possible to develop much better information retrieval systems by allowing the model itself to choose search queries. If the first search doesn’t pull up the right documents, the model can revise the query and try again. A model might perform five, 20, or even 100 searches before providing an answer.
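Here's a sketch of that iterated search loop, with `call_llm` and `search` as hypothetical stand-ins for a model API and a document index; real systems add citations, budgets, and much richer prompts.

```python
# Iterated retrieval sketch: the model keeps issuing searches until it
# decides it has enough material to answer.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM API call")

def search(query: str) -> list:
    raise NotImplementedError("stand-in for a keyword or vector search")

def agentic_rag(question: str, max_searches: int = 20) -> str:
    notes, query = [], question
    for _ in range(max_searches):
        notes.extend(search(query))             # retrieve documents
        decision = call_llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply 'ANSWER: <answer>' if the notes suffice, "
            "or 'SEARCH: <new query>' to keep looking."
        )
        if decision.startswith("ANSWER:"):      # the model decides it's done
            return decision[len("ANSWER:"):].strip()
        query = decision[len("SEARCH:"):].strip()  # revise the query and retry
    return call_llm(f"Question: {question}\nNotes: {notes}\nGive a best-effort answer.")
```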

But this approach only works if a model is “agentic”—if it can stay on task across multiple rounds of searching and analysis. LLMs were terrible at this prior to 2024, as the examples of AutoGPT and BabyAGI demonstrated. Today’s models are much better at it, which allows modern RAG-style systems to produce better results with less scaffolding. You can think of “deep research” tools from OpenAI and others as very powerful RAG systems made possible by long-context reasoning.

The same point applies to the other agentic applications I mentioned at the start of the article, such as coding and computer use agents. What these systems have in common is a capacity for iterated reasoning. They think, take an action, think about the result, take another action, and so forth.

Timothy B. Lee was on staff at Ars Technica from 2017 to 2021. Today, he writes Understanding AI, a newsletter that explores how AI works and how it’s changing our world. You can subscribe here.


Timothy is a senior reporter covering tech policy and the future of transportation. He lives in Washington DC.


man’s-ghastly-festering-ulcer-stumps-doctors—until-they-cut-out-a-wedge-of-flesh

Man’s ghastly festering ulcer stumps doctors—until they cut out a wedge of flesh


The man made a full recovery, but this tale is not for the faint of heart.

If you were looking for some motivation to follow your doctor’s advice or remember to take your medicine, look no further than this grisly tale.

A 64-year-old man went to the emergency department of Brigham and Women’s Hospital in Boston with a painful festering ulcer spreading on his left, very swollen ankle. It was a gruesome sight; the open sore was about 8 by 5 centimeters (about 3 by 2 inches) and was rimmed by black, ashen, and dark purple tissue. Inside, it oozed with streaks and fringes of yellow pus around pink and red inflamed flesh. It was 2 cm deep (nearly an inch). And it smelled.

The man told doctors it had all started two years prior, when dark, itchy lesions appeared in the area on his ankle—the doctors noted that there were multiple patches of these lesions on both his legs. But about five months before his visit to the emergency department, one of the lesions on his left ankle had progressed to an ulcer. It was circular, red, tender, and deep. He sought treatment and was prescribed antibiotics, which he took. But they didn’t help.

You can view pictures of the ulcer and its progression here, but be warned, it is graphic. (Panel A shows the ulcer five months prior to the emergency department visit. Panel B shows the ulcer one month prior. Panel C shows the wound on the day of presentation at the emergency department. Panel D shows the area three months after hospital discharge.)

Gory riddle

The ulcer grew. In fact, it seemed as though his leg was caving in as the flesh around it began rotting away. A month before the emergency room visit, the ulcer was a gaping wound that was already turning gray and black at the edges. It was now well into the category of being a chronic ulcer.

In a Clinical Problem-Solving article published in the New England Journal of Medicine this week, doctors laid out what they did and thought as they worked to figure out what was causing the man’s horrid sore.

With the realm of possibilities large, they started with the man’s medical history. The man had immigrated to the US from Korea 20 years ago. He owned and worked at a laundromat, which involved standing for more than eight hours a day. He had a history of eczema on his legs, high cholesterol, high blood pressure, and Type 2 diabetes. For these, he was prescribed a statin for his cholesterol, two blood pressure medications (hydrochlorothiazide and losartan), and metformin for his diabetes. He told doctors he did not reliably take this regimen of medications.

His diabetes was considered “poorly controlled.” A month prior, he had a glycated hemoglobin (A1C or HbA1C) test—which indicates a person’s average blood sugar level over the past two or three months. His result was 11 percent, while the normal range is between 4.2 and 5.6 percent.

His blood pressure, meanwhile, was 215/100 mm Hg at the emergency department. For reference, readings higher than 130/80 mm Hg on either number are considered the first stage of high blood pressure. Over the past three years, the man’s blood pressure had systolic readings (top number, pressure as heart beats) ranging from 160 to 230 mm Hg and diastolic readings (bottom number, pressure as heart relaxes) ranging from 95 to 120 mm Hg.

Clinical clues

Given the patient’s poorly controlled diabetes, a diabetic ulcer was initially suspected. But the patient didn’t have any typical signs of diabetic neuropathy that are linked to ulcers. These would include numbness, unusual sensations, or weakness. His responses on a sensory exam were all normal. Diabetic ulcers also typically form on the foot, not the lower leg.

X-rays of the ankle showed swelling in the soft tissue but no overt signs of infection. The doctors wondered if the man had osteomyelitis, an infection in the bone, which can be a complication in people with diabetic ulcers. The large size and duration of the ulcer were consistent with a bone infection, as were some elevated inflammatory markers on his blood tests.

To investigate the bone infection further, they admitted the man to the hospital and ordered magnetic resonance imaging (MRI). But the MRI showed only a soft-tissue defect and a normal bone, ruling out a bone infection. Another MRI was done with a contrast agent. That showed that the man’s large arteries were normal and there were no large blood clots deep in his veins, a condition sometimes linked to prolonged standing like the man’s long shifts at his laundromat.

As the doctors were still working to root out the cause, they had started him on a heavy-duty regimen of antibiotics. This was done with the assumption that on top of whatever caused the ulcer, there was now also a potentially aggressive secondary infection—one not knocked out by the previous round of antibiotics the man had been given.

With a bunch of diagnostic dead ends piling up, the doctors broadened their view of possibilities, newly considering cancers, rare inflammatory conditions, and less common conditions affecting small blood vessels (as the MRI had shown the larger vessels were normal). This led them to the possibility of a Martorell’s ulcer.

These ulcers, first described in 1945 by a Spanish doctor named Fernando Martorell, form when prolonged, uncontrolled high blood pressure causes the teeny arteries below the skin to stiffen and narrow, which blocks the blood supply, leading to tissue death and then ulcers. The ulcers in these cases tend to start as red blisters and evolve to frank ulcers. They are excruciatingly painful. And they tend to form on the lower legs, often over the Achilles’ tendon, though it’s unclear why this location is common.

What the doctor ordered

The doctors performed a punch biopsy of the man’s ulcer, but it was inconclusive—which is common with Martorell’s ulcers. The doctors turned to a “deep wedge biopsy” instead, which is exactly what it sounds like.

A pathology exam of the tissue slices from the wedge biopsy showed blood vessels that had thickened and narrowed. It also revealed extensive inflammation and necrosis. With the pathology results as well as the clinical presentation, the doctors diagnosed the man with a Martorell’s ulcer.

They also got back culture results from deep-tissue testing, finding that the man’s ulcer had also become infected with two common and opportunistic bacteria—Serratia marcescens and Enterococcus faecalis. Luckily, these are generally easy to treat, so the doctors scaled back his antibiotic regimen to target just those germs.

The man underwent three surgical procedures to clean out the dead tissue from the ulcer, then a skin graft to repair the damage. Ultimately, he made a full recovery. The doctors at first set him on an aggressive regimen to control his blood pressure, one that used four drugs instead of the two he was supposed to be taking. But the four-drug regimen caused his blood pressure to drop too low, and he was ultimately moved back to his original two-drug treatment.

The finding suggests that if he had just taken his original medications as prescribed, he would have kept his blood pressure in check and avoided the ulcer altogether.

In the end, “the good outcome in this patient with a Martorell’s ulcer underscores the importance of blood-pressure control in the management of this condition,” the doctors concluded.


Beth is Ars Technica’s Senior Health Reporter. Beth has a Ph.D. in microbiology from the University of North Carolina at Chapel Hill and attended the Science Communication program at the University of California, Santa Cruz. She specializes in covering infectious diseases, public health, and microbes.


tiktok-is-being-flooded-with-racist-ai-videos-generated-by-google’s-veo-3

TikTok is being flooded with racist AI videos generated by Google’s Veo 3

The release of Google’s Veo 3 video generator in May represented a disconcerting leap in AI video quality. While many of the viral AI videos we’ve seen are harmless fun, the model’s pixel-perfect output can also be used for nefarious purposes. On TikTok, which may or may not be banned in the coming months, users have noticed a surge of racist AI videos, courtesy of Google’s Veo 3.

According to a report from MediaMatters, numerous TikTok accounts have started posting AI-generated videos that use racist and antisemitic tropes in recent weeks. Most of the AI vitriol is aimed at Black people, depicting them as “the usual suspects” in crimes, absent parents, and monkeys with an affinity for watermelon. The content also targets immigrants and Jewish people. The videos top out at eight seconds and bear the “Veo” watermark, confirming they came from Google’s leading AI model.

The compilation video below has examples pulled from TikTok since the release of Veo 3, but be warned, it contains racist and antisemitic content. Some of the videos are shocking, which is likely the point—nothing drives engagement on social media like anger and drama. MediaMatters reports that the original posts have numerous comments echoing the stereotypes used in the video.

Hateful AI videos generated by Veo 3 spreading on TikTok.

Google has stressed security when announcing new AI models—we’ve all seen an AI refuse to complete a task that runs afoul of its guardrails. And it’s never fun when you have genuinely harmless intentions, but the system throws a false positive and blocks your output. Google has mostly struck the right balance previously, but it appears that Veo 3 is more compliant. We’ve tested a few simple prompts with Veo 3 and found it easy to reproduce elements of these videos.

Clear but unenforced policies

TikTok’s terms of service ban this kind of content. “We do not allow any hate speech, hateful behavior, or promotion of hateful ideologies. This includes explicit or implicit content that attacks a protected group,” the community guidelines read. Despite this blanket ban on racist caricatures, the hateful Veo 3 videos appear to be spreading unchecked.


nyt-to-start-searching-deleted-chatgpt-logs-after-beating-openai-in-court

NYT to start searching deleted ChatGPT logs after beating OpenAI in court


What are the odds NYT will access your ChatGPT logs in OpenAI court battle?

Last week, OpenAI raised objections in court, hoping to overturn a court order requiring the AI company to retain all ChatGPT logs “indefinitely,” including deleted and temporary chats.

But Sidney Stein, the US district judge reviewing OpenAI’s request, immediately denied OpenAI’s objections. He was seemingly unmoved by the company’s claims that the order forced OpenAI to abandon “long-standing privacy norms” and weaken privacy protections that users expect based on ChatGPT’s terms of service. Rather, Stein suggested that OpenAI’s user agreement specified that their data could be retained as part of a legal process, which Stein said is exactly what is happening now.

The order was issued by magistrate judge Ona Wang just days after news organizations, led by The New York Times, requested it. The news plaintiffs claimed the order was urgently needed to preserve potential evidence in their copyright case, alleging that ChatGPT users are likely to delete chats where they attempted to use the chatbot to skirt paywalls to access news content.

A spokesperson told Ars that OpenAI plans to “keep fighting” the order, but the ChatGPT maker seems to have few options left. It could petition the Second Circuit Court of Appeals for a rarely granted emergency order blocking Wang’s ruling, but the appeals court would have to consider that ruling an extraordinary abuse of discretion for OpenAI to win that fight.

OpenAI’s spokesperson declined to confirm if the company plans to pursue this extreme remedy.

In the meantime, OpenAI is negotiating a process that will allow news plaintiffs to search through the retained data. Perhaps the sooner that process begins, the sooner the data will be deleted. That possibility puts OpenAI in the difficult position of choosing between caving to some data collection so it can stop retaining data as soon as possible, or prolonging the fight over the order and potentially putting more users’ private conversations at risk of exposure through litigation or, worse, a data breach.

News orgs will soon start searching ChatGPT logs

The clock is ticking, and so far, OpenAI has not provided any official updates since a June 5 blog post detailing which ChatGPT users will be affected.

While it’s clear that OpenAI has been retaining mounds of data and will continue to do so, it would be impossible for The New York Times or any other news plaintiff to search through all of it.

Instead, only a small sample of the data will likely be accessed, based on keywords that OpenAI and news plaintiffs agree on. That data will remain on OpenAI’s servers, where it will be anonymized, and it will likely never be directly produced to plaintiffs.
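Neither side has published the technical details, but the process described above resembles a standard e-discovery workflow: filter the corpus by agreed-upon keywords, draw a sample from the hits, and strip identifiers before anyone reviews the results. Below is a minimal Python sketch of that idea; the keyword list, sampling rate, record format, and hash-based pseudonymization are all illustrative assumptions for this sketch, not details from the court filings.

```python
import hashlib
import random

# Illustrative only: the keywords, sample rate, and record format are
# assumptions for this sketch, not details from the OpenAI/NYT case.
KEYWORDS = ["paywall", "nytimes.com"]  # hypothetical agreed-upon terms
SAMPLE_RATE = 0.01                     # hypothetical sampling fraction

def pseudonymize(user_id: str) -> str:
    """Replace a user ID with a stable one-way hash, so reviewers can
    group chats by user without learning who the user actually is."""
    return hashlib.sha256(user_id.encode()).hexdigest()[:16]

def sample_logs(logs):
    """Yield an anonymized random sample of chats matching any keyword."""
    for chat in logs:
        text = chat["text"].lower()
        if any(kw in text for kw in KEYWORDS) and random.random() < SAMPLE_RATE:
            yield {"user": pseudonymize(chat["user_id"]), "text": chat["text"]}

# Usage: logs would be an iterable of {"user_id": ..., "text": ...} records.
demo = [{"user_id": "u123", "text": "How do I get around the nytimes.com paywall?"}]
print(list(sample_logs(demo)))  # usually empty at a 1 percent sample rate
```

In a workflow like this, the filtering runs entirely on the data holder’s servers, and only the pseudonymized sample is ever exposed to reviewers, consistent with the reporting that the data will likely never be directly produced to plaintiffs.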

Both sides are negotiating the exact process for searching through the chat logs, each seemingly hoping to minimize how long the logs must be preserved.

For OpenAI, sharing the logs risks revealing instances of infringing outputs that could further spike damages in the case. The logs could also expose how often outputs attribute misinformation to news plaintiffs.

But for news plaintiffs, accessing the logs is not considered key to their case beyond perhaps providing additional examples of copying. Rather, the logs could help news organizations argue that ChatGPT dilutes the market for their content, which could weigh against OpenAI’s fair use defense; a judge opined in a recent ruling that evidence of market dilution could tip an AI copyright case in favor of plaintiffs.

Jay Edelson, a leading consumer privacy lawyer, told Ars that he’s concerned the judges don’t seem to be considering that any evidence in the ChatGPT logs wouldn’t “advance” news plaintiffs’ case “at all,” while the order really changes “a product that people are using on a daily basis.”

Edelson warned that OpenAI itself probably has better security than most firms to protect against a potential data breach that could expose these private chat logs. But “lawyers have notoriously been pretty bad about securing data,” Edelson suggested, so “the idea that you’ve got a bunch of lawyers who are going to be doing whatever they are” with “some of the most sensitive data on the planet” and “they’re the ones protecting it against hackers should make everyone uneasy.”

So even though odds are pretty good that the majority of users’ chats won’t end up in the sample, Edelson said the mere threat of being included might push some users to rethink how they use AI. He further warned that if ChatGPT users turn to rival services like Anthropic’s Claude or Google’s Gemini, it could suggest that Wang’s order is improperly influencing market forces, which also seems “crazy.”

To Edelson, the most “cynical” take could be that news plaintiffs are possibly hoping the order will threaten OpenAI’s business to the point where the AI company agrees to a settlement.

Regardless of the news plaintiffs’ motives, the order sets an alarming precedent, Edelson said. He joined critics suggesting that more AI data may be frozen in the future, potentially affecting even more users as a result of the sweeping order surviving scrutiny in this case. Imagine if litigation one day targets Google’s AI search summaries, Edelson suggested.

Lawyer slams judges for giving ChatGPT users no voice

Edelson told Ars that the order is so potentially threatening to OpenAI’s business that the company may have no choice but to explore every path available to continue fighting it.

“They will absolutely do something to try to stop this,” Edelson predicted, calling the order “bonkers” for overlooking millions of users’ privacy concerns while “strangely” excluding enterprise customers.

From court filings, it seems possible that enterprise users were excluded to protect OpenAI’s competitiveness, but Edelson suggested there’s “no logic” to their exclusion “at all.” By carving out these ChatGPT users, he said, the judge’s order may have removed the users best resourced to fight it.

“What that means is the big businesses, the ones who have the power, all of their stuff remains private, and no one can touch that,” Edelson said.

Instead, the order is “only going to intrude on the privacy of the common people out there,” which Edelson said “is really offensive,” given that Wang denied two ChatGPT users’ panicked request to intervene.

“We are talking about billions of chats that are now going to be preserved when they weren’t going to be preserved before,” Edelson said, noting that he’s input information about his personal medical history into ChatGPT. “People ask for advice about their marriages, express concerns about losing jobs. They say really personal things. And one of the bargains in dealing with OpenAI is that you’re allowed to delete your chats and you’re allowed to use temporary chats.”

The greatest risk to users would be a data breach, Edelson said, but that’s not the only potential privacy concern. Corynne McSherry, legal director for the digital rights group the Electronic Frontier Foundation, previously told Ars that as long as users’ data is retained, it could also be exposed through future law enforcement and private litigation requests.

Edelson pointed out that most privacy attorneys don’t consider OpenAI CEO Sam Altman to be a “privacy guy,” despite Altman recently slamming the NYT, alleging it sued OpenAI because it doesn’t “like user privacy.”

“He’s trying to protect OpenAI, and he does not give a hoot about the privacy rights of consumers,” Edelson said, echoing one ChatGPT user’s dismissed concern that OpenAI may not prioritize users’ privacy if the company is financially motivated to resolve the case.

“The idea that he and his lawyers are really going to be the safeguards here isn’t very compelling,” Edelson said. He criticized the judges for dismissing users’ concerns and rejecting OpenAI’s request that users get a chance to testify.

“What’s really most appalling to me is the people who are being affected have had no voice in it,” Edelson said.

Ashley Belanger is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.
