Author name: Kris Guyer


The MacBook Air is the obvious loser as the sun sets on the Intel Mac era


In the end, Intel Macs have mostly gotten a better deal than PowerPC Macs did.

For the last three years, we’ve engaged in some in-depth data analysis and tea-leaf reading to answer two questions about Apple’s support for older Macs that still use Intel chips.

First, was Apple providing fewer updates and fewer years of software support to Macs based on Intel chips as it worked to transition the entire lineup to its internally developed Apple Silicon? And second, how long could Intel Mac owners reasonably expect to keep getting updates?

The answer to the first question has always been “it depends, but generally yes.” And this year, we have a definitive answer to the second question: For the bare handful of Intel Macs it supports, macOS 26 Tahoe will be the final new version of the operating system to support any of Intel’s chips.

To its credit, Apple has clearly spelled this out ahead of time rather than pulling the plug on Intel Macs with no notice. The company has also said that it plans to provide security updates for those Macs for two years after Tahoe is replaced by macOS 27 next year. These Macs aren’t getting special treatment—this has been Apple’s unspoken, unwritten policy for macOS security updates for decades now—but setting aside its usual “we don’t comment on our future plans” stance to give people a couple of years of predictability is something we’ve been pushing Apple to do for a long time.

With none of the tea leaf reading left to do, we can now present a fairly definitive look at how Apple has handled the entire Intel transition, compare it to how the PowerPC-to-Intel switch went two decades ago, and predict what it might mean about support for Apple Silicon Macs.

The data

We’ve assembled an epoch-spanning spreadsheet of every PowerPC or Intel Mac Apple has released since the original iMac kicked off the modern era of Apple back in 1998. On that list, we’ve recorded the introduction date for each Mac, the discontinuation date (when it was either replaced or taken off the market), the version of macOS it shipped with, and the final version of macOS it officially supported.

For those macOS versions, we’ve recorded the dates they received their last major point update—the feature-adding updates a release gets while it’s Apple’s latest and greatest version of macOS, as macOS 15 Sequoia is right now. Once a version is replaced, Apple releases security-only patches and Safari browser updates for it for another two years, so we’ve also recorded the dates those Macs would have received their final security update. For Intel Macs that are still receiving updates (versions 13, 14, and 15, plus macOS 26 Tahoe), we’ve extrapolated end-of-support dates based on Apple’s past practices.

A 27-inch iMac model. It’s still the only Intel Mac without a true Apple Silicon replacement. Credit: Andrew Cunningham

We’re primarily focusing on two time spans: from the date of each Mac’s introduction to the date it stopped receiving major macOS updates, and from the date of each Mac’s introduction to the date it stopped receiving any updates at all. We consider any Macs inside either of these spans to be actively supported; Macs that are no longer receiving regular updates from Apple will gradually become less secure and less compatible with modern apps as time passes. We measure by years of support rather than number of releases, which controls for Apple’s transition to a once-yearly release schedule for macOS back in the early 2010s.

We’ve also tracked the time between each Mac model’s discontinuation and when it stopped receiving updates. This is how Apple determines which products go on its “vintage” and “obsolete” hardware lists, which determine the level of hardware support and the kinds of repairs that the company will provide.

We have lots of detailed charts, but here are some highlights:

  • For all Mac models tracked, the average Mac receives about 6.6 years of macOS updates that add new features, plus another two years of security-only updates.
  • If you only count the Intel era, the average is around seven years of macOS updates, plus two years of security-only patches.
  • Most (though not all) Macs released since 2016 come in lower than either of these averages, indicating that Apple has been less generous to most Intel Macs since the Apple Silicon transition began.
  • The three longest-lived Macs are still the mid-2007 15- and 17-inch MacBook Pros, the mid-2010 Mac Pro, and the mid-2007 iMac, which received new macOS updates for around nine years after their introduction (and security updates for around 11 years).
  • The shortest-lived Mac is still the late-2008 version of the white MacBook, which received only 2.7 years of new macOS updates and another 3.3 years of security updates from the time it was introduced. (Late PowerPC-era and early Intel-era Macs are all pretty bad by modern standards.)
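The two support spans described above are simple date arithmetic: introduction date to last feature update, and introduction date to last security update. Here is a minimal sketch of that calculation; the model names and dates below are illustrative placeholders, not entries from our actual spreadsheet:

```python
from datetime import date

def years_between(start: date, end: date) -> float:
    """Span in years, the unit used for the support-window comparison."""
    return round((end - start).days / 365.25, 1)

# Illustrative entries: (introduced, last major macOS update, last security update)
macs = {
    "example-2013-model": (date(2013, 6, 10), date(2019, 10, 7), date(2021, 11, 15)),
    "example-2015-model": (date(2015, 3, 9), date(2021, 10, 25), date(2023, 9, 11)),
}

for name, (intro, last_feature, last_security) in macs.items():
    print(name,
          years_between(intro, last_feature), "years of feature updates,",
          years_between(intro, last_security), "years of any updates")
```

Measuring in years rather than counting macOS releases is what controls for Apple's shift to a once-yearly release schedule.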

The charts

If you bought a Mac any time between 2016 and 2020, you’re generally settling for fewer years of software updates than you would have gotten in the recent past. If you bought a Mac released in 2020, the tail end of the Intel era when Apple Silicon Macs were around the corner, your reward is the shortest software support window since 2006.

There are outliers in either direction. The sole iMac Pro, introduced in 2017 as Apple tried to regain some of its lost credibility with professional users, will end up with 7.75 years of updates plus another two years of security updates when all is said and done. Buyers of 2018–2020 MacBook Airs and the two-port version of the 2020 13-inch MacBook Pro, however, are treated pretty poorly, getting not quite 5.5 years of updates (plus two years of security patches) on average from the date they were introduced.

That said, most Macs end up getting a little over six years of macOS updates and two more years of security updates. If that’s a year or two lower than the recent past, it’s also not ridiculously far from the historical average.

If there’s something to praise here, it’s that Apple doesn’t seem to treat any of its Macs differently based on how much they cost. Now that we have a complete overview of the Intel era, breaking out the support timelines by model rather than by model year shows that a Mac mini doesn’t get dramatically more or less support than an iMac or a Mac Pro, despite costing a fraction of the price. A MacBook Air doesn’t receive significantly more or less support than a MacBook Pro.

These are just averages, and some models are lucky while others are not. The no-adjective MacBook that Apple has sold on and off since 2006 is also an outlier, with fewer years of support on average than the other Macs.

If there’s one overarching takeaway, it’s that you should buy new Macs as close to the date of their introduction as possible if you want to maximize your software support window. Especially for Macs that were sold continuously for years and years—the 2013 and 2019 Mac Pro, the 2018 Mac mini, the non-Retina 2015 MacBook Air that Apple sold some version of for over four years—buying them toward the end of their retail lifecycle means settling for fewer years of updates than you would have gotten if you had waited for the introduction of a new model. And that’s true even though Apple’s hardware support timelines are all calculated from the date of last availability rather than the date of introduction.

It just puts Mac buyers in a bad spot when Apple isn’t prompt with hardware updates, forcing people to either buy something that doesn’t fully suit their needs or settle for something older that will last for fewer years.

What should you do with an older Intel Mac?

The big question: If your Intel Mac is still functional but Apple is no longer supporting it, is there anything you can do to keep it both secure and functional?

All late-model Intel Macs officially support Windows 10, but that OS has its own end-of-support date looming in October 2025. Windows 11 can be installed, but only if you bypass its system requirements; this can work well, though it requires extra fiddling each time a major update needs to be installed. Consumer-focused Linux distributions like Ubuntu, Mint, or Pop!_OS may work, depending on your hardware, but they come with a steep learning curve for non-technical users. Google’s ChromeOS Flex may also work, but ChromeOS is more functionally limited than most other operating systems.

The OpenCore Legacy Patcher provides one possible stay of execution for Mac owners who want to stay on macOS for as long as they can. But it faces two steep uphill climbs with macOS Tahoe. First, as Apple has removed more Intel Macs from the official support list, it has also removed more of the underlying code from macOS that those Macs (and other Macs with similar hardware) need to run. This leaves more for the OpenCore Legacy Patcher team to patch in from older OSes, and this kind of forward-porting can leave hardware and software partly functional or non-functional.

Second, there’s the Apple T2 to consider. The Macs with a T2 treat it as a load-bearing co-processor, responsible for crucial operating system functions such as enabling Touch ID, serving as an SSD controller, encoding and decoding videos, communicating with the webcam and built-in microphone, and other operations. But Apple has never opened the T2 up to anyone, and it remains a bit of a black box for both the OpenCore/Hackintosh community and folks who would run Linux-based operating systems like Ubuntu or ChromeOS on that hardware.

The result is that the 2018 and 2019 MacBook Airs that didn’t officially support macOS 15 Sequoia last year never had support added via the OpenCore Legacy Patcher, because the T2 chip simply won’t communicate with an operating system booted through OpenCore’s firmware. Some T2 Macs don’t have this problem. But if yours does, it’s unlikely that anyone will be able to do anything about it, and your software support will end when Apple says it does.

Does any of this mean anything for Apple Silicon Mac support?

Late-model Intel MacBook Airs have fared worse than other Macs in terms of update longevity. Credit: Valentina Palladino

It will likely be at least two or three years before we know for sure how Apple plans to treat Apple Silicon Macs. Will the company primarily look at specs and technical capabilities, as it did from the late-’90s through to the mid-2010s? Or will Apple mainly stop supporting hardware based on its age, as it has done for more recent Macs and most current iPhones and iPads?

The three models to examine for this purpose are the first ones to shift to Apple Silicon: the M1 versions of the MacBook Air, Mac mini, and 13-inch MacBook Pro, all launched in late 2020. If these Macs are dropped in, say, 2027 or 2028’s big macOS release, but other, later M1 Macs like the iMac stay supported, it means Apple is likely sticking to a somewhat arbitrary age-based model, with certain Macs cut off from software updates that they are perfectly capable of running.

But it’s our hope that all Apple Silicon Macs have a long life ahead of them. The M2, M3, and M4 have all improved on the M1’s performance and other capabilities, but the M1 Macs are much more capable than the Intel machines they supplanted, the M1 was used widely across the Mac lineup for years, and Mac owners generally pay much more for their devices than iPhone and iPad owners do. We’d love to see macOS return to the longer-tail software support it provided from the late 2000s through the mid-2010s, when models could expect to see seven or eight all-new macOS versions and another two years of security updates afterward.

All signs point to Apple using the launch date of any given piece of hardware as the determining factor for continued software support. But that isn’t how it has always been, nor is it how it always has to be.


Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.



Here’s Kia’s new small, affordable electric car: The 2026 EV4 sedan

The mesh headrests are a clever touch, as they’re both comfortable and lightweight. The controls built into the side of the passenger seat that let the driver change its position are a specialty of the automaker. There are also plenty of other conveniences, including wireless device charging, 100 W USB-C ports, and wireless Android Auto and Apple CarPlay. We relied on the native navigation app, which is not as visually polished as the one you can cast from your phone to the 12.3-inch infotainment screen, but it kept us on course on unfamiliar roads in a foreign country while we were suffering from jet lag. That seems worthy of a mention.

Public transport

Traffic in and around Seoul makes a wonderful case for public transport, but it provided little opportunity for the EV4 to show its stuff beyond relatively low-speed stop-and-go driving, mostly topping out at 50 mph (80 km/h) on roads heavily studded with traffic cameras. As a result, forming a true impression of the car’s range will require spending more time with it on US roads.

It was, however, an easy car to drive in traffic and to drive slowly. It’s no speed demon anyway; 0–62 mph (100 km/h) takes 7.4 seconds if you floor it in the standard range car, or 7.7 seconds in the big battery one. The ride is good over broken tarmac, although it is quite firm when dealing with short-duration bumps. Meanwhile, the steering is light but not particularly informative when it comes to providing a picture of what the front tires are doing.

Good driving dynamics help sell a car once someone has had a test drive, but most will only get that far if the pricing is right. That’s yet to be announced, and who knows what will happen with tariffs and the clean vehicle tax credit between now and when the cars arrive in dealerships toward the end of the year. However, we expect the standard-range car to start between $37,000 and $39,000, undercutting the Tesla Model 3 in the process. That sounds rather compelling to me.



Delightfully irreverent Underdogs isn’t your parents’ nature docuseries

Narrator Ryan Reynolds celebrates nature’s outcasts in the new NatGeo docuseries Underdogs.

Most of us have seen a nature documentary or two (or three) at some point in our lives, so it’s a familiar format: sweeping, majestic footage of impressively regal animals accompanied by reverently high-toned narration (preferably with a tony British accent). Underdogs, a new docuseries from National Geographic, takes a decidedly different approach. Narrated with hilarious irreverence by Ryan Reynolds, the five-part series highlights nature’s less cool and majestic creatures: the outcasts and benchwarmers, more noteworthy for their “unconventional hygiene choices” and “unsavory courtship rituals.” It’s like The Suicide Squad or Thunderbolts*, except these creatures actually exist.

Per the official premise, “Underdogs features a range of never-before-filmed scenes, including the first time a film crew has ever entered a special cave in New Zealand—a huge cavern that glows brighter than a bachelor pad under a black light thanks to the glowing butts of millions of mucus-coated grubs. All over the world, overlooked superstars like this are out there 24/7, giving it maximum effort and keeping the natural world in working order for all those showboating polar bears, sharks and gorillas.” It’s rated PG-13 thanks to the odd bit of scatological humor and shots of Nature Sexy Time.

Each of the five episodes is built around a specific genre. “Superheroes” highlights the surprising superpowers of the honey badger, pistol shrimp, and the invisible glass frog, among others, augmented with comic book graphics; “Sexy Beasts” focuses on bizarre mating habits and follows the format of a romantic advice column; “Terrible Parents” highlights nature’s worst practices, following the outline of a parenting guide; “Total Grossout” is exactly what it sounds like; and “The Unusual Suspects” is a heist tale, documenting the supposed efforts of a macaque to put together the ultimate team of masters of deception and disguise (an inside man, a decoy, a fall guy, etc.). Green Day even wrote and recorded a special theme song for the opening credits.

Co-creators Mark Linfield and Vanessa Berlowitz of Wildstar Films are longtime producers of award-winning wildlife films, most notably Frozen Planet, Planet Earth, and David Attenborough’s Life of Mammals—you know, the kind of prestige nature documentaries that have become a mainstay for National Geographic and the BBC, among others. They’re justly proud of that work, but this time around the duo wanted to try something different.



Companies may soon pay a fee for their rockets to share the skies with airplanes


Some space companies aren’t necessarily against this idea, but SpaceX hasn’t spoken.

Starship soars through the stratosphere. Credit: Stephen Clark/Ars Technica

The Federal Aviation Administration may soon levy fees on companies seeking launch and reentry licenses, a new tack in the push to give the agency the resources it needs to keep up with the rapidly growing commercial space industry.

The text of a budget reconciliation bill released by Sen. Ted Cruz (R-Texas) last week calls for the FAA’s Office of Commercial Space Transportation, known as AST, to begin charging licensing fees to space companies next year. The fees would phase in over eight years, after which the FAA would adjust them to keep pace with inflation. The money would go into a trust fund to help pay for the operating costs of the FAA’s commercial space office.

The bill released by Cruz’s office last week covers federal agencies under the oversight of the Senate Commerce Committee, which he chairs. These agencies include the FAA and NASA. Ars recently covered Cruz’s proposals for NASA to keep the Space Launch System rocket, Orion spacecraft, and Gateway lunar space station alive, while the Trump administration aims to cancel Gateway and end the SLS and Orion programs after two crew missions to the Moon.

The Trump administration’s fiscal year 2026 budget request, released last month, proposes $42 million for the FAA’s Office of Commercial Space Transportation, a fraction of the agency’s overall budget request of $22 billion. The FAA’s commercial space office received an almost identical funding level in 2024 and 2025. Accounting for inflation, this is effectively a budget cut for AST. The office’s budget increased from $27.6 million to more than $42 million between 2021 and 2024, when companies like SpaceX began complaining the FAA was not equipped to keep up with the fast-moving commercial launch industry.

The FAA licensed 11 commercial launch and reentry operations in 2015, when AST’s budget was $16.6 million. Last year, the number of space operations increased to 164, and the US industry is on track to conduct more than 200 commercial launches and reentries in 2025. SpaceX’s Falcon 9 rocket is doing most of these launches.

While the FAA’s commercial space office receives more federal funding today, the budget hasn’t grown to keep up with the cadence of commercial spaceflight. SpaceX officials urged the FAA to double its licensing staff in 2023 after the company experienced delays in securing launch licenses.

In the background, a Falcon 9 rocket climbs away from Space Launch Complex 40 at Cape Canaveral Space Force Station, Florida. Another Falcon 9 stands on its launch pad at neighboring Kennedy Space Center awaiting its opportunity to fly.

Adding it up

Cruz’s section of the Senate reconciliation bill calls for the FAA to charge commercial space companies per pound of payload mass, beginning with 25 cents per pound in 2026 and increasing to $1.50 per pound in 2033. Subsequent fee rates would change based on inflation. The overall fee per launch or entry would be capped at $30,000 in 2026, increasing to $200,000 in 2033, and then adjusted to keep pace with inflation.

The Trump administration has not weighed in on Cruz’s proposed fee schedule, but Trump’s nominee for the next FAA administrator, Bryan Bedford, agreed with the need for launch and reentry licensing fees in a Senate confirmation hearing Wednesday. Most of the hearing’s question-and-answer session focused on the safety of commercial air travel, but there was a notable exchange on the topic of commercial spaceflight.

Cruz said the rising number of space launches will “add considerable strain to the airspace system” in the United States. Airlines and their passengers pay FAA-mandated fees for each flight segment, and private owners pay the FAA a fee to register their aircraft. The FAA also charges overflight fees to aircraft traveling through US airspace, even if they don’t take off or land in the United States.

“Nearly every user of the National Airspace System pays something back into the system to help cover their operational costs, yet under current law, space launch companies do not, and there is no mechanism for them to pay even if they wish to,” Cruz said. “As commercial spaceflight expands rapidly, so does its impact on the FAA’s ability to operate the National Airspace System. This proposal accounts for that.”

When asked if he agreed, Trump’s FAA nominee suggested he did. Bedford, president and CEO of Republic Airways, is poised to take the helm of the federal aviation regulator if he passes Senate confirmation.

Bryan Bedford is seen prior to his nomination hearing before the Senate Commerce Committee to lead the Federal Aviation Administration on June 11, 2025. Credit: Craig Hudson For The Washington Post via Getty Images

The FAA clears airspace of commercial and private air traffic along the flight corridors of rockets as they launch into space, and around the paths of spacecraft as they return to Earth. The agency is primarily charged with ensuring commercial rockets don’t endanger the public. The National Airspace System (NAS) consists of 29 million square miles of airspace over land and oceans. The FAA says more than 45,000 flights and 2.9 million airline passengers travel through the airspace every day.

Bedford said he didn’t want to speak on specific policy proposals before the Trump administration announces an official position on the matter.

“But I’ll confirm you’re exactly right,” Bedford told Cruz. “Passengers and airlines themselves pay significant taxes. … Those taxes are designed to modernize our NAS. One of the things that is absolutely critical in modernization is making sure we design the NAS so it can accommodate an increased cadence in space launch, so I certainly support where you’re going with that.”

SpaceX would be the company most affected by the proposed licensing fees. The majority of SpaceX’s missions launch the company’s own Starlink broadband satellites aboard Falcon 9 rockets. Most of those launches carry around 17 metric tons (about 37,500 pounds) of usable payload mass.

A quick calculation shows that SpaceX would pay a fee of roughly $9,400 for an average Starlink launch on a Falcon 9 rocket next year if Cruz’s legislation is signed into law. SpaceX launched 89 dedicated Starlink missions last year. That would add up to more than $800,000 in annual fees going into the FAA’s coffers under Cruz’s licensing scheme. Once you account for all of SpaceX’s other commercial launches, this number would likely exceed $1 million.

Assuming Falcon 9s continue to launch Starlink satellites in 2033, the fees would rise to approximately $56,000 per launch. SpaceX may have switched over all Starlink missions to its giant new Starship rocket by then, in which case the company will likely reach the FAA’s proposed fee cap of $200,000 per launch. SpaceX hopes to launch Starships at lower cost than it currently launches the Falcon 9 rocket, so this proposal would see SpaceX pay a significantly larger fraction of its per-mission costs in the form of FAA fees.
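The fee math above is straightforward to verify: payload mass times the per-pound rate, capped at the per-launch maximum. Here is a quick sketch using the rates and caps from Cruz's proposal as described above; the 37,500-pound figure is the approximate Starlink stack mass cited earlier, and the Starship payload figure is a rough assumption for illustration only:

```python
def launch_fee(payload_lb: float, rate_per_lb: float, cap: float) -> float:
    """Proposed FAA licensing fee for one launch: mass-based, with a per-launch cap."""
    return min(payload_lb * rate_per_lb, cap)

# 2026 schedule: $0.25 per pound, capped at $30,000 per launch
fee_2026 = launch_fee(37_500, 0.25, 30_000)    # $9,375 -- "roughly $9,400"
annual_2026 = 89 * fee_2026                     # $834,375 -- "more than $800,000"

# 2033 schedule: $1.50 per pound, capped at $200,000 per launch
fee_2033 = launch_fee(37_500, 1.50, 200_000)   # $56,250 -- "approximately $56,000"

# A heavily loaded Starship (payload mass assumed here) would hit the cap
starship_fee_2033 = launch_fee(220_000, 1.50, 200_000)  # capped at $200,000
```

The cap is what makes high-capacity vehicles like Starship likely to pay a flat $200,000 per launch rather than a mass-proportional fee.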

Industry reaction

A senior transportation official in the Biden administration voiced tentative support in 2023 for a fee scheme similar to the one under consideration by the Senate. Michael Huerta, a former FAA administrator during the Obama administration and the first Trump administration, told NPR last year that he supports the idea.

“You have this group of new users that are paying nothing into the system that are an increasing share of the operations,” Huerta said. “I truly believe the current structure isn’t sustainable.”

The Commercial Spaceflight Federation, an industry advocacy group that includes SpaceX and Blue Origin among its membership, signaled last year it was against the idea of creating launch and reentry fees, or taxes, as some industry officials call them. Commercial launch and reentry companies have been excluded from FAA fees to remove regulatory burdens and help the industry grow. The federation told NPR last year that because the commercial space industry requires access to US airspace much less often than the aviation industry, it would not yet be appropriate to have space companies pay into an FAA trust fund.

SpaceX did not respond to questions from Ars on the matter. United Launch Alliance would likely be on the hook to become the second-largest payer of FAA fees, at least over the next couple of years, with numerous missions in its backlog to launch massive stacks of Internet satellites for Amazon’s Project Kuiper network from Cape Canaveral Space Force Station in Florida.

A ULA spokesperson told Ars the company is still reviewing and assessing the Senate Commerce Committee’s proposal. “In general, we are supportive of fees that are affordable, do not disadvantage US companies against their foreign counterparts, are fair, equitable, and are used to directly improve the shared infrastructure at the Cape and other spaceports,” the spokesperson said.


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



Biofuels policy has been a failure for the climate, new report claims

The new report concludes that not only will the expansion of ethanol increase greenhouse gas emissions, but it has also failed to provide the social and financial benefits to Midwestern communities that lawmakers and the industry say it has. (The report defines the Midwest as Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.)

“The benefits from biofuels remain concentrated in the hands of a few,” Leslie-Bole said. “As subsidies flow, so may the trend of farmland consolidation, increasing inaccessibility of farmland in the Midwest, and locking out emerging or low-resource farmers. This means the benefits of biofuels production are flowing to fewer people, while more are left bearing the costs.”

New policies being considered in state legislatures and Congress, including additional tax credits and support for biofuel-based aviation fuel, could expand production, potentially causing more land conversion and greenhouse gas emissions, widening the gap between the rural communities and rich agribusinesses at a time when food demand is climbing and, critics say, land should be used to grow food instead.

President Donald Trump’s tax cut bill, passed by the House and currently being negotiated in the Senate, would not only extend tax credits for biofuels producers, it specifically excludes calculations of emissions from land conversion when determining what qualifies as a low-emission fuel.

The primary biofuels industry trade groups, including Growth Energy and the Renewable Fuels Association, did not respond to Inside Climate News requests for comment or interviews.

An employee with the Clean Fuels Alliance America, which represents biodiesel and sustainable aviation fuel producers, not ethanol, said the report vastly overstates the carbon emissions from crop-based fuels by comparing the farmed land to natural landscapes, which no longer exist.

They also noted that the economic impact of soy-based fuels in 2024 was more than $42 billion, supporting over 100,000 jobs.

“Ten percent of the value of every bushel of soybeans is linked to biomass-based fuel,” they said.



Trump’s FTC may impose merger condition that forbids advertising boycotts

FTC chair alleged “serious risk” from ad boycotts

After Musk’s purchase of Twitter, the social network lost advertisers for various reasons, including changes to content moderation and an incident in which Musk posted a favorable response to an antisemitic tweet and then told concerned advertisers to “go fuck yourself.”

FTC Chairman Andrew Ferguson said at a conference in April that “the risk of an advertiser boycott is a pretty serious risk to the free exchange of ideas.”

“If advertisers get into a back room and agree, ‘We aren’t going to put our stuff next to this guy or woman or his or her ideas,’ that is a form of concerted refusal to deal,” Ferguson said. “The antitrust laws condemn concerted refusals to deal. Now, of course, because of the First Amendment, we don’t have a categorical antitrust prohibition on boycotts. When a boycott ceases to be economic for purposes of the antitrust laws and becomes purely First Amendment activity, the courts have not been super clear—[it’s] sort of a ‘we know it when we see it’ type of thing.”

The FTC website says that any individual company acting on its own may “refuse to do business with another firm, but an agreement among competitors not to do business with targeted individuals or businesses may be an illegal boycott, especially if the group of competitors working together has market power.” The examples given on the FTC webpage are mostly about price competition and do not address the widespread practice of companies choosing where to place advertising based on concerns about their brands.

We contacted the FTC about the merger review today and will update this article if it provides any comment.

X’s ad lawsuit

X’s lawsuit targets a World Federation of Advertisers initiative called the Global Alliance for Responsible Media (GARM), a now-defunct program that Omnicom and Interpublic participated in. X itself was part of the GARM initiative, which shut down after X filed the lawsuit. X alleged that the defendants conspired “to collectively withhold billions of dollars in advertising revenue.”

The World Federation of Advertisers said in a court filing last month that GARM was founded “to bring clarity and transparency to disparate definitions and understandings in advertising and brand safety in the context of social media. For example, certain advertisers did not want platforms to advertise their brands alongside content that could negatively impact their brands.”

Trump’s FTC may impose merger condition that forbids advertising boycotts Read More »

there’s-another-leak-on-the-iss,-but-nasa-is-not-saying-much-about-it

There’s another leak on the ISS, but NASA is not saying much about it

No one is certain. The best guess is that the seals on the hatch leading to the PrK module are, in some way, leaking. In this scenario, pressure from the station is feeding the leak inside the PrK module through these seals, leading to a stable pressure inside—making it appear as though the PrK module leaks are fully repaired.

At this point, NASA is monitoring the ongoing leak and preparing for any possibility. A senior industry source told Ars that the NASA leadership of the space station program is “worried” about the leak and its implications.

This is one reason the space agency delayed the launch of a commercial mission carrying four astronauts to the space station, Axiom-4, on Thursday.

“The postponement of Axiom Mission 4 provides additional time for NASA and Roscosmos to evaluate the situation and determine whether any additional troubleshooting is necessary,” NASA said in a statement. “A new launch date for the fourth private astronaut mission will be provided once available.”

One source indicated that the new tentative launch date is now June 18. However, this will depend on whatever resolution there is to the leak issue.

What’s the worst that could happen?

The worst-case scenario for the space station is that the ongoing leaks are a harbinger of a phenomenon known as “high cycle fatigue,” which affects metals, including aluminum. Consider a metal clothes hanger: bend it once, and it flexes; bend it back and forth enough times, and it snaps. As metal fatigues, it hardens until it fails suddenly and without warning, as was the case with an Aloha Airlines flight in 1988.

The concern is that some of these metal structures on board the station could fail quickly and catastrophically. Accordingly, in its previous assessments, NASA has classified the structural cracking issue on the space station as the highest level of concern on its 5×5 risk matrix, which gauges the likelihood and severity of risks to the space station.
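To make the “5×5 risk matrix” concrete: such matrices score each risk on a 1–5 likelihood scale and a 1–5 consequence scale, then map the pair to a concern level. The sketch below is a generic illustration with invented thresholds, not NASA’s actual scoring criteria.

```python
# Generic 5x5 likelihood-by-consequence risk matrix. The multiplicative
# score and the band thresholds are illustrative inventions; real agency
# matrices typically assign a level to each cell individually.
def risk_level(likelihood: int, consequence: int) -> str:
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("scores must be between 1 and 5")
    score = likelihood * consequence
    if score >= 15:
        return "high"    # the band where a top-level concern would land
    if score >= 6:
        return "medium"
    return "low"

print(risk_level(5, 5))  # prints: high
```

A risk in the top band, like the station’s cracking issue, is one judged both likely and severe.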

In the meantime, the space agency has not been forthcoming with any additional information. Despite many questions from Ars Technica and other publications, NASA has not scheduled a press conference or said anything else publicly about the leaks beyond stating, “The crew aboard the International Space Station is safely conducting normal operations.”

There’s another leak on the ISS, but NASA is not saying much about it Read More »

inside-the-firm-turning-eerie-blank-streaming-ads-into-useful-nonprofit-messages

Inside the firm turning eerie blank streaming ads into useful nonprofit messages

AdGood’s offerings also include a managed service for ad campaign management for nonprofits. AdGood doesn’t yet offer pixels, but Johns said developments like that are “in the works.”

Johns explained that while many nonprofits use services like Meta and Google AdWords for tracking ads, they’re “hitting plateaus” with their typical methods. He said there is nonprofit interest in reaching younger audiences, who often use CTV devices:

A lot of them have been looking for ways to get [into CTV ads], but, unfortunately, with minimum spend amounts, they’re just not able to access it.

Helping nonprofits make commercials

AdGood also sells a self-serve generative AI ad manager, which it offers via a partnership with Streamr.AI. The tool is designed to simplify the process of creating 30-second video ads that are “completely editable via a chat prompt,” according to Johns.

“It automatically generates all their targeting. They can update their targeting for whatever they want, and then they can swipe a credit card and essentially run that campaign. It goes into our approval queue, which typically takes 24 hours for us to approve because it needs to be deemed TV-quality,” he explained.

The executive said AdGood charges nonprofits a $7 CPM and a $250 flat fee for the service. He added:

Think about a small nonprofit in a local community, for instance, my son’s special needs baseball team. I can get together with five other parents, easily pull together a campaign, and run it in our local town. We get seven kids to show up, and it changes their lives. We’re talking about $250 having a massive impact in a local market.
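The arithmetic behind that claim is simple to check: a CPM is the cost per thousand impressions, so a campaign’s total is the flat fee plus impressions divided by 1,000 times the CPM. The function below is our own sketch of the quoted pricing, not an AdGood API.

```python
# Sketch of the pricing AdGood's executive described: a $250 flat fee
# plus a $7 CPM (cost per thousand impressions).
def campaign_cost(impressions: int, cpm: float = 7.0, flat_fee: float = 250.0) -> float:
    return flat_fee + (impressions / 1000) * cpm

print(campaign_cost(50_000))   # 600.0: 50,000 local impressions for $600 total
print(campaign_cost(100_000))  # 950.0
```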

Looking ahead, Johns said he’d like to see AdGood’s platform and team grow to be able to give every customer “a certain allocation of inventory, whether it’s 50,000 impressions a month or 100,000 a month.”

For some, streaming ads are rarely a good thing. But when those ads can help important causes and replace odd blank ad spaces that make us question our own existence, it brings new meaning to the idea of a “good” commercial.

Inside the firm turning eerie blank streaming ads into useful nonprofit messages Read More »

isaacman’s-bold-plan-for-nasa:-nuclear-ships,-seven-crew-dragons,-accelerated-artemis

Isaacman’s bold plan for NASA: Nuclear ships, seven-crew Dragons, accelerated Artemis


Needs a Super Administrator

“I was very disappointed, especially because it was so close to confirmation.”

Jared Isaacman speaks at the Spacepower Conference in Orlando, Florida. Credit: John Kraus

Nearly two weeks have passed since Jared Isaacman received a fateful, brief phone call from two officials in President Trump’s Office of Personnel Management. In those few seconds, the trajectory of his life over the next three and a half years changed dramatically.

The president, the callers said, wanted to go in a different direction for NASA’s administrator. At the time, Isaacman was within days of a final vote on the floor of the US Senate and assured of bipartisan support. He had run the gauntlet of six months of vetting, interviews, and a committee hearing. He expected to be sworn in within a week. And then, it was all gone.

“I was very disappointed, especially because it was so close to confirmation and I think we had a good plan to implement,” Isaacman told Ars on Wednesday.

Isaacman’s nomination was pulled for political reasons. As SpaceX founder and one-time Trump confidant Elon Musk made his exit from the White House, key officials who felt trampled on by Musk took their revenge. They knifed a political appointee, Isaacman, who shared Musk’s passion for extending humanity’s reach to Mars. The dismissal was part of a chain of events that ultimately led to a break in the relationship between Trump and Musk, igniting a war of words.

When I spoke with Isaacman this week, I didn’t want to rehash the political melee. I preferred to talk about his plan. After all, he had six months to look under the hood of NASA, identify the problems that were holding the space agency back, and release its potential in this new era of spaceflight.

A man with a plan

“It shouldn’t be a surprise, the organizational structure is very heavy with management and leadership,” Isaacman said. “Lots of senior leadership with long meetings, who have their deputies, who have their chiefs of staff, who have deputy chiefs of staff and associate deputies. It is not just a NASA problem; across government, there are principal, deputy, assistant-to-the-deputy roles. It makes it very hard to have a culture of ownership and urgent decision-making.”

Isaacman said his plan, a blueprint of more than 100 pages detailing various actions to modernize NASA and make it more efficient, would have started with the bureaucracy. “It was going to be hard to get the big, exciting stuff done without a reorganization, a rebuild, including cultural rebuilding, and an aggressive, hungry, mission-first culture,” he said.

One of his first steps would have been to attempt to accelerate the timeline for the Artemis II mission, which is scheduled to fly four astronauts around the Moon in April 2026. He planned to bring in “strike” teams of engineers to help move Artemis and other programs forward. Isaacman wanted to see the Artemis II vehicle on the pad later this summer, with the goal of launching in December of this year, echoing the historic launch of Apollo 8 in December 1968.

Isaacman also sought to reverse the space agency’s decision to cut utilization of the International Space Station due to budget issues.

“Instead of the current thinking, three crew members every eight months to manage the budget, I wanted to go seven crew members every four months,” he said. “I was even going to pay for one of the missions, if need be, to just get more people up there, more cracks at science, and try and figure out the orbital economy, or else life will be very hard on the commercial LEO destinations.”

As part of this, he would have pushed for certification of SpaceX’s Dragon spacecraft to carry seven astronauts—which was in the vehicle’s baseline design—instead of the current four. This would have allowed NASA to fly more professional astronauts, but also payload specialists like the agency used to fly during the Space Shuttle program. Essentially, NASA experts on certain experiments would fly and conduct their own research.

“I wanted to bring back the Payload Specialist program and open it up to the NASA workforce,” he said. “Because things are pretty difficult right now, and I wanted to get people excited and reward the best.”

He also planned to seek goodwill by donating his salary as administrator to Space Camp at the US Space & Rocket Center in Huntsville, Alabama, for scholarships to inspire the next generation of explorers.

Nuclear spaceships

Isaacman’s signature issue was going to be a full-bore push into nuclear electric propulsion, which he views as essential for the sustainable exploration of the Solar System by humans. Nuclear electric propulsion converts heat from a fission reactor to electrical power, like a power plant on Earth, and then uses this energy to produce thrust by accelerating an ionized propellant, such as xenon. Nuclear propulsion requires significantly less fuel than chemical propulsion, and it opens up more launch windows to Mars and other destinations.

“We would have gone right to a 100-kilowatt test vehicle that we would send somewhere inspiring with some great cameras,” he said. “Then we are going right to megawatt class, inside of four years, something you could dock a human-rated spaceship to, or drag a telescope to a Lagrange point and then return, big stuff like that. The goal was to get America underway in space on nuclear power.”
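For a rough sense of what those power levels buy: an electric thruster’s force relates to electrical power and exhaust velocity roughly as F = 2ηP/vₑ, with vₑ = Isp × g₀. The 100 kW figure comes from the quote above; the efficiency and specific impulse below are illustrative assumptions, not numbers from Isaacman’s plan.

```python
# Back-of-envelope thrust for an electric thruster: F = 2 * eta * P / v_e,
# where exhaust velocity v_e = Isp * g0. Power matches the 100 kW test
# vehicle mentioned above; efficiency and Isp are illustrative guesses.
G0 = 9.81  # standard gravity, m/s^2

def thrust_newtons(power_w: float, isp_s: float, efficiency: float) -> float:
    v_e = isp_s * G0  # exhaust velocity, m/s
    return 2 * efficiency * power_w / v_e

print(round(thrust_newtons(100e3, 3000, 0.7), 2))  # 4.76 (newtons)
```

A few newtons sounds tiny, but because an electric system can thrust continuously for months on far less propellant than a chemical stage that burns for minutes, the total velocity change adds up.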

Another key element of this plan is that it would give some of NASA’s field centers, including Marshall Space Flight Center, important work to do after the cancellation of the Space Launch System rocket.

“Pivoting to nuclear spaceships, in my mind, was just the right thing to do for the SLS states, even if it’s not the right locations or the right people. There is a lot of dollars there that those states don’t want to let go of,” he said. “When you speak to those senators, if you give them another kind of bar to grab onto, they can get excited about what comes next. And imagine an SLS-caliber budget going into building, literally, nuclear orbiters that could do all sorts of things. That’s directionally correct, right?”

What direction NASA takes now is unclear, but the loss of Isaacman is acute. The agency’s acting administrator, Janet Petro, is largely taking direction from the White House Office of Management and Budget and has no independence. A confirmed administrator is now months away. The lights at the historic space agency get a little dimmer each day as a result.

Considering politics

As for what he plans to do now that he suddenly has time on his hands—Isaacman stepped down as chief executive of Shift4, the financial payments company he founded, to become NASA administrator—Isaacman is weighing his options.

“I’m sure a lot of supporters in the space community would love to hear me say that I’m done with politics, but I’m not sure that’s the case,” he said. “I want to serve our country, give back, and make a difference. I don’t know what, but I will find something.”

Isaacman, who has described himself as a moderate, Republican-leaning voter, is unsure what his role in politics would be. However, he wants to help bridge a nation that is riven by partisan politics. “I think if you don’t have more moderates and better communicators try to pull us closer together, we’re just going to keep moving farther apart,” he said. “And that just doesn’t seem like it’s in any way good for the country.”

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Isaacman’s bold plan for NASA: Nuclear ships, seven-crew Dragons, accelerated Artemis Read More »

ai-chatbots-tell-users-what-they-want-to-hear,-and-that’s-problematic

AI chatbots tell users what they want to hear, and that’s problematic

After the model has been trained, companies can set system prompts, or guidelines, for how the model should behave to minimize sycophantic behavior.

However, working out the best response means delving into the subtleties of how people communicate with one another, such as determining when a direct response is better than a more hedged one.

“[I]s it for the model to not give egregious, unsolicited compliments to the user?” Joanne Jang, head of model behavior at OpenAI, said in a Reddit post. “Or, if the user starts with a really bad writing draft, can the model still tell them it’s a good start and then follow up with constructive feedback?”

Evidence is growing that some users are becoming hooked on using AI.

A study by MIT Media Lab and OpenAI found that a small proportion of users were becoming addicted. Those who perceived the chatbot as a “friend” also reported lower socialization with other people and higher levels of emotional dependence on the chatbot, as well as other problematic behavior associated with addiction.

“These things set up this perfect storm, where you have a person desperately seeking reassurance and validation paired with a model which inherently has a tendency towards agreeing with the participant,” said Nour from Oxford University.

AI start-ups such as Character.AI that offer chatbots as “companions” have faced criticism for allegedly not doing enough to protect users. Last year, a teenager killed himself after interacting with Character.AI’s chatbot. The teen’s family is suing the company for allegedly causing wrongful death, as well as for negligence and deceptive trade practices.

Character.AI said it does not comment on pending litigation, but added it has “prominent disclaimers in every chat to remind users that a character is not a real person and that everything a character says should be treated as fiction.” The company added it has safeguards to protect under-18s and against discussions of self-harm.

Another concern for Anthropic’s Askell is that AI tools can play with perceptions of reality in subtle ways, such as when offering factually incorrect or biased information as the truth.

“If someone’s being super sycophantic, it’s just very obvious,” Askell said. “It’s more concerning if this is happening in a way that is less noticeable to us [as individual users] and it takes us too long to figure out that the advice that we were given was actually bad.”

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

AI chatbots tell users what they want to hear, and that’s problematic Read More »

apple’s-craig-federighi-on-the-long-road-to-the-ipad’s-mac-like-multitasking

Apple’s Craig Federighi on the long road to the iPad’s Mac-like multitasking


Federighi talks to Ars about why the iPad’s Mac-style multitasking took so long.


iPads! Running iOS 26! Credit: Apple


CUPERTINO, Calif.—When Apple Senior Vice President of Software Engineering Craig Federighi introduced the new multitasking UI in iPadOS 26 at the company’s Worldwide Developers Conference this week, he did it the same way he introduced the Calculator app for the iPad last year or timers in the iPad’s Clock app the year before—with a hint of sarcasm.

“Wow,” Federighi enthuses in a lightly exaggerated tone about an hour and 19 minutes into a 90-minute presentation. “More windows, a pointier pointer, and a menu bar? Who would’ve thought? We’ve truly pulled off a mind-blowing release!”

This elicits a sensible chuckle from the gathered audience of developers, media, and Apple employees watching the keynote on the Apple Park campus, where I have grabbed myself a good-but-not-great seat to watch the largely pre-recorded keynote on a gigantic outdoor screen.

Federighi is acknowledging—and lightly poking fun at—the audience of developers, pro users, and media personalities who have been asking for years that Apple’s iPad behave more like a traditional computer. And after many incremental steps, including a big swing and partial miss with the buggy, limited Stage Manager interface a couple of years ago, Apple has finally responded to requests for Mac-like multitasking with a distinctly Mac-like interface, an improved file manager, and better support for running tasks in the background.

But if this move was so forehead-slappingly obvious, why did it take so long to get here? This is one of the questions we dug into when we sat down with Federighi and Senior Vice President of Worldwide Marketing Greg Joswiak for a post-keynote chat earlier this week.

It used to be about hardware restrictions

People have been trying to use iPads (and make a philosophical case for them) as quote-unquote real computers practically from the moment they were introduced 15 years ago.

But those early iPads lacked so much of what we expect from modern PCs and Macs, most notably robust multi-window multitasking and the ability for third-party apps to exchange data. The first iPads were almost literally just iPhone internals connected to big screens, with just a fraction of the RAM and storage available in the Macs of the day; that necessitated the use of a blown-up version of the iPhone’s operating system and the iPhone’s one-full-screen-app-at-a-time interface.

“If you want to rewind all the way to the time we introduced Split View and Slide Over [in iOS 9], you have to start with the grounding that the iPad is a direct manipulation touch-first device,” Federighi told Ars. “It is a foundational requirement that if you touch the screen and start to move something, that it responds. Otherwise, the entire interaction model is broken—it’s a psychic break with your contract with the device.”

Mac users, Federighi said, were more tolerant of small latency on their devices because they were already manipulating apps on the screen indirectly, but the iPads of a decade or so ago “didn’t have the capacity to run an unlimited number of windowed apps with perfect responsiveness.”

It’s also worth noting the technical limitations of iPhone and iPad apps at the time, which up until then had mostly been designed and coded to match the specific screen sizes and resolutions of the (then-manageable) number of iDevices that existed. It simply wasn’t possible for the apps of the day to be dynamically resized as desktop windows are, because no one was coding their apps that way.

Apple’s iPad Pros—and, later, the iPad Airs—have gradually adopted hardware and software features that make them more Mac-like. Credit: Andrew Cunningham

Of course, those hardware limitations no longer exist. Apple’s iPad Pros started boosting the tablets’ processing power, RAM, and storage in earnest in the late 2010s, and Apple introduced a Microsoft Surface-like keyboard and stylus accessories that moved the iPad away from its role as a content consumption device. For years now, Apple’s faster tablets have been based on the same hardware as its slower Macs—we know the hardware can do more because Apple is already doing more with it elsewhere.

“Over time the iPad’s gotten more powerful, the screens have gotten larger, the user base has shifted into a mode where there is a little bit more trackpad and keyboard use in how many people use the device,” Federighi told Ars. “And so the stars kind of aligned to where many of the things that you traditionally do with a Mac were possible to do on an iPad for the first time and still meet iPad’s basic contract.”

On correcting some of Stage Manager’s problems

More multitasking in iPadOS 26. Credit: Apple

Apple has already tried a windowed multitasking system on modern iPads once this decade, of course, with iPadOS 16’s Stage Manager interface.

Any first crack at windowed multitasking on the iPad was going to have a steep climb. This was the first time Apple or its developers had needed to contend with truly dynamically resizable app windows in iOS or iPadOS, the first time Apple had implemented a virtual memory system on the iPad, and the first time Apple had tried true multi-monitor support. Stage Manager was in such rough shape that Apple delayed that year’s iPadOS release to keep working on it.

But the biggest problem with Stage Manager was actually that it just didn’t work on a whole bunch of iPads. You could only use it on new expensive models—if you had a new cheap model or even an older expensive model, your iPad was stuck with the older Slide Over and Split View modes that had been designed around the hardware limitations of mid-2010s iPads.

“We wanted to offer a new baseline of a totally consistent experience of what it meant to have Stage Manager,” Federighi told Ars. “And for us, that meant four simultaneous apps on the internal display and an external display with four simultaneous apps. So, eight apps running at once. And we said that’s the baseline, and that’s what it means to be Stage Manager; we didn’t want to say ‘you get Stage Manager, but you get Stage Manager-lite here’ or something like that. And so immediately that established a floor for how low we could go.”

Fixing that was one of the primary goals of the new windowing system.

“We decided this time: make everything we can make available,” said Federighi, “even if it has some nuances on older hardware, because we saw so much demand [for Stage Manager].”

That slight change in approach, combined with other behind-the-scenes optimizations, makes the new multitasking model more widely compatible than Stage Manager was. There are still limits on those devices—not to the number of windows you can open, but to how many of those windows can be active and up-to-date at once. And true multi-monitor support will remain the purview of the faster, more expensive models.

“We have discovered many, many optimizations,” Federighi said. “We re-architected our windowing system and we re-architected the way that we manage background tasks, background processing, that enabled us to squeeze more out of other devices than we were able to do at the time we introduced Stage Manager.”

Stage Manager still exists in iPadOS 26, but as an optional extra multitasking mode that you have to choose to enable instead of the new windowed multitasking system. You can also choose to turn both multitasking systems off entirely, preserving the iPad’s traditional big-iPhone-for-watching-Netflix interface for the people who prefer it.

“iPad’s gonna be iPad”

The $349 base-model iPad is one that stands to gain the most from iPadOS 26. Credit: Andrew Cunningham

However, while the new iPadOS 26 UI takes big steps toward the Mac’s interface, the company still treats the iPad and the Mac as different products with different priorities. To date, that has meant no touch screens on the Mac (despite years of rumors), and it will continue to mean that there are some Mac things that the iPad will remain unable to do.

“But we’ve looked and said, as [the iPad and Mac] come together, where on the iPad the Mac idiom for doing something, like where we put the window close controls and maximize controls, what color are they—we’ve said why not, where it makes sense, use a converged design for those things so it’s familiar and comfortable,” Federighi told Ars. “But where it doesn’t make sense, iPad’s gonna be iPad.”

There will still be limitations and frustrations when trying to fit an iPad into a Mac-shaped hole in your computing setup. While tasks can run in the background, for example, Apple only allows apps to run workloads with a definitive endpoint, things like a video export or a file transfer. System agents or other apps that perform some routine on-and-off tasks continuously in the background aren’t supported. All the demos we’ve seen so far are also on new, high-end iPad hardware, and it remains to be seen how well the new features behave on low-end tablets like the 11th-generation A16 iPad, or old 2019-era hardware like the iPad Air 3.

But it does feel like Apple has finally settled on a design that might stick and that adds capability to the iPad without wrecking its simplicity for the people who still just want a big screen for reading and streaming.

Photo of Andrew Cunningham

Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.

Apple’s Craig Federighi on the long road to the iPad’s Mac-like multitasking Read More »

new-apple-study-challenges-whether-ai-models-truly-“reason”-through-problems

New Apple study challenges whether AI models truly “reason” through problems


Puzzle-based experiments reveal limitations of simulated reasoning, but others dispute findings.

An illustration of Tower of Hanoi from Popular Science in 1885. Credit: Public Domain

In early June, Apple researchers released a study suggesting that simulated reasoning (SR) models, such as OpenAI’s o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, produce outputs consistent with pattern-matching from training data when faced with novel problems requiring systematic thinking. The researchers found similar results to a recent April study that tested models on problems from the United States of America Mathematical Olympiad (USAMO), showing that these same models achieved low scores on novel mathematical proofs.

The new study, titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” comes from a team at Apple led by Parshin Shojaee and Iman Mirzadeh, and it includes contributions from Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar.

The researchers examined what they call “large reasoning models” (LRMs), which attempt to simulate a logical reasoning process by producing a deliberative text output sometimes called “chain-of-thought reasoning” that ostensibly assists with solving problems in a step-by-step fashion.

To do that, they pitted the AI models against four classic puzzles—Tower of Hanoi (moving disks between pegs), checkers jumping (eliminating pieces), river crossing (transporting items with constraints), and blocks world (stacking blocks)—scaling them from trivially easy (like one-disk Hanoi) to extremely complex (20-disk Hanoi requiring over a million moves).
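The exponential blow-up behind that scaling is easy to verify. Here is a minimal sketch (our own, not the paper’s evaluation harness) that enumerates the optimal Tower of Hanoi move sequence:

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move list for n disks: move n-1 disks to the
    spare peg, move the largest disk, then move the n-1 disks on top."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))

print(len(hanoi_moves(1)))   # 1 move: trivially easy
print(len(hanoi_moves(10)))  # 1023 moves
print(2**20 - 1)             # 1048575: the 20-disk case tops a million moves
```

The optimal solution length is 2^n − 1, so each added disk doubles the work, which is why a full 20-disk move transcript would strain any model’s output budget long before the logic runs out.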

Figure 1 from Apple’s “The Illusion of Thinking” research paper. Credit: Apple

“Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy,” the researchers write. In other words, today’s tests only care if the model gets the right answer to math or coding problems that may already be in its training data—they don’t examine whether the model actually reasoned its way to that answer or simply pattern-matched from examples it had seen before.

Ultimately, the researchers found results consistent with the aforementioned USAMO research, showing that these same models achieved mostly under 5 percent on novel mathematical proofs, with only one model reaching 25 percent, and not a single perfect proof among nearly 200 attempts. Both research teams documented severe performance degradation on problems requiring extended systematic reasoning.

Known skeptics and new evidence

AI researcher Gary Marcus, who has long argued that neural networks struggle with out-of-distribution generalization, called the Apple results “pretty devastating to LLMs.” While Marcus has been making similar arguments for years and is known for his AI skepticism, the new research provides fresh empirical support for his particular brand of criticism.

“It is truly embarrassing that LLMs cannot reliably solve Hanoi,” Marcus wrote, noting that AI researcher Herb Simon solved the puzzle in 1957 and many algorithmic solutions are available on the web. Marcus pointed out that even when researchers provided explicit algorithms for solving Tower of Hanoi, model performance did not improve—a finding that study co-lead Iman Mirzadeh argued shows “their process is not logical and intelligent.”

Figure 4 from Apple’s “The Illusion of Thinking” research paper. Credit: Apple

The Apple team found that simulated reasoning models behave differently from “standard” models (like GPT-4o) depending on puzzle difficulty. On easy tasks, such as Tower of Hanoi with just a few disks, standard models actually won because reasoning models would “overthink” and generate long chains of thought that led to incorrect answers. On moderately difficult tasks, SR models’ methodical approach gave them an edge. But on truly difficult tasks, including Tower of Hanoi with 10 or more disks, both types failed entirely, unable to complete the puzzles, no matter how much time they were given.

The researchers also identified what they call a “counterintuitive scaling limit.” As problem complexity increases, simulated reasoning models initially generate more thinking tokens but then reduce their reasoning effort beyond a threshold, despite having adequate computational resources.

The study also revealed puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in Tower of Hanoi but failed after just five moves in a river crossing puzzle—despite the latter requiring fewer total moves. This suggests the failures may be task-specific rather than purely computational.

Competing interpretations emerge

However, not all researchers agree with the interpretation that these results demonstrate fundamental reasoning limitations. University of Toronto economist Kevin A. Bryan argued on X that the observed limitations may reflect deliberate training constraints rather than inherent inabilities.

“If you tell me to solve a problem that would take me an hour of pen and paper, but give me five minutes, I’ll probably give you an approximate solution or a heuristic. This is exactly what foundation models with thinking are RL’d to do,” Bryan wrote, suggesting that models are specifically trained through reinforcement learning (RL) to avoid excessive computation.

Bryan suggests that unspecified industry benchmarks show “performance strictly increases as we increase in tokens used for inference, on ~every problem domain tried,” but notes that deployed models intentionally limit this to prevent “overthinking” simple queries. This perspective suggests the Apple paper may be measuring engineered constraints rather than fundamental reasoning limits.

Figure 6 from Apple’s “The Illusion of Thinking” research paper. Credit: Apple

Software engineer Sean Goedecke offered a similar critique of the Apple paper on his blog, noting that when faced with Tower of Hanoi requiring over 1,000 moves, DeepSeek-R1 “immediately decides ‘generating all those moves manually is impossible,’ because it would require tracking over a thousand moves. So it spins around trying to find a shortcut and fails.” Goedecke argues this represents the model choosing not to attempt the task rather than being unable to complete it.

Other researchers also question whether these puzzle-based evaluations are even appropriate for LLMs. Independent AI researcher Simon Willison told Ars Technica in an interview that the Tower of Hanoi approach was “not exactly a sensible way to apply LLMs, with or without reasoning,” and suggested the failures might simply reflect running out of tokens in the context window (the maximum amount of text an AI model can process) rather than reasoning deficits. He characterized the paper as potentially overblown research that gained attention primarily due to its “irresistible headline” about Apple claiming LLMs don’t reason.

The Apple researchers themselves caution against over-extrapolating the results of their study, acknowledging in their limitations section that “puzzle environments represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge-intensive reasoning problems.” The paper also acknowledges that reasoning models show improvements in the “medium complexity” range and continue to demonstrate utility in some real-world applications.

Implications remain contested

Has the credibility of claims about AI reasoning models been completely destroyed by these two studies? Not necessarily.

What these studies may suggest instead is that the kinds of extended-context reasoning hacks used by SR models may not be a pathway to general intelligence, as some have hoped. In that case, the path to more robust reasoning capabilities may require fundamentally different approaches rather than refinements to current methods.

As Willison noted above, the results of the Apple study have so far been explosive in the AI community. Generative AI is a controversial topic, with many people gravitating toward extreme positions in an ongoing ideological battle over the models’ general utility. Many proponents of generative AI have contested the Apple results, while critics have latched onto the study as a definitive knockout blow for LLM credibility.

Apple’s results, combined with the USAMO findings, seem to strengthen the case made by critics like Marcus that these systems rely on elaborate pattern-matching rather than the kind of systematic reasoning their marketing might suggest. To be fair, much of the generative AI space is so new that even its inventors do not yet fully understand how or why these techniques work. In the meantime, AI companies might build trust by tempering some claims about reasoning and intelligence breakthroughs.

However, that doesn’t mean these AI models are useless. Even elaborate pattern-matching machines can be useful in performing labor-saving tasks for the people that use them, given an understanding of their drawbacks and confabulations. As Marcus concedes, “At least for the next decade, LLMs (with and without inference time “reasoning”) will continue have their uses, especially for coding and brainstorming and writing.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.