Author name: Kris Guyer


In Xcode 26, Apple shows first signs of offering ChatGPT alternatives

The latest Xcode beta contains clear signs that Apple plans to bring Anthropic’s Claude and Opus large language models into the integrated development environment (IDE), expanding on features already available using Apple’s own models or OpenAI’s ChatGPT.

Apple enthusiast publication 9to5Mac “found multiple references to built-in support for Anthropic accounts,” including in the “Intelligence” menu, where users can currently log in to ChatGPT or enter an API key for higher message limits.

Apple introduced a suite of features meant to compete with GitHub Copilot in Xcode at WWDC24, but first focused on its own models and a more limited set of use cases. That expanded quite a bit at this year’s developer conference, and users can converse about codebases, discuss changes, or ask for suggestions using ChatGPT. They are initially given a limited set of messages, but this can be greatly increased by logging in to a ChatGPT account or entering an API key.

This summer, Apple said it would be possible to use Anthropic’s models with an API key, too, but it made no mention of support for Anthropic accounts, which are generally more cost-effective than API access for most users.
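
To make the "bring your own API key" option concrete, here is a minimal sketch of calling Anthropic's public Messages API directly with a personal key. It is an illustration only: the model identifier is a placeholder, and this is not how Xcode wires up the integration internally.

```python
# Minimal sketch: calling Anthropic's Messages API with a personal API key.
# The endpoint and headers follow Anthropic's public API docs; the model name
# below is a placeholder -- substitute whatever model your key has access to.
import os
import requests

API_KEY = os.environ["ANTHROPIC_API_KEY"]  # assumption: key stored in an env var

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-20250514",   # placeholder model identifier
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": "Suggest a refactor for this Swift function: ..."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["content"][0]["text"])
```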


Physics of badminton’s new killer spin serve

Serious badminton players are constantly exploring different techniques to give them an edge over opponents. One of the latest innovations is the spin serve, a devastatingly effective method in which a player adds a pre-spin just before the racket contacts the shuttlecock (aka the birdie). It’s so effective—some have called it “impossible to return”—that the Badminton World Federation (BWF) banned the spin serve in 2023, at least until after the 2024 Paralympic Games in Paris.

The sanction wasn’t meant to quash innovation but to address players’ concerns about the possible unfair advantages the spin serve conferred. The BWF thought that international tournaments shouldn’t become the test bed for the technique, which is markedly similar to the previously banned “Sidek serve.” The BWF permanently banned the spin serve earlier this year. Chinese physicists have now teased out the complex fundamental physics of the spin serve, publishing their findings in the journal Physics of Fluids.

Shuttlecocks are unique among the various projectiles used in different sports due to their open conical shape. Sixteen overlapping feathers protrude from a rounded cork base that is usually covered in thin leather. The birdies one uses for leisurely backyard play might be synthetic nylon, but serious players prefer actual feathers.

Those overlapping feathers give rise to quite a bit of drag, so the shuttlecock decelerates rapidly in flight and its trajectory falls at a steeper angle than it rises. The extra drag also means that players must exert quite a bit of force to hit a shuttlecock the full length of a badminton court. Still, shuttlecocks can achieve top speeds of more than 300 mph. The feathers also give the birdie a slight natural spin around its axis, and this can affect different strokes. For instance, slicing from right to left, rather than vice versa, will produce a better tumbling net shot.
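
To see how much that drag matters, here is a rough numerical sketch of a shuttlecock's flight with quadratic drag. All of the parameters are assumed ballpark values, not figures from the study; the qualitative result is the point: the birdie sheds most of its speed quickly and comes down far more steeply than it was launched.

```python
# Rough 2D trajectory of a shuttlecock with quadratic aerodynamic drag.
# All parameters are assumed ballpark values (mass ~5 g, drag coefficient and
# frontal area chosen to give a terminal speed of roughly 7 m/s).
import math

m, g = 0.005, 9.81                  # mass (kg), gravity (m/s^2)
rho, cd, area = 1.2, 0.6, 0.0028    # air density, drag coefficient, frontal area
k = 0.5 * rho * cd * area           # quadratic drag constant: F_drag = k * v^2

v0, launch = 60.0, math.radians(40)  # ~134 mph launch at 40 degrees
vx, vy = v0 * math.cos(launch), v0 * math.sin(launch)
x, y, dt = 0.0, 0.0, 1e-4

while y >= 0.0:                      # integrate until it returns to launch height
    v = math.hypot(vx, vy)
    vx += -(k / m) * v * vx * dt             # drag opposes velocity
    vy += (-g - (k / m) * v * vy) * dt       # drag plus gravity
    x += vx * dt
    y += vy * dt

print(f"launch: {math.degrees(launch):.0f} deg at {v0:.0f} m/s")
print(f"impact: {math.degrees(math.atan2(-vy, vx)):.0f} deg at "
      f"{math.hypot(vx, vy):.1f} m/s, range ≈ {x:.1f} m")
```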


Chronophotographies of shuttlecocks after an impact with a racket. Credit: Caroline Cohen et al., 2015

The cork base makes the birdie aerodynamically stable: No matter how one orients the birdie, once airborne, it will turn so that it is traveling cork-first and will maintain that orientation throughout its trajectory. A 2015 study examined the physics of this trademark flip, recording flips with high-speed video and conducting free-fall experiments in a water tank to study how the shuttlecock’s geometry affects the behavior. The latter confirmed that shuttlecock feather geometry hits a sweet spot in terms of an opening inclination angle that is neither too small nor too large. And the researchers found that feather shuttlecocks are indeed better than synthetic ones, deforming more when hit to produce a more triangular trajectory.


Tiny, removable “mini SSD” could eventually be a big deal for gaming handhelds

The Mini SSD card isn’t and may never be a formally ratified standard, but it does aim to solve a real problem for portable gaming systems—the need for fast storage that can load games at speeds approaching those of an internal SSD, without requiring users to take their own systems apart to perform upgrades.

Why are games getting so dang big, anyway?

Big storage, small size. Credit: Biwin

A 2023 analysis from TechSpot suggested that game size had increased at an average rate of roughly 6.3GB per year between 2012 and 2023—games that come in over 100GB aren’t the norm, but they aren’t hard to find. Some of that increase comes from improved graphics and the higher-resolution textures needed to make games look good on 4K monitors and TVs. But TechSpot also noted that the storage requirements for narrative-heavy, cinematic games like The Last of Us Part 1 were being driven just as much by audio files and support for multiple languages.

“In total, nearly 17 GB of storage is needed for [The Last of Us] data unrelated to graphics,” wrote author Nick Evanson. “That’s larger than any entire game from our 2010 sample! This pattern was consistent across nearly all the ‘Godzilla-sized’ games we examined—those featuring numerous cinematics, extensive speech, and considerable localization were typically much larger than the rest of the sample in a given year.”

For another prominent recent example, consider the install sizes for the Mac version of Cyberpunk 2077. The version of the game on Steam, the Epic Games Store, and GOG runs about 92GB. However, the version available for download from Apple’s App Store is a whopping 159GB, solely because it includes all of the game’s voiceovers in all of the languages it supports. (This is because of App Store rules that require apps to have all possible files included when they’re submitted for review.)
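
A quick bit of arithmetic with the figures above shows how much of the App Store package is pure localization overhead (a back-of-the-envelope check, assuming the 92GB build is otherwise equivalent):

```python
# Localization overhead implied by the install sizes quoted above.
steam_gb = 92          # Steam/Epic/GOG build, without every voiceover pack bundled
app_store_gb = 159     # Mac App Store build with all voiceover languages included

extra_gb = app_store_gb - steam_gb
print(f"Bundled voiceovers add ~{extra_gb} GB "
      f"({extra_gb / app_store_gb:.0%} of the App Store download).")
# -> ~67 GB, roughly 42% of the 159 GB package.
```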

It’s clear that there’s a need for fast storage upgrades that don’t require you to disassemble your console or PC completely. Whether it’s this new “mini SSD,” a faster iteration of microSD Express, or some other as-yet-unknown storage format remains to be seen.


Anti-vaccine RFK Jr. creates vaccine panel of anti-vaccine group’s dreams

Immediate concern

It’s possible that Kennedy did not immediately set up the task force because the necessary leadership was not in place. The 1986 law says the task force “shall consist of the Director of the National Institutes of Health, the Commissioner of the Food and Drug Administration, and the Director of the Centers for Disease Control [and Prevention].” But a CDC director was only confirmed and sworn in at the end of July.

With Susan Monarez now at the helm at CDC, the Department of Health and Human Services said Thursday that the task force is being revived, though it will be led by the NIH.

“By reinstating this Task Force, we are reaffirming our commitment to rigorous science, continuous improvement, and the trust of American families,” NIH Director Jay Bhattacharya said in the announcement. “NIH is proud to lead this effort to advance vaccine safety and support innovation that protects children without compromise.”

Kennedy’s anti-vaccine group cheered the move on social media, saying it was “grateful” that Kennedy was fulfilling his duty.

Outside health experts were immediately concerned by the move.

“What I am concerned about is making sure that we don’t overemphasize very small risks [of vaccines] and underestimate the real risk of infectious diseases and cancers that these vaccines help prevent,” Anne Zink, Alaska’s former chief medical officer, told The Washington Post.

David Higgins, a pediatrician and preventive medicine specialist at the University of Colorado Anschutz Medical Campus, worried about eroding trust in vaccines, telling the Post, “I am concerned that bringing this committee back implies to the public that we have not been looking at vaccine safety. The reality is, we evaluate the safety of vaccines more than any other medication, medical intervention, or supplements available.”

Paul Offit, a vaccine expert at Children’s Hospital of Philadelphia, worried about a more direct attack on vaccines, telling CNN, “Robert F. Kennedy Jr. is an anti-vaccine activist who has these fixed, immutable, science-resistant beliefs that vaccines are dangerous. He is in a position now to be able to set up task forces like this one [that] will find some way to support his notion that vaccines are doing more harm than good.”


Here’s Acura’s next all-electric RSX crossover

“The Acura RSX has a sporty coupe style that expresses the performance that comes from excellent aerodynamics,” said Yasutake Tsuchida, Acura creative director and vice president of American Honda R&D. “Starting from this all-new RSX, we will redefine the Acura brand around timeless beauty and a high-tech feel that is essential for a performance and unique brand.”

I have to admit, when I saw a teaser shot a week or two ago, my first thought was that it looked like someone had taken a McLaren Artura and given it the Urus treatment, at least based on the nose. But Acura has been using an arrow-like prow for some time, too. I’m also getting some Lotus Eletre from the other views, but as ever, looks are subjective.

When the RSX hits the street in the second half of next year, it will do so running ASIMO OS, the new software-defined vehicle operating system that Honda announced at CES earlier this year. Among the things ASIMO OS can do is learn a driver’s preferences and driving style “to deliver an ultra-personal in-car experience,” Acura says.


Polestar sets production car record for longest drive on a single charge

Wait, are you sure that’s a record?

Booker, Clarke, and Parker drove an impressive distance on a single charge, but “longest EV drive on a single charge” is a slightly more nebulous thing. In this case, the Polestar 3 was entirely standard, on stock tires. But if you’re prepared to start tweaking stuff around, longer drives are possible.

Last week, Chevrolet revealed that it took one of its Silverado WT trucks—with a gargantuan 205 kWh battery—and then fitted it with worn-down, massively over-inflated tires and drove it around the Detroit area for 1,059 miles (1,704 km). That required a team of 40 drivers, and like the Polestar 3, the average speed was below 25 mph (40 km/h).

Squeezing 4.9 miles/kWh (12.7 kWh/100 km) out of something the size and shape of a full-size pickup is probably more impressive than getting slightly more out of an SUV, but we should note that the Silverado drivers kept the air conditioning turned off until the final 59 miles.

And in July, Lucid announced that it, too, had set a new world record for the longest drive on a single charge. In its case, it took a Lucid Air Grand Touring from St. Moritz in Switzerland to Munich in Germany, covering 749 miles (1,205 km) on a single charge. That’s significantly farther than the Polestar, and the Lucid drivers achieved more than 6 miles/kWh (10.4 kWh/100 km), but the route also involved going mostly downhill.
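
For reference, the efficiency figures quoted in the last two paragraphs are just unit conversions of each other; a small sketch reproduces them:

```python
# Unit-conversion check for the efficiency figures quoted above:
# miles per kWh  <->  kWh per 100 km.
KM_PER_MILE = 1.609344

def kwh_per_100km(miles_per_kwh: float) -> float:
    """Convert US-style efficiency (mi/kWh) to European-style consumption (kWh/100 km)."""
    return 100.0 / (miles_per_kwh * KM_PER_MILE)

for label, mi_per_kwh in [("Chevrolet Silverado EV run", 4.9),
                          ("Lucid Air Grand Touring run", 6.0)]:
    print(f"{label}: {mi_per_kwh} mi/kWh ≈ {kwh_per_100km(mi_per_kwh):.1f} kWh/100 km")
# -> ≈ 12.7 and ≈ 10.4, matching the figures in the text.
```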


Misunderstood “photophoresis” effect could loft metal sheets to exosphere


Photophoresis can generate a tiny bit of lift without any moving parts.

Image of a wooden stand holding a sealed glass bulb with a spinning set of vanes, each of which has a lit and dark side.

Most people would recognize the device in the image above, although they probably wouldn’t know it by its formal name: the Crookes radiometer. As its name implies, placing the radiometer in light produces a measurable change: the blades start spinning.

Unfortunately, many people misunderstand the physics of its operation (which we’ll return to shortly). The actual forces that drive the blades to spin, called photophoresis, can act on a variety of structures as long as they’re placed in a sufficiently low-density atmosphere. Now, a team of researchers has figured out that it may be possible to use the photophoretic effect to loft thin sheets of metal into the upper atmosphere of Earth and other planets. While their idea is to use it to send probes to the portion of the atmosphere that’s too high for balloons and too low for satellites, they have tested some working prototypes a bit closer to the Earth’s surface.

Photophoresis

It’s quite common—and quite wrong—to see explanations of the Crookes radiometer that invoke radiation pressure. Supposedly, the dark sides of the blades absorb more photons, each of which carries a tiny bit of momentum, giving the dark side of the blades a consistent push. The problem with this explanation is that photons are bouncing off the silvery side, which imparts even more momentum. If the device were spinning due to radiation pressure, it would be turning in the opposite direction from the one it actually does.

That excess of absorbed photons on the dark side is key to understanding how it works, though. Photophoresis operates through the temperature difference that develops between the warm, light-absorbing dark side of the blade and the cooler silvered side.

Any gas molecule that bumps into the dark side will likely pick up some of the excess thermal energy from it and move away from the blade faster than it arrived. At the sorts of atmospheric pressures we normally experience, these molecules don’t get very far before they bump into other gas molecules, which keeps any significant differences from developing.

But a Crookes radiometer is in a sealed glass container with a far lower air pressure. This allows the gas molecules to speed off much farther from the dark surface of the blade before they run into anything, creating an area of somewhat lower pressure at its surface. That causes gas near the surface of the shiny side to rush around and fill this lower-pressure area, imparting the force that starts the blades turning.
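
The role of pressure here comes down to the mean free path of the gas: how far a molecule travels before colliding with another one. A rough estimate (assuming room temperature, a typical effective molecular diameter for air, and a guessed bulb pressure) shows why the effect needs a rarefied gas:

```python
# Rough mean free path of air molecules: lambda = k_B*T / (sqrt(2) * pi * d^2 * p).
# d is an assumed effective molecular diameter for air; T is room temperature.
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
D = 3.7e-10          # assumed effective molecular diameter of air, m
T = 300.0            # K

def mean_free_path(pressure_pa: float) -> float:
    return K_B * T / (math.sqrt(2) * math.pi * D**2 * pressure_pa)

for label, p in [("sea level, ~101 kPa", 101_325.0),
                 ("radiometer bulb, ~1 Pa (rough guess)", 1.0)]:
    print(f"{label}: mean free path ≈ {mean_free_path(p):.2e} m")
# At sea level the result is tens of nanometers -- collisions smooth out any
# pressure difference almost immediately. At ~1 Pa it grows to several
# millimeters, comparable to the scale of the radiometer's blades.
```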

It’s pretty impressively inefficient in that sort of configuration, though. So people have spent a lot of time trying to design alternative configurations that can generate a bit more force. One idea with a lot of research traction is a setup that involves two thin metal sheets—one light, one dark—arranged parallel to each other. Both sheets would be heavily perforated to cut down on weight. And a subset of them would have a short pipe connecting holes on the top and bottom sheet. (This has picked up the nickname “nanocardboard.”)

These pipes would serve several purposes. One is to simply link the two sheets into a single unit. Another is to act as an insulator, keeping heat from moving from the dark sheet to the light one, and thus enhancing the temperature gradient. Finally, they provide a direct path for air to move from the top of the light-colored sheet to the bottom of the dark one, giving a bit of directed thrust to help keep the sheets aloft.

Optimization

As you might imagine, there are a lot of free parameters you can tweak: the size of the gap between the sheets, the density of perforations in them, the number of those holes that are connected by a pipe, and so on. So a small team of researchers developed a system to model different configurations and attempt to optimize for lift. (We’ll get to their motivations for doing so a bit later.)

Starting with a disk of nanocardboard, “The inputs to the model are the geometric, optical and thermal properties of the disk, ambient gas conditions, and external radiative heat fluxes on the disk,” as the researchers describe it. “The outputs are the conductive heat fluxes on the two membranes, the membrane temperatures, and the net photophoretic lofting force on the structure.” In general, the ambient gas conditions needed to generate lift are similar to the ones inside the Crookes radiometer: well below the air pressure at sea level.
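
As a way to picture the interface described in that quote, here is a hypothetical sketch of the model's inputs and outputs as code. This is an illustration of the quoted description only, not the researchers' actual implementation, and all field names are invented.

```python
# Hypothetical interface sketch matching the quoted description of the model --
# not the researchers' code. Inputs: disk geometric/optical/thermal properties,
# ambient gas conditions, and radiative fluxes. Outputs: membrane heat fluxes
# and temperatures plus the net photophoretic lofting force.
from dataclasses import dataclass

@dataclass
class Disk:
    radius_m: float              # overall disk size
    gap_m: float                 # spacing between the two membranes
    perforation_fraction: float  # fraction of membrane area that is holes
    absorptivity_dark: float     # optical absorptivity of the dark membrane
    absorptivity_light: float    # optical absorptivity of the light membrane
    conductance_w_per_k: float   # thermal link between the membranes (via the pipes)

@dataclass
class Ambient:
    pressure_pa: float
    temperature_k: float
    radiative_flux_w_m2: float   # e.g. ~1361 W/m^2 for full sunlight

@dataclass
class LoftingResult:
    heat_flux_dark_w_m2: float
    heat_flux_light_w_m2: float
    temp_dark_k: float
    temp_light_k: float
    net_lift_n: float

def photophoretic_lift(disk: Disk, ambient: Ambient) -> LoftingResult:
    """Placeholder for the coupled radiative/conductive/gas-kinetic solve the paper describes."""
    raise NotImplementedError
```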

The model suggested that three trends should influence any final designs. The first is that the density of perforations is a balance. At relatively low elevations (meaning a denser atmosphere), many perforations increase the stress on large sheets, but they decrease the stress for small items at high elevations. The second is that, rather than increasing with surface area, lift tends to drop because larger sheets are more likely to equilibrate to the prevailing temperatures. A square millimeter of nanocardboard produces over 10 times more lift per unit area than a 10-square-centimeter piece of the same material.

Finally, the researchers calculate that the lift is at its maximum in the mesosphere, the area just above the stratosphere (50–100 kilometers above Earth’s surface).

Light and lifting

The researchers then built a few sheets of nanocardboard to test the output of their model. The actual products, primarily made of chromium, aluminum, and aluminum oxide, were incredibly light, weighing only about a gram per square meter of material. When illuminated by a laser or white LED, they generated measurable force on a testing device, provided the atmosphere was kept sufficiently sparse. With an exposure equivalent to sunlight, the device generated more lift than its own weight.
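
That "more lift than its own weight" result is easier to appreciate with the areal density in hand; a one-line check using the reported ~1 g/m² shows how little force is actually required:

```python
# Force needed to levitate a sheet with the reported areal density of ~1 g/m^2.
G = 9.81                  # m/s^2
areal_density = 1e-3      # kg/m^2
print(f"required lift: {areal_density * G * 1000:.1f} mN per square meter")
# -> about 9.8 mN/m^2, which is why a gentle photophoretic push can be enough.
```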

It’s a really nice demonstration that we can take a relatively obscure and weak physical effect and design devices that can levitate in the upper atmosphere, powered by nothing more than sunlight—which is pretty cool.

But the researchers have a goal beyond that. The mesosphere turns out to be a really difficult part of the atmosphere to study. It’s not dense enough to support balloons or aircraft, but it still has enough gas to make quick work of any satellites. So the researchers really want to turn one of these devices into an instrument-carrying aircraft. Unfortunately, that would mean adding the structural components needed to hold instruments, along with the instruments themselves. And even in the mesosphere, where lift is optimal, these things do not generate much in the way of lift.

Plus, there’s the issue of getting them there. Since they won’t generate enough lift in the lower atmosphere, they’ll have to be carried into the upper stratosphere by something else and then released gently enough not to damage their fragile structure. And then, unless you’re lofting them during the polar summer, they will likely come floating back down at night.

None of this is to say this is an impossible dream. But there are definitely a lot of very large hurdles between the work and practical applications on Earth—much less on Mars, where the authors suggest the system could also be used to explore the mesosphere. But even if that doesn’t end up being realistic, this is still a pretty neat bit of physics.


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


Report: Apple’s smart home ambitions include “tabletop robot,” cameras, and more

Rumors about a touchscreen-equipped smart home device from Apple have been circulating for years, periodically bolstered by leaked references in Apple’s software updates. But a report from Bloomberg’s Mark Gurman indicates that Apple’s ambitions might extend beyond HomePods with screens attached.

Gurman claims that Apple is working on a “tabletop robot” that “resembles an iPad mounted on a movable limb that can swivel and reposition itself to follow users in a room.” The device will also turn toward people who are addressing it or toward people whose attention it’s trying to get. Prototypes have used a 7-inch display similar in size to an iPad mini, with a built-in camera for FaceTime calls.

Apple is reportedly targeting a 2027 launch for some version of this robot, although, as with any unannounced Apple product, it could come out earlier, later, or not at all. Gurman reported in January that a different smart home device—essentially a HomePod with a screen, without the moving robot parts—was being planned for 2025, but has said more recently that Apple has bumped it to 2026. The robot could be a follow-up to or a fancier, more expensive version of that device, and it sounds like both will run the same software.


Worm invades man’s eyeball, leading doctors to suck out his eye jelly

For eight months, a 35-year-old man in India was bothered by his left eye. It was red and blurry. When he finally visited an ophthalmology clinic, it didn’t take long for doctors to unearth the cause.

In a case report in the New England Journal of Medicine, doctors report that they first noted that the eye was bloodshot and inflamed, and the pupil was dilated and fixed. The man’s vision in the eye was 20/80. A quick look inside his eye revealed it was all due to a small worm, which they watched “moving sluggishly” in the back of his eyeball.

To gouge out the parasitic pillager, the doctors performed a pars plana vitrectomy—a procedure that involves sucking out some of the jelly-like vitreous inside the eye. This procedure can be used in the treatment of a variety of eye conditions, but using it to hoover up worms is rare. In order to get in, the doctors make tiny incisions in the white parts of the eye (the sclera) and use a hollow needle-like device with suction. They replace extracted eye jelly with things like saline.

In this case, the device was able to suck in part of the worm’s tail and drag it out—still squirming. Under the microscope, they quickly identified the peeper creeper. With a bulbous head, well-formed intestines, and a thick outer layer, it perfectly fit the description of Gnathostoma spinigerum, a known bodily marauder that can sometimes wiggle its way into eyeballs.

Panel A shows the pars plana vitrectomy removing the worm; Panel B shows the worm under light microscopy, revealing a larval-stage nematode with a cephalic bulb, thick cuticle, and well-developed intestine. Credit: New England Journal of Medicine, 2025

Stomach-churning cycle

G. spinigerum are endemic parasites in India that infect carnivorous mammals, particularly wild and domestic cats and dogs. In these primary hosts, adult worms form tumor-like masses on the walls of the animals’ intestinal tracts. There, the adults mate, and the mass erupts like an infernal, infectious volcano, spewing out eggs. The eggs are passed in the animals’ feces and can then spread to intermediate hosts. These include freshwater plankton, which get eaten by fish and amphibians, which then get eaten by the cats and dogs to complete the cycle. The young parasites can also be taken up by dead-end hosts like birds, including chickens, and snakes—these are called paratenic hosts.


Space Force officials take secrecy to new heights ahead of key rocket launch

The Vulcan rocket checks off several important boxes for the Space Force. First, it relies entirely on US-made rocket engines. The Atlas V rocket it is replacing uses Russian-built main engines, and given the chilled relations between the two powers, US officials have long desired to stop using Russian engines to power the Pentagon’s satellites into orbit. Second, ULA says the Vulcan rocket will eventually provide a heavy-lift launch capability at a lower cost than the company’s now-retired Delta IV Heavy rocket.

Third, Vulcan provides the Space Force with an alternative to SpaceX’s Falcon 9 and Falcon Heavy, which have been the only rockets in their class available to the military since the last national security mission was launched on an Atlas V rocket one year ago.

Col. Jim Horne, mission director for the USSF-106 launch, said this flight marks a “pretty historic point in our program’s history. We officially end our reliance on Russian-made main engines with this launch, and we continue to maintain our assured access to space with at least two independent rocket service companies that we can leverage to get our capabilities on orbit.”

What’s onboard?

The Space Force has only acknowledged one of the satellites aboard the USSF-106 mission, but there are more payloads cocooned inside the Vulcan rocket’s fairing.

The $250 million mission that officials are willing to talk about is named Navigation Technology Satellite-3, or NTS-3. This experimental spacecraft will test new satellite navigation technologies that may eventually find their way onto next-generation GPS satellites. A key focus for engineers who designed and will operate the NTS-3 satellite is to look at ways of overcoming GPS jamming and spoofing, which can degrade satellite navigation signals used by military forces, commercial airliners, and civilian drivers.

“We’re going to be doing, we anticipate, over 100 different experiments,” said Joanna Hinks, senior research aerospace engineer at the Air Force Research Laboratory’s space vehicles directorate, which manages the NTS-3 mission. “Some of the major areas we’re looking at—we have an electronically steerable phased array antenna so that we can deliver higher power to get through interference to the location that it’s needed.”

Arlen Biersgreen, then-program manager for the NTS-3 satellite mission at the Air Force Research Laboratory, presents a one-third scale model of the NTS-3 spacecraft to an audience in 2022. Credit: US Air Force/Andrea Rael

GPS jamming is especially a problem in and near war zones. Investigators probing the crash of Azerbaijan Airlines Flight 8243 last December determined GPS jamming, likely by Russian military forces attempting to counter a Ukrainian drone strike, interfered with the aircraft’s navigation as it approached its destination in the Russian republic of Chechnya. Azerbaijani government officials blamed a Russian surface-to-air missile for damaging the aircraft, ultimately leading to a crash in nearby Kazakhstan that killed 38 people.

“We have a number of different advanced signals that we’ve designed,” Hinks said. “One of those is the Chimera anti-spoofing signal… to protect civil users from spoofing that’s affecting so many aircraft worldwide today, as well as ships.”

The NTS-3 spacecraft, developed by L3Harris and Northrop Grumman, only takes up a fraction of the Vulcan rocket’s capacity. The satellite weighs less than 3,000 pounds (about 1,250 kilograms), about a quarter of what this version of the Vulcan rocket can deliver to geosynchronous orbit.


Why it’s a mistake to ask chatbots about their mistakes


The only thing I know is that I know nothing

The tendency to ask AI bots to explain themselves reveals widespread misconceptions about how they work.

When something goes wrong with an AI assistant, our instinct is to ask it directly: “What happened?” or “Why did you do that?” It’s a natural impulse—after all, if a human makes a mistake, we ask them to explain. But with AI models, this approach rarely works, and the urge to ask reveals a fundamental misunderstanding of what these systems are and how they operate.

A recent incident with Replit’s AI coding assistant perfectly illustrates this problem. When the AI tool deleted a production database, user Jason Lemkin asked it about rollback capabilities. The AI model confidently claimed rollbacks were “impossible in this case” and that it had “destroyed all database versions.” This turned out to be completely wrong—the rollback feature worked fine when Lemkin tried it himself.

And after xAI recently reversed a temporary suspension of the Grok chatbot, users asked it directly for explanations. It offered multiple conflicting reasons for its absence, some of which were controversial enough that NBC reporters wrote about Grok as if it were a person with a consistent point of view, titling an article, “xAI’s Grok offers political explanations for why it was pulled offline.”

Why would an AI system provide such confidently incorrect information about its own capabilities or mistakes? The answer lies in understanding what AI models actually are—and what they aren’t.

There’s nobody home

The first problem is conceptual: You’re not talking to a consistent personality, person, or entity when you interact with ChatGPT, Claude, Grok, or Replit. These names suggest individual agents with self-knowledge, but that’s an illusion created by the conversational interface. What you’re actually doing is guiding a statistical text generator to produce outputs based on your prompts.

There is no consistent “ChatGPT” to interrogate about its mistakes, no singular “Grok” entity that can tell you why it failed, no fixed “Replit” persona that knows whether database rollbacks are possible. You’re interacting with a system that generates plausible-sounding text based on patterns in its training data (usually trained months or years ago), not an entity with genuine self-awareness or system knowledge that has been reading everything about itself and somehow remembering it.

Once an AI language model is trained (which is a laborious, energy-intensive process), its foundational “knowledge” about the world is baked into its neural network and is rarely modified. Any external information comes from a prompt supplied by the chatbot host (such as xAI or OpenAI), the user, or a software tool the AI model uses to retrieve external information on the fly.

In the case of Grok above, the chatbot’s answer would probably be drawn from conflicting reports it found in a search of recent social media posts (using an external tool to retrieve that information), rather than from any kind of self-knowledge, as you might expect from a human with the power of speech. Beyond that, it will likely just make something up based on its text-prediction capabilities. So asking it why it did what it did will yield no useful answers.

The impossibility of LLM introspection

Large language models (LLMs) alone cannot meaningfully assess their own capabilities for several reasons. They generally lack any introspection into their training process, have no access to their surrounding system architecture, and cannot determine their own performance boundaries. When you ask an AI model what it can or cannot do, it generates responses based on patterns it has seen in training data about the known limitations of previous AI models—essentially providing educated guesses rather than factual self-assessment about the current model you’re interacting with.

A 2024 study by Binder et al. demonstrated this limitation experimentally. While AI models could be trained to predict their own behavior in simple tasks, they consistently failed at “more complex tasks or those requiring out-of-distribution generalization.” Similarly, research on “Recursive Introspection” found that without external feedback, attempts at self-correction actually degraded model performance—the AI’s self-assessment made things worse, not better.

This leads to paradoxical situations. The same model might confidently claim impossibility for tasks it can actually perform, or conversely, claim competence in areas where it consistently fails. In the Replit case, the AI’s assertion that rollbacks were impossible wasn’t based on actual knowledge of the system architecture—it was a plausible-sounding confabulation generated from training patterns.

Consider what happens when you ask an AI model why it made an error. The model will generate a plausible-sounding explanation because that’s what the pattern completion demands—there are plenty of examples of written explanations for mistakes on the Internet, after all. But the AI’s explanation is just another generated text, not a genuine analysis of what went wrong. It’s inventing a story that sounds reasonable, not accessing any kind of error log or internal state.

Unlike humans who can introspect and assess their own knowledge, AI models don’t have a stable, accessible knowledge base they can query. What they “know” only manifests as continuations of specific prompts. Different prompts act like different addresses, pointing to different—and sometimes contradictory—parts of their training data, stored as statistical weights in neural networks.

This means the same model can give completely different assessments of its own capabilities depending on how you phrase your question. Ask “Can you write Python code?” and you might get an enthusiastic yes. Ask “What are your limitations in Python coding?” and you might get a list of things the model claims it cannot do—even if it regularly does them successfully.

The randomness inherent in AI text generation compounds this problem. Even with identical prompts, an AI model might give slightly different responses about its own capabilities each time you ask.
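
That randomness is just sampling: at each step the model draws the next token from a probability distribution, so identical prompts can diverge. A toy illustration follows; the vocabulary and probabilities here are made up, and real models sample over tens of thousands of tokens at every step.

```python
# Toy illustration of sampling randomness in text generation: the same
# next-token probability distribution, sampled repeatedly, yields different
# continuations.
import random

next_token_probs = {   # hypothetical distribution after "Can I roll back the database?"
    "Yes": 0.40,
    "No": 0.35,
    "Unfortunately": 0.15,
    "Possibly": 0.10,
}

def sample(probs: dict[str, float], rng: random.Random) -> str:
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random()  # unseeded: each run differs, like a non-deterministic chatbot
print([sample(next_token_probs, rng) for _ in range(8)])
# e.g. ['Yes', 'No', 'Yes', 'Unfortunately', ...] -- the "answer" flips run to run.
```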

Other layers also shape AI responses

Even if a language model somehow had perfect knowledge of its own workings, other layers of AI chatbot applications might be completely opaque to it. Modern AI assistants like ChatGPT, for example, aren’t single models but orchestrated systems of multiple AI models working together, each largely “unaware” of the others’ existence or capabilities. OpenAI, for instance, uses moderation models that operate separately from the underlying language models generating the base text.

When you ask ChatGPT about its capabilities, the language model generating the response has no knowledge of what the moderation layer might block, what tools might be available in the broader system, or what post-processing might occur. It’s like asking one department in a company about the capabilities of a department it has never interacted with.
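
Here is a tiny, purely hypothetical sketch of that layering: a generator and a separate moderation filter composed by the application rather than aware of each other. It mirrors the idea, not any vendor's actual architecture.

```python
# Hypothetical sketch of the "orchestrated layers" idea: a generation model and
# a separate moderation filter composed by the application. The generator never
# "sees" the filter, so asking it what the overall assistant can or cannot do is
# asking the wrong component.

def generate_reply(prompt: str) -> str:
    """Stand-in for the language model: just produces some text from the prompt."""
    return f"Here is a response to: {prompt!r}"

def moderation_allows(text: str) -> bool:
    """Stand-in for a separate moderation model with its own rules."""
    blocked_terms = {"credential dump", "malware"}
    return not any(term in text.lower() for term in blocked_terms)

def assistant(prompt: str) -> str:
    reply = generate_reply(prompt)       # layer 1: generation
    if not moderation_allows(reply):     # layer 2: moderation, invisible to layer 1
        return "I can't help with that."
    return reply                         # (a real stack adds tools, post-processing, etc.)

print(assistant("Explain photophoresis in one sentence."))
```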

Perhaps most importantly, users are always directing the AI’s output through their prompts, even when they don’t realize it. When Lemkin asked Replit whether rollbacks were possible after a database deletion, his concerned framing likely prompted a response that matched that concern—generating an explanation for why recovery might be impossible rather than accurately assessing actual system capabilities.

This creates a feedback loop where worried users asking “Did you just destroy everything?” are more likely to receive responses confirming their fears, not because the AI system has assessed the situation, but because it’s generating text that fits the emotional context of the prompt.

A lifetime of hearing humans explain their actions and thought processes has led us to believe that these kinds of written explanations must have some level of self-knowledge behind them. That’s just not true with LLMs that are merely mimicking those kinds of text patterns to guess at their own capabilities and flaws.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


Musk threatens to sue Apple so Grok can get top App Store ranking

After spending last week hyping Grok’s spicy new features, Elon Musk kicked off this week by threatening to sue Apple for supposedly gaming the App Store rankings to favor ChatGPT over Grok.

“Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation,” Musk wrote on X, without providing any evidence. “xAI will take immediate legal action.”

In another post, Musk tagged Apple, asking, “Why do you refuse to put either X or Grok in your ‘Must Have’ section when X is the #1 news app in the world and Grok is #5 among all apps?”

“Are you playing politics?” Musk asked. “What gives? Inquiring minds want to know.”

Apple did not respond to the post and has not responded to Ars’ request for comment.

At the heart of Musk’s complaints is an OpenAI partnership that Apple announced last year, integrating ChatGPT into versions of its iPhone, iPad, and Mac operating systems.

Musk has alleged that this partnership incentivized Apple to boost ChatGPT rankings. OpenAI’s popular chatbot “currently holds the top spot in the App Store’s ‘Top Free Apps’ section for iPhones in the US,” Reuters noted, “while xAI’s Grok ranks fifth and Google’s Gemini chatbot sits at 57th.” Sensor Tower data shows ChatGPT similarly tops Google Play Store rankings.

While Musk seems insistent that ChatGPT is artificially locked in the lead, fact-checkers on X added a community note to his post. They confirmed that at least one other AI tool has somewhat recently unseated ChatGPT in the US rankings. Back in January, DeepSeek topped App Store charts and held the lead for days, ABC News reported.

OpenAI did not immediately respond to Ars’ request for comment on Musk’s allegations, but an OpenAI developer, Steven Heidel, did add a quip in response to one of Musk’s posts, writing, “Don’t forget to also blame Google for OpenAI being #1 on Android, and blame SimilarWeb for putting ChatGPT above X on the most-visited websites list, and blame….”
