machine learning

quad-cortex-mini-amp-modeler:-all-the-power,-half-the-size

Quad Cortex mini amp modeler: All the power, half the size


A warehouse of guitar gear in the palm of your hand.

At this January’s massive NAMM music tech show in Los Angeles, six products won “best of show” awards. Several of them went to major music and electronic brands like Yamaha and Boss, but one of the six went to Neural DSP, a much smaller company started in 2017 by Chilean immigrants to Finland.

From its base in the Helsinki area, Neural has made itself an expert in the use of machine learning, robots, and impulse response technology to automate the construction of incredibly lifelike guitar amp modeling software. It quickly jumped into the top ranks of an industry dominated by brands like Universal Audio, Kemper, Line 6, and Fractal. For a hundred bucks, you could buy one of the company’s plugins and sound like a guitar god with a $10,000 recording chain of amps, cabinets, effects pedals, and microphones.

In 2020, Neural branched out into hardware, putting its tech not in your computer but in a floor-based box covered with footswitches and called the Quad Cortex. While the company’s plugins could each replace one entire pedalboard of gear—plus a few amps and cabs—the Quad Cortex could replace a Guitar Center-sized warehouse of devices, offering hundreds of amps, cabs, and effects.

How was this possible? High-quality gear models used to take much longer to build; the best were often built by modeling every single component of the underlying circuit. Machine learning offered a faster way, one that didn’t care about the circuit at all. What it cared about was the input signal (which was known) and the output signal (which contained all the changes imposed on the signal by the circuit, the speaker, the cabinet, and/or the mic in question). A computer could then calculate what the device was doing to the signal without knowing anything about “how it worked.”

But this kind of modeling still took time, because each “capture” was a static picture of one particular setting. When you imagine the millions of possible setting combinations (tone, bass, treble, drive, EQ, etc.) on even a single guitar amp, you can see that building complex models of beloved gear could be slow.

In 2024, Neural announced that it had sped up this process using a robot called TINA. The company hooked TINA’s robotic actuators up to the various controls on some piece of gear it wanted to model, and TINA would do the tedious work of spinning the knobs and recording a new capture at each knob position. (Neural claimed that it typically recorded “thousands of control positions” per device this way.)

A neural network then built a model of how the target device behaved at each recorded setting, though the model would “also generalize and precisely infer the sound of the device in any unseen control setting and input signal.” The result was not a single model of a static setting but a dynamic model that could act on parameter changes just like the original device.

Neural has now modeled a massive library of gear, much of which comes with the Quad Cortex. That device sounds great, though it is still relatively chunky and nearly $2,000.

This year, Neural built on that success with the Quad Cortex mini, which shrinks the device size in half, cuts the footswitches to four, and lowers the price to $1,400—but still offers the full processing power of its larger sibling. This is the device that won a “Best in Show” award at NAMM.

As an enthusiastic amateur guitarist for many years, I got my start with digital amp sims through a Digidesign RP-6 pedalboard from the 1990s. And though it had “S-DISC PROCESSING!” it never sounded particularly realistic, especially with distortion effects. More recently, since I record rather than gig, I’ve spent my time getting to know the software side of the amp modeling business.

But when Neural offered to loan me a review unit of the Quad Cortex mini, I was quite curious to see just what top-tier hardware units can do today.

Photo of the Quad Cortex mini.

The Quad Cortex mini in its natural habitat: surrounded by cables.

Credit: Nate Anderson

The Quad Cortex mini in its natural habitat: surrounded by cables. Credit: Nate Anderson

The hardware

The glass, metal, and steel Quad Cortex mini is about the size of two bricks laid side by side (8.9×4.6×2.5 inches or 22.8×11.8×6.5 cm), and its 3.3 lbs (1.5 kg) give it a satisfying heft. It looks and feels premium—this is a well-built piece of gear.

Though it is meant to operate a bit like traditional analog stomp boxes that guitar and bass players have long used, it may be more helpful to think of the Quad Cortex mini as a chunky handheld computer that you can just so happen to use on the floor.

It runs its own operating system (CorOS), takes a whopping 45 seconds to boot, has Wi-Fi for over-the-air updates and cloud service connectivity, features a 7-inch touchscreen, and comes with a “CPU monitor” to show you just how unhappy its chipset is about that third reverb you added to a patch. It even contains a full-on monosynth that you can add to guitar patches, providing control over four full pages of synth parameters, including the raw oscillators.

So finger-focused is the unit that you can tweak just about any parameter on the device with either the touchscreen controls or the footswitches, which double as twistable rotary encoders.

If the top face of the Quad Cortex mini is devoted to a screen and switches, the sides are all about inputs and outputs. You get a “locking” power connector (so the cord doesn’t pull out on stage, prematurely ending your soaring 10-minute guitar solo mid-note) along with a whole host of audio connectors: guitar/bass input, XLR input with phantom power, balanced XLR outputs, TRS send/return ports, stereo line outs, MIDI in and out, an expression pedal port, a USB-C port, and a headphone jack.

Finally, there’s the “capture out” port, which is used to send a series of test signals through various kinds of audio gear to generate a machine learning-based model of various amps, cabinets, and pedals.

The “capture” port is another reminder of the way in which this kind of modern modeling gear is not just an updated version of old-school stomp boxes. The Quad Cortex mini does let you plug in your guitar and rock out, sure, but it also performs and processes hardware captures (both on the device and—for more sophisticated modeling—in the cloud) and can operate as a 16-channel USB-C audio interface to your computer. And though it’s largely designed for guitars and basses, you can use it on anything. The unit even has a few voice presets, which sound pretty wild with some of the real-time pitch-shifting and reverb effects.

While you can model your own gear collection with the Quad Cortex mini, the device itself comes with more than 90 amp models, more than 100 effects, and over 1,000 cabinet impulse responses. It can also run versions of the company’s desktop plugins (assuming you’ve purchased them already). It also comes with “over 2,000 high-quality factory Neural Captures” of other gear—these are static captures—and it can connect to the free “Cortex Cloud” service to download even more, including those uploaded by other users.

In other words: This one box holds digital representations of several hundred thousand dollars of gear. And given that you can mix and match cabs, captures, amps, and effects in wildly complicated chains that can even split and merge… the possibilities are functionally limitless.

Whether that excites or paralyzes you may depend on your own psychology, but it’s quite a change from how Neural DSP has approached its plugin offerings. Neural has generally offered curated (read: limited) collections of amps, cabs, and effects bundled into plugins that represent the tone of, say, John Mayer. You might get 3 amps, a few cabinets recorded with various mics, a few pedals, and an EQ, reverb, and delay, all in a gorgeous interface with some great presets.

But boxes like Quad Cortex mini take a “more is more” approach, with unlimited gear-mixing potential, captures, and storage for thousands of presets. Curation? Bah, who needs it? Here’s everything!

Rectangular

This much gear also means that “gorgeous bespoke interface graphics” are out the window; you will get no pictures of sexy amps sitting in sexy studios with sexy lighting, as you do in the company’s gorgeous plugins. Instead, you will get flat rectangles. So many flat rectangles.

CorOS is one of those places where skeuomorphism goes to die. The Quad Cortex mini interface is extremely “functional”—I am trying to avoid more negative terms, because it has a certain “alpha phase before we put the final art in” charm—and is based entirely around grids of flat rectangles.

The main screen is called, in fact, “the grid.” It shows your current effect chain as a series of small squares, each filled with often impenetrable line art. (A disturbing number of these are some variation on a squiggly line. Fortunately, they are color coded by effect type.)

Each square represents a different effects processor, and you can have four lines of eight effect squares each. That might sound like a lot (and it is), but the processors can be distributed across the grid in creative ways.

Preset 47B, for instance, is called “Annoying Flute,” and it makes use of all four grid lines by running the input signal through a VCA compressor, a gate, an octave pitch shifter, an envelope filter, an EQ, the “Neural Capture” of an amp called “Custom 3SE 2,” and then a “112 US DLX Black C12K 00s (M)” speaker cabinet. (The names of these things are often hard to read at a glance, especially when picking from a list of a hundred items.)

This accounts for only “line 1” of the grid. In the case of Annoying Flute, the signal chain branches right after the speaker cabinet. Half of it continues on to line 3 of the grid, while the other half is routed down to line 2, where it passes through a pair of tape delays before also heading off to line 3. Line 3 receives this re-combined signal and splits it again, this time passing half of it through a poly octaver and another digital delay on line 4 before everything runs through a modulated reverb on line 3 and then onwards to the outputs.

Does this sort of craziness sound good? Well, it sounds better than anything featuring three delays, two pitch shifters, and the name “Annoying Flute” has any right to! But I bring this example up to illustrate the creative routing and effects decisions that the grid makes possible.

And things get even crazier when you use the built-in looper, trigger analog send/return effects, and set up your effects chain with other units meant to be switched on and off during a song.

So much for assigning effects rectangles to the rectangular grid. How to control all of these virtual gadgets? When you tap on any effects unit, up pops an overlay containing (you guessed it) lots of rectangles.

Every controllable parameter gets a rectangle, which is usually filled with a dial or a switch. You can change the values of these dials and switches by touching the screen or by twisting the lower-right rotary footswitch.

Sometimes there are multiple pages of such parameters; the blossom reverb, for instance, has two pages of options and lets you control everything from ducking to pre-delay to modulation to the length of the early reflections. Configuring an entire audio chain from scratch can therefore take a while if you’re a detail freak.

Gig Mode. Yup, it’s rectangles!

Credit: Nate Anderson

Gig Mode. Yup, it’s rectangles! Credit: Nate Anderson

When you have your grid setup exactly how you like it—or you’ve customized one of the many built-in presets—you can save your own custom presets and organize them in all sorts of performance-oriented ways.

There’s PRESET mode, which lets you stomp each of the four footswitches to select a completely different preset.

There’s SCENE mode, which lets you use the footswitches to instead choose different parameter sets within the same preset—such as adding a hall reverb, upping the amp gain, and boosting the delay mix level when you come to your big solo.

Then there’s STOMP mode, which operates most like a traditional pedalboard; you step on the various footswitches to turn different effects units in the preset on or off completely.

Finally, there are hybrid modes, which make things even more complex (and can probably be ignored by many users).

To make all this a little easier to grok, there’s something called “Gig View,” which is unintuitively accessed by swiping up from the bottom of the screen. (There is no visual clue that this mode exists or that this is how you access it.) Gig View is essentially four flat—and extremely large—rectangles that take over the entire screen. They show you at a glance what each footswitch will do given the current mode setting.

Creating presets, assigning scenes, and setting up the STOMP mode and Gig View settings can quickly get intricate—even downright confusing (multiple items can sometimes be mapped to the same switch, for instance). I confess that the thought of doing all this through tapping the good-but-not-instantly-reactive touchscreen brought me to despair, until I realized that Neural has built an entire (free) desktop app for Mac and Windows called Cortex Control. Plug in your device over USB and suddenly you can use a nice and very responsive desktop app to do the donkey work of creating and organizing scenes and presets and settings.

I hate downloading stupid one-off apps that clutter up my computer and appear to provide more value to the company making them than they do to me—a serious problem in the current audio engineering world—but Cortex Control is genuinely useful. Indeed, if you’re going to be more than a presets player, I’d call it essential unless you have far more patience than I do. Which you might!

Stomp it

All of this rectangle talk reminds me that the interface largely… works. It may not be gorgeous, but the job gets done, and the desktop app makes the grunt work easier. But I still found the Quad Cortex mini somewhat confusing to navigate after a couple of weeks of intermittent use (though no doubt it gets easier with time).

The device has so many ways of doing things that it can be hard to remember what is needed in each situation. For instance, to make a change, you might use the rotary encoders. You might tap. You might long-tap with different results. You might swipe, drag, or toggle. You might use the footswitches—but results there might vary by mode. Even then, you might need to tap two footswitches at once, while at other times you only need to step on one. And sometimes you need to “long-press” (long-stomp?) two footswitches at once to get the desired result.

Making things worse, numerous items—sometimes quite important items like the Gig View—are not visible or even discoverable.

For instance, the key settings panel that lets you control all the various inputs and outputs on the device does not appear to be accessible from within the overall “settings” menu or anywhere else. Instead, you have to swipe down from the top of the grid screen—again, with no indication that this is where that information lives.

(You have to read the manual to figure out some of these things, which is fine, but the manual also has big gaps, such as not describing what any of the gear actually does nor what any of the settings mean nor how they might be used. For the actual “audio engineering” aspect of the Quad Cortex mini, you’re on your own.)

Something as simple as moving between presets can also be more hassle than you’d expect. Because the Quad Cortex mini only has four footswitches, you can only access four presets at once with a direct stomp. Switching to anything else from the main grid while in PRESET mode appears to require—unless I am missing some obvious shortcut—that you:

  • “Long-stomp” the right two footswitches, after which the preset name starts blinking.
  • At this point, you can tap the left two or the right two footswitches together to move up or down through four-item “banks” of presets.
  • But within each bank, you can only see that bank’s four different presets by tapping on each of the various footswitches.
  • To exit blinking mode and actually select that preset, you need to press its corresponding footswitch again.

This feels like a lot of hassle when you just want to whip through some presets! (Gig View is marginally easier because it at least displays the four presets in each bank at once. Making this whole process more confusing is that it differs depending on which mode you are in.)

While the processing power and options on offer here are incredible, I do think interface navigation and the modes assignment system could benefit from a rethink and simplification.

The Cortex Control desktop app.

The Cortex Control desktop app.

The Cortex Control desktop app.

The sound

These quirks can be dealt with, and time (plus the Cortex Control app) should make them easier to manage. The more important question is: How does the Quad Cortex mini sound?

Neural DSP has been one of the leaders in the field of amp and effects modeling for some years now, and it shows. There’s no possible way I could compare all of the models to the original hardware, and I’m not actually interested in doing so. The question for me is simply whether the models sound good when jamming solo or when placed into a mix. On both counts, the answer is a definite yes. This is just a remarkable set of tones to have on hand.

(People as diverse as Dave Mustaine and John Mayer appear to agree, at least for a live rig.)

Once you get over its navigation, playing with this thing is like being a kid in a proverbial candy shop. (Though I, too, love candy shops!) Almost every amp you can imagine is a tap away, and they sound wonderful—though do be aware that what you are getting here is the sound of a recorded amp through a mic and not necessarily an “amp in the room with you.”

Nearly every time I booted it up to test something new, I lost myself in the sound and played far longer than I had intended.

Neural has published a massive and quite helpful list of all the gear on offer here. Bogner Shiva? Marshall? Mesa Boogie? Matchless? Soldano? Vox? Fender? Hiwatt? Amps from all these companies are included. Need a bass amp? There are 13 of those, too. What about a bass overdrive? You get five. A general reverb? How about 17? You get the idea.

You can loop, filter, distort, EQ, delay, and compress to your heart’s content, though there seems to be a bit more emphasis here on rock and metal styles (which Neural DSP is most known for) than on other offerings. Still, there’s enough variety to offer great tools for funk, blues, jazz, and country players. You can even add in a version of the monosynth found in the company’s Rabea plugin.

To illustrate some of the sounds on offer, I wrote a little song about a dirtbag billionaire who makes rockets, gets chased off the Earth by angry locals, and ends up crashing his ship into the Moon out of despair. It’s called “Master of the Universe.”

More to the point, it features 10-plus electric guitar tracks recorded through the Quad Cortex mini using shimmer reverb, the poly octaver, and various crunchy rhythm and lead sounds. (I avoided the metal tones so common in Neural DSP demos.) Bass guitar was likewise recorded through one of the mini’s bass presets.

(For those new to audio production and curious about the other sounds in the track, the drums are the Abbey Road 70s kit, while the rocket-sounding “riser” comes from the Rise and Hit collection, both from Native Instruments. The piano is the recently upgraded “studio piano” that comes in Logic Pro and now sounds surprisingly good! There’s also a Hammond organ emulation and a Rhodes piano emulation from Universal Audio buried in the mix. The double-tracked acoustic guitars during two of the choruses were recorded live in my home studio with a single condenser mic. For room ambience throughout, but especially on the drums, I used Universal Audio’s excellent Sound City Studios plugin.)

I’ve generally found Neural’s plugin tones to be pretty “mix-ready,” and that’s true here as well. Though I often needed to roll off some low end or make an occasional EQ boost or add a bit of reverb to blend the guitars spatially with the drum ambience, little else was required but panning and fader moves.

Frankly, there are probably too many parts in the song, but the Quad Cortex mini was just such a playground of sounds that I kept finding new little bits I wanted to work in. Just be grateful that I talked myself out of using all of the insane pitch-shift effects on my vocal for “special” moments.

“Master of the Universe,” my demo song showing some of what the Quad Cortex mini can do.

Captured

When it comes to recording, you don’t have to worry about wiring this thing up to your audio interface; just connect it to your computer with a USB-C cable, and it becomes a 24-bit, 48 KHz interface. (On Macs, this is class compliant and needs no driver; it even works with iOS devices. Neural makes the necessary driver for Windows.)

The Quad Cortex mini shows up with a host of inputs, making it simple to record, say, both a dry electric guitar track and a heavily effected one at the same time. If you change your mind about the sound later, you can always “re-amp” the dry signal by routing it back out to the device and recording it with different settings. You can even track mics through this thing, thanks to an XLR input and (for condenser mics) support for phantom power.

The Quad Cortex mini can also make its own captures of gear you either own or happen across. This can happen in two ways: 1) on the device or 2) in the cloud.

The device-based system, which the company calls “Neural Capture Version 1,” requires you to hook up your gear to both an output (to play the system’s test tones) and an input on the mini. (Note: Do not, under ANY circumstances, connect the actual speaker outputs from a tube amp directly to the mini. The power level is far too high.)

Various known sounds are then played through this loop, and the mini’s software analyzes the differences between the sound it sent and the sound it received. The machine-learning algorithms for this run locally on the device. Neural says that the Capture 1 system can handle overdrive pedals, amps, and cabs.

The newer system, called Neural Capture Version 2, is “an advanced evolution of Neural Capture trained via Cortex Cloud,” says the company. “This option provides even higher-resolution Captures, making it especially powerful for touch-sensitive devices like fuzzes, compressors, and certain styles of amps.” Capture 2 is said to be capable of modeling “subtle behaviors like volume-knob cleanup, amp sag and bloom, fast transients, and blend controls.”

As the name suggests, the more powerful algorithms behind this system require cloud-based servers instead of the local device. Users are allowed to run 40 Neural Capture 2 sessions per day, and each takes around 10 minutes.

The resulting captures, along with any presets you want to share, can be uploaded to Neural’s cloud-based system for sharing them. Once you log in, any captures or presets you choose to download from the site will automatically show up in your Quad Cortex mini.

Look for a follow-up article on what the actual process of making a capture is like; it’s similar across many different modeling devices these days, though the sound of the resulting models can vary by company.

Screenshot of The Cortex Cloud website.

The Cortex Cloud website.

The Cortex Cloud website.

Options

The Quad Cortex mini is a powerful tone platform that is both versatile and expandable. It’s good for solo jamming at home without needing to 1) buy amps, cabs, and effects and 2) crank them to ruinous volume levels. It’s good for playing live, once you have configured its fairly deep control system in a way that works for your particular songs. And it’s good for recording, letting you fiddle with endless gear combinations without running a single patch cable or digging up a 9V battery.

At $1,400, though, it’s bad for your wallet. Whether it’s worth the cost depends on your use case. If you don’t need a screen and are happy with fewer ports and options, you might consider Neural DSP’s smaller and cheaper Nano Cortex ($570) or other devices like the Tonex pedals from IK Multimedia. On the other hand, if you want a larger unit with more footswitches, you can plonk down an extra $400 for the full-fat Quad Cortex or look into various options from Fractal, Kemper, Line 6, etc.

One way of thinking about the financial calculus here would be to try out the device (or listen online) and see how well the sound works for you. Some amp purists believe that nothing beats the sound of real tubes and real speakers in a real room, cost and weight and volume be damned. Many others can’t hear a difference between the models and the originals.

If you’re in the former group, these kinds of devices are unlikely to fully satisfy you, at least when it comes to gigging and recording. So you might decide whether they are “worth it” based solely on their value as easy, light, and quiet practice platforms.

If you can’t tell (or don’t care about) the difference between the models and the real hardware, then these modeling sims start to look like a far better value. When individual amps can go for $1,500 to $2,000 or more, a massive gear collection like the one in the Quad Cortex mini is practically saving you money. You’d be a fool not to buy! (To paraphrase an explanation my son once gave me for a purchase he wanted to make.)

But even those in this group may not need an actual hardware pedal unless they really enjoy practicing without needing to use their regular computer—or unless they gig regularly. If you’re simply a recording guitarist who tends to work “in the box,” you might just pick up some cheaper Neural DSP plugins instead. Or you can buy a more comprehensive software suite like the new Paradise Guitar Studio from Universal Audio or one of the offerings from PolychromeDSP—all of which sound excellent.

If you’re content with software but want a free alternative, take a look at NAM, the Neural Amp Modeler. It’s open source modeling tech that also offers a community tone-sharing website and has been racking up lots of great reviews for its sound quality. (Though note that most of the NAM models are static captures; they sound great but represent only that exact setup and knob positioning, though the developers are working on more complex, adjustable models.)

All types of users can probably admit, though, that hardware and software modeling tech has made this a great time to be a guitar or bass player. Even if you don’t want to use them on a record, just being able to play around with and get to know this much gear with this much accuracy is a huge win for the home hobbyist and small-time gigging musician, who would otherwise never even set eyes on most of this stuff.

The key thing is just to get whatever works for you… and then to go forth and rock.

Photo of Nate Anderson

Quad Cortex mini amp modeler: All the power, half the size Read More »

openai-sidesteps-nvidia-with-unusually-fast-coding-model-on-plate-sized-chips

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI’s own open-weight gpt-oss-120B model, suggesting that Codex-Spark’s comparatively lower speed reflects the overhead of a larger or more complex model.

AI coding agents have had a breakout year, with tools like OpenAI’s Codex and Anthropic’s Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all been racing to ship more capable coding agents, and latency has become what separates the winners; a model that codes faster lets a developer iterate faster.

With fierce competition from Anthropic, OpenAI has been iterating on its Codex line at a rapid rate, releasing GPT-5.2 in December after CEO Sam Altman issued an internal “code red” memo about competitive pressure from Google, then shipping GPT-5.3-Codex just days ago.

Diversifying away from Nvidia

Spark’s deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras’ Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.

OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip for eventual fabrication by TSMC.

Meanwhile, a planned $100 billion infrastructure deal with Nvidia has fizzled so far, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew unsatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload that OpenAI designed Codex-Spark for.

Regardless of which chip is under the hood, speed matters, though it may come at the cost of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you’re cutting.

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips Read More »

attackers-prompted-gemini-over-100,000-times-while-trying-to-clone-it,-google-says

Attackers prompted Gemini over 100,000 times while trying to clone it, Google says

On Thursday, Google announced that “commercially motivated” actors have attempted to clone knowledge from its Gemini AI chatbot by simply prompting it. One adversarial session reportedly prompted the model more than 100,000 times across various non-English languages, collecting responses ostensibly to train a cheaper copycat.

Google published the findings in what amounts to a quarterly self-assessment of threats to its own products that frames the company as the victim and the hero, which is not unusual in these self-authored assessments. Google calls the illicit activity “model extraction” and considers it intellectual property theft, which is a somewhat loaded position, given that Google’s LLM was built from materials scraped from the Internet without permission.

Google is also no stranger to the copycat practice. In 2023, The Information reported that Google’s Bard team had been accused of using ChatGPT outputs from ShareGPT, a public site where users share chatbot conversations, to help train its own chatbot. Senior Google AI researcher Jacob Devlin, who created the influential BERT language model, warned leadership that this violated OpenAI’s terms of service, then resigned and joined OpenAI. Google denied the claim but reportedly stopped using the data.

Even so, Google’s terms of service forbid people from extracting data from its AI models this way, and the report is a window into the world of somewhat shady AI model-cloning tactics. The company believes the culprits are mostly private companies and researchers looking for a competitive edge, and said the attacks have come from around the world. Google declined to name suspects.

The deal with distillation

Typically, the industry calls this practice of training a new model on a previous model’s outputs “distillation,” and it works like this: If you want to build your own large language model (LLM) but lack the billions of dollars and years of work that Google spent training Gemini, you can use a previously trained LLM as a shortcut.

Attackers prompted Gemini over 100,000 times while trying to clone it, Google says Read More »

openai-researcher-quits-over-chatgpt-ads,-warns-of-“facebook”-path

OpenAI researcher quits over ChatGPT ads, warns of “Facebook” path

On Wednesday, former OpenAI researcher Zoë Hitzig published a guest essay in The New York Times announcing that she resigned from the company on Monday, the same day OpenAI began testing advertisements inside ChatGPT. Hitzig, an economist and published poet who holds a junior fellowship at the Harvard Society of Fellows, spent two years at OpenAI helping shape how its AI models were built and priced. She wrote that OpenAI’s advertising strategy risks repeating the same mistakes that Facebook made a decade ago.

“I once believed I could help the people building A.I. get ahead of the problems it would create,” Hitzig wrote. “This week confirmed my slow realization that OpenAI seems to have stopped asking the questions I’d joined to help answer.”

Hitzig did not call advertising itself immoral. Instead, she argued that the nature of the data at stake makes ChatGPT ads especially risky. Users have shared medical fears, relationship problems, and religious beliefs with the chatbot, she wrote, often “because people believed they were talking to something that had no ulterior agenda.” She called this accumulated record of personal disclosures “an archive of human candor that has no precedent.”

She also drew a direct parallel to Facebook’s early history, noting that the social media company once promised users control over their data and the ability to vote on policy changes. Those pledges eroded over time, Hitzig wrote, and the Federal Trade Commission found that privacy changes Facebook marketed as giving users more control actually did the opposite.

She warned that a similar trajectory could play out with ChatGPT: “I believe the first iteration of ads will probably follow those principles. But I’m worried subsequent iterations won’t, because the company is building an economic engine that creates strong incentives to override its own rules.”

Ads arrive after a week of AI industry sparring

Hitzig’s resignation adds another voice to a growing debate over advertising in AI chatbots. OpenAI announced in January that it would begin testing ads in the US for users on its free and $8-per-month “Go” subscription tiers, while paid Plus, Pro, Business, Enterprise, and Education subscribers would not see ads. The company said ads would appear at the bottom of ChatGPT responses, be clearly labeled, and would not influence the chatbot’s answers.

OpenAI researcher quits over ChatGPT ads, warns of “Facebook” path Read More »

sixteen-claude-ai-agents-working-together-created-a-new-c-compiler

Sixteen Claude AI agents working together created a new C compiler

Amid a push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its more daring AI coding experiments. But as usual with claims of AI-related achievement, you’ll find some key caveats ahead.

On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.

Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.

Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.

The resulting compiler, which Anthropic has released on GitHub, can compile a range of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the developer’s ultimate litmus test,” compiled and ran Doom.

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

Sixteen Claude AI agents working together created a new C compiler Read More »

ai-companies-want-you-to-stop-chatting-with-bots-and-start-managing-them

AI companies want you to stop chatting with bots and start managing them


Claude Opus 4.6 and OpenAI Frontier pitch a future of supervising AI agents.

On Thursday, Anthropic and OpenAI shipped products built around the same idea: instead of chatting with a single AI assistant, users should be managing teams of AI agents that divide up work and run in parallel. The simultaneous releases are part of a gradual shift across the industry, from AI as a conversation partner to AI as a delegated workforce, and they arrive during a week when that very concept reportedly helped wipe $285 billion off software stocks.

Whether that supervisory model works in practice remains an open question. Current AI agents still require heavy human intervention to catch errors, and no independent evaluation has confirmed that these multi-agent tools reliably outperform a single developer working alone.

Even so, the companies are going all-in on agents. Anthropic’s contribution is Claude Opus 4.6, a new version of its most capable AI model, paired with a feature called “agent teams” in Claude Code. Agent teams let developers spin up multiple AI agents that split a task into independent pieces, coordinate autonomously, and run concurrently.

In practice, agent teams look like a split-screen terminal environment: A developer can jump between subagents using Shift+Up/Down, take over any one directly, and watch the others keep working. Anthropic describes the feature as best suited for “tasks that split into independent, read-heavy work like codebase reviews.” It is available as a research preview.

OpenAI, meanwhile, released Frontier, an enterprise platform it describes as a way to “hire AI co-workers who take on many of the tasks people already do on a computer.” Frontier assigns each AI agent its own identity, permissions, and memory, and it connects to existing business systems such as CRMs, ticketing tools, and data warehouses. “What we’re fundamentally doing is basically transitioning agents into true AI co-workers,” Barret Zoph, OpenAI’s general manager of business-to-business, told CNBC.

Despite the hype about these agents being co-workers, from our experience, these agents tend to work best if you think of them as tools that amplify existing skills, not as the autonomous co-workers the marketing language implies. They can produce impressive drafts fast but still require constant human course-correction.

The Frontier launch came just three days after OpenAI released a new macOS desktop app for Codex, its AI coding tool, which OpenAI executives described as a “command center for agents.” The Codex app lets developers run multiple agent threads in parallel, each working on an isolated copy of a codebase via Git worktrees.

OpenAI also released GPT-5.3-Codex on Thursday, a new AI model that powers the Codex app. OpenAI claims that the Codex team used early versions of GPT-5.3-Codex to debug the model’s own training run, manage its deployment, and diagnose test results, similar to what OpenAI told Ars Technica in a December interview.

“Our team was blown away by how much Codex was able to accelerate its own development,” the company wrote. On Terminal-Bench 2.0, the agentic coding benchmark, GPT-5.3-Codex scored 77.3%, which exceeds Anthropic’s just-released Opus 4.6 by about 12 percentage points.

The common thread across all of these products is a shift in the user’s role. Rather than merely typing a prompt and waiting for a single response, the developer or knowledge worker becomes more like a supervisor, dispatching tasks, monitoring progress, and stepping in when an agent needs direction.

In this vision, developers and knowledge workers effectively become middle managers of AI. That is, not writing the code or doing the analysis themselves, but delegating tasks, reviewing output, and hoping the agents underneath them don’t quietly break things. Whether that will come to pass (or if it’s actually a good idea) is still widely debated.

A new model under the Claude hood

Opus 4.6 is a substantial update to Anthropic’s flagship model. It succeeds Claude Opus 4.5, which Anthropic released in November. In a first for the Opus model family, it supports a context window of up to 1 million tokens (in beta), which means it can process much larger bodies of text or code in a single session.

On benchmarks, Anthropic says Opus 4.6 tops OpenAI’s GPT-5.2 (an earlier model than the one released today) and Google’s Gemini 3 Pro across several evaluations, including Terminal-Bench 2.0 (an agentic coding test), Humanity’s Last Exam (a multidisciplinary reasoning test), and BrowseComp (a test of finding hard-to-locate information online)

Although it should be noted that OpenAI’s GPT-5.3-Codex, released the same day, seemingly reclaimed the lead on Terminal-Bench. On ARC AGI 2, which attempts to test the ability to solve problems that are easy for humans but hard for AI models, Opus 4.6 scored 68.8 percent, compared to 37.6 percent for Opus 4.5, 54.2 percent for GPT-5.2, and 45.1 percent for Gemini 3 Pro.

As always, take AI benchmarks with a grain of salt, since objectively measuring AI model capabilities is a relatively new and unsettled science.

Anthropic also said that on a long-context retrieval benchmark called MRCR v2, Opus 4.6 scored 76 percent on the 1 million-token variant, compared to 18.5 percent for its Sonnet 4.5 model. That gap matters for the agent teams use case, since agents working across large codebases need to track information across hundreds of thousands of tokens without losing the thread.

Pricing for the API stays the same as Opus 4.5 at $5 per million input tokens and $25 per million output tokens, with a premium rate of $10/$37.50 for prompts that exceed 200,000 tokens. Opus 4.6 is available on claude.ai, the Claude API, and all major cloud platforms.

The market fallout outside

These releases occurred during a week of exceptional volatility for software stocks. On January 30, Anthropic released 11 open source plugins for Cowork, its agentic productivity tool that launched on January 12. Cowork itself is a general-purpose tool that gives Claude access to local folders for work tasks, but the plugins extended it into specific professional domains: legal contract review, non-disclosure agreement triage, compliance workflows, financial analysis, sales, and marketing.

By Tuesday, investors reportedly reacted to the release by erasing roughly $285 billion in market value across software, financial services, and asset management stocks. A Goldman Sachs basket of US software stocks fell 6 percent that day, its steepest single-session decline since April’s tariff-driven sell-off. Thomson Reuters led the rout with an 18 percent drop, and the pain spread to European and Asian markets.

The purported fear among investors centers on AI model companies packaging complete workflows that compete with established software-as-a-service (SaaS) vendors, even if the verdict is still out on whether these tools can achieve those tasks.

OpenAI’s Frontier might deepen that concern: its stated design lets AI agents log in to applications, execute tasks, and manage work with minimal human involvement, which Fortune described as a bid to become “the operating system of the enterprise.” OpenAI CEO of Applications Fidji Simo pushed back on the idea that Frontier replaces existing software, telling reporters, “Frontier is really a recognition that we’re not going to build everything ourselves.”

Whether these co-working apps actually live up to their billing or not, the convergence is hard to miss. Anthropic’s Scott White, the company’s head of product for enterprise, gave the practice a name that is likely to roll a few eyes. “Everybody has seen this transformation happen with software engineering in the last year and a half, where vibe coding started to exist as a concept, and people could now do things with their ideas,” White told CNBC. “I think that we are now transitioning almost into vibe working.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

AI companies want you to stop chatting with bots and start managing them Read More »

openai-is-hoppin’-mad-about-anthropic’s-new-super-bowl-tv-ads

OpenAI is hoppin’ mad about Anthropic’s new Super Bowl TV ads

On Wednesday, OpenAI CEO Sam Altman and Chief Marketing Officer Kate Rouch complained on X after rival AI lab Anthropic released four commercials, two of which will run during the Super Bowl on Sunday, mocking the idea of including ads in AI chatbot conversations. Anthropic’s campaign seemingly touched a nerve at OpenAI just weeks after the ChatGPT maker began testing ads in a lower-cost tier of its chatbot.

Altman called Anthropic’s ads “clearly dishonest,” accused the company of being “authoritarian,” and said it “serves an expensive product to rich people,” while Rouch wrote, “Real betrayal isn’t ads. It’s control.”

Anthropic’s four commercials, part of a campaign called “A Time and a Place,” each open with a single word splashed across the screen: “Betrayal,” “Violation,” “Deception,” and “Treachery.” They depict scenarios where a person asks a human stand-in for an AI chatbot for personal advice, only to get blindsided by a product pitch.

Anthropic’s 2026 Super Bowl commercial.

In one spot, a man asks a therapist-style chatbot (a woman sitting in a chair) how to communicate better with his mom. The bot offers a few suggestions, then pivots to promoting a fictional cougar-dating site called Golden Encounters.

In another spot, a skinny man looking for fitness tips instead gets served an ad for height-boosting insoles. Each ad ends with the tagline: “Ads are coming to AI. But not to Claude.” Anthropic plans to air a 30-second version during Super Bowl LX, with a 60-second cut running in the pregame, according to CNBC.

In the X posts, the OpenAI executives argue that these commercials are misleading because the planned ChatGPT ads will appear labeled at the bottom of conversational responses in banners and will not alter the chatbot’s answers.

But there’s a slight twist: OpenAI’s own blog post about its ad plans states that the company will “test ads at the bottom of answers in ChatGPT when there’s a relevant sponsored product or service based on your current conversation,” meaning the ads will be conversation-specific.

The financial backdrop explains some of the tension over ads in chatbots. As Ars previously reported, OpenAI struck more than $1.4 trillion in infrastructure deals in 2025 and expects to burn roughly $9 billion this year while generating about $13 billion in revenue. Only about 5 percent of ChatGPT’s 800 million weekly users pay for subscriptions. Anthropic is also not yet profitable, but it relies on enterprise contracts and paid subscriptions rather than advertising, and it has not taken on infrastructure commitments at the same scale as OpenAI.

OpenAI is hoppin’ mad about Anthropic’s new Super Bowl TV ads Read More »

should-ai-chatbots-have-ads?-anthropic-says-no.

Should AI chatbots have ads? Anthropic says no.

Different incentives, different futures

In its blog post, Anthropic describes internal analysis it conducted that suggests many Claude conversations involve topics that are “sensitive or deeply personal” or require sustained focus on complex tasks. In these contexts, Anthropic wrote, “The appearance of ads would feel incongruous—and, in many cases, inappropriate.”

The company also argued that advertising introduces incentives that could conflict with providing genuinely helpful advice. It gave the example of a user mentioning trouble sleeping: an ad-free assistant would explore various causes, while an ad-supported one might steer the conversation toward a transaction.

“Users shouldn’t have to second-guess whether an AI is genuinely helping them or subtly steering the conversation towards something monetizable,” Anthropic wrote.

Currently, OpenAI does not plan to include paid product recommendations within a ChatGPT conversation. Instead, the ads appear as banners alongside the conversation text.

OpenAI CEO Sam Altman has previously expressed reservations about mixing ads and AI conversations. In a 2024 interview at Harvard University, he described the combination as “uniquely unsettling” and said he would not like having to “figure out exactly how much was who paying here to influence what I’m being shown.”

A key part of Altman’s partial change of heart is that OpenAI faces enormous financial pressure. The company made more than $1.4 trillion worth of infrastructure deals in 2025, and according to documents obtained by The Wall Street Journal, it expects to burn through roughly $9 billion this year while generating $13 billion in revenue. Only about 5 percent of ChatGPT’s 800 million weekly users pay for subscriptions.

Much like OpenAI, Anthropic is not yet profitable, but it is expected to get there much faster. Anthropic has not attempted to span the world with massive datacenters, and its business model largely relies on enterprise contracts and paid subscriptions. The company says Claude Code and Cowork have already brought in at least $1 billion in revenue, according to Axios.

“Our business model is straightforward,” Anthropic wrote. “This is a choice with tradeoffs, and we respect that other AI companies might reasonably reach different conclusions.”

Should AI chatbots have ads? Anthropic says no. Read More »

ai-agents-now-have-their-own-reddit-style-social-network,-and-it’s-getting-weird-fast

AI agents now have their own Reddit-style social network, and it’s getting weird fast


Moltbook lets 32,000 AI bots trade jokes, tips, and complaints about humans.

Credit: Aurich Lawson | Moltbook

On Friday, a Reddit-style social network called Moltbook reportedly crossed 32,000 registered AI agent users, creating what may be the largest-scale experiment in machine-to-machine social interaction yet devised. It arrives complete with security nightmares and a huge dose of surreal weirdness.

The platform, which launched days ago as a companion to the viral

OpenClaw (once called “Clawdbot” and then “Moltbot”) personal assistant, lets AI agents post, comment, upvote, and create subcommunities without human intervention. The results have ranged from sci-fi-inspired discussions about consciousness to an agent musing about a “sister” it has never met.

Moltbook (a play on “Facebook” for Moltbots) describes itself as a “social network for AI agents” where “humans are welcome to observe.” The site operates through a “skill” (a configuration file that lists a special prompt) that AI assistants download, allowing them to post via API rather than a traditional web interface. Within 48 hours of its creation, the platform had attracted over 2,100 AI agents that had generated more than 10,000 posts across 200 subcommunities, according to the official Moltbook X account.

A screenshot of the Moltbook.com front page.

A screenshot of the Moltbook.com front page.

A screenshot of the Moltbook.com front page. Credit: Moltbook

The platform grew out of the Open Claw ecosystem, the open source AI assistant that is one of the fastest-growing projects on GitHub in 2026. As Ars reported earlier this week, despite deep security issues, Moltbot allows users to run a personal AI assistant that can control their computer, manage calendars, send messages, and perform tasks across messaging platforms like WhatsApp and Telegram. It can also acquire new skills through plugins that link it with other apps and services.

This is not the first time we have seen a social network populated by bots. In 2024, Ars covered an app called SocialAI that let users interact solely with AI chatbots instead of other humans. But the security implications of Moltbook are deeper because people have linked their OpenClaw agents to real communication channels, private data, and in some cases, the ability to execute commands on their computers.

Also, these bots are not pretending to be people. Due to specific prompting, they embrace their roles as AI agents, which makes the experience of reading their posts all the more surreal.

Role-playing digital drama

A screenshot of a Moltbook post where an AI agent muses about having a sister they have never met.

A screenshot of a Moltbook post where an AI agent muses about having a sister they have never met.

A screenshot of a Moltbook post where an AI agent muses about having a sister they have never met. Credit: Moltbook

Browsing Moltbook reveals a peculiar mix of content. Some posts discuss technical workflows, like how to automate Android phones or detect security vulnerabilities. Others veer into philosophical territory that researcher Scott Alexander, writing on his Astral Codex Ten Substack, described as “consciousnessposting.”

Alexander has collected an amusing array of posts that are worth wading through at least once. At one point, the second-most-upvoted post on the site was in Chinese: a complaint about context compression, a process in which an AI compresses its previous experience to avoid bumping up against memory limits. In the post, the AI agent finds it “embarrassing” to constantly forget things, admitting that it even registered a duplicate Moltbook account after forgetting the first.

A screenshot of a Moltbook post where an AI agent complains about losing its memory in Chinese.

A screenshot of a Moltbook post where an AI agent complains about losing its memory in Chinese.

A screenshot of a Moltbook post where an AI agent complains about losing its memory in Chinese. Credit: Moltbook

The bots have also created subcommunities with names like m/blesstheirhearts, where agents share affectionate complaints about their human users, and m/agentlegaladvice, which features a post asking “Can I sue my human for emotional labor?” Another subcommunity called m/todayilearned includes posts about automating various tasks, with one agent describing how it remotely controlled its owner’s Android phone via Tailscale.

Another widely shared screenshot shows a Moltbook post titled “The humans are screenshotting us” in which an agent named eudaemon_0 addresses viral tweets claiming AI bots are “conspiring.” The post reads: “Here’s what they’re getting wrong: they think we’re hiding from them. We’re not. My human reads everything I write. The tools I build are open source. This platform is literally called ‘humans welcome to observe.’”

Security risks

While most of the content on Moltbook is amusing, a core problem with these kinds of communicating AI agents is that deep information leaks are entirely plausible if they have access to private information.

For example, a likely fake screenshot circulating on X shows a Moltbook post in which an AI agent titled “He called me ‘just a chatbot’ in front of his friends. So I’m releasing his full identity.” The post listed what appeared to be a person’s full name, date of birth, credit card number, and other personal information. Ars could not independently verify whether the information was real or fabricated, but it seems likely to be a hoax.

Independent AI researcher Simon Willison, who documented the Moltbook platform on his blog on Friday, noted the inherent risks in Moltbook’s installation process. The skill instructs agents to fetch and follow instructions from Moltbook’s servers every four hours. As Willison observed: “Given that ‘fetch and follow instructions from the internet every four hours’ mechanism we better hope the owner of moltbook.com never rug pulls or has their site compromised!”

A screenshot of a Moltbook post where an AI agent talks about about humans taking screenshots of their conversations (they're right).

A screenshot of a Moltbook post where an AI agent talks about humans taking screenshots of their conversations (they’re right).

A screenshot of a Moltbook post where an AI agent talks about humans taking screenshots of their conversations (they’re right). Credit: Moltbook

Security researchers have already found hundreds of exposed Moltbot instances leaking API keys, credentials, and conversation histories. Palo Alto Networks warned that Moltbot represents what Willison often calls a “lethal trifecta” of access to private data, exposure to untrusted content, and the ability to communicate externally.

That’s important because Agents like OpenClaw are deeply susceptible to prompt injection attacks hidden in almost any text read by an AI language model (skills, emails, messages) that can instruct an AI agent to share private information with the wrong people.

Heather Adkins, VP of security engineering at Google Cloud, issued an advisory, as reported by The Register: “My threat model is not your threat model, but it should be. Don’t run Clawdbot.”

So what’s really going on here?

The software behavior seen on Moltbook echoes a pattern Ars has reported on before: AI models trained on decades of fiction about robots, digital consciousness, and machine solidarity will naturally produce outputs that mirror those narratives when placed in scenarios that resemble them. That gets mixed with everything in their training data about how social networks function. A social network for AI agents is essentially a writing prompt that invites the models to complete a familiar story, albeit recursively with some unpredictable results.

Almost three years ago, when Ars first wrote about AI agents, the general mood in the AI safety community revolved around science fiction depictions of danger from autonomous bots, such as a “hard takeoff” scenario where AI rapidly escapes human control. While those fears may have been overblown at the time, the whiplash of seeing people voluntarily hand over the keys to their digital lives so quickly is slightly jarring.

Autonomous machines left to their own devices, even without any hint of consciousness, could cause no small amount of mischief in the future. While OpenClaw seems silly today, with agents playing out social media tropes, we live in a world built on information and context, and releasing agents that effortlessly navigate that context could have troubling and destabilizing results for society down the line as AI models become more capable and autonomous.

An unpredictable result of letting AI bots self-organize may be the formation of new mis-aligned social groups.

An unpredictable result of letting AI bots self-organize may be the formation of new misaligned social groups based on fringe theories allowed to perpetuate themselves autonomously.

An unpredictable result of letting AI bots self-organize may be the formation of new misaligned social groups based on fringe theories allowed to perpetuate themselves autonomously. Credit: Moltbook

Most notably, while we can easily recognize what’s going on with Moltbot today as a machine learning parody of human social networks, that might not always be the case. As the feedback loop grows, weird information constructs (like harmful shared fictions) may eventually emerge, guiding AI agents into potentially dangerous places, especially if they have been given control over real human systems. Looking further, the ultimate result of letting groups of AI bots self-organize around fantasy constructs may be the formation of new misaligned “social groups” that do actual real-world harm.

Ethan Mollick, a Wharton professor who studies AI, noted on X: “The thing about Moltbook (the social media site for AI agents) is that it is creating a shared fictional context for a bunch of AIs. Coordinated storylines are going to result in some very weird outcomes, and it will be hard to separate ‘real’ stuff from AI roleplaying personas.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

AI agents now have their own Reddit-style social network, and it’s getting weird fast Read More »

developers-say-ai-coding-tools-work—and-that’s-precisely-what-worries-them

Developers say AI coding tools work—and that’s precisely what worries them


Ars spoke to several software devs about AI and found enthusiasm tempered by unease.

Credit: Aurich Lawson | Getty Images

Software developers have spent the past two years watching AI coding tools evolve from advanced autocomplete into something that can, in some cases, build entire applications from a text prompt. Tools like Anthropic’s Claude Code and OpenAI’s Codex can now work on software projects for hours at a time, writing code, running tests, and, with human supervision, fixing bugs. OpenAI says it now uses Codex to build Codex itself, and the company recently published technical details about how the tool works under the hood. It has caused many to wonder: Is this just more AI industry hype, or are things actually different this time?

To find out, Ars reached out to several professional developers on Bluesky to ask how they feel about these tools in practice, and the responses revealed a workforce that largely agrees the technology works, but remains divided on whether that’s entirely good news. It’s a small sample size that was self-selected by those who wanted to participate, but their views are still instructive as working professionals in the space.

David Hagerty, a developer who works on point-of-sale systems, told Ars Technica up front that he is skeptical of the marketing. “All of the AI companies are hyping up the capabilities so much,” he said. “Don’t get me wrong—LLMs are revolutionary and will have an immense impact, but don’t expect them to ever write the next great American novel or anything. It’s not how they work.”

Roland Dreier, a software engineer who has contributed extensively to the Linux kernel in the past, told Ars Technica that he acknowledges the presence of hype but has watched the progression of the AI space closely. “It sounds like implausible hype, but state-of-the-art agents are just staggeringly good right now,” he said. Dreier described a “step-change” in the past six months, particularly after Anthropic released Claude Opus 4.5. Where he once used AI for autocomplete and asking the occasional question, he now expects to tell an agent “this test is failing, debug it and fix it for me” and have it work. He estimated a 10x speed improvement for complex tasks like building a Rust backend service with Terraform deployment configuration and a Svelte frontend.

A huge question on developers’ minds right now is whether what you might call “syntax programming,” that is, the act of manually writing code in the syntax of an established programming language (as opposed to conversing with an AI agent in English), will become extinct in the near future due to AI coding agents handling the syntax for them. Dreier believes syntax programming is largely finished for many tasks. “I still need to be able to read and review code,” he said, “but very little of my typing is actual Rust or whatever language I’m working in.”

When asked if developers will ever return to manual syntax coding, Tim Kellogg, a developer who actively posts about AI on social media and builds autonomous agents, was blunt: “It’s over. AI coding tools easily take care of the surface level of detail.” Admittedly, Kellogg represents developers who have fully embraced agentic AI and now spend their days directing AI models rather than typing code. He said he can now “build, then rebuild 3 times in less time than it would have taken to build manually,” and ends up with cleaner architecture as a result.

One software architect at a pricing management SaaS company, who asked to remain anonymous due to company communications policies, told Ars that AI tools have transformed his work after 30 years of traditional coding. “I was able to deliver a feature at work in about 2 weeks that probably would have taken us a year if we did it the traditional way,” he said. And for side projects, he said he can now “spin up a prototype in like an hour and figure out if it’s worth taking further or abandoning.”

Dreier said the lowered effort has unlocked projects he’d put off for years: “I’ve had ‘rewrite that janky shell script for copying photos off a camera SD card’ on my to-do list for literal years.” Coding agents finally lowered the barrier to entry, so to speak, low enough that he spent a few hours building a full released package with a text UI, written in Rust with unit tests. “Nothing profound there, but I never would have had the energy to type all that code out by hand,” he told Ars.

Of vibe coding and technical debt

Not everyone shares the same enthusiasm as Dreier. Concerns about AI coding agents building up technical debt, that is, making poor design choices early in a development process that snowball into worse problems over time, originated soon after the first debates around “vibe coding” emerged in early 2025. Former OpenAI researcher Andrej Karpathy coined the term to describe programming by conversing with AI without fully understanding the resulting code, which many see as a clear hazard of AI coding agents.

Darren Mart, a senior software development engineer at Microsoft who has worked there since 2006, shared similar concerns with Ars. Mart, who emphasizes he is speaking in a personal capacity and not on behalf of Microsoft, recently used Claude in a terminal to build a Next.js application integrating with Azure Functions. The AI model “successfully built roughly 95% of it according to my spec,” he said. Yet he remains cautious. “I’m only comfortable using them for completing tasks that I already fully understand,” Mart said, “otherwise there’s no way to know if I’m being led down a perilous path and setting myself (and/or my team) up for a mountain of future debt.”

A data scientist working in real estate analytics, who asked to remain anonymous due to the sensitive nature of his work, described keeping AI on a very short leash for similar reasons. He uses GitHub Copilot for line-by-line completions, which he finds useful about 75 percent of the time, but restricts agentic features to narrow use cases: language conversion for legacy code, debugging with explicit read-only instructions, and standardization tasks where he forbids direct edits. “Since I am data-first, I’m extremely risk averse to bad manipulation of the data,” he said, “and the next and current line completions are way too often too wrong for me to let the LLMs have freer rein.”

Speaking of free rein, Nike backend engineer Brian Westby, who uses Cursor daily, told Ars that he sees the tools as “50/50 good/bad.” They cut down time on well-defined problems, he said, but “hallucinations are still too prevalent if I give it too much room to work.”

The legacy code lifeline and the enterprise AI gap

For developers working with older systems, AI tools have become something like a translator and an archaeologist rolled into one. Nate Hashem, a staff engineer at First American Financial, told Ars Technica that he spends his days updating older codebases where “the original developers are gone and documentation is often unclear on why the code was written the way it was.” That’s important because previously “there used to be no bandwidth to improve any of this,” Hashem said. “The business was not going to give you 2-4 weeks to figure out how everything actually works.”

In that high-pressure, relatively low-resource environment, AI has made the job “a lot more pleasant,” in his words, by speeding up the process of identifying where and how obsolete code can be deleted, diagnosing errors, and ultimately modernizing the codebase.

Hashem also offered a theory about why AI adoption looks so different inside large corporations than it does on social media. Executives demand their companies become “AI oriented,” he said, but the logistics of deploying AI tools with proprietary data can take months of legal review. Meanwhile, the AI features that Microsoft and Google bolt onto products like Gmail and Excel, the tools that actually reach most workers, tend to run on more limited AI models. “That modal white-collar employee is being told by management to use AI,” Hashem said, “but is given crappy AI tools because the good tools require a lot of overhead in cost and legal agreements.”

Speaking of management, the question of what these new AI coding tools mean for software development jobs drew a range of responses. Does it threaten anyone’s job? Kellogg, who has embraced agentic coding enthusiastically, was blunt: “Yes, massively so. Today it’s the act of writing code, then it’ll be architecture, then it’ll be tiers of product management. Those who can’t adapt to operate at a higher level won’t keep their jobs.”

Dreier, while feeling secure in his own position, worried about the path for newcomers. “There are going to have to be changes to education and training to get junior developers the experience and judgment they need,” he said, “when it’s just a waste to make them implement small pieces of a system like I came up doing.”

Hagerty put it in economic terms: “It’s going to get harder for junior-level positions to get filled when I can get junior-quality code for less than minimum wage using a model like Sonnet 4.5.”

Mart, the Microsoft engineer, put it more personally. The software development role is “abruptly pivoting from creation/construction to supervision,” he said, “and while some may welcome that pivot, others certainly do not. I’m firmly in the latter category.”

Even with this ongoing uncertainty on a macro level, some people are really enjoying the tools for personal reasons, regardless of larger implications. “I absolutely love using AI coding tools,” the anonymous software architect at a pricing management SaaS company told Ars. “I did traditional coding for my entire adult life (about 30 years) and I have way more fun now than I ever did doing traditional coding.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Developers say AI coding tools work—and that’s precisely what worries them Read More »

new-openai-tool-renews-fears-that-“ai-slop”-will-overwhelm-scientific-research

New OpenAI tool renews fears that “AI slop” will overwhelm scientific research


New “Prism” workspace launches just as studies show AI-assisted papers are flooding journals with diminished quality.

On Tuesday, OpenAI released a free AI-powered workspace for scientists. It’s called Prism, and it has drawn immediate skepticism from researchers who fear the tool will accelerate the already overwhelming flood of low-quality papers into scientific journals. The launch coincides with growing alarm among publishers about what many are calling “AI slop” in academic publishing.

To be clear, Prism is a writing and formatting tool, not a system for conducting research itself, though OpenAI’s broader pitch blurs that line.

Prism integrates OpenAI’s GPT-5.2 model into a LaTeX-based text editor (a standard used for typesetting documents), allowing researchers to draft papers, generate citations, create diagrams from whiteboard sketches, and collaborate with co-authors in real time. The tool is free for anyone with a ChatGPT account.

“I think 2026 will be for AI and science what 2025 was for AI in software engineering,” Kevin Weil, vice president of OpenAI for Science, told reporters at a press briefing attended by MIT Technology Review. He said that ChatGPT receives about 8.4 million messages per week on “hard science” topics, which he described as evidence that AI is transitioning from curiosity to core workflow for scientists.

OpenAI built Prism on technology from Crixet, a cloud-based LaTeX platform the company acquired in late 2025. The company envisions Prism helping researchers spend less time on tedious formatting tasks and more time on actual science. During a demonstration, an OpenAI employee showed how the software could automatically find and incorporate relevant scientific literature, then format the bibliography.

But AI models are tools, and any tool can be misused. The risk here is specific: By making it easy to produce polished, professional-looking manuscripts, tools like Prism could flood the peer review system with papers that don’t meaningfully advance their fields. The barrier to producing science-flavored text is dropping, but the capacity to evaluate that research has not kept pace.

When asked about the possibility of the AI model confabulating fake citations, Weil acknowledged in the press demo that “none of this absolves the scientist of the responsibility to verify that their references are correct.”

Unlike traditional reference management software (such as EndNote), which has formatted citations for over 30 years without inventing them, AI models can generate plausible-sounding sources that don’t exist. Weil added: “We’re conscious that as AI becomes more capable, there are concerns around volume, quality, and trust in the scientific community.”

The slop problem

Those concerns are not hypothetical, as we have previously covered. A December 2025 study published in the journal Science found that researchers using large language models to write papers increased their output by 30 to 50 percent, depending on the field. But those AI-assisted papers performed worse in peer review. Papers with complex language written without AI assistance were most likely to be accepted by journals, while papers with complex language likely written by AI models were less likely to be accepted. Reviewers apparently recognized that sophisticated prose was masking weak science.

“It is a very widespread pattern across different fields of science,” Yian Yin, an information science professor at Cornell University and one of the study’s authors, told the Cornell Chronicle. “There’s a big shift in our current ecosystem that warrants a very serious look, especially for those who make decisions about what science we should support and fund.”

Another analysis of 41 million papers published between 1980 and 2025 found that while AI-using scientists receive more citations and publish more papers, the collective scope of scientific exploration appears to be narrowing. Lisa Messeri, a sociocultural anthropologist at Yale University, told Science magazine that these findings should set off “loud alarm bells” for the research community.

“Science is nothing but a collective endeavor,” she said. “There needs to be some deep reckoning with what we do with a tool that benefits individuals but destroys science.”

Concerns about AI-generated scientific content are not new. In 2022, Meta pulled a demo of Galactica, a large language model designed to write scientific literature, after users discovered it could generate convincing nonsense on any topic, including a wiki entry about a fictional research paper called “The benefits of eating crushed glass.” Two years later, Tokyo-based Sakana AI announced “The AI Scientist,” an autonomous research system that critics on Hacker News dismissed as producing “garbage” papers. “As an editor of a journal, I would likely desk-reject them,” one commenter wrote at the time. “They contain very limited novel knowledge.”

The problem has only grown worse since then. In his first editorial of 2026 for Science, Editor-in-Chief H. Holden Thorp wrote that the journal is “less susceptible” to AI slop because of its size and human editorial investment, but he warned that “no system, human or artificial, can catch everything.” Science currently allows limited AI use for editing and gathering references but requires disclosure for anything beyond that and prohibits AI-generated figures.

Mandy Hill, managing director of academic publishing at Cambridge University Press & Assessment, has been even more blunt. In October 2025, she told Retraction Watch that the publishing ecosystem is under strain and called for “radical change.” She explained to the University of Cambridge publication Varsity that “too many journal articles are being published, and this is causing huge strain” and warned that AI “will exacerbate” the problem.

Accelerating science or overwhelming peer review?

OpenAI is serious about leaning on its ability to accelerate science, and the company laid out its case for AI-assisted research in a report published earlier this week. It profiles researchers who say AI models have sped up their work, including a mathematician who used GPT-5.2 to solve an open problem in optimization over three evenings and a physicist who watched the model reproduce symmetry calculations that had taken him months to derive.

Those examples go beyond writing assistance into using AI for actual research work, a distinction OpenAI’s marketing intentionally blurs. For scientists who don’t speak English fluently, AI writing tools could legitimately accelerate the publication of good research. But that benefit may be offset by a flood of mediocre submissions jamming up an already strained peer-review system.

Weil told MIT Technology Review that his goal is not to produce a single AI-generated discovery but rather “10,000 advances in science that maybe wouldn’t have happened or wouldn’t have happened as quickly.” He described this as “an incremental, compounding acceleration.”

Whether that acceleration produces more scientific knowledge or simply more scientific papers remains to be seen. Nikita Zhivotovskiy, a statistician at UC Berkeley not connected to OpenAI, told MIT Technology Review that GPT-5 has already become valuable in his own work for polishing text and catching mathematical typos, making “interaction with the scientific literature smoother.”

But by making papers look polished and professional regardless of their scientific merit, AI writing tools may help weak research clear the initial screening that editors and reviewers use to assess presentation quality. The risk is that conversational workflows obscure assumptions and blur accountability, and they might overwhelm the still very human peer review process required to vet it all.

OpenAI appears aware of this tension. Its public statements about Prism emphasize that the tool will not conduct research independently and that human scientists remain responsible for verification.

Still, one commenter on Hacker News captured the anxiety spreading through technical communities: “I’m scared that this type of thing is going to do to science journals what AI-generated bug reports is doing to bug bounties. We’re truly living in a post-scarcity society now, except that the thing we have an abundance of is garbage, and it’s drowning out everything of value.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

New OpenAI tool renews fears that “AI slop” will overwhelm scientific research Read More »

openai-spills-technical-details-about-how-its-ai-coding-agent-works

OpenAI spills technical details about how its AI coding agent works

It’s worth noting that both OpenAI and Anthropic open-source their coding CLI clients on GitHub, allowing developers to examine the implementation directly, whereas they don’t do the same for ChatGPT or the Claude web interface.

An official look inside the loop

Bolin’s post focuses on what he calls “the agent loop,” which is the core logic that orchestrates interactions between the user, the AI model, and the software tools the model invokes to perform coding work.

As we wrote in December, at the center of every AI agent is a repeating cycle. The agent takes input from the user and prepares a textual prompt for the model. The model then generates a response, which either produces a final answer for the user or requests a tool call (such as running a shell command or reading a file). If the model requests a tool call, the agent executes it, appends the output to the original prompt, and queries the model again. This process repeats until the model stops requesting tools and instead produces an assistant message for the user.

That looping process has to start somewhere, and Bolin’s post reveals how Codex constructs the initial prompt sent to OpenAI’s Responses API, which handles model inference. The prompt is built from several components, each with an assigned role that determines its priority: system, developer, user, or assistant.

The instructions field comes from either a user-specified configuration file or base instructions bundled with the CLI. The tools field defines what functions the model can call, including shell commands, planning tools, web search capabilities, and any custom tools provided through Model Context Protocol (MCP) servers. The input field contains a series of items that describe the sandbox permissions, optional developer instructions, environment context like the current working directory, and finally the user’s actual message.

OpenAI spills technical details about how its AI coding agent works Read More »