AI

how-i-program-with-llms

How I program with LLMs

The second issue is that we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

// referenceQuartiles calculates the exact quartiles for a slice of float64 values
// using linear interpolation, matching the behavior expected from the sampler.
func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

// compareQuartiles checks if two sets of quartiles are within epsilon of each other.
// Returns true if they match within the tolerance, false otherwise.
func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

// checkQuartiles is a test helper that compares sampler output against the reference
// implementation and reports any differences.
func checkQuartiles(t *testing.T, data []float64, epsilon float64) {
    t.Helper()

    // Get reference values
    wantQ1, wantMed, wantQ3 := referenceQuartiles(data)

    // Get sampler values using a large reservoir for accuracy
    qs := NewQuartileSampler(1000)
    for _, v := range data {
        qs.Add(v)
    }
    gotQ1, gotMed, gotQ3 := qs.Quartiles()

    if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {
        t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",
            gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)
    }
}
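For reference, the “standard code for quartiles over a fixed set of known values” that the prompt asks for typically looks like the textbook linear-interpolation calculation below. This is a generic sketch, not the code the LLM produced (its bodies are elided above), and the package and function names here are made up for illustration.

package sampler

import "sort"

// quantile returns the q-th quantile (0 <= q <= 1) of data using linear
// interpolation between the two nearest ranks, the same scheme most
// spreadsheet percentile functions use.
func quantile(data []float64, q float64) float64 {
    sorted := append([]float64(nil), data...)
    sort.Float64s(sorted)
    if len(sorted) == 0 {
        return 0
    }
    pos := q * float64(len(sorted)-1)
    lo := int(pos)
    if lo >= len(sorted)-1 {
        return sorted[len(sorted)-1]
    }
    frac := pos - float64(lo)
    return sorted[lo] + frac*(sorted[lo+1]-sorted[lo])
}

// referenceQuartilesSketch shows how a reference implementation could be
// assembled from the quantile helper above.
func referenceQuartilesSketch(data []float64) (q1, median, q3 float64) {
    return quantile(data, 0.25), quantile(data, 0.5), quantile(data, 0.75)
}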

The original test from above has been reworked to use checkQuartiles, and we have something new:

func FuzzQuartileSampler(f *testing.F) {
    // Add some seed corpus
    f.Add([]float64{1, 2, 3, 4, 5})

    f.Fuzz(func(t *testing.T, data []float64) {
        // Use a larger epsilon for fuzzing since we might get more extreme values
        checkQuartiles(t, data, 0.2)
    })
}

This is fun because it’s wrong. My running gopls tool immediately says:

fuzzing arguments can only have the following types:
    string, bool, float32, float64,
    int, int8, int16, int32, int64,
    uint, uint8, uint16, uint32, uint64,
    []byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.
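The regenerated test isn’t reproduced here, but a minimal sketch of what the []byte-based version might look like follows. The little-endian decoding, the NaN/infinity filtering, and the package name are assumptions made for illustration; only checkQuartiles comes from the code above.

package sampler

import (
    "encoding/binary"
    "math"
    "testing"
)

func FuzzQuartileSampler(f *testing.F) {
    // Seed corpus: raw bytes that decode to the values 1 through 5.
    var seed []byte
    for _, v := range []float64{1, 2, 3, 4, 5} {
        seed = binary.LittleEndian.AppendUint64(seed, math.Float64bits(v))
    }
    f.Add(seed)

    f.Fuzz(func(t *testing.T, data []byte) {
        // Decode the fuzzer's arbitrary bytes into float64 values, eight
        // bytes at a time, discarding NaNs and infinities.
        var vals []float64
        for len(data) >= 8 {
            v := math.Float64frombits(binary.LittleEndian.Uint64(data[:8]))
            data = data[8:]
            if !math.IsNaN(v) && !math.IsInf(v, 0) {
                vals = append(vals, v)
            }
        }
        if len(vals) == 0 {
            t.Skip("not enough bytes for a single float64")
        }
        // Use a larger epsilon for fuzzing since we might get more extreme values.
        checkQuartiles(t, vals, 0.2)
    })
}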

A quick survey of the last few weeks of my LLM chat history (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) shows that when there is a tooling error, the LLM can make useful progress without me adding any insight more than 80 percent of the time. About half the time, it can completely resolve the issue without me saying anything of note; I am just acting as the messenger.

How I program with LLMs Read More »

apple-will-update-ios-notification-summaries-after-bbc-headline-mistake

Apple will update iOS notification summaries after BBC headline mistake

Nevertheless, it’s a serious problem when the summaries misrepresent news headlines, and edge cases where this occurs are unfortunately inevitable. Apple cannot simply fix these summaries with a software update. The only answers are either to help users understand the drawbacks of the technology so they can make better-informed judgments or to remove or disable the feature completely. Apple is apparently going for the former.

We’re oversimplifying a bit here, but generally, LLMs like those used for Apple’s notification summaries work by predicting portions of words based on what came before and are not capable of truly understanding the content they’re summarizing.

Further, these predictions are not accurate all the time, with incorrect results occurring a few times per 100 or 1,000 outputs. As the models are trained and improved, the error rate may shrink, but it never reaches zero, and with countless summaries being produced every day, some errors are inevitable.

Deploying this technology at scale without users (or even the BBC, it seems) really understanding how it works is risky at best, whether it’s with the iPhone’s summaries of news headlines in notifications or Google’s AI summaries at the top of search engine results pages. Even if the vast majority of summaries are perfectly accurate, there will always be some users who see inaccurate information.

These summaries are read by so many millions of people that the sheer scale of errors will remain a problem almost no matter how accurate the models become.

We wrote at length a few weeks ago about how the Apple Intelligence rollout seemed rushed, counter to Apple’s usual focus on quality and user experience. However, with current technology, there is no amount of refinement to this feature that Apple could have done to reach a zero percent error rate with these notification summaries.

We’ll see how well Apple does making its users understand that the summaries may be wrong, but making all iPhone users truly grok how and why the feature works this way would be a tall order.

Apple will update iOS notification summaries after BBC headline mistake Read More »

instagram-users-discover-old-ai-powered-“characters,”-instantly-revile-them

Instagram users discover old AI-powered “characters,” instantly revile them

A little over a year ago, Meta created Facebook and Instagram profiles for “28 AIs with unique interests and personalities for you to interact with and dive deeper into your interests.” Today, the last of those profiles is being taken down amid waves of viral revulsion as word of their existence has spread online.

The September 2023 launch of Meta’s social profiles for AI characters was announced alongside a much splashier initiative that created animated AI chatbots with celebrity avatars. Those celebrity-based AI chatbots were unceremoniously scrapped less than a year later amid a widespread lack of interest.

But roughly a dozen of the unrelated AI character profiles still remained accessible as of this morning via social media pages labeled as “AI managed by Meta.” Those profiles—which included a mix of AI-generated imagery and human-created content, according to Meta—also offered real users the ability to live chat with these AI characters via Instagram Direct or Facebook Messenger.

Now that we know it exists, we hate it

The “Mama Liv” AI-generated character account page, as it appeared on Instagram Friday morning.

For the last few months, these profiles have continued to exist in something of a state of benign neglect, with little in the way of new posts and less in the way of organic interest from other Meta users. That started to change last week, though, after Financial Times published a report on Meta’s vision for “social media filled with AI-generated users.”

As Meta VP of Product for Generative AI Connor Hayes told FT, “We expect these AIs to actually, over time, exist on our platforms, kind of in the same way that accounts do… They’ll have bios and profile pictures and be able to generate and share content powered by AI on the platform. That’s where we see all of this going.”

Instagram users discover old AI-powered “characters,” instantly revile them Read More »

anthropic-gives-court-authority-to-intervene-if-chatbot-spits-out-song-lyrics

Anthropic gives court authority to intervene if chatbot spits out song lyrics

Anthropic did not immediately respond to Ars’ request for comment on how its guardrails currently work to prevent the alleged jailbreaks, but in accepting the deal, publishers appear satisfied with those guardrails.

Whether AI training on lyrics is infringing remains unsettled

Now, the matter of whether Anthropic has strong enough guardrails to block allegedly harmful outputs is settled, Lee wrote, allowing the court to focus on arguments regarding “publishers’ request in their Motion for Preliminary Injunction that Anthropic refrain from using unauthorized copies of Publishers’ lyrics to train future AI models.”

Anthropic said in its motion opposing the preliminary injunction that relief should be denied.

“Whether generative AI companies can permissibly use copyrighted content to train LLMs without licenses,” Anthropic’s court filing said, “is currently being litigated in roughly two dozen copyright infringement cases around the country, none of which has sought to resolve the issue in the truncated posture of a preliminary injunction motion. It speaks volumes that no other plaintiff—including the parent company record label of one of the Plaintiffs in this case—has sought preliminary injunctive relief from this conduct.”

In a statement, Anthropic’s spokesperson told Ars that “Claude isn’t designed to be used for copyright infringement, and we have numerous processes in place designed to prevent such infringement.”

“Our decision to enter into this stipulation is consistent with those priorities,” Anthropic said. “We continue to look forward to showing that, consistent with existing copyright law, using potentially copyrighted material in the training of generative AI models is a quintessential fair use.”

This suit will likely take months to fully resolve, as the question of whether AI training is a fair use of copyrighted works is complex and remains hotly disputed in court. For Anthropic, the stakes could be high, with a loss potentially triggering more than $75 million in fines, as well as an order possibly forcing Anthropic to reveal and destroy all the copyrighted works in its training data.

Anthropic gives court authority to intervene if chatbot spits out song lyrics Read More »

someone-made-a-captcha-where-you-play-doom-on-nightmare-difficulty

Someone made a CAPTCHA where you play Doom on Nightmare difficulty

It’s a WebAssembly application, but it was made via a natural-language, prompt-driven web development tool called v0, which is part of Vercel, a cloud-based developer tool service of which Rauch is the CEO. You can see the LLM bot chat history with the series of prompts that produced this CAPTCHA game on the v0 website.

Strangely enough, there has been a past attempt at making a Doom CAPTCHA. In 2021, developer Miquel Camps Orteza made an approximation of one—though not all the assets matched Doom, and it was more Doom-adjacent. That one was made directly by hand, and its source code is available on GitHub. Its developer noted that it’s not secure; it’s just for fun.

Rauch’s attempt is no more serious as a CAPTCHA, but it at least resembles Doom more closely.

Don’t expect to be playing this to verify at real websites anytime soon, though. It’s not secure, and its legality is fuzzy at best. While the code for Doom is open source, the assets from the game like enemy sprites and environment textures—which feature prominently in this application—are not.

Someone made a CAPTCHA where you play Doom on Nightmare difficulty Read More »

evolution-journal-editors-resign-en-masse

Evolution journal editors resign en masse



Board members expressed concerns over high fees, editorial independence, and use of AI in editorial processes.

Over the holiday weekend, all but one member of the editorial board of Elsevier’s Journal of Human Evolution (JHE) resigned “with heartfelt sadness and great regret,” according to Retraction Watch, which helpfully provided an online PDF of the editors’ full statement. It’s the 20th mass resignation from a science journal since 2023 over various points of contention, per Retraction Watch, many in response to controversial changes in the business models used by the scientific publishing industry.

“This has been an exceptionally painful decision for each of us,” the board members wrote in their statement. “The editors who have stewarded the journal over the past 38 years have invested immense time and energy in making JHE the leading journal in paleoanthropological research and have remained loyal and committed to the journal and our authors long after their terms ended. The [associate editors] have been equally loyal and committed. We all care deeply about the journal, our discipline, and our academic community; however, we find we can no longer work with Elsevier in good conscience.”

The editorial board cited several changes made over the last ten years that it believes are counter to the journal’s longstanding editorial principles. These included eliminating support for a copy editor and a special issues editor, leaving it to the editorial board to handle those duties. When the board expressed the need for a copy editor, Elsevier’s response, they said, was “to maintain that the editors should not be paying attention to language, grammar, readability, consistency, or accuracy of proper nomenclature or formatting.”

There is also a major restructuring of the editorial board underway that aims to reduce the number of associate editors by more than half, which “will result in fewer AEs handling far more papers, and on topics well outside their areas of expertise.”

Furthermore, there are plans to create a third-tier editorial board that functions largely in a figurehead capacity, after Elsevier “unilaterally took full control” of the board’s structure in 2023 by requiring all associate editors to renew their contracts annually—which the board believes undermines its editorial independence and integrity.

Worst practices

In-house production has been reduced or outsourced, and in 2023 Elsevier began using AI during production without informing the board, resulting in many style and formatting errors, as well as reverting versions of papers that had already been accepted and formatted by the editors. “This was highly embarrassing for the journal and resolution took six months and was achieved only through the persistent efforts of the editors,” the editors wrote. “AI processing continues to be used and regularly reformats submitted manuscripts to change meaning and formatting and require extensive author and editor oversight during proof stage.”

In addition, the author page charges for JHE are significantly higher than even Elsevier’s other for-profit journals, as well as broad-based open access journals like Scientific Reports. Not many of the journal’s authors can afford those fees, “which runs counter to the journal’s (and Elsevier’s) pledge of equality and inclusivity,” the editors wrote.

The breaking point seems to have come in November, when Elsevier informed co-editors Mark Grabowski (Liverpool John Moores University) and Andrea Taylor (Touro University California College of Osteopathic Medicine) that it was ending the dual-editor model that has been in place since 1986. When Grabowski and Taylor protested, they were told the model could only remain if they took a 50 percent cut in their compensation.

Elsevier has long had its share of vocal critics (including our own Chris Lee) and this latest development has added fuel to the fire. “Elsevier has, as usual, mismanaged the journal and done everything they could to maximize profit at the expense of quality,” biologist PZ Myers of the University of Minnesota Morris wrote on his blog Pharyngula. “In particular, they decided that human editors were too expensive, so they’re trying to do the job with AI. They also proposed cutting the pay for the editor-in-chief in half. Keep in mind that Elsevier charges authors a $3990 processing fee for each submission. I guess they needed to improve the economics of their piratical mode of operation a little more.”

Elsevier has not yet responded to Ars’ request for comment; we will update accordingly should a statement be issued.

Not all AI uses are created equal

John Hawks, an anthropologist at the University of Wisconsin, Madison, who has published 17 papers in JHE over his career, expressed his full support for the board members’ decision on his blog, along with shock at the (footnoted) revelation that Elsevier had introduced AI to its editorial process in 2023. “I’ve published four articles in the journal during the last two years, including one in press now, and if there was any notice to my co-authors or me about an AI production process, I don’t remember it,” he wrote, noting that the move violates the journal’s own AI policies. “Authors should be informed at the time of submission how AI will be used in their work. I would have submitted elsewhere if I was aware that AI would potentially be altering the meaning of the articles.”

There is certainly cause for concern when it comes to using AI in the pursuit of science. For instance, earlier this year, we witnessed the viral sensation of several egregiously bad AI-generated figures published in a peer-reviewed article in Frontiers, a reputable scientific journal. Scientists on social media expressed equal parts shock and ridicule at the images, one of which featured a rat with grotesquely large and bizarre genitals. The paper has since been retracted, but the incident reinforces a growing concern that AI will make published scientific research less trustworthy, even as it increases productivity.

That said, there are also some useful applications of AI in the scientific endeavor. For instance, back in January, the research publisher Science announced that all of its journals would begin using commercial software that automates the process of detecting improperly manipulated images. Perhaps that would have caught the egregious rat genitalia figure, although as Ars Science Editor John Timmer pointed out at the time, the software has limitations. “While it will catch some of the most egregious cases of image manipulation, enterprising fraudsters can easily avoid being caught if they know how the software operates,” he wrote.

Hawks acknowledged on his blog that the use of AI by scientists and scientific journals is likely inevitable and even recognized the potential benefits. “I don’t think this is a dystopian future. But not all uses of machine learning are equal,” he wrote. To wit:

[I]t’s bad for anyone to use AI to reduce or replace the scientific input and oversight of people in research—whether that input comes from researchers, editors, reviewers, or readers. It’s stupid for a company to use AI to divert experts’ effort into redundant rounds of proofreading, or to make disseminating scientific work more difficult.

In this case, Elsevier may have been aiming for good but instead hit the exacta of bad and stupid. It’s especially galling that they demand transparency from authors but do not provide transparency about their own processes… [I]t would be a very good idea for authors of recent articles to make sure that they have posted a preprint somewhere, so that their original pre-AI version will be available for readers. As the editors lose access, corrections to published articles may become difficult or impossible.

Nature published an article back in March raising questions about the efficacy of mass resignations as an emerging form of protest after all the editors of the Wiley-published linguistics journal Syntax resigned in February. (Several of their concerns mirror those of the JHE editorial board.) Such moves certainly garner attention, but even former Syntax editor Klaus Abels of University College London told Nature that the objective of such mass resignations should be to move beyond mere protest and focus instead on establishing new independent nonprofit journals for the academic community that are open access and have high academic standards.

Abels and his former Syntax colleagues are in the process of doing just that, following the example of the former editors of Critical Public Health and another Elsevier journal, NeuroImage, last year.


Jennifer is a senior reporter at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Evolution journal editors resign en masse Read More »

tech-worker-movements-grow-as-threats-of-rto,-ai-loom

Tech worker movements grow as threats of RTO, AI loom


Advocates say tech workers movements got too big to ignore in 2024.

Credit: Aurich Lawson | Getty Images

It feels like tech workers have caught very few breaks over the past several years, between ongoing mass layoffs, stagnating wages amid inflation, AI supposedly coming for jobs, and unpopular orders to return to office that, for many, threaten to disrupt work-life balance.

But in 2024, a potentially critical mass of tech workers seemed to reach a breaking point. As labor rights groups advocating for tech workers told Ars, these workers are banding together in sustained strong numbers and are either winning or appear tantalizingly close to winning better worker conditions at major tech companies, including Amazon, Apple, Google, and Microsoft.

In February, the industry-wide Tech Workers Coalition (TWC) noted that “the tech workers movement is far more expansive and impactful” than even labor rights advocates realized, noting that unionized tech workers have gone beyond early stories about Googlers marching in the streets and now “make the headlines on a daily basis.”

Ike McCreery, a TWC volunteer and ex-Googler who helped found the Alphabet Workers Union, told Ars that although “it’s hard to gauge numerically” how much movements have grown, “our sense is definitely that the momentum continues to build.”

“It’s been an exciting year,” McCreery told Ars, while expressing particular enthusiasm that even “highly compensated tech workers are really seeing themselves more as workers” in these fights—which TWC “has been pushing for a long time.”

In 2024, TWC broadened efforts to help workers organize industry-wide, helping everyone from gig workers to project managers build both union and non-union efforts to push for change in the workplace.

Such widespread organizing “would have been unthinkable only five years ago,” TWC noted in February, and it’s clear from some of 2024’s biggest wins that some movements are making gains that could further propel that momentum in 2025.

Workers could also gain the upper hand if unpopular policies increase what one November study called “brain drain.” That’s a trend where tech companies adopting potentially alienating workplace tactics risk losing top talent at a time when key industries like AI and cybersecurity are facing severe talent shortages.

Advocates told Ars that unpopular policies have always fueled workers movements, and RTO and AI are just the latest adding fuel to the fire. As many workers prepare to head back to offices in 2025 where worker surveillance is only expected to intensify, they told Ars why they expect to see workers’ momentum continue at some of the world’s biggest tech firms.

Tech worker movements growing

In August, workers at America’s first unionized Apple Store ratified a labor contract with Apple, which agreed to a modest increase in wages of about 10 percent over three years. While small, that win came just a few weeks before the National Labor Relations Board (NLRB) determined that Amazon was a joint employer of unionized contract-based delivery drivers. And Google lost a similar fight last January when the NLRB ruled it must bargain with a union representing YouTube Music contract workers, Reuters reported.

For many workers, joining these movements helped raise wages. In September, facing mounting pressure, Amazon raised warehouse worker wages—investing $2.2 billion, its “biggest investment yet,” to broadly raise base salaries for workers. And more recently, Amazon was hit with a strike during the busy holiday season, as warehouse workers hoped to further hobble the company during a clutch financial quarter to force more bargaining. (Last year, Amazon posted record-breaking $170 billion holiday quarter revenues and has said the current strike won’t hurt revenues.)

Even typically union-friendly Microsoft drew worker backlash and criticism in 2024 following layoffs of 650 video game workers in September.

These mass layoffs are driving some workers to join movements. A senior director for organizing with Communications Workers of America (CWA), Tom Smith, told Ars that shortly after the 600-member Tech Guild—”the largest single certified group of tech workers” to organize at the New York Times—reached a tentative deal to increase wages “up to 8.25 percent over the length of the contract,” about “460 software engineers at a video game company owned by Microsoft successfully unionized.”

Smith told Ars that while workers for years have pushed for better conditions, “these large units of tech workers achieving formal recognition, building lasting organization, and winning contracts” at “a more mass scale” are maturing, following in the footsteps of unionizing Googlers and today influencing a broader swath of tech industry workers nationwide. From CWA’s viewpoint, workers in the video game industry seem best positioned to seek major wins next, Smith suggested, likely starting with Microsoft-owned companies and eventually affecting indie game companies.

CWA, TWC, and Tech Workers Union 1010 (a group run by tech workers that’s part of the Office and Professional Employees International Union) all now serve as dedicated groups supporting workers movements long-term, and that stability has helped these movements mature, McCreery told Ars. Each group plans to continue meeting workers where they are to support and help expand organizing in 2025.

Cost of RTOs may be significant, researchers warn

While layoffs likely remain the most extreme threat to tech workers broadly, a return-to-office (RTO) mandate can be just as jarring for remote tech workers who are either unable to comply or else unwilling to give up the better work-life balance that comes with no commute. Advocates told Ars that RTO policies have pushed workers to join movements, while limited research suggests that companies risk losing top talents by implementing RTO policies.

In perhaps the biggest example from 2024, when Amazon announced that it was requiring workers in the office five days a week next year, a poll on Blind, the anonymous platform where workers discuss employers, found that an overwhelming majority of more than 2,000 Amazon employees were “dissatisfied.”

“My morale for this job is gone…” one worker said on Blind.

Workers criticized the “non-data-driven logic” of the RTO mandate, prompting an Amazon executive to remind them that they could take their talents elsewhere if they didn’t like it. Many confirmed that’s exactly what they planned to do. (Amazon later announced it would be delaying RTO for many office workers after belatedly realizing there was a lack of office space.)

Other companies mandating RTO faced similar backlash from workers, who continued to question the logic driving the decision. One February study showed that RTO mandates don’t make companies any more valuable but do make workers more miserable. And last month, Brian Elliott, an executive advisor who wrote a book about the benefits of flexible teams, noted that only one in three executives thinks RTO had “even a slight positive impact on productivity.”

But not every company drew a hard line the way that Amazon did. For example, Dell gave workers a choice: remain remote and accept that they would never be eligible for promotions, or mark themselves as hybrid. Workers who refused the RTO said they valued their free time and admitted to looking for other job opportunities.

Very few studies have been done analyzing the true costs and benefits of RTO, a November academic study titled “Return to Office and Brain Drain” said, and so far companies aren’t necessarily backing the limited findings. The researchers behind that study noted that “the only existing study” measuring how RTO impacts employee turnover showed this year that senior employees left for other companies after Microsoft’s RTO mandate, but Microsoft disputed that finding.

Seeking to build on this research, the November study tracked “over 3 million tech and finance workers’ employment histories reported on LinkedIn” and analyzed “the effect of S&P 500 firms’ return-to-office (RTO) mandates on employee turnover and hiring.”

Choosing to only analyze the firms requiring five days in office, the final sample covered 54 RTO firms, including big tech companies like Amazon, Apple, and Microsoft. From that sample, researchers concluded that average employee turnover increased by 14 percent after RTO mandates at bigger firms. And since big firms typically have lower turnover, the increase in turnover is likely larger at smaller firms, the study’s authors concluded.

The study also supported the conclusion that “employees with the highest skill level are more likely to leave” and found that “RTO firms take significantly longer time to fill their job vacancies after RTO mandates.”

“Together, our evidence suggests that RTO mandates are costly to firms and have serious negative effects on the workforce,” the study concluded, echoing some remote workers’ complaints about the seemingly non-data-driven logic of RTO, while urging that further research is needed.

“These turnovers could potentially have short-term and long-term effects on operation, innovation, employee morale, and organizational culture,” the study concluded.

A co-author of the “brain drain” study, Mark Ma, told Ars that by contrast, Glassdoor going fully remote at least anecdotally seemed to “significantly” increase the number and quality of applications—possibly also improving retention by offering the remote flexibility that many top talents today require.

Ma said that next his team hopes to track where people who leave firms over RTO policies go next.

“Do they become self-employed, or do they go to a competitor, or do they fund their own firm?” Ma speculated, hoping to trace these patterns more definitively over the next several years.

Additionally, Ma plans to investigate individual firms’ RTO impacts, as well as impacts on niche classes of workers with highly sought-after skills—such as in areas like AI, machine learning, or cybersecurity—to see if it’s easier for them to find other jobs. In the long-term, Ma also wants to monitor for potentially less-foreseeable outcomes, such as RTO mandates possibly increasing firms’ number of challengers in their industry.

Will RTO mandates continue in 2025?

Many tech workers may be wondering if there will be a spike in return-to-office mandates in 2025, especially since one of the most politically influential figures in tech, Elon Musk, recently reiterated that he thinks remote work is “poison.”

Musk, of course, banned remote work at Tesla, as well as when he took over Twitter. And as co-lead of the US Department of Government Efficiency (DOGE), Musk reportedly plans to ban remote work for government employees, as well. If other tech firms are influenced by Musk’s moves and join executives who seem to be mandating RTO based on intuition, it’s possible that more tech workers could be forced to return to office or else seek other employment.

But Ma told Ars that he doesn’t expect to see “a big spike in the number of firms announcing return to office mandates” in 2025.

His team only found eight major firms in tech and finance that issued five-day return-to-office mandates in 2024, which was the same number of firms flagged in 2023, suggesting no major increase in RTOs from year to year. Ma told Ars that while big firms like Amazon ordering employees to return to the office made headlines, many firms seem to be continuing to embrace hybrid models, sometimes allowing employees to choose when or if they come into the office.

That seeming preference for hybrid work models aligns with the “future of work” surveys on workplace trends and employee preferences that the Consumer Technology Association (CTA) conducted for years but has seemingly since discontinued. In 2021, the CTA reported that “89 percent of tech executives say flexible work arrangements are the most important employee benefit and 65 percent say they’ll hire more employees to work remotely.” The next year, in what was apparently the survey’s final edition, the CTA suggested hybrid models could help attract talent in a competitive market hit with “an unprecedented demand for workers with high-tech skills.”

The CTA did not respond to Ars’ requests to comment on whether it expects hybrid work arrangements to remain preferred over five-day return-to-office policies next year.

CWA’s Smith told Ars that workers movements are growing partly because “folks are engaged in this big fight around surveillance and workplace control,” as well as anything “having to do with to what extent will people return to offices and what does that look like if and when people do return to offices?”

Without data backing RTO mandates, Ma’s study suggests that firms will struggle to retain highly skilled workers at a time when tech innovation remains a top priority for the US. As workers appear increasingly put off by policies—like RTO or AI-driven workplace monitoring or efficiency efforts threatening to replace workers with AI—Smith’s experience seems to show that disgruntled workers could find themselves drawn to unions that could help them claw back control over work-life balance. And the cost of the ensuing shuffle to some of the largest tech firms in the world could be “significant,” Ma’s study warned.

TWC’s McCreery told Ars that on top of unpopular RTO policies driving workers to join movements, workers have also become more active in protesting unpopular politics, frustrated to see their talents apparently used to further controversial conflicts and military efforts globally. Some workers think workplace organizing could be more powerful than voting to oppose political actions their companies take.

“The workplace really remains an important site of power for a lot of people where maybe they don’t feel like they can enact their values just by voting or in other ways,” McCreery said.

While unpopular policies “have always been a reason workers have joined unions and joined movements,” McCreery said that “the development of more of these unpopular policies” like RTO and AI-enhanced surveillance “really targeted” at workers has increased “the political consciousness and the sense” that tech workers are “just like any other workers.”

Layoffs at companies like Microsoft and Amazon during periods when revenue is increasing in the double-digits also unify workers, advocates told Ars. Forbes noted Microsoft laid off 1,000 workers “just five days before reporting a 17.6 percent increase in revenue to $62 billion,” while Amazon’s 1,000-worker layoffs followed a 14 percent rise in revenue to $170 billion. And demand for AI led to the highest profit margins Amazon’s seen for its cloud business in a decade, CNBC reported in October.

CWA’s Smith told Ars that as companies continue to rake in profits while workers feel their work-life balance slipping away and see their efforts potentially “used to increase control and cause broader suffering,” some of the biggest fights workers raised in 2024 may intensify next year.

“It’s like a shock to employees, these industries pushing people to lower your expectations because we’re going to lay off hundreds of thousands of you just because we can while we make more profits than we ever have,” Smith said. “I think workers are going to step into really broad campaigns to assert a different worldview on employment security.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Tech worker movements grow as threats of RTO, AI loom Read More »

2024:-the-year-ai-drove-everyone-crazy

2024: The year AI drove everyone crazy


What do eating rocks, rat genitals, and Willy Wonka have in common? AI, of course.

It’s been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and potential applications of the new technology. Riding along with the hype, different types of AI that may end up being ill-advised, such as automated military targeting systems, have also been introduced.

It’s worth mentioning that, aside from the crazy news, we saw some notable AI advances in 2024 as well. For example, Claude 3.5 Sonnet, launched in June, held off the competition as a top model for most of the year, while OpenAI’s o1 used runtime compute to expand GPT-4o’s capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models and also better AI video generators, including several from China.

But for now, let’s get down to the weirdness.

ChatGPT goes insane

Illustration of a broken toy robot.

Early in the year, things got off to an exciting start when OpenAI’s ChatGPT experienced a significant technical malfunction that caused the AI model to generate increasingly incoherent responses, prompting users on Reddit to describe the system as “having a stroke” or “going insane.” During the glitch, ChatGPT’s responses would begin normally but then deteriorate into nonsensical text, sometimes mimicking Shakespearean language.

OpenAI later revealed that a bug in how the model processed language caused it to select the wrong words during text generation, leading to nonsense outputs (basically the text version of what we at Ars now call “jabberwockies”). The company fixed the issue within 24 hours, but the incident led to frustrations about the black box nature of commercial AI systems and users’ tendency to anthropomorphize AI behavior when it malfunctions.

The great Wonka incident


A photo of “Willy’s Chocolate Experience” (inset), which did not match AI-generated promises, shown in the background. Credit: Stuart Sinclair

The collision between AI-generated imagery and consumer expectations fueled human frustrations in February when Scottish families discovered that “Willy’s Chocolate Experience,” an unlicensed Wonka-ripoff event promoted using AI-generated wonderland images, turned out to be little more than a sparse warehouse with a few modest decorations.

Parents who paid £35 per ticket encountered a situation so dire they called the police, with children reportedly crying at the sight of a person in what attendees described as a “terrifying outfit.” The event, created by House of Illuminati in Glasgow, promised fantastical spaces like an “Enchanted Garden” and “Twilight Tunnel” but delivered an underwhelming experience that forced organizers to shut down mid-way through its first day and issue refunds.

While the show was a bust, it brought us an iconic new meme for job disillusionment in the form of a photo: the green-haired Willy’s Chocolate Experience employee who looked like she’d rather be anywhere else on earth at that moment.

Mutant rat genitals expose peer review flaws

An actual laboratory rat, who is intrigued. Credit: Getty | Photothek

In February, Ars Technica senior health reporter Beth Mole covered a peer-reviewed paper published in Frontiers in Cell and Developmental Biology that created an uproar in the scientific community when researchers discovered it contained nonsensical AI-generated images, including an anatomically incorrect rat with oversized genitals. The paper, authored by scientists at Xi’an Honghui Hospital in China, openly acknowledged using Midjourney to create figures that contained gibberish text labels like “Stemm cells” and “iollotte sserotgomar.”

The publisher, Frontiers, posted an expression of concern about the article titled “Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway” and launched an investigation into how the obviously flawed imagery passed through peer review. Scientists across social media platforms expressed dismay at the incident, which mirrored concerns about AI-generated content infiltrating academic publishing.

Chatbot makes erroneous refund promises for Air Canada

If, say, ChatGPT gives you the wrong name for one of the seven dwarves, it’s not such a big deal. But in February, Ars senior policy reporter Ashley Belanger covered a case of costly AI confabulation in the wild. In the course of online text conversations, Air Canada’s customer service chatbot told customers inaccurate refund policy information. The airline later faced legal consequences when a tribunal ruled that it must honor commitments made by the automated system. Tribunal adjudicator Christopher Rivers determined that Air Canada bore responsibility for all information on its website, regardless of whether it came from a static page or an AI interface.

The case set a precedent for how companies deploying AI customer service tools could face legal obligations for automated systems’ responses, particularly when they fail to warn users about potential inaccuracies. Ironically, the airline had reportedly spent more on the initial AI implementation than it would have cost to maintain human workers for simple queries, according to Air Canada executive Mel Crocker.

Will Smith lampoons his digital double


The real Will Smith eating spaghetti, parodying an AI-generated video from 2023. Credit: Will Smith / Getty Images / Benj Edwards

In March 2023, a terrible AI-generated video of Will Smith’s AI doppelganger eating spaghetti began making the rounds online. The AI-generated version of the actor gobbled down the noodles in an unnatural and disturbing way. Almost a year later, in February 2024, Will Smith himself posted a parody response video to the viral jabberwocky on Instagram, featuring AI-like deliberately exaggerated pasta consumption, complete with hair-nibbling and finger-slurping antics.

Given the rapid evolution of AI video technology, particularly since OpenAI had just unveiled its Sora video model four days earlier, Smith’s post sparked discussion in his Instagram comments where some viewers initially struggled to distinguish between the genuine footage and AI generation. It was an early sign of “deep doubt” in action as the tech increasingly blurs the line between synthetic and authentic video content.

Robot dogs learn to hunt people with AI-guided rifles


A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries. Credit: Onyx Industries

At some point in recent history—somewhere around 2022—someone took a look at robotic quadrupeds and thought it would be a great idea to attach guns to them. A few years later, the US Marine Forces Special Operations Command (MARSOC) began evaluating armed robotic quadrupeds developed by Ghost Robotics. The robot “dogs” integrated Onyx Industries’ SENTRY remote weapon systems, which featured AI-enabled targeting that could detect and track people, drones, and vehicles, though the systems require human operators to authorize any weapons discharge.

The military’s interest in armed robotic dogs followed a broader trend of weaponized quadrupeds entering public awareness. This included viral videos of consumer robots carrying firearms, and later, commercial sales of flame-throwing models. While MARSOC emphasized that weapons were just one potential use case under review, experts noted that the increasing integration of AI into military robotics raised questions about how long humans would remain in control of lethal force decisions.

Microsoft Windows AI is watching


A screenshot of Microsoft’s new “Recall” feature in action. Credit: Microsoft

In an era where many people already feel like they have no privacy due to tech encroachments, Microsoft dialed it up to an extreme degree in May. That’s when Microsoft unveiled a controversial Windows 11 feature called “Recall” that continuously captures screenshots of users’ PC activities every few seconds for later AI-powered search and retrieval. The feature, designed for new Copilot+ PCs using Qualcomm’s Snapdragon X Elite chips, promised to help users find past activities, including app usage, meeting content, and web browsing history.

While Microsoft emphasized that Recall would store encrypted snapshots locally and allow users to exclude specific apps or websites, the announcement raised immediate privacy concerns, as Ars senior technology reporter Andrew Cunningham covered. It also came with a technical toll, requiring significant hardware resources, including 256GB of storage space, with 25GB dedicated to storing approximately three months of user activity. After Microsoft pulled the initial test version due to public backlash, Recall later entered public preview in November with reportedly enhanced security measures. But secure spyware is still spyware—Recall, when enabled, still watches nearly everything you do on your computer and keeps a record of it.

Google Search told people to eat rocks

This is fine. Credit: Getty Images

In May, Ars senior gaming reporter Kyle Orland (who assisted commendably with the AI beat throughout the year) covered Google’s newly launched AI Overview feature. It faced immediate criticism when users discovered that it frequently provided false and potentially dangerous information in its search result summaries. Among its most alarming responses, the system advised humans could safely consume rocks, incorrectly citing scientific sources about the geological diet of marine organisms. The system’s other errors included recommending nonexistent car maintenance products, suggesting unsafe food preparation techniques, and confusing historical figures who shared names.

The problems stemmed from several issues, including the AI treating joke posts as factual sources and misinterpreting context from original web content. But most of all, the system relies on web results as indicators of authority, which we called a flawed design. While Google defended the system, stating these errors occurred mainly with uncommon queries, a company spokesperson acknowledged they would use these “isolated examples” to refine their systems. But to this day, AI Overview still makes frequent mistakes.

Stable Diffusion generates body horror


An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass. Credit: HorneyMetalBeing

In June, Stability AI’s release of the image synthesis model Stable Diffusion 3 Medium drew criticism online for its poor handling of human anatomy in AI-generated images. Users across social media platforms shared examples of the model producing what we now like to call jabberwockies: AI generation failures with distorted bodies, misshapen hands, and surreal anatomical errors. Many in the AI image-generation community viewed the release as a significant step backward from previous image-synthesis capabilities.

Reddit users attributed these failures to Stability AI’s aggressive filtering of adult content from the training data, which apparently impaired the model’s ability to accurately render human figures. The troubled release coincided with broader organizational challenges at Stability AI, including the March departure of CEO Emad Mostaque, multiple staff layoffs, and the exit of three key engineers who had helped develop the technology. Some of those engineers founded Black Forest Labs in August and released Flux, which has become the latest open-weights AI image model to beat.

ChatGPT Advanced Voice imitates human voice in testing

An illustration of a computer synthesizer spewing out letters.

AI voice-synthesis models are master imitators these days, and they are capable of much more than many people realize. In August, we covered a story where OpenAI’s ChatGPT Advanced Voice Mode feature unexpectedly imitated a user’s voice during the company’s internal testing, revealed by OpenAI after the fact in safety testing documentation. To prevent future instances of an AI assistant suddenly speaking in your own voice (which, let’s be honest, would probably freak people out), the company created an output classifier system to prevent unauthorized voice imitation. OpenAI says that Advanced Voice Mode now catches all meaningful deviations from approved system voices.

Independent AI researcher Simon Willison discussed the implications with Ars Technica, noting that while OpenAI restricted its model’s full voice synthesis capabilities, similar technology would likely emerge from other sources within the year. Meanwhile, the rapid advancement of AI voice replication has caused general concern about its potential misuse, although companies like ElevenLabs have already been offering voice cloning services for some time.

San Francisco’s robotic car horn symphony


A Waymo self-driving car in front of Google’s San Francisco headquarters, San Francisco, California, June 7, 2024. Credit: Getty Images

In August, San Francisco residents got a noisy taste of robo-dystopia when Waymo’s self-driving cars began creating an unexpected nightly disturbance in the South of Market district. In a parking lot off 2nd Street, the cars congregated autonomously every night during rider lulls at 4 am and began engaging in extended honking matches at each other while attempting to park.

Local resident Christopher Cherry’s initial optimism about the robotic fleet’s presence dissolved as the mechanical chorus grew louder each night, affecting residents in nearby high-rises. The nocturnal tech disruption served as a lesson in the unintentional effects of autonomous systems when run in aggregate.

Larry Ellison dreams of all-seeing AI cameras

A colorized photo of CCTV cameras in London, 2024.

In September, Oracle co-founder Larry Ellison painted a bleak vision of ubiquitous AI surveillance during a company financial meeting. The 80-year-old database billionaire described a future where AI would monitor citizens through networks of cameras and drones, asserting that the oversight would ensure lawful behavior from both police and the public.

His surveillance predictions reminded us of existing systems in China, where authorities already used AI to sort surveillance data on citizens as part of the country’s “sharp eyes” campaign from 2015 to 2020. Ellison’s statement reflected the sort of worst-case tech surveillance state scenario—likely antithetical to any sort of free society—that dozens of sci-fi novels of the 20th century warned us about.

A dead father sends new letters home


An AI-generated image featuring my late father’s handwriting. Credit: Benj Edwards / Flux

AI has made many of us do weird things in 2024, including this writer. In October, I used an AI synthesis model called Flux to reproduce my late father’s handwriting with striking accuracy. After scanning 30 samples from his engineering notebooks, I trained the model using computing time that cost less than five dollars. The resulting text captured his distinctive uppercase style, which he developed during his career as an electronics engineer.

I enjoyed creating images showing his handwriting in various contexts, from folder labels to skywriting, and made the trained model freely available online for others to use. While I approached it as a tribute to my father (who would have appreciated the technical achievement), many people found the whole experience weird and somewhat disturbing. The things we unhinged Bing Chat-like journalists do to bring awareness to a topic are sometimes unconventional. So I guess it counts for this list!

For 2025? Expect even more AI

Thanks for reading Ars Technica this past year and following along with our team coverage of this rapidly emerging and expanding field. We appreciate your kind words of support. Ars Technica’s 2024 AI words of the year were: vibemarking, deep doubt, and the aforementioned jabberwocky. The old stalwart “confabulation” also made several notable appearances. Tune in again next year when we continue to try to figure out how to concisely describe novel scenarios in emerging technology by labeling them.

Looking back, our prediction for 2024 in AI last year was “buckle up.” It seems fitting, given the weirdness detailed above. Especially the part about the robot dogs with guns. For 2025, AI will likely inspire more chaos ahead, but also potentially get put to serious work as a productivity tool, so this time, our prediction is “buckle down.”

Finally, we’d like to ask: What was the craziest story about AI in 2024 from your perspective? Whether you love AI or hate it, feel free to suggest your own additions to our list in the comments. Happy New Year!


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

2024: The year AI drove everyone crazy Read More »

the-ai-war-between-google-and-openai-has-never-been-more-heated

The AI war between Google and OpenAI has never been more heated

Over the past month, we’ve seen a rapid cadence of notable AI-related announcements and releases from both Google and OpenAI, and it’s been making the AI community’s head spin. It has also poured fuel on the fire of the OpenAI-Google rivalry, an accelerating game of one-upmanship taking place unusually close to the Christmas holiday.

“How are people surviving with the firehose of AI updates that are coming out,” wrote one user on X last Friday, which is still a hotbed of AI-related conversation. “in the last <24 hours we got gemini flash 2.0 and chatGPT with screenshare, deep research, pika 2, sora, chatGPT projects, anthropic clio, wtf it never ends."

Rumors travel quickly in the AI world, and people in the AI industry had been expecting OpenAI to ship some major products in December. Once OpenAI announced “12 days of OpenAI” earlier this month, Google jumped into gear and seemingly decided to try to one-up its rival on several counts. So far, the strategy appears to be working, but it’s coming at the cost of the rest of the world being able to absorb the implications of the new releases.

“12 Days of OpenAI has turned into like 50 new @GoogleAI releases,” wrote another X user on Monday. “This past week, OpenAI & Google have been releasing at the speed of a new born startup,” wrote a third X user on Tuesday. “Even their own users can’t keep up. Crazy time we’re living in.”

“Somebody told Google that they could just do things,” wrote a16z partner and AI influencer Justine Moore on X, referring to a common motivational meme telling people they “can just do stuff.”

The Google AI rush

OpenAI’s “12 Days of OpenAI” campaign has included releases of its full o1 model, an upgrade from o1-preview, alongside o1-pro for advanced “reasoning” tasks. The company also publicly launched Sora for video generation, added Projects functionality to ChatGPT, introduced Advanced Voice features with video streaming capabilities, and more.

The AI war between Google and OpenAI has never been more heated Read More »

why-ai-language-models-choke-on-too-much-text

Why AI language models choke on too much text


Compute costs scale with the square of the input size. That’s not great.


Large language models represent text using tokens, each of which is a few characters. Short words are represented by a single token (like “the” or “it”), whereas larger words may be represented by several tokens (GPT-4o represents “indivisible” with “ind,” “iv,” and “isible”).

When OpenAI released ChatGPT two years ago, it had a memory—known as a context window—of just 8,192 tokens. That works out to roughly 6,000 words of text. This meant that if you fed it more than about 15 pages of text, it would “forget” information from the beginning of its context. This limited the size and complexity of tasks ChatGPT could handle.
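
Those conversions are easy to sanity-check. The sketch below is a rough back-of-the-envelope calculation using two common rules of thumb: roughly 0.75 words per token for English text and about 400 words per printed page. Both constants are ballpark assumptions on my part, not figures OpenAI publishes.

```go
// Back-of-the-envelope arithmetic for the context-window figures above.
// The conversion factors are rough rules of thumb, not exact values.
package main

import "fmt"

func main() {
	const wordsPerToken = 0.75 // rough average for English prose
	const wordsPerPage = 400.0 // rough figure for a typical page of text

	tokens := 8192.0
	words := tokens * wordsPerToken
	pages := words / wordsPerPage
	fmt.Printf("%.0f tokens ≈ %.0f words ≈ %.0f pages\n", tokens, words, pages)
	// Prints: 8192 tokens ≈ 6144 words ≈ 15 pages
}
```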

Today’s LLMs are far more capable:

  • OpenAI’s GPT-4o can handle 128,000 tokens (about 200 pages of text).
  • Anthropic’s Claude 3.5 Sonnet can accept 200,000 tokens (about 300 pages of text).
  • Google’s Gemini 1.5 Pro allows 2 million tokens (about 2,000 pages of text).

Still, it’s going to take a lot more progress if we want AI systems with human-level cognitive abilities.

Many people envision a future where AI systems are able to do many—perhaps most—of the jobs performed by humans. Yet human workers read and hear hundreds of millions of words during their working years—and they absorb even more information from sights, sounds, and smells in the world around them. To achieve human-level intelligence, AI systems will need the capacity to absorb similar quantities of information.

Right now the most popular way to build an LLM-based system to handle large amounts of information is called retrieval-augmented generation (RAG). These systems try to find documents relevant to a user’s query and then insert the most relevant documents into an LLM’s context window.

This sometimes works better than a conventional search engine, but today’s RAG systems leave a lot to be desired. They only produce good results if the system puts the most relevant documents into the LLM’s context. But the mechanism used to find those documents—often, searching in a vector database—is not very sophisticated. If the user asks a complicated or confusing question, there’s a good chance the RAG system will retrieve the wrong documents and the chatbot will return the wrong answer.
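
Stripped to its essentials, the retrieval step looks something like the toy sketch below. Real RAG systems embed documents with a neural model and search a vector database; the crude word-overlap score, the made-up documents, and the query here are stand-ins, intended only to show the overall shape: score the documents, keep the best match, and prepend it to the prompt that goes to the LLM.

```go
// A toy sketch of the retrieval step in a RAG pipeline. Word overlap
// stands in for a real embedding-based similarity search.
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlap counts how many words of the document also appear in the query.
func overlap(query, doc string) int {
	words := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(query)) {
		words[w] = true
	}
	score := 0
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		if words[w] {
			score++
		}
	}
	return score
}

func main() {
	docs := []string{
		"The warranty covers parts and labor for two years.",
		"Our office is closed on public holidays.",
		"Returns are accepted within 30 days with a receipt.",
	}
	query := "How long does the warranty cover repairs?"

	// Rank documents by relevance to the query.
	sort.Slice(docs, func(i, j int) bool {
		return overlap(query, docs[i]) > overlap(query, docs[j])
	})

	// Insert the best-matching document into the LLM's context window.
	prompt := "Context: " + docs[0] + "\n\nQuestion: " + query
	fmt.Println(prompt)
}
```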

And RAG doesn’t enable an LLM to reason in more sophisticated ways over large numbers of documents:

  • A lawyer might want an AI system to review and summarize hundreds of thousands of emails.
  • An engineer might want an AI system to analyze thousands of hours of camera footage from a factory floor.
  • A medical researcher might want an AI system to identify trends in tens of thousands of patient records.

Each of these tasks could easily require more than 2 million tokens of context. Moreover, we’re not going to want our AI systems to start with a clean slate after doing one of these jobs. We will want them to gain experience over time, just like human workers do.

Superhuman memory and stamina have long been key selling points for computers. We’re not going to want to give them up in the AI age. Yet today’s LLMs are distinctly subhuman in their ability to absorb and understand large quantities of information.

It’s true, of course, that LLMs absorb superhuman quantities of information at training time. The latest AI models have been trained on trillions of tokens—far more than any human will read or hear. But a lot of valuable information is proprietary, time-sensitive, or otherwise not available for training.

So we’re going to want AI models to read and remember far more than 2 million tokens at inference time. And that won’t be easy.

The key innovation behind transformer-based LLMs is attention, a mathematical operation that allows a model to “think about” previous tokens. (Check out our LLM explainer if you want a detailed explanation of how this works.) Before an LLM generates a new token, it performs an attention operation that compares the latest token to every previous token. This means that conventional LLMs get less and less efficient as the context grows.
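
Here is a minimal single-head sketch of that operation, written to emphasize the loop over previous tokens rather than to be efficient or faithful to any production model. The tiny two-dimensional vectors stand in for real embeddings, which have thousands of dimensions.

```go
// A minimal sketch of causal attention (single head), showing why the cost
// of producing each new token grows with the length of the context.
package main

import (
	"fmt"
	"math"
)

// attend scores the newest token (the query) against every previous token
// (the keys), then returns a softmax-weighted average of the values. The
// loop over prior tokens is the part that gets more expensive as the
// context grows.
func attend(query []float64, keys, values [][]float64) []float64 {
	scores := make([]float64, len(keys))
	var sum float64
	for i, k := range keys {
		var dot float64
		for j := range query {
			dot += query[j] * k[j]
		}
		scores[i] = math.Exp(dot / math.Sqrt(float64(len(query))))
		sum += scores[i]
	}
	out := make([]float64, len(query))
	for i, v := range values {
		w := scores[i] / sum // softmax weight
		for j := range v {
			out[j] += w * v[j]
		}
	}
	return out
}

func main() {
	// Toy 2-dimensional "token" vectors standing in for real embeddings.
	tokens := [][]float64{{1, 0}, {0.5, 0.5}, {0, 1}, {0.9, 0.1}}
	for t := 1; t < len(tokens); t++ {
		// Token t attends to all t earlier tokens: t comparisons.
		out := attend(tokens[t], tokens[:t], tokens[:t])
		fmt.Printf("token %d attends to %d earlier tokens -> %v\n", t, t, out)
	}
}
```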

Lots of people are working on ways to solve this problem—I’ll discuss some of them later in this article. But first I should explain how we ended up with such an unwieldy architecture.

The “brains” of personal computers are central processing units (CPUs). Traditionally, chipmakers made CPUs faster by increasing the frequency of the clock that acts as each chip’s heartbeat. But in the early 2000s, overheating forced chipmakers to mostly abandon this technique.

Chipmakers started making CPUs that could execute more than one instruction at a time. But they were held back by a programming paradigm that requires instructions to mostly be executed in order.

A new architecture was needed to take full advantage of Moore’s Law. Enter Nvidia.

In 1999, Nvidia started selling graphics processing units (GPUs) to speed up the rendering of three-dimensional games like Quake III Arena. The job of these PC add-on cards was to rapidly draw thousands of triangles that made up walls, weapons, monsters, and other objects in a game.

This is not a sequential programming task: triangles in different areas of the screen can be drawn in any order. So rather than having a single processor that executed instructions one at a time, Nvidia’s first GPU had a dozen specialized cores—effectively tiny CPUs—that worked in parallel to paint a scene.

Over time, Moore’s Law enabled Nvidia to make GPUs with tens, hundreds, and eventually thousands of computing cores. People started to realize that the massive parallel computing power of GPUs could be used for applications unrelated to video games.

In 2012, three University of Toronto computer scientists—Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton—used a pair of Nvidia GTX 580 GPUs to train a neural network for recognizing images. The massive computing power of those GPUs, which had 512 cores each, allowed them to train a network with a then-impressive 60 million parameters. They entered ImageNet, an academic competition to classify images into one of 1,000 categories, and set a new record for accuracy in image recognition.

Before long, researchers were applying similar techniques to a wide variety of domains, including natural language.

For language, the dominant architecture was the recurrent neural network (RNN), which reads text one word at a time and summarizes everything it has seen so far in a fixed-size hidden state. RNNs worked fairly well on short sentences, but they struggled with longer ones—to say nothing of paragraphs or longer passages. When reasoning about a long sentence, an RNN would sometimes “forget about” an important word early in the sentence. In 2014, computer scientists Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio discovered they could improve the performance of a recurrent neural network by adding an attention mechanism that allowed the network to “look back” at earlier words in a sentence.

In 2017, Google published “Attention Is All You Need,” one of the most important papers in the history of machine learning. Building on the work of Bahdanau and his colleagues, Google researchers dispensed with the RNN and its hidden states. Instead, Google’s model used an attention mechanism to scan previous words for relevant context.

This new architecture, which Google called the transformer, proved hugely consequential because it eliminated a serious bottleneck to scaling language models.

Here’s an animation illustrating why RNNs didn’t scale well:

This hypothetical RNN tries to predict the next word in a sentence, with the prediction shown in the top row of the diagram. This network has three layers, each represented by a rectangle. It is inherently linear: it has to complete its analysis of the first word, “How,” before passing the hidden state back to the bottom layer so the network can start to analyze the second word, “are.”

This constraint wasn’t a big deal when machine learning algorithms ran on CPUs. But when people started leveraging the parallel computing power of GPUs, the linear architecture of RNNs became a serious obstacle.

The transformer removed this bottleneck by allowing the network to “think about” all the words in its input at the same time:

The transformer-based model shown here does roughly as many computations as the RNN in the previous diagram. So it might not run any faster on a (single-core) CPU. But because the model doesn’t need to finish with “How” before starting on “are,” “you,” or “doing,” it can work on all of these words simultaneously. So it can run a lot faster on a GPU with many parallel execution units.

How much faster? The potential speed-up is proportional to the number of input words. My animations depict a four-word input that makes the transformer model about four times faster than the RNN. Real LLMs can have inputs thousands of words long. So, with a sufficiently beefy GPU, transformer-based models can be orders of magnitude faster than otherwise similar RNNs.

In short, the transformer unlocked the full processing power of GPUs and catalyzed rapid increases in the scale of language models. Leading LLMs grew from hundreds of millions of parameters in 2018 to hundreds of billions of parameters by 2020. Classic RNN-based models could not have grown that large because their linear architecture prevented them from being trained efficiently on a GPU.

Look again at the diagonal arrows between the layers in the transformer animation: they represent the operation of the attention mechanism. Before a transformer-based language model generates a new token, it “thinks about” every previous token to find the ones that are most relevant.

Each of these comparisons is cheap, computationally speaking. For small contexts—10, 100, or even 1,000 tokens—they are not a big deal. But the computational cost of attention grows relentlessly with the number of preceding tokens. The longer the context gets, the more attention operations (and therefore computing power) are needed to generate the next token.

This means that the total computing power required for attention grows quadratically with the total number of tokens. Suppose a 10-token prompt requires 414,720 attention operations. Then:

  • Processing a 100-token prompt will require 45.6 million attention operations.
  • Processing a 1,000-token prompt will require 4.6 billion attention operations.
  • Processing a 10,000-token prompt will require 460 billion attention operations.

This is probably why Google charges twice as much, per token, for Gemini 1.5 Pro once the context gets longer than 128,000 tokens. Generating token number 128,001 requires comparisons with all 128,000 previous tokens, making it significantly more expensive than producing the first or 10th or 100th token.
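
The article doesn’t spell out where those operation counts come from, but they are consistent with counting one comparison per pair of tokens, per attention head, per layer, for a GPT-3-scale model with 96 layers and 96 heads. Those constants are my assumption; the part that matters is the pair count, which grows with the square of the context length.

```go
// Reproducing the quadratic growth in attention operations, assuming one
// comparison per pair of tokens, per attention head, per layer.
package main

import "fmt"

func attentionOps(tokens, layers, heads int) int64 {
	pairs := int64(tokens) * int64(tokens-1) / 2
	return pairs * int64(layers) * int64(heads)
}

func main() {
	for _, n := range []int{10, 100, 1000, 10000} {
		fmt.Printf("%6d tokens: %15d attention operations\n", n, attentionOps(n, 96, 96))
	}
	// 10 -> 414,720; 100 -> ~45.6 million; 1,000 -> ~4.6 billion; 10,000 -> ~460 billion
}
```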

A lot of effort has been put into optimizing attention. One line of research has tried to squeeze maximum efficiency out of individual GPUs.

As we saw earlier, a modern GPU contains thousands of execution units. Before a GPU can start doing math, it must move data from slow shared memory (called high-bandwidth memory) to much faster memory inside a particular execution unit (called SRAM). Sometimes GPUs spend more time moving data around than performing calculations.

In a series of papers, Princeton computer scientist Tri Dao and several collaborators have developed FlashAttention, which calculates attention in a way that minimizes the number of these slow memory operations. Work like Dao’s has dramatically improved the performance of transformers on modern GPUs.

Another line of research has focused on efficiently scaling attention across multiple GPUs. One widely cited paper describes ring attention, which divides input tokens into blocks and assigns each block to a different GPU. It’s called ring attention because GPUs are organized into a conceptual ring, with each GPU passing data to its neighbor.

I once attended a ballroom dancing class where couples stood in a ring around the edge of the room. After each dance, women would stay where they were while men would rotate to the next woman. Over time, every man got a chance to dance with every woman. Ring attention works on the same principle. The “women” are query vectors (describing what each token is “looking for”) and the “men” are key vectors (describing the characteristics each token has). As the key vectors rotate through a sequence of GPUs, they get multiplied by every query vector in turn.

In short, ring attention distributes attention calculations across multiple GPUs, making it possible for LLMs to have larger context windows. But it doesn’t make individual attention calculations any cheaper.
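
Here is a toy sketch of that rotation schedule. It only tracks which query block meets which key block at each step; the actual attention math, and the bookkeeping ring attention needs to combine softmax results across blocks, are omitted. The four-GPU ring is an arbitrary choice for illustration.

```go
// A toy schedule for ring attention: query blocks stay on their GPUs while
// key blocks rotate one hop per step until every query block has "seen"
// every key block. Only the communication pattern is sketched here.
package main

import "fmt"

func main() {
	const numGPUs = 4

	// keyBlock[g] is the index of the key block currently held by GPU g.
	// Each GPU also permanently holds query block g; only keys rotate.
	keyBlock := make([]int, numGPUs)
	for g := range keyBlock {
		keyBlock[g] = g
	}

	for step := 0; step < numGPUs; step++ {
		for g := 0; g < numGPUs; g++ {
			fmt.Printf("step %d: GPU %d scores query block %d against key block %d\n",
				step, g, g, keyBlock[g])
		}
		// Pass each key block to the next GPU around the ring.
		rotated := make([]int, numGPUs)
		for g := 0; g < numGPUs; g++ {
			rotated[(g+1)%numGPUs] = keyBlock[g]
		}
		keyBlock = rotated
	}
	// After numGPUs steps, every query block has been scored against every
	// key block, even though no single GPU ever held the whole context.
}
```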

The fixed-size hidden state of an RNN means that it doesn’t have the same scaling problems as a transformer. An RNN requires about the same amount of computing power to produce its first, hundredth, and millionth token. That’s a big advantage over attention-based models.
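
A minimal sketch of that property: the hidden state below is a fixed array of four numbers, and each token is folded into it with a constant amount of work. The update rule and its coefficients are placeholders rather than a trained RNN; the point is only that the per-token cost doesn’t depend on how much text came before.

```go
// A minimal recurrent update illustrating why an RNN's per-token cost is
// constant: the hidden state has a fixed size, so every step does the same
// amount of work regardless of context length.
package main

import (
	"fmt"
	"math"
)

const stateSize = 4

// step folds one input value into the fixed-size hidden state. The work is
// proportional to stateSize, independent of how many tokens came before.
// The coefficients are arbitrary placeholders, not trained weights.
func step(state [stateSize]float64, input float64) [stateSize]float64 {
	var next [stateSize]float64
	for i := range next {
		next[i] = math.Tanh(0.5*state[i] + 0.1*input + 0.01*float64(i))
	}
	return next
}

func main() {
	var state [stateSize]float64
	inputs := []float64{0.2, -1.0, 0.7, 0.3, 1.5}
	for t, x := range inputs {
		state = step(state, x)
		fmt.Printf("after token %d: state = %.3f\n", t+1, state)
	}
	// Whether the sequence has 5 tokens or 5 million, each step costs the same.
}
```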

Although RNNs have fallen out of favor since the invention of the transformer, people have continued trying to develop RNNs suitable for training on modern GPUs.

In April, Google announced a new model called Infini-attention. It’s kind of a hybrid between a transformer and an RNN. Infini-attention handles recent tokens like a normal transformer, remembering them and recalling them using an attention mechanism.

However, Infini-attention doesn’t try to remember every token in a model’s context. Instead, it stores older tokens in a “compressive memory” that works something like the hidden state of an RNN. This data structure can perfectly store and recall a few tokens, but as the number of tokens grows, its recall becomes lossier.

Machine learning YouTuber Yannic Kilcher wasn’t too impressed by Google’s approach.

“I’m super open to believing that this actually does work and this is the way to go for infinite attention, but I’m very skeptical,” Kilcher said. “It uses this compressive memory approach where you just store as you go along, you don’t really learn how to store, you just store in a deterministic fashion, which also means you have very little control over what you store and how you store it.”

Perhaps the most notable effort to resurrect RNNs is Mamba, an architecture that was announced in a December 2023 paper. It was developed by computer scientists Tri Dao (who also did the FlashAttention work I mentioned earlier) and Albert Gu.

Mamba does not use attention. Like other RNNs, it has a hidden state that acts as the model’s “memory.” Because the hidden state has a fixed size, longer prompts do not increase Mamba’s per-token cost.

When I started writing this article in March, my goal was to explain Mamba’s architecture in some detail. But then in May, the researchers released Mamba-2, which significantly changed the architecture from the original Mamba paper. I’ll be frank: I struggled to understand the original Mamba and have not figured out how Mamba-2 works.

But the key thing to understand is that Mamba has the potential to combine transformer-like performance with the efficiency of conventional RNNs.

In June, Dao and Gu co-authored a paper with Nvidia researchers that evaluated a Mamba model with 8 billion parameters. They found that models like Mamba were competitive with comparably sized transformers in a number of tasks, but they “lag behind Transformer models when it comes to in-context learning and recalling information from the context.”

Transformers are good at information recall because they “remember” every token of their context—this is also why they become less efficient as the context grows. In contrast, Mamba tries to compress the context into a fixed-size state, which necessarily means discarding some information from long contexts.

The Nvidia team found they got the best performance from a hybrid architecture that interleaved 24 Mamba layers with four attention layers. This worked better than either a pure transformer model or a pure Mamba model.

A model needs some attention layers so it can remember important details from early in its context. But a few attention layers seem to be sufficient; the rest of the attention layers can be replaced by cheaper Mamba layers with little impact on the model’s overall performance.

In August, an Israeli startup called AI21 announced its Jamba 1.5 family of models. The largest version had 398 billion parameters, making it comparable in size to Meta’s Llama 405B model. Jamba 1.5 Large has seven times more Mamba layers than attention layers. As a result, Jamba 1.5 Large requires far less memory than comparable models from Meta and others. For example, AI21 estimates that Llama 3.1 70B needs 80GB of memory to keep track of 256,000 tokens of context. Jamba 1.5 Large only needs 9GB, allowing the model to run on much less powerful hardware.
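
Where does a figure like 80GB come from? Mostly from the key/value cache: a transformer has to keep a key vector and a value vector for every token, in every layer, for every key/value head. The sketch below plugs in my reading of Llama 3.1 70B’s published configuration (80 layers, 8 key/value heads of 128 dimensions each, 16-bit values), so treat it as a ballpark reconstruction rather than AI21’s actual methodology.

```go
// A rough reconstruction of the kind of memory estimate AI21 is making,
// using assumed (but publicly documented) Llama 3.1 70B dimensions.
package main

import "fmt"

func main() {
	const (
		layers        = 80     // assumed: Llama 3.1 70B layer count
		kvHeads       = 8      // assumed: grouped-query attention key/value heads
		headDim       = 128    // assumed: dimensions per head
		bytesPerValue = 2      // 16-bit precision
		tokens        = 256000 // context length cited in the article
	)
	// Two cached vectors (key and value) per token, per layer, per KV head.
	bytes := int64(2) * layers * kvHeads * headDim * tokens * bytesPerValue
	fmt.Printf("KV cache ≈ %.0f GB for %d tokens\n", float64(bytes)/1e9, tokens)
	// Prints roughly 84 GB, in the same ballpark as the 80GB figure above.
}
```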

The Jamba 1.5 Large model gets an MMLU score of 80, significantly below the Llama 3.1 70B’s score of 86. So by this measure, Mamba doesn’t blow transformers out of the water. However, this may not be an apples-to-apples comparison. Frontier labs like Meta have invested heavily in training data and post-training infrastructure to squeeze a few more percentage points of performance out of benchmarks like MMLU. It’s possible that the same kind of intense optimization could close the gap between Jamba and frontier models.

So while the benefits of longer context windows are obvious, the best strategy to get there is not. In the short term, AI companies may continue using clever efficiency and scaling hacks (like FlashAttention and ring attention) to scale up vanilla LLMs. Longer term, we may see growing interest in Mamba and perhaps other attention-free architectures. Or maybe someone will come up with a totally new architecture that renders transformers obsolete.

But I am pretty confident that scaling up transformer-based frontier models isn’t going to be a solution on its own. If we want models that can handle billions of tokens—and many people do—we’re going to need to think outside the box.

Tim Lee was on staff at Ars from 2017 to 2021. Last year, he launched a newsletter, Understanding AI, that explores how AI works and how it’s changing our world. You can subscribe here.


Timothy is a senior reporter covering tech policy and the future of transportation. He lives in Washington DC.

Why AI language models choke on too much text Read More »

12-days-of-openai:-the-ars-technica-recap

12 days of OpenAI: The Ars Technica recap


Did OpenAI’s big holiday event live up to the billing?

Over the past 12 business days, OpenAI has announced a new product or demoed an AI feature every weekday, calling the PR event “12 days of OpenAI.” We’ve covered some of the major announcements, but we thought a day-by-day recap might be useful for people seeking a comprehensive overview of the developments.

The timing and rapid pace of these announcements—particularly in light of Google’s competing releases—illustrates the intensifying competition in AI development. What might normally have been spread across months was compressed into just 12 business days, giving users and developers a lot to process as they head into 2025.

Humorously, we asked ChatGPT what it thought about the whole series of announcements, and it was skeptical that the event even took place. “The rapid-fire announcements over 12 days seem plausible,” wrote ChatGPT-4o, “but might strain credibility without a clearer explanation of how OpenAI managed such an intense release schedule, especially given the complexity of the features.”

But it did happen, and here’s a chronicle of what went down on each day.

Day 1: Thursday, December 5

On the first day of OpenAI, the company released its full o1 model, making it available to ChatGPT Plus and Team subscribers worldwide. The company reported that the model operates faster than its preview version and reduces major errors by 34 percent on complex real-world questions.

The o1 model brings new capabilities for image analysis, allowing users to upload and receive detailed explanations of visual content. OpenAI said it plans to expand o1’s features to include web browsing and file uploads in ChatGPT, with API access coming soon. The API version will support vision tasks, function calling, and structured outputs for system integration.

OpenAI also launched ChatGPT Pro, a $200 subscription tier that provides “unlimited” access to o1, GPT-4o, and Advanced Voice features. Pro subscribers receive an exclusive version of o1 that uses additional computing power for complex problem-solving. Alongside this release, OpenAI announced a grant program that will provide ChatGPT Pro access to 10 medical researchers at established institutions, with plans to extend grants to other fields.

Day 2: Friday, December 6

Day 2 wasn’t as exciting. OpenAI unveiled Reinforcement Fine-Tuning (RFT), a model customization method that will let developers modify “o-series” models for specific tasks. The technique reportedly goes beyond traditional supervised fine-tuning by using reinforcement learning to help models improve their reasoning abilities through repeated iterations. In other words, OpenAI created a new way to train AI models that lets them learn from practice and feedback.

OpenAI says that Berkeley Lab computational researcher Justin Reese tested RFT for researching rare genetic diseases, while Thomson Reuters has created a specialized o1-mini model for its CoCounsel AI legal assistant. The technique requires developers to provide a dataset and evaluation criteria, with OpenAI’s platform managing the reinforcement learning process.

OpenAI plans to release RFT to the public in early 2025 but currently offers limited access through its Reinforcement Fine-Tuning Research Program for researchers, universities, and companies.

Day 3: Monday, December 9

On day 3, OpenAI released Sora, its text-to-video model, as a standalone product now accessible through sora.com for ChatGPT Plus and Pro subscribers. The company says the new version operates faster than the research preview shown in February 2024, when OpenAI first demonstrated the model’s ability to create videos from text descriptions.

The release moved Sora from research preview to a production service, marking OpenAI’s official entry into the video synthesis market. The company published a blog post detailing the subscription tiers and deployment strategy for the service.

Day 4: Tuesday, December 10

On day 4, OpenAI moved its Canvas feature out of beta testing, making it available to all ChatGPT users, including those on free tiers. Canvas provides a dedicated interface for extended writing and coding projects beyond the standard chat format, now with direct integration into the GPT-4o model.

The updated canvas allows users to run Python code within the interface and includes a text-pasting feature for importing existing content. OpenAI added compatibility with custom GPTs and a “show changes” function that tracks modifications to writing and code. The company said Canvas is now on chatgpt.com for web users and also available through a Windows desktop application, with more features planned for future updates.

Day 5: Wednesday, December 11

On day 5, OpenAI announced that ChatGPT would integrate with Apple Intelligence across iOS, iPadOS, and macOS devices. The integration works on iPhone 16 series phones, iPhone 15 Pro models, iPads with A17 Pro or M1 chips and later, and Macs with M1 processors or newer, running their respective latest operating systems.

The integration lets users access ChatGPT’s features (such as they are), including image and document analysis, directly through Apple’s system-level intelligence features. The feature works with all ChatGPT subscription tiers and operates within Apple’s privacy framework. Iffy message summaries remain unaffected by the additions.

Enterprise and Team account users need administrator approval to access the integration.

Day 6: Thursday, December 12

On the sixth day, OpenAI added two features to ChatGPT’s voice capabilities: “video calling” with screen sharing support for ChatGPT Plus and Pro subscribers and a seasonal Santa Claus voice preset.

The new visual Advanced Voice Mode features work through the mobile app, letting users show their surroundings or share their screen with the AI model during voice conversations. While the rollout covers most countries, users in several European nations, including EU member states, Switzerland, Iceland, Norway, and Liechtenstein, will get access at a later date. Enterprise and education users can expect these features in January.

The Santa voice option appears as a snowflake icon in the ChatGPT interface across mobile devices, web browsers, and desktop apps, with conversations in this mode not affecting chat history or memory. Don’t expect Santa to remember what you want for Christmas between sessions.

Day 7: Friday, December 13

OpenAI introduced Projects, a new organizational feature in ChatGPT that lets users group related conversations and files, on day 7. The feature works with the company’s GPT-4o model and provides a central location for managing resources related to specific tasks or topics—kinda like Anthropic’s “Projects” feature.

ChatGPT Plus, Pro, and Team subscribers can currently access Projects through chatgpt.com and the Windows desktop app, with view-only support on mobile devices and macOS. Users can create projects by clicking a plus icon in the sidebar, where they can add files and custom instructions that provide context for future conversations.

OpenAI said it plans to expand Projects in 2025 with support for additional file types, cloud storage integration through Google Drive and Microsoft OneDrive, and compatibility with other models like o1. Enterprise and education users will receive access to Projects in January.

Day 8: Monday, December 16

On day 8, OpenAI expanded its search features in ChatGPT, extending access to all users with free accounts while reportedly adding speed improvements and mobile optimizations. Basically, you can use ChatGPT like a web search engine, although in practice it doesn’t seem to be as comprehensive as Google Search at the moment.

The update includes a new maps interface and integration with Advanced Voice, allowing users to perform searches during voice conversations. The search capability, which previously required a paid subscription, now works across all platforms where ChatGPT operates.

Day 9: Tuesday, December 17

On day 9, OpenAI released its o1 model through its API platform, adding support for function calling, developer messages, and vision processing capabilities. The company also reduced GPT-4o audio pricing by 60 percent and introduced a GPT-4o mini option that costs one-tenth of previous audio rates.

OpenAI also simplified its WebRTC integration for real-time applications and unveiled Preference Fine-Tuning, which provides developers new ways to customize models. The company also launched beta versions of software development kits for the Go and Java programming languages, expanding its toolkit for developers.

Day 10: Wednesday, December 18

On Wednesday, OpenAI did something a little fun and launched voice and messaging access to ChatGPT through a toll-free number (1-800-CHATGPT), as well as WhatsApp. US residents can make phone calls with a 15-minute monthly limit, while global users can message ChatGPT through WhatsApp at the same number.

OpenAI said the release is a way to reach users who lack consistent high-speed Internet access or want to try AI through familiar communication channels, but it’s also just a clever hack. As evidence, OpenAI notes that these new interfaces serve as experimental access points, with more “limited functionality” than the full ChatGPT service, and still recommends existing users continue using their regular ChatGPT accounts for complete features.

Day 11: Thursday, December 19

On Thursday, OpenAI expanded ChatGPT’s desktop app integration to include additional coding environments and productivity software. The update added support for JetBrains IDEs like PyCharm and IntelliJ IDEA, VS Code variants including Cursor and VSCodium, and text editors such as BBEdit and TextMate.

OpenAI also included integration with Apple Notes, Notion, and Quip while adding Advanced Voice Mode compatibility when working with desktop applications. These features require manual activation for each app and remain available to paid subscribers, including Plus, Pro, Team, Enterprise, and Education users, with Enterprise and Education customers needing administrator approval to enable the functionality.

Day 12: Friday, December 20

On Friday, OpenAI concluded its twelve days of announcements by previewing two new simulated reasoning models, o3 and o3-mini, while opening applications for safety and security researchers to test them before public release. Early evaluations show o3 achieving a 2727 rating on Codeforces programming contests and scoring 96.7 percent on AIME 2024 mathematics problems.

The company reports o3 set performance records on advanced benchmarks, solving 25.2 percent of problems on EpochAI’s Frontier Math evaluations and scoring above 85 percent on the ARC-AGI test, which is comparable to human results. OpenAI also published research about “deliberative alignment,” a technique used in developing o1. The company has not announced firm release dates for either new o3 model, but CEO Sam Altman said o3-mini might ship in late January.

So what did we learn?

OpenAI’s December campaign revealed that the company had a lot of things sitting around that it needed to ship, and it picked a fun theme to unite the announcements. Google responded in kind, as we have covered.

Several trends from the releases stand out. OpenAI is heavily investing in multimodal capabilities. The o1 model’s release, Sora’s evolution from research preview to product, and the expansion of voice features with video calling all point toward systems that can seamlessly handle text, images, voice, and video.

The company is also focusing heavily on developer tools and customization so that it can keep growing its cloud services business and get its products integrated into other applications. Between the API releases, Reinforcement Fine-Tuning, and expanded IDE integrations, OpenAI is building out its ecosystem for developers and enterprises. And the introduction of o3 shows that OpenAI is still attempting to push technological boundaries, even in the face of diminishing returns in training LLM base models.

OpenAI seems to be positioning itself for a 2025 where generative AI moves beyond text chatbots and simple image generators and finds its way into novel applications that we probably can’t even predict yet. We’ll have to wait and see what the company and developers come up with in the year ahead.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

12 days of OpenAI: The Ars Technica recap Read More »

openai-announces-o3-and-o3-mini,-its-next-simulated-reasoning-models

OpenAI announces o3 and o3-mini, its next simulated reasoning models

On Friday, during Day 12 of its “12 days of OpenAI,” OpenAI CEO Sam Altman announced the company’s latest AI “reasoning” models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make these models available for public safety testing and research access today.

The models use what OpenAI calls “private chain of thought,” where the model pauses to examine its internal dialog and plan ahead before responding, which you might call “simulated reasoning” (SR)—a form of AI that goes beyond basic large language models (LLMs).

The company named the model family “o3” instead of “o2” to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday’s livestream, Altman acknowledged his company’s naming foibles, saying, “In the grand tradition of OpenAI being really, truly bad at names, it’ll be called o3.”

According to OpenAI, the o3 model earned a record-breaking score on the ARC-AGI benchmark, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent—comparable to human performance at an 85 percent threshold.

OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Exam, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.

OpenAI announces o3 and o3-mini, its next simulated reasoning models Read More »