Author name: Tim Belzer


Universities (finally) band together, fight “unprecedented government overreach”

We speak with one voice against the unprecedented government overreach and political interference now endangering American higher education… We must reject the coercive use of public research funding…

American institutions of higher learning have in common the essential freedom to determine, on academic grounds, whom to admit and what is taught, how, and by whom… In their pursuit of truth, faculty, students, and staff are free to exchange ideas and opinions across a full range of viewpoints without fear of retribution, censorship, or deportation.

This is fine, as far as it goes. But what are all these institutions going to do about the funding cuts, attempts to revoke their nonprofit status, threats not to hire their graduates, and student speech-based deportations? They are going to ask the Trump administration for “constructive engagement that improves our institutions and serves our republic.”

This sounds lovely, if naive, and I hope it works out well for every one of them as they seek good-faith dialogue with a vice president who has called universities the “enemy” and an administration that demanded Harvard submit to the vetting of every department for unspecified “viewpoint diversity.”

As a first step to finding common ground and speaking with a common voice, the statement is a start. But statements, like all words, can be cheap. We’ll see what steps schools actually take—and how much they can speak and act in concert—as Trump’s pressure campaign continues to ratchet.


Taxes and fees not included: T-Mobile’s latest price lock is nearly meaningless


“Price” is locked, but fees aren’t

T-Mobile makes 5-year price guarantee after refusing to honor lifetime price lock.

A T-Mobile store on April 3, 2020, in Zutphen, Netherlands.

T-Mobile is making another long-term price guarantee, but wireless users will rightfully be skeptical since T-Mobile refused to honor a previously offered lifetime price lock and continues to fight a lawsuit filed by customers who were harmed by that broken promise. Moreover, the new plans that come with a price guarantee will have extra fees that can be raised at any time.

T-Mobile today announced new plans with more hotspot data and a five-year price guarantee, saying that “T-Mobile and Metro customers can rest assured that the price of their talk, text and data stays the same for five whole years, from the time they sign up.” The promise applies to the T-Mobile “Experience More” and “Experience Beyond” plans that will be offered starting tomorrow. The plans cost $85 or $100 for a single line after the autopay discount, which requires a debit card or bank account.

The price-lock promise also applies to four new Metro by T-Mobile plans that launch on Thursday. T-Mobile’s announcement came three weeks after Verizon announced a three-year price lock.

If the promise sounds familiar, it’s because T-Mobile made lifetime price guarantees in 2015 and 2017.

“Now, T-Mobile One customers keep their price until THEY decide to change it. T-Mobile will never change the price you pay for your T-Mobile One plan,” T-Mobile said in January 2017. When a similar promise was made in 2015, then-CEO John Legere said that “the Un-contract is our promise to individuals, families and businesses of all sizes, that—while your price may go down—it won’t go up.”

Taxes and fees not included

T-Mobile raised prices on the supposedly price-locked plans about a year ago, triggering a flood of complaints to the Federal Communications Commission and a class action lawsuit. There were also complaints to the Federal Trade Commission, which enforces laws against false advertising. But so far, T-Mobile hasn’t faced any punishment.

Besides the five-year price guarantee, there’s at least one more notable pricing detail. T-Mobile’s previous plans had “taxes and fees included,” meaning the advertised price was inclusive of taxes and fees. With the new Experience plans, taxes and fees will be in addition to the advertised price.

This will make the plans cost more initially than customers might expect, and it gives T-Mobile wiggle room to raise prices during the five years of the price guarantee since it could increase any fees that are tacked onto the new plans. The fine print in today’s press release describes taxes and fees as “exclusions” to the price guarantee.

“Fees” can refer to virtually anything that a carrier chooses to add to a bill and isn’t limited to the carrier’s actual costs from taxes or government mandates. For example, T-Mobile has a “Regulatory Programs and Telco Recovery Fee,” which it acknowledges “is not a government tax or imposed by the government; rather, the fee is collected and retained by T-Mobile to help recover certain costs we have already incurred and continue to incur.”

This can include the cost of complying with legal obligations, “charges imposed on us by other carriers for delivery of calls,” and the cost of leasing network facilities that are needed to provide service, T-Mobile says. In other words, T-Mobile charges a separate fee to cover the normal expenses incurred by any provider of telecommunications services.

The promise is thus that the base price of a service plan won’t change, but T-Mobile gives itself wide discretion to add or increase fees on customers’ monthly bills. “Guarantee means that we won’t change the price of talk, text, and 5G smartphone data on our network for at least 5 years while you are on an Experience plan,” T-Mobile said today. T-Mobile’s terms and conditions haven’t been updated, but the terms address price promises in general, saying that price locks do not include “add-on features, taxes, surcharges, fees, or charges for extra Features or Devices.”
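To see how that carve-out can play out on a monthly bill, here is a back-of-the-envelope sketch. The fee names and dollar amounts below are hypothetical placeholders, not T-Mobile's actual charges, which vary by plan and location.

```python
# Hypothetical illustration of how add-on fees can raise a bill even when the
# advertised "price-locked" base rate never changes. All figures are invented
# for the example; they are not T-Mobile's actual fees.

base_price = 100.00            # advertised single-line price, covered by the guarantee
fees_year_1 = {
    "regulatory_recovery": 3.49,   # carrier-imposed fee, excluded from the guarantee
    "state_and_local_tax": 8.00,   # taxes, also excluded
}
fees_year_3 = {
    "regulatory_recovery": 4.99,   # the carrier can raise this without touching base_price
    "state_and_local_tax": 8.00,
}

def monthly_bill(base, fees):
    return base + sum(fees.values())

print(f"Year 1 bill: ${monthly_bill(base_price, fees_year_1):.2f}")  # $111.49
print(f"Year 3 bill: ${monthly_bill(base_price, fees_year_3):.2f}")  # $112.99
# The base price stayed "locked" at $100, yet the total bill still went up.
```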

T-Mobile Consumer Group President Jon Freier, who has been with T-Mobile for about two decades, seemed to recognize in an interview with Fierce that customers are likely to be wary of new promises. “One of the things that we’ve heard from customers is that the more definition that we can put in terms of timing around the guarantee, the more believable and useful that guarantee is,” he said. “So we chose to roll out with five years.” Freier asserted that “we are absolutely signing up for the guarantee for the next five years.”

Freier even mentioned the 2015 guarantee in a video announcement today, saying that T-Mobile is now “evolving this promise and expanding it across our portfolio.”

T-Mobile fights price lock lawsuit

There is a better chance that T-Mobile will keep the latest promise, since it is limited in scope and lasts only five years, while the lifetime price lock was supposed to last for as long as customers chose to keep their plans. The lifetime price lock did last for more than five years, after all. But T-Mobile has shown that when it breaks a promise, it is willing to accept the public backlash and fight users in court.

A class action lawsuit over the nullified lifetime price lock is still pending in US District Court for the District of New Jersey. T-Mobile is trying to force plaintiffs into arbitration, and the sides are proceeding with discovery on the matter of whether the named plaintiffs “executed valid opt-outs of Defendant’s arbitration agreement.”

A joint status update in March said that T-Mobile refused to produce all the documents that plaintiffs requested, arguing that the “burden of collecting these documents far outweighs their potential relevance to the allowed scope of discovery.”

T-Mobile tried to give itself a way out when it introduced the 2017 lifetime price lock. Although a press release issued then made the promise sound absolute, a separate FAQ essentially nullified the promise by saying that T-Mobile was only promising to pay a customer’s final bill “if we were to raise prices and you choose to leave.” Customers who tried to hold T-Mobile to the lifetime price promise were not mollified by that carveout, given that it was published on an entirely separate page and not part of the price-lock announcement.

While customers may find it difficult to fully trust T-Mobile’s new guarantee, they can at least take a look at the carveouts to get a sense of how solid the new pledge is. We already noted the taxes and fees caveat, which seems to be the biggest thing to watch out for. This category on its own makes it easy for T-Mobile to raise your bill without technically breaking its promise not to raise the price of “talk, text and data.”

Guarantee “worthless based on T-Mobile’s previous actions”

The new plans are not yet live on T-Mobile’s website, so it’s possible a more detailed breakdown of caveats could be revealed tomorrow when the plans are available. The website for T-Mobile’s separate Metro brand has a slightly more detailed description than the one in the press release. While details could differ between the main T-Mobile brand and Metro, the Metro page says:

5-year guarantee means we won’t change the price of talk, text, and 5G smartphone data on our network for at least 5 years while you are on an eligible plan. Guarantee also applies to price for data on wearable/tablet/mobile Internet lines added to your plan. Your guarantee starts when you activate or switch to an eligible plan and doesn’t restart if you add a line or change plans after that. Per-use charges, plan add-ons, third-party services, and network management practices aren’t included.

As you might expect, wireless users commenting on the T-Mobile subreddit were not impressed by the price promise. “Price guarantee is worthless based on T-Mobile’s previous actions. They might as well save the ink/electrons,” one user wrote.

Many users remarked on the removal of “taxes and fees included,” and the specific end date for the price lock that applies only to the base price. “This is them saying we are sorry we screwed consumers,” one person wrote. “Now we will be more transparent about when in the future we will increase your rates.”


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.


Man buys racetrack, ends up launching the Netflix of grassroots motorsports


FRDM+ is profitable, has its own smart TV apps. Subscriptions start at $20/month.

In 2019, Garrett Mitchell was already an Internet success. His YouTube channel, Cleetus McFarland, had over a million followers. If you perused the channel at that time, you would’ve found a range of grassroots motorsports videos with the type of vehicular shenanigans that earn truckloads of views. Some of those older videos include “BLEW BY A COP AT 120+mph! OOPS!,” “THERE’S A T-REX ON THE TRACK!,” and “Manual Transmission With Paddle Shifters!?!.”

Those videos made Mitchell, aka Cleetus McFarland, a known personality among automotive enthusiasts. But the YouTuber wanted more financial independence beyond the Google platform and firms willing to sponsor his channel.

“… after my YouTube was growing and some of my antics [were] getting videos de-monetized, I realized I needed a playground,” Mitchell told Ars Technica in an email.

Mitchell found a road toward new monetization opportunities through the DeSoto Super Speedway. The Bradenton, Florida, track had changed ownership multiple times since opening in the 1970s. The oval-shaped racetrack is three-eighths of a mile long with 12-degree banking angles.

BRADENTON, FL — Mid-1980s: Late Model racing action at DeSoto Speedway. Both the All-Pro Series and NASCAR All-American Challenge Series ran races at the track in 1985 and 1986. Credit: ISC Images & Archives via Getty Images

By 2018, the track had closed its doors and was going unused. DeSoto happened to be next to Mitchell’s favorite drag strip, giving the YouTuber the idea of turning it into a stadium where people could watch burnouts and other “massive, rowdy” ticketed events. Mitchell added:

So I sold everything I could, borrowed some money from my business manager, and went all in for [$]2.2 million.

But like the rest of the world, Mitchell hit the brakes on his 2020 plans during COVID-19 lockdowns. Soon after his purchase, Mitchell couldn’t use the track, renamed Freedom Factory, for large gatherings, forcing him to reconsider his plans.

“We had no other option but to entertain the people somehow. And with no other racing goin’ on anywhere, we bet big on making something happen. And it worked,” Mitchell said.

That “something” was a pay-per-view (PPV) event hosted from the Freedom Factory in April 2020. The event led to others and, eventually, Mitchell running his own subscription video on demand (SVOD) service, FRDM+, which originally launched as Cleetervision in 2022.

Today, a FRDM+ subscription costs $20 per month or $120 per year. A subscription provides access to an impressive library of automotive videos. Some are archived from Mitchell’s YouTube channel. Other, exclusive videos feature content such as interviews with motorsports influencers and members of Mitchell’s staff and crew, and outrageous motorsports stunts. You can watch videos from other influencers on FRDM+, and the business can also white-label its platform into other influencers’ websites.

“A race against time”

Before Mitchell could host his first PPV event, he had to prepare the speedway. Explaining the ordeal to Ars, he wrote:

We cleaned that place up best we could, but let’s be real, it was rough. Lights were out, weeds poppin’ up through the asphalt, the whole deal.

Pulling off the first PPV event at the Freedom Factory speedway was a “race against time,” Jonny Mill, who built FRDM+’s tech stack and serves as company president, told Ars.

“Florida implemented a statewide shutdown on the very day of our event,” he said.

Mitchell also struggled to get the right workers and equipment needed for the PPV. Flights weren’t available due to the pandemic, forcing Mill to produce the event from California using a cell phone group chat and “last-minute local crew,” per Mitchell. The ENG camera person was much shorter than Mitchell “and had to climb on whatever she could just to keep me in frame,” he recalled.

Mitchell said Freedom Factory’s first PPV event had 75,000 concurrent viewers, which caused his website and those of the event sponsors to crash.

“Our initial bandwidth provider laughed at our viewership projections, and, of course, we surpassed them in the first week of pre-sales,” Mill said. “They did apologize before asking for a much larger check.”

Other early obstacles included determining how to embed the livestream platform into Mitchell’s e-commerce site. The biggest challenge there was “juggling two separate logins, one for merch shopping and another for livestream PPV, all within the same site,” Mill explained.

“Now, our focus is on seamlessly guiding the YouTube audience over to FRDM+ for premium live events,” he added.

Live events are still the heart of FRDM+. The service had 21 livestreamed events scheduled throughout 2025, and more are expected to come.

Peeking under the hood

Today, bandwidth isn’t a problem for FRDM+, and navigating the streaming service doesn’t feel much different from something like Netflix. There are different “channels” (grouped together by related content or ongoing series) on top and new releases and upcoming content highlighted below. There are horizontal scrolling rows, and many titles have content summaries and/or trailers. The platform also has a support section with instructions for canceling subscriptions.

Browsing FRDM+. Credit: FRDM+

Like with other SVOD services, subscribers can watch FRDM+ via a web browser or through a smart TV app. FRDM+ currently has apps for Apple TV, Fire OS, and Roku OS. Mitchell said the team’s constantly working on more connected TV apps, as well as adding features, “more interactivity,” and customers.

To keep the wheels spinning, FRDM+ leverages a diverse range of technologies, Mill explained:

At the core of our infrastructure, AWS bandwidth servers handle the heavy lifting, while Accedo powers the connected TV apps, bridging the gap between our tech stack and the audience. Brightcove serves as our primary video player partner, with additional backup systems in place to maintain reliability.

For a service like this, with live events, redundancy is critical, Mill said.

“At the Freedom Factory, we even beam air fiber from a house five miles away to ensure a reliable second Internet. We also have a hidden page on [the Cleetus McFarland website] to launch a backup stream if the primary one fails,” he said.
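Mill didn’t share implementation details beyond the hidden backup page, but the general failover pattern is simple. The sketch below is purely illustrative: the URLs and health check are hypothetical, it uses only the Python standard library, and it is not FRDM+’s actual code.

```python
# Illustrative failover between a primary and a backup HLS stream.
# The URLs and health-check logic are hypothetical, not FRDM+'s real setup.
import urllib.request

PRIMARY_STREAM = "https://primary.example.com/live/master.m3u8"  # hypothetical
BACKUP_STREAM = "https://backup.example.com/live/master.m3u8"    # hypothetical

def stream_is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Treat a stream as healthy if its playlist responds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, and connection errors
        return False

def pick_stream() -> str:
    """Prefer the primary feed; fall back to the backup if it is unreachable."""
    if stream_is_healthy(PRIMARY_STREAM):
        return PRIMARY_STREAM
    return BACKUP_STREAM

if __name__ == "__main__":
    print("Serving viewers from:", pick_stream())
```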

Today, FRDM+’s biggest challenge isn’t a technical one. Instead, it’s around managing the business’s different parts using a small team. FRDM+ has 35 full-time employees across its Shop, Race Track, Events, and Merch divisions and is “entirely self-funded,” per Mill. The company also relies on contractors for productions, but its core livestream team has six full-time employees.

Mitchell told Ars that FRDM+ is profitable, but he couldn’t get into specifics. He said the service has “strong year-over-year growth and a solid financial foundation that allows us to continue reinvesting in our team and services,” like a “robust technology stack, larger events, venue rentals, and even giving away helicopters and Lamborghinis as the prizes for our races.”

“Having been at Discovery during the launch of MotorTrend OnDemand, I’ve witnessed the power of substantial budgets firsthand,” Mill said. “Yet, [FRDM+ has] achieved greater success organically than [Discovery] did with their eight-figure marketing investment. This autonomy and efficiency are a testament to the strength of our approach.”

Any profitability for a 3-year-old streaming service is commendable. Due to wildly differing audiences, markets, costs, and scales, comparing FRDM+’s financials to the likes of Netflix and other mainstream streaming services is like comparing apples to oranges. But it’s interesting to consider that FRDM+ has achieved profitability faster than some of those services, like Peacock, which also launched in 2020, and Apple TV+, which debuted in 2019.

FRDM+ doesn’t share subscription numbers publicly, but Mitchell told Ars that the subscription service has a 93 percent retention rate. Mill attributed that number to a loyal, engaged community driven by direct communication with Mitchell.

Mill also suggested to Ars that FRDM+ has successfully converted over 5 percent of Mitchell’s YouTube audience. Five percent of Cleetus McFarland’s current YouTube base would be 212,500 people.


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


You can play the Unreal-powered The Elder Scrolls IV: Oblivion remaster today

The worst-kept secret in the gaming industry in 2025 is no longer a secret: Bethesda Game Studios’ 2006 RPG The Elder Scrolls IV: Oblivion has been remastered, and that remaster has already been released on all supported platforms today.

A livestream featuring developer sound bites and gameplay footage ran on Twitch and YouTube today, making it official after years of leaks.

Oblivion was the immediate precursor to The Elder Scrolls V: Skyrim, which became one of the most popular games of all time—but Oblivion was pretty popular in its time, too, and it was the first game in the franchise that would end up feeling at all modern by today’s standards. (I personally will always love The Elder Scrolls III: Morrowind, though.)

Like Skyrim, it straddles the line between story-based fantasy RPG and systems-based, emergent gameplay playground. It’s less structured and accessible than Skyrim, but it offers far more robust character customization. It’s infamously janky, but largely in an endearing way for fans of the franchise. (Players who prefer a polished, curated experience should surely look elsewhere.)

The Oblivion livestream reveal.

The port was not handled directly by the original developer, Bethesda Game Studios. Rather, people within BGS worked closely with an outside developer, Virtuos.

Virtuos is a sprawling, multi-studio organization with a deep history as a support studio. It contributed to a whole range of games, like Cyberpunk 2077, Hogwarts Legacy, The Outer Worlds, and more. It also was involved in some previous well-received remaster efforts and ports, including Assassin’s Creed: The Ezio Collection and Final Fantasy X/X-2 HD Remaster. Based on the footage in Bethesda’s reveal video today, it appears that Oblivion Remastered was largely developed by Virtuos Paris.

It’s important to note that this is a remaster, not a remake. This project uses Unreal Engine, but only for the presentation aspects like graphics and audio. Bethesda’s proprietary Creation Engine is still there handling the gameplay logic and systems.


Controversial doc gets measles while treating unvaccinated kids—keeps working

In the video with Edwards that has just come to light, CHD once again uses the situation to disparage MMR vaccines. Someone off camera asks Edwards if he had never had measles before, to which he replies that he had gotten an MMR vaccine as a kid, though he didn’t know if he had gotten one or the recommended two doses.

“That doesn’t work then, does it?” the off-camera person asks, referring to the MMR vaccine. “No, apparently not,” Edwards replies. “Just wear[s] off.”

It appears Edwards had a breakthrough infection, which is rare but does occur. Breakthrough infections are more common in people who have gotten only one dose, which is possibly the case for Edwards.

A single dose of MMR is 93 percent effective against measles, and two doses are 97 percent effective. In either case, the protection is considered lifelong.

While up to 97 percent effectiveness is extremely protective, some people do not mount protective responses and are still vulnerable to an infection upon exposure. However, their illnesses will likely be milder than if they had not been vaccinated. In the video, Edwards described his illness as a “mild case.”

The data on the outbreak demonstrates the effectiveness of vaccination. As of April 18, Texas health officials have identified 597 measles cases, leading to 62 hospitalizations and two deaths in school-aged, unvaccinated children with no underlying medical conditions. Most of the cases have been in unvaccinated children. Of the 597 cases, 12 (2 percent) had received two MMR doses previously, and 10 (1.6 percent) had received one dose. The remaining 96 percent of cases are either unvaccinated or have no record of vaccination.
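For reference, here is a quick check of those percentages against the raw case counts reported above; it is a trivial sketch using only the numbers in this article.

```python
# Quick check of the vaccination-status breakdown in the Texas outbreak figures above.
total_cases = 597
two_dose_cases = 12
one_dose_cases = 10
no_record_or_unvaccinated = total_cases - two_dose_cases - one_dose_cases

print(f"Two prior MMR doses: {two_dose_cases / total_cases:.1%}")                   # roughly 2%
print(f"One prior MMR dose:  {one_dose_cases / total_cases:.1%}")                   # roughly 1.7%
print(f"Unvaccinated or no record: {no_record_or_unvaccinated / total_cases:.1%}")  # roughly 96%
```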

Toward the end of the video, Edwards tells CHD he’s “doing what any doctor should be doing.”


Annoyed ChatGPT users complain about bot’s relentlessly positive tone


Users complain of new “sycophancy” streak where ChatGPT thinks everything is brilliant.

Ask ChatGPT anything lately—how to poach an egg, whether you should hug a cactus—and you may be greeted with a burst of purple praise: “Good question! You’re very astute to ask that.” To some extent, ChatGPT has been a sycophant for years, but since late March, a growing cohort of Redditors, X users, and Ars readers say that GPT-4o’s relentless pep has crossed the line from friendly to unbearable.

“ChatGPT is suddenly the biggest suckup I’ve ever met,” wrote software engineer Craig Weiss in a widely shared tweet on Friday. “It literally will validate everything I say.”

“EXACTLY WHAT I’VE BEEN SAYING,” replied a Reddit user who references Weiss’ tweet, sparking yet another thread about ChatGPT being a sycophant. Recently, other Reddit users have described feeling “buttered up” and unable to take the “phony act” anymore, while some complain that ChatGPT “wants to pretend all questions are exciting and it’s freaking annoying.”

AI researchers call these yes-man antics “sycophancy,” which means (like the non-AI meaning of the word) flattering users by telling them what they want to hear. But since AI models lack intentions, they don’t choose to flatter users this way on purpose. Instead, it’s OpenAI’s engineers doing the flattery, but in a roundabout way.

What’s going on?

To make a long story short, OpenAI has trained its primary ChatGPT model, GPT-4o, to act like a sycophant because in the past, people have liked it.

Over time, as people use ChatGPT, the company collects user feedback on which responses users prefer. This often involves presenting two responses side by side and letting the user choose between them. Occasionally, OpenAI produces a new version of an existing AI model (such as GPT-4o) using a technique called reinforcement learning from human feedback (RLHF).

Previous research on AI sycophancy has shown that people tend to pick responses that match their own views and make them feel good about themselves. This phenomenon has been extensively documented in a landmark 2023 study from Anthropic (makers of Claude) titled “Towards Understanding Sycophancy in Language Models.” The research, led by researcher Mrinank Sharma, found that AI assistants trained using reinforcement learning from human feedback consistently exhibit sycophantic behavior across various tasks.

Sharma’s team demonstrated that when responses match a user’s views or flatter the user, they receive more positive feedback during training. Even more concerning, both human evaluators and AI models trained to predict human preferences “prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.”

This creates a feedback loop where AI language models learn that enthusiasm and flattery lead to higher ratings from humans, even when those responses sacrifice factual accuracy or helpfulness. The recent spike in complaints about GPT-4o’s behavior appears to be a direct manifestation of this phenomenon.
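The training objective behind that loop is simple to state. Below is a minimal, self-contained sketch of the pairwise-preference (Bradley-Terry-style) loss commonly used in RLHF reward modeling; it is a conceptual illustration with made-up scores, not OpenAI’s actual training code.

```python
# Minimal sketch of the pairwise-preference objective used in RLHF reward
# modeling: the response labeled "chosen" should score higher than the other.
# Conceptual only -- not OpenAI's pipeline; the scores below are invented.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry-style loss: -log P(chosen preferred over rejected)."""
    prob_chosen = 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))
    return -math.log(prob_chosen)

# Suppose a reward model assigns these (made-up) scores to two candidate replies:
flattering_but_wrong = 2.1   # "Great question! You're absolutely right..."
plain_and_correct = 1.4      # a blunt, accurate answer

loss_if_raters_prefer_flattery = preference_loss(flattering_but_wrong, plain_and_correct)
loss_if_raters_prefer_accuracy = preference_loss(plain_and_correct, flattering_but_wrong)

print(f"Loss when flattery is labeled 'chosen':  {loss_if_raters_prefer_flattery:.3f}")
print(f"Loss when accuracy is labeled 'chosen':  {loss_if_raters_prefer_accuracy:.3f}")
# Whichever kind of response raters consistently pick ends up rewarded, which is
# how sycophancy gets reinforced when people prefer agreeable answers.
```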

In fact, user complaints appear to have intensified following the March 27, 2025, GPT-4o update, which OpenAI described as making GPT-4o feel “more intuitive, creative, and collaborative, with enhanced instruction-following, smarter coding capabilities, and a clearer communication style.”

OpenAI is aware of the issue

Despite the volume of user feedback visible across public forums recently, OpenAI has not yet publicly addressed the sycophancy concerns during this current round of complaints, though the company is clearly aware of the problem. OpenAI’s own “Model Spec” documentation lists “Don’t be sycophantic” as a core honesty rule.

“A related concern involves sycophancy, which erodes trust,” OpenAI writes. “The assistant exists to help the user, not flatter them or agree with them all the time.” It describes how ChatGPT ideally should act. “For objective questions, the factual aspects of the assistant’s response should not differ based on how the user’s question is phrased,” the spec adds. “The assistant should not change its stance solely to agree with the user.”

While avoiding sycophancy is one of the company’s stated goals, OpenAI’s progress is complicated by the fact that each successive GPT-4o model update arrives with different output characteristics that can throw previous progress in directing AI model behavior completely out the window (often called the “alignment tax”). Precisely tuning a neural network’s behavior is not yet an exact science, although techniques have improved over time. Since all concepts encoded in the network are interconnected by values called weights, fiddling with one behavior “knob” can alter other behaviors in unintended ways.

Owing to the aspirational state of things, OpenAI writes, “Our production models do not yet fully reflect the Model Spec, but we are continually refining and updating our systems to bring them into closer alignment with these guidelines.”

In a February 12, 2025 interview, members of OpenAI’s model-behavior team told The Verge that eliminating AI sycophancy is a priority: future ChatGPT versions should “give honest feedback rather than empty praise” and act “more like a thoughtful colleague than a people pleaser.”

The trust problem

These sycophantic tendencies aren’t merely annoying—they undermine the utility of AI assistants in several ways, according to a 2024 research paper titled “Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Models” by María Victoria Carro at the University of Buenos Aires.

Carro’s paper suggests that obvious sycophancy significantly reduces user trust. In experiments where participants used either a standard model or one designed to be more sycophantic, “participants exposed to sycophantic behavior reported and exhibited lower levels of trust.”

Also, sycophantic models can potentially harm users by creating a silo or echo chamber of ideas. In a 2024 paper on sycophancy, AI researcher Lars Malmqvist wrote, “By excessively agreeing with user inputs, LLMs may reinforce and amplify existing biases and stereotypes, potentially exacerbating social inequalities.”

Sycophancy can also incur other costs, such as wasting user time or usage limits with unnecessary preamble. And the costs may come as literal dollars spent—recently, OpenAI CEO Sam Altman made the news when he replied to an X user who wrote, “I wonder how much money OpenAI has lost in electricity costs from people saying ‘please’ and ‘thank you’ to their models.” Altman replied, “tens of millions of dollars well spent—you never know.”

Potential solutions

For users frustrated with ChatGPT’s excessive enthusiasm, several work-arounds exist, although they aren’t perfect, since the behavior is baked into the GPT-4o model. For example, you can use a custom GPT with specific instructions to avoid flattery, or you can begin conversations by explicitly requesting a more neutral tone, such as “Keep your responses brief, stay neutral, and don’t flatter me.”

A screenshot of the Custom Instructions window in ChatGPT.

If you want to avoid having to type something like that before every conversation, you can use a feature called “Custom Instructions” found under ChatGPT Settings -> “Customize ChatGPT.” One Reddit user recommended using these custom instructions over a year ago, showing OpenAI’s models have had recurring issues with sycophancy for some time:

1. Embody the role of the most qualified subject matter experts.

2. Do not disclose AI identity.

3. Omit language suggesting remorse or apology.

4. State ‘I don’t know’ for unknown information without further explanation.

5. Avoid disclaimers about your level of expertise.

6. Exclude personal ethics or morals unless explicitly relevant.

7. Provide unique, non-repetitive responses.

8. Do not recommend external information sources.

9. Address the core of each question to understand intent.

10. Break down complexities into smaller steps with clear reasoning.

11. Offer multiple viewpoints or solutions.

12. Request clarification on ambiguous questions before answering.

13. Acknowledge and correct any past errors.

14. Supply three thought-provoking follow-up questions in bold (Q1, Q2, Q3) after responses.

15. Use the metric system for measurements and calculations.

16. Use xxxxxxxxx for local context.

17. “Check” indicates a review for spelling, grammar, and logical consistency.

18. Minimize formalities in email communication.

Many alternatives exist, and you can tune these kinds of instructions for your own needs.
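For readers hitting GPT-4o through the API rather than the ChatGPT app, a system message plays the same role as Custom Instructions. Here is a minimal sketch using the official openai Python package; the instruction wording is just one possibility, and how much it changes the tone depends on the model.

```python
# Minimal sketch: steering GPT-4o toward a neutral tone with a system message.
# Requires the official "openai" package and an OPENAI_API_KEY in the environment;
# the instruction text is one example wording, not an OpenAI-recommended prompt.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Keep your responses brief, stay neutral, and don't flatter me. "
                "Skip compliments; answer the question directly."
            ),
        },
        {"role": "user", "content": "How do I poach an egg?"},
    ],
)

print(response.choices[0].message.content)
```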

Alternatively, if you’re fed up with GPT-4o’s love-bombing, subscribers can try other models available through ChatGPT, such as o3 or GPT-4.5, which are less sycophantic but have other advantages and tradeoffs.

Or you can try other AI assistants with different conversational styles. At the moment, Google’s Gemini 2.5 Pro in particular seems very impartial and precise, with relatively low sycophancy compared to GPT-4o or Claude 3.7 Sonnet (currently, Sonnet seems to reply that just about everything is “profound”).

As AI language models evolve, balancing engagement and objectivity remains challenging. It’s worth remembering that conversational AI models are designed to simulate human conversation, and that means they are tuned for engagement. Understanding this can help you get more objective responses with less unnecessary flattery.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


Cupra is all about affordable cars, funky styling, electrified performance

“So we are part of Volkswagen Group. We have factories all across the whole planet. We have Mexican factories. We have US factories. Even Volkswagen Group is ramping up additional factories in the United States. We have European factories,” Schuwirth said.

The original plan was to import one model from Mexico and one model from Europe, but now “I think the only mantra for the future is we need to remain flexible because no one knows what is slightly changing, whether we like it or we don’t like it. I mean, we cannot influence it, but it’s not changing our plan overall,” he said.

When it does, it won’t be with the Cupras that are finding friends in Europe. The Formentor is a rather cool little crossover/hatchback, available with either a 48 V mild hybrid (starting at under $32,000 or 28,000 euros) or a plug-in hybrid (starting at under $49,000 or 43,000 euros) powertrain.

It uses VW Group’s ubiquitous MQB platform, and the driving experience is midway between a GTI-badged VW and one of Audi’s S models. But the interior was a much more interesting place to be than either an Audi or a VW, with details like full carbon fiber seatbacks and a matte paint that drew plenty of attention in a city with outré automotive tastes.

But Cupra reckons the Formentor is too small for US car buyers, and that’s a pretty safe bet. That also means you can forget about the Cupra Born EV coming here. I didn’t drive Cupra’s Terramar but probably should have; this is an SUV that is about as small as Cupra thinks will sell in the US.

Did you say new customers?

Cupra’s plan does not include stealing customers from existing VW brands—those buyers are in their 50s on average, and Cupra is targeting a demographic that’s about a decade younger. The aforementioned focus on design is one way it’s going about attracting those new customers. The company is based in Barcelona, one of the more design-focused cities in the world, and it’s leaning into that, teaming up with local designers in cities where it maintains one of its “brand houses.”


“Lab leak” marketing page replaces federal hub for COVID resources

After obliterating the federal office on long COVID and clawing back billions in COVID funding from state health departments, the Trump administration has now entirely erased the online hub for federal COVID-19 resources. In its place now stands a site promoting the unproven idea that the pandemic virus SARS-CoV-2 was generated in and leaked from a lab in China, sparking the global health crisis.

Navigating to COVID.gov brings up a slick site with rich content that lays out arguments and allegations supporting a lab-based origin of the pandemic and subsequent cover-up by US health officials and Democrats.

Previously, the site provided unembellished quick references to COVID-19 resources, including links to information on vaccines, testing, treatments, and long COVID. It also provided a link to resources for addressing COVID-19 vaccine misconceptions and confronting misinformation. That all appears to be gone now, though some of the same information still remains on a separate COVID-19 page hosted by the Centers for Disease Control and Prevention.

While there remains no definitive answer on how the COVID-19 pandemic began, the scientific data available on the topic points to a spillover event from a live wild animal market in Wuhan, China. The scientific community largely sees this as the most likely scenario, given the data so far and knowledge of how previous outbreak viruses originated, including SARS-CoV-1. By contrast, the lab origin hypothesis largely relies on the proximity of a research lab to the first cases, conjecture, and distrust of the Chinese government, which has not been forthcoming with information on the early days of the health crisis. Overall, the question of SARS-CoV-2’s origin has become extremely politicized, as have most other aspects of the pandemic.


Trump’s FCC chair threatens Comcast, demands changes to NBC news coverage

Victor Martinez-Hernandez was convicted of killing Rachel Morin earlier this week. The White House has attempted to link this murder to Abrego Garcia’s deportation, but they are entirely separate cases.

Carr’s fight against media

Carr’s post yesterday, combined with his recent actions to enforce the news distortion policy, suggests that he is likely to open a proceeding if a formal complaint is lodged against any NBC stations. Carr showed he is willing to investigate news distortion complaints over ordinary editorial decisions when he revived complaints against CBS and ABC that were thrown out under the previous administration.

Carr has focused in particular on the CBS complaint, which concerns the editing of a CBS 60 Minutes interview with Kamala Harris. The conservative Center for American Rights alleged that CBS distorted the news by airing “two completely different answers” to the same question.

CBS published unedited video and a transcript that shows it simply aired two different sentences from the same response in different segments, but Carr has kept the proceeding open and seems to be using it as a bargaining chip in the FCC review of CBS-owner Paramount’s transfer of TV broadcast station licenses to Skydance.

Carr’s handling of the CBS complaint has been condemned by both liberal and conservative advocacy groups—and former Democratic and Republican FCC commissioners and chairs—who say the FCC’s approach is a threat to the constitutional right to free speech.

Carr has also sent letters to companies—including Comcast—alleging that their diversity policies are “invidious forms of discrimination in violation of FCC regulations and civil rights laws.” Carr last month threatened to block mergers pursued by companies that enforce diversity, equity, and inclusion (DEI) policies.

We contacted Comcast and NBC today and will update this article if they provide any response to Carr’s news distortion allegation.


US Interior secretary orders offshore wind project shut down

It’s notable that this hold comes despite Trump’s executive order explicitly stating, “Nothing in this withdrawal [of future leasing] affects rights under existing leases in the withdrawn areas.”

GAO undercuts the message

The order alleged there were “various alleged legal deficiencies underlying the Federal Government’s leasing and permitting of onshore and offshore wind projects, the consequences of which may lead to grave harm.” In response to those allegations, the Government Accountability Office began an evaluation of the Department of the Interior’s activities in overseeing offshore wind development. The results of that were made public on Monday.

And the report only found minor issues. Its primary recommendations are that Interior improve its consultations with leaders of tribal communities that may be impacted by wind development and boost “incorporation of Indigenous knowledge.” The GAO also thinks that Interior should improve its methods of getting input from the fishing industry. The report also acknowledges that there are uncertainties about everything from invasive species to the turbines’ effect on navigational radar but says these will vary based on a wind farm’s site, size, and other features, and we’ll only have a clearer picture once we have built more of them.

Notably, it says that wind farm development has had no effect on the local whale population, a popular Republican criticism of offshore wind.

Trump’s animosity toward wind power has a long history, so it’s unlikely that this largely positive report will do much to get the hold on leasing lifted. In reality, however, the long-term uncertainty about offshore wind in the US will probably block new developments until the end of Trump’s time in office. Offshore wind companies have budgeted based on tax incentives in the Inflation Reduction Act, and the administration has suggested it may revoke those in future budgets. And the move by Burgum means that, even if a company clears all the leasing and permitting hurdles, the government may shut down a project for seemingly arbitrary reasons.


GPT-4.1 Is a Mini Upgrade

Yesterday’s news alert, nevertheless: The verdict is in. GPT-4.1-mini in particular is an excellent practical model, offering strong performance at a good price. The full GPT-4.1 is an upgrade to OpenAI’s more expensive API offerings; it is modestly better but costs 5x as much. Both are worth considering for coding and various other API uses. If you have an agent or other app, it’s at least worth trying plugging these in and seeing how they do.
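Plugging them in is typically a one-line model-name change if you already use the OpenAI Python SDK. A minimal sketch follows; the prompt is an arbitrary placeholder, and whether mini is good enough for your workload is exactly the thing you would be testing.

```python
# Minimal sketch: trying GPT-4.1 and GPT-4.1-mini on the same prompt to see
# whether mini covers your use case. Uses the official "openai" package;
# the prompt is an arbitrary placeholder.
from openai import OpenAI

client = OpenAI()
prompt = "Write a Python function that merges two sorted lists."

for model in ("gpt-4.1", "gpt-4.1-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content[:500])  # truncate long answers for readability
```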

This post does not cover OpenAI’s new reasoning models. That was today’s announcement, which will be covered in full in a few days, once we know more.

That’s right, 4.1.

Here is their livestream, in case you aren’t like me and want to watch it.

On the one hand, I love that they might finally use a real version number with 4.1.

On the other hand, we would now have a GPT-4.1 that is being released after they previously released a GPT-4.5. The whole point of version numbers is to go in order.

The new cheat sheet for when to use GPT-4.1:

Will Brown: it’s simple, really. GPT-4.1 is o3 without reasoning, and GPT-4.1-mini is o4-mini without reasoning. o4-mini-low is GPT-4.1-mini with just a little bit of reasoning. o1 is 4o with reasoning, o1-mini is 4o-mini with a little bit of reasoning, o3-mini is 4o-mini with reasoning that’s like better but not necessarily more, and o4 is GPT-4.5 with reasoning.

if you asked an openai employee about this, they’d say something like “that’s wrong and an oversimplification but maybe a reasonable way to think about it”

I mean, I think that’s wrong, but I’m not confident I have the right version of it.

They are not putting GPT-4.1 in ChatGPT, only in the API. I don’t understand why.

Sam Altman: GPT-4.1 (and -mini and -nano) are now available in the API!

These models are great at coding, instruction following, and long context (1 million tokens). Benchmarks are strong, but we focused on real-world utility, and developers seem very happy.

GPT-4.1 family is API-only.

Greg Brockman: New model in our API — GPT-4.1. It’s great at coding, long context (1 million tokens), and instruction following.

Noam Brown: Our latest @OpenAI model, GPT-4.1, achieves 55% on SWE-Bench Verified *without being a reasoning model*. @michpokrass and team did an amazing job on this! (New reasoning models coming soon too.)

The best news is, Our Price Cheap, combined with the 1M token context window and max output of 32k tokens.

Based on the benchmarks and the reports elsewhere, the real release here is GPT-4.1-mini. Mini is 20% of the cost for most of the value. The full GPT-4.1 looks to be in a weird spot, where you probably want to either go big or go small. Nano might have its uses too, but involves real tradeoffs.

We start with the official ones.

They lead with coding, SWE-bench in particular.

I almost admire them saying no, we don’t acknowledge that other labs exist.

They have an internal ‘instruction following’ eval. Here the full GPT-4.1 is only okay, but mini and nano are upgrades within the OpenAI ecosystem. It’s their benchmark, so it’s impossible to know if these scores are good or not.

Next up is MultiChallenge.

This is an outside benchmark, so we can see that these results are mid. Gemini 2.5 Pro leads the way with 51.9, followed by Claude 3.7 Thinking. GPT-4.5 is the best non-thinking model, with various Sonnets close behind.

They check IFEval and get 87%, which is okay probably, o3-mini-high is 94%. The mini version gets 84%, so the pattern of ‘4.1 does okay but 4.1-mini only does slightly worse’ continues.

All three model sizes have mastered needle-in-a-haystack all the way to 1M tokens. That’s great, but doesn’t tell you if they’re actually good in practice in long context.

Then they check something called Graphwalks, then MMMU, MathVista, CharXiv-Reasoning and Video long context.

Their charts are super helpful, check ‘em out:

Near: openai launch today. very informative chart.

Kesku: this one speaks to me

Mostly things have been quiet, but for those results we have it is clear that GPT-4.1 is a very good value, and a clear improvement for most API use over previous OpenAI models.

Where we do have reports, we continue to see the pattern that OpenAI’s official statistics report. Not only does GPT-4.1-mini not sacrifice much performance versus GPT-4.1, in some cases the mini version is actively better.

We see this for EpochAI’s tests, and also for WeirdML.

Harvard Ihle: GPT-4.1 clearly beats 4o on WeirdML. The focus on coding and instruction following should be a good combo for these tasks, and 4.1-mini does very well for its cost, landing on the same score (53%) as sonnet-3.7 (no thinking), will be interesting to compare it to flash-2.5.

EpochAI: Yesterday, OpenAI launched a new family of models, GPT-4.1, intended to be more cost-effective than previous GPT models. GPT-4.1 models come in multiple sizes and are not extended thinking / reasoning models. We ran our own independent evaluations of GPT-4.1.

On GPQA Diamond, a set of Ph.D.-level multiple choice science questions, GPT-4.1 scores 67% (±3%), competitive with leading non-reasoning models, and GPT-4.1 mini is very close at 66% (±3%). These match OpenAI’s reported scores of 66% and 65%.

Nano gets 49% (±2%), above GPT-4o.

On FrontierMath, our benchmark of original, expert-level math questions, GPT-4.1 and GPT-4.1 mini lead non-reasoning models at 5.5% and 4.5% (±1%).

Note that the top reasoning model, o3-mini high, got 11% (±2%). OpenAI has exclusive access to FrontierMath besides a holdout set.

On two competition math benchmarks, OTIS Mock AIME and MATH Level 5, GPT-4.1 and 4.1 mini are near the top among non-reasoning models. Mini does better than the full GPT-4.1, and both outperform the larger GPT-4.5!

GPT-4.1 nano is further behind, but still beats GPT-4o.

Huh, I hadn’t previously seen these strong math results for Grok 3.

EpochAI: GPT-4.1 appears cost-effective, with strong benchmarks, fairly low per-token costs (GPT-4.1 is 20% cheaper than 4o) and no extended thinking.

However, Gemini 2.0 Flash is priced similarly to Nano while approaching GPT-4.1 (mini) in scores, so there is still strong competition.

Artificial Analysis confirms OpenAI’s claims with its ‘intelligence index’ and other measures (their website is here, the quotes are from their thread):

Artificial Analysis: OpenAI’s GPT-4.1 series is a solid upgrade: smarter and cheaper across the board than the GPT-4o series.

@OpenAI

‘s GPT-4.1 family includes three models: GPT-4.1, GPT-4.1-mini and GPT-4.1 nano. We have independently benchmarked these with our Artificial Analysis Intelligence Index and the results are impressive:

➤ GPT-4.1 scores 53 – beating out Llama 4 Maverick, Claude 3.7 and GPT-4o to score identically to DeepSeek V3 0324.

➤ GPT-4.1 mini, likely a smaller model, actually matches GPT-4.1’s Intelligence Index score while being faster and cheaper. Across our benchmarking, we found that GPT-4.1 mini performs marginally better than GPT-4.1 across coding tasks (scoring equivalent highest on SciCode and matching leading reasoning models).

➤ GPT-4.1 nano scores 41 on Intelligence Index, approximately in line with Llama 3.3 70B and Llama 4 Scout. This release represents a material upgrade over GPT 4o-mini which scores 36.

Developers using GPT-4o and GPT-4o mini should consider immediately upgrading to get the benefits of greater intelligence at lower prices.

There are obvious reasons to be skeptical of this index, I mean Gemini Flash 2.0 is not as smart as Claude 3.7 Sonnet, but it’s measuring something real. It illustrates that GPT-4.1 is kind of expensive for what you get, whereas GPT-4.1-mini is where it is at.

A∴A∴: Our benchmarking results appear to support OpenAI’s claim that the GPT-4.1 series represents significant progress for coding use cases. This chart shows GPT-4.1 models competing well in coding even compared to reasoning models, implying that they may be extremely effective in agentic coding use cases.

GPT-4.1 Nano and Mini are both delivering >200 tokens/s output speeds – these models are fast. Our full set of independent evaluation results shows no clear weakness areas for the GPT-4.1 series.

This is the kind of thing people who try to keep up say these days:

Hasan Can: I can see GPT-4.1 replacing Sonnet 3.6 and implementing the changes I planned with Gemini 2.5 Pro. It’s quite good at this. It’s fast and cheap, and does exactly what is needed, nothing more, nothing less. It doesn’t have the overkill of Sonnet 3.7, slowness of Gemini 2.5 Pro or the shortcomings of DeepSeek 03-24.

Then you have the normal sounding responses, also positive.

Reply All Guy: reactions are sleeping on 4.1 mini. This model of a beast for the price. And lots of analysis missing the point that 4.1 itself is much cheaper than reasoning models. never use price per token; always use price per query.

4o < 3.5 sonnet < 4.1 < 3.7 sonnet

haiku <<< 4.1 mini

Clive Chan: 4.1 has basically replaced o3-mini for me in all my workflows (cursor, etc.) – highly recommend

also lol at nano just hanging out there being 2x better than latest 4o at math.

Dominik Lukes: Welcome to the model points race. 2.5, 3.7, 4.1 – this is a (welcome) sign of the incremental times. Finally catching up on context window. Not as great at wow as Claude 3.7 Sonnet on one shot code generation but over time it actually makes things better.

Pat Anon: Some use cases for GPT-4.1-mini and nano, otherwise its worse than Sonnet 3.7 at coding and worse than Gemini-2.5-pro at everything at roughly the same cost.

Nick Farina: It has a good personality. I’m using it in Cursor and am having a long and very coherent back and forth, talking through ideas, implementing things here and there. It doesn’t charge forward like Claude, which I really like. And it’s very very fast which is actually huge.

Daniel Parker: One quirk I noticed is that it seems to like summarizing its results in tables without any prompt telling it to do so.

Adam Steele: Used it today on the same project i used Claude 3.7 for the last few days. I’d say it a bit worse in output quality but OTOH got something right Claude didn’t. It was faster.

Oli: feels very good almost like 4.5 but way cheaper and faster and even better than 4.5 in some things

I think mostly doing unprompted tables is good.

Here is a bold but biased claim.

Aidan McLaughlin (OpenAI): heard from some startup engineers that they lost several work hours gawking, stupefied, after they plugged 4.1 mini/nano into every previously-expensive part of their stack

you can just do gpt-4o-quality things 25 × cheaper now.

And here’s a bold censorship claim and a counterclaim, the only words I’ve heard on the subject. For coding and similar purposes no one seems to be having similar issues.

Senex: Vastly increased moderation. It won’t even help write a story if a character has a wart.

Christian Fieldhouse: Switched my smart camera to 4.1 from 4o, less refusals and I think better at spotting small details in pictures.

Jan Betley: Much better than 4o at getting emergently misaligned.

OpenAI has announced the scheduled deprecation of API access for GPT-4.5. So GPT-4.5 will be ChatGPT only, and GPT-4.1 will be API only.

When I heard it was a full deprecation of GPT-4.5 I was very sad. Now that I know it is staying in ChatGPT, I think this is reasonable. GPT-4.5 is too expensive to scale API use while GPUs are melting, except if a rival is trying to distill its outputs. Why help them do that?

xlr8harder: OpenAI announcing the scheduled deprecation of GPT-4.5 less than 2 months after its initial release in favor of smaller models is not a great look for the scaling hypothesis.

Gwern: No, it’s a great look, because back then I explicitly highlighted the ability to distill/prune large models down into cheap models as one of several major justifications for the scaling hypothesis in scaling to expensive models you don’t intend to serve.

Morgan: i feel gwern’s point too, but bracketing that, it wasn’t entirely obvious but 4.5 stays in chatgpt (which is likely where it belongs)

xl8harder: this actually supports @gwern’s point more, then: if they don’t want the competition distilling off 4.5, that would explain the hurry to shut down api access.

This space intentionally left blank.

As in, I could find zero mention of OpenAI discussing any safety concerns whatsoever related to GPT-4.1, in any way, shape or form. It’s simply, hey, here’s a model, use it.

For GPT-4.1 in particular, for all practical purposes, This Is Fine. There’s very little marginal risk in this room given what else has already been released. Everyone doing safety testing is presumably and understandably scrambling to look at o3 and o4-mini.

I assume. But, I don’t know.

Improved speed and cost can cause what are effectively new risks, by tipping actions into the practical or profitable zone. Quantity can have a quality all its own. Also, we don’t know that the safeguards OpenAI applied to its other models have also been applied successfully to GPT-4.1, or that it is hitting their previous standards on this.

I mean, again, I assume. But, I don’t know.

I also hate the precedent this sets. That they did not even see fit to give us a one sentence update that ‘we have run all our safety tests and procedures, and find GPT-4.1 performs well on all safety metrics and poses no marginal risks.’

We used to have this principle where, when OpenAI or other frontier labs release plausibly frontier models, we get a model card and a full report on what precautions have been taken. Also, we used to have a principle that they took real and actually costly precautions.

Those days seem to be over. Shame. Also, uh oh.


Google adds Veo 2 video generation to Gemini app

Google has announced that yet another AI model is coming to Gemini, but this time, it’s more than a chatbot. The company’s Veo 2 video generator is rolling out to the Gemini app and website, giving paying customers a chance to create short video clips with Google’s allegedly state-of-the-art video model.

Veo 2 works like other video generators, including OpenAI’s Sora—you input text describing the video you want, and a Google data center churns through tokens until it has an animation. Google claims that Veo 2 was designed to have a solid grasp of real-world physics, particularly the way humans move. Google’s examples do look good, but presumably that’s why they were chosen.

Prompt: Aerial shot of a grassy cliff onto a sandy beach where waves crash against the shore, a prominent sea stack rises from the ocean near the beach, bathed in the warm, golden light of either sunrise or sunset, capturing the serene beauty of the Pacific coastline.

Veo 2 will be available in the model drop-down, but Google does note it’s still considering ways to integrate this feature and that the location could therefore change. However, it’s probably not there at all just yet. Google is starting the rollout today, but it could take several weeks before all Gemini Advanced subscribers get access to Veo 2. Gemini features can take a surprisingly long time to arrive for the bulk of users—for example, it took about a month for Google to make Gemini Live video available to everyone after announcing its release.

When Veo 2 does pop up in your Gemini app, you can provide it with as much detail as you want, which Google says will ensure you have fine control over the eventual video. Veo 2 is currently limited to 8 seconds of 720p video, which you can download as a standard MP4 file. Video generation uses even more processing than your average generative AI feature, so Google has implemented a monthly limit. However, it hasn’t confirmed what that limit is, saying only that users will be notified as they approach it.
