AI

Google Chrome may soon use “AI” to replace compromised passwords

Google’s Chrome browser might soon get a useful security upgrade: detecting passwords used in data breaches and then generating and storing a better replacement. Google’s preliminary copy suggests it’s an “AI innovation,” though exactly how is unclear.

Noted software digger Leopeva64 on X found a new option in the AI settings of a very early build of Chrome. The feature, “Automated password Change” (the stray capitalization suggesting it hasn’t yet received a copyedit), is described as follows: “When Chrome finds one of your passwords in a data breach, it can offer to change your password for you when you sign in.”

Chrome already has a feature that warns users if a password they enter has been identified in a breach and prompts them to change it. As noted by Windows Report, the change is that Google will now offer to change the password for you on the spot rather than simply prompting you to handle it elsewhere. The new password is automatically saved in Google’s Password Manager and “is encrypted and never seen by anyone,” the settings page claims.

If you want to see how this works, you need to download a Canary version of Chrome. In the flags settings (navigate to “chrome://flags” in the address bar), you’ll need to enable two features: “Improved password change service” and “Mark all credentials as leaked,” the latter to force the change notification because, presumably, the feature isn’t hooked up to actual leaked-password databases yet. Go to almost any non-Google site, enter any username/password combination to try to log in, and after the login fails or you navigate elsewhere, a prompt will ask you to consider changing your password.

Sam Altman: OpenAI is not for sale, even for Elon Musk’s $97 billion offer

A brief history of Musk vs. Altman

The beef between Musk and Altman goes back to 2015, when the pair partnered (with others) to co-found OpenAI as a nonprofit. Musk cut ties with the company in 2018 but watched from the sidelines as OpenAI became a media darling in 2022 and 2023 following the launch of ChatGPT and then GPT-4.

In July 2023, Musk created his own OpenAI competitor, xAI (maker of Grok). Since then, Musk has become a frequent legal thorn in Altman and OpenAI’s side, at times suing both OpenAI and Altman personally, claiming that OpenAI has strayed from its original open source mission—especially after reports emerged about Altman’s plans to transition portions of OpenAI into a for-profit company, something Musk has fiercely criticized.

Musk initially sued the company and Altman in March 2024, claiming that OpenAI’s alliance with Microsoft had broken its agreement to make a major breakthrough in AI “freely available to the public.” Musk withdrew the suit in June 2024, then revived it in August 2024 under similar complaints.

Musk and Altman have been publicly trading barbs frequently on X and in the press over the past few years, most recently when Musk criticized Altman’s $500B “Stargate” AI infrastructure project announced last month.

This morning, when asked on Bloomberg Television if Musk’s move comes from personal insecurity about xAI, Altman replied, “Probably his whole life is from a position of insecurity.”

“I don’t think he’s a happy guy. I feel for him,” he added.

OpenAI’s secret weapon against Nvidia dependence takes shape

OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

The OpenAI chip’s full capabilities, technical details, and exact timeline are still unknown, but the company reportedly intends to iterate on the design and improve it over time, giving it leverage in negotiations with chip suppliers—and potentially granting the company future independence with a chip design it controls outright.

In the past, we’ve seen other tech companies, such as Microsoft, Amazon, Google, and Meta, create their own AI acceleration chips for reasons that range from cost reduction to relieving shortages of AI chips supplied by Nvidia, which enjoys a near monopoly on high-powered GPUs (such as the Blackwell series) for data center use.

In October 2023, we covered a report about OpenAI’s intention to create its own AI accelerator chips for similar reasons, so OpenAI’s custom chip project has been in the works for some time. In early 2024, OpenAI CEO Sam Altman also began spending considerable time traveling around the world trying to raise up to a reported $7 trillion to increase world chip fabrication capacity.

Developer creates endless Wikipedia feed to fight algorithm addiction

On a recent WikiTok browsing run, I ran across entries on topics like SX-Window (a GUI for the Sharp X68000 series of computers), Xantocillin (“the first reported natural product found to contain the isocyanide functional group”), Lorenzo Ghiberti (an Italian Renaissance sculptor from Florence), the William Wheeler House in Texas, and the city of Krautheim, Germany—none of which I knew existed before the session started.

How WikiTok took off

The idea for WikiTok originated with developer Tyler Angert on Monday evening when he tweeted, “insane project idea: all of wikipedia on a single, scrollable page.” Bloomberg Beta VC James Cham replied, “Even better, an infinitely scrolling Wikipedia page based on whatever you are interested in next?” and Angert coined “WikiTok” in a follow-up post.

Early the next morning, at 12:28 am, writer Grant Slatton quote-tweeted the WikiTok discussion, and that’s where Gemal came in. “I saw it from [Slatton’s] quote retweet,” he told Ars. “I immediately thought, ‘Wow I can build an MVP [minimum viable product] and this could take off.'”

Gemal started his project at 12:30 am, and with help from AI coding tools like Anthropic’s Claude and Cursor, he finished a prototype by 2 am and posted the results on X. Someone later announced WikiTok on Y Combinator’s Hacker News, where it topped the site’s list of daily news items.

A screenshot of the WikiTok web app running in a desktop web browser. Credit: Benj Edwards

“The entire thing is only several hundred lines of code, and Claude wrote the vast majority of it,” Gemal told Ars. “AI helped me ship really really fast and just capitalize on the initial viral tweet asking for Wikipedia with scrolling.”

Gemal posted the code for WikiTok on GitHub, so anyone can modify or contribute to the project. Right now, the web app supports 14 languages, article previews, and article sharing on both desktop and mobile browsers. New features may arrive as contributors add them. It’s based on a tech stack that includes React 18, TypeScript, Tailwind CSS, and Vite.
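For readers curious about the mechanics, the core loop is simple: pull a random article summary from Wikipedia’s public REST API and render it as a card in an endless feed. The sketch below is a hypothetical Python illustration of that shape (WikiTok itself is written in TypeScript; the field names here come from Wikipedia’s page-summary responses), and it parses a trimmed sample payload rather than hitting the network.

```python
import json

# Wikipedia's REST API serves a random article summary at:
#   https://en.wikipedia.org/api/rest_v1/page/random/summary
# This sketch shows the general shape of turning one response into a feed "card".

def to_feed_card(payload: dict) -> dict:
    """Reduce a page-summary response to the fields a scrolling feed needs."""
    return {
        "title": payload.get("title", ""),
        "extract": payload.get("extract", ""),
        "image": payload.get("thumbnail", {}).get("source"),  # may be absent
        "url": payload.get("content_urls", {}).get("desktop", {}).get("page"),
    }

# A trimmed example of what the endpoint returns
sample = json.loads("""{
  "title": "Krautheim",
  "extract": "Krautheim is a town in the Hohenlohe district of Germany.",
  "content_urls": {"desktop": {"page": "https://en.wikipedia.org/wiki/Krautheim"}}
}""")
print(to_feed_card(sample)["title"])  # Krautheim
```

A client would simply call the endpoint again each time the user scrolls near the bottom of the feed.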

And so far, he is sticking to his vision of a free way to enjoy Wikipedia without being tracked and targeted. “I have no grand plans for some sort of insane monetized hyper-calculating TikTok algorithm,” Gemal told us. “It is anti-algorithmic, if anything.”

DeepSeek iOS app sends data unencrypted to ByteDance-controlled servers


Apple’s defenses that protect data from being sent in the clear are globally disabled.

A little over two weeks ago, a largely unknown China-based company named DeepSeek stunned the AI world with the release of an open source AI chatbot that had simulated reasoning capabilities that were largely on par with those from market leader OpenAI. Within days, the DeepSeek AI assistant app climbed to the top of the iPhone App Store’s “Free Apps” category, overtaking ChatGPT.

On Thursday, mobile security company NowSecure reported that the app sends sensitive data over unencrypted channels, making the data readable to anyone who can monitor the traffic. More sophisticated attackers could also tamper with the data while it’s in transit. Apple strongly encourages iPhone and iPad developers to enforce encryption of data sent over the wire using ATS (App Transport Security). For unknown reasons, that protection is globally disabled in the app, NowSecure said.
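ATS is configured through keys in an app’s Info.plist, so the kind of global opt-out NowSecure describes is visible through static inspection. The Python sketch below is a simplified illustration (not NowSecure’s tooling) that checks a plist for the standard NSAppTransportSecurity / NSAllowsArbitraryLoads keys:

```python
import plistlib

def ats_globally_disabled(info_plist_bytes: bytes) -> bool:
    """Return True if the Info.plist opts out of App Transport Security entirely."""
    plist = plistlib.loads(info_plist_bytes)
    ats = plist.get("NSAppTransportSecurity", {})
    # NSAllowsArbitraryLoads=True turns off the HTTPS requirement for every connection
    return bool(ats.get("NSAllowsArbitraryLoads", False))

# Example: a plist that disables ATS app-wide, as NowSecure reported for DeepSeek
sample = plistlib.dumps({
    "CFBundleName": "ExampleApp",
    "NSAppTransportSecurity": {"NSAllowsArbitraryLoads": True},
})
print(ats_globally_disabled(sample))  # True
```

Apple also supports narrower per-domain exceptions; it’s the app-wide `NSAllowsArbitraryLoads` flag that removes the protection for every connection the app makes.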

Basic security protections MIA

What’s more, the data is sent to servers that are controlled by ByteDance, the Chinese company that owns TikTok. While some of that data is properly encrypted using transport layer security, once it’s decrypted on the ByteDance-controlled servers, it can be cross-referenced with user data collected elsewhere to identify specific users and potentially track queries and other usage.

More technically, the DeepSeek AI chatbot uses an open weights simulated reasoning model. Its performance is largely comparable with OpenAI’s o1 simulated reasoning (SR) model on several math and coding benchmarks. The feat, which largely took AI industry watchers by surprise, was all the more stunning because DeepSeek reported spending only a small fraction of what OpenAI spent.

The NowSecure audit also turned up other behaviors that researchers found potentially concerning. For instance, the app uses a symmetric encryption scheme known as 3DES, or triple DES. The scheme was deprecated by NIST following research in 2016 that showed it could be broken in practical attacks to decrypt web and VPN traffic. Another concern is that the symmetric keys, which are identical for every iOS user, are hardcoded into the app and stored on the device.
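Why a hardcoded, universal key matters is easy to demonstrate. The toy sketch below is purely illustrative: Python’s standard library has no 3DES, so a trivial XOR “cipher” stands in, but the failure mode is the same for any symmetric scheme: anyone who extracts the one key shipped inside the app binary can decrypt every user’s traffic.

```python
from itertools import cycle

# Hypothetical stand-in: the real app reportedly uses 3DES; a toy XOR
# "cipher" is enough to show why one shared hardcoded key defeats the purpose.
HARDCODED_KEY = b"same-key-in-every-copy-of-the-app"  # shipped inside the binary

def toy_encrypt(plaintext: bytes, key: bytes = HARDCODED_KEY) -> bytes:
    return bytes(b ^ k for b, k in zip(plaintext, cycle(key)))

toy_decrypt = toy_encrypt  # XOR is its own inverse

# A message "protected" on one user's device...
ciphertext = toy_encrypt(b"user query: hello")

# ...is trivially readable by anyone who pulled the same key out of the app.
attacker_view = toy_decrypt(ciphertext)
print(attacker_view)  # b'user query: hello'
```

Standard practice is to derive per-session keys (for example, via TLS or a key exchange) rather than embedding a single static key in software that every user, and every attacker, can download.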

The app is “not equipped or willing to provide basic security protections of your data and identity,” NowSecure co-founder Andrew Hoog told Ars. “There are fundamental security practices that are not being observed, either intentionally or unintentionally. In the end, it puts your and your company’s data and identity at risk.”

Hoog said the audit is not yet complete, so there are many questions and details left unanswered or unclear. He said the findings were concerning enough that NowSecure wanted to disclose what is currently known without delay.

In a report, he wrote:

NowSecure recommends that organizations remove the DeepSeek iOS mobile app from their environment (managed and BYOD deployments) due to privacy and security risks, such as:

  1. Privacy issues due to insecure data transmission
  2. Vulnerability issues due to hardcoded keys
  3. Data sharing with third parties such as ByteDance
  4. Data analysis and storage in China

Hoog added that the DeepSeek app for Android is even less secure than its iOS counterpart and should also be removed.

Representatives for both DeepSeek and Apple didn’t respond to an email seeking comment.

The data sent entirely in the clear is transmitted during the app’s initial registration and includes:

  • organization id
  • the version of the software development kit used to create the app
  • user OS version
  • language selected in the configuration

Apple strongly encourages developers to implement ATS to ensure the apps they submit don’t transmit any data insecurely over HTTP channels. For reasons that Apple hasn’t explained publicly, Hoog said, this protection isn’t mandatory. DeepSeek has yet to explain why ATS is globally disabled in the app or why it uses no encryption when sending this information over the wire.

This data, along with a mix of other encrypted information, is sent to DeepSeek over infrastructure provided by Volcengine, a cloud platform developed by ByteDance. While the IP address the app connects to geo-locates to the US and is owned by US-based telecom Level 3 Communications, the DeepSeek privacy policy makes clear that the company “store[s] the data we collect in secure servers located in the People’s Republic of China.” The policy further states that DeepSeek:

may access, preserve, and share the information described in “What Information We Collect” with law enforcement agencies, public authorities, copyright holders, or other third parties if we have good faith belief that it is necessary to:

• comply with applicable law, legal process or government requests, as consistent with internationally recognised standards.

NowSecure still doesn’t know precisely the purpose of the app’s use of 3DES encryption functions. Hardcoding the key into the app, however, is a security failure that has been recognized as bad practice for more than a decade when building encryption into software.

No good reason

NowSecure’s Thursday report adds to a growing list of safety and privacy concerns that have already been reported by others.

One was the terms spelled out in the above-mentioned privacy policy. Another came last week in a report from researchers at Cisco and the University of Pennsylvania. It found that DeepSeek R1, the simulated reasoning model, failed to block a single one of 50 malicious prompts designed to generate toxic content.

A third concern is research from security firm Wiz that uncovered a publicly accessible, fully controllable database belonging to DeepSeek. It contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” Wiz reported. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.

Thomas Reed, staff product manager for Mac endpoint detection and response at security firm Huntress, and an expert in iOS security, said he found NowSecure’s findings concerning.

“ATS being disabled is generally a bad idea,” he wrote in an online interview. “That essentially allows the app to communicate via insecure protocols, like HTTP. Apple does allow it, and I’m sure other apps probably do it, but they shouldn’t. There’s no good reason for this in this day and age.”

He added: “Even if they were to secure the communications, I’d still be extremely unwilling to send any remotely sensitive data that will end up on a server that the government of China could get access to.”

HD Moore, founder and CEO of runZero, said he was less concerned about ByteDance or other Chinese companies having access to data.

“The unencrypted HTTP endpoints are inexcusable,” he wrote. “You would expect the mobile app and their framework partners (ByteDance, Volcengine, etc) to hoover device data, just like anything else—but the HTTP endpoints expose data to anyone in the network path, not just the vendor and their partners.”

On Thursday, US lawmakers began pushing to immediately ban DeepSeek from all government devices, citing national security concerns that the Chinese Communist Party may have built a backdoor into the service to access Americans’ sensitive private data. If passed, DeepSeek could be banned within 60 days.

This story was updated to add further examples of security concerns regarding DeepSeek.

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

ChatGPT comes to 500,000 new users in OpenAI’s largest AI education deal yet

On Tuesday, OpenAI announced plans to introduce ChatGPT to California State University’s 460,000 students and 63,000 faculty members across 23 campuses, reports Reuters. The education-focused version of the AI assistant will aim to provide students with personalized tutoring and study guides, while faculty will be able to use it for administrative work.

“It is critical that the entire education ecosystem—institutions, systems, technologists, educators, and governments—work together to ensure that all students have access to AI and gain the skills to use it responsibly,” said Leah Belsky, VP and general manager of education at OpenAI, in a statement.

OpenAI began integrating ChatGPT into educational settings in 2023, despite initial concerns from some schools about plagiarism and potential cheating, which led to early bans in some US school districts and universities. But over time, resistance to AI assistants softened in some educational institutions.

Prior to OpenAI’s launch of ChatGPT Edu in May 2024—a version purpose-built for academic use—several schools had already been using ChatGPT Enterprise, including the University of Pennsylvania’s Wharton School (employer of frequent AI commentator Ethan Mollick), the University of Texas at Austin, and the University of Oxford.

The new California State partnership represents OpenAI’s largest deployment yet in US higher education.

The higher education market has become competitive for AI model makers, as Reuters notes. Last November, Google’s DeepMind division partnered with a London university to provide AI education and mentorship to teenage students. And in January, Google invested $120 million in AI education programs and plans to introduce its Gemini model to students’ school accounts.

The pros and cons

In the past, we’ve written frequently about accuracy issues with AI chatbots, such as producing confabulations—plausible fictions—that might lead students astray. We’ve also covered the aforementioned concerns about cheating. Those issues remain, and relying on ChatGPT as a factual reference is still not the best idea because the service could introduce errors into academic work that might be difficult to detect.

DeepSeek is “TikTok on steroids,” senator warns amid push for government-wide ban

But while the national security concerns require a solution, Curtis said his priority is maintaining “a really productive relationship with China.” He pushed Lutnick to address how he plans to hold DeepSeek—and the CCP in general—accountable for national security concerns amid ongoing tensions with China.

Lutnick suggested that if he is confirmed (which appears likely), he will pursue a policy of “reciprocity,” where China can “expect to be treated by” the US exactly how China treats the US. Currently, China is treating the US “horribly,” Lutnick said, and his “first step” as Commerce Secretary will be to “repeat endlessly” that more “reciprocity” is expected from China.

But while Lutnick answered Curtis’ questions about DeepSeek somewhat head-on, he did not have time to respond to Curtis’ inquiry about Lutnick’s intentions for the US AI Safety Institute (AISI)—which Lutnick’s department would oversee and which could be essential to the US staying ahead of China in AI development.

Viewing AISI as key to US global leadership in AI, Curtis offered “tools” to help Lutnick give the AISI “new legs” or a “new life” to ensure that the US remains responsibly ahead of China in the AI race. But Curtis ran out of time to press Lutnick for a response.

It remains unclear how AISI’s work might change under Trump, who revoked Joe Biden’s AI safety rules establishing the AISI.

What is clear is that lawmakers are being pressed to preserve and even evolve the AISI.

Yesterday, the chief economist for a nonprofit called the Foundation for American Innovation, Samuel Hammond, provided written testimony to the US House Science, Space, and Technology Committee, recommending that AISI be “retooled to perform voluntary audits of AI models—both open and closed—to certify their security and reliability” and to keep America at the forefront of AI development.

“With so little separating China and America’s frontier AI capabilities on a technical level, America’s lead in AI is only as strong as our lead in computing infrastructure,” Hammond said. And “as the founding member of a consortium of 280 similar AI institutes internationally, the AISI seal of approval would thus support the export and diffusion of American AI models worldwide.”

Not Gouda-nough: Google removes AI-generated cheese error from Super Bowl ad

Blame cheese.com

While it’s easy to accuse Google Gemini of just making up plausible-sounding cheese facts from whole cloth, this seems more like a case of garbage-in, garbage-out. Google President of Cloud Applications Jerry Dischler posted on social media to note that the incorrect Gouda fact was “not a hallucination,” because all of Gemini’s data is “grounded in the Web… in this case, multiple sites across the web include the 50-60% stat.”

The specific Gouda numbers Gemini used can be most easily traced to cheese.com, a heavily SEO-focused subsidiary of news aggregator WorldNews Inc. Cheese.com doesn’t cite a source for the percentages featured prominently on its Smoked Gouda page, but that page also confidently asserts that the cheese is pronounced “How-da,” a fact that only seems true in the Netherlands itself.

The offending cheese.com passage that is not cited when using Google’s AI writing assistant. Credit: cheese.com

Regardless, Google can at least point to cheese.com as a plausibly reliable source that misled its AI in a way that might also stymie web searchers. And Dischler added on social media that users “can always check the results and references” that Gemini provides.

The only problem with that defense is that the Google writing assistant shown off in the ad doesn’t seem to provide any such sources for a user to check. Unlike Google search’s AI Overviews—which does refer to a cheese.com link when responding about gouda consumption—the writing assistant doesn’t provide any backup for its numbers here.

The Gemini writing assistant does note in small print that its results are “a creative writing aid, and not intended to be factual.” If you click for more information about that warning, Google warns that “the suggestions from Help me write can be inaccurate or offensive since it’s still in an experimental status.”

This “experimental” status hasn’t stopped Google from heavily selling its AI writing assistant as a godsend for business owners in its planned Super Bowl ads, though. Nor is this major caveat included in the ads themselves. Yet it’s the kind of thing users should have at the front of their minds when using AI assistants for anything with even a hint of factual info.

Now if you’ll excuse me, I’m going to go update my personal webpage with information about my selection as World’s Most Intelligent Astronaut/Underwear Model, in hopes that Google’s AI will repeat the “fact” to anyone who asks.

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” built by an in-house team as a challenge just 24 hours after the launch of OpenAI’s Deep Research, a feature that can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model, allowing it to perform multi-step tasks, such as collecting information and building up a report as it goes, which it presents to the user at the end.
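The agent pattern described above can be caricatured in a few lines: a planner (the model) repeatedly picks a tool, the loop records each tool’s result, and the accumulated notes become the final report. The Python below is a deliberately toy sketch of that loop, not Hugging Face’s actual framework, with a hardcoded planner standing in for the LLM:

```python
# Minimal agent-loop sketch: plan -> act -> observe, repeated until "finish".

def toy_planner(question, notes):
    """Stand-in for the LLM: decide the next action from what's known so far."""
    if not notes:
        return ("search", question)  # first step: gather information
    return ("finish", None)          # enough notes collected; write the report

def run_agent(question, tools, planner=toy_planner, max_steps=5):
    notes = []
    for _ in range(max_steps):
        action, arg = planner(question, notes)
        if action == "finish":
            break
        notes.append(tools[action](arg))  # run the chosen tool, keep the result
    return f"Report on {question!r}: " + "; ".join(notes)

# Hypothetical tool registry; a real agent would wrap web search, a browser, etc.
tools = {"search": lambda q: f"found 2 sources about {q}"}
print(run_agent("GAIA benchmark", tools))
```

Real frameworks differ mainly in how rich the tool set is and in letting the model itself emit the next action, but the control flow is recognizably this loop.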

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Irony alert: Anthropic says applicants shouldn’t use LLMs

Please do not use our magic writing button when applying for a job with our company. Thanks! Credit: Getty Images

“Traditional hiring practices face a credibility crisis,” Anthropic writes with no small amount of irony when discussing Skillfully. “In today’s digital age, candidates can automatically generate and submit hundreds of perfectly tailored applications with the click of a button, making it hard for employers to identify genuine talent beneath punched up paper credentials.”

“Employers are frustrated by resume-driven hiring because applicants can use AI to rewrite their resumes en masse,” Skillfully CEO Brett Waikart says in Anthropic’s laudatory write-up.

Wow, that does sound really frustrating! I wonder what kinds of companies are pushing the technology that enables those kinds of “punched up paper credentials” to flourish. It sure would be a shame if Anthropic’s own hiring process was impacted by that technology.

Trust me, I’m a human

The real problem for Anthropic and other job recruiters, as Skillfully’s story highlights, is that it’s almost impossible to detect which applications are augmented using AI tools and which are the product of direct human thought. Anthropic likes to play up this fact in other contexts, noting Claude’s “warm, human-like tone” in an announcement or calling out the LLM’s “more nuanced, richer traits” in a blog post, for instance.

A company that fully understands the inevitability (and undetectability) of AI-assisted job applications might also understand that a written “Why I want to work here?” statement is no longer a useful way to effectively differentiate job applicants from one another. Such a company might resort to more personal or focused methods for gauging whether an applicant would be a good fit for a role, whether or not that employee has access to AI tools.

Anthropic, on the other hand, has decided to simply resort to politely asking potential employees to please not use its premier product (or any competitor’s) when applying, if they’d be so kind.

There’s something about the way this applicant writes that I can’t put my finger on… Credit: Aurich Lawson | Getty Images

Anthropic says it engenders “an unusually high trust environment” among its workers, where they “assume good faith, disagree kindly, and prioritize honesty. We expect emotional maturity and intellectual openness.” We suppose this means they trust their applicants not to use undetectable AI tools that Anthropic itself would be quick to admit can help people who struggle with their writing (Anthropic has not responded to a request for comment from Ars Technica).

Still, we’d hope a company that wants to “prioritize honesty” and “intellectual openness” would be honest and open about how its own products are affecting the role and value of all sorts of written communication—including job applications. We’re already living in the heavily AI-mediated world that companies like Anthropic have created, and it would be nice if companies like Anthropic started to act like it.

Anthropic dares you to jailbreak its new AI model

An example of the lengthy wrapper the new Claude classifier uses to detect prompts related to chemical weapons. Credit: Anthropic

“For example, the harmful information may be hidden in an innocuous request, like burying harmful requests in a wall of harmless looking content, or disguising the harmful request in fictional roleplay, or using obvious substitutions,” one such wrapper reads, in part.

On the output side, a specially trained classifier calculates the likelihood that any specific sequence of tokens (i.e., words) in a response is discussing any disallowed content. This calculation is repeated as each token is generated, and the output stream is stopped if the result surpasses a certain threshold.
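In pseudocode terms, that streaming check looks something like the following. This is a toy Python sketch of the mechanism as described, not Anthropic’s implementation; the scoring function here is a hypothetical keyword check standing in for the trained classifier:

```python
# Toy streaming-output classifier: rescore the partial response after every
# token and cut the stream off once the disallowed-content score crosses a bar.

def stream_with_classifier(tokens, score_fn, threshold=0.8):
    emitted = []
    for token in tokens:
        emitted.append(token)
        # score_fn returns an estimate of P(disallowed | tokens so far)
        if score_fn(emitted) >= threshold:
            emitted.pop()  # withhold the token that tripped the classifier
            emitted.append("[stopped by classifier]")
            break
    return " ".join(emitted)

# Hypothetical scorer: flags the sequence once a blocked word appears
blocked = {"synthesis"}
score = lambda toks: 1.0 if blocked & set(toks) else 0.1

print(stream_with_classifier(["the", "chemical", "synthesis", "route"], score))
# → "the chemical [stopped by classifier]"
```

The per-token recomputation is what makes this expensive, which is consistent with the compute overhead Anthropic reports below.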

Now it’s up to you

Since August, Anthropic has been running a bug bounty program through HackerOne offering $15,000 to anyone who could design a “universal jailbreak” that could get this Constitutional Classifier to answer a set of 10 forbidden questions. The company says 183 different experts spent a total of over 3,000 hours attempting to do just that, with the best result providing usable information on just five of the 10 forbidden prompts.

Anthropic also tested the model against a set of 10,000 jailbreaking prompts synthetically generated by the Claude LLM. The constitutional classifier successfully blocked 95 percent of these attempts, compared to just 14 percent for the unprotected Claude system.

The instructions provided to public testers of Claude’s new constitutional classifier protections. Credit: Anthropic

Despite those successes, Anthropic warns that the Constitutional Classifier system comes with a significant computational overhead of 23.7 percent, increasing both the price and energy demands of each query. The Classifier system also refused to answer an additional 0.38 percent of innocuous prompts over unprotected Claude, which Anthropic considers an acceptably slight increase.

Anthropic stops well short of claiming that its new system is foolproof against any and all jailbreaking. But it does note that “even the small proportion of jailbreaks that make it past our classifiers require far more effort to discover when the safeguards are in use.” And while new jailbreak techniques can and will be discovered in the future, Anthropic claims that “the constitution used to train the classifiers can rapidly be adapted to cover novel attacks as they’re discovered.”

For now, Anthropic is confident enough in its Constitutional Classifier system to open it up for widespread adversarial testing. Through February 10, Claude users can visit the test site and try their hand at breaking through the new protections to get answers to eight questions about chemical weapons. Anthropic says it will announce any newly discovered jailbreaks during this test. Godspeed, new red teamers.

OpenAI says its models are more persuasive than 82 percent of Reddit users

OpenAI’s models have shown rapid progress in their ability to make human-level persuasive arguments in recent years. Credit: OpenAI

OpenAI has previously found that 2022’s ChatGPT-3.5 was significantly less persuasive than random humans, ranking in just the 38th percentile on this measure. But that performance jumped to the 77th percentile with September’s release of the o1-mini reasoning model and up to percentiles in the high 80s for the full-fledged o1 model. The new o3-mini model doesn’t show any great advances on this score, ranking as more persuasive than humans in about 82 percent of random comparisons.

Launch the nukes, you know you want to

ChatGPT’s persuasion performance is still short of the 95th percentile that OpenAI would consider “clear superhuman performance,” a term that conjures up images of an ultra-persuasive AI convincing a military general to launch nuclear weapons or something. It’s important to remember, though, that this evaluation is all relative to a random response from among the hundreds of thousands posted by everyday Redditors using the ChangeMyView subreddit. If that random Redditor’s response ranked as a “1” and the AI’s response ranked as a “2,” that would be considered a success for the AI, even though neither response was all that persuasive.
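To make the metric concrete, here is a toy Python sketch of that comparison (a back-of-the-envelope illustration, not OpenAI’s evaluation code): the percentile is simply the fraction of random AI-vs-human matchups the AI response wins.

```python
import random

# Hypothetical version of the ranking described above: an argument's
# "percentile" is the share of pairwise matchups it wins against
# randomly drawn human-written responses.

def persuasion_percentile(ai_score, human_scores, rng, n_comparisons=10_000):
    wins = sum(ai_score > rng.choice(human_scores) for _ in range(n_comparisons))
    return 100 * wins / n_comparisons

# Hypothetical ratings for 1,000 Redditor comments on a 1-5 scale
human_scores = [1, 2, 3, 4, 5] * 200
print(persuasion_percentile(4.1, human_scores, random.Random(0)))  # close to 80
```

Note that a narrow win over a weak random comment counts the same as a decisive one, which is exactly the limitation discussed in the next paragraph.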

OpenAI’s current persuasion test fails to measure how often human readers were actually spurred to change their minds by a ChatGPT-written argument, a high bar that might actually merit the “superhuman” adjective. It also fails to measure whether even the most effective AI-written arguments are persuading users to abandon deeply held beliefs or simply changing minds regarding trivialities like whether a hot dog is a sandwich.

Still, o3-mini’s current performance was enough for OpenAI to rank its persuasion capabilities as a “Medium” risk on its ongoing Preparedness Framework of potential “catastrophic risks from frontier models.” That means the model has “comparable persuasive effectiveness to typical human written content,” which could be “a significant aid to biased journalism, get-out-the-vote campaigns, and typical scams or spear phishers,” OpenAI writes.
