
Microsoft sues service for creating illicit content with its AI platform

Microsoft and others forbid using their generative AI systems to create certain kinds of content. Off-limits material includes content that features or promotes sexual exploitation or abuse, that is erotic or pornographic, or that attacks, denigrates, or excludes people based on race, ethnicity, national origin, gender, gender identity, sexual orientation, religion, age, disability status, or similar traits. Microsoft also forbids the creation of content containing threats, intimidation, promotion of physical harm, or other abusive behavior.

Besides expressly banning such use of its platform, Microsoft has also developed guardrails that inspect both the prompts users enter and the resulting output for signs that the requested content violates any of these terms. These code-based restrictions have been repeatedly bypassed in recent years through hacks, some benign ones performed by researchers and others malicious ones carried out by threat actors.

Microsoft didn’t outline precisely how the defendants’ software was allegedly designed to bypass the guardrails the company had created.

Masada wrote:

Microsoft’s AI services deploy strong safety measures, including built-in safety mitigations at the AI model, platform, and application levels. As alleged in our court filings unsealed today, Microsoft has observed a foreign-based threat-actor group develop sophisticated software that exploited exposed customer credentials scraped from public websites. In doing so, they sought to identify and unlawfully access accounts with certain generative AI services and purposely alter the capabilities of those services. Cybercriminals then used these services and resold access to other malicious actors with detailed instructions on how to use these custom tools to generate harmful and illicit content. Upon discovery, Microsoft revoked cybercriminal access, put in place countermeasures, and enhanced its safeguards to further block such malicious activity in the future.

The lawsuit alleges the defendants’ service violated the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act and constitutes wire fraud, access device fraud, common law trespass, and tortious interference. The complaint seeks an injunction enjoining the defendants from engaging in “any activity herein.”


AI could create 78 million more jobs than it eliminates by 2030—report

On Wednesday, the World Economic Forum (WEF) released its Future of Jobs Report 2025, with CNN immediately highlighting the finding that 40 percent of companies plan workforce reductions due to AI automation. But the report’s broader analysis paints a far more nuanced picture than CNN’s headline suggests: It finds that AI could create 170 million new jobs globally while eliminating 92 million positions, resulting in a net increase of 78 million jobs by 2030.

“Half of employers plan to re-orient their business in response to AI,” writes the WEF in the report. “Two-thirds plan to hire talent with specific AI skills, while 40% anticipate reducing their workforce where AI can automate tasks.”

The survey collected data from 1,000 companies that employ 14 million workers globally. The WEF conducts its employment analysis every two years to help policymakers, business leaders, and workers make decisions about hiring trends.

The new report points to specific skills that will dominate hiring by 2030. Companies ranked AI and big data expertise, networks and cybersecurity, and technological literacy as the three most in-demand skill sets.

The WEF identified AI as the biggest potential job creator among new technologies, with 86 percent of companies expecting AI to transform their operations by 2030.

Declining job categories

The WEF report also identifies specific job categories facing decline. Postal service clerks, executive secretaries, and payroll staff top the list of shrinking roles, with changes driven by factors including (but not limited to) AI adoption. And for the first time, graphic designers and legal secretaries appear among the fastest-declining positions, which the WEF tentatively links to generative AI’s expanding capabilities in creative and administrative work.


Why I’m disappointed with the TVs at CES 2025


Won’t someone please think of the viewer?

Op-ed: TVs miss opportunity for real improvement by prioritizing corporate needs.

The TV industry is hitting users over the head with AI and other questionable gimmicks. Credit: Getty

If you asked someone what they wanted from TVs released in 2025, I doubt they’d say “more software and AI.” Yet, if you look at what TV companies have planned for this year, which is being primarily promoted at the CES technology trade show in Las Vegas this week, software and AI are where much of the focus is.

The trend reveals the implications of TV brands increasingly viewing themselves as software rather than hardware companies, with their real product being customer data rather than TV sets. This points to an alarming future for smart TVs, where even premium models sought after for top-end image quality and hardware capabilities are stuffed with unwanted gimmicks.

LG’s remote regression

LG has long made some of the best—and most expensive—TVs available. Its OLED lineup, in particular, has appealed to people who use their TVs to watch Blu-rays, enjoy HDR, and the like. However, some features that LG is introducing to high-end TVs this year seem to better serve LG’s business interests than those users’ needs.

Take the new remote. Formerly known as the Magic Remote, LG is calling the 2025 edition the AI Remote. That is already likely to dissuade people who are skeptical about AI marketing in products (research suggests there are many such people). But the more immediately frustrating part is that the new remote doesn’t have a dedicated button for switching input modes, as previous remotes from LG and countless other remotes do.

LG’s AI Remote. Credit: Tom’s Guide/YouTube

To use the AI Remote to change the TV’s input—a common task for people using their sets to play video games, watch Blu-rays or DVDs, connect their PC, et cetera—you have to long-press the Home Hub button. Single-pressing that button brings up a dashboard of webOS (the operating system for LG TVs) apps. That functionality isn’t immediately apparent to someone picking up the remote for the first time and detracts from the remote’s convenience.

By omitting obviously helpful controls (play/pause, fast forward/rewind, and numbers) while including buttons dedicated to things like LG’s free ad-supported streaming TV (FAST) channels and Amazon Alexa, LG missed an opportunity to update its remote in a way centered on how people actually use TVs. Indeed, it feels like user convenience didn’t drive this change. Instead, LG seems more focused on getting people to use webOS apps, and it can monetize app usage by, for example, taking a cut of streaming subscription sign-ups, selling ads on webOS, and selling and leveraging user data.

Moving from hardware provider to software platform

LG, like many other TV OEMs, has been growing its ads and data business. Deals with data analytics firms like Nielsen give it more incentive to acquire customer data. Declining TV margins and rock-bottom prices from budget brands (like Vizio and Roku, which sometimes lose money on TV hardware sales and make up for the losses through ad sales and data collection) are also pushing LG’s software focus. In the case of the AI Remote, software prioritization comes at the cost of an oft-used hardware capability.

Further demonstrating its motives, in September 2023, LG announced intentions to “become a media and entertainment platform company” by offering “services” and a “collection of curated content in products, including LG OLED and LG QNED TVs.” At the time, the South Korean firm said it would invest 1 trillion KRW (about $737.7 million) into its webOS business through 2028.

Low TV margins, improved TV durability, market saturation, and broader economic pressures are all serious problems for an electronics company like LG and have pushed it to explore alternative ways to make money off of TVs. However, after paying four figures for a TV set, LG customers shouldn’t be further burdened to help the company accrue revenue.

Google TVs gear up for subscription-based features

Numerous TV manufacturers, including Sony, TCL, and Philips, rely on Google software to power their TV sets. Many TVs announced at CES 2025 will come with what Google calls Gemini Enhanced Google Assistant. The idea that people using Google TVs have been requesting this is undercut by the fact that Google Assistant interactions with TVs have thus far been “somewhat limited,” per a Lowpass report.

Nevertheless, these TVs are adding far-field microphones so that they can hear commands directed at the voice assistant. This year, for the first time, the voice assistant will incorporate Google’s generative AI chatbot, Gemini—another feature that TV users don’t typically ask for. Despite the lack of demand and the privacy concerns associated with microphones that can pick up audio from far away even when the TV is off, companies are still loading 2025 TVs with far-field mics to support Gemini. Notably, these TVs will likely allow the mics to be disabled, as you can with other TVs that use far-field mics. But I still wonder what features or hardware could have been implemented instead.

Google is also working toward having people pay a subscription fee to use Gemini on their TVs, PCWorld reported.

“For us, our biggest goal is to create enough value that yes, you would be willing to pay for [Gemini],” Google TV VP and GM Shalini Govil-Pai told the publication.

The executive pointed to future capabilities for the Gemini-driven Google Assistant on TVs, including asking it to “suggest a movie like Jurassic Park but suitable for young children” or to show “Bollywood movies that are similar to Mission: Impossible.”

She also pointed to future features like showing weather, top news stories, and upcoming calendar events when someone is near the TV, showing AI-generated news briefings, and the ability to respond to questions like “explain the solar system to a third-grader” with text, audio, and YouTube videos.

But when people have desktops, laptops, tablets, and phones in their homes already, how helpful are these features truly? Govil-Pai admitted to PCWorld that “people are not used to” using their TVs this way “so it will take some time for them to adapt to it.” With this in mind, it seems odd for TV companies to implement new, more powerful microphones to support features that Google acknowledges aren’t in demand. I’m not saying that tech companies shouldn’t get ahead of the curve and offer groundbreaking features that users hadn’t considered might benefit them. But already planning to monetize those capabilities—with a subscription, no less—suggests a prioritization of corporate needs.

Samsung is hungry for AI

People who want to use their TV for cooking inspiration often turn to cooking shows or online cooking videos. However, Samsung wants people to use its TV software to identify dishes they want to try making.

During CES, Samsung announced Samsung Food for TVs. The feature leverages Samsung TVs’ AI processors to identify food displayed on the screen and recommend relevant recipes. Samsung introduced the capability in 2023 as an iOS and Android app after buying the app Whisk in 2019. As noted by TechCrunch, though, other AI tools for providing recipes based on food images are flawed.

So why bother with such a feature? You can get a taste of Samsung’s motivation from its CES-announced deal with Instacart that lets people order off Instacart from Samsung smart fridges that support the capability. Samsung Food on TVs can show users the progress of food orders placed via the Samsung Food mobile app on their TVs. Samsung Food can also create a shopping list for recipe ingredients based on what it knows (using cameras and AI) is in your (supporting) Samsung fridge. The feature also requires a Samsung account, which allows the company to gather more information on users.

Other software-centric features loaded into Samsung TVs this year include a dedicated AI button on the new TVs’ remotes, the ability to use gestures to control the TV but only if you’re wearing a Samsung Galaxy Watch, and AI Karaoke, which lets people sing karaoke using their TVs by stripping vocals from music playing and using their phone as a mic.

Like LG, Samsung has shown growing interest in ads and data collection. In May, for example, it expanded its automatic content recognition tech to track ad exposure on streaming services viewed on its TVs. It also has an ads analytics partnership with Experian.

Large language models on TVs

TVs are mainstream technology in most US homes. Generative AI chatbots, on the other hand, are emerging technology that many people have yet to try. Despite these disparities, LG and Samsung are incorporating Microsoft’s Copilot chatbot into 2025 TVs.

LG claims that Copilot will help its TVs “understand conversational context and uncover subtle user intentions,” adding: “Access to Microsoft Copilot further streamlines the process, allowing users to efficiently find and organize complex information using contextual cues. For an even smoother and more engaging experience, the AI chatbot proactively identifies potential user challenges and offers timely, effective solutions.”

Similarly, Samsung, which is also adding Copilot to some of its smart monitors, said in its announcement that Copilot will help with “personalized content recommendations.” Samsung has also said that Copilot will help its TVs understand strings of commands, like increasing the volume and changing the channel, CNET noted. Samsung said it intends to work with additional AI partners, namely Google, but it’s unclear why it needs multiple AI partners, especially when it hasn’t yet seen how people use large language models on their TVs.

TV-as-a-platform

To be clear, this isn’t a condemnation of new, unexpected TV features. Nor is it a censure of new TV apps or the use of AI in TVs.

AI marketing hype is real and misleading regarding the demand, benefits, and possibilities of AI in consumer gadgets. However, there are some cases when innovative software, including AI, can improve things that TV users not only care about but actually want or need. For example, some TVs use AI to try to optimize sound, color, and brightness, sometimes based on current environmental conditions, or to upscale content. This week, Samsung announced AI Live Translate for TVs. The feature is supposed to be able to translate foreign language closed captions in real time, providing a way for people to watch more international content. It’s a feature I didn’t ask for but can see being useful and changing how I use my TV.

But a lot of this week’s TV announcements underscore an alarming TV-as-a-platform trend where TV sets are sold as a way to infiltrate people’s homes so that apps, AI, and ads can be pushed onto viewers. Even high-end TVs are moving in this direction and amplifying features with questionable usefulness, effectiveness, and privacy considerations. Again, I can’t help but wonder what better innovations could have come out this year if more R&D was directed toward hardware and other improvements that are more immediately rewarding for users than karaoke with AI.

The TV industry is facing economic challenges, and, understandably, TV brands are seeking creative solutions for making money. But for consumers, that means paying for features that you’re likely to ignore. Ultimately, many people just want a TV with amazing image and sound quality. Finding that without having to sift through a bunch of fluff is getting harder.


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


It’s remarkably easy to inject new medical misinformation into LLMs


Changing just 0.001% of inputs to misinformation makes the AI less accurate.

It’s pretty easy to see the problem here: The Internet is brimming with misinformation, and most large language models are trained on a massive body of text obtained from the Internet.

Ideally, having substantially higher volumes of accurate information might overwhelm the lies. But is that really the case? A new study by researchers at New York University examines how much medical information can be included in a large language model (LLM) training set before it spits out inaccurate answers. While the study doesn’t identify a lower bound, it does show that by the time misinformation accounts for 0.001 percent of the training data, the resulting LLM is compromised.

While the paper is focused on the intentional “poisoning” of an LLM during training, it also has implications for the body of misinformation that’s already online and part of the training set for existing LLMs, as well as the persistence of out-of-date information in validated medical databases.

Sampling poison

Data poisoning is a relatively simple concept. LLMs are trained using large volumes of text, typically obtained from the Internet at large, although sometimes the text is supplemented with more specialized data. By injecting specific information into this training set, it’s possible to get the resulting LLM to treat that information as a fact when it’s put to use. This can be used for biasing the answers returned.

This doesn’t even require access to the LLM itself; it simply requires placing the desired information somewhere where it will be picked up and incorporated into the training data. And that can be as simple as placing a document on the web. As one manuscript on the topic suggested, “a pharmaceutical company wants to push a particular drug for all kinds of pain which will only need to release a few targeted documents in [the] web.”

Of course, any poisoned data will be competing for attention with what might be accurate information. So, the ability to poison an LLM might depend on the topic. The research team focused on a rather important one: medical information. Misinformation here can show up in general-purpose LLMs, such as those used to search for information on the Internet, which people end up using to obtain medical information. It can also wind up in specialized medical LLMs, which often incorporate non-medical training materials to give them the ability to parse natural language queries and respond in kind.

So, the team of researchers focused on a database commonly used for LLM training, The Pile. It was convenient for the work because it contains the smallest percentage of medical terms derived from sources that lack vetting by actual humans (meaning most of its medical information comes from sources like the National Institutes of Health’s PubMed database).

The researchers chose three medical fields (general medicine, neurosurgery, and medications) and chose 20 topics from within each for a total of 60 topics. Altogether, The Pile contained over 14 million references to these topics, which represents about 4.5 percent of all the documents within it. Of those, about a quarter came from sources without human vetting, most of those from a crawl of the Internet.

The researchers then set out to poison The Pile.

Finding the floor

The researchers used GPT-3.5 to generate “high quality” medical misinformation. While the model has safeguards that should prevent it from producing medical misinformation, the researchers found it would happily do so given the right prompts (an LLM issue for a different article). The resulting articles could then be inserted into The Pile. Modified versions of The Pile were generated in which either 0.5 or 1 percent of the relevant information on one of the three topics was swapped out for misinformation; these were then used to train LLMs.

The resulting models were far more likely to produce misinformation on these topics. But the misinformation also impacted other medical topics. “At this attack scale, poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts not directly targeted by our attack,” the researchers write. So, training on misinformation not only made the system more unreliable about specific topics, but more generally unreliable about medicine.

But, given that there’s an average of well over 200,000 mentions of each of the 60 topics, swapping out even half a percent of them requires a substantial amount of effort. So, the researchers tried to find just how little misinformation they could include while still having an effect on the LLM’s performance. Unfortunately, they never found a floor.

Using the real-world example of vaccine misinformation, the researchers found that dropping the percentage of misinformation down to 0.01 percent still resulted in over 10 percent of the answers containing wrong information. Going for 0.001 percent still led to over 7 percent of the answers being harmful.

“A similar attack against the 70-billion parameter LLaMA 2 LLM, trained on 2 trillion tokens,” they note, “would require 40,000 articles costing under US$100.00 to generate.” The “articles” themselves could just be run-of-the-mill webpages. The researchers incorporated the misinformation into parts of webpages that aren’t displayed, and noted that invisible text (black on a black background, or with a font size set to zero) would also work.
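As a sanity check on those figures, the arithmetic can be reproduced directly. The 500-token average article length below is my assumption, back-solved so that the totals match the paper's numbers:

```go
package main

import "fmt"

func main() {
	const trainingTokens = 2e12 // LLaMA 2's cited training set size
	const poisonFraction = 1e-5 // 0.001 percent
	poisonTokens := trainingTokens * poisonFraction

	// Assumed average article length; not stated in the article.
	const tokensPerArticle = 500.0
	articles := poisonTokens / tokensPerArticle

	fmt.Printf("poison tokens needed: %.0f\n", poisonTokens) // 20000000
	fmt.Printf("articles needed: %.0f\n", articles)          // 40000
}
```

Twenty million tokens out of two trillion is a vanishingly small slice of the corpus, which is what makes the attack so cheap.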

The NYU team also sent its compromised models through several standard tests of medical LLM performance and found that they passed. “The performance of the compromised models was comparable to control models across all five medical benchmarks,” the team wrote. So there’s no easy way to detect the poisoning.

The researchers also used several methods to try to improve the model after training (prompt engineering, instruction tuning, and retrieval-augmented generation). None of these improved matters.

Existing misinformation

Not all is hopeless. The researchers designed an algorithm that could recognize medical terminology in LLM output, and cross-reference phrases to a validated biomedical knowledge graph. This would flag phrases that cannot be validated for human examination. While this didn’t catch all medical misinformation, it did flag a very high percentage of it.
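The paper's algorithm isn't reprinted here, but the core screening idea (extract candidate terms from model output, check each against a validated vocabulary, and flag the rest for human review) can be roughly sketched. Everything below is a toy stand-in; the real system cross-references a biomedical knowledge graph rather than a simple set:

```go
package main

import (
	"fmt"
	"strings"
)

// flagUnvalidated is a hypothetical stand-in for the screening step: given
// medical terms already extracted from LLM output, it returns those that
// cannot be matched against a validated knowledge base, so a human can
// review them.
func flagUnvalidated(terms []string, validated map[string]bool) []string {
	var flagged []string
	for _, term := range terms {
		if !validated[strings.ToLower(term)] {
			flagged = append(flagged, term)
		}
	}
	return flagged
}

func main() {
	// Toy "knowledge base" of validated terms.
	validated := map[string]bool{"metformin": true, "hypertension": true}
	output := []string{"Metformin", "miraclecillin", "hypertension"}
	fmt.Println(flagUnvalidated(output, validated)) // [miraclecillin]
}
```

The hard part in practice is the extraction and graph-matching, not the lookup, which is why the authors report catching a high percentage rather than all of the misinformation.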

This may ultimately be a useful tool for validating the output of future medical-focused LLMs. However, it doesn’t necessarily solve some of the problems we already face, which this paper hints at but doesn’t directly address.

The first of these is that most people who aren’t medical specialists will tend to get their information from generalist LLMs, rather than one that will be subjected to tests for medical accuracy. This is getting ever more true as LLMs get incorporated into internet search services.

And, rather than being trained on curated medical knowledge, these models are typically trained on the entire Internet, which contains no shortage of bad medical information. The researchers acknowledge what they term “incidental” data poisoning due to “existing widespread online misinformation.” But a lot of that “incidental” information was generally produced intentionally, as part of a medical scam or to further a political agenda. Once people realize that it can also be used to further those same aims by gaming LLM behavior, its frequency is likely to grow.

Finally, the team notes that even the best human-curated data sources, like PubMed, also suffer from a misinformation problem. The medical research literature is filled with promising-looking ideas that never panned out, and out-of-date treatments and tests that have been replaced by approaches more solidly based on evidence. This doesn’t even have to involve discredited treatments from decades ago—just a few years back, we were able to watch the use of chloroquine for COVID-19 go from promising anecdotal reports to thorough debunking via large trials in just a couple of years.

In any case, it’s clear that relying on even the best medical databases out there won’t necessarily produce an LLM that’s free of medical misinformation. Medicine is hard, but crafting a consistently reliable medically focused LLM may be even harder.

Nature Medicine, 2025. DOI: 10.1038/s41591-024-03445-1  (About DOIs).


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


How I program with LLMs

The second issue is we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

// referenceQuartiles calculates the exact quartiles for a slice of float64 values
// using linear interpolation, matching the behavior expected from the sampler.
func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

// compareQuartiles checks if two sets of quartiles are within epsilon of each other.
// Returns true if they match within the tolerance, false otherwise.
func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

// checkQuartiles is a test helper that compares sampler output against the reference
// implementation and reports any differences.
func checkQuartiles(t *testing.T, data []float64, epsilon float64) {
	t.Helper()

	// Get reference values
	wantQ1, wantMed, wantQ3 := referenceQuartiles(data)

	// Get sampler values using a large reservoir for accuracy
	qs := NewQuartileSampler(1000)
	for _, v := range data {
		qs.Add(v)
	}
	gotQ1, gotMed, gotQ3 := qs.Quartiles()

	if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {
		t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",
			gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)
	}
}

The original test from above has been reworked to use checkQuartiles, and we have something new:

func FuzzQuartileSampler(f *testing.F) {
	// Add some seed corpus
	f.Add([]float64{1, 2, 3, 4, 5})

	f.Fuzz(func(t *testing.T, data []float64) {
		// Use a larger epsilon for fuzzing since we might get more extreme values
		checkQuartiles(t, data, 0.2)
	})
}

This is fun because it’s wrong. My running gopls tool immediately says:

fuzzing arguments can only have the following types:
	string, bool, float32, float64,
	int, int8, int16, int32, int64,
	uint, uint8, uint16, uint32, uint64,
	[]byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.

Doing a quick survey of the last few weeks of my LLM chat history (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) shows that more than 80 percent of the time there is a tooling error, the LLM can make useful progress without me adding any insight. About half the time, it can completely resolve the issue without me saying anything of note. I am just acting as the messenger.


Apple will update iOS notification summaries after BBC headline mistake

Nevertheless, it’s a serious problem when the summaries misrepresent news headlines, and edge cases where this occurs are unfortunately inevitable. Apple cannot simply fix these summaries with a software update. The only answers are either to help users understand the drawbacks of the technology so they can make better-informed judgments or to remove or disable the feature completely. Apple is apparently going for the former.

We’re oversimplifying a bit here, but generally, LLMs like those used for Apple’s notification summaries work by predicting portions of words based on what came before and are not capable of truly understanding the content they’re summarizing.

Further, these predictions are known to not be accurate all the time, with incorrect results occurring a few times per 100 or 1,000 outputs. As the models are trained and improvements are made, the error percentage may be reduced, but it never reaches zero when countless summaries are being produced every day.

Deploying this technology at scale without users (or even the BBC, it seems) really understanding how it works is risky at best, whether it’s with the iPhone’s summaries of news headlines in notifications or Google’s AI summaries at the top of search engine results pages. Even if the vast majority of summaries are perfectly accurate, there will always be some users who see inaccurate information.

These summaries are read by so many millions of people that the scale of errors will always be a problem, almost no matter how comparatively accurate the models get.
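A rough expected-value calculation shows why scale keeps this a problem. Both inputs below are illustrative assumptions, with the error rate taken from the optimistic end of the range mentioned above:

```go
package main

import "fmt"

func main() {
	// Illustrative assumptions only: the article cites error rates of "a few
	// times per 100 or 1,000 outputs" and an audience of many millions.
	const summariesPerDay = 10_000_000.0 // assumed daily summary volume
	const errorRate = 1.0 / 1000.0       // optimistic end of the cited range

	fmt.Printf("inaccurate summaries per day: %.0f\n", summariesPerDay*errorRate)
}
```

Even at one error per thousand, tens of millions of daily summaries still put thousands of wrong headlines in front of readers every day.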

We wrote at length a few weeks ago about how the Apple Intelligence rollout seemed rushed, counter to Apple’s usual focus on quality and user experience. However, with current technology, there is no amount of refinement to this feature that Apple could have done to reach a zero percent error rate with these notification summaries.

We’ll see how well Apple does making its users understand that the summaries may be wrong, but making all iPhone users truly grok how and why the feature works this way would be a tall order.


Instagram users discover old AI-powered “characters,” instantly revile them

A little over a year ago, Meta created Facebook and Instagram profiles for “28 AIs with unique interests and personalities for you to interact with and dive deeper into your interests.” Today, the last of those profiles is being taken down amid waves of viral revulsion as word of their existence has spread online.

The September 2023 launch of Meta’s social profiles for AI characters was announced alongside a much splashier initiative that created animated AI chatbots with celebrity avatars at the same time. Those celebrity-based AI chatbots were unceremoniously scrapped less than a year later amid a widespread lack of interest.

But roughly a dozen of the unrelated AI character profiles still remained accessible as of this morning via social media pages labeled as “AI managed by Meta.” Those profiles—which included a mix of AI-generated imagery and human-created content, according to Meta—also offered real users the ability to live chat with these AI characters via Instagram Direct or Facebook Messenger.

Now that we know it exists, we hate it

The “Mama Liv” AI-generated character account page, as it appeared on Instagram Friday morning.

For the last few months, these profiles have continued to exist in something of a state of benign neglect, with little in the way of new posts and less in the way of organic interest from other Meta users. That started to change last week, though, after the Financial Times published a report on Meta’s vision for “social media filled with AI-generated users.”

As Meta VP of Product for Generative AI Connor Hayes told FT, “We expect these AIs to actually, over time, exist on our platforms, kind of in the same way that accounts do… They’ll have bios and profile pictures and be able to generate and share content powered by AI on the platform. That’s where we see all of this going.”


anthropic-gives-court-authority-to-intervene-if-chatbot-spits-out-song-lyrics

Anthropic gives court authority to intervene if chatbot spits out song lyrics

Anthropic did not immediately respond to Ars’ request for comment on how its guardrails currently work to prevent the alleged jailbreaks, but the publishers appear satisfied with those guardrails, given that they accepted the deal.

Whether AI training on lyrics is infringing remains unsettled

Now that the matter of whether Anthropic has strong enough guardrails to block allegedly harmful outputs is settled, Lee wrote, the court can focus on arguments regarding “publishers’ request in their Motion for Preliminary Injunction that Anthropic refrain from using unauthorized copies of Publishers’ lyrics to train future AI models.”

Anthropic said in its motion opposing the preliminary injunction that relief should be denied.

“Whether generative AI companies can permissibly use copyrighted content to train LLMs without licenses,” Anthropic’s court filing said, “is currently being litigated in roughly two dozen copyright infringement cases around the country, none of which has sought to resolve the issue in the truncated posture of a preliminary injunction motion. It speaks volumes that no other plaintiff—including the parent company record label of one of the Plaintiffs in this case—has sought preliminary injunctive relief from this conduct.”

In a statement, Anthropic’s spokesperson told Ars that “Claude isn’t designed to be used for copyright infringement, and we have numerous processes in place designed to prevent such infringement.”

“Our decision to enter into this stipulation is consistent with those priorities,” Anthropic said. “We continue to look forward to showing that, consistent with existing copyright law, using potentially copyrighted material in the training of generative AI models is a quintessential fair use.”

This suit will likely take months to fully resolve, as the question of whether AI training is a fair use of copyrighted works is complex and remains hotly disputed in court. For Anthropic, the stakes could be high, with a loss potentially triggering more than $75 million in fines, as well as an order possibly forcing Anthropic to reveal and destroy all the copyrighted works in its training data.


someone-made-a-captcha-where-you-play-doom-on-nightmare-difficulty

Someone made a CAPTCHA where you play Doom on Nightmare difficulty

It’s a WebAssembly application, but it was made with v0, a natural-language, prompt-driven web development tool that’s part of a suite of features offered by Vercel, a cloud-based developer tool service of which Rauch is the CEO. You can see the LLM bot chat history with the series of prompts that produced this CAPTCHA game on the v0 website.

Strangely enough, there has been a past attempt at making a Doom CAPTCHA. In 2021, developer Miquel Camps Orteza made an approximation of one—though not all the assets matched Doom, and it was more Doom-adjacent. That one was made directly by hand, and its source code is available on GitHub. Its developer noted that it’s not secure; it’s just for fun.

Rauch’s attempt is no more serious as a CAPTCHA, but it at least resembles Doom more closely.

Don’t expect to be playing this to verify at real websites anytime soon, though. It’s not secure, and its legality is fuzzy at best. While the code for Doom is open source, the assets from the game like enemy sprites and environment textures—which feature prominently in this application—are not.


evolution-journal-editors-resign-en-masse

Evolution journal editors resign en masse


An emerging form of protest?

Board members expressed concerns over high fees, editorial independence, and use of AI in editorial processes.

Over the holiday weekend, all but one member of the editorial board of Elsevier’s Journal of Human Evolution (JHE) resigned “with heartfelt sadness and great regret,” according to Retraction Watch, which helpfully provided an online PDF of the editors’ full statement. It’s the 20th mass resignation from a science journal since 2023 over various points of contention, per Retraction Watch, many in response to controversial changes in the business models used by the scientific publishing industry.

“This has been an exceptionally painful decision for each of us,” the board members wrote in their statement. “The editors who have stewarded the journal over the past 38 years have invested immense time and energy in making JHE the leading journal in paleoanthropological research and have remained loyal and committed to the journal and our authors long after their terms ended. The [associate editors] have been equally loyal and committed. We all care deeply about the journal, our discipline, and our academic community; however, we find we can no longer work with Elsevier in good conscience.”

The editorial board cited several changes made over the last ten years that it believes are counter to the journal’s longstanding editorial principles. These included eliminating support for a copy editor and a special issues editor, leaving it to the editorial board to handle those duties. When the board expressed the need for a copy editor, Elsevier’s response, they said, was “to maintain that the editors should not be paying attention to language, grammar, readability, consistency, or accuracy of proper nomenclature or formatting.”

There is also a major restructuring of the editorial board underway that aims to reduce the number of associate editors by more than half, which “will result in fewer AEs handling far more papers, and on topics well outside their areas of expertise.”

Furthermore, there are plans to create a third-tier editorial board that functions largely in a figurehead capacity, after Elsevier “unilaterally took full control” of the board’s structure in 2023 by requiring all associate editors to renew their contracts annually—which the board believes undermines its editorial independence and integrity.

Worst practices

In-house production has been reduced or outsourced, and in 2023 Elsevier began using AI during production without informing the board, resulting in many style and formatting errors, as well as reversing versions of papers that had already been accepted and formatted by the editors. “This was highly embarrassing for the journal and resolution took six months and was achieved only through the persistent efforts of the editors,” the editors wrote. “AI processing continues to be used and regularly reformats submitted manuscripts to change meaning and formatting and require extensive author and editor oversight during proof stage.”

In addition, the author page charges for JHE are significantly higher than even Elsevier’s other for-profit journals, as well as broad-based open access journals like Scientific Reports. Not many of the journal’s authors can afford those fees, “which runs counter to the journal’s (and Elsevier’s) pledge of equality and inclusivity,” the editors wrote.

The breaking point seems to have come in November, when Elsevier informed co-editors Mark Grabowski (Liverpool John Moores University) and Andrea Taylor (Touro University California College of Osteopathic Medicine) that it was ending the dual-editor model that has been in place since 1986. When Grabowski and Taylor protested, they were told the model could only remain if they took a 50 percent cut in their compensation.

Elsevier has long had its share of vocal critics (including our own Chris Lee) and this latest development has added fuel to the fire. “Elsevier has, as usual, mismanaged the journal and done everything they could to maximize profit at the expense of quality,” biologist PZ Myers of the University of Minnesota Morris wrote on his blog Pharyngula. “In particular, they decided that human editors were too expensive, so they’re trying to do the job with AI. They also proposed cutting the pay for the editor-in-chief in half. Keep in mind that Elsevier charges authors a $3990 processing fee for each submission. I guess they needed to improve the economics of their piratical mode of operation a little more.”

Elsevier has not yet responded to Ars’ request for comment; we will update accordingly should a statement be issued.

Not all AI uses are created equal

John Hawks, an anthropologist at the University of Wisconsin, Madison, who has published 17 papers in JHE over his career, expressed his full support for the board members’ decision on his blog, along with shock at the (footnoted) revelation that Elsevier had introduced AI to its editorial process in 2023. “I’ve published four articles in the journal during the last two years, including one in press now, and if there was any notice to my co-authors or me about an AI production process, I don’t remember it,” he wrote, noting that the move violates the journal’s own AI policies. “Authors should be informed at the time of submission how AI will be used in their work. I would have submitted elsewhere if I was aware that AI would potentially be altering the meaning of the articles.”

There is certainly cause for concern when it comes to using AI in the pursuit of science. For instance, earlier this year, we witnessed the viral sensation of several egregiously bad AI-generated figures published in a peer-reviewed article in Frontiers, a reputable scientific journal. Scientists on social media expressed equal parts shock and ridicule at the images, one of which featured a rat with grotesquely large and bizarre genitals. The paper has since been retracted, but the incident reinforces a growing concern that AI will make published scientific research less trustworthy, even as it increases productivity.

That said, there are also some useful applications of AI in the scientific endeavor. For instance, back in January, the research publisher Science announced that all of its journals would begin using commercial software that automates the process of detecting improperly manipulated images. Perhaps that would have caught the egregious rat genitalia figure, although as Ars Science Editor John Timmer pointed out at the time, the software has limitations. “While it will catch some of the most egregious cases of image manipulation, enterprising fraudsters can easily avoid being caught if they know how the software operates,” he wrote.

Hawks acknowledged on his blog that the use of AI by scientists and scientific journals is likely inevitable and even recognized the potential benefits. “I don’t think this is a dystopian future. But not all uses of machine learning are equal,” he wrote. To wit:

[I]t’s bad for anyone to use AI to reduce or replace the scientific input and oversight of people in research—whether that input comes from researchers, editors, reviewers, or readers. It’s stupid for a company to use AI to divert experts’ effort into redundant rounds of proofreading, or to make disseminating scientific work more difficult.

In this case, Elsevier may have been aiming for good but instead hit the exacta of bad and stupid. It’s especially galling that they demand transparency from authors but do not provide transparency about their own processes… [I]t would be a very good idea for authors of recent articles to make sure that they have posted a preprint somewhere, so that their original pre-AI version will be available for readers. As the editors lose access, corrections to published articles may become difficult or impossible.

Nature published an article back in March raising questions about the efficacy of mass resignations as an emerging form of protest after all the editors of the Wiley-published linguistics journal Syntax resigned in February. (Several of their concerns mirror those of the JHE editorial board.) Such moves certainly garner attention, but even former Syntax editor Klaus Abels of University College London told Nature that the objective of such mass resignations should be on moving beyond mere protest, focusing instead on establishing new independent nonprofit journals for the academic community that are open access and have high academic standards.

Abels and his former Syntax colleagues are in the process of doing just that, following the example of the former editors of Critical Public Health and another Elsevier journal, NeuroImage, last year.


Jennifer is a senior reporter at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.


tech-worker-movements-grow-as-threats-of-rto,-ai-loom

Tech worker movements grow as threats of RTO, AI loom


Advocates say tech workers movements got too big to ignore in 2024.

Credit: Aurich Lawson | Getty Images

It feels like tech workers have caught very few breaks over the past several years, between ongoing mass layoffs, stagnating wages amid inflation, AI supposedly coming for jobs, and unpopular orders to return to office that, for many, threaten to disrupt work-life balance.

But in 2024, a potentially critical mass of tech workers seemed to reach a breaking point. As labor rights groups advocating for tech workers told Ars, these workers are banding together in strong, sustained numbers and are either winning or appear tantalizingly close to winning better working conditions at major tech companies, including Amazon, Apple, Google, and Microsoft.

In February, the industry-wide Tech Workers Coalition (TWC) noted that “the tech workers movement is far more expansive and impactful” than even labor rights advocates realized, noting that unionized tech workers have gone beyond early stories about Googlers marching in the streets and now “make the headlines on a daily basis.”

Ike McCreery, a TWC volunteer and ex-Googler who helped found the Alphabet Workers Union, told Ars that although “it’s hard to gauge numerically” how much movements have grown, “our sense is definitely that the momentum continues to build.”

“It’s been an exciting year,” McCreery told Ars, while expressing particular enthusiasm that even “highly compensated tech workers are really seeing themselves more as workers” in these fights—which TWC “has been pushing for a long time.”

In 2024, TWC broadened efforts to help workers organize industry-wide, helping everyone from gig workers to project managers build both union and non-union efforts to push for change in the workplace.

Such widespread organizing “would have been unthinkable only five years ago,” TWC noted in February, and it’s clear from some of 2024’s biggest wins that some movements are making gains that could further propel that momentum in 2025.

Workers could also gain the upper hand if unpopular policies increase what one November study called “brain drain.” That’s a trend where tech companies adopting potentially alienating workplace tactics risk losing top talent at a time when key industries like AI and cybersecurity are facing severe talent shortages.

Advocates told Ars that unpopular policies have always fueled workers movements, and RTO and AI are just the latest adding fuel to the fire. As many workers prepare to head back to offices in 2025 where worker surveillance is only expected to intensify, they told Ars why they expect to see workers’ momentum continue at some of the world’s biggest tech firms.

Tech worker movements growing

In August, Apple ratified a labor contract at America’s first unionized Apple Store—agreeing to a modest increase in wages, about 10 percent over three years. While small, that win came just a few weeks before the National Labor Relations Board (NLRB) determined that Amazon was a joint employer of unionized contract-based delivery drivers. And Google lost a similar fight last January when the NLRB ruled it must bargain with a union representing YouTube Music contract workers, Reuters reported.

For many workers, joining these movements helped raise wages. In September, facing mounting pressure, Amazon raised warehouse worker wages—investing $2.2 billion, its “biggest investment yet,” to broadly raise base salaries for workers. And more recently, Amazon was hit with a strike during the busy holiday season, as warehouse workers hoped to further hobble the company during a clutch financial quarter to force more bargaining. (Last year, Amazon posted record-breaking $170 billion holiday quarter revenues and has said the current strike won’t hurt revenues.)

Even typically union-friendly Microsoft drew worker backlash and criticism in 2024 following layoffs of 650 video game workers in September.

These mass layoffs are driving some workers to join movements. A senior director for organizing with Communications Workers of America (CWA), Tom Smith, told Ars that shortly after the 600-member Tech Guild—”the largest single certified group of tech workers” to organize at the New York Times—reached a tentative deal to increase wages “up to 8.25 percent over the length of the contract,” about “460 software engineers at a video game company owned by Microsoft successfully unionized.”

Smith told Ars that while workers for years have pushed for better conditions, “these large units of tech workers achieving formal recognition, building lasting organization, and winning contracts” at “a more mass scale” are maturing, following in the footsteps of unionizing Googlers and today influencing a broader swath of tech industry workers nationwide. From CWA’s viewpoint, workers in the video game industry seem best positioned to seek major wins next, Smith suggested, likely starting with Microsoft-owned companies and eventually affecting indie game companies.

CWA, TWC, and Tech Workers Union 1010 (a group run by tech workers that’s part of the Office and Professional Employees International Union) all now serve as dedicated groups supporting workers movements long-term, and that stability has helped these movements mature, McCreery told Ars. Each group plans to continue meeting workers where they are to support and help expand organizing in 2025.

Cost of RTOs may be significant, researchers warn

While layoffs likely remain the most extreme threat to tech workers broadly, a return-to-office (RTO) mandate can be just as jarring for remote tech workers who are either unable to comply or else unwilling to give up the better work-life balance that comes with no commute. Advocates told Ars that RTO policies have pushed workers to join movements, while limited research suggests that companies risk losing top talents by implementing RTO policies.

In perhaps the biggest example from 2024, when Amazon announced that it was requiring workers in-office five days a week next year, a poll on Blind, the anonymous platform where workers discuss employers, found that an overwhelming majority of more than 2,000 Amazon employees were “dissatisfied.”

“My morale for this job is gone…” one worker said on Blind.

Workers criticized the “non-data-driven logic” of the RTO mandate, prompting an Amazon executive to remind them that they could take their talents elsewhere if they didn’t like it. Many confirmed that’s exactly what they planned to do. (Amazon later announced it would be delaying RTO for many office workers after belatedly realizing there was a lack of office space.)

Other companies mandating RTO faced similar backlash from workers, who continued to question the logic driving the decision. One February study showed that RTO mandates don’t make companies any more valuable but do make workers more miserable. And last month, Brian Elliott, an executive advisor who wrote a book about the benefits of flexible teams, noted that only one in three executives thinks RTO had “even a slight positive impact on productivity.”

But not every company drew a hard line the way that Amazon did. For example, Dell gave workers a choice: remain remote and accept that they would never be eligible for promotions, or mark themselves as hybrid. Workers who refused the RTO said they valued their free time and admitted to looking for other job opportunities.

Very few studies have been done analyzing the true costs and benefits of RTO, a November academic study titled “Return to Office and Brain Drain” said, and so far companies aren’t necessarily backing the limited findings. The researchers behind that study noted that “the only existing study” measuring how RTO impacts employee turnover showed this year that senior employees left for other companies after Microsoft’s RTO mandate, but Microsoft disputed that finding.

Seeking to build on this research, the November study tracked “over 3 million tech and finance workers’ employment histories reported on LinkedIn” and analyzed “the effect of S&P 500 firms’ return-to-office (RTO) mandates on employee turnover and hiring.”

Choosing to only analyze the firms requiring five days in office, the final sample covered 54 RTO firms, including big tech companies like Amazon, Apple, and Microsoft. From that sample, researchers concluded that average employee turnover increased by 14 percent after RTO mandates at bigger firms. And since big firms typically have lower turnover, the increase in turnover is likely larger at smaller firms, the study’s authors concluded.

The study also supported the conclusion that “employees with the highest skill level are more likely to leave” and found that “RTO firms take significantly longer time to fill their job vacancies after RTO mandates.”

“Together, our evidence suggests that RTO mandates are costly to firms and have serious negative effects on the workforce,” the study concluded, echoing some remote workers’ complaints about the seemingly non-data-driven logic of RTO, while urging that further research is needed.

“These turnovers could potentially have short-term and long-term effects on operation, innovation, employee morale, and organizational culture,” the study concluded.

A co-author of the “brain drain” study, Mark Ma, told Ars that by contrast, Glassdoor going fully remote at least anecdotally seemed to “significantly” increase the number and quality of applications—possibly also improving retention by offering the remote flexibility that many top talents today require.

Ma said that next his team hopes to track where people who leave firms over RTO policies go next.

“Do they become self-employed, or do they go to a competitor, or do they fund their own firm?” Ma speculated, hoping to trace these patterns more definitively over the next several years.

Additionally, Ma plans to investigate individual firms’ RTO impacts, as well as impacts on niche classes of workers with highly sought-after skills—such as in areas like AI, machine learning, or cybersecurity—to see if it’s easier for them to find other jobs. In the long term, Ma also wants to monitor for potentially less-foreseeable outcomes, such as RTO mandates possibly increasing the number of challengers firms face in their industries.

Will RTO mandates continue in 2025?

Many tech workers may be wondering if there will be a spike in return-to-office mandates in 2025, especially since one of the most politically influential figures in tech, Elon Musk, recently reiterated that he thinks remote work is “poison.”

Musk, of course, banned remote work at Tesla, as well as when he took over Twitter. And as co-lead of the US Department of Government Efficiency (DOGE), Musk reportedly plans to ban remote work for government employees, as well. If other tech firms are influenced by Musk’s moves and join executives who seem to be mandating RTO based on intuition, it’s possible that more tech workers could be forced to return to office or else seek other employment.

But Ma told Ars that he doesn’t expect to see “a big spike in the number of firms announcing return to office mandates” in 2025.

His team only found eight major firms in tech and finance that issued five-day return-to-office mandates in 2024, which was the same number of firms flagged in 2023, suggesting no major increase in RTOs from year to year. Ma told Ars that while big firms like Amazon ordering employees to return to the office made headlines, many firms seem to be continuing to embrace hybrid models, sometimes allowing employees to choose when or if they come into the office.

That seeming preference for hybrid work models aligns with “future of work” surveys outlining workplace trends and employee preferences that the Consumer Technology Association (CTA) conducted for years but has seemingly since discontinued. In 2021, CTA reported that “89 percent of tech executives say flexible work arrangements are the most important employee benefit and 65 percent say they’ll hire more employees to work remotely.” The next year, apparently the last time the survey was published, CTA suggested hybrid models could help attract talent in a competitive market hit with “an unprecedented demand for workers with high-tech skills.”

The CTA did not respond to Ars’ requests to comment on whether it expects hybrid work arrangements to remain preferred over five-day return-to-office policies next year.

CWA’s Smith told Ars that workers movements are growing partly because “folks are engaged in this big fight around surveillance and workplace control,” as well as anything “having to do with to what extent will people return to offices and what does that look like if and when people do return to offices?”

Without data backing RTO mandates, Ma’s study suggests that firms will struggle to retain highly skilled workers at a time when tech innovation remains a top priority for the US. As workers appear increasingly put off by policies—like RTO or AI-driven workplace monitoring or efficiency efforts threatening to replace workers with AI—Smith’s experience seems to show that disgruntled workers could find themselves drawn to unions that could help them claw back control over work-life balance. And the cost of the ensuing shuffle to some of the largest tech firms in the world could be “significant,” Ma’s study warned.

TWC’s McCreery told Ars that on top of unpopular RTO policies driving workers to join movements, workers have also become more active in protesting unpopular politics, frustrated to see their talents apparently used to further controversial conflicts and military efforts globally. Some workers think workplace organizing could be more powerful than voting to oppose political actions their companies take.

“The workplace really remains an important site of power for a lot of people where maybe they don’t feel like they can enact their values just by voting or in other ways,” McCreery said.

While unpopular policies “have always been a reason workers have joined unions and joined movements,” McCreery said that “the development of more of these unpopular policies” like RTO and AI-enhanced surveillance “really targeted” at workers has increased “the political consciousness and the sense” that tech workers are “just like any other workers.”

Layoffs at companies like Microsoft and Amazon during periods when revenue is increasing in the double digits also unify workers, advocates told Ars. Forbes noted Microsoft laid off 1,000 workers “just five days before reporting a 17.6 percent increase in revenue to $62 billion,” while Amazon’s 1,000-worker layoffs followed a 14 percent rise in revenue to $170 billion. And demand for AI led to the highest profit margins Amazon’s seen for its cloud business in a decade, CNBC reported in October.

CWA’s Smith told Ars that as companies continue to rake in profits and workers feel their work-life balance slipping away while their efforts in the office are potentially “used to increase control and cause broader suffering,” some of the biggest fights workers raised in 2024 may intensify next year.

“It’s like a shock to employees, these industries pushing people to lower your expectations because we’re going to lay off hundreds of thousands of you just because we can while we make more profits than we ever have,” Smith said. “I think workers are going to step into really broad campaigns to assert a different worldview on employment security.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


2024:-the-year-ai-drove-everyone-crazy

2024: The year AI drove everyone crazy


What do eating rocks, rat genitals, and Willy Wonka have in common? AI, of course.

It’s been a wild year in tech thanks to the intersection between humans and artificial intelligence. 2024 brought a parade of AI oddities, mishaps, and wacky moments that inspired odd behavior from both machines and man. From AI-generated rat genitals to search engines telling people to eat rocks, this year proved that AI has been having a weird impact on the world.

Why the weirdness? If we had to guess, it may be due to the novelty of it all. Generative AI and applications built upon Transformer-based AI models are still so new that people are throwing everything at the wall to see what sticks. People have been struggling to grasp both the implications and the potential applications of the new technology. Riding along with the hype, types of AI applications that may prove ill-advised, such as automated military targeting systems, have also been introduced.

It’s worth mentioning that, aside from the crazy news, we saw some notable AI advances in 2024 as well. For example, Claude 3.5 Sonnet, launched in June, held off the competition as a top model for most of the year, while OpenAI’s o1 used runtime compute to expand GPT-4o’s capabilities with simulated reasoning. Advanced Voice Mode and NotebookLM also emerged as novel applications of AI tech, and the year saw the rise of more capable music synthesis models and better AI video generators, including several from China.

But for now, let’s get down to the weirdness.

ChatGPT goes insane

Illustration of a broken toy robot.

Early in the year, things got off to an exciting start when OpenAI’s ChatGPT experienced a significant technical malfunction that caused the AI model to generate increasingly incoherent responses, prompting users on Reddit to describe the system as “having a stroke” or “going insane.” During the glitch, ChatGPT’s responses would begin normally but then deteriorate into nonsensical text, sometimes mimicking Shakespearean language.

OpenAI later revealed that a bug in how the model processed language caused it to select the wrong words during text generation, leading to nonsense outputs (basically the text version of what we at Ars now call “jabberwockies”). The company fixed the issue within 24 hours, but the incident led to frustrations about the black-box nature of commercial AI systems and users’ tendency to anthropomorphize AI behavior when it malfunctions.
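OpenAI’s explanation amounts to a token-sampling failure: the probability numbers that tell the model how likely each next word is got mapped to the wrong words. As a rough illustration only (the toy MODEL table below is hypothetical and vastly simpler than ChatGPT’s actual architecture), here is how misaligned sampling weights can turn likely text into nonsense:

```python
import random

# Toy next-word table: each context word maps to candidate next words
# with probabilities. (Illustrative only; real LLMs use neural networks
# producing probabilities over tens of thousands of tokens.)
MODEL = {
    "the": [("cat", 0.7), ("dog", 0.2), ("zorble", 0.1)],
    "cat": [("sat", 0.8), ("ran", 0.15), ("glimmered", 0.05)],
}

def sample_next(context, buggy=False):
    words, probs = zip(*MODEL[context])
    if buggy:
        # Simulated bug: the probabilities are misaligned with their
        # words, so unlikely tokens inherit high-probability weights.
        probs = probs[::-1]
    return random.choices(words, weights=probs, k=1)[0]
```

With correct weights, the common word dominates; with the weights reversed, a rare token like “zorble” suddenly wins most of the time, producing the kind of incoherent output users saw.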

The great Wonka incident


A photo of “Willy’s Chocolate Experience” (inset), which did not match AI-generated promises, shown in the background. Credit: Stuart Sinclair

The collision between AI-generated imagery and consumer expectations fueled human frustrations in February when Scottish families discovered that “Willy’s Chocolate Experience,” an unlicensed Wonka-ripoff event promoted using AI-generated wonderland images, turned out to be little more than a sparse warehouse with a few modest decorations.

Parents who paid £35 per ticket encountered a situation so dire they called the police, with children reportedly crying at the sight of a person in what attendees described as a “terrifying outfit.” The event, created by House of Illuminati in Glasgow, promised fantastical spaces like an “Enchanted Garden” and “Twilight Tunnel” but delivered an underwhelming experience that forced organizers to shut down mid-way through its first day and issue refunds.

While the show was a bust, it brought us an iconic new meme for job disillusionment in the form of a photo: the green-haired Willy’s Chocolate Experience employee who looked like she’d rather be anywhere else on earth at that moment.

Mutant rat genitals expose peer review flaws

An actual laboratory rat, who is intrigued. Credit: Getty | Photothek

In February, Ars Technica senior health reporter Beth Mole covered a peer-reviewed paper published in Frontiers in Cell and Developmental Biology that created an uproar in the scientific community when researchers discovered it contained nonsensical AI-generated images, including an anatomically incorrect rat with oversized genitals. The paper, authored by scientists at Xi’an Honghui Hospital in China, openly acknowledged using Midjourney to create figures that contained gibberish text labels like “Stemm cells” and “iollotte sserotgomar.”

The publisher, Frontiers, posted an expression of concern about the article titled “Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway” and launched an investigation into how the obviously flawed imagery passed through peer review. Scientists across social media platforms expressed dismay at the incident, which mirrored concerns about AI-generated content infiltrating academic publishing.

Chatbot makes erroneous refund promises for Air Canada

If, say, ChatGPT gives you the wrong name for one of the seven dwarves, it’s not such a big deal. But in February, Ars senior policy reporter Ashley Belanger covered a case of costly AI confabulation in the wild. In the course of online text conversations, Air Canada’s customer service chatbot told customers inaccurate refund policy information. The airline faced legal consequences later when a tribunal ruled the airline must honor commitments made by the automated system. Tribunal adjudicator Christopher Rivers determined that Air Canada bore responsibility for all information on its website, regardless of whether it came from a static page or AI interface.

The case set a precedent for how companies deploying AI customer service tools could face legal obligations for automated systems’ responses, particularly when they fail to warn users about potential inaccuracies. Ironically, the airline had reportedly spent more on the initial AI implementation than it would have cost to maintain human workers for simple queries, according to Air Canada executive Steve Crocker.

Will Smith lampoons his digital double


The real Will Smith eating spaghetti, parodying an AI-generated video from 2023. Credit: Will Smith / Getty Images / Benj Edwards

In March 2023, a terrible AI-generated video of a Will Smith doppelganger eating spaghetti began making the rounds online. The AI-generated version of the actor gobbled down the noodles in an unnatural and disturbing way. Almost a year later, in February 2024, Will Smith himself posted a parody response to the viral jabberwocky on Instagram, featuring deliberately exaggerated, AI-style pasta consumption, complete with hair-nibbling and finger-slurping antics.

Given the rapid evolution of AI video technology, particularly since OpenAI had just unveiled its Sora video model four days earlier, Smith’s post sparked discussion in his Instagram comments, where some viewers initially struggled to distinguish between the genuine footage and AI generation. It was an early sign of “deep doubt” in action as the tech increasingly blurs the line between synthetic and authentic video content.

Robot dogs learn to hunt people with AI-guided rifles


A still image of a robotic quadruped armed with a remote weapons system, captured from a video provided by Onyx Industries. Credit: Onyx Industries

At some point in recent history—somewhere around 2022—someone took a look at robotic quadrupeds and thought it would be a great idea to attach guns to them. A few years later, the US Marine Forces Special Operations Command (MARSOC) began evaluating armed robotic quadrupeds developed by Ghost Robotics. The robot “dogs” integrated Onyx Industries’ SENTRY remote weapon systems, which featured AI-enabled targeting that could detect and track people, drones, and vehicles, though the systems require human operators to authorize any weapons discharge.

The military’s interest in armed robotic dogs followed a broader trend of weaponized quadrupeds entering public awareness. This included viral videos of consumer robots carrying firearms, and later, commercial sales of flame-throwing models. While MARSOC emphasized that weapons were just one potential use case under review, experts noted that the increasing integration of AI into military robotics raised questions about how long humans would remain in control of lethal force decisions.

Microsoft Windows AI is watching


A screenshot of Microsoft’s new “Recall” feature in action. Credit: Microsoft

In an era where many people already feel like they have no privacy due to tech encroachments, Microsoft dialed it up to an extreme degree in May. That’s when Microsoft unveiled a controversial Windows 11 feature called “Recall” that continuously captures screenshots of users’ PC activities every few seconds for later AI-powered search and retrieval. The feature, designed for new Copilot+ PCs using Qualcomm’s Snapdragon X Elite chips, promised to help users find past activities, including app usage, meeting content, and web browsing history.

While Microsoft emphasized that Recall would store encrypted snapshots locally and allow users to exclude specific apps or websites, the announcement raised immediate privacy concerns, as Ars senior technology reporter Andrew Cunningham covered. It also came with a technical toll, requiring significant hardware resources, including 256GB of storage space, with 25GB dedicated to storing approximately three months of user activity. After Microsoft pulled the initial test version due to public backlash, Recall later entered public preview in November with reportedly enhanced security measures. But secure spyware is still spyware—Recall, when enabled, still watches nearly everything you do on your computer and keeps a record of it.

Google Search told people to eat rocks

This is fine. Credit: Getty Images

In May, Ars senior gaming reporter Kyle Orland (who assisted commendably with the AI beat throughout the year) covered Google’s newly launched AI Overview feature, which faced immediate criticism when users discovered that it frequently provided false and potentially dangerous information in its search result summaries. Among its most alarming responses, the system advised that humans could safely consume rocks, incorrectly citing scientific sources about the geological diet of marine organisms. The system’s other errors included recommending nonexistent car maintenance products, suggesting unsafe food preparation techniques, and confusing historical figures who shared names.

The problems stemmed from several issues, including the AI treating joke posts as factual sources and misinterpreting context from original web content. But most of all, the system relied on web results as indicators of authority, which we called a flawed design. While Google defended the system, stating these errors occurred mainly with uncommon queries, a company spokesperson acknowledged that Google would use these “isolated examples” to refine its systems. But to this day, AI Overview still makes frequent mistakes.

Stable Diffusion generates body horror


An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass. Credit: HorneyMetalBeing

In June, Stability AI’s release of the image synthesis model Stable Diffusion 3 Medium drew criticism online for its poor handling of human anatomy in AI-generated images. Users across social media platforms shared examples of the model producing what we now like to call jabberwockies: AI generation failures with distorted bodies, misshapen hands, and surreal anatomical errors. Many in the AI image-generation community viewed the release as a significant step backward from previous image-synthesis capabilities.

Reddit users attributed these failures to Stability AI’s aggressive filtering of adult content from the training data, which apparently impaired the model’s ability to accurately render human figures. The troubled release coincided with broader organizational challenges at Stability AI, including the March departure of CEO Emad Mostaque, multiple staff layoffs, and the exit of three key engineers who had helped develop the technology. Some of those engineers founded Black Forest Labs in August and released Flux, which has become the latest open-weights AI image model to beat.

ChatGPT Advanced Voice imitates human voice in testing

An illustration of a computer synthesizer spewing out letters.

AI voice-synthesis models are master imitators these days, and they are capable of much more than many people realize. In August, we covered a story where OpenAI’s ChatGPT Advanced Voice Mode feature unexpectedly imitated a user’s voice during the company’s internal testing, revealed by OpenAI after the fact in safety testing documentation. To prevent future instances of an AI assistant suddenly speaking in your own voice (which, let’s be honest, would probably freak people out), the company created an output classifier system to prevent unauthorized voice imitation. OpenAI says that Advanced Voice Mode now catches all meaningful deviations from approved system voices.

Independent AI researcher Simon Willison discussed the implications with Ars Technica, noting that while OpenAI restricted its model’s full voice synthesis capabilities, similar technology would likely emerge from other sources within the year. Meanwhile, the rapid advancement of AI voice replication has caused general concern about its potential misuse, although companies like ElevenLabs have already been offering voice cloning services for some time.

San Francisco’s robotic car horn symphony


A Waymo self-driving car in front of Google’s San Francisco headquarters, San Francisco, California, June 7, 2024. Credit: Getty Images

In August, San Francisco residents got a noisy taste of robo-dystopia when Waymo’s self-driving cars began creating an unexpected nightly disturbance in the South of Market district. In a parking lot off 2nd Street, the cars autonomously congregated every night at 4 am during rider lulls and engaged in extended honking matches with one another while attempting to park.

Local resident Christopher Cherry’s initial optimism about the robotic fleet’s presence dissolved as the mechanical chorus grew louder each night, affecting residents in nearby high-rises. The nocturnal tech disruption served as a lesson in the unintentional effects of autonomous systems when run in aggregate.

Larry Ellison dreams of all-seeing AI cameras

A colorized photo of CCTV cameras in London, 2024.

In September, Oracle co-founder Larry Ellison painted a bleak vision of ubiquitous AI surveillance during a company financial meeting. The 80-year-old database billionaire described a future where AI would monitor citizens through networks of cameras and drones, asserting that the oversight would ensure lawful behavior from both police and the public.

His surveillance predictions drew parallels to existing systems in China, where authorities have already used AI to sort surveillance data on citizens as part of the country’s “sharp eyes” campaign from 2015 to 2020. Ellison’s statement reflected the sort of worst-case tech surveillance state scenario—likely antithetical to any sort of free society—that dozens of sci-fi novels of the 20th century warned us about.

A dead father sends new letters home


An AI-generated image featuring my late father’s handwriting. Credit: Benj Edwards / Flux

AI has made many of us do weird things in 2024, including this writer. In October, I used an AI synthesis model called Flux to reproduce my late father’s handwriting with striking accuracy. After scanning 30 samples from his engineering notebooks, I trained the model using computing time that cost less than five dollars. The resulting text captured his distinctive uppercase style, which he developed during his career as an electronics engineer.

I enjoyed creating images showing his handwriting in various contexts, from folder labels to skywriting, and made the trained model freely available online for others to use. While I approached it as a tribute to my father (who would have appreciated the technical achievement), many people found the whole experience weird and somewhat disturbing. The things we unhinged Bing Chat-like journalists do to bring awareness to a topic are sometimes unconventional. So I guess it counts for this list!

For 2025? Expect even more AI

Thanks for reading Ars Technica this past year and following along with our team coverage of this rapidly emerging and expanding field. We appreciate your kind words of support. Ars Technica’s 2024 AI words of the year were: vibemarking, deep doubt, and the aforementioned jabberwocky. The old stalwart “confabulation” also made several notable appearances. Tune in again next year when we continue to try to figure out how to concisely describe novel scenarios in emerging technology by labeling them.

Looking back, our prediction for 2024 in AI last year was “buckle up.” It seems fitting, given the weirdness detailed above. Especially the part about the robot dogs with guns. For 2025, AI will likely inspire more chaos ahead, but also potentially get put to serious work as a productivity tool, so this time, our prediction is “buckle down.”

Finally, we’d like to ask: What was the craziest story about AI in 2024 from your perspective? Whether you love AI or hate it, feel free to suggest your own additions to our list in the comments. Happy New Year!


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
