Pink-haired Aitana Lopez is followed by more than 200,000 people on social media. She posts selfies from concerts and her bedroom, while tagging brands such as hair care line Olaplex and lingerie giant Victoria’s Secret.
Brands have paid about $1,000 a post for her to promote their products on social media—despite the fact that she is entirely fictional.
Aitana is a “virtual influencer” created using artificial intelligence tools, one of the hundreds of digital avatars that have broken into the growing $21 billion content creator economy.
Their emergence has led to worry from human influencers their income is being cannibalized and under threat from digital rivals. That concern is shared by people in more established professions that their livelihoods are under threat from generative AI—technology that can spew out humanlike text, images and code in seconds.
But those behind the hyper-realistic AI creations argue they are merely disrupting an overinflated market.
“We were taken aback by the skyrocketing rates influencers charge nowadays. That got us thinking, ‘What if we just create our own influencer?’” said Diana Núñez, co-founder of the Barcelona-based agency The Clueless, which created Aitana. “The rest is history. We unintentionally created a monster. A beautiful one, though.”
Over the past few years, there have been high-profile partnerships between luxury brands and virtual influencers, including Kim Kardashian’s make-up line KKW Beauty with Noonoouri, and Louis Vuitton with Ayayi.
Instagram analysis of an H&M advert featuring virtual influencer Kuki found that it reached 11 times more people and resulted in a 91 percent decrease in cost per person remembering the advert, compared with a traditional ad.
In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, eight months after the company was reportedly considering suing, the suit has now been filed.
The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses it to power its Copilot service and helped provide the infrastructure for training the GPT Large Language Model. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times’ paywall and ascribe hallucinated misinformation to the Times.
Journalism is expensive
The suit notes that The Times maintains a large staff that allows it to do things like dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.
All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times’ terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. In addition to driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.
The suit alleges that OpenAI-developed tools undermine all of that. “By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue,” the suit alleges.
Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called “Common Crawl,” which the suit alleges contains information from 16 million unique records from sites published by The Times. That places the Times as the third most referenced source, behind Wikipedia and a database of US patents.
OpenAI no longer discloses as many details of the data used for training of recent GPT versions, but all indications are that full-text NY Times articles are still part of that process (Much more on that in a moment.) Expect access to training information to be a major issue during discovery if this case moves forward.
Not just training
A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times’ suit goes well beyond that to show how the material ingested during training can come back out during use. “Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples,” the suit alleges.
The suit alleges—and we were able to verify—that it’s comically easy to get GPT-powered systems to offer up content that is normally protected by the Times’ paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.
The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.
ChatGPT has apparently closed that loophole in between the preparation of that suit and the present. We entered some of the prompts shown in the suit, and were advised “I recommend checking The New York Times website or other reputable sources,” although we can’t rule out that context provided prior to that prompt could produce copyrighted material.
But not all loopholes have been closed. The suit also shows output from Bing Chat, since rebranded as Copilot. We were able to verify that asking for the first paragraph of a specific article at The Times caused Copilot to reproduce the first third of the article.
The suit is dismissive of attempts to justify this as a form of fair use. “Publicly, Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose,” the suit notes. “But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
Reputational and other damages
The hallucinations common to AI also came under fire in the suit for potentially damaging the value of the Times’ reputation, and possibly damaging human health as a side effect. “A GPT model completely fabricated that “The New York Times published an article on January 10, 2020, titled ‘Study Finds Possible Link between Orange Juice and Non-Hodgkin’s Lymphoma,’” the suit alleges. “The Times never published such an article.”
Similarly, asking about a Times article on heart-healthy foods allegedly resulted in Copilot saying it contained a list of examples (which it didn’t). When asked for the list, 80 percent of the foods on weren’t even mentioned by the original article. In another case, recommendations were ascribed to the Wirecutter when the products hadn’t even been reviewed by its staff.
As with the Times material, it’s alleged that it’s possible to get Copilot to offer up large chunks of Wirecutter articles (The Wirecutter is owned by The New York Times). But the suit notes that these article excerpts have the affiliate links stripped out of them, keeping the Wirecutter from its primary source of revenue.
The suit targets various OpenAI companies for developing the software, as well as Microsoft—the latter for both offering OpenAI-powered services, and for having developed the computing systems that enabled the copyrighted material to be ingested during training. Allegations include direct, contributory, and vicarious copyright infringement, as well as DMCA and trademark violations. Finally, it alleges “Common Law Unfair Competition By Misappropriation.”
The suit seeks nothing less than the erasure of both any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: “statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity.”
Big tech companies have vastly outspent venture capital groups with investments in generative AI startups this year, as established giants use their financial muscle to dominate the much-hyped sector.
Microsoft, Google and Amazon last year struck a series of blockbuster deals, amounting to two-thirds of the $27 billion raised by fledgling AI companies in 2023, according to new data from private market researchers PitchBook.
The huge outlay, which exploded after the launch of OpenAI’s ChatGPT in November 2022, highlights how the biggest Silicon Valley groups are crowding out traditional tech investors for the biggest deals in the industry.
The rise of generative AI—systems capable of producing humanlike video, text, image and audio in seconds—have also attracted top Silicon Valley investors. But VCs have been outmatched, having been forced to slow down their spending as they adjust to higher interest rates and falling valuations for their portfolio companies.
“Over the past year, we’ve seen the market quickly consolidate around a handful of foundation models, with large tech players coming in and pouring billions of dollars into companies like OpenAI, Cohere, Anthropic and Mistral,” said Nina Achadjian, a partner at US venture firm Index Ventures referring to some of the top AI startups.
“For traditional VCs, you had to be in early and you had to have conviction—which meant being in the know on the latest AI research and knowing which teams were spinning out of Google DeepMind, Meta and others,” she added.
A string of deals, such as Microsoft’s $10 billion investment in OpenAI as well as billions of dollars raised by San Francisco-based Anthropic from both Google and Amazon, helped push overall spending on AI groups to nearly three times as much as the previous record of $11 billion set two years ago.
Venture investing in tech hit record levels in 2021, as investors took advantage of ultra-low interest rates to raise and deploy vast sums across a range of industries, particularly those most disrupted by Covid-19.
Microsoft has also committed $1.3 billion to Inflection, another generative AI start-up, as it looks to steal a march on rivals such as Google and Amazon.
Building and training generative AI tools is an intensive process, requiring immense computing power and cash. As a result, start-ups have preferred to partner with Big Tech companies which can provide cloud infrastructure and access to the most powerful chips as well as dollars.
That has rapidly pushed up the valuations of private start-ups in the space, making it harder for VCs to bet on the companies at the forefront of the technology. An employee stock sale at OpenAI is seeking to value the company at $86 billion, almost treble the valuation it received earlier this year.
“Even the world’s top venture investors, with tens of billions under management, can’t compete to keep these AI companies independent and create new challengers that unseat the Big Tech incumbents,” said Patrick Murphy, founding partner at Tapestry VC, an early-stage venture capital firm.
“In this AI platform shift, most of the potentially one-in-a-million companies to appear so far have been captured by the Big Tech incumbents already.”
VCs are not absent from the market, however. Thrive Capital, Josh Kushner’s New York-based firm, is the lead investor in OpenAI’s employee stock sale, having already backed the company earlier this year. Thrive has continued to invest throughout a downturn in venture spending in 2023.
Paris-based Mistral raised around $500 million from investors including venture firms Andreessen Horowitz and General Catalyst, and chipmaker Nvidia since it was founded in May this year.
Some VCs are seeking to invest in companies building applications that are being built over so-called “foundation models” developed by OpenAI and Anthropic, in much the same way apps began being developed on mobile devices in the years after smartphones were introduced.
“There is this myth that only the foundation model companies matter,” said Sarah Guo, founder of AI-focused venture firm Conviction. “There is a huge space of still-unexplored application domains for AI, and a lot of the most valuable AI companies will be fundamentally new.”
US president Joe Biden’s plan for containing the dangers of artificial intelligencealready risks being derailed by congressional bean counters.
A White House executive order on AI announced in October calls on the US to develop new standards for stress-testing AI systems to uncover their biases, hidden threats, and rogue tendencies. But the agency tasked with setting these standards, the National Institute of Standards and Technology (NIST), lacks the budget needed to complete that work independently by the July 26, 2024, deadline, according to several people with knowledge of the work.
Speaking at the NeurIPS AI conference in New Orleans last week, Elham Tabassi, associate director for emerging technologies at NIST, described this as “an almost impossible deadline” for the agency.
Some members of Congress have grown concerned that NIST will be forced to rely heavily on AI expertise from private companies that, due to their own AI projects, have a vested interest in shaping standards.
The US government has already tapped NIST to help regulate AI. In January 2023 the agency released an AI risk management framework to guide business and government. NIST has also devised ways to measure public trust in new AI tools. But the agency, which standardizes everything from food ingredients to radioactive materials and atomic clocks, has puny resources compared to those of the companies on the forefront of AI. OpenAI, Google, and Meta each likely spent upwards of $100 million to train the powerful language models that undergird applications such as ChatGPT, Bard, and Llama 2.
NIST’s budget for 2023 was $1.6 billion, and the White House has requested that it be increased by 29 percent in 2024 for initiatives not directly related to AI. Several sources familiar with the situation at NIST say that the agency’s current budget will not stretch to figuring out AI safety testing on its own.
On December 16, the same day Tabassi spoke at NeurIPS, six members of Congress signed a bipartisan open letter raising concern about the prospect of NIST enlisting private companies with little transparency. “We have learned that NIST intends to make grants or awards to outside organizations for extramural research,” they wrote. The letter warns that there does not appear to be any publicly available information about how those awards will be decided.
The lawmakers’ letter also claims that NIST is being rushed to define standards even though research into testing AI systems is at an early stage. As a result there is “significant disagreement” among AI experts over how to work on or even measure and define safety issues with the technology, it states. “The current state of the AI safety research field creates challenges for NIST as it navigates its leadership role on the issue,” the letter claims.
NIST spokesperson Jennifer Huergo confirmed that the agency had received the letter and said that it “will respond through the appropriate channels.”
NIST is making some moves that would increase transparency, including issuing a request for information on December 19, soliciting input from outside experts and companies on standards for evaluating and red-teaming AI models. It is unclear if this was a response to the letter sent by the members of Congress.
The concerns raised by lawmakers are shared by some AI experts who have spent years developing ways to probe AI systems. “As a nonpartisan scientific body, NIST is the best hope to cut through the hype and speculation around AI risk,” says Rumman Chowdhury, a data scientist and CEO of Parity Consultingwho specializes in testing AI models for bias and other problems. “But in order to do their job well, they need more than mandates and well wishes.”
Yacine Jernite, machine learning and society lead at Hugging Face, a company that supports open source AI projects, says big tech has far more resources than the agency given a key role in implementing the White House’s ambitious AI plan. “NIST has done amazing work on helping manage the risks of AI, but the pressure to come up with immediate solutions for long-term problems makes their mission extremely difficult,” Jernite says. “They have significantly fewer resources than the companies developing the most visible AI systems.”
Margaret Mitchell, chief ethics scientist at Hugging Face, says the growing secrecy around commercial AI models makes measurement more challenging for an organization like NIST. “We can’t improve what we can’t measure,” she says.
The White House executive order calls for NIST to perform several tasks, including establishing a new Artificial Intelligence Safety Institute to support the development of safe AI. In April, a UK taskforce focused on AI safety was announced. It will receive $126 million in seed funding.
The executive order gave NIST an aggressive deadline for coming up with, among other things, guidelines for evaluating AI models, principles for “red-teaming” (adversarially testing) models, developing a plan to get US-allied nations to agree to NIST standards, and coming up with a plan for “advancing responsible global technical standards for AI development.”
Although it isn’t clear how NIST is engaging with big tech companies, discussions on NIST’s risk management framework, which took place prior to the announcement of the executive order, involved Microsoft; Anthropic, a startup formed by ex-OpenAI employees that is building cutting-edge AI models; Partnership on AI, which represents big tech companies; and the Future of Life Institute, a nonprofit dedicated to existential risk, among others.
“As a quantitative social scientist, I’m both loving and hating that people realize that the power is in measurement,” Chowdhury says.
Apple’s latest research about running large language models on smartphones offers the clearest signal yet that the iPhone maker plans to catch up with its Silicon Valley rivals in generative artificial intelligence.
The paper, entitled “LLM in a Flash,” offers a “solution to a current computational bottleneck,” its researchers write.
Its approach “paves the way for effective inference of LLMs on devices with limited memory,” they said. Inference refers to how large language models, the large data repositories that power apps like ChatGPT, respond to users’ queries. Chatbots and LLMs normally run in vast data centers with much greater computing power than an iPhone.
The paper was published on December 12 but caught wider attention after Hugging Face, a popular site for AI researchers to showcase their work, highlighted it late on Wednesday. It is the second Apple paper on generative AI this month and follows earlier moves to enable image-generating models such as Stable Diffusion to run on its custom chips.
Device manufacturers and chipmakers are hoping that new AI features will help revive the smartphone market, which has had its worst year in a decade, with shipments falling an estimated 5 percent, according to Counterpoint Research.
Despite launching one of the first virtual assistants, Siri, back in 2011, Apple has been largely left out of the wave of excitement about generative AI that has swept through Silicon Valley in the year since OpenAI launched its breakthrough chatbot ChatGPT. Apple has been viewed by many in the AI community as lagging behind its Big Tech rivals, despite hiring Google’s top AI executive, John Giannandrea, in 2018.
While Microsoft and Google have largely focused on delivering chatbots and other generative AI services over the Internet from their vast cloud computing platforms, Apple’s research suggests that it will instead focus on AI that can run directly on an iPhone.
Apple’s rivals, such as Samsung, are gearing up to launch a new kind of “AI smartphone” next year. Counterpoint estimated more than 100 million AI-focused smartphones would be shipped in 2024, with 40 percent of new devices offering such capabilities by 2027.
The head of the world’s largest mobile chipmaker, Qualcomm chief executive Cristiano Amon, forecast that bringing AI to smartphones would create a whole new experience for consumers and reverse declining mobile sales.
“You’re going to see devices launch in early 2024 with a number of generative AI use cases,” he told the Financial Times in a recent interview. “As those things get scaled up, they start to make a meaningful change in the user experience and enable new innovation which has the potential to create a new upgrade cycle in smartphones.”
More sophisticated virtual assistants will be able to anticipate users’ actions such as texting or scheduling a meeting, he said, while devices will also be capable of new kinds of photo editing techniques.
Google this month unveiled a version of its new Gemini LLM that will run “natively” on its Pixel smartphones.
Running the kind of large AI model that powers ChatGPT or Google’s Bard on a personal device brings formidable technical challenges, because smartphones lack the huge computing resources and energy available in a data center. Solving this problem could mean that AI assistants respond more quickly than they do from the cloud and even work offline.
Ensuring that queries are answered on an individual’s own device without sending data to the cloud is also likely to bring privacy benefits, a key differentiator for Apple in recent years.
“Our experiment is designed to optimize inference efficiency on personal devices,” its researchers said. Apple tested its approach on models including Falcon 7B, a smaller version of an open source LLM originally developed by the Technology Innovation Institute in Abu Dhabi.
Optimizing LLMs to run on battery-powered devices has been a growing focus for AI researchers. Academic papers are not a direct indicator of how Apple intends to add new features to its products, but they offer a rare glimpse into its secretive research labs and the company’s latest technical breakthroughs.
“Our work not only provides a solution to a current computational bottleneck but also sets a precedent for future research,” wrote Apple’s researchers in the conclusion to their paper. “We believe as LLMs continue to grow in size and complexity, approaches like this work will be essential for harnessing their full potential in a wide range of devices and applications.”
Apple did not immediately respond to a request for comment.
More than 1,000 known child sexual abuse materials (CSAM) were found in a large open dataset—known as LAION-5B—that was used to train popular text-to-image generators such as Stable Diffusion, Stanford Internet Observatory (SIO) researcher David Thiel revealed on Wednesday.
SIO’s report seems to confirm rumors swirling on the Internet since 2022 that LAION-5B included illegal images, Bloomberg reported. In an email to Ars, Thiel warned that “the inclusion of child abuse material in AI model training data teaches tools to associate children in illicit sexual activity and uses known child abuse images to generate new, potentially realistic child abuse content.”
Thiel began his research in September after discovering in June that AI image generators were being used to create thousands of fake but realistic AI child sex images rapidly spreading on the dark web. His goal was to find out what role CSAM may play in the training process of AI models powering the image generators spouting this illicit content.
“Our new investigation reveals that these models are trained directly on CSAM present in a public dataset of billions of images, known as LAION-5B,” Thiel’s report said. “The dataset included known CSAM scraped from a wide array of sources, including mainstream social media websites”—like Reddit, X, WordPress, and Blogspot—as well as “popular adult video sites”—like XHamster and XVideos.
Shortly after Thiel’s report was published, a spokesperson for LAION, the Germany-based nonprofit that produced the dataset, told Bloomberg that LAION “was temporarily removing LAION datasets from the Internet” due to LAION’s “zero tolerance policy” for illegal content. The datasets will be republished once LAION ensures “they are safe,” the spokesperson said. A spokesperson for Hugging Face, which hosts a link to a LAION dataset that’s currently unavailable, confirmed to Ars that the dataset is now unavailable to the public after being switched to private by the uploader.
Removing the datasets now doesn’t fix any lingering issues with previously downloaded datasets or previously trained models, though, like Stable Diffusion 1.5. Thiel’s report said that Stability AI’s subsequent versions of Stable Diffusion—2.0 and 2.1—filtered out some or most of the content deemed “unsafe,” “making it difficult to generate explicit content.” But because users were dissatisfied by these later, more filtered versions, Stable Diffusion 1.5 remains “the most popular model for generating explicit imagery,” Thiel’s report said.
A spokesperson for Stability AI told Ars that Stability AI is “committed to preventing the misuse of AI and prohibit the use of our image models and services for unlawful activity, including attempts to edit or create CSAM.” The spokesperson pointed out that SIO’s report “focuses on the LAION-5B dataset as a whole,” whereas “Stability AI models were trained on a filtered subset of that dataset” and were “subsequently fine-tuned” to “mitigate residual behaviors.” The implication seems to be that Stability AI’s filtered dataset is not as problematic as the larger dataset.
Stability AI’s spokesperson also noted that Stable Diffusion 1.5 “was released by Runway ML, not Stability AI.” There seems to be some confusion on that point, though, as a Runway ML spokesperson told Ars that Stable Diffusion “was released in collaboration with Stability AI.”
A demo of Stable Diffusion 1.5 noted that the model was “supported by Stability AI” but released by CompVis and Runway. While a YCombinator thread linking to a blog—titled “Why we chose not to release Stable Diffusion 1.5 as quickly”—from Stability AI’s former chief information officer, Daniel Jeffries, may have provided some clarity on this, it has since been deleted.
Runway ML’s spokesperson declined to comment on any updates being considered for Stable Diffusion 1.5 but linked Ars to a Stability AI blog from August 2022 that said, “Stability AI co-released Stable Diffusion alongside talented researchers from” Runway ML.
Stability AI’s spokesperson said that Stability AI does not host Stable Diffusion 1.5 but has taken other steps to reduce harmful outputs. Those include only hosting “versions of Stable Diffusion that include filters” that “remove unsafe content” and “prevent the model from generating unsafe content.”
“Additionally, we have implemented filters to intercept unsafe prompts or unsafe outputs when users interact with models on our platform,” Stability AI’s spokesperson said. “We have also invested in content labelling features to help identify images generated on our platform. These layers of mitigation make it harder for bad actors to misuse AI.”
Beyond verifying 1,008 instances of CSAM in the LAION-5B dataset, SIO found 3,226 instances of suspected CSAM in the LAION dataset. Thiel’s report warned that both figures are “inherently a significant undercount” due to researchers’ limited ability to detect and flag all the CSAM in the datasets. His report also predicted that “the repercussions of Stable Diffusion 1.5’s training process will be with us for some time to come.”
“The most obvious solution is for the bulk of those in possession of LAION‐5B‐derived training sets to delete them or work with intermediaries to clean the material,” SIO’s report said. “Models based on Stable Diffusion 1.5 that have not had safety measures applied to them should be deprecated and distribution ceased where feasible.”
“Here, There, and Everywhere” isn’t just a Beatles song. It’s also a phrase that recalls the spread of generative AI into the tech industry during 2023. Whether you think AI is just a fad or the dawn of a new tech revolution, it’s been impossible to deny that AI news has dominated the tech space for the past year.
We’ve seen a large cast of AI-related characters emerge that includes tech CEOs, machine learning researchers, and AI ethicists—as well as charlatans and doomsayers. From public feedback on the subject of AI, we’ve heard that it’s been difficult for non-technical people to know who to believe, what AI products (if any) to use, and whether we should fear for our lives or our jobs.
Meanwhile, in keeping with a much-lamented trend of 2022, machine learning research has not slowed down over the past year. On X, former Biden administration tech advisor Suresh Venkatasubramanianwrote, “How do people manage to keep track of ML papers? This is not a request for support in my current state of bewilderment—I’m genuinely asking what strategies seem to work to read (or “read”) what appear to be 100s of papers per day.”
To wrap up the year with a tidy bow, here’s a look back at the 10 biggest AI news stories of 2023. It was very hard to choose only 10 (in fact, we originally only intended to do seven), but since we’re not ChatGPT generating reams of text without limit, we have to stop somewhere.
Bing Chat “loses its mind”
In February, Microsoft unveiled Bing Chat, a chatbot built into its languishing Bing search engine website. Microsoft created the chatbot using a more raw form of OpenAI’s GPT-4 language model but didn’t tell everyone it was GPT-4 at first. Since Microsoft used a less conditioned version of GPT-4 than the one that would be released in March, the launch was rough. The chatbot assumed a temperamental personality that could easily turn on users and attack them, tell people it was in love with them, seemingly worry about its fate, and lose its cool when confronted with an article we wrote about revealing its system prompt.
Aside from the relatively raw nature of the AI model Microsoft was using, at fault was a system where very long conversations would push the conditioning system prompt outside of its context window (like a form of short-term memory), allowing all hell to break loose through jailbreaks that people documented on Reddit. At one point, Bing Chat called me “the culprit and the enemy” for revealing some of its weaknesses. Some people thought Bing Chat was sentient, despite AI experts’ assurances to the contrary. It was a disaster in the press, but Microsoft didn’t flinch, and it ultimately reigned in some of Bing Chat’s wild proclivities and opened the bot widely to the public. Today, Bing Chat is now known as Microsoft Copilot, and it’s baked into Windows.
US Copyright Office says no to AI copyright authors
In February, the US Copyright Office issued a key ruling on AI-generated art, revoking the copyright previously granted to the AI-assisted comic book “Zarya of the Dawn” in September 2022. The decision, influenced by the revelation that the images were created using the AI-powered Midjourney image generator, stated that only the text and arrangement of images and text by Kashtanova were eligible for copyright protection. It was the first hint that AI-generated imagery without human-authored elements could not be copyrighted in the United States.
This stance was further cemented in August when a US federal judge ruled that art created solely by AI cannot be copyrighted. In September, the US Copyright Office rejected the registration for an AI-generated image that won a Colorado State Fair art contest in 2022. As it stands now, it appears that purely AI-generated art (without substantial human authorship) is in the public domain in the United States. This stance could be further clarified or changed in the future by judicial rulings or legislation.
But even with all that background, startup Channel 1‘s vision of a near-future where AI-generated avatars read you the news was a bit of a shock to the system. The company’s recent proof-of-concept “showcase” newscast reveals just how far AI-generated videos of humans have come in a short time and how those realistic avatars could shake up a lot more than just the job market for talking heads.
“…the newscasters have been changed to protect the innocent”
See the highest quality AI footage in the world.
🤯 – Our generated anchors deliver stories that are informative, heartfelt and entertaining.
To be clear, Channel 1 isn’t trying to fool people with “deepfakes” of existing news anchors or anything like that. In the first few seconds of its sample newscast, it identifies its talking heads as a “team of AI-generated reporters.” A few seconds later, one of those talking heads explains further: “You can hear us and see our lips moving, but no one was recorded saying what we’re all saying. I’m powered by sophisticated systems behind the scenes.”
Even with those kinds of warnings, I found I had to constantly remind myself that the “people” I was watching deliver the news here were only “based on real people who have been compensated for use of their likeness,” as Deadline reports (how much they were compensated will probably be of great concern to actors who recently went on strike in part over the issue of AI likenesses). Everything from the lip-syncing to the intonations to subtle gestures and body movements of these Channel 1 anchors gives an eerily convincing presentation of a real newscaster talking into the camera.
Sure, if you look closely, there are a few telltale anomalies that expose these reporters as computer creations—slight video distortions around the mouth, say, or overly repetitive hand gestures, or a nonsensical word emphasis choice. But those signs are so small that they would be easy to miss at a casual glance or on a small screen like that on a phone.
In other words, human-looking AI avatars now seem well on their way to climbing out of the uncanny valley, at least when it comes to news anchors who sit at a desk or stand still in front of a green screen. Channel 1 investor Adam Mosam told Deadline it “has gotten to a place where it’s comfortable to watch,” and I have to say I agree.
The same technology can be applied to on-the-scene news videos as well. About eight minutes into the sample newscast, Channel 1 shows a video of a European tropical storm victim describing the wreckage in French. Then it shows an AI-generated version of the same footage with the source speaking perfect English, using a facsimile of his original voice and artificial lipsync placed over his mouth.
Without the on-screen warning that this was “AI generated Language: Translated from French,” it would be easy to believe that the video was of an American expatriate rather than a native French speaker. And the effect is much more dramatic than the usual TV news practice of having an unseen interpreter speak over the footage.
If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning—our ability to apply logic and think rationally before making decisions? How could we even identify whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.
“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.
They suggest the standard methods of evaluating a machine’s intelligence, such as the Turing Test, can only tell you if the machine is good at processing information and mimicking human responses. The current generations of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, for example, have come close to passing the Turing Test, yet the test results don’t imply these programs can think and reason like humans.
This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing Test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?” the study authors argue.
What’s wrong with the Turing Test?
During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they are texting with a human or a chatbot. If the AI programs are successful in generating human-like responses—to the extent that evaluators struggle to distinguish between the human and the AI program—the AI is considered to have passed. However, since the Turing Test is based on subjective interpretation, these results are also subjective.
The researchers suggest that there are several limitations associated with the Turing Test. For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So, the test clearly doesn’t evaluate a machine’s reasoning and logical ability.
The results of the Turing Test also can’t tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI as well, according to a study from Stanford University which suggests that machines that could self-reflect are more practical for human use.
“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.
In addition to this, the Turing Test fails to analyze an AI program’s ability to think. In a recent Turing Test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?
Alan Turing, the famous British scientist who created the Turing Test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test only covers one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that a machine can never achieve true human intelligence without including those other aspects.
“It’s unclear whether passing the Turing Test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an AI expert and founder of DeepAI, toldBloomberg.
Humana, one the nation’s largest health insurance providers, is allegedly using an artificial intelligence model with a 90 percent error rate to override doctors’ medical judgment and wrongfully deny care to elderly people on the company’s Medicare Advantage plans.
According to a lawsuit filed Tuesday, Humana’s use of the AI model constitutes a “fraudulent scheme” that leaves elderly beneficiaries with either overwhelming medical debt or without needed care that is covered by their plans. Meanwhile, the insurance behemoth reaps a “financial windfall.”
The lawsuit, filed in the US District Court in western Kentucky, is led by two people who had a Humana Medicare Advantage Plan policy and said they were wrongfully denied needed and covered care, harming their health and finances. The suit seeks class-action status for an unknown number of other beneficiaries nationwide who may be in similar situations. Humana provides Medicare Advantage plans for 5.1 million people in the US.
It is the second lawsuit aimed at an insurer’s use of the AI tool nH Predict, which was developed by NaviHealth to forecast how long patients will need care after a medical injury, illness, or event. In November, the estates of two deceased individuals brought a suit against UnitedHealth—the largest health insurance company in the US—for also allegedly using nH Predict to wrongfully deny care.
Humana did not respond to Ars’ request for comment for this story. United Health previously said that “the lawsuit has no merit, and we will defend ourselves vigorously.”
AI model
In both cases, the plaintiffs claim that the insurers use the flawed model to pinpoint the exact date to blindly and illegally cut off payments for post-acute care that is covered under Medicare plans—such as stays in skilled nursing facilities and inpatient rehabilitation centers. The AI-powered model comes up with those dates by comparing a patient’s diagnosis, age, living situation, and physical function to similar patients in a database of 6 million patients. In turn, the model spits out a prediction for the patient’s medical needs, length of stay, and discharge date.
But, the plaintiffs argue that the model fails to account for the entirety of each patient’s circumstances, their doctors’ recommendations, and the patient’s actual conditions. And they claim the predictions are draconian and inflexible. For example, under Medicare Advantage plans, patients who have a three-day hospital stay are typically entitled to up to 100 days of covered care in a nursing home. But with nH Predict in use, patients rarely stay in a nursing home for more than 14 days before claim denials begin.
Though few people appeal coverage denials generally, of those who have appealed the AI-based denials, over 90 percent have gotten the denial reversed, the lawsuits say.
Still, the insurers continue to use the model and NaviHealth employees are instructed to hew closely to the AI-based predictions, keeping lengths of post-acute care to within 1 percent of the days estimated by nH Predict. NaviHealth employees who fail to do so face discipline and firing. ” Humana banks on the patients’ impaired conditions, lack of knowledge, and lack of resources to appeal the wrongful AI-powered decisions,” the lawsuit filed Tuesday claims.
Plaintiff’s cases
One of the plaintiffs in Tuesday’s suit is JoAnne Barrows of Minnesota. On November 23, 2021, Barrows, then 86, was admitted to a hospital after falling at home and fracturing her leg. Doctors put her leg in a cast and issued an order not to put any weight on it for six weeks. On November 26, she was moved to a rehabilitation center for her six-week recovery. But, after just two weeks, Humana’s coverage denials began. Barrows and her family appealed the denials, but Humana denied the appeals, declaring that Barrows was fit to return to her home despite being bedridden and using a catheter.
Her family had no choice but to pay out-of-pocket. They tried moving her to a less expensive facility, but she received substandard care there, and her health declined further. Due to the poor quality of care, the family decided to move her home on December 22, even though she was still unable to use her injured leg, go the bathroom on her own, and still had a catheter.
The other plaintiff is Susan Hagood of North Carolina. On September 10, 2022, Hagood was admitted to a hospital with a urinary tract infection, sepsis, and a spinal infection. She stayed in the hospital until October 26, when she was transferred to a skilled nursing facility. Upon her transfer, she had eleven discharging diagnoses, including sepsis, acute kidney failure, kidney stones, nausea and vomiting, a urinary tract infection, swelling in her spine, and a spinal abscess. In the nursing facility, she was in extreme pain and on the maximum allowable dose of the painkiller oxycodone. She also developed pneumonia.
On November 28, she returned to the hospital for an appointment, at which point her blood pressure spiked, and she was sent to the emergency room. There, doctors found that her condition had considerably worsened.
Meanwhile, a day earlier, on November 27, Humana determined that it would deny coverage of part of her stay at the skilled nursing facility, refusing to pay from November 14 to November 28. Humana said Hagood no longer needed the level of care the facility provided and that she should be discharged home. The family paid $24,000 out-of-pocket for her care, and to date, Hagood remains in a skilled nursing facility.
Overall, the patients claim that Humana and UnitedHealth are aware that nH Predict is “highly inaccurate” but use it anyway to avoid paying for covered care and make more profit. The denials are “systematic, illegal, malicious, and oppressive.”
The lawsuit against Humana alleges breach of contract, unfair dealing, unjust enrichment, and bad faith insurance violations in many states. It seeks damages for financial losses and emotional distress, disgorgement and/or restitution, and to have Humana barred from using the AI-based model to deny claims.
On Wednesday, news quickly spread on social media about a new enabled-by-default Dropbox setting that shares Dropbox data with OpenAI for an experimental AI-powered search feature, but Dropbox says data is only shared if the feature is actively being used. Dropbox says that user data shared with third-party AI partners isn’t used to train AI models and is deleted within 30 days.
Even with assurances of data privacy laid out by Dropbox on an AI privacy FAQ page, the discovery that the setting had been enabled by default upset some Dropbox users. The setting was first noticed by writer Winifred Burton, who shared information about the Third-party AI setting through Bluesky on Tuesday, and frequent AI critic Karla Ortiz shared more information about it on X.
Wednesday afternoon, Drew Houston, the CEO of Dropbox, apologized for customer confusion in a post on X and wrote, “The third-party AI toggle in the settings menu enables or disables access to DBX AI features and functionality. Neither this nor any other setting automatically or passively sends any Dropbox customer data to a third-party AI service.“
Critics say that communication about the change could have been clearer. AI researcher Simon Willison wrote, “Great example here of how careful companies need to be in clearly communicating what’s going on with AI access to personal data.”
So why would Dropbox ever send user data to OpenAI anyway? In July, the company announced an AI-powered feature called Dash that allows AI models to perform universal searches across platforms like Google Workspace and Microsoft Outlook.
According to the Dropbox privacy FAQ, the third-party AI opt-out setting is part of the “Dropbox AI alpha,” which is a conversational interface for exploring file contents that involves chatting with a ChatGPT-style bot using an “Ask something about this file” feature. To make it work, an AI language model similar to the one that powers ChatGPT (like GPT-4) needs access to your files.
According to the FAQ, the third-party AI toggle in your account settings is turned on by default if “you or your team” are participating in the Dropbox AI alpha. Still, multiple Ars Technica staff who had no knowledge of the Dropbox AI alpha found the setting enabled by default when they checked.
In a statement to Ars Technica, a Dropbox representative said, “The third-party AI toggle is only turned on to give all eligible customers the opportunity to view our new AI features and functionality, like Dropbox AI. It does not enable customers to use these features without notice. Any features that use third-party AI offer disclosure of third-party use, and link to settings that they can manage. Only after a customer sees the third-party AI transparency banner and chooses to proceed with asking a question about a file, will that file be sent to a third-party to generate answers. Our customers are still in control of when and how they use these features.”
Right now, the only third-party AI provider for Dropbox is OpenAI, writes Dropbox in the FAQ. “Open AI is an artificial intelligence research organization that develops cutting-edge language models and advanced AI technologies. Your data is never used to train their internal models, and is deleted from OpenAI’s servers within 30 days.” It also says, “Only the content relevant to an explicit request or command is sent to our third-party AI partners to generate an answer, summary, or transcript.”
Disabling the feature is easy if you prefer not to use Dropbox AI features. Log into your Dropbox account on a desktop web browser, then click your profile photo > Settings > Third-party AI. This link may take you to that page more quickly. On that page, click the switch beside “Use artificial intelligence (AI) from third-party partners so you can work faster in Dropbox” to toggle it into the “Off” position.
This story was updated on December 13, 2023, at 5: 35 pm ET with clarifications about when and how Dropbox shares data with OpenAI, as well as statements from Dropbox reps and its CEO.
On Monday, Mistral AI announced a new AI language model called Mixtral 8x7B, a “mixture of experts” (MoE) model with open weights that reportedly truly matches OpenAI’s GPT-3.5 in performance—an achievement that has been claimed by others in the past but is being taken seriously by AI heavyweights such as OpenAI’s Andrej Karpathy and Jim Fan. That means we’re closer to having a ChatGPT-3.5-level AI assistant that can run freely and locally on our devices, given the right implementation.
Mistral, based in Paris and founded by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has seen a rapid rise in the AI space recently. It has been quickly raising venture capital to become a sort of French anti-OpenAI, championing smaller models with eye-catching performance. Most notably, Mistral’s models run locally with open weights that can be downloaded and used with fewer restrictions than closed AI models from OpenAI, Anthropic, or Google. (In this context “weights” are the computer files that represent a trained neural network.)
Mixtral 8x7B can process a 32K token context window and works in French, German, Spanish, Italian, and English. It works much like ChatGPT in that it can assist with compositional tasks, analyze data, troubleshoot software, and write programs. Mistral claims that it outperforms Meta’s much larger LLaMA 2 70B (70 billion parameter) large language model and that it matches or exceeds OpenAI’s GPT-3.5 on certain benchmarks, as seen in the chart below.
The speed at which open-weights AI models have caught up with OpenAI’s top offering a year ago has taken many by surprise. Pietro Schirano, the founder of EverArt, wrote on X, “Just incredible. I am running Mistral 8x7B instruct at 27 tokens per second, completely locally thanks to @LMStudioAI. A model that scores better than GPT-3.5, locally. Imagine where we will be 1 year from now.”
LexicaArt founder Sharif Shameem tweeted, “The Mixtral MoE model genuinely feels like an inflection point — a true GPT-3.5 level model that can run at 30 tokens/sec on an M1. Imagine all the products now possible when inference is 100% free and your data stays on your device.” To which Andrej Karpathy replied, “Agree. It feels like the capability / reasoning power has made major strides, lagging behind is more the UI/UX of the whole thing, maybe some tool use finetuning, maybe some RAG databases, etc.”
Mixture of experts
So what does mixture of experts mean? As this excellent Hugging Face guide explains, it refers to a machine-learning model architecture where a gate network routes input data to different specialized neural network components, known as “experts,” for processing. The advantage of this is that it enables more efficient and scalable model training and inference, as only a subset of experts are activated for each input, reducing the computational load compared to monolithic models with equivalent parameter counts.
In layperson’s terms, a MoE is like having a team of specialized workers (the “experts”) in a factory, where a smart system (the “gate network”) decides which worker is best suited to handle each specific task. This setup makes the whole process more efficient and faster, as each task is done by an expert in that area, and not every worker needs to be involved in every task, unlike in a traditional factory where every worker might have to do a bit of everything.
OpenAI has been rumored to use a MoE system with GPT-4, accounting for some of its performance. In the case of Mixtral 8x7B, the name implies that the model is a mixture of eight 7 billion-parameter neural networks, but as Karpathy pointed out in a tweet, the name is slightly misleading because, “it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why total number of params is not 56B but only 46.7B.”
Mixtral is not the first “open” mixture of experts model, but it is notable for its relatively small size in parameter count and performance. It’s out now, available on Hugging Face and BitTorrent under the Apache 2.0 license. People have been running it locally using an app called LM Studio. Also, Mistral began offering beta access to an API for three levels of Mistral models on Monday.