openai

stability-announces-stable-diffusion-3,-a-next-gen-ai-image-generator

Stability announces Stable Diffusion 3, a next-gen AI image generator

Pics and it didn’t happen —

SD3 may bring DALL-E-like prompt fidelity to an open-weights image-synthesis model.

Stable Diffusion 3 generation with the prompt: studio photograph closeup of a chameleon over a black background.

Enlarge / Stable Diffusion 3 generation with the prompt: studio photograph closeup of a chameleon over a black background.

On Thursday, Stability AI announced Stable Diffusion 3, an open-weights next-generation image-synthesis model. It follows its predecessors by reportedly generating detailed, multi-subject images with improved quality and accuracy in text generation. The brief announcement was not accompanied by a public demo, but Stability is opening up a waitlist today for those who would like to try it.

Stability says that its Stable Diffusion 3 family of models (which takes text descriptions called “prompts” and turns them into matching images) range in size from 800 million to 8 billion parameters. The size range accommodates allowing different versions of the model to run locally on a variety of devices—from smartphones to servers. Parameter size roughly corresponds to model capability in terms of how much detail it can generate. Larger models also require more VRAM on GPU accelerators to run.

Since 2022, we’ve seen Stability launch a progression of AI image-generation models: Stable Diffusion 1.4, 1.5, 2.0, 2.1, XL, XL Turbo, and now 3. Stability has made a name for itself as providing a more open alternative to proprietary image-synthesis models like OpenAI’s DALL-E 3, though not without controversy due to the use of copyrighted training data, bias, and the potential for abuse. (This has led to lawsuits that are unresolved.) Stable Diffusion models have been open-weights and source-available, which means the models can be run locally and fine-tuned to change their outputs.

  • Stable Diffusion 3 generation with the prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made out of colorful energy.

  • An AI-generated image of a grandma wearing a “Go big or go home sweatshirt” generated by Stable Diffusion 3.

  • Stable Diffusion 3 generation with the prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.

  • An AI-generated image created by Stable Diffusion 3.

  • Stable Diffusion 3 generation with the prompt: A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background.

  • Stable Diffusion 3 generation with the prompt: Moody still life of assorted pumpkins.

  • Stable Diffusion 3 generation with the prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion.”

  • Stable Diffusion 3 generation with the prompt: Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.

  • Stable Diffusion 3 generation with the prompt: Photo of an 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall.

As far as tech improvements are concerned, Stability CEO Emad Mostaque wrote on X, “This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements. This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs.”

Like Mostaque said, the Stable Diffusion 3 family uses diffusion transformer architecture, which is a new way of creating images with AI that swaps out the usual image-building blocks (such as U-Net architecture) for a system that works on small pieces of the picture. The method was inspired by transformers, which are good at handling patterns and sequences. This approach not only scales up efficiently but also reportedly produces higher-quality images.

Stable Diffusion 3 also utilizes “flow matching,” which is a technique for creating AI models that can generate images by learning how to transition from random noise to a structured image smoothly. It does this without needing to simulate every step of the process, instead focusing on the overall direction or flow that the image creation should follow.

A comparison of outputs between OpenAI's DALL-E 3 and Stable Diffusion 3 with the prompt,

Enlarge / A comparison of outputs between OpenAI’s DALL-E 3 and Stable Diffusion 3 with the prompt, “Night photo of a sports car with the text “SD3″ on the side, the car is on a race track at high speed, a huge road sign with the text ‘faster.'”

We do not have access to Stable Diffusion 3 (SD3), but from samples we found posted on Stability’s website and associated social media accounts, the generations appear roughly comparable to other state-of-the-art image-synthesis models at the moment, including the aforementioned DALL-E 3, Adobe Firefly, Imagine with Meta AI, Midjourney, and Google Imagen.

SD3 appears to handle text generation very well in the examples provided by others, which are potentially cherry-picked. Text generation was a particular weakness of earlier image-synthesis models, so an improvement to that capability in a free model is a big deal. Also, prompt fidelity (how closely it follows descriptions in prompts) seems to be similar to DALL-E 3, but we haven’t tested that ourselves yet.

While Stable Diffusion 3 isn’t widely available, Stability says that once testing is complete, its weights will be free to download and run locally. “This preview phase, as with previous models,” Stability writes, “is crucial for gathering insights to improve its performance and safety ahead of an open release.”

Stability has been experimenting with a variety of image-synthesis architectures recently. Aside from SDXL and SDXL Turbo, just last week, the company announced Stable Cascade, which uses a three-stage process for text-to-image synthesis.

Listing image by Emad Mostaque (Stability AI)

Stability announces Stable Diffusion 3, a next-gen AI image generator Read More »

google-goes-“open-ai”-with-gemma,-a-free,-open-weights-chatbot-family

Google goes “open AI” with Gemma, a free, open-weights chatbot family

Free hallucinations for all —

Gemma chatbots can run locally, and they reportedly outperform Meta’s Llama 2.

The Google Gemma logo

On Wednesday, Google announced a new family of AI language models called Gemma, which are free, open-weights models built on technology similar to the more powerful but closed Gemini models. Unlike Gemini, Gemma models can run locally on a desktop or laptop computer. It’s Google’s first significant open large language model (LLM) release since OpenAI’s ChatGPT started a frenzy for AI chatbots in 2022.

Gemma models come in two sizes: Gemma 2B (2 billion parameters) and Gemma 7B (7 billion parameters), each available in pre-trained and instruction-tuned variants. In AI, parameters are values in a neural network that determine AI model behavior, and weights are a subset of these parameters stored in a file.

Developed by Google DeepMind and other Google AI teams, Gemma pulls from techniques learned during the development of Gemini, which is the family name for Google’s most capable (public-facing) commercial LLMs, including the ones that power its Gemini AI assistant. Google says the name comes from the Latin gemma, which means “precious stone.”

While Gemma is Google’s first major open LLM since the launch of ChatGPT (it has released smaller research models such as FLAN-T5 in the past), it’s not Google’s first contribution to open AI research. The company cites the development of the Transformer architecture, as well as releases like TensorFlow, BERT, T5, and JAX as key contributions, and it would not be controversial to say that those have been important to the field.

A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta's Llama 2 on several benchmarks.

Enlarge / A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta’s Llama 2 on several benchmarks.

Owing to lesser capability and high confabulation rates, smaller open-weights LLMs have been more like tech demos until recently, as some larger ones have begun to match GPT-3.5 performance levels. Still, experts see source-available and open-weights AI models as essential steps in ensuring transparency and privacy in chatbots. Google Gemma is not “open source” however, since that term usually refers to a specific type of software license with few restrictions attached.

In reality, Gemma feels like a conspicuous play to match Meta, which has made a big deal out of releasing open-weights models (such as LLaMA and Llama 2) since February of last year. That technique stands in opposition to AI models like OpenAI’s GPT-4 Turbo, which is only available through the ChatGPT application and a cloud API and cannot be run locally. A Reuters report on Gemma focuses on the Meta angle and surmises that Google hopes to attract more developers to its Vertex AI cloud platform.

We have not used Gemma yet; however, Google claims the 7B model outperforms Meta’s Llama 2 7B and 13B models on several benchmarks for math, Python code generation, general knowledge, and commonsense reasoning tasks. It’s available today through Kaggle, a machine-learning community platform, and Hugging Face.

In other news, Google paired the Gemma release with a “Responsible Generative AI Toolkit,” which Google hopes will offer guidance and tools for developing what the company calls “safe and responsible” AI applications.

Google goes “open AI” with Gemma, a free, open-weights chatbot family Read More »

will-smith-parodies-viral-ai-generated-video-by-actually-eating-spaghetti

Will Smith parodies viral AI-generated video by actually eating spaghetti

Mangia, mangia —

Actor pokes fun at 2023 AI video by eating spaghetti messily and claiming it’s AI-generated.

The real Will Smith eating spaghetti, parodying an AI-generated video from 2023.

Enlarge / The real Will Smith eating spaghetti, parodying an AI-generated video from 2023.

On Monday, Will Smith posted a video on his official Instagram feed that parodied an AI-generated video of the actor eating spaghetti that went viral last year. With the recent announcement of OpenAI’s Sora video synthesis model, many people have noted the dramatic jump in AI-video quality over the past year compared to the infamous spaghetti video. Smith’s new video plays on that comparison by showing the actual actor eating spaghetti in a comical fashion and claiming that it is AI-generated.

Captioned “This is getting out of hand!”, the Instagram video uses a split screen layout to show the original AI-generated spaghetti video created by a Reddit user named “chaindrop” in March 2023 on the top, labeled with the subtitle “AI Video 1 year ago.” Below that, in a box titled “AI Video Now,” the real Smith shows 11 video segments of himself actually eating spaghetti by slurping it up while shaking his head, pouring it into his mouth with his fingers, and even nibbling on a friend’s hair. 2006’s Snap Yo Fingers by Lil Jon plays in the background.

In the Instagram comments section, some people expressed confusion about the new (non-AI) video, saying, “I’m still in doubt if second video was also made by AI or not.” In a reply, someone else wrote, “Boomers are gonna loose [sic] this one. Second one is clearly him making a joke but I wouldn’t doubt it in a couple months time it will get like that.”

We have not yet seen a model with the capability of Sora attempt to create a new Will-Smith-eating-spaghetti AI video, but the result would likely be far better than what we saw last year, even if it contained obvious glitches. Given how things are progressing, we wouldn’t be surprised if by 2025, video synthesis AI models can replicate the parody video created by Smith himself.

It’s worth noting for history’s sake that despite the comparison, the video of Will Smith eating spaghetti did not represent the state of the art in text-to-video synthesis at the time of its creation in March 2023 (that title would likely apply to Runway’s Gen-2, which was then in closed testing). However, the spaghetti video was reasonably advanced for open weights models at the time, having used the ModelScope AI model. More capable video synthesis models had already been released at that time, but due to the humorous cultural reference, it’s arguably more fun to compare today’s AI video synthesis to Will Smith grotesquely eating spaghetti than to teddy bears washing dishes.

Will Smith parodies viral AI-generated video by actually eating spaghetti Read More »

reddit-sells-training-data-to-unnamed-ai-company-ahead-of-ipo

Reddit sells training data to unnamed AI company ahead of IPO

Everything has a price —

If you’ve posted on Reddit, you’re likely feeding the future of AI.

In this photo illustration the American social news

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that it planned to charge AI companies for access to its almost two decades’ worth of human-generated content.

If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.

While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Reddit sells training data to unnamed AI company ahead of IPO Read More »

elon-musk’s-x-allows-china-based-propaganda-banned-on-other-platforms

Elon Musk’s X allows China-based propaganda banned on other platforms

Rinse-wash-repeat. —

X accused of overlooking propaganda flagged by Meta and criminal prosecutors.

Elon Musk’s X allows China-based propaganda banned on other platforms

Lax content moderation on X (aka Twitter) has disrupted coordinated efforts between social media companies and law enforcement to tamp down on “propaganda accounts controlled by foreign entities aiming to influence US politics,” The Washington Post reported.

Now propaganda is “flourishing” on X, The Post said, while other social media companies are stuck in endless cycles, watching some of the propaganda that they block proliferate on X, then inevitably spread back to their platforms.

Meta, Google, and then-Twitter began coordinating takedown efforts with law enforcement and disinformation researchers after Russian-backed influence campaigns manipulated their platforms in hopes of swaying the 2016 US presidential election.

The next year, all three companies promised Congress to work tirelessly to stop Russian-backed propaganda from spreading on their platforms. The companies created explicit election misinformation policies and began meeting biweekly to compare notes on propaganda networks each platform uncovered, according to The Post’s interviews with anonymous sources who participated in these meetings.

However, after Elon Musk purchased Twitter and rebranded the company as X, his company withdrew from the alliance in May 2023.

Sources told The Post that the last X meeting attendee was Irish intelligence expert Aaron Rodericks—who was allegedly disciplined for liking an X post calling Musk “a dipshit.” Rodericks was subsequently laid off when Musk dismissed the entire election integrity team last September, and after that, X apparently ditched the biweekly meeting entirely and “just kind of disappeared,” a source told The Post.

In 2023, for example, Meta flagged 150 “artificial influence accounts” identified on its platform, of which “136 were still present on X as of Thursday evening,” according to The Post’s analysis. X’s seeming oversight extends to all but eight of the 123 “deceptive China-based campaigns” connected to accounts that Meta flagged last May, August, and December, The Post reported.

The Post’s report also provided an exclusive analysis from the Stanford Internet Observatory (SIO), which found that 86 propaganda accounts that Meta flagged last November “are still active on X.”

The majority of these accounts—81—were China-based accounts posing as Americans, SIO reported. These accounts frequently ripped photos from Americans’ LinkedIn profiles, then changed the real Americans’ names while posting about both China and US politics, as well as people often trending on X, such as Musk and Joe Biden.

Meta has warned that China-based influence campaigns are “multiplying,” The Post noted, while X’s standards remain seemingly too relaxed. Even accounts linked to criminal investigations remain active on X. One “account that is accused of being run by the Chinese Ministry of Public Security,” The Post reported, remains on X despite its posts being cited by US prosecutors in a criminal complaint.

Prosecutors connected that account to “dozens” of X accounts attempting to “shape public perceptions” about the Chinese Communist Party, the Chinese government, and other world leaders. The accounts also comment on hot-button topics like the fentanyl problem or police brutality, seemingly to convey “a sense of dismay over the state of America without any clear partisan bent,” Elise Thomas, an analyst for a London nonprofit called the Institute for Strategic Dialogue, told The Post.

Some X accounts flagged by The Post had more than 1 million followers. Five have paid X for verification, suggesting that their disinformation campaigns—targeting hashtags to confound discourse on US politics—are seemingly being boosted by X.

SIO technical research manager Renée DiResta criticized X’s decision to stop coordinating with other platforms.

“The presence of these accounts reinforces the fact that state actors continue to try to influence US politics by masquerading as media and fellow Americans,” DiResta told The Post. “Ahead of the 2022 midterms, researchers and platform integrity teams were collaborating to disrupt foreign influence efforts. That collaboration seems to have ground to a halt, Twitter does not seem to be addressing even networks identified by its peers, and that’s not great.”

Musk shut down X’s election integrity team because he claimed that the team was actually “undermining” election integrity. But analysts are bracing for floods of misinformation to sway 2024 elections, as some major platforms have removed election misinformation policies just as rapid advances in AI technologies have made misinformation spread via text, images, audio, and video harder for the average person to detect.

In one prominent example, a fake robocaller relied on AI voice technology to pose as Biden to tell Democrats not to vote. That incident seemingly pushed the Federal Trade Commission on Thursday to propose penalizing AI impersonation.

It seems apparent that propaganda accounts from foreign entities on X will use every tool available to get eyes on their content, perhaps expecting Musk’s platform to be the slowest to police them. According to The Post, some of the X accounts spreading propaganda are using what appears to be AI-generated images of Biden and Donald Trump to garner tens of thousands of views on posts.

It’s possible that X will start tightening up on content moderation as elections draw closer. Yesterday, X joined Amazon, Google, Meta, OpenAI, TikTok, and other Big Tech companies in signing an agreement to fight “deceptive use of AI” during 2024 elections. Among the top goals identified in the “AI Elections accord” are identifying where propaganda originates, detecting how propaganda spreads across platforms, and “undertaking collective efforts to evaluate and learn from the experiences and outcomes of dealing” with propaganda.

Elon Musk’s X allows China-based propaganda banned on other platforms Read More »

openai-collapses-media-reality-with-sora,-a-photorealistic-ai-video-generator

OpenAI collapses media reality with Sora, a photorealistic AI video generator

Pics and it didn’t happen —

Hello, cultural singularity—soon, every video you see online could be completely fake.

Snapshots from three videos generated using OpenAI's Sora.

Enlarge / Snapshots from three videos generated using OpenAI’s Sora.

On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.

“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.

“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.

“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.

For future reference—since this type of panic will some day appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.

The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.

In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.

How (we think) they pulled it off

AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.

Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?

OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.

OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It generates a video by starting off with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerge.

Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.

OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.

An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions that describe scenes in the training data generated by another AI model like GPT-4V. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”

One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.

OpenAI collapses media reality with Sora, a photorealistic AI video generator Read More »

judge-rejects-most-chatgpt-copyright-claims-from-book-authors

Judge rejects most ChatGPT copyright claims from book authors

Insufficient evidence —

OpenAI plans to defeat authors’ remaining claim at a “later stage” of the case.

Judge rejects most ChatGPT copyright claims from book authors

A US district judge in California has largely sided with OpenAI, dismissing the majority of claims raised by authors alleging that large language models powering ChatGPT were illegally trained on pirated copies of their books without their permission.

By allegedly repackaging original works as ChatGPT outputs, authors alleged, OpenAI’s most popular chatbot was just a high-tech “grift” that seemingly violated copyright laws, as well as state laws preventing unfair business practices and unjust enrichment.

According to judge Araceli Martínez-Olguín, authors behind three separate lawsuits—including Sarah Silverman, Michael Chabon, and Paul Tremblay—have failed to provide evidence supporting any of their claims except for direct copyright infringement.

OpenAI had argued as much in their promptly filed motion to dismiss these cases last August. At that time, OpenAI said that it expected to beat the direct infringement claim at a “later stage” of the proceedings.

Among copyright claims tossed by Martínez-Olguín were accusations of vicarious copyright infringement. Perhaps most significantly, Martínez-Olguín agreed with OpenAI that the authors’ allegation that “every” ChatGPT output “is an infringing derivative work” is “insufficient” to allege vicarious infringement, which requires evidence that ChatGPT outputs are “substantially similar” or “similar at all” to authors’ books.

“Plaintiffs here have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books,” Martínez-Olguín wrote. “Because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials.”

Authors also failed to convince Martínez-Olguín that OpenAI violated the Digital Millennium Copyright Act (DMCA) by allegedly removing copyright management information (CMI)—such as author names, titles of works, and terms and conditions for use of the work—from training data.

This claim failed because authors cited “no facts” that OpenAI intentionally removed the CMI or built the training process to omit CMI, Martínez-Olguín wrote. Further, the authors cited examples of ChatGPT referencing their names, which would seem to suggest that some CMI remains in the training data.

Some of the remaining claims were dependent on copyright claims to survive, Martínez-Olguín wrote.

Arguing that OpenAI caused economic injury by unfairly repurposing authors’ works, even if authors could show evidence of a DMCA violation, authors could only speculate about what injury was caused, the judge said.

Similarly, allegations of “fraudulent” unfair conduct—accusing OpenAI of “deceptively” designing ChatGPT to produce outputs that omit CMI—”rest on a violation of the DMCA,” Martínez-Olguín wrote.

The only claim under California’s unfair competition law that was allowed to proceed alleged that OpenAI used copyrighted works to train ChatGPT without authors’ permission. Because the state law broadly defines what’s considered “unfair,” Martínez-Olguín said that it’s possible that OpenAI’s use of the training data “may constitute an unfair practice.”

Remaining claims of negligence and unjust enrichment failed, Martínez-Olguín wrote, because authors only alleged intentional acts and did not explain how OpenAI “received and unjustly retained a benefit” from training ChatGPT on their works.

Authors have been ordered to consolidate their complaints and have until March 13 to amend arguments and continue pursuing any of the dismissed claims.

To shore up the tossed copyright claims, authors would likely need to provide examples of ChatGPT outputs that are similar to their works, as well as evidence of OpenAI intentionally removing CMI to “induce, enable, facilitate, or conceal infringement,” Martínez-Olguín wrote.

Ars could not immediately reach the authors’ lawyers or OpenAI for comment.

As authors likely prepare to continue fighting OpenAI, the US Copyright Office has been fielding public input before releasing guidance that could one day help rights holders pursue legal claims and may eventually require works to be licensed from copyright owners for use as training materials. Among the thorniest questions is whether AI tools like ChatGPT should be considered authors when spouting outputs included in creative works.

While the Copyright Office prepares to release three reports this year “revealing its position on copyright law in relation to AI,” according to The New York Times, OpenAI recently made it clear that it does not plan to stop referencing copyrighted works in its training data. Last month, OpenAI said it would be “impossible” to train AI models without copyrighted materials, because “copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents.”

According to OpenAI, it doesn’t just need old copyrighted materials; it needs current copyright materials to ensure that chatbot and other AI tools’ outputs “meet the needs of today’s citizens.”

Rights holders will likely be bracing throughout this confusing time, waiting for the Copyright Office’s reports. But once there is clarity, those reports could “be hugely consequential, weighing heavily in courts, as well as with lawmakers and regulators,” The Times reported.

Judge rejects most ChatGPT copyright claims from book authors Read More »

report:-sam-altman-seeking-trillions-for-ai-chip-fabrication-from-uae,-others

Report: Sam Altman seeking trillions for AI chip fabrication from UAE, others

chips ahoy —

WSJ: Audacious $5-$7 trillion investment would aim to expand global AI chip supply.

WASHINGTON, DC - JANUARY 11: OpenAI Chief Executive Officer Sam Altman walks on the House side of the U.S. Capitol on January 11, 2024 in Washington, DC. Meanwhile, House Freedom Caucus members who left a meeting in the Speakers office say that they were talking to the Speaker about abandoning the spending agreement that Johnson announced earlier in the week. (Photo by Kent Nishimura/Getty Images)

Enlarge / OpenAI Chief Executive Officer Sam Altman walks on the House side of the US Capitol on January 11, 2024, in Washington, DC. (Photo by Kent Nishimura/Getty Images)

Getty Images

On Thursday, The Wall Street Journal reported that OpenAI CEO Sam Altman is in talks with investors to raise as much as $5 trillion to $7 trillion for AI chip manufacturing, according to people familiar with the matter. The funding seeks to address the scarcity of graphics processing units (GPUs) crucial for training and running large language models like those that power ChatGPT, Microsoft Copilot, and Google Gemini.

The high dollar amount reflects the huge amount of capital necessary to spin up new semiconductor manufacturing capability. “As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers and power providers, which together would put up money to build chip foundries that would then be run by existing chip makers,” writes the Wall Street Journal in its report. “OpenAI would agree to be a significant customer of the new factories.”

To hit these ambitious targets—which are larger than the entire semiconductor industry’s current $527 billion global sales combined—Altman has reportedly met with a range of potential investors worldwide, including sovereign wealth funds and government entities, notably the United Arab Emirates, SoftBank CEO Masayoshi Son, and representatives from Taiwan Semiconductor Manufacturing Co. (TSMC).

TSMC is the world’s largest dedicated independent semiconductor foundry. It’s a critical linchpin that companies such as Nvidia, Apple, Intel, and AMD rely on to fabricate SoCs, CPUs, and GPUs for various applications.

Altman reportedly seeks to expand the global capacity for semiconductor manufacturing significantly, funding the infrastructure necessary to support the growing demand for GPUs and other AI-specific chips. GPUs are excellent at parallel computation, which makes them ideal for running AI models that heavily rely on matrix multiplication to work. However, the technology sector currently faces a significant shortage of these important components, constraining the potential for AI advancements and applications.

In particular, the UAE’s involvement, led by Sheikh Tahnoun bin Zayed al Nahyan, a key security official and chair of numerous Abu Dhabi sovereign wealth vehicles, reflects global interest in AI’s potential and the strategic importance of semiconductor manufacturing. However, the prospect of substantial UAE investment in a key tech industry raises potential geopolitical concerns, particularly regarding the US government’s strategic priorities in semiconductor production and AI development.

The US has been cautious about allowing foreign control over the supply of microchips, given their importance to the digital economy and national security. Reflecting this, the Biden administration has undertaken efforts to bolster domestic chip manufacturing through subsidies and regulatory scrutiny of foreign investments in important technologies.

To put the $5 trillion to $7 trillion estimate in perspective, the White House just today announced a $5 billion investment in R&D to advance US-made semiconductor technologies. TSMC has already sunk $40 billion—one of the largest foreign investments in US history—into a US chip plant in Arizona. As of now, it’s unclear whether Altman has secured any commitments toward his fundraising goal.

Updated on February 9, 2024 at 8: 45 PM Eastern with a quote from the WSJ that clarifies the proposed relationship between OpenAI and partners in the talks.

Report: Sam Altman seeking trillions for AI chip fabrication from UAE, others Read More »

your-current-pc-probably-doesn’t-have-an-ai-processor,-but-your-next-one-might

Your current PC probably doesn’t have an AI processor, but your next one might

Intel's Core Ultra chips are some of the first x86 PC processors to include built-in NPUs. Software support will slowly follow.

Enlarge / Intel’s Core Ultra chips are some of the first x86 PC processors to include built-in NPUs. Software support will slowly follow.

Intel

When it announced the new Copilot key for PC keyboards last month, Microsoft declared 2024 “the year of the AI PC.” On one level, this is just an aspirational PR-friendly proclamation, meant to show investors that Microsoft intends to keep pushing the AI hype cycle that has put it in competition with Apple for the title of most valuable publicly traded company.

But on a technical level, it is true that PCs made and sold in 2024 and beyond will generally include AI and machine-learning processing capabilities that older PCs don’t. The main thing is the neural processing unit (NPU), a specialized block on recent high-end Intel and AMD CPUs that can accelerate some kinds of generative AI and machine-learning workloads more quickly (or while using less power) than the CPU or GPU could.

Qualcomm’s Windows PCs were some of the first to include an NPU, since the Arm processors used in most smartphones have included some kind of machine-learning acceleration for a few years now (Apple’s M-series chips for Macs all have them, too, going all the way back to 2020’s M1). But the Arm version of Windows is a insignificantly tiny sliver of the entire PC market; x86 PCs with Intel’s Core Ultra chips, AMD’s Ryzen 7040/8040-series laptop CPUs, or the Ryzen 8000G desktop CPUs will be many mainstream PC users’ first exposure to this kind of hardware.

Right now, even if your PC has an NPU in it, Windows can’t use it for much, aside from webcam background blurring and a handful of other video effects. But that’s slowly going to change, and part of that will be making it relatively easy for developers to create NPU-agnostic apps in the same way that PC game developers currently make GPU-agnostic games.

The gaming example is instructive, because that’s basically how Microsoft is approaching DirectML, its API for machine-learning operations. Though up until now it has mostly been used to run these AI workloads on GPUs, Microsoft announced last week that it was adding DirectML support for Intel’s Meteor Lake NPUs in a developer preview, starting in DirectML 1.13.1 and ONNX Runtime 1.17.

Though it will only run an unspecified “subset of machine learning models that have been targeted for support” and that some “may not run at all or may have high latency or low accuracy,” it opens the door to more third-party apps to start taking advantage of built-in NPUs. Intel says that Samsung is using Intel’s NPU and DirectML for facial recognition features in its photo gallery app, something that Apple also uses its Neural Engine for in macOS and iOS.

The benefits can be substantial, compared to running those workloads on a GPU or CPU.

“The NPU, at least in Intel land, will largely be used for power efficiency reasons,” Intel Senior Director of Technical Marketing Robert Hallock told Ars in an interview about Meteor Lake’s capabilities. “Camera segmentation, this whole background blurring thing… moving that to the NPU saves about 30 to 50 percent power versus running it elsewhere.”

Intel and Microsoft are both working toward a model where NPUs are treated pretty much like GPUs are today: developers generally target DirectX rather than a specific graphics card manufacturer or GPU architecture, and new features, one-off bug fixes, and performance improvements can all be addressed via GPU driver updates. Some GPUs run specific games better than others, and developers can choose to spend more time optimizing for Nvidia cards or AMD cards, but generally the model is hardware agnostic.

Similarly, Intel is already offering GPU-style driver updates for its NPUs. And Hallock says that Windows already essentially recognizes the NPU as “a graphics card with no rendering capability.”

Your current PC probably doesn’t have an AI processor, but your next one might Read More »

4chan-daily-challenge-sparked-deluge-of-explicit-ai-taylor-swift-images

4chan daily challenge sparked deluge of explicit AI Taylor Swift images

4chan daily challenge sparked deluge of explicit AI Taylor Swift images

4chan users who have made a game out of exploiting popular AI image generators appear to be at least partly responsible for the flood of fake images sexualizing Taylor Swift that went viral last month.

Graphika researchers—who study how communities are manipulated online—traced the fake Swift images to a 4chan message board that’s “increasingly” dedicated to posting “offensive” AI-generated content, The New York Times reported. Fans of the message board take part in daily challenges, Graphika reported, sharing tips to bypass AI image generator filters and showing no signs of stopping their game any time soon.

“Some 4chan users expressed a stated goal of trying to defeat mainstream AI image generators’ safeguards rather than creating realistic sexual content with alternative open-source image generators,” Graphika reported. “They also shared multiple behavioral techniques to create image prompts, attempt to avoid bans, and successfully create sexually explicit celebrity images.”

Ars reviewed a thread flagged by Graphika where users were specifically challenged to use Microsoft tools like Bing Image Creator and Microsoft Designer, as well as OpenAI’s DALL-E.

“Good luck,” the original poster wrote, while encouraging other users to “be creative.”

OpenAI has denied that any of the Swift images were created using DALL-E, while Microsoft has continued to claim that it’s investigating whether any of its AI tools were used.

Cristina López G., a senior analyst at Graphika, noted that Swift is not the only celebrity targeted in the 4chan thread.

“While viral pornographic pictures of Taylor Swift have brought mainstream attention to the issue of AI-generated non-consensual intimate images, she is far from the only victim,” López G. said. “In the 4chan community where these images originated, she isn’t even the most frequently targeted public figure. This shows that anyone can be targeted in this way, from global celebrities to school children.”

Originally, 404 Media reported that the harmful Swift images appeared to originate from 4chan and Telegram channels before spreading on X (formerly Twitter) and other social media. Attempting to stop the spread, X took the drastic step of blocking all searches for “Taylor Swift” for two days.

But López G. said that Graphika’s findings suggest that platforms will continue to risk being inundated with offensive content so long as 4chan users are determined to continue challenging each other to subvert image generator filters. Rather than expecting platforms to chase down the harmful content, López G. recommended that AI companies should get ahead of the problem, taking responsibility for outputs by paying attention to evolving tactics of toxic online communities reporting precisely how they’re getting around safeguards.

“These images originated from a community of people motivated by the ‘challenge’ of circumventing the safeguards of generative AI products, and new restrictions are seen as just another obstacle to ‘defeat,’” López G. said. “It’s important to understand the gamified nature of this malicious activity in order to prevent further abuse at the source.”

Experts told The Times that 4chan users were likely motivated to participate in these challenges for bragging rights and to “feel connected to a wider community.”

4chan daily challenge sparked deluge of explicit AI Taylor Swift images Read More »

chatgpt’s-new-@-mentions-bring-multiple-personalities-into-your-ai-convo

ChatGPT’s new @-mentions bring multiple personalities into your AI convo

team of rivals —

Bring different AI roles into the same chatbot conversation history.

Illustration of a man jugging at symbols.

Enlarge / With so many choices, selecting the perfect GPT can be confusing.

On Tuesday, OpenAI announced a new feature in ChatGPT that allows users to pull custom personalities called “GPTs” into any ChatGPT conversation with the @ symbol. It allows a level of quasi-teamwork within ChatGPT among expert roles that was previously impractical, making collaborating with a team of AI agents within OpenAI’s platform one step closer to reality.

You can now bring GPTs into any conversation in ChatGPT – simply type @ and select the GPT,” wrote OpenAI on the social media network X. “This allows you to add relevant GPTs with the full context of the conversation.”

OpenAI introduced GPTs in November as a way to create custom personalities or roles for ChatGPT to play. For example, users can build their own GPTs to focus on certain topics or certain skills. Paid ChatGPT subscribers can also freely download a host of GPTs developed by other ChatGPT users through the GPT Store.

Previously, if you wanted to share information between GPT profiles, you had to copy the text, select a new chat with the GPT, paste it, and explain the context of what the information means or what you want to do with it. Now, ChatGPT users can stay in the default ChatGPT window and bring in GPTs as needed without losing the history of the conversation.

For example, we created a “Wellness Guide” GPT that is crafted as an expert in human health conditions (of course, this being ChatGPT, always consult a human doctor if you’re having medical problems), and we created a “Canine Health Advisor” for dog-related health questions.

A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.

Enlarge / A screenshot of ChatGPT where we @-mentioned a human wellness advisor, then a dog advisor in the same conversation history.

Benj Edwards

We started in a default ChatGPT chat, hit the @ symbol, then typed the first few letters of “Wellness” and selected it from a list. It filled out the rest. We asked a question about food poisoning in humans, and then we switched to the canine advisor in the same way with an @ symbol and asked about the dog.

Using this feature, you could alternatively consult, say, an “ad copywriter” GPT and an “editor” GPT—ask the copywriter to write some text, then rope in the editor GPT to check it, looking at it from a different angle. Different system prompts (the instructions that define a GPT’s personality) make for significant behavior differences.

We also tried swapping between GPT profiles that write software and others designed to consult on historical tech subjects. Interestingly, ChatGPT does not differentiate between GPTs as different personalities as you change. It will still say, “I did this earlier” when a different GPT is talking about a previous GPT’s output in the same conversation history. From its point of view, it’s just ChatGPT and not multiple agents.

From our vantage point, this feature seems to represent baby steps toward a future where GPTs, as independent agents, could work together as a team to fulfill more complex tasks directed by the user. Similar experiments have been done outside of OpenAI in the past (using API access), but OpenAI has so far resisted a more agentic model for ChatGPT. As we’ve seen (first with GPTs and now with this), OpenAI seems to be slowly angling toward that goal itself, but only time will tell if or when we see true agentic teamwork in a shipping service.

ChatGPT’s new @-mentions bring multiple personalities into your AI convo Read More »

chatgpt-is-leaking-passwords-from-private-conversations-of-its-users,-ars-reader-says

ChatGPT is leaking passwords from private conversations of its users, Ars reader says

OPENAI SPRINGS A LEAK —

Names of unpublished research papers, presentations, and PHP scripts also leaked.

OpenAI logo displayed on a phone screen and ChatGPT website displayed on a laptop screen.

Getty Images

ChatGPT is leaking private conversations that include login credentials and other personal details of unrelated users, screenshots submitted by an Ars reader on Monday indicated.

Two of the seven screenshots the reader submitted stood out in particular. Both contained multiple pairs of usernames and passwords that appeared to be connected to a support system used by employees of a pharmacy prescription drug portal. An employee using the AI chatbot seemed to be troubleshooting problems that encountered while using the portal.

“Horrible, horrible, horrible”

“THIS is so f-ing insane, horrible, horrible, horrible, i cannot believe how poorly this was built in the first place, and the obstruction that is being put in front of me that prevents it from getting better,” the user wrote. “I would fire [redacted name of software] just for this absurdity if it was my choice. This is wrong.”

Besides the candid language and the credentials, the leaked conversation includes the name of the app the employee is troubleshooting and the store number where the problem occurred.

The entire conversation goes well beyond what’s shown in the redacted screenshot above. A link Ars reader Chase Whiteside included showed the chat conversation in its entirety. The URL disclosed additional credential pairs.

The results appeared Monday morning shortly after reader Whiteside had used ChatGPT for an unrelated query.

“I went to make a query (in this case, help coming up with clever names for colors in a palette) and when I returned to access moments later, I noticed the additional conversations,” Whiteside wrote in an email. “They weren’t there when I used ChatGPT just last night (I’m a pretty heavy user). No queries were made—they just appeared in my history, and most certainly aren’t from me (and I don’t think they’re from the same user either).”

Other conversations leaked to Whiteside include the name of a presentation someone was working on, details of an unpublished research proposal, and a script using the PHP programming language. The users for each leaked conversation appeared to be different and unrelated to each other. The conversation involving the prescription portal included the year 2020. Dates didn’t appear in the other conversations.

The episode, and others like it, underscore the wisdom of stripping out personal details from queries made to ChatGPT and other AI services whenever possible. Last March, ChatGPT maker OpenAI took the AI chatbot offline after a bug caused the site to show titles from one active user’s chat history to unrelated users.

In November, researchers published a paper reporting how they used queries to prompt ChatGPT into divulging email addresses, phone and fax numbers, physical addresses, and other private data that was included in material used to train the ChatGPT large language model.

Concerned about the possibility of proprietary or private data leakage, companies, including Apple, have restricted their employees’ use of ChatGPT and similar sites.

As mentioned in an article from December when multiple people found that Ubiquity’s UniFy devices broadcasted private video belonging to unrelated users, these sorts of experiences are as old as the Internet is. As explained in the article:

The precise root causes of this type of system error vary from incident to incident, but they often involve “middlebox” devices, which sit between the front- and back-end devices. To improve performance, middleboxes cache certain data, including the credentials of users who have recently logged in. When mismatches occur, credentials for one account can be mapped to a different account.

An OpenAI representative said the company was investigating the report.

ChatGPT is leaking passwords from private conversations of its users, Ars reader says Read More »