Stable Diffusion

It’s too expensive to fight every AI copyright battle, Getty CEO says

AI, Artificial Intelligence, copyright, fair use, generative ai, Getty Images, image generators, Policy, Stability AI, Stable Diffusion / Tim Belzer / May 29, 2025

Getty dumped “millions and millions” into just one AI copyright fight, CEO says.

In some ways, Getty Images has emerged as one of the most steadfast defenders of artists’ rights in AI copyright fights. Starting in 2022, when some of the most sophisticated image generators today first started testing new models offering better compositions, Getty banned AI-generated uploads to its service. And by the next year, Getty had released a “socially responsible” image generator to prove it was possible to build such a tool while rewarding artists, and it had sued an AI firm that refused to pay artists.

But Getty Images CEO Craig Peters recently told CNBC that, in the years since, the media company has discovered that it’s simply way too expensive to fight every AI copyright battle.

According to Peters, Getty has dumped millions into just one copyright fight against Stability AI.

It’s “extraordinarily expensive,” Peters told CNBC. “Even for a company like Getty Images, we can’t pursue all the infringements that happen in one week.” He confirmed that “we can’t pursue it because the courts are just prohibitively expensive. We are spending millions and millions of dollars in one court case.”

Fair use?

Getty sued Stability AI in 2023, after the AI company’s image generator, Stable Diffusion, started spitting out images that replicated Getty’s famous trademark. In the complaint, Getty alleged that Stability AI had trained Stable Diffusion on “more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images, as part of its efforts to build a competing business.”

As Getty saw it, Stability AI had plenty of opportunity to license the images from Getty and seemingly “chose to ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests.”

Stability AI, like all AI firms, has argued that AI training based on freely scraping images from the web is a “fair use” protected under copyright law.

So far, courts have not settled this debate, while many AI companies have urged judges and governments globally to settle it for the courts, for the sake of safeguarding national security and securing economic prosperity by winning the AI race. According to AI companies, paying artists to train on their works threatens to slow innovation, while rivals in China—who aren’t bound by US copyright law—continue scraping the web to advance their models.

Peters called out Stability AI for adopting this stance, arguing that rightsholders shouldn’t have to spend millions fighting against a claim that paying out licensing fees would “kill innovation.” Some critics have likened AI firms’ argument to a defense of forced labor, suggesting the US would never value “innovation” above human rights, and the same logic should follow for artists’ rights.

“We’re battling a world of rhetoric,” Peters said, alleging that these firms “are taking copyrighted material to develop their powerful AI models under the guise of innovation and then ‘just turning those services right back on existing commercial markets.'”

To Peters, that’s simply “disruption under the notion of ‘move fast and break things,’” and Getty believes “that’s unfair competition.”

“We’re not against competition,” Peters said. “There’s constant new competition coming in all the time from new technologies or just new companies. But that [AI scraping] is just unfair competition, that’s theft.”

Broader Internet backlash over AI firms’ rhetoric

Peters’ comments come after a former Meta head of global affairs, Nick Clegg, received Internet backlash this week after making the same claim that AI firms raise time and again: that asking artists for consent for AI training would “kill” the AI industry, The Verge reported.

According to Clegg, the only viable solution to the tension between artists and AI companies would be to give artists ways to opt out of training, which Stability AI notably started doing in 2022.

“Quite a lot of voices say, ‘You can only train on my content, [if you] first ask,'” Clegg reportedly said. “And I have to say that strikes me as somewhat implausible because these systems train on vast amounts of data.”

On X, the CEO of Fairly Trained—a nonprofit that supports artists’ fight against nonconsensual AI training—Ed Newton-Rex (who is also a former Stability AI vice president of audio) pushed back on Clegg’s claim in a post viewed by thousands.

“Nick Clegg is wrong to say artists’ demands on AI & copyright are unworkable,” Newton-Rex said. “Every argument he makes could equally have been made about Napster:” First, that “the tech is out there,” second that “licensing takes time,” and third that, “we can’t control what other countries do.” If Napster’s operations weren’t legal, neither should AI firms’ training, Newton-Rex said, writing, “These are not reasons not to uphold the law and treat creators fairly.”

Other social media users mocked Clegg with jokes meant to destroy AI firms’ favorite go-to argument against copyright claims.

“Blackbeard says asking sailors for permission to board and loot their ships would ‘kill’ the piracy on the high seas industry,” an X user with the handle “Seanchuckle” wrote.

On Bluesky, a trial lawyer, Max Kennerly, effectively satirized Clegg and the whole AI industry by writing, “Our product creates such little value that it is simply not viable in the marketplace, not even as a niche product. Therefore, we must be allowed to unilaterally extract value from the work of others and convert that value into our profits.”

Other ways to fight

Getty plans to continue fighting against the AI firms that are impressing this “world of rhetoric” on judges and lawmakers, but court battles will likely remain few and far between due to the price tag, Peters has suggested.

There are other ways to fight, though. In a submission last month, Getty pushed the Trump administration to reject “those seeking to weaken US copyright protections by creating a ‘right to learn’ exemption” for AI firms when building Trump’s AI Action Plan.

“US copyright laws are not obstructing the path to continued AI progress,” Getty wrote. “Instead, US copyright laws are a path to sustainable AI and a path that broadens society’s participation in AI’s economic benefits, which reduces downstream economic burdens on the Federal, State and local governments. US copyright laws provide incentives to invest and create.”

In Getty’s submission, the media company emphasized that requiring consent for AI training is not an “overly restrictive” control on AI’s development such as those sought by stauncher critics “that could harm US competitiveness, national security or societal advances such as curing cancer.” And Getty claimed it also wasn’t “requesting protection from existing and new sources of competition,” despite the lawsuit’s suggestion that Stability AI and other image generators threaten to replace Getty’s image library in the market.

What Getty said it hopes Trump’s AI plan will ensure is a world where the rights and opportunities of rightsholders are not “usurped for the commercial benefits” of AI companies.

In 2023, when Getty was first suing Stability AI, Peters suggested that, otherwise, allowing AI firms to widely avoid paying artists would create “a sad world,” perhaps disincentivizing creativity.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

New AI text diffusion models break speed barriers by pulling words from noise

AI, AI diffusion, Biz & IT, chatbots, chatgpt, chatgtp, dall-e, diffusion, image diffusion, image synthesis, language diffusion models, large language models, machine learning, MidJourney, Stable Diffusion, Tech, text diffusion, text synthesis / Rejus Almole / February 28, 2025

These diffusion models run faster than similarly sized conventional models while maintaining comparable performance. LLaDA’s researchers report their 8 billion parameter model performs similarly to LLaMA3 8B across various benchmarks, with competitive results on tasks like MMLU, ARC, and GSM8K.

However, Mercury claims dramatic speed improvements. Their Mercury Coder Mini scores 88.0 percent on HumanEval and 77.1 percent on MBPP—comparable to GPT-4o Mini—while reportedly operating at 1,109 tokens per second compared to GPT-4o Mini’s 59 tokens per second. This represents roughly a 19x speed advantage over GPT-4o Mini while maintaining similar performance on coding benchmarks.

Mercury’s documentation states its models run “at over 1,000 tokens/sec on Nvidia H100s, a speed previously possible only using custom chips” from specialized hardware providers like Groq, Cerebras, and SambaNova. When compared to other speed-optimized models, the claimed advantage remains significant—Mercury Coder Mini is reportedly about 5.5x faster than Gemini 2.0 Flash-Lite (201 tokens/second) and 18x faster than Claude 3.5 Haiku (61 tokens/second).
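The speed multiples quoted above follow directly from the reported tokens-per-second figures. As a quick check, here is a minimal Python sketch that recomputes them. The variable and function names are illustrative only, and the throughput numbers are the vendor-reported figures cited above, not independent measurements.

```python
# Throughput figures (tokens per second) as quoted in the article.
MERCURY_CODER_MINI_TPS = 1109

comparison_tps = {
    "GPT-4o Mini": 59,
    "Gemini 2.0 Flash-Lite": 201,
    "Claude 3.5 Haiku": 61,
}


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Return how many times faster the first rate is than the second."""
    return fast_tps / slow_tps


# Ratio of Mercury Coder Mini's reported throughput to each comparison model.
speedups = {
    name: speedup(MERCURY_CODER_MINI_TPS, tps)
    for name, tps in comparison_tps.items()
}

for name, ratio in speedups.items():
    print(f"Mercury Coder Mini vs. {name}: {ratio:.1f}x")
```

The ratios come out to roughly 18.8x, 5.5x, and 18.2x, in line with the rounded figures reported above.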

Opening a potential new frontier in LLMs

Diffusion models do involve some trade-offs. They typically need multiple forward passes through the network to generate a complete response, unlike traditional models that need just one pass per token. However, because diffusion models process all tokens in parallel, they achieve higher throughput despite this overhead.
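To make that trade-off concrete, here is a toy Python sketch that counts forward passes under each approach. It is an illustration under heavy assumptions, not how LLaDA or Mercury actually work: `toy_model` is a hypothetical stand-in that always returns the right answer, the three-step schedule is arbitrary, and positions are unmasked left to right for simplicity.

```python
import math

TARGET = "the quick brown fox jumps over the lazy dog".split()
MASK = "[MASK]"


def toy_model(sequence):
    """Hypothetical stand-in for one forward pass.

    Returns a prediction for every position at once. A real model would
    condition on `sequence`; this toy ignores it and returns the target.
    """
    return list(TARGET)


def generate_autoregressive():
    """Produce one new token per forward pass, left to right."""
    passes = 0
    output = []
    while len(output) < len(TARGET):
        prediction = toy_model(output)  # one forward pass...
        passes += 1
        output.append(prediction[len(output)])  # ...yields one new token
    return output, passes


def generate_diffusion(steps):
    """Start fully masked and fill in several positions per forward pass."""
    passes = 0
    sequence = [MASK] * len(TARGET)
    per_step = math.ceil(len(TARGET) / steps)
    while MASK in sequence:
        prediction = toy_model(sequence)  # one pass covers all positions
        passes += 1
        masked = [i for i, tok in enumerate(sequence) if tok == MASK]
        for i in masked[:per_step]:
            sequence[i] = prediction[i]
    return sequence, passes


ar_output, ar_passes = generate_autoregressive()
diff_output, diff_passes = generate_diffusion(steps=3)
print(f"autoregressive passes: {ar_passes}")
print(f"diffusion passes: {diff_passes}")
```

For this nine-word target, the autoregressive loop makes nine passes and the diffusion loop makes three. Each diffusion pass handles every position at once, so pass counts alone do not settle real-world speed.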

Inception thinks the speed advantages could impact code completion tools where instant response may affect developer productivity, conversational AI applications, resource-limited environments like mobile applications, and AI agents that need to respond quickly.

If diffusion-based language models maintain quality while improving speed, they might change how AI text generation develops. So far, AI researchers have been open to new approaches.

Independent AI researcher Simon Willison told Ars Technica, “I love that people are experimenting with alternative architectures to transformers, it’s yet another illustration of how much of the space of LLMs we haven’t even started to explore yet.”

On X, former OpenAI researcher Andrej Karpathy wrote about Inception, “This model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!”

Questions remain about whether larger diffusion models can match the performance of models like GPT-4o and Claude 3.7 Sonnet, and if the approach can handle increasingly complex simulated reasoning tasks. For now, these models offer an alternative for smaller AI language models that doesn’t seem to sacrifice capability for speed.

You can try Mercury Coder yourself on Inception’s demo site, and you can download code for LLaDA or try a demo on Hugging Face.

Procreate defies AI trend, pledges “no generative AI” in its illustration app

adobe, AI, AI art, AI backlash, AI-generated images, Biz & IT, firefly, image synthesis, James Cuda, machine learning, Procreate, Social Media, Stable Diffusion, techlash, X / Paul Patrick / August 20, 2024

Political pixels —

Procreate CEO: “I really f—ing hate generative AI.”

Benj Edwards – Aug 20, 2024 4:52 pm UTC

Still of Procreate CEO James Cuda from a video posted to X.

On Sunday, Procreate announced that it will not incorporate generative AI into its popular iPad illustration app. The decision comes in response to an ongoing backlash from some parts of the art community, which has raised concerns about the ethical implications and potential consequences of AI use in creative industries.

“Generative AI is ripping the humanity out of things,” Procreate wrote on its website. “Built on a foundation of theft, the technology is steering us toward a barren future.”

In a video posted on X, Procreate CEO James Cuda laid out his company’s stance, saying, “We’re not going to be introducing any generative AI into our products. I don’t like what’s happening to the industry, and I don’t like what it’s doing to artists.”

Cuda’s sentiment echoes the fears of some digital artists who feel that AI image synthesis models, often trained on content without consent or compensation, threaten their livelihood and the authenticity of creative work. That’s not a universal sentiment among artists, but AI image synthesis is often a deeply divisive subject on social media, with some taking starkly polarized positions on the topic.

Procreate CEO James Cuda lays out his argument against generative AI in a video posted to X.

Cuda’s video plays on that polarization with clear messaging against generative AI. His statement reads as follows:

You’ve been asking us about AI. You know, I usually don’t like getting in front of the camera. I prefer that our products speak for themselves. I really fucking hate generative AI. I don’t like what’s happening in the industry and I don’t like what it’s doing to artists. We’re not going to be introducing any generative AI into our products. Our products are always designed and developed with the idea that a human will be creating something. You know, we don’t exactly know where this story’s gonna go or how it ends, but we believe that we’re on the right path supporting human creativity.

The debate over generative AI has intensified among some outspoken artists as more companies integrate these tools into their products. Dominant illustration software provider Adobe has tried to avoid ethical concerns by training its Firefly AI models on licensed or public domain content, but some artists have remained skeptical. Adobe Photoshop currently includes a “Generative Fill” feature powered by image synthesis, and the company is also experimenting with video synthesis models.

The backlash against image and video synthesis is not solely focused on creative app developers. Hardware manufacturer Wacom and game publisher Wizards of the Coast have faced criticism and issued apologies after using AI-generated content in their products. Toys “R” Us also faced a negative reaction after debuting an AI-generated commercial. Companies are still grappling with balancing the potential benefits of generative AI with the ethical concerns it raises.

Artists and critics react

A partial screenshot of Procreate’s AI website captured on August 20, 2024.

So far, Procreate’s anti-AI announcement has been met with a largely positive reaction in replies to its social media post. In a widely liked comment, artist Freya Holmér wrote on X, “this is very appreciated, thank you.”

Some of the more outspoken opponents of image synthesis also replied favorably to Procreate’s move. Karla Ortiz, who is a plaintiff in a lawsuit against AI image-generator companies, replied to Procreate’s video on X, “Whatever you need at any time, know I’m here!! Artists support each other, and also support those who allow us to continue doing what we do! So thank you for all you all do and so excited to see what the team does next!”

Artist RJ Palmer, who stoked the first major wave of AI art backlash with a viral tweet in 2022, also replied to Cuda’s video statement, saying, “Now thats the way to send a message. Now if only you guys could get a full power competitor to [Photoshop] on desktop with plugin support. Until someone can build a real competitor to high level [Photoshop] use, I’m stuck with it.”

A few pro-AI users also replied to the X post, including AI-augmented artist Claire Silver, who uses generative AI as an accessibility tool. She wrote on X, “Most of my early work is made with a combination of AI and Procreate. 7 years ago, before text to image was really even a thing. I loved procreate because it used tech to boost accessibility. Like AI, it augmented trad skill to allow more people to create. No rules, only tools.”

Since AI image synthesis continues to be a highly charged subject among some artists, reaffirming support for human-centric creativity could be an effective differentiated marketing move for Procreate, which currently plays underdog to creativity app giant Adobe. While some may prefer to use AI tools, in an (ideally healthy) app ecosystem with personal choice in illustration apps, people can follow their conscience.

Procreate’s anti-AI stance is slightly risky because it might also polarize part of its user base—and if the company changes its mind about including generative AI in the future, it will have to walk back its pledge. But for now, Procreate is confident in its decision: “In this technological rush, this might make us an exception or seem at risk of being left behind,” Procreate wrote. “But we see this road less traveled as the more exciting and fruitful one for our community.”

Artists claim “big” win in copyright suit fighting AI image generators

AI, AI image generators, Artificial Intelligence, copyright infringement, copyright law, deviantART, generative ai, LAION-5b, MidJourney, Policy, runway ai, Stability AI, Stable Diffusion / Paul Patrick / August 14, 2024

Back to the drawing board —

Artists prepare to take on AI image generators as copyright suit proceeds

Ashley Belanger – Aug 14, 2024 9:09 pm UTC

Artists pursuing a class-action lawsuit are claiming a major win this week in their fight to stop the most sophisticated AI image generators from copying billions of artworks to train AI models and replicate their styles without compensating artists.

In an order on Monday, US district judge William Orrick denied key parts of motions to dismiss from Stability AI, Midjourney, Runway AI, and DeviantArt. The court will now allow artists to proceed with discovery on claims that AI image generators relying on Stable Diffusion violate both the Copyright Act and the Lanham Act, which protects artists from commercial misuse of their names and unique styles.

“We won BIG,” an artist plaintiff, Karla Ortiz, wrote on X (formerly Twitter), celebrating the order. “Not only do we proceed on our copyright claims,” but “this order also means companies who utilize” Stable Diffusion models and LAION-like datasets that scrape artists’ works for AI training without permission “could now be liable for copyright infringement violations, amongst other violations.”

Lawyers for the artists, Joseph Saveri and Matthew Butterick, told Ars that artists suing “consider the Court’s order a significant step forward for the case,” as “the Court allowed Plaintiffs’ core copyright-infringement claims against all four defendants to proceed.”

Stability AI was the only company that responded to Ars’ request to comment, but it declined to comment.

Artists prepare to defend their livelihoods from AI

To get to this stage of the suit, artists had to amend their complaint to better explain exactly how AI image generators work to allegedly train on artists’ images and copy artists’ styles.

For example, they were told that if they “contend Stable Diffusion contains ‘compressed copies’ of the Training Images, they need to define ‘compressed copies’ and explain plausible facts in support. And if plaintiffs’ compressed copies theory is based on a contention that Stable Diffusion contains mathematical or statistical methods that can be carried out through algorithms or instructions in order to reconstruct the Training Images in whole or in part to create the new Output Images, they need to clarify that and provide plausible facts in support,” Orrick wrote.

To keep their fight alive, the artists pored through academic articles to support their arguments that “Stable Diffusion is built to a significant extent on copyrighted works and that the way the product operates necessarily invokes copies or protected elements of those works.” Orrick agreed that their amended complaint made plausible inferences that “at this juncture” is enough to support claims “that Stable Diffusion by operation by end users creates copyright infringement and was created to facilitate that infringement by design.”

“Specifically, the Court found Plaintiffs’ theory that image-diffusion models like Stable Diffusion contain compressed copies of their datasets to be plausible,” Saveri and Butterick’s statement to Ars said. “The Court also found it plausible that training, distributing, and copying such models constitute acts of copyright infringement.”

Not all of the artists’ claims survived, with Orrick granting motions to dismiss claims alleging that AI companies removed copyright management information (CMI) from artworks in violation of the Digital Millennium Copyright Act (DMCA). Because artists failed to show evidence of defendants altering or stripping this information, they must permanently drop the DMCA claims.

Part of Orrick’s decision on the DMCA claims, however, indicates that the legal basis for dismissal is “unsettled,” with Orrick simply agreeing with Stability AI’s unsettled argument that “because the output images are admittedly not identical to the Training Images, there can be no liability for any removal of CMI that occurred during the training process.”

Ortiz wrote on X that she respectfully disagreed with that part of the decision but expressed enthusiasm that the court allowed artists to proceed with false endorsement claims, alleging that Midjourney violated the Lanham Act.

Five artists successfully argued that because “their names appeared on the list of 4,700 artists posted by Midjourney’s CEO on Discord” and that list was used to promote “the various styles of artistic works its AI product could produce,” this plausibly created confusion over whether those artists had endorsed Midjourney.

“Whether or not a reasonably prudent consumer would be confused or misled by the Names List and showcase to conclude that the included artists were endorsing the Midjourney product can be tested at summary judgment,” Orrick wrote. “Discovery may show that it is or that is it not.”

While Orrick agreed with Midjourney that “plaintiffs have no protection over ‘simple, cartoony drawings’ or ‘gritty fantasy paintings,'” artists were able to advance a “trade dress” claim under the Lanham Act, too. This is because Midjourney allegedly “allows users to create works capturing the ‘trade dress of each of the Midjourney Named Plaintiffs [that] is inherently distinctive in look and feel as used in connection with their artwork and art products.'”

As discovery proceeds in the case, artists will also have an opportunity to amend dismissed claims of unjust enrichment. According to Orrick, their next amended complaint will be their last chance to prove that AI companies have “deprived plaintiffs ‘the benefit of the value of their works.'”

Saveri and Butterick confirmed that “though the Court dismissed certain supplementary claims, Plaintiffs’ central claims will now proceed to discovery and trial.” On X, Ortiz suggested that the artists’ case is “now potentially one of THE biggest copyright infringement and trade dress cases ever!”

“Looking forward to the next stage of our fight!” Ortiz wrote.

FLUX: This new AI image generator is eerily good at creating human hands

AI, AI image generation, AI image generator, Andreas Blattman, Biz & IT, Black Forest Labs, Dominik Lorenz, FLUX.1, image synthesis, machine learning, Patrick Esser, Robin Rombach, Stability AI, Stable Diffusion, Stable Diffusion 3 / Kris Guyer / August 2, 2024

five-finger salute —

FLUX.1 is the open-weights heir apparent to Stable Diffusion, turning text into images.

Benj Edwards – Aug 2, 2024 5:47 pm UTC

AI-generated image by FLUX.1 dev: “A beautiful queen of the universe holding up her hands, face in the background.”

FLUX.1

On Thursday, AI-startup Black Forest Labs announced the launch of its company and the release of its first suite of text-to-image AI models, called FLUX.1. The German-based company, founded by researchers who developed the technology behind Stable Diffusion and invented the latent diffusion technique, aims to create advanced generative AI for images and videos.

The launch of FLUX.1 comes about seven weeks after Stability AI’s troubled release of Stable Diffusion 3 Medium in mid-June. Stability AI’s offering faced widespread criticism among image-synthesis hobbyists for its poor performance in generating human anatomy, with users sharing examples of distorted limbs and bodies across social media. That problematic launch followed the earlier departure of three key engineers from Stability AI—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—who went on to found Black Forest Labs along with latent diffusion co-developer Patrick Esser and others.

Black Forest Labs launched with the release of three FLUX.1 text-to-image models: a high-end commercial “pro” version, a mid-range “dev” version with open weights for non-commercial use, and a faster open-weights “schnell” version (“schnell” means quick or fast in German). Black Forest Labs claims its models outperform existing options like Midjourney and DALL-E in areas such as image quality and adherence to text prompts.

AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate full of pickles.”
AI-generated image by FLUX.1 dev: A hand holding up five fingers with a starry background.
AI-generated image by FLUX.1 dev: “An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website.”
AI-generated image by FLUX.1 dev: “a boxer posing with fists raised, no gloves.”
AI-generated image by FLUX.1 dev: “An advertisement for ‘Frosted Prick’ cereal.”
AI-generated image of a happy woman in a bakery baking a cake by FLUX.1 dev.
AI-generated image by FLUX.1 dev: “An advertisement for ‘Marshmallow Menace’ cereal.”
AI-generated image of “A handsome Asian influencer on top of the Empire State Building, instagram” by FLUX.1 dev.

In our experience, the outputs of the two higher-end FLUX.1 models are generally comparable with OpenAI’s DALL-E 3 in prompt fidelity, with photorealism that seems close to Midjourney 6. They represent a significant improvement over Stable Diffusion XL, the team’s last major release under Stability (if you don’t count SDXL Turbo).

The FLUX.1 models use what the company calls a “hybrid architecture” combining transformer and diffusion techniques, scaled up to 12 billion parameters. Black Forest Labs said it improves on previous diffusion models by incorporating flow matching and other optimizations.

FLUX.1 seems competent at generating human hands, which was a weak spot in earlier image-synthesis models like Stable Diffusion 1.5 due to a lack of training images that focused on hands. Since those early days, other AI image generators like Midjourney have mastered hands as well, but it’s notable to see an open-weights model that renders hands relatively accurately in various poses.

We downloaded the weights file for the FLUX.1 dev model from GitHub, but at 23GB, it won’t fit in the 12GB of VRAM on our RTX 3060 card, so it will need quantization (which reduces its size) to run locally. According to chatter on Reddit, some people have already had success with that.

Instead, we experimented with FLUX.1 models on AI cloud-hosting platforms Fal and Replicate, which cost money to use, though Fal offers some free credits to start.
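The VRAM constraint described above comes down to simple arithmetic. As a rough sketch, assuming the 12 billion parameters mentioned earlier and counting only the raw weights (no text encoders, activations, or file-format overhead), the following Python estimates the weight size at a few common precisions. The precision list and the helper name are illustrative assumptions, and the figures are estimates rather than measured file sizes.

```python
# Assumptions: 12 billion parameters (from the article) and a 12GB card.
NUM_PARAMS = 12_000_000_000
VRAM_GB = 12

# Bytes needed to store one parameter at each precision (illustrative list).
BYTES_PER_PARAM = {
    "32-bit float": 4.0,
    "16-bit float": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weights_size_gb(num_params: int, bytes_per_param: float) -> float:
    """Estimate raw weight storage in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weights_size_gb(NUM_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, size_gb in estimates.items():
    # Require strictly less than the card's VRAM, since other data needs room too.
    verdict = "fits" if size_gb < VRAM_GB else "does not fit"
    print(f"{name}: ~{size_gb:.0f}GB, {verdict} in {VRAM_GB}GB of VRAM")
```

At 16-bit precision the estimate is about 24GB, in the same range as the 23GB file mentioned above, while a 4-bit quantization would bring the weights down to roughly 6GB.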

Black Forest looks ahead

Black Forest Labs may be a new company, but it’s already attracting funding from investors. It recently closed a $31 million Series Seed funding round led by Andreessen Horowitz, with additional investments from General Catalyst and MätchVC. The company also brought on high-profile advisers, including entertainment executive and former Disney President Michael Ovitz and AI researcher Matthias Bethge.

“We believe that generative AI will be a fundamental building block of all future technologies,” the company stated in its announcement. “By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models.”

AI-generated image by FLUX.1 dev: A cat in a car holding a can of beer that reads, ‘AI Slop.’
AI-generated image by FLUX.1 dev: Mickey Mouse and Spider-Man singing to each other.
AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”
AI-generated image of a flaming cheeseburger created by FLUX.1 dev.
AI-generated image by FLUX.1 dev: “Will Smith eating spaghetti.”
AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads ‘Ars Technica.'”
AI-generated image by FLUX.1 dev: “An advertisement for ‘Burt’s Grenades’ cereal.”
AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe”

Speaking of “trust and safety,” the company did not mention where it obtained the training data that taught the FLUX.1 models how to generate images. Judging by the outputs we could produce with the model that included depictions of copyrighted characters, Black Forest Labs likely used a huge unauthorized image scrape of the Internet, possibly collected by LAION, an organization that collected the datasets that trained Stable Diffusion. This is speculation at this point. While the underlying technological achievement of FLUX.1 is notable, it feels likely that the team is playing fast and loose with the ethics of “fair use” image scraping much like Stability AI did. That practice may eventually attract lawsuits like those filed against Stability AI.

Though text-to-image generation is Black Forest’s current focus, the company plans to expand into video generation next, saying that FLUX.1 will serve as the foundation of a new text-to-video model in development, which will compete with OpenAI’s Sora, Runway’s Gen-3 Alpha, and Kuaishou’s Kling in a contest to warp media reality on demand. “Our video models will unlock precise creation and editing at high definition and unprecedented speed,” the Black Forest announcement claims.

Ridiculed Stable Diffusion 3 release excels at AI-generated body horror

AI, AI image generator, Biz & IT, body horror, image synthesis, machine learning, Stability AI, Stable Diffusion, Stable Diffusion 3 / Kris Guyer / June 12, 2024

unstable diffusion —

Users react to mangled SD3 generations and ask, “Is this release supposed to be a joke?”

Benj Edwards – Jun 12, 2024 7:26 pm UTC

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wild anatomically incorrect visual abominations with ease.

A thread on Reddit, titled, “Is this release supposed to be a joke? [SD3-2B],” details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled, “Why is SD3 so bad at generating girls lying on the grass?” shows similar issues, but for entire human bodies.

Hands have traditionally been a challenge for AI image generators due to lack of good examples in early training data sets, but more recently, several image-synthesis models seemed to have overcome the issue. In that sense, SD3 appears to be a huge step backward for the image-synthesis enthusiasts that gather on Reddit—especially compared to recent Stability releases like SD XL Turbo in November.

“It wasn’t too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” wrote one Reddit user.

An AI-generated image created using Stable Diffusion 3 Medium.
An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.
An AI-generated image created using Stable Diffusion 3 that shows mangled hands.
An AI-generated SD3 Medium image a Reddit user made with the prompt “woman wearing a dress on the beach.”
An AI-generated SD3 Medium image a Reddit user made with the prompt “photograph of a person napping in a living room.”

AI image fans are so far blaming Stable Diffusion 3’s anatomy fails on Stability’s insistence on filtering out adult content (often called “NSFW” content) from the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also gets rid of human anatomy, so… that’s what happened,” wrote one Reddit user in the thread.

Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.

The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content that contains nudity can severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some abilities lost by strongly filtering NSFW content.

Another issue that can occur during model pre-training is that sometimes the NSFW filter researchers use to remove adult images from the dataset is too picky, accidentally removing images that might not be offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw,” wrote one Redditor on the topic.

Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.

A SD3 Medium example we generated with the prompt “A woman lying on the beach.”
A SD3 Medium example we generated with the prompt “A man showing his hands.”

Stability AI
A SD3 Medium example we generated with the prompt “A woman showing her hands.”

Stability AI
A SD3 Medium example we generated with the prompt “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”
A SD3 Medium example we generated with the prompt “A cat in a car holding a can of beer.”

Stability first announced Stable Diffusion 3 in February, and the company has planned to make it available in a variety of different model sizes. Today’s release is for the “Medium” version, which is a 2 billion-parameter model. In addition to the weights being available on Hugging Face, they are also available for experimentation through the company’s Stability Platform. The weights are available for download and use for free under a non-commercial license only.

Soon after its February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back due to technical issues or mismanagement. Stability AI as a company fell into a tailspin recently with the resignation of its founder and CEO, Emad Mostaque, in March and then a series of layoffs. Just prior to that, three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—left the company. And its troubles go back even farther, with news of the company’s dire financial position lingering since 2023.

To some Stable Diffusion fans, the failures with Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement—and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I guess now they can go bankrupt in a safe and ethically [sic] way, after all.”


“CSAM generated by AI is still CSAM,” DOJ says after rare arrest

child abuse, csam, Instagram, LAION, Meta, Policy, runway ml, Stability AI, Stable Diffusion, stable diffusion 1.5, telegram, us department of justice / Paul Patrick / May 21, 2024

The US Department of Justice has started cracking down on the use of AI image generators to produce child sexual abuse materials (CSAM).

On Monday, the DOJ arrested Steven Anderegg, a 42-year-old “extremely technologically savvy” Wisconsin man who allegedly used Stable Diffusion to create “thousands of realistic images of prepubescent minors,” which were then distributed on Instagram and Telegram.

The cops were tipped off to Anderegg’s alleged activities after Instagram flagged direct messages that were sent on Anderegg’s Instagram account to a 15-year-old boy. Instagram reported the messages to the National Center for Missing and Exploited Children (NCMEC), which subsequently alerted law enforcement.

During the Instagram exchange, the DOJ found that Anderegg sent sexually explicit AI images of minors soon after the teen made his age known, alleging that “the only reasonable explanation for sending these images was to sexually entice the child.”

According to the DOJ’s indictment, Anderegg is a software engineer with “professional experience working with AI.” Because of his “special skill” in generative AI (GenAI), he was allegedly able to generate the CSAM using a version of Stable Diffusion, “along with a graphical user interface and special add-ons created by other Stable Diffusion users that specialized in producing genitalia.”

After Instagram reported Anderegg’s messages to the minor, cops seized Anderegg’s laptop and found “over 13,000 GenAI images, with hundreds—if not thousands—of these images depicting nude or semi-clothed prepubescent minors lasciviously displaying or touching their genitals” or “engaging in sexual intercourse with men.”

In his messages to the teen, Anderegg seemingly “boasted” about his skill in generating CSAM, the indictment said. The DOJ alleged that evidence from his laptop showed that Anderegg “used extremely specific and explicit prompts to create these images,” including “specific ‘negative’ prompts—that is, prompts that direct the GenAI model on what not to include in generated content—to avoid creating images that depict adults.” These go-to prompts were stored on his computer, the DOJ alleged.

Anderegg is currently in federal custody and has been charged with production, distribution, and possession of AI-generated CSAM, as well as “transferring obscene material to a minor under the age of 16,” the indictment said.

Because the DOJ suspected that Anderegg intended to use the AI-generated CSAM to groom a minor, the DOJ is arguing that there are “no conditions of release” that could prevent him from posing a “significant danger” to his community while the court mulls his case. The DOJ warned the court that it’s highly likely that any future contact with minors could go unnoticed, as Anderegg is seemingly tech-savvy enough to hide any future attempts to send minors AI-generated CSAM.

“He studied computer science and has decades of experience in software engineering,” the indictment said. “While computer monitoring may address the danger posed by less sophisticated offenders, the defendant’s background provides ample reason to conclude that he could sidestep such restrictions if he decided to. And if he did, any reoffending conduct would likely go undetected.”

If convicted of all four counts, he could face “a total statutory maximum penalty of 70 years in prison and a mandatory minimum of five years in prison,” the DOJ said. Partly because of “special skill in GenAI,” the DOJ—which described its evidence against Anderegg as “strong”—suggested that they may recommend a sentencing range “as high as life imprisonment.”

Announcing Anderegg’s arrest, Deputy Attorney General Lisa Monaco made it clear that creating AI-generated CSAM is illegal in the US.

“Technology may change, but our commitment to protecting children will not,” Monaco said. “The Justice Department will aggressively pursue those who produce and distribute child sexual abuse material—or CSAM—no matter how that material was created. Put simply, CSAM generated by AI is still CSAM, and we will hold accountable those who exploit AI to create obscene, abusive, and increasingly photorealistic images of children.”


After AI-generated porn report, Washington Lottery pulls down interactive web app

AI, dall-e, Images, lottery, porn, Stable Diffusion, Washington State / Kris Guyer / April 4, 2024

You could be a winner! —

User says promo site put her uploaded selfie on a topless woman’s body.

Kyle Orland – Apr 4, 2024 7:17 pm UTC

A user of the Washington Lottery’s “Test Drive a Win” website says it used AI to generate (the unredacted version of) this image with her face on a topless body.

The Washington State Lottery has taken down a promotional AI-powered web app after a local mother reported that the site generated an image with her face on the body of a topless woman.

The lottery’s “Test Drive a Win” website was designed to help visitors visualize various dream vacations they could pay for with their theoretical lottery winnings. The site included the ability to upload a headshot that would be integrated into an AI-generated tableau of what you might look like on that vacation.

But Megan (last name not given), a 50-year-old from Olympia suburb Tumwater, told conservative Seattle radio host Jason Rantz that the image of her “swim with the sharks” dream vacation on the website showed her face atop a woman sitting on a bed with her breasts exposed. The background of the AI-generated image seems to show the bed in some sort of aquarium, complete with fish floating through the air and sprawling undersea flora sitting awkwardly behind the pillows.

The corner of the image features the Washington Lottery logo.

“Our tax dollars are paying for that! I was completely shocked. It’s disturbing to say the least,” Megan told Rantz. “I also think whoever was responsible for it should be fired.”

“We don’t want something like this purported event to happen again”

The non-functional “Test Drive a Win” website as it appeared Thursday.

In a statement provided to Ars Technica, a Washington Lottery spokesperson said that the lottery “worked closely with the developers of the AI platform to establish strict parameters to govern image creation.” Despite this, the spokesperson said they were notified earlier this week that “a single user of the AI platform was purportedly provided an image that did not adhere to those guidelines.”

Despite what the spokesperson said were “thousands” of inoffensive images that the site generated in over a month, the spokesperson said that “one purported user is too many and as a result we have shut down the site” as of Tuesday.

The spokesperson did not respond to specific questions about which AI models or third-party vendors may have been used to create the site or on the specific safeguards that were crafted in an attempt to prevent results like the one reported by Megan.

Speaking to Rantz, a lottery spokesperson said the organization had “agreed to a comprehensive set of rules” for the site’s AI images, “including that people in images be fully clothed.” Following the report of the topless image, the spokesperson said they “had the developers check all the parameters for the platform.” And while they were “comfortable with the settings,” the spokesperson told Rantz they “chose to take down the site out of an abundance of caution, as we don’t want something like this purported event to happen again.”

Not a quick fix?

On his radio show, Rantz expressed surprise that the lottery couldn’t keep the site operational after rejiggering the AI’s safety settings. “In my head I was thinking, well, presumably once they heard about this they went back to the backend guidelines and just made sure it said, ‘Hey, no breasts, no full-frontal nudity,’ those kinds of things, and then they fixed it, and then they went on with their day,” Rantz said.

But it might not be that simple to effectively rein in the endless variety of visual output an AI model can generate. While models like Stable Diffusion and DALL-E have filters in place to prevent the generation of sexual or violent images, researchers have found that those models still responded to problematic prompts by generating images that were judged as “unsafe” by an image classifier a significant minority of the time. Malicious users can also use prompt-engineering tricks to get around these built-in safeguards when using popular text-based image-generation models.

We’ve seen these kinds of AI image-safety issues blow back on major corporations, too, as when Facebook’s AI sticker generator put weapons in the hands of children’s cartoon characters. More recently, a Microsoft engineer publicly accused the company’s Copilot image-generation tool of randomly creating violent and sexual imagery even after the team was warned of the issue.

The Washington Lottery’s AI issue comes a week after a report found a New York City government chatbot confabulating incorrect advice about city laws and regulations. “It’s wrong in some areas and we gotta fix it,” New York City Mayor Eric Adams said this week. “Any time you use technology, you need to put it in the real environment to iron out the kinks. You can’t live in a lab. You can’t stay in a lab forever.”


Image-scraping Midjourney bans rival AI firm for scraping images

AI, Biz & IT, image synthesis, machine learning, MidJourney, Stable Diffusion / Rejus Almole / March 12, 2024

Irony lives —

Midjourney pins blame for 24-hour outage on “bot-net like” activity from Stability AI employee.

Benj Edwards – Mar 11, 2024 9:42 pm UTC

A burglar with a flashlight and papers in a business office—exactly like scraping files from Discord.

On Wednesday, Midjourney banned all employees from image synthesis rival Stability AI from its service indefinitely after it detected “botnet-like” activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney’s official Discord channel.

Prompts are the written instructions (like “a cat in a car holding a can of a beer”) used by generative AI models such as Midjourney and Stability AI’s Stable Diffusion 3 (SD3) to synthesize images. Having prompt and image pairs could potentially help the training or fine-tuning of a rival AI image generator model.

Bot activity that took place around midnight on March 2 caused a 24-hour outage for the commercial image generator service. Midjourney linked several paid accounts with a Stability AI data team employee trying to “grab prompt and image pairs.” Midjourney then made a decision to ban all Stability AI employees from the service indefinitely. It also indicated a new policy: “aggressive automation or taking down the service results in banning all employees of the responsible company.”

A screenshot of the “Midjourney Office Hours” notes posted on March 6, 2024.

Midjourney

Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. “It turns out that generative AI companies don’t like it when you steal, sorry, scrape, images from them. Cue the world’s smallest violin.”

Users of Midjourney pay a monthly subscription fee to access an AI image generator that turns written prompts into lush computer-synthesized images. The bot that makes them was trained on millions of artistic works created by humans—it’s a practice that has been claimed to be disrespectful to artists. “Words can’t describe how dehumanizing it is to see my name used 20,000+ times in MidJourney,” wrote artist Jingna Zhang in a recent viral tweet. “My life’s work and who I am—reduced to meaningless fodder for a commercial image slot machine.”

Stability responds

Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, “sent you some information to help with your internal investigation.”

In a text message exchange with Ars Technica, Mostaque said, “We checked and there were no images scraped there, there was a bot run by a team member that was collecting prompts for a personal project though. We aren’t sure how that would cause a gallery site outage but are sorry if it did, Midjourney is great.”

Besides, Mostaque says, his company doesn’t need Midjourney’s data anyway. “We have been using synthetic & other data given SD3 outperforms all other models,” he wrote on X. In conversation with Ars, Mostaque similarly wanted to contrast his company’s data collection techniques with those of his rival. “We only scrape stuff that has proper robots.txt and is permissive,” Mostaque says. “And also did full opt-out for [Stable Diffusion 3] and Stable Cascade leveraging work Spawning did.”

When asked about Stability’s relationship with Midjourney these days, Mostaque played down the rivalry. “No real overlap, we get on fine though,” he told Ars and emphasized a key link in their histories. “I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta.”


Stability announces Stable Diffusion 3, a next-gen AI image generator

AI, AI image generators, Biz & IT, dall-e, DALL-E 3, Deepfakes, Emad Mostaque, image synthesis, machine learning, open weights, openai, SDXL, source available, Stability AI, Stable Diffusion, Stable Diffusion 3, Stable Diffusion XL / Rejus Almole / February 23, 2024

Pics and it didn’t happen —

SD3 may bring DALL-E-like prompt fidelity to an open-weights image-synthesis model.

Benj Edwards – Feb 22, 2024 9:28 pm UTC

Stable Diffusion 3 generation with the prompt: studio photograph closeup of a chameleon over a black background.

On Thursday, Stability AI announced Stable Diffusion 3, an open-weights next-generation image-synthesis model. It follows its predecessors by reportedly generating detailed, multi-subject images with improved quality and accuracy in text generation. The brief announcement was not accompanied by a public demo, but Stability is opening up a waitlist today for those who would like to try it.

Stability says that its Stable Diffusion 3 family of models (which take text descriptions called “prompts” and turn them into matching images) ranges in size from 800 million to 8 billion parameters. The size range allows different versions of the model to run locally on a variety of devices—from smartphones to servers. Parameter count roughly corresponds to model capability in terms of how much detail it can generate. Larger models also require more VRAM on GPU accelerators to run.
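To make the link between parameter count and VRAM concrete, here is a rough back-of-the-envelope sketch. It assumes 16-bit weights (2 bytes per parameter), which is our assumption and not something Stability has stated, and it counts only the weights of the image model itself. Activations, text encoders, and any other components would add to the total, so treat these figures as a lower bound.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold a model's weights, in gigabytes.

    bytes_per_param=2 assumes 16-bit storage (an assumption for illustration);
    use 4 for 32-bit weights.
    """
    return num_params * bytes_per_param / 1e9


# The two ends of the SD3 size range quoted by Stability.
SMALLEST = 800_000_000
LARGEST = 8_000_000_000

for label, n in [("800 million", SMALLEST), ("8 billion", LARGEST)]:
    print(f"{label} parameters: about {weight_memory_gb(n):.1f} GB of weights at 16-bit")
```

Under those assumptions, the tenfold spread in parameter count becomes a tenfold spread in weight storage, which is why the smallest model is a plausible fit for a phone while the largest needs a high-memory GPU.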

Since 2022, we’ve seen Stability launch a progression of AI image-generation models: Stable Diffusion 1.4, 1.5, 2.0, 2.1, XL, XL Turbo, and now 3. Stability has made a name for itself as providing a more open alternative to proprietary image-synthesis models like OpenAI’s DALL-E 3, though not without controversy due to the use of copyrighted training data, bias, and the potential for abuse. (This has led to lawsuits that are unresolved.) Stable Diffusion models have been open-weights and source-available, which means the models can be run locally and fine-tuned to change their outputs.

Stable Diffusion 3 generation with the prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made out of colorful energy.
An AI-generated image of a grandma wearing a “Go big or go home sweatshirt” generated by Stable Diffusion 3.
Stable Diffusion 3 generation with the prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.
An AI-generated image created by Stable Diffusion 3.
Stable Diffusion 3 generation with the prompt: A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background.
Stable Diffusion 3 generation with the prompt: Moody still life of assorted pumpkins.
Stable Diffusion 3 generation with the prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion.”
Stable Diffusion 3 generation with the prompt: Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.
Stable Diffusion 3 generation with the prompt: Photo of an 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall.

As far as tech improvements are concerned, Stability CEO Emad Mostaque wrote on X, “This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements. This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs.”

Like Mostaque said, the Stable Diffusion 3 family uses diffusion transformer architecture, which is a new way of creating images with AI that swaps out the usual image-building blocks (such as U-Net architecture) for a system that works on small pieces of the picture. The method was inspired by transformers, which are good at handling patterns and sequences. This approach not only scales up efficiently but also reportedly produces higher-quality images.

Stable Diffusion 3 also utilizes “flow matching,” which is a technique for creating AI models that can generate images by learning how to transition from random noise to a structured image smoothly. It does this without needing to simulate every step of the process, instead focusing on the overall direction or flow that the image creation should follow.
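For readers who want a feel for the idea, below is a toy sketch of one common flow-matching formulation: a straight-line path between a noise sample and a data sample. The announcement does not say which variant Stable Diffusion 3 uses, so the linear path, the function names, and the four-number "image" here are all illustrative assumptions. In a real system, a neural network would be trained to predict the velocity; this sketch substitutes the exact answer to show what a perfectly trained model would do.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)            # random noise (starting point)
x1 = np.array([1.0, -2.0, 0.5, 3.0])   # stand-in for a "structured image"


def point_on_path(x0, x1, t):
    """Position at time t (0..1) on the straight line from noise to data."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Direction of travel along that line; this is the training target."""
    return x1 - x0


def euler_sample(x_start, velocity_fn, steps=10):
    """Generate a sample by following the velocity field in small steps."""
    x = x_start.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Hypothetical "perfect model": it returns the true velocity for this pair.
result = euler_sample(x0, lambda x, t: target_velocity(x0, x1))
print(np.allclose(result, x1))
```

Because the path is a straight line, the velocity is constant and even a handful of steps lands on the target, which hints at why this style of training can reduce the number of sampling steps needed.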

A comparison of outputs between OpenAI’s DALL-E 3 and Stable Diffusion 3 with the prompt, “Night photo of a sports car with the text ‘SD3’ on the side, the car is on a race track at high speed, a huge road sign with the text ‘faster.’”

We do not have access to Stable Diffusion 3 (SD3), but from samples we found posted on Stability’s website and associated social media accounts, the generations appear roughly comparable to other state-of-the-art image-synthesis models at the moment, including the aforementioned DALL-E 3, Adobe Firefly, Imagine with Meta AI, Midjourney, and Google Imagen.

SD3 appears to handle text generation very well in the examples provided by others, which are potentially cherry-picked. Text generation was a particular weakness of earlier image-synthesis models, so an improvement to that capability in a free model is a big deal. Also, prompt fidelity (how closely it follows descriptions in prompts) seems to be similar to DALL-E 3, but we haven’t tested that ourselves yet.

While Stable Diffusion 3 isn’t widely available, Stability says that once testing is complete, its weights will be free to download and run locally. “This preview phase, as with previous models,” Stability writes, “is crucial for gathering insights to improve its performance and safety ahead of an open release.”

Stability has been experimenting with a variety of image-synthesis architectures recently. Aside from SDXL and SDXL Turbo, just last week, the company announced Stable Cascade, which uses a three-stage process for text-to-image synthesis.

Listing image by Emad Mostaque (Stability AI)


Reddit sells training data to unnamed AI company ahead of IPO

AI, Axel Springer, Biz & IT, Bloomberg, chatgpt, chatgtp, image synthesis, large language models, machine learning, openai, reddit, Stable Diffusion, steve huffman, text synthesis / Mike M. / February 20, 2024

Everything has a price —

If you’ve posted on Reddit, you’re likely feeding the future of AI.

Benj Edwards – Feb 19, 2024 9:10 pm UTC

In this photo illustration the American social news

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that it planned to charge AI companies for access to its almost two decades’ worth of human-generated content.

If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.

While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.


OpenAI collapses media reality with Sora, a photorealistic AI video generator

AI, Ai video, Ai video generator, Biz & IT, chatgpt, chatgtp, cultural singularity, dall-e, DALL-E 3, deepfake, Deepfakes, image synthesis, machine learning, openai, OpenAI Sora, Sora, Stable Diffusion, video generator, video synthesis, will smith / Rejus Almole / February 17, 2024

Pics and it didn’t happen —

Hello, cultural singularity—soon, every video you see online could be completely fake.

Benj Edwards – Feb 16, 2024 5:23 pm UTC

Snapshots from three videos generated using OpenAI’s Sora.

On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.

“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.

“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.

“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.

For future reference—since this type of panic will some day appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.

The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.

In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.

How (we think) they pulled it off

AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.

Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?

OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.

OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. To generate a video, it starts off with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerges.
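For readers who want a feel for what "removing the noise over many steps" means, here is a toy Python sketch. It is not OpenAI's method: the function names, the step count, and the schedule are all made up for illustration, and the "denoiser" cheats by already knowing the clean answer. In a real diffusion model, that role is played by a trained neural network that estimates the noise from the noisy input and the text prompt.

```python
import random


def toy_denoiser(noisy, target):
    """Stand-in for a trained network: return the estimated noise in each value.

    Assumption for illustration only: this version cheats by knowing the
    clean target. A real model has to predict the noise instead.
    """
    return [n - t for n, t in zip(noisy, target)]


def generate(target, steps=50, seed=0):
    """Start from pure noise and remove a little of it at every step."""
    rng = random.Random(seed)
    # One Gaussian noise value per "pixel" of the thing we want to generate.
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        predicted_noise = toy_denoiser(sample, target)
        # Hypothetical schedule: strip a growing fraction of the estimated
        # noise, reaching the full amount on the final step.
        fraction = 1.0 / (steps - step)
        sample = [s - fraction * e for s, e in zip(sample, predicted_noise)]
    return sample


clean = [0.0, 0.5, 1.0, 0.25]
result = generate(clean)
print([round(v, 3) for v in result])
```

The loop is the part that carries over to the real thing: begin with random values, repeatedly estimate what is noise, and subtract some of it until a clean result remains.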

Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.

OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.
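The patch idea can be illustrated with a short Python sketch. Everything here is an assumption for the sake of the example: the helper names, the 2×2×2 patch size, and the choice to work on raw pixel values in nested lists. It only shows how clips of different lengths and shapes can be cut into same-sized chunks, so that each clip becomes a sequence of uniform pieces, much as a sentence becomes a sequence of tokens.

```python
def make_video(frames, height, width):
    """Build a dummy grayscale video as nested lists: video[t][y][x]."""
    return [[[(t * height + y) * width + x for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


def to_patches(video, pt=2, ph=2, pw=2):
    """Cut a video into non-overlapping blocks of pt frames x ph rows x pw
    columns, flattening each block into one list of values (one "patch")."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                patches.append([video[t][y][x]
                                for t in range(t0, t0 + pt)
                                for y in range(y0, y0 + ph)
                                for x in range(x0, x0 + pw)])
    return patches


short_square = to_patches(make_video(frames=4, height=4, width=4))
long_wide = to_patches(make_video(frames=8, height=4, width=8))
print(len(short_square), len(short_square[0]))  # 8 8
print(len(long_wide), len(long_wide[0]))        # 32 8
```

The two clips differ in duration and aspect ratio, yet both end up as lists of identically sized patches; only the number of patches changes. That is the property the company's quote points to when it talks about training on video of varying durations, resolutions, and aspect ratios.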

An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions, generated by another AI model like GPT-4V, that describe scenes in the training data. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”

One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.
