Stability AI

Terminator’s Cameron joins AI company behind controversial image generator

AI, Biz & IT, image synthesis, James Cameron, machine learning, SKYNET, Stability AI, Terminator, video synthesis / Kris Guyer / September 24, 2024

a net in the sky —

Famed sci-fi director joins board of embattled Stability AI, creator of Stable Diffusion.

Benj Edwards – Sep 24, 2024 7:38 pm UTC

Filmmaker James Cameron.

On Tuesday, Stability AI announced that renowned filmmaker James Cameron—of Terminator and Skynet fame—has joined its board of directors. Stability is best known for its pioneering but highly controversial Stable Diffusion series of AI image-synthesis models, first launched in 2022, which can generate images based on text descriptions.

“I’ve spent my career seeking out emerging technologies that push the very boundaries of what’s possible, all in the service of telling incredible stories,” said Cameron in a statement. “I was at the forefront of CGI over three decades ago, and I’ve stayed on the cutting edge since. Now, the intersection of generative AI and CGI image creation is the next wave.”

Cameron is perhaps best known as the director behind blockbusters like Avatar, Titanic, and Aliens, but in AI circles, he may be most relevant for the co-creation of the character Skynet, a fictional AI system that triggers nuclear Armageddon and dominates humanity in the Terminator media franchise. Similar fears of AI taking over the world have since jumped into reality and recently sparked attempts to regulate existential risk from AI systems through measures like SB-1047 in California.

In a 2023 interview with CTV News, Cameron referenced The Terminator’s release year when asked about AI’s dangers: “I warned you guys in 1984, and you didn’t listen,” he said. “I think the weaponization of AI is the biggest danger. I think that we will get into the equivalent of a nuclear arms race with AI, and if we don’t build it, the other guys are for sure going to build it, and so then it’ll escalate.”

Hollywood goes AI

Of course, Stability AI isn’t building weapons controlled by AI. Instead, Cameron’s interest in cutting-edge filmmaking techniques apparently drew him to the company.

“James Cameron lives in the future and waits for the rest of us to catch up,” said Stability CEO Prem Akkaraju. “Stability AI’s mission is to transform visual media for the next century by giving creators a full stack AI pipeline to bring their ideas to life. We have an unmatched advantage to achieve this goal with a technological and creative visionary like James at the highest levels of our company. This is not only a monumental statement for Stability AI, but the AI industry overall.”

Cameron joins other recent additions to Stability AI’s board, including Sean Parker, former president of Facebook, who serves as executive chairman. Parker called Cameron’s appointment “the start of a new chapter” for the company.

Despite significant protest from actors’ unions last year, elements of Hollywood are seemingly beginning to embrace generative AI. Last Wednesday, we covered a deal between Lionsgate and AI video-generation company Runway that will see the creation of a custom AI model for film production use. In March, the Financial Times reported that OpenAI was actively showing off its Sora video synthesis model to studio executives.

Unstable times for Stability AI

Cameron’s appointment to the Stability AI board comes during a tumultuous period for the company. Stability AI has faced a series of challenges this past year, including an ongoing class-action copyright lawsuit, a troubled Stable Diffusion 3 model launch, significant leadership and staff changes, and ongoing financial concerns.

In March, founder and CEO Emad Mostaque resigned, followed by a round of layoffs. This came on the heels of the departure of three key engineers, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, who have since founded Black Forest Labs and released a new open-weights image-synthesis model called Flux, which has begun to take over the r/StableDiffusion community on Reddit.

Despite the issues, Stability AI claims its models are widely used, with Stable Diffusion reportedly surpassing 150 million downloads. The company states that thousands of businesses use its models in their creative workflows.

While Stable Diffusion has indeed spawned a large community of open-weights-AI image enthusiasts online, it has also been a lightning rod for controversy among some artists because Stability originally trained its models on hundreds of millions of images scraped from the Internet without seeking licenses or permission to use them.

Apparently that association is not a concern for Cameron, according to his statement: “The convergence of these two totally different engines of creation [CGI and generative AI] will unlock new ways for artists to tell stories in ways we could have never imagined. Stability AI is poised to lead this transformation.”


Artists claim “big” win in copyright suit fighting AI image generators

AI, AI image generators, Artificial Intelligence, copyright infringement, copyright law, deviantART, generative ai, LAION-5b, MidJourney, Policy, runway ai, Stability AI, Stable Diffusion / Paul Patrick / August 14, 2024

Back to the drawing board —

Artists prepare to take on AI image generators as copyright suit proceeds

Ashley Belanger – Aug 14, 2024 9:09 pm UTC

Artists pursuing a class-action lawsuit are claiming a major win this week in their fight to stop the most sophisticated AI image generators from copying billions of artworks to train AI models and replicate their styles without compensating artists.

In an order on Monday, US District Judge William Orrick denied key parts of motions to dismiss from Stability AI, Midjourney, Runway AI, and DeviantArt. The court will now allow artists to proceed with discovery on claims that AI image generators relying on Stable Diffusion violate both the Copyright Act and the Lanham Act, which protects artists from commercial misuse of their names and unique styles.

“We won BIG,” an artist plaintiff, Karla Ortiz, wrote on X (formerly Twitter), celebrating the order. “Not only do we proceed on our copyright claims,” but “this order also means companies who utilize” Stable Diffusion models and LAION-like datasets that scrape artists’ works for AI training without permission “could now be liable for copyright infringement violations, amongst other violations.”

Lawyers for the artists, Joseph Saveri and Matthew Butterick, told Ars that artists suing “consider the Court’s order a significant step forward for the case,” as “the Court allowed Plaintiffs’ core copyright-infringement claims against all four defendants to proceed.”

Stability AI was the only company that responded to Ars’ request for comment, but it declined to comment.

Artists prepare to defend their livelihoods from AI

To get to this stage of the suit, artists had to amend their complaint to better explain exactly how AI image generators work to allegedly train on artists’ images and copy artists’ styles.

For example, they were told that if they “contend Stable Diffusion contains ‘compressed copies’ of the Training Images, they need to define ‘compressed copies’ and explain plausible facts in support. And if plaintiffs’ compressed copies theory is based on a contention that Stable Diffusion contains mathematical or statistical methods that can be carried out through algorithms or instructions in order to reconstruct the Training Images in whole or in part to create the new Output Images, they need to clarify that and provide plausible facts in support,” Orrick wrote.

To keep their fight alive, the artists pored through academic articles to support their arguments that “Stable Diffusion is built to a significant extent on copyrighted works and that the way the product operates necessarily invokes copies or protected elements of those works.” Orrick agreed that their amended complaint made plausible inferences that “at this juncture” are enough to support claims “that Stable Diffusion by operation by end users creates copyright infringement and was created to facilitate that infringement by design.”

“Specifically, the Court found Plaintiffs’ theory that image-diffusion models like Stable Diffusion contain compressed copies of their datasets to be plausible,” Saveri and Butterick’s statement to Ars said. “The Court also found it plausible that training, distributing, and copying such models constitute acts of copyright infringement.”

Not all of the artists’ claims survived, with Orrick granting motions to dismiss claims alleging that AI companies removed copyright management information (CMI) from artworks in violation of the Digital Millennium Copyright Act (DMCA). Because artists failed to show evidence of defendants altering or stripping this information, they must permanently drop the DMCA claims.

Part of Orrick’s decision on the DMCA claims, however, indicates that the legal basis for dismissal is “unsettled,” with Orrick simply agreeing with Stability AI’s unsettled argument that “because the output images are admittedly not identical to the Training Images, there can be no liability for any removal of CMI that occurred during the training process.”

Ortiz wrote on X that she respectfully disagreed with that part of the decision but expressed enthusiasm that the court allowed artists to proceed with false endorsement claims, alleging that Midjourney violated the Lanham Act.

Five artists successfully argued that because “their names appeared on the list of 4,700 artists posted by Midjourney’s CEO on Discord” and that list was used to promote “the various styles of artistic works its AI product could produce,” this plausibly created confusion over whether those artists had endorsed Midjourney.

“Whether or not a reasonably prudent consumer would be confused or misled by the Names List and showcase to conclude that the included artists were endorsing the Midjourney product can be tested at summary judgment,” Orrick wrote. “Discovery may show that it is or that is it not.”

While Orrick agreed with Midjourney that “plaintiffs have no protection over ‘simple, cartoony drawings’ or ‘gritty fantasy paintings,’” artists were able to advance a “trade dress” claim under the Lanham Act, too. This is because Midjourney allegedly “allows users to create works capturing the ‘trade dress of each of the Midjourney Named Plaintiffs [that] is inherently distinctive in look and feel as used in connection with their artwork and art products.’”

As discovery proceeds in the case, artists will also have an opportunity to amend dismissed claims of unjust enrichment. According to Orrick, their next amended complaint will be their last chance to prove that AI companies have “deprived plaintiffs ‘the benefit of the value of their works.’”

Saveri and Butterick confirmed that “though the Court dismissed certain supplementary claims, Plaintiffs’ central claims will now proceed to discovery and trial.” On X, Ortiz suggested that the artists’ case is “now potentially one of THE biggest copyright infringement and trade dress cases ever!”

“Looking forward to the next stage of our fight!” Ortiz wrote.


FLUX: This new AI image generator is eerily good at creating human hands

AI, AI image generation, AI image generator, Andreas Blattman, Biz & IT, Black Forest Labs, Dominik Lorenz, FLUX.1, image synthesis, machine learning, Patrick Esser, Robin Rombach, Stability AI, Stable Diffusion, Stable Diffusion 3 / Kris Guyer / August 2, 2024

five-finger salute —

FLUX.1 is the open-weights heir apparent to Stable Diffusion, turning text into images.

Benj Edwards – Aug 2, 2024 5:47 pm UTC

AI-generated image by FLUX.1 dev: “A beautiful queen of the universe holding up her hands, face in the background.”

FLUX.1

On Thursday, AI startup Black Forest Labs announced the launch of its company and the release of its first suite of text-to-image AI models, called FLUX.1. The Germany-based company, founded by researchers who developed the technology behind Stable Diffusion and invented the latent diffusion technique, aims to create advanced generative AI for images and videos.

The launch of FLUX.1 comes about seven weeks after Stability AI’s troubled release of Stable Diffusion 3 Medium in mid-June. Stability AI’s offering faced widespread criticism among image-synthesis hobbyists for its poor performance in generating human anatomy, with users sharing examples of distorted limbs and bodies across social media. That problematic launch followed the earlier departure of three key engineers from Stability AI—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—who went on to found Black Forest Labs along with latent diffusion co-developer Patrick Esser and others.

Black Forest Labs launched with the release of three FLUX.1 text-to-image models: a high-end commercial “pro” version, a mid-range “dev” version with open weights for non-commercial use, and a faster open-weights “schnell” version (“schnell” means quick or fast in German). Black Forest Labs claims its models outperform existing options like Midjourney and DALL-E in areas such as image quality and adherence to text prompts.

AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate full of pickles.”
AI-generated image by FLUX.1 dev: A hand holding up five fingers with a starry background.
AI-generated image by FLUX.1 dev: “An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website.”
AI-generated image by FLUX.1 dev: “a boxer posing with fists raised, no gloves.”
AI-generated image by FLUX.1 dev: “An advertisement for ‘Frosted Prick’ cereal.”
AI-generated image of a happy woman in a bakery baking a cake by FLUX.1 dev.
AI-generated image by FLUX.1 dev: “An advertisement for ‘Marshmallow Menace’ cereal.”
AI-generated image of “A handsome Asian influencer on top of the Empire State Building, instagram” by FLUX.1 dev.

In our experience, the outputs of the two higher-end FLUX.1 models are generally comparable with OpenAI’s DALL-E 3 in prompt fidelity, with photorealism that seems close to Midjourney 6. They represent a significant improvement over Stable Diffusion XL, the team’s last major release under Stability (if you don’t count SDXL Turbo).

The FLUX.1 models use what the company calls a “hybrid architecture” combining transformer and diffusion techniques, scaled up to 12 billion parameters. Black Forest Labs said it improves on previous diffusion models by incorporating flow matching and other optimizations.

FLUX.1 seems competent at generating human hands, which was a weak spot in earlier image-synthesis models like Stable Diffusion 1.5 due to a lack of training images that focused on hands. Since those early days, other AI image generators like Midjourney have mastered hands as well, but it’s notable to see an open-weights model that renders hands relatively accurately in various poses.

We downloaded the weights file for the FLUX.1 dev model from GitHub, but at 23GB, it won’t fit in the 12GB of VRAM on our RTX 3060 card. Running it locally will require quantization, which reduces the model’s size, and chatter on Reddit suggests that some people have already had success with that approach.
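
As a rough back-of-the-envelope check on why quantization matters here, the sketch below estimates the size of the weights alone from the 12 billion parameter count cited earlier. The 16-bit, 8-bit, and 4-bit storage widths are assumptions chosen for illustration (the announcement does not say how the downloaded file stores its weights), the function name is hypothetical, and the estimate ignores activations, text encoders, and other runtime overhead.

```python
# Rough estimate of how much memory a model's weights occupy at a given precision.
# Assumption: every parameter is stored at the same bit width. Real checkpoints
# may mix precisions and bundle extra components, so treat this as a ballpark.

def weights_size_gb(num_params: int, bits_per_param: int) -> float:
    """Return the approximate size of the weights in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


FLUX_PARAMS = 12_000_000_000  # "12 billion parameters," per the article

if __name__ == "__main__":
    for bits in (16, 8, 4):  # hypothetical storage widths
        size = weights_size_gb(FLUX_PARAMS, bits)
        print(f"{bits}-bit weights: about {size:.0f} GB")
```

Under these assumptions, 16-bit weights come to about 24 GB, in the same range as the 23GB file, while 4-bit weights would need about 6 GB before any other memory use is counted.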

Instead, we experimented with FLUX.1 models on AI cloud-hosting platforms Fal and Replicate, which cost money to use, though Fal offers some free credits to start.

Black Forest looks ahead

Black Forest Labs may be a new company, but it’s already attracting funding from investors. It recently closed a $31 million Series Seed funding round led by Andreessen Horowitz, with additional investments from General Catalyst and MätchVC. The company also brought on high-profile advisers, including entertainment executive and former Disney President Michael Ovitz and AI researcher Matthias Bethge.

“We believe that generative AI will be a fundamental building block of all future technologies,” the company stated in its announcement. “By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models.”

AI-generated image by FLUX.1 dev: A cat in a car holding a can of beer that reads, ‘AI Slop.’
AI-generated image by FLUX.1 dev: Mickey Mouse and Spider-Man singing to each other.
AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”
AI-generated image of a flaming cheeseburger created by FLUX.1 dev.
AI-generated image by FLUX.1 dev: “Will Smith eating spaghetti.”
AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads ‘Ars Technica.’”
AI-generated image by FLUX.1 dev: “An advertisement for ‘Burt’s Grenades’ cereal.”
AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe”

Speaking of “trust and safety,” the company did not mention where it obtained the training data that taught the FLUX.1 models how to generate images. Judging by the outputs we could produce with the model that included depictions of copyrighted characters, Black Forest Labs likely used a huge unauthorized image scrape of the Internet, possibly collected by LAION, an organization that collected the datasets that trained Stable Diffusion. This is speculation at this point. While the underlying technological achievement of FLUX.1 is notable, it feels likely that the team is playing fast and loose with the ethics of “fair use” image scraping much like Stability AI did. That practice may eventually attract lawsuits like those filed against Stability AI.

Though text-to-image generation is Black Forest’s current focus, the company plans to expand into video generation next, saying that FLUX.1 will serve as the foundation of a new text-to-video model in development, which will compete with OpenAI’s Sora, Runway’s Gen-3 Alpha, and Kuaishou’s Kling in a contest to warp media reality on demand. “Our video models will unlock precise creation and editing at high definition and unprecedented speed,” the Black Forest announcement claims.


Ridiculed Stable Diffusion 3 release excels at AI-generated body horror

AI, AI image generator, Biz & IT, body horror, image synthesis, machine learning, Stability AI, Stable Diffusion, Stable Diffusion 3 / Kris Guyer / June 12, 2024

unstable diffusion —

Users react to mangled SD3 generations and ask, “Is this release supposed to be a joke?”

Benj Edwards – Jun 12, 2024 7:26 pm UTC

An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.

On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wild, anatomically incorrect visual abominations with ease.

A thread on Reddit, titled, “Is this release supposed to be a joke? [SD3-2B],” details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread, titled, “Why is SD3 so bad at generating girls lying on the grass?” shows similar issues, but for entire human bodies.

Hands have traditionally been a challenge for AI image generators due to a lack of good examples in early training data sets, but more recently, several image-synthesis models seem to have overcome the issue. In that sense, SD3 appears to be a huge step backward for the image-synthesis enthusiasts who gather on Reddit, especially compared to recent Stability releases like SD XL Turbo in November.

“It wasn’t too long ago that StableDiffusion was competing with Midjourney, now it just looks like a joke in comparison. At least our datasets are safe and ethical!” wrote one Reddit user.

An AI-generated image created using Stable Diffusion 3 Medium.
An AI-generated image created using Stable Diffusion 3 of a girl lying in the grass.
An AI-generated image created using Stable Diffusion 3 that shows mangled hands.
An AI-generated SD3 Medium image a Reddit user made with the prompt “woman wearing a dress on the beach.”
An AI-generated SD3 Medium image a Reddit user made with the prompt “photograph of a person napping in a living room.”

AI image fans are so far blaming Stable Diffusion 3’s anatomy fails on Stability’s insistence on filtering out adult content (often called “NSFW” content) from the SD3 training data that teaches the model how to generate images. “Believe it or not, heavily censoring a model also gets rid of human anatomy, so… that’s what happened,” wrote one Reddit user in the thread.

Basically, any time a user prompt homes in on a concept that isn’t represented well in the AI model’s training dataset, the image-synthesis model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying.

The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans well, and AI researchers soon discovered that censoring adult content that contains nudity can severely hamper an AI model’s ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some abilities lost by strongly filtering NSFW content.

Another issue that can occur during model pre-training is that sometimes the NSFW filter researchers use to remove adult images from the dataset is too picky, accidentally removing images that might not be offensive and depriving the model of depictions of humans in certain situations. “[SD3] works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw,” wrote one Redditor on the topic.

Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt “a man showing his hands” returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.

An SD3 Medium example we generated with the prompt “A woman lying on the beach.”
An SD3 Medium example we generated with the prompt “A man showing his hands.”

Stability AI
An SD3 Medium example we generated with the prompt “A woman showing her hands.”

Stability AI
An SD3 Medium example we generated with the prompt “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”
An SD3 Medium example we generated with the prompt “A cat in a car holding a can of beer.”

Stability first announced Stable Diffusion 3 in February, and the company has planned to make it available in a variety of different model sizes. Today’s release is for the “Medium” version, which is a 2 billion-parameter model. In addition to the weights being available on Hugging Face, they are also available for experimentation through the company’s Stability Platform. The weights are available for download and use for free under a non-commercial license only.

Soon after its February announcement, delays in releasing the SD3 model weights inspired rumors that the release was being held back due to technical issues or mismanagement. Stability AI as a company fell into a tailspin recently with the resignation of its founder and CEO, Emad Mostaque, in March and then a series of layoffs. Just prior to that, three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—left the company. And its troubles go back even farther, with news of the company’s dire financial position lingering since 2023.

To some Stable Diffusion fans, the failures with Stable Diffusion 3 Medium are a visual manifestation of the company’s mismanagement—and an obvious sign of things falling apart. Although the company has not filed for bankruptcy, some users made dark jokes about the possibility after seeing SD3 Medium:

“I guess now they can go bankrupt in a safe and ethically [sic] way, after all.”


“CSAM generated by AI is still CSAM,” DOJ says after rare arrest

child abuse, csam, Instagram, LAION, Meta, Policy, runway ml, Stability AI, Stable Diffusion, stable diffusion 1.5, telegram, us department of justice / Paul Patrick / May 21, 2024

The US Department of Justice has started cracking down on the use of AI image generators to produce child sexual abuse materials (CSAM).

On Monday, the DOJ arrested Steven Anderegg, a 42-year-old “extremely technologically savvy” Wisconsin man who allegedly used Stable Diffusion to create “thousands of realistic images of prepubescent minors,” which were then distributed on Instagram and Telegram.

The cops were tipped off to Anderegg’s alleged activities after Instagram flagged direct messages that were sent on Anderegg’s Instagram account to a 15-year-old boy. Instagram reported the messages to the National Center for Missing and Exploited Children (NCMEC), which subsequently alerted law enforcement.

During the Instagram exchange, the DOJ found that Anderegg sent sexually explicit AI images of minors soon after the teen made his age known, alleging that “the only reasonable explanation for sending these images was to sexually entice the child.”

According to the DOJ’s indictment, Anderegg is a software engineer with “professional experience working with AI.” Because of his “special skill” in generative AI (GenAI), he was allegedly able to generate the CSAM using a version of Stable Diffusion, “along with a graphical user interface and special add-ons created by other Stable Diffusion users that specialized in producing genitalia.”

After Instagram reported Anderegg’s messages to the minor, cops seized Anderegg’s laptop and found “over 13,000 GenAI images, with hundreds—if not thousands—of these images depicting nude or semi-clothed prepubescent minors lasciviously displaying or touching their genitals” or “engaging in sexual intercourse with men.”

In his messages to the teen, Anderegg seemingly “boasted” about his skill in generating CSAM, the indictment said. The DOJ alleged that evidence from his laptop showed that Anderegg “used extremely specific and explicit prompts to create these images,” including “specific ‘negative’ prompts—that is, prompts that direct the GenAI model on what not to include in generated content—to avoid creating images that depict adults.” These go-to prompts were stored on his computer, the DOJ alleged.

Anderegg is currently in federal custody and has been charged with production, distribution, and possession of AI-generated CSAM, as well as “transferring obscene material to a minor under the age of 16,” the indictment said.

Because the DOJ suspected that Anderegg intended to use the AI-generated CSAM to groom a minor, the DOJ is arguing that there are “no conditions of release” that could prevent him from posing a “significant danger” to his community while the court mulls his case. The DOJ warned the court that it’s highly likely that any future contact with minors could go unnoticed, as Anderegg is seemingly tech-savvy enough to hide any future attempts to send minors AI-generated CSAM.

“He studied computer science and has decades of experience in software engineering,” the indictment said. “While computer monitoring may address the danger posed by less sophisticated offenders, the defendant’s background provides ample reason to conclude that he could sidestep such restrictions if he decided to. And if he did, any reoffending conduct would likely go undetected.”

If convicted of all four counts, he could face “a total statutory maximum penalty of 70 years in prison and a mandatory minimum of five years in prison,” the DOJ said. Partly because of Anderegg’s “special skill in GenAI,” the DOJ, which described its evidence against Anderegg as “strong,” suggested that it may recommend a sentencing range “as high as life imprisonment.”

Announcing Anderegg’s arrest, Deputy Attorney General Lisa Monaco made it clear that creating AI-generated CSAM is illegal in the US.

“Technology may change, but our commitment to protecting children will not,” Monaco said. “The Justice Department will aggressively pursue those who produce and distribute child sexual abuse material—or CSAM—no matter how that material was created. Put simply, CSAM generated by AI is still CSAM, and we will hold accountable those who exploit AI to create obscene, abusive, and increasingly photorealistic images of children.”


US lawmaker proposes a public database of all AI training material

AI, Artificial Intelligence, copyright law, generative ai, generative ai disclosure act, openai, Policy, Stability AI, US Copyright Office / Paul Patrick / April 11, 2024

Who’s got the receipts? —

Proposed law would require more transparency from AI companies.

Ashley Belanger – Apr 11, 2024 8:09 pm UTC

Amid a flurry of lawsuits over AI models’ training data, US Representative Adam Schiff (D-Calif.) has introduced a bill that would require AI companies to disclose exactly which copyrighted works are included in datasets training AI systems.

The Generative AI Disclosure Act “would require a notice to be submitted to the Register of Copyrights prior to the release of a new generative AI system with regard to all copyrighted works used in building or altering the training dataset for that system,” Schiff said in a press release.

The bill is retroactive and would apply to all AI systems available today, as well as to all AI systems to come. It would take effect 180 days after it’s enacted, requiring anyone who creates or alters a training set not only to list works referenced by the dataset, but also to provide a URL to the dataset within 30 days before the AI system is released to the public. That URL would presumably give creators a way to double-check if their materials have been used and seek any credit or compensation available before the AI tools are in use.
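
To make the two time periods in that paragraph concrete, here is a minimal date-arithmetic sketch. The enactment and release dates are hypothetical, as are the function names, and the code only computes the calendar dates 180 days after enactment and 30 days before release. Whether the 30-day mark is a deadline or the start of a filing window depends on the bill’s exact wording, which this sketch does not attempt to interpret.

```python
from datetime import date, timedelta

# Periods described in the article; how the bill applies them in detail is
# not modeled here.
EFFECTIVE_DELAY = timedelta(days=180)  # "180 days after it's enacted"
NOTICE_LEAD = timedelta(days=30)       # "30 days before the AI system is released"


def effective_date(enacted: date) -> date:
    """Date on which the act would take effect, 180 days after enactment."""
    return enacted + EFFECTIVE_DELAY


def thirty_days_before_release(release: date) -> date:
    """Calendar date falling 30 days before a planned public release."""
    return release - NOTICE_LEAD


if __name__ == "__main__":
    enacted = date(2025, 1, 1)  # hypothetical enactment date
    release = date(2025, 9, 1)  # hypothetical public release date
    print("Takes effect:", effective_date(enacted))
    print("30 days before release:", thirty_days_before_release(release))
```

With these made-up dates, the act would take effect on June 30, 2025, and the 30-day mark before a September 1 release would fall on August 2.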

All notices would be kept in a publicly available online database.

Schiff described the act as championing “innovation while safeguarding the rights and contributions of creators, ensuring they are aware when their work contributes to AI training datasets.”

“This is about respecting creativity in the age of AI and marrying technological progress with fairness,” Schiff said.

Currently, creators who don’t have access to training datasets rely on AI models’ outputs to figure out if their copyrighted works may have been included in training various AI systems. The New York Times, for example, prompted ChatGPT to spit out excerpts of its articles, relying on a tactic to identify training data by asking ChatGPT to produce lines from specific articles, which OpenAI has curiously described as “hacking.”

Under Schiff’s proposed law, The New York Times would only need to consult the database to identify all articles used to train ChatGPT or any other AI system.

Any AI maker who violates the act would risk a “civil penalty in an amount not less than $5,000,” the proposed bill said.

At a hearing on artificial intelligence and intellectual property, Rep. Darrell Issa (R-Calif.)—who chairs the House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet—told Schiff that his subcommittee would consider the “thoughtful” bill.

Schiff told the subcommittee that the bill is “only a first step” toward “ensuring that at a minimum” creators are “aware of when their work contributes to AI training datasets,” saying that he would “welcome the opportunity to work with members of the subcommittee” on advancing the bill.

“The rapid development of generative AI technologies has outpaced existing copyright laws, which has led to widespread use of creative content to train generative AI models without consent or compensation,” Schiff warned at the hearing.

In Schiff’s press release, Meredith Stiehm, president of the Writers Guild of America West, joined leaders from other creative groups celebrating the bill as an “important first step” for rightsholders.

“Greater transparency and guardrails around AI are necessary to protect writers and other creators” and address “the unprecedented and unauthorized use of copyrighted materials to train generative AI systems,” Stiehm said.

Until the thorniest AI copyright questions are settled, Ken Doroshow, a chief legal officer for the Recording Industry Association of America, suggested that Schiff’s bill filled an important gap by introducing “comprehensive and transparent recordkeeping” that would provide “one of the most fundamental building blocks of effective enforcement of creators’ rights.”

A senior adviser for the Human Artistry Campaign, Moiya McTier, went further, celebrating the bill as stopping AI companies from “exploiting” artists and creators.

“AI companies should stop hiding the ball when they copy creative works into AI systems and embrace clear rules of the road for recordkeeping that create a level and transparent playing field for the development and licensing of genuinely innovative applications and tools,” McTier said.

AI copyright guidance coming soon

While courts weigh copyright questions raised by artists, book authors, and newspapers, the US Copyright Office announced in March that it would be issuing guidance later this year, but the office does not seem to be prioritizing questions on AI training.

Instead, the Copyright Office will focus first on issuing guidance on deepfakes and AI outputs. This spring, the office will release a report “analyzing the impact of AI on copyright” of “digital replicas, or the use of AI to digitally replicate individuals’ appearances, voices, or other aspects of their identities.” Over the summer, another report will focus on “the copyrightability of works incorporating AI-generated material.”

Regarding “the topic of training AI models on copyrighted works as well as any licensing considerations and liability issues,” the Copyright Office did not provide a timeline for releasing guidance, only confirming that its “goal is to finalize the entire report by the end of the fiscal year.”

Once guidance is available, it could sway court opinions, although courts do not necessarily have to apply Copyright Office guidance when weighing cases.

The Copyright Office’s aspirational timeline does appear to run ahead of when at least some courts are expected to decide the biggest copyright questions facing creators. The class-action lawsuit brought by book authors against OpenAI, for example, is not expected to be resolved until February 2025, and the New York Times’ lawsuit is likely on a similar timeline. However, artists suing Stability AI face a hearing on that AI company’s motion to dismiss this May.

Stability announces Stable Diffusion 3, a next-gen AI image generator

AI, AI image generators, Biz & IT, dall-e, DALL-E 3, Deepfakes, Emad Mostaque, image synthesis, machine learning, open weights, openai, SDXL, source available, Stability AI, Stable Diffusion, Stable Diffusion 3, Stable Diffusion XL / Rejus Almole / February 23, 2024

Pics and it didn’t happen —

SD3 may bring DALL-E-like prompt fidelity to an open-weights image-synthesis model.

Benj Edwards – Feb 22, 2024 9:28 pm UTC

Stable Diffusion 3 generation with the prompt: studio photograph closeup of a chameleon over a black background.

On Thursday, Stability AI announced Stable Diffusion 3, an open-weights next-generation image-synthesis model. It follows its predecessors by reportedly generating detailed, multi-subject images with improved quality and accuracy in text generation. The brief announcement was not accompanied by a public demo, but Stability is opening up a waitlist today for those who would like to try it.

Stability says that its Stable Diffusion 3 family of models (which take text descriptions called “prompts” and turn them into matching images) ranges in size from 800 million to 8 billion parameters. The size range allows different versions of the model to run locally on a variety of devices, from smartphones to servers. Parameter count roughly corresponds to model capability in terms of how much detail it can generate. Larger models also require more VRAM on GPU accelerators to run.
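
The link between parameter count and memory can be sketched as rough arithmetic. The example below is only a back-of-the-envelope estimate: it assumes weights are stored as 16-bit floats (2 bytes per parameter) and counts nothing else, so it ignores activations, text encoders, and any quantization Stability may apply. The function name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed just to hold model weights, in GiB.

    Assumption: every parameter takes `bytes_per_param` bytes
    (2 for 16-bit floats, 4 for 32-bit floats). Real VRAM use is higher.
    """
    return num_params * bytes_per_param / 2**30


# The two ends of the size range quoted by Stability.
for label, num_params in [("800 million", 800_000_000), ("8 billion", 8_000_000_000)]:
    print(f"{label} parameters: about {weight_memory_gib(num_params):.1f} GiB at 16-bit")
```

Under these assumptions, the smallest model's weights need about 1.5 GiB and the largest about 14.9 GiB, which is consistent with the smaller variants being aimed at phones and the larger ones at desktop GPUs and servers.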

Since 2022, we’ve seen Stability launch a progression of AI image-generation models: Stable Diffusion 1.4, 1.5, 2.0, 2.1, XL, XL Turbo, and now 3. Stability has made a name for itself by providing a more open alternative to proprietary image-synthesis models like OpenAI’s DALL-E 3, though not without controversy due to the use of copyrighted training data, bias, and the potential for abuse. (This has led to lawsuits that are unresolved.) Stable Diffusion models have been open-weights and source-available, which means the models can be run locally and fine-tuned to change their outputs.

Stable Diffusion 3 generation with the prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made out of colorful energy.
An AI-generated image of a grandma wearing a “Go big or go home” sweatshirt, generated by Stable Diffusion 3.
Stable Diffusion 3 generation with the prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.
An AI-generated image created by Stable Diffusion 3.
Stable Diffusion 3 generation with the prompt: A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background.
Stable Diffusion 3 generation with the prompt: Moody still life of assorted pumpkins.
Stable Diffusion 3 generation with the prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion.”
Stable Diffusion 3 generation with the prompt: Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.
Stable Diffusion 3 generation with the prompt: Photo of an 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall.

As far as tech improvements are concerned, Stability CEO Emad Mostaque wrote on X, “This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements. This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs.”

As Mostaque said, the Stable Diffusion 3 family uses a diffusion transformer architecture, which is a new way of creating images with AI that swaps out the usual image-building blocks (such as U-Net architecture) for a system that works on small pieces of the picture. The method was inspired by transformers, which are good at handling patterns and sequences. This approach not only scales up efficiently but also reportedly produces higher-quality images.
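
To make "works on small pieces of the picture" concrete, here is a toy sketch of the patch-splitting step that transformer-based image models generally start with. It is not Stability's code: the 4×4 grid, the patch size of 2, and the function name `patchify` are illustrative assumptions, and real models split multi-channel latent tensors and then pass each patch through learned layers.

```python
def patchify(image, patch):
    """Split a 2-D grid (a list of rows) into non-overlapping patch x patch tiles.

    Each tile is flattened into one list, giving the sequence of "tokens"
    that a transformer would process.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ]
            tokens.append(tile)
    return tokens


# Toy 4x4 "image" whose pixel values are 0..15, read left to right, top to bottom.
grid = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(grid, 2)
print(len(tokens), "tokens:", tokens)
```

The 4×4 grid becomes a sequence of four tokens of four values each. Treating an image as a sequence like this is what lets a transformer handle it the way it would handle words in a sentence.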

Stable Diffusion 3 also utilizes “flow matching,” which is a technique for creating AI models that can generate images by learning how to transition from random noise to a structured image smoothly. It does this without needing to simulate every step of the process, instead focusing on the overall direction or flow that the image creation should follow.
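
The idea can be sketched in a few lines. The example below assumes the simplest possible path, a straight line from a noise sample to a data sample; the article does not say which path Stable Diffusion 3 uses, and all function names here are hypothetical. During training, the model is asked to predict the direction of travel at a single random point on the line, so no full trajectory has to be simulated. Generation then follows the predicted direction in a handful of steps.

```python
def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Direction of travel along the straight line; constant for every t."""
    return [d - n for n, d in zip(noise, data)]


def flow_matching_loss(predict_velocity, noise, data, t):
    """Mean squared error between predicted and true velocity at one time t."""
    x_t = interpolate(noise, data, t)
    predicted = predict_velocity(x_t, t)
    target = target_velocity(noise, data)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(predict_velocity, noise, steps=4):
    """Generate a sample by following the predicted velocity from t=0 to t=1."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        velocity = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


# Toy two-number "image" and a fixed "noise" sample.
data = [1.0, -2.0]
noise = [0.5, 0.25]

# Stand-in for a perfectly trained network: it returns the true velocity.
def oracle(x, t):
    return target_velocity(noise, data)

loss = flow_matching_loss(oracle, noise, data, t=0.3)
sample = euler_sample(oracle, noise)
print("loss:", loss, "sample:", sample)
```

With the oracle standing in for a trained network, the loss is zero and the sampler lands on the data point. A real model would replace `oracle` with a neural network trained to minimize this loss over many noise and image pairs.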

A comparison of outputs between OpenAI’s DALL-E 3 and Stable Diffusion 3 with the prompt, “Night photo of a sports car with the text ‘SD3’ on the side, the car is on a race track at high speed, a huge road sign with the text ‘faster.’”

We do not have access to Stable Diffusion 3 (SD3), but from samples we found posted on Stability’s website and associated social media accounts, the generations appear roughly comparable to other state-of-the-art image-synthesis models at the moment, including the aforementioned DALL-E 3, Adobe Firefly, Imagine with Meta AI, Midjourney, and Google Imagen.

SD3 appears to handle text generation very well in the examples provided by others, which are potentially cherry-picked. Text generation was a particular weakness of earlier image-synthesis models, so an improvement to that capability in a free model is a big deal. Also, prompt fidelity (how closely it follows descriptions in prompts) seems to be similar to DALL-E 3, but we haven’t tested that ourselves yet.

While Stable Diffusion 3 isn’t widely available, Stability says that once testing is complete, its weights will be free to download and run locally. “This preview phase, as with previous models,” Stability writes, “is crucial for gathering insights to improve its performance and safety ahead of an open release.”

Stability has been experimenting with a variety of image-synthesis architectures recently. Aside from SDXL and SDXL Turbo, just last week, the company announced Stable Cascade, which uses a three-stage process for text-to-image synthesis.

Listing image by Emad Mostaque (Stability AI)
