image synthesis

ten-months-after-first-tease,-openai-launches-sora-video-generation-publicly

Ten months after first tease, OpenAI launches Sora video generation publicly

A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”

During a livestream on Monday—during Day 3 of OpenAI’s “12 days of OpenAi”—Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.

They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.

Safety measures and limitations

In addition to the release, OpenAI also publish Sora’s System Card for the first time. It includes technical details about how the model works and safety testing the company undertook prior to this release.

“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”

Sora also makes use of a “recaptioning technique”—similar to that seen in the company’s DALL-E 3 image generation, to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.

Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”

OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.

The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.

In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.

Ten months after first tease, OpenAI launches Sora video generation publicly Read More »

your-ai-clone-could-target-your-family,-but-there’s-a-simple-defense

Your AI clone could target your family, but there’s a simple defense

The warning extends beyond voice scams. The FBI announcement details how criminals also use AI models to generate convincing profile photos, identification documents, and chatbots embedded in fraudulent websites. These tools automate the creation of deceptive content while reducing previously obvious signs of humans behind the scams, like poor grammar or obviously fake photos.

Much like we warned in 2022 in a piece about life-wrecking deepfakes based on publicly available photos, the FBI also recommends limiting public access to recordings of your voice and images online. The bureau suggests making social media accounts private and restricting followers to known contacts.

Origin of the secret word in AI

To our knowledge, we can trace the first appearance of the secret word in the context of modern AI voice synthesis and deepfakes back to an AI developer named Asara Near, who first announced the idea on Twitter on March 27, 2023.

“(I)t may be useful to establish a ‘proof of humanity’ word, which your trusted contacts can ask you for,” Near wrote. “(I)n case they get a strange and urgent voice or video call from you this can help assure them they are actually speaking with you, and not a deepfaked/deepcloned version of you.”

Since then, the idea has spread widely. In February, Rachel Metz covered the topic for Bloomberg, writing, “The idea is becoming common in the AI research community, one founder told me. It’s also simple and free.”

Of course, passwords have been used since ancient times to verify someone’s identity, and it seems likely some science fiction story has dealt with the issue of passwords and robot clones in the past. It’s interesting that, in this new age of high-tech AI identity fraud, this ancient invention—a special word or phrase known to few—can still prove so useful.

Your AI clone could target your family, but there’s a simple defense Read More »

new-zemeckis-film-used-ai-to-de-age-tom-hanks-and-robin-wright

New Zemeckis film used AI to de-age Tom Hanks and Robin Wright

On Friday, TriStar Pictures released Here, a $50 million Robert Zemeckis-directed film that used real time generative AI face transformation techniques to portray actors Tom Hanks and Robin Wright across a 60-year span, marking one of Hollywood’s first full-length features built around AI-powered visual effects.

The film adapts a 2014 graphic novel set primarily in a New Jersey living room across multiple time periods. Rather than cast different actors for various ages, the production used AI to modify Hanks’ and Wright’s appearances throughout.

The de-aging technology comes from Metaphysic, a visual effects company that creates real time face swapping and aging effects. During filming, the crew watched two monitors simultaneously: one showing the actors’ actual appearances and another displaying them at whatever age the scene required.

Here – Official Trailer (HD)

Metaphysic developed the facial modification system by training custom machine-learning models on frames of Hanks’ and Wright’s previous films. This included a large dataset of facial movements, skin textures, and appearances under varied lighting conditions and camera angles. The resulting models can generate instant face transformations without the months of manual post-production work traditional CGI requires.

Unlike previous aging effects that relied on frame-by-frame manipulation, Metaphysic’s approach generates transformations instantly by analyzing facial landmarks and mapping them to trained age variations.

“You couldn’t have made this movie three years ago,” Zemeckis told The New York Times in a detailed feature about the film. Traditional visual effects for this level of face modification would reportedly require hundreds of artists and a substantially larger budget closer to standard Marvel movie costs.

This isn’t the first film that has used AI techniques to de-age actors. ILM’s approach to de-aging Harrison Ford in 2023’s Indiana Jones and the Dial of Destiny used a proprietary system called Flux with infrared cameras to capture facial data during filming, then old images of Ford to de-age him in post-production. By contrast, Metaphysic’s AI models process transformations without additional hardware and show results during filming.

New Zemeckis film used AI to de-age Tom Hanks and Robin Wright Read More »

deepfake-lovers-swindle-victims-out-of-$46m-in-hong-kong-ai-scam

Deepfake lovers swindle victims out of $46M in Hong Kong AI scam

The police operation resulted in the seizure of computers, mobile phones, and about $25,756 in suspected proceeds and luxury watches from the syndicate’s headquarters. Police said that victims originated from multiple countries, including Hong Kong, mainland China, Taiwan, India, and Singapore.

A widening real-time deepfake problem

Realtime deepfakes have become a growing problem over the past year. In August, we covered a free app called Deep-Live-Cam that can do real-time face-swaps for video chat use, and in February, the Hong Kong office of British engineering firm Arup lost $25 million in an AI-powered scam in which the perpetrators used deepfakes of senior management during a video conference call to trick an employee into transferring money.

News of the scam also comes amid recent warnings from the United Nations Office on Drugs and Crime, notes The Record in a report about the recent scam ring. The agency released a report last week highlighting tech advancements among organized crime syndicates in Asia, specifically mentioning the increasing use of deepfake technology in fraud.

The UN agency identified more than 10 deepfake software providers selling their services on Telegram to criminal groups in Southeast Asia, showing the growing accessibility of this technology for illegal purposes.

Some companies are attempting to find automated solutions to the issues presented by AI-powered crime, including Reality Defender, which creates software that attempts to detect deepfakes in real time. Some deepfake detection techniques may work at the moment, but as the fakes improve in realism and sophistication, we may be looking at an escalating arms race between those who seek to fool others and those who want to prevent deception.

Deepfake lovers swindle victims out of $46M in Hong Kong AI scam Read More »

is-china-pulling-ahead-in-ai-video-synthesis?-we-put-minimax-to-the-test

Is China pulling ahead in AI video synthesis? We put Minimax to the test

In the spirit of not cherry-picking any results, everything you see was the first generation we received for the prompt listed above it.

“A highly intelligent person reading ‘Ars Technica’ on their computer when the screen explodes”

“A cat in a car drinking a can of beer, beer commercial”

“Will Smith eating spaghetti

“Robotic humanoid animals with vaudeville costumes roam the streets collecting protection money in tokens”

“A basketball player in a haunted passenger train car with a basketball court, and he is playing against a team of ghosts”

“A herd of one million cats running on a hillside, aerial view”

“Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy”

“A muscular barbarian breaking a CRT television set with a weapon, cinematic, 8K, studio lighting”

Limitations of video synthesis models

Overall, the Minimax video-01 results seen above feel fairly similar to Gen-3’s outputs, with some differences, like the lack of a celebrity filter on Will Smith (who sadly did not actually eat the spaghetti in our tests), and the more realistic cat hands and licking motion. Some results were far worse, like the one million cats and the Ars Technica reader.

Is China pulling ahead in AI video synthesis? We put Minimax to the test Read More »

terminator’s-cameron-joins-ai-company-behind-controversial-image-generator

Terminator’s Cameron joins AI company behind controversial image generator

a net in the sky —

Famed sci-fi director joins board of embattled Stability AI, creator of Stable Diffusion.

A photo of filmmaker James Cameron.

Enlarge / Filmmaker James Cameron.

On Tuesday, Stability AI announced that renowned filmmaker James Cameron—of Terminator and Skynet fame—has joined its board of directors. Stability is best known for its pioneering but highly controversial Stable Diffusion series of AI image-synthesis models, first launched in 2022, which can generate images based on text descriptions.

“I’ve spent my career seeking out emerging technologies that push the very boundaries of what’s possible, all in the service of telling incredible stories,” said Cameron in a statement. “I was at the forefront of CGI over three decades ago, and I’ve stayed on the cutting edge since. Now, the intersection of generative AI and CGI image creation is the next wave.”

Cameron is perhaps best known as the director behind blockbusters like Avatar, Titanic, and Aliens, but in AI circles, he may be most relevant for the co-creation of the character Skynet, a fictional AI system that triggers nuclear Armageddon and dominates humanity in the Terminator media franchise. Similar fears of AI taking over the world have since jumped into reality and recently sparked attempts to regulate existential risk from AI systems through measures like SB-1047 in California.

In a 2023 interview with CTV news, Cameron referenced The Terminator‘s release year when asked about AI’s dangers: “I warned you guys in 1984, and you didn’t listen,” he said. “I think the weaponization of AI is the biggest danger. I think that we will get into the equivalent of a nuclear arms race with AI, and if we don’t build it, the other guys are for sure going to build it, and so then it’ll escalate.”

Hollywood goes AI

Of course, Stability AI isn’t building weapons controlled by AI. Instead, Cameron’s interest in cutting-edge filmmaking techniques apparently drew him to the company.

“James Cameron lives in the future and waits for the rest of us to catch up,” said Stability CEO Prem Akkaraju. “Stability AI’s mission is to transform visual media for the next century by giving creators a full stack AI pipeline to bring their ideas to life. We have an unmatched advantage to achieve this goal with a technological and creative visionary like James at the highest levels of our company. This is not only a monumental statement for Stability AI, but the AI industry overall.”

Cameron joins other recent additions to Stability AI’s board, including Sean Parker, former president of Facebook, who serves as executive chairman. Parker called Cameron’s appointment “the start of a new chapter” for the company.

Despite significant protest from actors’ unions last year, elements of Hollywood are seemingly beginning to embrace generative AI over time. Last Wednesday, we covered a deal between Lionsgate and AI video-generation company Runway that will see the creation of a custom AI model for film production use. In March, the Financial Times reported that OpenAI was actively showing off its Sora video synthesis model to studio executives.

Unstable times for Stability AI

Cameron’s appointment to the Stability AI board comes during a tumultuous period for the company. Stability AI has faced a series of challenges this past year, including an ongoing class-action copyright lawsuit, a troubled Stable Diffusion 3 model launch, significant leadership and staff changes, and ongoing financial concerns.

In March, founder and CEO Emad Mostaque resigned, followed by a round of layoffs. This came on the heels of the departure of three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz, who have since founded Black Forest Labs and released a new open-weights image-synthesis model called Flux, which has begun to take over the r/StableDiffusion community on Reddit.

Despite the issues, Stability AI claims its models are widely used, with Stable Diffusion reportedly surpassing 150 million downloads. The company states that thousands of businesses use its models in their creative workflows.

While Stable Diffusion has indeed spawned a large community of open-weights-AI image enthusiasts online, it has also been a lightning rod for controversy among some artists because Stability originally trained its models on hundreds of millions of images scraped from the Internet without seeking licenses or permission to use them.

Apparently that association is not a concern for Cameron, according to his statement: “The convergence of these two totally different engines of creation [CGI and generative AI] will unlock new ways for artists to tell stories in ways we could have never imagined. Stability AI is poised to lead this transformation.”

Terminator’s Cameron joins AI company behind controversial image generator Read More »

due-to-ai-fakes,-the-“deep-doubt”-era-is-here

Due to AI fakes, the “deep doubt” era is here

A person writing

Memento | Aurich Lawson

Given the flood of photorealistic AI-generated images washing over social media networks like X and Facebook these days, we’re seemingly entering a new age of media skepticism: the era of what I’m calling “deep doubt.” While questioning the authenticity of digital content stretches back decades—and analog media long before that—easy access to tools that generate convincing fake content has led to a new wave of liars using AI-generated scenes to deny real documentary evidence. Along the way, people’s existing skepticism toward online content from strangers may be reaching new heights.

Deep doubt is skepticism of real media that stems from the existence of generative AI. This manifests as broad public skepticism toward the veracity of media artifacts, which in turn leads to a notable consequence: People can now more credibly claim that real events did not happen and suggest that documentary evidence was fabricated using AI tools.

The concept behind “deep doubt” isn’t new, but its real-world impact is becoming increasingly apparent. Since the term “deepfake” first surfaced in 2017, we’ve seen a rapid evolution in AI-generated media capabilities. This has led to recent examples of deep doubt in action, such as conspiracy theorists claiming that President Joe Biden has been replaced by an AI-powered hologram and former President Donald Trump’s baseless accusation in August that Vice President Kamala Harris used AI to fake crowd sizes at her rallies. And on Friday, Trump cried “AI” again at a photo of him with E. Jean Carroll, a writer who successfully sued him for sexual assault, that contradicts his claim of never having met her.

Legal scholars Danielle K. Citron and Robert Chesney foresaw this trend years ago, coining the term “liar’s dividend” in 2019 to describe the consequence of deep doubt: deepfakes being weaponized by liars to discredit authentic evidence. But whereas deep doubt was once a hypothetical academic concept, it is now our reality.

The rise of deepfakes, the persistence of doubt

Doubt has been a political weapon since ancient times. This modern AI-fueled manifestation is just the latest evolution of a tactic where the seeds of uncertainty are sown to manipulate public opinion, undermine opponents, and hide the truth. AI is the newest refuge of liars.

Over the past decade, the rise of deep-learning technology has made it increasingly easy for people to craft false or modified pictures, audio, text, or video that appear to be non-synthesized organic media. Deepfakes were named after a Reddit user going by the name “deepfakes,” who shared AI-faked pornography on the service, swapping out the face of a performer with the face of someone else who wasn’t part of the original recording.

In the 20th century, one could argue that a certain part of our trust in media produced by others was a result of how expensive and time-consuming it was, and the skill it required, to produce documentary images and films. Even texts required a great deal of time and skill. As the deep doubt phenomenon grows, it will erode this 20th-century media sensibility. But it will also affect our political discourse, legal systems, and even our shared understanding of historical events that rely on that media to function—we rely on others to get information about the world. From photorealistic images to pitch-perfect voice clones, our perception of what we consider “truth” in media will need recalibration.

In April, a panel of federal judges highlighted the potential for AI-generated deepfakes to not only introduce fake evidence but also cast doubt on genuine evidence in court trials. The concern emerged during a meeting of the US Judicial Conference’s Advisory Committee on Evidence Rules, where the judges discussed the challenges of authenticating digital evidence in an era of increasingly sophisticated AI technology. Ultimately, the judges decided to postpone making any AI-related rule changes, but their meeting shows that the subject is already being considered by American judges.

Due to AI fakes, the “deep doubt” era is here Read More »

taylor-swift-cites-ai-deepfakes-in-endorsement-for-kamala-harris

Taylor Swift cites AI deepfakes in endorsement for Kamala Harris

it’s raining creepy men —

Taylor Swift on AI: “The simplest way to combat misinformation is with the truth.”

A screenshot of Taylor Swift's Kamala Harris Instagram post, captured on September 11, 2024.

Enlarge / A screenshot of Taylor Swift’s Kamala Harris Instagram post, captured on September 11, 2024.

On Tuesday night, Taylor Swift endorsed Vice President Kamala Harris for US President on Instagram, citing concerns over AI-generated deepfakes as a key motivator. The artist’s warning aligns with current trends in technology, especially in an era where AI synthesis models can easily create convincing fake images and videos.

“Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site,” she wrote in her Instagram post. “It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth.”

In August 2024, former President Donald Trump posted AI-generated images on Truth Social falsely suggesting Swift endorsed him, including a manipulated photo depicting Swift as Uncle Sam with text promoting Trump. The incident sparked Swift’s fears about the spread of misinformation through AI.

This isn’t the first time Swift and generative AI have appeared together in the news. In February, we reported that a flood of explicit AI-generated images of Swift originated from a 4chan message board where users took part in daily challenges to bypass AI image generator filters.

Listing image by Ronald Woan/CC BY-SA 2.0

Taylor Swift cites AI deepfakes in endorsement for Kamala Harris Read More »

oprah’s-upcoming-ai-television-special-sparks-outrage-among-tech-critics

Oprah’s upcoming AI television special sparks outrage among tech critics

You get an AI, and You get an AI —

AI opponents say Gates, Altman, and others will guide Oprah through an AI “sales pitch.”

An ABC handout promotional image for

Enlarge / An ABC handout promotional image for “AI and the Future of Us: An Oprah Winfrey Special.”

On Thursday, ABC announced an upcoming TV special titled, “AI and the Future of Us: An Oprah Winfrey Special.” The one-hour show, set to air on September 12, aims to explore AI’s impact on daily life and will feature interviews with figures in the tech industry, like OpenAI CEO Sam Altman and Bill Gates. Soon after the announcement, some AI critics began questioning the guest list and the framing of the show in general.

Sure is nice of Oprah to host this extended sales pitch for the generative AI industry at a moment when its fortunes are flagging and the AI bubble is threatening to burst,” tweeted author Brian Merchant, who frequently criticizes generative AI technology in op-eds, social media, and through his “Blood in the Machine” AI newsletter.

“The way the experts who are not experts are presented as such 💀 what a train wreck,” replied artist Karla Ortiz, who is a plaintiff in a lawsuit against several AI companies. “There’s still PLENTY of time to get actual experts and have a better discussion on this because yikes.”

The trailer for Oprah’s upcoming TV special on AI.

On Friday, Ortiz created a lengthy viral thread on X that detailed her potential issues with the program, writing, “This event will be the first time many people will get info on Generative AI. However it is shaping up to be a misinformed marketing event starring vested interests (some who are under a litany of lawsuits) who ignore the harms GenAi inflicts on communities NOW.”

Critics of generative AI like Ortiz question the utility of the technology, its perceived environmental impact, and what they see as blatant copyright infringement. In training AI language models, tech companies like Meta, Anthropic, and OpenAI commonly use copyrighted material gathered without license or owner permission. OpenAI claims that the practice is “fair use.”

Oprah’s guests

According to ABC, the upcoming special will feature “some of the most important and powerful people in AI,” which appears to roughly translate to “famous and publicly visible people related to tech.” Microsoft co-founder Bill Gates, who stepped down as Microsoft CEO 24 years ago, will appear on the show to explore the “AI revolution coming in science, health, and education,” ABC says, and warn of “the once-in-a-century type of impact AI may have on the job market.”

As a guest representing ChatGPT-maker OpenAI, Sam Altman will explain “how AI works in layman’s terms” and discuss “the immense personal responsibility that must be borne by the executives of AI companies.” Karla Ortiz specifically criticized Altman in her thread by saying, “There are far more qualified individuals to speak on what GenAi models are than CEOs. Especially one CEO who recently said AI models will ‘solve all physics.’ That’s an absurd statement and not worthy of your audience.”

In a nod to present-day content creation, YouTube creator Marques Brownlee will appear on the show and reportedly walk Winfrey through “mind-blowing demonstrations of AI’s capabilities.”

Brownlee’s involvement received special attention from some critics online. “Marques Brownlee should be absolutely ashamed of himself,” tweeted PR consultant and frequent AI critic Ed Zitron, who frequently heaps scorn on generative AI in his own newsletter. “What a disgraceful thing to be associated with.”

Other guests include Tristan Harris and Aza Raskin from the Center for Humane Technology, who aim to highlight “emerging risks posed by powerful and superintelligent AI,” an existential risk topic that has its own critics. And FBI Director Christopher Wray will reveal “the terrifying ways criminals and foreign adversaries are using AI,” while author Marilynne Robinson will reflect on “AI’s threat to human values.”

Going only by the publicized guest list, it appears that Oprah does not plan to give voice to prominent non-doomer critics of AI. “This is really disappointing @Oprah and frankly a bit irresponsible to have a one-sided conversation on AI without informed counterarguments from those impacted,” tweeted TV producer Theo Priestley.

Others on the social media network shared similar criticism about a perceived lack of balance in the guest list, including Dr. Margaret Mitchell of Hugging Face. “It could be beneficial to have an AI Oprah follow-up discussion that responds to what happens in [the show] and unpacks generative AI in a more grounded way,” she said.

Oprah’s AI special will air on September 12 on ABC (and a day later on Hulu) in the US, and it will likely elicit further responses from the critics mentioned above. But perhaps that’s exactly how Oprah wants it: “It may fascinate you or scare you,” Winfrey said in a promotional video for the special. “Or, if you’re like me, it may do both. So let’s take a breath and find out more about it.”

Oprah’s upcoming AI television special sparks outrage among tech critics Read More »

procreate-defies-ai-trend,-pledges-“no-generative-ai”-in-its-illustration-app

Procreate defies AI trend, pledges “no generative AI” in its illustration app

Political pixels —

Procreate CEO: “I really f—ing hate generative AI.”

Still of Procreate CEO James Cuda from a video posted to X.

Enlarge / Still of Procreate CEO James Cuda from a video posted to X.

On Sunday, Procreate announced that it will not incorporate generative AI into its popular iPad illustration app. The decision comes in response to an ongoing backlash from some parts of the art community, which has raised concerns about the ethical implications and potential consequences of AI use in creative industries.

“Generative AI is ripping the humanity out of things,” Procreate wrote on its website. “Built on a foundation of theft, the technology is steering us toward a barren future.”

In a video posted on X, Procreate CEO James Cuda laid out his company’s stance, saying, “We’re not going to be introducing any generative AI into our products. I don’t like what’s happening to the industry, and I don’t like what it’s doing to artists.”

Cuda’s sentiment echoes the fears of some digital artists who feel that AI image synthesis models, often trained on content without consent or compensation, threaten their livelihood and the authenticity of creative work. That’s not a universal sentiment among artists, but AI image synthesis is often a deeply divisive subject on social media, with some taking starkly polarized positions on the topic.

Procreate CEO James Cuda lays out his argument against generative AI in a video posted to X.

Cuda’s video plays on that polarization with clear messaging against generative AI. His statement reads as follows:

You’ve been asking us about AI. You know, I usually don’t like getting in front of the camera. I prefer that our products speak for themselves. I really fucking hate generative AI. I don’t like what’s happening in the industry and I don’t like what it’s doing to artists. We’re not going to be introducing any generative AI into out products. Our products are always designed and developed with the idea that a human will be creating something. You know, we don’t exactly know where this story’s gonna go or how it ends, but we believe that we’re on the right path supporting human creativity.

The debate over generative AI has intensified among some outspoken artists as more companies integrate these tools into their products. Dominant illustration software provider Adobe has tried to avoid ethical concerns by training its Firefly AI models on licensed or public domain content, but some artists have remained skeptical. Adobe Photoshop currently includes a “Generative Fill” feature powered by image synthesis, and the company is also experimenting with video synthesis models.

The backlash against image and video synthesis is not solely focused on creative app developers. Hardware manufacturer Wacom and game publisher Wizards of the Coast have faced criticism and issued apologies after using AI-generated content in their products. Toys “R” Us also faced a negative reaction after debuting an AI-generated commercial. Companies are still grappling with balancing the potential benefits of generative AI with the ethical concerns it raises.

Artists and critics react

A partial screenshot of Procreate's AI website captured on August 20, 2024.

Enlarge / A partial screenshot of Procreate’s AI website captured on August 20, 2024.

So far, Procreate’s anti-AI announcement has been met with a largely positive reaction in replies to its social media post. In a widely liked comment, artist Freya Holmér wrote on X, “this is very appreciated, thank you.”

Some of the more outspoken opponents of image synthesis also replied favorably to Procreate’s move. Karla Ortiz, who is a plaintiff in a lawsuit against AI image-generator companies, replied to Procreate’s video on X, “Whatever you need at any time, know I’m here!! Artists support each other, and also support those who allow us to continue doing what we do! So thank you for all you all do and so excited to see what the team does next!”

Artist RJ Palmer, who stoked the first major wave of AI art backlash with a viral tweet in 2022, also replied to Cuda’s video statement, saying, “Now thats the way to send a message. Now if only you guys could get a full power competitor to [Photoshop] on desktop with plugin support. Until someone can build a real competitor to high level [Photoshop] use, I’m stuck with it.”

A few pro-AI users also replied to the X post, including AI-augmented artist Claire Silver, who uses generative AI as an accessibility tool. She wrote on X, “Most of my early work is made with a combination of AI and Procreate. 7 years ago, before text to image was really even a thing. I loved procreate because it used tech to boost accessibility. Like AI, it augmented trad skill to allow more people to create. No rules, only tools.”

Since AI image synthesis continues to be a highly charged subject among some artists, reaffirming support for human-centric creativity could be an effective differentiated marketing move for Procreate, which currently plays underdog to creativity app giant Adobe. While some may prefer to use AI tools, in an (ideally healthy) app ecosystem with personal choice in illustration apps, people can follow their conscience.

Procreate’s anti-AI stance is slightly risky because it might also polarize part of its user base—and if the company changes its mind about including generative AI in the future, it will have to walk back its pledge. But for now, Procreate is confident in its decision: “In this technological rush, this might make us an exception or seem at risk of being left behind,” Procreate wrote. “But we see this road less traveled as the more exciting and fruitful one for our community.”

Procreate defies AI trend, pledges “no generative AI” in its illustration app Read More »

flux:-this-new-ai-image-generator-is-eerily-good-at-creating-human-hands

FLUX: This new AI image generator is eerily good at creating human hands

five-finger salute —

FLUX.1 is the open-weights heir apparent to Stable Diffusion, turning text into images.

AI-generated image by FLUX.1 dev:

Enlarge / AI-generated image by FLUX.1 dev: “A beautiful queen of the universe holding up her hands, face in the background.”

FLUX.1

On Thursday, AI-startup Black Forest Labs announced the launch of its company and the release of its first suite of text-to-image AI models, called FLUX.1. The German-based company, founded by researchers who developed the technology behind Stable Diffusion and invented the latent diffusion technique, aims to create advanced generative AI for images and videos.

The launch of FLUX.1 comes about seven weeks after Stability AI’s troubled release of Stable Diffusion 3 Medium in mid-June. Stability AI’s offering faced widespread criticism among image-synthesis hobbyists for its poor performance in generating human anatomy, with users sharing examples of distorted limbs and bodies across social media. That problematic launch followed the earlier departure of three key engineers from Stability AI—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—who went on to found Black Forest Labs along with latent diffusion co-developer Patrick Esser and others.

Black Forest Labs launched with the release of three FLUX.1 text-to-image models: a high-end commercial “pro” version, a mid-range “dev” version with open weights for non-commercial use, and a faster open-weights “schnell” version (“schnell” means quick or fast in German). Black Forest Labs claims its models outperform existing options like Midjourney and DALL-E in areas such as image quality and adherence to text prompts.

  • AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate full of pickles.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: A hand holding up five fingers with a starry background.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a boxer posing with fists raised, no gloves.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Frosted Prick’ cereal.”

    FLUX.1

  • AI-generated image of a happy woman in a bakery baking a cake by FLUX.1 dev.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Marshmallow Menace’ cereal.”

    FLUX.1

  • AI-generated image of “A handsome Asian influencer on top of the Empire State Building, instagram” by FLUX.1 dev.

    FLUX.1

In our experience, the outputs of the two higher-end FLUX.1 models are generally comparable with OpenAI’s DALL-E 3 in prompt fidelity, with photorealism that seems close to Midjourney 6. They represent a significant improvement over Stable Diffusion XL, the team’s last major release under Stability (if you don’t count SDXL Turbo).

The FLUX.1 models use what the company calls a “hybrid architecture” combining transformer and diffusion techniques, scaled up to 12 billion parameters. Black Forest Labs said it improves on previous diffusion models by incorporating flow matching and other optimizations.

FLUX.1 seems competent at generating human hands, which was a weak spot in earlier image-synthesis models like Stable Diffusion 1.5 due to a lack of training images that focused on hands. Since those early days, other AI image generators like Midjourney have mastered hands as well, but it’s notable to see an open-weights model that renders hands relatively accurately in various poses.

We downloaded the weights file to the FLUX.1 dev model from GitHub, but at 23GB, it won’t fit in the 12GB VRAM of our RTX 3060 card, so it will need quantization to run locally (reducing its size), which reportedly (through chatter on Reddit) some people have already had success with.

Instead, we experimented with FLUX.1 models on AI cloud-hosting platforms Fal and Replicate, which cost money to use, though Fal offers some free credits to start.

Black Forest looks ahead

Black Forest Labs may be a new company, but it’s already attracting funding from investors. It recently closed a $31 million Series Seed funding round led by Andreessen Horowitz, with additional investments from General Catalyst and MätchVC. The company also brought on high-profile advisers, including entertainment executive and former Disney President Michael Ovitz and AI researcher Matthias Bethge.

“We believe that generative AI will be a fundamental building block of all future technologies,” the company stated in its announcement. “By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models.”

  • AI-generated image by FLUX.1 dev: A cat in a car holding a can of beer that reads, ‘AI Slop.’

    FLUX.1

  • AI-generated image by FLUX.1 dev: Mickey Mouse and Spider-Man singing to each other.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”

    FLUX.1

  • AI-generated image of a flaming cheeseburger created by FLUX.1 dev.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “Will Smith eating spaghetti.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads ‘Ars Technica.'”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Burt’s Grenades’ cereal.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe”

    FLUX.1

Speaking of “trust and safety,” the company did not mention where it obtained the training data that taught the FLUX.1 models how to generate images. Judging by the outputs we could produce with the model that included depictions of copyrighted characters, Black Forest Labs likely used a huge unauthorized image scrape of the Internet, possibly collected by LAION, an organization that collected the datasets that trained Stable Diffusion. This is speculation at this point. While the underlying technological achievement of FLUX.1 is notable, it feels likely that the team is playing fast and loose with the ethics of “fair use” image scraping much like Stability AI did. That practice may eventually attract lawsuits like those filed against Stability AI.

Though text-to-image generation is Black Forest’s current focus, the company plans to expand into video generation next, saying that FLUX.1 will serve as the foundation of a new text-to-video model in development, which will compete with OpenAI’s Sora, Runway’s Gen-3 Alpha, and Kuaishou’s Kling in a contest to warp media reality on demand. “Our video models will unlock precise creation and editing at high definition and unprecedented speed,” the Black Forest announcement claims.

FLUX: This new AI image generator is eerily good at creating human hands Read More »