New secret math benchmark stumps AI models and PhDs alike

Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

To aid in the verification of correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or mathematical objects. The designers made problems “guessproof” by requiring large numerical answers or complex mathematical solutions, with less than a 1 percent chance of correct random guesses.
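As a minimal illustration of that design (this is not Epoch AI's actual grading harness, and the numbers are invented), automated checking reduces to exact comparison against a known answer drawn from a space far too large to guess:

```python
# Illustrative sketch of "guessproof" automated grading: the answer is a
# single exact object (here a large integer), so checking is an exact
# equality test -- no partial credit, no floating-point tolerance.
def check_answer(submitted: int, expected: int) -> bool:
    """Return True only on an exact match with the known answer."""
    return submitted == expected

# Because the expected value lives in a huge space of integers, a random
# guess succeeds with negligible probability.
expected = 2**61 - 1  # hypothetical answer to one problem

print(check_answer(2**61 - 1, expected))
print(check_answer(2**61, expected))
```

Python's arbitrary-precision integers make this kind of exact check trivial, which is one reason large numerical answers work well as machine-verifiable targets.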

Mathematician Evan Chen, writing on his blog, explained how he thinks that FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,'” Chen explained.

The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. They say they will release additional sample problems in the coming months to help the research community test their systems.

Is “AI welfare” the new frontier in ethics?

The researchers propose that companies could adapt the “marker method” that some researchers use to assess consciousness in animals—looking for specific indicators that may correlate with consciousness, although these markers are still speculative. The authors emphasize that no single feature would definitively prove consciousness, but they claim that examining multiple indicators may help companies make probabilistic assessments about whether their AI systems might require moral consideration.

The risks of wrongly thinking software is sentient

While the researchers behind “Taking AI Welfare Seriously” worry that companies might create and mistreat conscious AI systems on a massive scale, they also caution that companies could waste resources protecting AI systems that don’t actually need moral consideration.

Incorrectly anthropomorphizing software, or ascribing human traits to it, can present risks in other ways. For example, that belief can enhance the manipulative powers of AI language models by suggesting that AI models have capabilities, such as human-like emotions, that they actually lack. In 2022, Google fired engineer Blake Lemoine after he claimed that the company’s AI model, called “LaMDA,” was sentient and argued for its welfare internally.

And shortly after Microsoft released Bing Chat in February 2023, many people were convinced that Sydney (the chatbot’s code name) was sentient and somehow suffering because of its simulated emotional display. So much so, in fact, that once Microsoft “lobotomized” the chatbot by changing its settings, users convinced of its sentience mourned the loss as if they had lost a human friend. Others endeavored to help the AI model somehow escape its bonds.

Even so, as AI models get more advanced, the concept of potentially safeguarding the welfare of future, more advanced AI systems is seemingly gaining steam, although fairly quietly. As Transformer’s Shakeel Hashim points out, other tech companies have started similar initiatives to Anthropic’s. Google DeepMind recently posted a job listing for research on machine consciousness (since removed), and the authors of the new AI welfare report thank two OpenAI staff members in the acknowledgements.

Claude AI to process secret government data through new Palantir deal

An ethical minefield

Since its founders started Anthropic in 2021, the company has marketed itself as one that takes an ethics- and safety-focused approach to AI development. The company differentiates itself from competitors like OpenAI by adopting what it calls responsible development practices and self-imposed ethical constraints on its models, such as its “Constitutional AI” system.

As Futurism points out, this new defense partnership appears to conflict with Anthropic’s public “good guy” persona, and pro-AI pundits on social media are noticing. Frequent AI commentator Nabeel S. Qureshi wrote on X, “Imagine telling the safety-concerned, effective altruist founders of Anthropic in 2021 that a mere three years after founding the company, they’d be signing partnerships to deploy their ~AGI model straight to the military frontlines.”

Anthropic’s “Constitutional AI” logo. Credit: Anthropic / Benj Edwards

Aside from the implications of working with defense and intelligence agencies, the deal connects Anthropic with Palantir, a controversial company which recently won a $480 million contract to develop an AI-powered target identification system called Maven Smart System for the US Army. Project Maven has sparked criticism within the tech sector over military applications of AI technology.

It’s worth noting that Anthropic’s terms of service do outline specific rules and limitations for government use. These terms permit activities like foreign intelligence analysis and identifying covert influence campaigns, while prohibiting uses such as disinformation, weapons development, censorship, and domestic surveillance. Government agencies that maintain regular communication with Anthropic about their use of Claude may receive broader permissions to use the AI models.

Even if Claude is never used to target a human or as part of a weapons system, other issues remain. While its Claude models are highly regarded in the AI community, they (like all LLMs) have the tendency to confabulate, potentially generating incorrect information in a way that is difficult to detect.

That’s a huge potential problem that could impact Claude’s effectiveness with secret government data, and that fact, along with the other associations, has Futurism’s Victor Tangermann worried. As he puts it, “It’s a disconcerting partnership that sets up the AI industry’s growing ties with the US military-industrial complex, a worrying trend that should raise all kinds of alarm bells given the tech’s many inherent flaws—and even more so when lives could be at stake.”

Trump plans to dismantle Biden AI safeguards after victory

That’s not the only uncertainty at play. Just last week, House Speaker Mike Johnson—a staunch Trump supporter—said that Republicans “probably will” repeal the bipartisan CHIPS and Science Act, which is a Biden initiative to spur domestic semiconductor chip production, among other aims. Trump has previously spoken out against the bill. After getting some pushback on his comments from Democrats, Johnson said he would like to “streamline” the CHIPS Act instead, according to The Associated Press.

Then there’s the Elon Musk factor. The tech billionaire spent tens of millions through a political action committee supporting Trump’s campaign and has been angling for regulatory influence in the new administration. His AI company, xAI, which makes the Grok-2 language model, stands alongside his other ventures—Tesla, SpaceX, Starlink, Neuralink, and X (formerly Twitter)—as businesses that could see regulatory changes in his favor under a new administration.

What might take its place

If Trump strips away federal regulation of AI, state governments may step in to fill any federal regulatory gaps. For example, in March, Tennessee enacted protections against AI voice cloning, and in May, Colorado created a tiered system for AI deployment oversight. In September, California passed multiple AI safety bills, one requiring companies to publish details about their AI training methods and a contentious anti-deepfake bill aimed at protecting the likenesses of actors.

So far, it’s unclear what Trump’s policies on AI might represent besides “deregulate whenever possible.” During his campaign, Trump promised to support AI development centered on “free speech and human flourishing,” though he provided few specifics. He has called AI “very dangerous” and spoken about its high energy requirements.

Trump allies at the America First Policy Institute have previously stated they want to “Make America First in AI” with a new Trump executive order, which still only exists as a speculative draft, to reduce regulations on AI and promote a series of “Manhattan Projects” to advance military AI capabilities.

During his previous administration, Trump signed AI executive orders that focused on research institutes and directing federal agencies to prioritize AI development while mandating that federal agencies “protect civil liberties, privacy, and American values.”

But with a different AI environment these days in the wake of ChatGPT and media-reality-warping image synthesis models, those earlier orders likely don’t point the way to future positions on the topic. For more details, we’ll have to wait and see what unfolds.

Anthropic’s Haiku 3.5 surprises experts with an “intelligence” price increase

Speaking of Opus, Claude 3.5 Opus is nowhere to be seen, as AI researcher Simon Willison noted to Ars Technica in an interview. “All references to 3.5 Opus have vanished without a trace, and the price of 3.5 Haiku was increased the day it was released,” he said. “Claude 3.5 Haiku is significantly more expensive than both Gemini 1.5 Flash and GPT-4o mini—the excellent low-cost models from Anthropic’s competitors.”

Cheaper over time?

So far in the AI industry, newer versions of AI language models have typically launched at prices similar to or lower than their predecessors’. The company had initially indicated Claude 3.5 Haiku would cost the same as the previous version before announcing the higher rates.

“I was expecting this to be a complete replacement for their existing Claude 3 Haiku model, in the same way that Claude 3.5 Sonnet eclipsed the existing Claude 3 Sonnet while maintaining the same pricing,” Willison wrote on his blog. “Given that Anthropic claim that their new Haiku out-performs their older Claude 3 Opus, this price isn’t disappointing, but it’s a small surprise nonetheless.”

Claude 3.5 Haiku arrives with some trade-offs. While the model produces longer text outputs and contains more recent training data, it cannot analyze images like its predecessor. Alex Albert, who leads developer relations at Anthropic, wrote on X that the earlier version, Claude 3 Haiku, will remain available for users who need image processing capabilities and lower costs.

The new model is not yet available in the Claude.ai web interface or app. Instead, it runs on Anthropic’s API and third-party platforms, including AWS Bedrock. Anthropic markets the model for tasks like coding suggestions, data extraction and labeling, and content moderation, though, like any LLM, it can easily make stuff up confidently.

“Is it good enough to justify the extra spend? It’s going to be difficult to figure that out,” Willison told Ars. “Teams with robust automated evals against their use-cases will be in a good place to answer that question, but those remain rare.”

New Zemeckis film used AI to de-age Tom Hanks and Robin Wright

On Friday, TriStar Pictures released Here, a $50 million Robert Zemeckis-directed film that used real-time generative AI face-transformation techniques to portray actors Tom Hanks and Robin Wright across a 60-year span, marking one of Hollywood’s first full-length features built around AI-powered visual effects.

The film adapts a 2014 graphic novel set primarily in a New Jersey living room across multiple time periods. Rather than cast different actors for various ages, the production used AI to modify Hanks’ and Wright’s appearances throughout.

The de-aging technology comes from Metaphysic, a visual effects company that creates real-time face-swapping and aging effects. During filming, the crew watched two monitors simultaneously: one showing the actors’ actual appearances and another displaying them at whatever age the scene required.

Here – Official Trailer (HD)

Metaphysic developed the facial modification system by training custom machine-learning models on frames of Hanks’ and Wright’s previous films. This included a large dataset of facial movements, skin textures, and appearances under varied lighting conditions and camera angles. The resulting models can generate instant face transformations without the months of manual post-production work traditional CGI requires.

Unlike previous aging effects that relied on frame-by-frame manipulation, Metaphysic’s approach generates transformations instantly by analyzing facial landmarks and mapping them to trained age variations.

“You couldn’t have made this movie three years ago,” Zemeckis told The New York Times in a detailed feature about the film. Traditional visual effects for this level of face modification would reportedly require hundreds of artists and a substantially larger budget closer to standard Marvel movie costs.

This isn’t the first film that has used AI techniques to de-age actors. ILM’s approach to de-aging Harrison Ford in 2023’s Indiana Jones and the Dial of Destiny used a proprietary system called Flux with infrared cameras to capture facial data during filming, then used old images of Ford to de-age him in post-production. By contrast, Metaphysic’s AI models process transformations without additional hardware and show results during filming.

Nvidia ousts Intel from Dow Jones Index after 25-year run

Changing winds in the tech industry

The Dow Jones Industrial Average serves as a benchmark of the US stock market by tracking 30 large, publicly owned companies that represent major sectors of the US economy, and being a member of the Index has long been considered a sign of prestige among American companies.

However, S&P regularly makes changes to the index to better reflect current realities and trends in the marketplace, so deletion from the Index likely marks a new symbolic low point for Intel.

While the rise of AI has caused a surge in several tech stocks, it has delivered tough times for chipmaker Intel, which is perhaps best known for manufacturing CPUs that power Windows-based PCs.

Intel recently withdrew its forecast to sell over $500 million worth of AI-focused Gaudi chips in 2024, a target CEO Pat Gelsinger had promoted after initially pushing his team to project $1 billion in sales. The setback follows Intel’s pattern of missed opportunities in AI, with Reuters reporting that Bank of America analyst Vivek Arya questioned the company’s AI strategy during a recent earnings call.

In addition, Intel has faced challenges from device manufacturers increasingly adopting Arm-based alternatives, which power billions of smartphones, and from symbolic blows like Apple’s transition away from Intel processors for Macs to its own custom-designed chips based on the Arm architecture.

Whether the historic tech company will rebound is yet to be seen, but investors will undoubtedly keep a close watch on Intel as it attempts to reorient itself in the face of changing trends in the tech industry.

Downey Jr. plans to fight AI re-creations from beyond the grave

Robert Downey Jr. has declared that he will sue any future Hollywood executives who try to re-create his likeness using AI digital replicas, as reported by Variety. His comments came during an appearance on the “On With Kara Swisher” podcast, where he discussed AI’s growing role in entertainment.

“I intend to sue all future executives just on spec,” Downey told Swisher when discussing the possibility of studios using AI or deepfakes to re-create his performances after his death. When Swisher pointed out he would be deceased at the time, Downey responded that his law firm “will still be very active.”

The Oscar winner expressed confidence that Marvel Studios would not use AI to re-create his Tony Stark character, citing his trust in decision-makers there. “I am not worried about them hijacking my character’s soul because there’s like three or four guys and gals who make all the decisions there anyway and they would never do that to me,” he said.

Downey currently performs on Broadway in McNeal, a play that examines corporate leaders in AI technology. During the interview, he freely critiqued tech executives—Variety pointed out a particular quote from the interview where he criticized tech leaders who potentially do negative things but seek positive attention.

Hospitals adopt error-prone AI transcription tools despite warnings

In one case from the study cited by AP, when a speaker described “two other girls and one lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it to, “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model.

Why Whisper confabulates

The key to Whisper’s unsuitability in high-risk domains comes from its propensity to sometimes confabulate, or plausibly make up, inaccurate outputs. The AP report says, “Researchers aren’t certain why Whisper and similar tools hallucinate,” but that isn’t true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.

The transcription output from Whisper is a prediction of what is most likely, not what is most accurate. Accuracy in Transformer-based outputs is typically proportional to the presence of relevant accurate data in the training dataset, but it is never guaranteed. If there is ever a case where there isn’t enough contextual information in its neural network for Whisper to make an accurate prediction about how to transcribe a particular segment of audio, the model will fall back on what it “knows” about the relationships between sounds and words it has learned from its training data.

At TED AI 2024, experts grapple with AI’s growing pains


A year later, a compelling group of TED speakers moves from “what’s this?” to “what now?”

The opening moments of TED AI 2024 in San Francisco on October 22, 2024. Credit: Benj Edwards

SAN FRANCISCO—On Tuesday, TED AI 2024 kicked off its first day at San Francisco’s Herbst Theater with a lineup of speakers that tackled AI’s impact on science, art, and society. The two-day event brought a mix of researchers, entrepreneurs, lawyers, and other experts who painted a complex picture of AI with fairly minimal hype.

The second annual conference, organized by Walter and Sam De Brouwer, marked a notable shift from last year’s broad existential debates and proclamations of AI as being “the new electricity.” Rather than sweeping predictions about, say, looming artificial general intelligence (although there was still some of that, too), speakers mostly focused on immediate challenges: battles over training data rights, proposals for hardware-based regulation, debates about human-AI relationships, and the complex dynamics of workplace adoption.

The day’s sessions covered a wide breadth: physicist Carlo Rovelli explored consciousness and time, Project CETI researcher Patricia Sharma demonstrated attempts to use AI to decode whale communication, Recording Academy CEO Harvey Mason Jr. outlined music industry adaptation strategies, and even a few robots made appearances.

The shift from last year’s theoretical discussions to practical concerns was particularly evident during a presentation from Ethan Mollick of the Wharton School, who tackled what he called “the productivity paradox”—the disconnect between AI’s measured impact and its perceived benefits in the workplace. Already, organizations are moving beyond the gee-whiz period after ChatGPT’s introduction and into the implications of widespread use.

Sam De Brouwer and Walter De Brouwer organized TED AI and selected the speakers. Credit: Benj Edwards

Drawing from research claiming AI users complete tasks faster and more efficiently, Mollick highlighted a peculiar phenomenon: While one-third of Americans reported using AI in August of this year, managers often claim “no one’s using AI” in their organizations. Through a live demonstration using multiple AI models simultaneously, Mollick illustrated how traditional work patterns must evolve to accommodate AI’s capabilities. He also pointed to the emergence of what he calls “secret cyborgs”—employees quietly using AI tools without management’s knowledge. Regarding the future of jobs in the age of AI, he urged organizations to view AI as an opportunity for expansion rather than merely a cost-cutting measure.

Some giants in the AI field made an appearance. Jakob Uszkoreit, one of the eight co-authors of the now-famous “Attention is All You Need” paper that introduced Transformer architecture, reflected on the field’s rapid evolution. He distanced himself from the term “artificial general intelligence,” suggesting people aren’t particularly general in their capabilities. Uszkoreit described how the development of Transformers sidestepped traditional scientific theory, comparing the field to alchemy. “We still do not know how human language works. We do not have a comprehensive theory of English,” he noted.

Stanford professor Surya Ganguli presenting at TED AI 2024. Credit: Benj Edwards

And refreshingly, the talks went beyond AI language models. For example, Isomorphic Labs Chief AI Officer Max Jaderberg, who previously worked on Google DeepMind’s AlphaFold 3, gave a well-received presentation on AI-assisted drug discovery. He detailed how AlphaFold has already saved “1 billion years of research time” by discovering the shapes of proteins and showed how AI agents are now capable of running thousands of parallel drug design simulations that could enable personalized medicine.

Danger and controversy

While hype was less prominent this year, some speakers still spoke of AI-related dangers. Paul Scharre, executive vice president at the Center for a New American Security, warned about the risks of advanced AI models falling into malicious hands, specifically citing concerns about terrorist attacks with AI-engineered biological weapons. Drawing parallels to nuclear proliferation in the 1960s, Scharre argued that while regulating software is nearly impossible, controlling physical components like specialized chips and fabrication facilities could provide a practical framework for AI governance.

ReplikaAI founder Eugenia Kuyda cautioned that AI companions could become “the most dangerous technology if not done right,” suggesting that the existential threat from AI might come not from science fiction scenarios but from technology that isolates us from human connections. She advocated for designing AI systems that optimize for human happiness rather than engagement, proposing a “human flourishing metric” to measure its success.

Ben Zhao, a University of Chicago professor associated with the Glaze and Nightshade projects, painted a dire picture of AI’s impact on art, claiming that art schools were seeing unprecedented enrollment drops and galleries were closing at an accelerated rate due to AI image generators, though we have yet to dig through the supporting news headlines he momentarily flashed up on the screen.

Some of the speakers represented polar opposites of each other, policy-wise. For example, copyright attorney Angela Dunning offered a defense of AI training as fair use, drawing from historical parallels in technological advancement. A litigation partner at Cleary Gottlieb, which has previously represented the AI image generation service Midjourney in a lawsuit, Dunning quoted Mark Twain, saying “there is no such thing as a new idea,” and argued that copyright law allows for building upon others’ ideas while protecting specific expressions. She compared current AI debates to past technological disruptions, noting how photography, once feared as a threat to traditional artists, instead sparked new artistic movements like abstract art and pointillism. “Art and science can only remain free if we are free to build on the ideas of those that came before,” Dunning said, challenging more restrictive views of AI training.

Copyright lawyer Angela Dunning quoted Mark Twain in her talk about fair use and AI. Credit: Benj Edwards

Dunning’s presentation stood in direct opposition to Ed Newton-Rex, who had earlier advocated for mandatory licensing of training data through his nonprofit Fairly Trained. In fact, the same day, Newton-Rex’s organization unveiled a “Statement on AI training” signed by many artists that says, “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.” The issue has not yet been legally settled in US courts, but clearly, the battle lines have been drawn, and no matter which side you take, TED AI did a good job of giving both perspectives to the audience.

Looking forward

Some speakers explored potential new architectures for AI. Stanford professor Surya Ganguli highlighted the contrast between AI and human learning, noting that while AI models require trillions of tokens to train, humans learn language from just millions of exposures. He proposed “quantum neuromorphic computing” as a potential bridge between biological and artificial systems, suggesting a future where computers could potentially match the energy efficiency of the human brain.

Also, Guillaume Verdon, founder of Extropic and architect of the Effective Accelerationism (often called “E/Acc”) movement, presented what he called “physics-based intelligence” and claimed his company is “building a steam engine for AI,” potentially offering energy efficiency improvements up to 100 million times better than traditional systems—though he acknowledged this figure ignores cooling requirements for superconducting components. The company had completed its first room-temperature chip tape-out just the previous week.

The Day One sessions closed out with predictions about the future of AI from OpenAI’s Noam Brown, who emphasized the importance of scale in expanding future AI capabilities, and from University of Washington professor Pedro Domingos, who spoke about “co-intelligence,” saying, “People are smart, organizations are stupid,” and proposing that AI could be used to bridge that gap by drawing on the collective intelligence of an organization.

When I attended TED AI last year, some obvious questions emerged: Is this current wave of AI a fad? Will there be a TED AI next year? I think the second TED AI answered these questions well—AI isn’t going away, and there are still endless angles to explore as the field expands rapidly.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI releases ChatGPT app for Windows

On Thursday, OpenAI released an early version of its first ChatGPT app for Windows, following a Mac version that launched in May. Currently, it’s only available to subscribers of the Plus, Team, Enterprise, and Edu versions of ChatGPT, and users can download it for free in the Microsoft Store for Windows.

OpenAI is positioning the release as a beta test. “This is an early version, and we plan to bring the full experience to all users later this year,” OpenAI writes on the Microsoft Store entry for the app. (Interestingly, ChatGPT shows up as being rated “T for Teen” by the ESRB in the Windows store, despite not being a video game.)

A screenshot of the new Windows ChatGPT app captured on October 18, 2024. Credit: Benj Edwards

Upon opening the app, you must log into a paying ChatGPT account, and from there, the app is basically identical to the web browser version of ChatGPT. You can currently use it to access several models: GPT-4o, GPT-4o with Canvas, o1-preview, o1-mini, GPT-4o mini, and GPT-4. Also, it can generate images using DALL-E 3 or analyze uploaded files and images.

If you’re running Windows 11, you can instantly call up a small ChatGPT window when the app is open using an Alt+Space shortcut (it did not work in Windows 10 when we tried). That could be handy for asking ChatGPT a quick question at any time.

A screenshot of the new Windows ChatGPT app listing in the Microsoft Store captured on October 18, 2024. Credit: Benj Edwards

And just like the web version, all the AI processing takes place in the cloud on OpenAI’s servers, which means an Internet connection is required.

So as usual, chat like somebody’s watching, and don’t rely on ChatGPT as a factual reference for important decisions—GPT-4o in particular is great at telling you what you want to hear, whether it’s correct or not. As OpenAI says in a small disclaimer at the bottom of the app window: “ChatGPT can make mistakes.”

Cheap AI “video scraping” can now extract data from any screen recording


Researcher feeds screen recordings into Gemini to extract accurate information with ease.


Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls “video scraping,” which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes.

What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we’re doing on our computer screens.

“The other day I found myself needing to add up some numeric values that were scattered across twelve different emails,” Willison wrote in a detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google’s AI Studio tool, which allows people to experiment with several versions of Google’s Gemini 1.5 Pro and Gemini 1.5 Flash AI models.

Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After he double-checked the output for errors, both the accuracy of the results and the low cost of running the video analysis surprised him.
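The post-processing step described here, turning the model’s JSON output into a CSV table, takes only a few lines of Python. A minimal, hypothetical sketch; the field names and values below are illustrative stand-ins, not Willison’s actual data or schema:

```python
import csv
import io
import json

# Hypothetical JSON in the shape Gemini might return when asked for
# dates and dollar amounts (illustrative values, not Willison's data).
gemini_output = '''[
  {"date": "2024-09-01", "amount": 23.50},
  {"date": "2024-09-15", "amount": 7.25},
  {"date": "2024-10-02", "amount": 102.00}
]'''

records = json.loads(gemini_output)

# Convert the records to CSV for spreadsheet use.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "amount"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())

# Summing the extracted amounts is then trivial.
print(f"total: {sum(r['amount'] for r in records):.2f}")
```

The model handles the hard part (reading values off video frames); the structured JSON it returns is what makes this mechanical cleanup possible.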

A screenshot of Simon Willison using Google Gemini to extract data from a screen capture video. Credit: Simon Willison

“The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn’t made a mistake,” he wrote. Willison says the entire video analysis process ostensibly cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.
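That figure is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming Gemini 1.5 Flash’s list price at the time was roughly $0.075 per million input tokens (an assumption; actual pricing varies by tier and has changed since):

```python
# Assumed pricing: ~$0.075 per 1M input tokens for Gemini 1.5 Flash
# (approximate October 2024 list price; rates vary and change over time).
PRICE_PER_MILLION_INPUT_TOKENS = 0.075  # USD, assumed
tokens_used = 11_018  # token count reported in the article

cost_usd = tokens_used / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
print(f"${cost_usd:.6f}")  # well under $0.001, i.e. under a tenth of a cent
```

Under that assumed rate, the whole 35-second video analysis comes out to roughly eight hundredths of a cent, consistent with Willison’s “less than one-tenth of a cent” figure.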

Video scraping is just one of many new tricks possible when the latest large language models (LLMs), such as Google’s Gemini and GPT-4o, are actually “multimodal” models, allowing audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to make predictions about which tokens should come next in a sequence.

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but a generalized alternative term hasn’t really taken off yet. But no matter what you call it, having an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below, and here’s a 2015 paper that uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.

“There’s no level of website authentication or anti-scraping technology that can stop me from recording a video of my screen while I manually click around inside a web application,” Willison noted on his blog. His method works for any visible on-screen content.

Video is the new text

An illustration of a cybernetic eyeball. Credit: Getty Images

The ease and effectiveness of Willison’s technique reflect a noteworthy shift now underway in how some users will interact with token prediction models. Rather than requiring a user to manually paste or type in data in a chat dialog—or detail every scenario to a chatbot as text—some AI applications increasingly work with visual data captured directly on the screen. For example, if you’re having trouble navigating a pizza website’s terrible interface, an AI model could step in and perform the necessary mouse clicks to order the pizza for you.

In fact, video scraping is already on the radar of every major AI lab, although they are not likely to call it that at the moment. Instead, tech companies typically refer to these techniques as “video understanding” or simply “vision.”

In May, OpenAI demonstrated a prototype version of its ChatGPT Mac App with an option that allowed ChatGPT to see and interact with what is on your screen, but that feature has not yet shipped. Microsoft demonstrated a similar “Copilot Vision” prototype concept earlier this month (based on OpenAI’s technology) that will be able to “watch” your screen and help you extract data and interact with applications you’re running.

Despite these research previews, OpenAI’s ChatGPT and Anthropic’s Claude have not yet implemented a public video input feature for their models, possibly because it is relatively computationally expensive for them to process the extra tokens from a “tokenized” video stream.

For the moment, Google is heavily subsidizing user AI costs with its war chest from Search revenue and a massive fleet of data centers (to be fair, OpenAI is subsidizing, too, but with investor dollars and help from Microsoft). But costs of AI compute in general are dropping by the day, which will open up new capabilities of the technology to a broader user base over time.

Countering privacy issues

As you might imagine, having an AI model see what you do on your computer screen can have downsides. For now, video scraping is great for Willison, who will undoubtedly use the captured data in positive and helpful ways. But it’s also a preview of a capability that could later be used to invade privacy or autonomously spy on computer users on a scale that was once impossible.

A different form of video scraping caused a massive wave of controversy recently for that exact reason. Apps such as the third-party Rewind AI on the Mac and Microsoft’s Recall, which is being built into Windows 11, operate by feeding on-screen video into an AI model that stores extracted data into a database for later AI recall. Unfortunately, that approach also introduces potential privacy issues because it records everything you do on your machine and puts it in a single place that could later be hacked.

To that point, although Willison’s technique currently involves uploading a video of his data to Google for processing, he is pleased that he can still decide what the AI model sees and when.

“The great thing about this video scraping technique is that it works with anything that you can see on your screen… and it puts you in total control of what you end up exposing to the AI model,” Willison explained in his blog post.

It’s also possible in the future that a locally run open-weights AI model could pull off the same video analysis method without the need for a cloud connection at all. Microsoft Recall runs locally on supported devices, but it still demands a great deal of unearned trust. For now, Willison is perfectly content to selectively feed video data to AI models when the need arises.

“I expect I’ll be using this technique a whole lot more in the future,” he wrote, and perhaps many others will, too, in different forms. If the past is any indication, Willison—who coined the term “prompt injection” in 2022—seems to always be a few steps ahead in exploring novel applications of AI tools. Right now, his attention is on the new implications of AI and video, and yours probably should be, too.


Benj Edwards is Ars Technica’s Senior AI Reporter and founded the site’s dedicated AI beat in 2022. He’s also a widely cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
