chatgtp

matrix-multiplication-breakthrough-could-lead-to-faster,-more-efficient-ai-models

Matrix multiplication breakthrough could lead to faster, more efficient AI models

The Matrix Revolutions —

At the heart of AI, matrix math has just seen its biggest boost “in more than a decade.”

Futuristic huge technology tunnel and binary data.

Enlarge / When you do math on a computer, you fly through a numerical tunnel like this—figuratively, of course.

Computer scientists have discovered a new way to multiply large matrices faster than ever before by eliminating a previously unknown inefficiency, reports Quanta Magazine. This could eventually accelerate AI models like ChatGPT, which rely heavily on matrix multiplication to function. The findings, presented in two recent papers, have led to what is reported to be the biggest improvement in matrix multiplication efficiency in over a decade.

Multiplying two rectangular number arrays, known as matrix multiplication, plays a crucial role in today’s AI models, including speech and image recognition, chatbots from every major vendor, AI image generators, and video synthesis models like Sora. Beyond AI, matrix math is so important to modern computing (think image processing and data compression) that even slight gains in efficiency could lead to computational and power savings.

Graphics processing units (GPUs) excel in handling matrix multiplication tasks because of their ability to process many calculations at once. They break down large matrix problems into smaller segments and solve them concurrently using an algorithm.

Perfecting that algorithm has been the key to breakthroughs in matrix multiplication efficiency over the past century—even before computers entered the picture. In October 2022, we covered a new technique discovered by a Google DeepMind AI model called AlphaTensor, focusing on practical algorithmic improvements for specific matrix sizes, such as 4×4 matrices.

By contrast, the new research, conducted by Ran Duan and Renfei Zhou of Tsinghua University, Hongxun Wu of the University of California, Berkeley, and by Virginia Vassilevska Williams, Yinzhan Xu, and Zixuan Xu of the Massachusetts Institute of Technology (in a second paper), seeks theoretical enhancements by aiming to lower the complexity exponent, ω, for a broad efficiency gain across all sizes of matrices. Instead of finding immediate, practical solutions like AlphaTensor, the new technique addresses foundational improvements that could transform the efficiency of matrix multiplication on a more general scale.

Approaching the ideal value

The traditional method for multiplying two n-by-n matrices requires n³ separate multiplications. However, the new technique, which improves upon the “laser method” introduced by Volker Strassen in 1986, has reduced the upper bound of the exponent (denoted as the aforementioned ω), bringing it closer to the ideal value of 2, which represents the theoretical minimum number of operations needed.

The traditional way of multiplying two grids full of numbers could require doing the math up to 27 times for a grid that’s 3×3. But with these advancements, the process is accelerated by significantly reducing the multiplication steps required. The effort minimizes the operations to slightly over twice the size of one side of the grid squared, adjusted by a factor of 2.371552. This is a big deal because it nearly achieves the optimal efficiency of doubling the square’s dimensions, which is the fastest we could ever hope to do it.

Here’s a brief recap of events. In 2020, Josh Alman and Williams introduced a significant improvement in matrix multiplication efficiency by establishing a new upper bound for ω at approximately 2.3728596. In November 2023, Duan and Zhou revealed a method that addressed an inefficiency within the laser method, setting a new upper bound for ω at approximately 2.371866. The achievement marked the most substantial progress in the field since 2010. But just two months later, Williams and her team published a second paper that detailed optimizations that reduced the upper bound for ω to 2.371552.

The 2023 breakthrough stemmed from the discovery of a “hidden loss” in the laser method, where useful blocks of data were unintentionally discarded. In the context of matrix multiplication, “blocks” refer to smaller segments that a large matrix is divided into for easier processing, and “block labeling” is the technique of categorizing these segments to identify which ones to keep and which to discard, optimizing the multiplication process for speed and efficiency. By modifying the way the laser method labels blocks, the researchers were able to reduce waste and improve efficiency significantly.

While the reduction of the omega constant might appear minor at first glance—reducing the 2020 record value by 0.0013076—the cumulative work of Duan, Zhou, and Williams represents the most substantial progress in the field observed since 2010.

“This is a major technical breakthrough,” said William Kuszmaul, a theoretical computer scientist at Harvard University, as quoted by Quanta Magazine. “It is the biggest improvement in matrix multiplication we’ve seen in more than a decade.”

While further progress is expected, there are limitations to the current approach. Researchers believe that understanding the problem more deeply will lead to the development of even better algorithms. As Zhou stated in the Quanta report, “People are still in the very early stages of understanding this age-old problem.”

So what are the practical applications? For AI models, a reduction in computational steps for matrix math could translate into faster training times and more efficient execution of tasks. It could enable more complex models to be trained more quickly, potentially leading to advancements in AI capabilities and the development of more sophisticated AI applications. Additionally, efficiency improvement could make AI technologies more accessible by lowering the computational power and energy consumption required for these tasks. That would also reduce AI’s environmental impact.

The exact impact on the speed of AI models depends on the specific architecture of the AI system and how heavily its tasks rely on matrix multiplication. Advancements in algorithmic efficiency often need to be coupled with hardware optimizations to fully realize potential speed gains. But still, as improvements in algorithmic techniques add up over time, AI will get faster.

Matrix multiplication breakthrough could lead to faster, more efficient AI models Read More »

some-teachers-are-now-using-chatgpt-to-grade-papers

Some teachers are now using ChatGPT to grade papers

robots in disguise —

New AI tools aim to help with grading, lesson plans—but may have serious drawbacks.

An elementary-school-aged child touching a robot hand.

In a notable shift toward sanctioned use of AI in schools, some educators in grades 3–12 are now using a ChatGPT-powered grading tool called Writable, reports Axios. The tool, acquired last summer by Houghton Mifflin Harcourt, is designed to streamline the grading process, potentially offering time-saving benefits for teachers. But is it a good idea to outsource critical feedback to a machine?

Writable lets teachers submit student essays for analysis by ChatGPT, which then provides commentary and observations on the work. The AI-generated feedback goes to teacher review before being passed on to students so that a human remains in the loop.

“Make feedback more actionable with AI suggestions delivered to teachers as the writing happens,” Writable promises on its AI website. “Target specific areas for improvement with powerful, rubric-aligned comments, and save grading time with AI-generated draft scores.” The service also provides AI-written writing-prompt suggestions: “Input any topic and instantly receive unique prompts that engage students and are tailored to your classroom needs.”

Writable can reportedly help a teacher develop a curriculum, although we have not tried the functionality ourselves. “Once in Writable you can also use AI to create curriculum units based on any novel, generate essays, multi-section assignments, multiple-choice questions, and more, all with included answer keys,” the site claims.

The reliance on AI for grading will likely have drawbacks. Automated grading might encourage some educators to take shortcuts, diminishing the value of personalized feedback. Over time, the augmentation from AI may allow teachers to be less familiar with the material they are teaching. The use of cloud-based AI tools may have privacy implications for teachers and students. Also, ChatGPT isn’t a perfect analyst. It can get things wrong and potentially confabulate (make up) false information, possibly misinterpret a student’s work, or provide erroneous information in lesson plans.

Yet, as Axios reports, proponents assert that AI grading tools like Writable may free up valuable time for teachers, enabling them to focus on more creative and impactful teaching activities. The company selling Writable promotes it as a way to empower educators, supposedly offering them the flexibility to allocate more time to direct student interaction and personalized teaching. Of course, without an in-depth critical review, all claims should be taken with a huge grain of salt.

Amid these discussions, there’s a divide among parents regarding the use of AI in evaluating students’ academic performance. A recent poll of parents revealed mixed opinions, with nearly half of the respondents open to the idea of AI-assisted grading.

As the generative AI craze permeates every space, it’s no surprise that Writable isn’t the only AI-powered grading tool on the market. Others include Crowdmark, Gradescope, and EssayGrader. McGraw Hill is reportedly developing similar technology aimed at enhancing teacher assessment and feedback.

Some teachers are now using ChatGPT to grade papers Read More »

openai-clarifies-the-meaning-of-“open”-in-its-name,-responding-to-musk-lawsuit

OpenAI clarifies the meaning of “open” in its name, responding to Musk lawsuit

The OpenAI logo as an opening to a red brick wall.

Enlarge (credit: Benj Edwards / Getty Images)

On Tuesday, OpenAI published a blog post titled “OpenAI and Elon Musk” in response to a lawsuit Musk filed last week. The ChatGPT maker shared several archived emails from Musk that suggest he once supported a pivot away from open source practices in the company’s quest to develop artificial general intelligence (AGI). The selected emails also imply that the “open” in “OpenAI” means that the ultimate result of its research into AGI should be open to everyone but not necessarily “open source” along the way.

In one telling exchange from January 2016 shared by the company, OpenAI Chief Scientist Illya Sutskever wrote, “As we get closer to building AI, it will make sense to start being less open. The Open in openAI means that everyone should benefit from the fruits of AI after its built, but it’s totally OK to not share the science (even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes).”

In response, Musk replied simply, “Yup.”

Read 8 remaining paragraphs | Comments

OpenAI clarifies the meaning of “open” in its name, responding to Musk lawsuit Read More »

the-ai-wars-heat-up-with-claude-3,-claimed-to-have-“near-human”-abilities

The AI wars heat up with Claude 3, claimed to have “near-human” abilities

The Anthropic Claude 3 logo.

Enlarge / The Anthropic Claude 3 logo.

On Monday, Anthropic released Claude 3, a family of three AI language models similar to those that power ChatGPT. Anthropic claims the models set new industry benchmarks across a range of cognitive tasks, even approaching “near-human” capability in some cases. It’s available now through Anthropic’s website, with the most powerful model being subscription-only. It’s also available via API for developers.

Claude 3’s three models represent increasing complexity and parameter count: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Sonnet powers the Claude.ai chatbot now for free with an email sign-in. But as mentioned above, Opus is only available through Anthropic’s web chat interface if you pay $20 a month for “Claude Pro,” a subscription service offered through the Anthropic website. All three feature a 200,000-token context window. (The context window is the number of tokens—fragments of a word—that an AI language model can process at once.)

We covered the launch of Claude in March 2023 and Claude 2 in July that same year. Each time, Anthropic fell slightly behind OpenAI’s best models in capability while surpassing them in terms of context window length. With Claude 3, Anthropic has perhaps finally caught up with OpenAI’s released models in terms of performance, although there is no consensus among experts yet—and the presentation of AI benchmarks is notoriously prone to cherry-picking.

A Claude 3 benchmark chart provided by Anthropic.

Enlarge / A Claude 3 benchmark chart provided by Anthropic.

Claude 3 reportedly demonstrates advanced performance across various cognitive tasks, including reasoning, expert knowledge, mathematics, and language fluency. (Despite the lack of consensus over whether large language models “know” or “reason,” the AI research community commonly uses those terms.) The company claims that the Opus model, the most capable of the three, exhibits “near-human levels of comprehension and fluency on complex tasks.”

That’s quite a heady claim and deserves to be parsed more carefully. It’s probably true that Opus is “near-human” on some specific benchmarks, but that doesn’t mean that Opus is a general intelligence like a human (consider that pocket calculators are superhuman at math). So, it’s a purposely eye-catching claim that can be watered down with qualifications.

According to Anthropic, Claude 3 Opus beats GPT-4 on 10 AI benchmarks, including MMLU (undergraduate level knowledge), GSM8K (grade school math), HumanEval (coding), and the colorfully named HellaSwag (common knowledge). Several of the wins are very narrow, such as 86.8 percent for Opus vs. 86.4 percent on a five-shot trial of MMLU, and some gaps are big, such as 84.9 percent on HumanEval over GPT-4’s 67.0 percent. But what that might mean, exactly, to you as a customer is difficult to say.

“As always, LLM benchmarks should be treated with a little bit of suspicion,” says AI researcher Simon Willison, who spoke with Ars about Claude 3. “How well a model performs on benchmarks doesn’t tell you much about how the model ‘feels’ to use. But this is still a huge deal—no other model has beaten GPT-4 on a range of widely used benchmarks like this.”

The AI wars heat up with Claude 3, claimed to have “near-human” abilities Read More »

ai-generated-articles-prompt-wikipedia-to-downgrade-cnet’s-reliability-rating

AI-generated articles prompt Wikipedia to downgrade CNET’s reliability rating

The hidden costs of AI —

Futurism report highlights the reputational cost of publishing AI-generated content.

The CNET logo on a smartphone screen.

Wikipedia has downgraded tech website CNET’s reliability rating following extensive discussions among its editors regarding the impact of AI-generated content on the site’s trustworthiness, as noted in a detailed report from Futurism. The decision reflects concerns over the reliability of articles found on the tech news outlet after it began publishing AI-generated stories in 2022.

Around November 2022, CNET began publishing articles written by an AI model under the byline “CNET Money Staff.” In January 2023, Futurism brought widespread attention to the issue and discovered that the articles were full of plagiarism and mistakes. (Around that time, we covered plans to do similar automated publishing at BuzzFeed.) After the revelation, CNET management paused the experiment, but the reputational damage had already been done.

Wikipedia maintains a page called “Reliable sources/Perennial sources” that includes a chart featuring news publications and their reliability ratings as viewed from Wikipedia’s perspective. Shortly after the CNET news broke in January 2023, Wikipedia editors began a discussion thread on the Reliable Sources project page about the publication.

“CNET, usually regarded as an ordinary tech RS [reliable source], has started experimentally running AI-generated articles, which are riddled with errors,” wrote a Wikipedia editor named David Gerard. “So far the experiment is not going down well, as it shouldn’t. I haven’t found any yet, but any of these articles that make it into a Wikipedia article need to be removed.”

After other editors agreed in the discussion, they began the process of downgrading CNET’s reliability rating.

As of this writing, Wikipedia’s Perennial Sources list currently features three entries for CNET broken into three time periods: (1) before October 2020, when Wikipedia considered CNET a “generally reliable” source; (2) between October 2020 and October 2022, where Wikipedia notes that the site was acquired by Red Ventures in October 2020, “leading to a deterioration in editorial standards” and saying there is no consensus about reliability; and (3) between November 2022 and present, where Wikipedia currently considers CNET “generally unreliable” after the site began using an AI tool “to rapidly generate articles riddled with factual inaccuracies and affiliate links.”

A screenshot of a chart featuring CNET's reliability ratings, as found on Wikipedia's

Enlarge / A screenshot of a chart featuring CNET’s reliability ratings, as found on Wikipedia’s “Perennial Sources” page.

Futurism reports that the issue with CNET’s AI-generated content also sparked a broader debate within the Wikipedia community about the reliability of sources owned by Red Ventures, such as Bankrate and CreditCards.com. Those sites published AI-generated content around the same period of time as CNET. The editors also criticized Red Ventures for not being forthcoming about where and how AI was being implemented, further eroding trust in the company’s publications. This lack of transparency was a key factor in the decision to downgrade CNET’s reliability rating.

In response to the downgrade and the controversies surrounding AI-generated content, CNET issued a statement that claims that the site maintains high editorial standards.

“CNET is the world’s largest provider of unbiased tech-focused news and advice,” a CNET spokesperson said in a statement to Futurism. “We have been trusted for nearly 30 years because of our rigorous editorial and product review standards. It is important to clarify that CNET is not actively using AI to create new content. While we have no specific plans to restart, any future initiatives would follow our public AI policy.”

This article was updated on March 1, 2024 at 9: 30am to reflect fixes in the date ranges for CNET on the Perennial Sources page.

AI-generated articles prompt Wikipedia to downgrade CNET’s reliability rating Read More »

microsoft-partners-with-openai-rival-mistral-for-ai-models,-drawing-eu-scrutiny

Microsoft partners with OpenAI-rival Mistral for AI models, drawing EU scrutiny

The European Approach —

15M euro investment comes as Microsoft hosts Mistral’s GPT-4 alternatives on Azure.

Velib bicycles are parked in front of the the U.S. computer and micro-computing company headquarters Microsoft on January 25, 2023 in Issy-les-Moulineaux, France.

On Monday, Microsoft announced plans to offer AI models from Mistral through its Azure cloud computing platform, which came in conjunction with a 15 million euro non-equity investment in the French firm, which is often seen as a European rival to OpenAI. Since then, the investment deal has faced scrutiny from European Union regulators.

Microsoft’s deal with Mistral, known for its large language models akin to OpenAI’s GPT-4 (which powers the subscription versions of ChatGPT), marks a notable expansion of its AI portfolio at a time when its well-known investment in California-based OpenAI has raised regulatory eyebrows. The new deal with Mistral drew particular attention from regulators because Microsoft’s investment could convert into equity (partial ownership of Mistral as a company) during Mistral’s next funding round.

The development has intensified ongoing investigations into Microsoft’s practices, particularly related to the tech giant’s dominance in the cloud computing sector. According to Reuters, EU lawmakers have voiced concerns that Mistral’s recent lobbying for looser AI regulations might have been influenced by its relationship with Microsoft. These apprehensions are compounded by the French government’s denial of prior knowledge of the deal, despite earlier lobbying for more lenient AI laws in Europe. The situation underscores the complex interplay between national interests, corporate influence, and regulatory oversight in the rapidly evolving AI landscape.

Avoiding American influence

The EU’s reaction to the Microsoft-Mistral deal reflects broader tensions over the role of Big Tech companies in shaping the future of AI and their potential to stifle competition. Calls for a thorough investigation into Microsoft and Mistral’s partnership have been echoed across the continent, according to Reuters, with some lawmakers accusing the firms of attempting to undermine European legislative efforts aimed at ensuring a fair and competitive digital market.

The controversy also touches on the broader debate about “European champions” in the tech industry. France, along with Germany and Italy, had advocated for regulatory exemptions to protect European startups. However, the Microsoft-Mistral deal has led some, like MEP Kim van Sparrentak, to question the motives behind these exemptions, suggesting they might have inadvertently favored American Big Tech interests.

“That story seems to have been a front for American-influenced Big Tech lobby,” said Sparrentak, as quoted by Reuters. Sparrentak has been a key architect of the EU’s AI Act, which has not yet been passed. “The Act almost collapsed under the guise of no rules for ‘European champions,’ and now look. European regulators have been played.”

MEP Alexandra Geese also expressed concerns over the concentration of money and power resulting from such partnerships, calling for an investigation. Max von Thun, Europe director at the Open Markets Institute, emphasized the urgency of investigating the partnership, criticizing Mistral’s reported attempts to influence the AI Act.

Also on Monday, amid the partnership news, Mistral announced Mistral Large, a new large language model (LLM) that Mistral says “ranks directly after GPT-4 based on standard benchmarks.” Mistral has previously released several open-weights AI models that have made news for their capabilities, but Mistral Large will be a closed model only available to customers through an API.

Microsoft partners with OpenAI-rival Mistral for AI models, drawing EU scrutiny Read More »

google-goes-“open-ai”-with-gemma,-a-free,-open-weights-chatbot-family

Google goes “open AI” with Gemma, a free, open-weights chatbot family

Free hallucinations for all —

Gemma chatbots can run locally, and they reportedly outperform Meta’s Llama 2.

The Google Gemma logo

On Wednesday, Google announced a new family of AI language models called Gemma, which are free, open-weights models built on technology similar to the more powerful but closed Gemini models. Unlike Gemini, Gemma models can run locally on a desktop or laptop computer. It’s Google’s first significant open large language model (LLM) release since OpenAI’s ChatGPT started a frenzy for AI chatbots in 2022.

Gemma models come in two sizes: Gemma 2B (2 billion parameters) and Gemma 7B (7 billion parameters), each available in pre-trained and instruction-tuned variants. In AI, parameters are values in a neural network that determine AI model behavior, and weights are a subset of these parameters stored in a file.

Developed by Google DeepMind and other Google AI teams, Gemma pulls from techniques learned during the development of Gemini, which is the family name for Google’s most capable (public-facing) commercial LLMs, including the ones that power its Gemini AI assistant. Google says the name comes from the Latin gemma, which means “precious stone.”

While Gemma is Google’s first major open LLM since the launch of ChatGPT (it has released smaller research models such as FLAN-T5 in the past), it’s not Google’s first contribution to open AI research. The company cites the development of the Transformer architecture, as well as releases like TensorFlow, BERT, T5, and JAX as key contributions, and it would not be controversial to say that those have been important to the field.

A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta's Llama 2 on several benchmarks.

Enlarge / A chart of Gemma performance provided by Google. Google says that Gemma outperforms Meta’s Llama 2 on several benchmarks.

Owing to lesser capability and high confabulation rates, smaller open-weights LLMs have been more like tech demos until recently, as some larger ones have begun to match GPT-3.5 performance levels. Still, experts see source-available and open-weights AI models as essential steps in ensuring transparency and privacy in chatbots. Google Gemma is not “open source” however, since that term usually refers to a specific type of software license with few restrictions attached.

In reality, Gemma feels like a conspicuous play to match Meta, which has made a big deal out of releasing open-weights models (such as LLaMA and Llama 2) since February of last year. That technique stands in opposition to AI models like OpenAI’s GPT-4 Turbo, which is only available through the ChatGPT application and a cloud API and cannot be run locally. A Reuters report on Gemma focuses on the Meta angle and surmises that Google hopes to attract more developers to its Vertex AI cloud platform.

We have not used Gemma yet; however, Google claims the 7B model outperforms Meta’s Llama 2 7B and 13B models on several benchmarks for math, Python code generation, general knowledge, and commonsense reasoning tasks. It’s available today through Kaggle, a machine-learning community platform, and Hugging Face.

In other news, Google paired the Gemma release with a “Responsible Generative AI Toolkit,” which Google hopes will offer guidance and tools for developing what the company calls “safe and responsible” AI applications.

Google goes “open AI” with Gemma, a free, open-weights chatbot family Read More »

will-smith-parodies-viral-ai-generated-video-by-actually-eating-spaghetti

Will Smith parodies viral AI-generated video by actually eating spaghetti

Mangia, mangia —

Actor pokes fun at 2023 AI video by eating spaghetti messily and claiming it’s AI-generated.

The real Will Smith eating spaghetti, parodying an AI-generated video from 2023.

Enlarge / The real Will Smith eating spaghetti, parodying an AI-generated video from 2023.

On Monday, Will Smith posted a video on his official Instagram feed that parodied an AI-generated video of the actor eating spaghetti that went viral last year. With the recent announcement of OpenAI’s Sora video synthesis model, many people have noted the dramatic jump in AI-video quality over the past year compared to the infamous spaghetti video. Smith’s new video plays on that comparison by showing the actual actor eating spaghetti in a comical fashion and claiming that it is AI-generated.

Captioned “This is getting out of hand!”, the Instagram video uses a split screen layout to show the original AI-generated spaghetti video created by a Reddit user named “chaindrop” in March 2023 on the top, labeled with the subtitle “AI Video 1 year ago.” Below that, in a box titled “AI Video Now,” the real Smith shows 11 video segments of himself actually eating spaghetti by slurping it up while shaking his head, pouring it into his mouth with his fingers, and even nibbling on a friend’s hair. 2006’s Snap Yo Fingers by Lil Jon plays in the background.

In the Instagram comments section, some people expressed confusion about the new (non-AI) video, saying, “I’m still in doubt if second video was also made by AI or not.” In a reply, someone else wrote, “Boomers are gonna loose [sic] this one. Second one is clearly him making a joke but I wouldn’t doubt it in a couple months time it will get like that.”

We have not yet seen a model with the capability of Sora attempt to create a new Will-Smith-eating-spaghetti AI video, but the result would likely be far better than what we saw last year, even if it contained obvious glitches. Given how things are progressing, we wouldn’t be surprised if by 2025, video synthesis AI models can replicate the parody video created by Smith himself.

It’s worth noting for history’s sake that despite the comparison, the video of Will Smith eating spaghetti did not represent the state of the art in text-to-video synthesis at the time of its creation in March 2023 (that title would likely apply to Runway’s Gen-2, which was then in closed testing). However, the spaghetti video was reasonably advanced for open weights models at the time, having used the ModelScope AI model. More capable video synthesis models had already been released at that time, but due to the humorous cultural reference, it’s arguably more fun to compare today’s AI video synthesis to Will Smith grotesquely eating spaghetti than to teddy bears washing dishes.

Will Smith parodies viral AI-generated video by actually eating spaghetti Read More »

reddit-sells-training-data-to-unnamed-ai-company-ahead-of-ipo

Reddit sells training data to unnamed AI company ahead of IPO

Everything has a price —

If you’ve posted on Reddit, you’re likely feeding the future of AI.

In this photo illustration the American social news

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that it planned to charge AI companies for access to its almost two decades’ worth of human-generated content.

If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.

While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Reddit sells training data to unnamed AI company ahead of IPO Read More »

new-app-always-points-to-the-supermassive-black-hole-at-the-center-of-our-galaxy

New app always points to the supermassive black hole at the center of our galaxy

the final frontier —

iPhone compass app made with AI assistance locates the heart of the Milky Way.

A photo of Galactic Compass running on an iPhone.

Enlarge / A photo of Galactic Compass running on an iPhone.

Matt Webb / Getty Images

On Thursday, designer Matt Webb unveiled a new iPhone app called Galactic Compass, which always points to the center of the Milky Way galaxy—no matter where Earth is positioned on our journey through the stars. The app is free and available now on the App Store.

While using Galactic Compass, you set your iPhone on a level surface, and a big green arrow on the screen points the way to the Galactic Center, which is the rotational core of the spiral galaxy all of us live in. In that center is a supermassive black hole known as Sagittarius A*, a celestial body from which no matter or light can escape. (So, in a way, the app is telling us what we should avoid.)

But truthfully, the location of the galactic core at any given time isn’t exactly useful, practical knowledge—at least for people who aren’t James Tiberius Kirk in Star Trek V. But it may inspire a sense of awe about our place in the cosmos.

Screenshots of Galactic Compass in action, captured by Ars Technica in a secret location.

Enlarge / Screenshots of Galactic Compass in action, captured by Ars Technica in a secret location.

Benj Edwards / Getty Images

“It is astoundingly grounding to always have a feeling of the direction of the center of the galaxy,” Webb told Ars Technica. “Your perspective flips. To begin with, it feels arbitrary. The middle of the Milky Way seems to fly all over the sky, as the Earth turns and moves in its orbit.”

Webb’s journey to creating Galactic Compass began a decade ago as an offshoot of his love for casual astronomy. “About 10 years ago, I taught myself how to point to the center of the galaxy,” Webb said. “I lived in an apartment where I had a great view of the stars, so I was using augmented reality apps to identify them, and I gradually learned my way around the sky.”

While Webb initially used an astronomy app to help locate the Galactic Center, he eventually taught himself how to always find it. He described visualizing himself on the surface of the Earth as it spins and tilts, understanding the ecliptic as a line across the sky and recognizing the center of the galaxy as an invisible point moving predictably through the constellation Sagittarius, which lies on the ecliptic line. By visualizing Earth’s orbit over the year and determining his orientation in space, he was able to point in the right direction, refining his ability through daily practice and comparison with an augmented reality app.

With a little help from AI

Our galaxy, the Milky Way, is thought to look similar to Andromeda (seen here) if you could see it from a distance. But since we're inside the galaxy, all we can see is the edge of the galactic plane.

Enlarge / Our galaxy, the Milky Way, is thought to look similar to Andromeda (seen here) if you could see it from a distance. But since we’re inside the galaxy, all we can see is the edge of the galactic plane.

Getty Images

In 2021, Webb imagined turning his ability into an app that would help take everyone on the same journey, showing a compass that points toward the galactic center instead of Earth’s magnetic north. “But I can’t write apps,” he said. “I’m a decent enough engineer, and an amateur designer, but I’ve never figured out native apps.”

That’s where ChatGPT comes in, transforming Webb’s vision into reality. With the AI assistant as his coding partner, Webb progressed step by step, crafting a simple app interface and integrating complex calculations for locating the galactic center (which involves calculating the user’s azimuth and altitude).

Still, coding with ChatGPT has its limitations. “ChatGPT is super smart, but it’s not embodied like a human, so it falls down on doing the 3D calculations,” he says. “I had to learn a lot about quaternions, which are a technique for combining 3D rotations, and even then, it’s not perfect. The app needs to be held flat to work simply because my math breaks down when the phone is upright! I’ll fix this in future versions,” Webb said.

Webb is no stranger to ChatGPT-powered creations that are more fun than practical. Last month, he launched a Kickstarter for an AI-rhyming poetry clock called the Poem/1. With his design studio, Acts Not Facts, Webb says he uses “whimsy and play to discover the possibilities in new technology.”

Whimsical or not, Webb insists that Galactic Compass can help us ponder our place in the vast universe, and he’s proud that it recently peaked at #87 in the Travel chart for the US App Store. In this case, though, it’s spaceship Earth that is traveling the galaxy while every living human comes along for the ride.

“Once you can follow it, you start to see the galactic center as the true fixed point, and we’re the ones whizzing and spinning. There it remains, the supermassive black hole at the center of our galaxy, Sagittarius A*, steady as a rock, eternal. We go about our days; it’s always there.”

New app always points to the supermassive black hole at the center of our galaxy Read More »

openai-collapses-media-reality-with-sora,-a-photorealistic-ai-video-generator

OpenAI collapses media reality with Sora, a photorealistic AI video generator

Pics and it didn’t happen —

Hello, cultural singularity—soon, every video you see online could be completely fake.

Snapshots from three videos generated using OpenAI's Sora.

Enlarge / Snapshots from three videos generated using OpenAI’s Sora.

On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it’s only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It’s also freaking people out.

“It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them,” wrote Wall Street Journal tech reporter Joanna Stern on X.

“This could be the ‘holy shit’ moment of AI,” wrote Tom Warren of The Verge.

“Every single one of these videos is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.

For future reference—since this type of panic will some day appear ridiculous—there’s a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren’t perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.

The prompt that generated the video above: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we’re seeing now seemed like a distant fantasy to most people.

In that piece, I called the moment that truth and fiction in media become indistinguishable the “cultural singularity.” It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the “worst” AI-generated video is ever going to look. There’s no synchronized sound yet, but that might be solved in future models.

How (we think) they pulled it off

AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta’s Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn’t seem to matter.

Sora (which means “sky” in Japanese) appears to be something altogether different. It’s high-resolution (1920×1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?

OpenAI doesn’t usually share insider technical details with the press, so we’re left to speculate based on theories from experts and information given to the public.

OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It generates a video by starting off with noise and “gradually transforms it by removing the noise over many steps,” the company explains. It “recognizes” objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerge.

Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model “foresight” of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.

OpenAI represents video as collections of smaller groups of data called “patches,” which the company says are similar to tokens (fragments of a word) in GPT-4. “By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios,” the company writes.

An important tool in OpenAI’s bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions that describe scenes in the training data generated by another AI model like GPT-4V. And the company is not stopping here. “Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI writes, “a capability we believe will be an important milestone for achieving AGI.”

One question on many people’s minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it’s possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia’s Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, “I won’t be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!” Until confirmed by OpenAI, however, that’s just speculation.

OpenAI collapses media reality with Sora, a photorealistic AI video generator Read More »

google-upstages-itself-with-gemini-15-ai-launch,-one-week-after-ultra-1.0

Google upstages itself with Gemini 1.5 AI launch, one week after Ultra 1.0

Gemini’s Twin —

Google confusingly overshadows its own pro product a week after its last major AI launch.

The Gemini 1.5 logo

Enlarge / The Gemini 1.5 logo, released by Google.

Google

One week after its last major AI announcement, Google appears to have upstaged itself. Last Thursday, Google launched Gemini Ultra 1.0, which supposedly represented the best AI language model Google could muster—available as part of the renamed “Gemini” AI assistant (formerly Bard). Today, Google announced Gemini Pro 1.5, which it says “achieves comparable quality to 1.0 Ultra, while using less compute.”

Congratulations, Google, you’ve done it. You’ve undercut your own premiere AI product. While Ultra 1.0 is possibly still better than Pro 1.5 (what even are we saying here), Ultra was presented as a key selling point of its “Gemini Advanced” tier of its Google One subscription service. And now it’s looking a lot less advanced than seven days ago. All this is on top of the confusing name-shuffling Google has been doing recently. (Just to be clear—although it’s not really clarifying at all—the free version of Bard/Gemini currently uses the Pro 1.0 model. Got it?)

Google claims that Gemini 1.5 represents a new generation of LLMs that “delivers a breakthrough in long-context understanding,” and that it can process up to 1 million tokens, “achieving the longest context window of any large-scale foundation model yet.” Tokens are fragments of a word. The first part of the claim about “understanding” is contentious and subjective, but the second part is probably correct. OpenAI’s GPT-4 Turbo can reportedly handle 128,000 tokens in some circumstances, and 1 million is quite a bit more—about 700,000 words. A larger context window allows for processing longer documents and having longer conversations. (The Gemini 1.0 model family handles 32,000 tokens max.)

But any technical breakthroughs are almost beside the point. What should we make of a company that just trumpeted to the world about its AI supremacy last week, only to partially supersede that a week later? Is it a testament to the rapid rate of AI technical progress in Google’s labs, a sign that red tape was holding back Ultra 1.0 for too long, or merely a sign of poor coordination between research and marketing? We honestly don’t know.

So back to Gemini 1.5. What is it, really, and how will it be available? Google implies that like 1.0 (which had Nano, Pro, and Ultra flavors), it will be available in multiple sizes. Right now, Pro 1.5 is the only model Google is unveiling. Google says that 1.5 uses a new mixture-of-experts (MoE) architecture, which means the system selectively activates different “experts” or specialized sub-models within a larger neural network for specific tasks based on the input data.

Google says that Gemini 1.5 can perform “complex reasoning about vast amounts of information,” and gives an example of analyzing a 402-page transcript of Apollo 11’s mission to the Moon. It’s impressive to process documents that large, but the model, like every large language model, is highly likely to confabulate interpretations across large contexts. We wouldn’t trust it to soundly analyze 1 million tokens without mistakes, so that’s putting a lot of faith into poorly understood LLM hands.

For those interested in diving into technical details, Google has released a technical report on Gemini 1.5 that appears to show Gemini performing favorably versus GPT-4 Turbo on various tasks, but it’s also important to note that the selection and interpretation of those benchmarks can be subjective. The report does give some numbers on how much better 1.5 is compared to 1.0, saying it’s 28.9 percent better than 1.0 Pro at “Math, Science & Reasoning” and 5.2 percent better at those subjects than 1.0 Ultra.

A table from the Gemini 1.5 technical document showing comparisons to Gemini 1.0.

Enlarge / A table from the Gemini 1.5 technical document showing comparisons to Gemini 1.0.

Google

But for now, we’re still kind of shocked that Google would launch this particular model at this particular moment in time. Is it trying to get ahead of something that it knows might be just around the corner, like OpenAI’s unreleased GPT-5, for instance? We’ll keep digging and let you know what we find.

Google says that a limited preview of 1.5 Pro is available now for developers via AI Studio and Vertex AI with a 128,000 token context window, scaling up to 1 million tokens later. Gemini 1.5 apparently has not come to the Gemini chatbot (formerly Bard) yet.

Google upstages itself with Gemini 1.5 AI launch, one week after Ultra 1.0 Read More »