text synthesis

Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks

The Anthropic Claude 3 logo, jazzed up by Benj Edwards.

Anthropic / Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of “3.5” models that build upon Claude 3, launched in March. Claude 3.5 can compose text, analyze data, and write code. It features a 200,000 token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. “This model is really, really good,” wrote independent AI researcher Simon Willison on X. “I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump).”

As we’ve written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and nuance of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

Claude 3.5 Sonnet benchmarks provided by Anthropic.

If all that makes your eyes glaze over, that’s OK; it’s meaningful to researchers but mostly marketing to everyone else. A more useful performance metric comes from what we might call “vibemarks” (coined here first!) which are subjective, non-rigorous aggregate feelings measured by competitive usage on sites like LMSYS’s Chatbot Arena. The Claude 3.5 Sonnet model is currently under evaluation there, and it’s too soon to say how well it will fare.

Claude 3.5 Sonnet also outperforms Anthropic’s previous-best model (Claude 3 Opus) on benchmarks measuring “reasoning,” math skills, general knowledge, and coding abilities. For example, the model demonstrated strong performance in an internal coding evaluation, solving 64 percent of problems compared to 38 percent for Claude 3 Opus.

Claude 3.5 Sonnet is also a multimodal AI model that accepts visual input in the form of images, and the new model is reportedly excellent at a battery of visual comprehension tests.

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Roughly speaking, the visual benchmarks mean that 3.5 Sonnet is better at pulling information from images than previous models. For example, you can show it a picture of a rabbit wearing a football helmet, and the model knows it’s a rabbit wearing a football helmet and can talk about it. That’s fun for tech demos, but the technology is still not accurate enough for applications where reliability is mission-critical.

Apple may hire Google to power new iPhone AI features using Gemini—report

Bake a cake as fast as you can —

With Apple’s own AI tech lagging behind, the firm looks for a fallback solution.

Benj Edwards

On Monday, Bloomberg reported that Apple is in talks to license Google’s Gemini model to power AI features like Siri in a future iPhone software update coming later in 2024, according to people familiar with the situation. Apple has also reportedly conducted similar talks with ChatGPT maker OpenAI.

The potential integration of Google Gemini into iOS 18 could bring a range of new cloud-based (off-device) AI-powered features to Apple’s smartphone, including image creation or essay writing based on simple prompts. However, the terms and branding of the agreement have not yet been finalized, and the implementation details remain unclear. The companies are unlikely to announce any deal until Apple’s annual Worldwide Developers Conference in June.

Gemini could also bring new capabilities to Apple’s widely criticized voice assistant, Siri, which trails newer AI assistants powered by large language models (LLMs) in understanding and responding to complex questions. Rumors of Apple’s own internal frustration with Siri—and potential remedies—have been kicking around for some time. In January, 9to5Mac revealed that Apple had been conducting tests with a beta version of iOS 17.4 that used OpenAI’s ChatGPT API to power Siri.

As we have previously reported, Apple has also been developing its own AI models, including a large language model codenamed Ajax and a basic chatbot called Apple GPT. However, the company’s LLM technology is said to lag behind that of its competitors, making a partnership with Google or another AI provider a more attractive option.

Google launched Gemini, a language-based AI assistant similar to ChatGPT, in December and has updated it several times since. Many industry experts consider the larger Gemini models to be roughly as capable as OpenAI’s GPT-4 Turbo, which powers the subscription versions of ChatGPT. Until just recently, with the emergence of Gemini Ultra and Claude 3, OpenAI’s top model held a fairly wide lead in perceived LLM capability.

The potential partnership between Apple and Google could significantly impact the AI industry, as Apple’s platform represents more than 2 billion active devices worldwide. If the agreement gets finalized, it would build upon the existing search partnership between the two companies, which has seen Google pay Apple billions of dollars annually to make its search engine the default option on iPhones and other Apple devices.

However, Bloomberg reports that the potential partnership between Apple and Google is likely to draw scrutiny from regulators, as the companies’ current search deal is already the subject of a lawsuit by the US Department of Justice. The European Union is also pressuring Apple to make it easier for consumers to change their default search engine away from Google.

With so much money potentially on the line, selecting Google for Apple’s cloud AI job could be a major loss for OpenAI in terms of bringing its technology into the mainstream, given a market representing billions of users. Even so, any deal with Google or OpenAI may be a temporary fix until Apple can get its own LLM-based AI technology up to speed.

The AI wars heat up with Claude 3, claimed to have “near-human” abilities

The Anthropic Claude 3 logo.

On Monday, Anthropic released Claude 3, a family of three AI language models similar to those that power ChatGPT. Anthropic claims the models set new industry benchmarks across a range of cognitive tasks, even approaching “near-human” capability in some cases. It’s available now through Anthropic’s website, with the most powerful model being subscription-only. It’s also available via API for developers.

Claude 3’s three models represent increasing complexity and parameter count: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Sonnet powers the Claude.ai chatbot now for free with an email sign-in. But as mentioned above, Opus is only available through Anthropic’s web chat interface if you pay $20 a month for “Claude Pro,” a subscription service offered through the Anthropic website. All three feature a 200,000-token context window. (The context window is the number of tokens—fragments of a word—that an AI language model can process at once.)

We covered the launch of Claude in March 2023 and Claude 2 in July that same year. Each time, Anthropic fell slightly behind OpenAI’s best models in capability while surpassing them in terms of context window length. With Claude 3, Anthropic has perhaps finally caught up with OpenAI’s released models in terms of performance, although there is no consensus among experts yet—and the presentation of AI benchmarks is notoriously prone to cherry-picking.

A Claude 3 benchmark chart provided by Anthropic.

Claude 3 reportedly demonstrates advanced performance across various cognitive tasks, including reasoning, expert knowledge, mathematics, and language fluency. (Despite the lack of consensus over whether large language models “know” or “reason,” the AI research community commonly uses those terms.) The company claims that the Opus model, the most capable of the three, exhibits “near-human levels of comprehension and fluency on complex tasks.”

That’s quite a heady claim and deserves to be parsed more carefully. It’s probably true that Opus is “near-human” on some specific benchmarks, but that doesn’t mean that Opus is a general intelligence like a human (consider that pocket calculators are superhuman at math). So, it’s a purposely eye-catching claim that can be watered down with qualifications.

According to Anthropic, Claude 3 Opus beats GPT-4 on 10 AI benchmarks, including MMLU (undergraduate level knowledge), GSM8K (grade school math), HumanEval (coding), and the colorfully named HellaSwag (common knowledge). Several of the wins are very narrow, such as 86.8 percent for Opus vs. 86.4 percent for GPT-4 on a five-shot trial of MMLU, and some gaps are big, such as 84.9 percent on HumanEval over GPT-4’s 67.0 percent. But what that might mean, exactly, to you as a customer is difficult to say.
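To make the size of those gaps concrete, here is a minimal sketch that computes the margins in percentage points from the two pairs of scores quoted above. It reads the 86.4 percent MMLU figure as GPT-4's score, as the sentence implies, and the variable names are only illustrative.

```python
# Scores quoted in the article, as (Claude 3 Opus, GPT-4) percentages.
# Treating 86.4 as GPT-4's MMLU score is an assumption drawn from context.
scores = {
    "MMLU (5-shot)": (86.8, 86.4),
    "HumanEval": (84.9, 67.0),
}

# Margin in percentage points, rounded to one decimal to hide float noise.
margins = {name: round(opus - gpt4, 1) for name, (opus, gpt4) in scores.items()}

for name, margin in margins.items():
    print(f"{name}: Opus leads by {margin:.1f} points")
```

A 0.4-point lead is small enough that differences in prompting or evaluation setup could plausibly erase it, while a 17.9-point lead is much harder to explain away.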

“As always, LLM benchmarks should be treated with a little bit of suspicion,” says AI researcher Simon Willison, who spoke with Ars about Claude 3. “How well a model performs on benchmarks doesn’t tell you much about how the model ‘feels’ to use. But this is still a huge deal—no other model has beaten GPT-4 on a range of widely used benchmarks like this.”

Reddit sells training data to unnamed AI company ahead of IPO

Everything has a price —

If you’ve posted on Reddit, you’re likely feeding the future of AI.

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors in its anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era in which AI companies used training data without expressly seeking rightsholder permission, some tech firms have more recently begun striking deals under which some of the content used to train AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that the company planned to charge AI companies for access to its almost two decades’ worth of human-generated content.

If the reported $60 million/year deal goes through, it’s quite possible that if you’ve ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.

While we don’t know if OpenAI is the company that signed the deal with Reddit, Bloomberg speculates that Reddit’s ability to tap into AI hype for additional revenue may boost the value of its IPO, which might be worth $5 billion. Despite drama last year, Bloomberg states that Reddit pulled in more than $800 million in revenue in 2023, growing about 20 percent over its 2022 numbers.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

Report: Sam Altman seeking trillions for AI chip fabrication from UAE, others

chips ahoy —

WSJ: Audacious $5-$7 trillion investment would aim to expand global AI chip supply.

OpenAI Chief Executive Officer Sam Altman walks on the House side of the US Capitol on January 11, 2024, in Washington, DC. (Photo by Kent Nishimura/Getty Images)

Getty Images

On Thursday, The Wall Street Journal reported that OpenAI CEO Sam Altman is in talks with investors to raise as much as $5 trillion to $7 trillion for AI chip manufacturing, according to people familiar with the matter. The funding seeks to address the scarcity of graphics processing units (GPUs) crucial for training and running large language models like those that power ChatGPT, Microsoft Copilot, and Google Gemini.

The high dollar amount reflects the huge amount of capital necessary to spin up new semiconductor manufacturing capability. “As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers and power providers, which together would put up money to build chip foundries that would then be run by existing chip makers,” writes the Wall Street Journal in its report. “OpenAI would agree to be a significant customer of the new factories.”

To hit these ambitious targets, which dwarf the entire semiconductor industry’s current global sales of $527 billion, Altman has reportedly met with a range of potential investors worldwide, including sovereign wealth funds and government entities, notably the United Arab Emirates, SoftBank CEO Masayoshi Son, and representatives from Taiwan Semiconductor Manufacturing Co. (TSMC).
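For a sense of scale, the following sketch divides the reported fundraising range by the $527 billion sales figure cited above. It uses only the numbers in this article; the constant and function names are illustrative.

```python
# Figures quoted in the article, in US dollars.
FUNDING_LOW = 5e12       # $5 trillion
FUNDING_HIGH = 7e12      # $7 trillion
INDUSTRY_SALES = 527e9   # $527 billion in current global semiconductor sales


def multiple_of_sales(amount: float, sales: float = INDUSTRY_SALES) -> float:
    """Return how many times larger `amount` is than industry sales."""
    return amount / sales


low = multiple_of_sales(FUNDING_LOW)
high = multiple_of_sales(FUNDING_HIGH)
print(f"Roughly {low:.1f}x to {high:.1f}x the industry's current global sales")
```

By this rough measure, the reported target is about 9.5 to 13.3 times what the whole industry currently sells.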

TSMC is the world’s largest dedicated independent semiconductor foundry. It’s a critical linchpin that companies such as Nvidia, Apple, Intel, and AMD rely on to fabricate SoCs, CPUs, and GPUs for various applications.

Altman reportedly seeks to expand the global capacity for semiconductor manufacturing significantly, funding the infrastructure necessary to support the growing demand for GPUs and other AI-specific chips. GPUs are excellent at parallel computation, which makes them ideal for running AI models that heavily rely on matrix multiplication to work. However, the technology sector currently faces a significant shortage of these important components, constraining the potential for AI advancements and applications.

In particular, the UAE’s involvement, led by Sheikh Tahnoun bin Zayed al Nahyan, a key security official and chair of numerous Abu Dhabi sovereign wealth vehicles, reflects global interest in AI’s potential and the strategic importance of semiconductor manufacturing. However, the prospect of substantial UAE investment in a key tech industry raises potential geopolitical concerns, particularly regarding the US government’s strategic priorities in semiconductor production and AI development.

The US has been cautious about allowing foreign control over the supply of microchips, given their importance to the digital economy and national security. Reflecting this, the Biden administration has undertaken efforts to bolster domestic chip manufacturing through subsidies and regulatory scrutiny of foreign investments in important technologies.

To put the $5 trillion to $7 trillion estimate in perspective, the White House just today announced a $5 billion investment in R&D to advance US-made semiconductor technologies. TSMC has already sunk $40 billion—one of the largest foreign investments in US history—into a US chip plant in Arizona. As of now, it’s unclear whether Altman has secured any commitments toward his fundraising goal.

Updated on February 9, 2024 at 8:45 PM Eastern with a quote from the WSJ that clarifies the proposed relationship between OpenAI and partners in the talks.

OpenAI and Common Sense Media partner to protect teens from AI harms and misuse

Adventures in chatbusting —

Site gave ChatGPT 3 stars and 48% privacy score: “Best used for creativity, not facts.”

Boy in Living Room Wearing Robot Mask

On Monday, OpenAI announced a partnership with the nonprofit Common Sense Media to create AI guidelines and educational materials targeted at parents, educators, and teens. It includes the curation of family-friendly GPTs in OpenAI’s GPT store. The collaboration aims to address concerns about the impacts of AI on children and teenagers.

Known for its reviews of films and TV shows aimed at parents seeking appropriate media for their kids to watch, Common Sense Media recently branched out into AI and has been reviewing AI assistants on its site.

“AI isn’t going anywhere, so it’s important that we help kids understand how to use it responsibly,” Common Sense Media wrote on X. “That’s why we’ve partnered with @OpenAI to help teens and families safely harness the potential of AI.”

OpenAI CEO Sam Altman and Common Sense Media CEO James Steyer announced the partnership onstage in San Francisco at the Common Sense Summit for America’s Kids and Families, an event that was well-covered by media members on the social media site X.

For his part, Altman offered a canned statement in the press release, saying, “AI offers incredible benefits for families and teens, and our partnership with Common Sense will further strengthen our safety work, ensuring that families and teens can use our tools with confidence.”

The announcement feels slightly non-specific in the official news release, with Steyer offering, “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

The partnership seems aimed mostly at bringing a patina of family-friendliness to OpenAI’s GPT store, with the most solid reveal being the aforementioned fact that Common Sense Media will help with the “curation of family-friendly GPTs in the GPT Store based on Common Sense ratings and standards.”

Common Sense AI reviews

As mentioned above, Common Sense Media began reviewing AI assistants on its site late last year. This puts Common Sense Media in an interesting position with potential conflicts of interest regarding the new partnership with OpenAI. However, it doesn’t seem to be offering any favoritism to OpenAI so far.

For example, Common Sense Media’s review of ChatGPT calls the AI assistant “A powerful, at times risky chatbot for people 13+ that is best used for creativity, not facts.” It labels ChatGPT as being suitable for ages 13 and up (which is in OpenAI’s Terms of Service) and gives the OpenAI assistant three out of five stars. ChatGPT also scores a 48 percent privacy rating (which is oddly shown as 55 percent on another page that goes into privacy details). The review we cited was last updated on October 13, 2023, as of this writing.

For reference, Google Bard gets a three-star overall rating and a 75 percent privacy rating in its Common Sense Media review. Stable Diffusion, the image synthesis model, nets a one-star rating with the description, “Powerful image generator can unleash creativity, but is wildly unsafe and perpetuates harm.” OpenAI’s DALL-E gets two stars and a 48 percent privacy rating.

The information that Common Sense Media includes about each AI model appears relatively accurate and detailed (and the organization cited an Ars Technica article as a reference in one explanation), so the reviews feel fair, even in the face of the OpenAI partnership. Given the low scores, it seems that most AI models aren’t off to a great start, but that may change. It’s still early days in generative AI.

As 2024 election looms, OpenAI says it is taking steps to prevent AI abuse

Don’t Rock the vote —

ChatGPT maker plans transparency for gen AI content and improved access to voting info.

A pixelated photo of Donald Trump.

On Monday, ChatGPT maker OpenAI detailed its plans to prevent the misuse of its AI technologies during the upcoming elections in 2024, promising transparency in AI-generated content and enhancing access to reliable voting information. The AI developer says it is working on an approach that involves policy enforcement, collaboration with partners, and the development of new tools aimed at classifying AI-generated media.

“As we prepare for elections in 2024 across the world’s largest democracies, our approach is to continue our platform safety work by elevating accurate voting information, enforcing measured policies, and improving transparency,” writes OpenAI in its blog post. “Protecting the integrity of elections requires collaboration from every corner of the democratic process, and we want to make sure our technology is not used in a way that could undermine this process.”

Initiatives proposed by OpenAI include preventing abuse by means such as deepfakes or bots imitating candidates, refining usage policies, and launching a reporting system for the public to flag potential abuses. For example, OpenAI’s image generation tool, DALL-E 3, includes built-in filters that reject requests to create images of real people, including politicians. “For years, we’ve been iterating on tools to improve factual accuracy, reduce bias, and decline certain requests,” the company stated.

OpenAI says it regularly updates its Usage Policies for ChatGPT and its API products to prevent misuse, especially in the context of elections. The organization has implemented restrictions on using its technologies for political campaigning and lobbying until it better understands the potential for personalized persuasion. Also, OpenAI prohibits creating chatbots that impersonate real individuals or institutions and disallows the development of applications that could deter people from “participation in democratic processes.” Users can report GPTs that may violate the rules.

OpenAI claims to be proactively engaged in detailed strategies to safeguard its technologies against misuse. According to its statements, this includes red-teaming new systems to anticipate challenges, engaging with users and partners for feedback, and implementing robust safety mitigations. OpenAI asserts that these efforts are integral to its mission of continually refining AI tools for improved accuracy, reduced biases, and responsible handling of sensitive requests.

Regarding transparency, OpenAI says it is advancing its efforts in classifying image provenance. The company plans to embed digital credentials, using cryptographic techniques, into images produced by DALL-E 3 as part of its adoption of standards by the Coalition for Content Provenance and Authenticity. Additionally, OpenAI says it is testing a tool designed to identify DALL-E-generated images.

In an effort to connect users with authoritative information, particularly concerning voting procedures, OpenAI says it has partnered with the National Association of Secretaries of State (NASS) in the United States. ChatGPT will direct users to CanIVote.org for verified US voting information.

“We want to make sure that our AI systems are built, deployed, and used safely,” writes OpenAI. “Like any new technology, these tools come with benefits and challenges. They are also unprecedented, and we will keep evolving our approach as we learn more about how our tools are used.”

OpenAI’s GPT Store lets ChatGPT users discover popular user-made chatbot roles

The bot of 1,000 faces —

Like an app store, people can find novel ChatGPT personalities—and some creators will get paid.

Two robots hold a gift box.

On Wednesday, OpenAI announced the launch of its GPT Store—a way for ChatGPT users to share and discover custom chatbot roles called “GPTs”—and ChatGPT Team, a collaborative ChatGPT workspace and subscription plan. OpenAI bills the new store as a way to “help you find useful and popular custom versions of ChatGPT” for members of Plus, Team, or Enterprise subscriptions.

“It’s been two months since we announced GPTs, and users have already created over 3 million custom versions of ChatGPT,” writes OpenAI in its promotional blog. “Many builders have shared their GPTs for others to use. Today, we’re starting to roll out the GPT Store to ChatGPT Plus, Team and Enterprise users so you can find useful and popular GPTs.”

OpenAI launched GPTs on November 6, 2023, as part of its DevDay event. Each GPT includes custom instructions and/or access to custom data or external APIs that can potentially make a custom GPT personality more useful than the vanilla ChatGPT-4 model. Before the GPT Store launch, paying ChatGPT users could create and share custom GPTs with others (by setting the GPT public and sharing a link to the GPT), but there was no central repository for browsing and discovering user-designed GPTs on the OpenAI website.

According to OpenAI, the GPT Store will feature new GPTs every week, and the company shared a list of six notable early GPTs that are available now: AllTrails for finding hiking trails, Consensus for searching 200 million academic papers, Code Tutor for learning coding with Khan Academy, Canva for designing presentations, Books for discovering reading material, and CK-12 Flexi for learning math and science.

A screenshot of the OpenAI GPT Store provided by OpenAI.

OpenAI

ChatGPT members can include their own GPTs in the GPT Store by setting them to be accessible to “Everyone” and then verifying a builder profile in ChatGPT settings. OpenAI plans to review GPTs to ensure they meet its policies and brand guidelines. Users can also report GPTs that violate the rules.

As promised by CEO Sam Altman during DevDay, OpenAI plans to share revenue with GPT creators. Unlike a smartphone app store, it appears that users will not sell their GPTs in the GPT Store, but instead, OpenAI will pay developers “based on user engagement with their GPTs.” The revenue program will launch in the first quarter of 2024, and OpenAI will provide more details on the criteria for receiving payments later.

“ChatGPT Team” is for teams who use ChatGPT

Also on Wednesday, OpenAI announced the cleverly named ChatGPT Team, a new group-based ChatGPT membership program akin to ChatGPT Enterprise, which the company launched last August. Unlike Enterprise, which is for large companies and does not have publicly listed prices, ChatGPT Team is a plan for “teams of all sizes” and costs US $25 a month per user (when billed annually) or US $30 a month per user (when billed monthly). By comparison, ChatGPT Plus costs $20 per month.
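As a rough illustration of how those prices add up over a year, here is a small sketch. The per-user prices come from the paragraph above. The five-person team size is a hypothetical example, and the Plus line simply assumes each person buys an individual subscription.

```python
# Per-user monthly prices quoted in the article, in US dollars.
TEAM_BILLED_ANNUALLY = 25
TEAM_BILLED_MONTHLY = 30
PLUS = 20

TEAM_SIZE = 5  # hypothetical team size, chosen only for illustration


def yearly_cost(price_per_user_month: int, users: int) -> int:
    """Total cost of one year of service for `users` seats."""
    return price_per_user_month * 12 * users


team_annual = yearly_cost(TEAM_BILLED_ANNUALLY, TEAM_SIZE)
team_monthly = yearly_cost(TEAM_BILLED_MONTHLY, TEAM_SIZE)
plus_total = yearly_cost(PLUS, TEAM_SIZE)

print(f"Team, billed annually: ${team_annual:,}")
print(f"Team, billed monthly:  ${team_monthly:,}")
print(f"Individual Plus seats: ${plus_total:,}")
```

For this hypothetical team, annual billing comes to $1,500 a year, monthly billing to $1,800, and five separate Plus subscriptions to $1,200.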

So what does ChatGPT Team offer above the usual ChatGPT Plus subscription? According to OpenAI, it “provides a secure, collaborative workspace to get the most out of ChatGPT at work.” Unlike Plus, OpenAI says it will not train AI models based on ChatGPT Team business data or conversations. It features an admin console for team management and the ability to share custom GPTs with your team. Like Plus, it also includes access to GPT-4 with the 32K context window, DALL-E 3, GPT-4 with Vision, Browsing, and Advanced Data Analysis—all with higher message caps.

Why would you want to use ChatGPT at work? OpenAI says it can help you generate better code, craft emails, analyze data, and more. Your mileage may vary, of course. As usual, our standard Ars warning about AI language models applies: “Bring your own data” for analysis, don’t rely on ChatGPT as a factual resource, and don’t rely on its outputs in ways you cannot personally confirm. OpenAI has provided more details about ChatGPT Team on its website.

A song of hype and fire: The 10 biggest AI stories of 2023

An illustration of a robot accidentally setting off a mushroom cloud on a laptop computer.

Getty Images | Benj Edwards

“Here, There, and Everywhere” isn’t just a Beatles song. It’s also a phrase that recalls the spread of generative AI into the tech industry during 2023. Whether you think AI is just a fad or the dawn of a new tech revolution, it’s been impossible to deny that AI news has dominated the tech space for the past year.

We’ve seen a large cast of AI-related characters emerge that includes tech CEOs, machine learning researchers, and AI ethicists—as well as charlatans and doomsayers. From public feedback on the subject of AI, we’ve heard that it’s been difficult for non-technical people to know who to believe, what AI products (if any) to use, and whether we should fear for our lives or our jobs.

Meanwhile, in keeping with a much-lamented trend of 2022, machine learning research has not slowed down over the past year. On X, former Biden administration tech advisor Suresh Venkatasubramanian wrote, “How do people manage to keep track of ML papers? This is not a request for support in my current state of bewilderment—I’m genuinely asking what strategies seem to work to read (or “read”) what appear to be 100s of papers per day.”

To wrap up the year with a tidy bow, here’s a look back at the 10 biggest AI news stories of 2023. It was very hard to choose only 10 (in fact, we originally only intended to do seven), but since we’re not ChatGPT generating reams of text without limit, we have to stop somewhere.

Bing Chat “loses its mind”

Aurich Lawson | Getty Images

In February, Microsoft unveiled Bing Chat, a chatbot built into its languishing Bing search engine website. Microsoft created the chatbot using a more raw form of OpenAI’s GPT-4 language model but didn’t tell everyone it was GPT-4 at first. Since Microsoft used a less conditioned version of GPT-4 than the one that would be released in March, the launch was rough. The chatbot assumed a temperamental personality that could easily turn on users and attack them, tell people it was in love with them, seemingly worry about its fate, and lose its cool when confronted with an article we wrote about revealing its system prompt.

Aside from the relatively raw nature of the AI model Microsoft was using, at fault was a system where very long conversations would push the conditioning system prompt outside of the model’s context window (a form of short-term memory), allowing all hell to break loose through jailbreaks that people documented on Reddit. At one point, Bing Chat called me “the culprit and the enemy” for revealing some of its weaknesses. Some people thought Bing Chat was sentient, despite AI experts’ assurances to the contrary. It was a disaster in the press, but Microsoft didn’t flinch, and it ultimately reined in some of Bing Chat’s wild proclivities and opened the bot widely to the public. Today, Bing Chat is known as Microsoft Copilot, and it’s baked into Windows.
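The failure mode described above can be pictured with a toy model. This is a simplified sketch, not Microsoft's actual implementation: it assumes a naive chatbot that keeps only the most recent tokens, treats whitespace-separated words as tokens, and uses a tiny hypothetical window of 12 tokens so the effect is easy to see.

```python
from collections import deque

CONTEXT_WINDOW = 12  # hypothetical and tiny; real windows hold thousands of tokens
SYSTEM_PROMPT = "[SYS] be polite and stay on topic"  # hypothetical system prompt


def build_context(system_prompt: str, turns: list[str], window: int = CONTEXT_WINDOW) -> list[str]:
    """Keep only the newest `window` tokens, dropping the oldest ones first."""
    context = deque(maxlen=window)  # a full deque discards items from the left
    context.extend(system_prompt.split())
    for turn in turns:
        context.extend(turn.split())
    return list(context)


def prompt_survives(context: list[str]) -> bool:
    """Check whether the start of the system prompt is still in the window."""
    return "[SYS]" in context


short_chat = ["hi there"]
long_chat = ["hi there", "tell me a long story", "now another one please", "and one more after that"]

print(prompt_survives(build_context(SYSTEM_PROMPT, short_chat)))  # True
print(prompt_survives(build_context(SYSTEM_PROMPT, long_chat)))   # False
```

In the short chat, the instructions and the conversation both fit. In the long one, newer messages crowd the instructions out of the window, so nothing is left to condition the model's behavior.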

US Copyright Office says no to AI copyright authors

An AI-generated image that won a prize at the Colorado State Fair in 2022, later denied US copyright registration.

Jason M. Allen

In February, the US Copyright Office issued a key ruling on AI-generated art, revoking the copyright previously granted to the AI-assisted comic book “Zarya of the Dawn” in September 2022. The decision, influenced by the revelation that the images were created using the AI-powered Midjourney image generator, stated that only the text and arrangement of images and text by Kashtanova were eligible for copyright protection. It was the first hint that AI-generated imagery without human-authored elements could not be copyrighted in the United States.

This stance was further cemented in August when a US federal judge ruled that art created solely by AI cannot be copyrighted. In September, the US Copyright Office rejected the registration for an AI-generated image that won a Colorado State Fair art contest in 2022. As it stands now, it appears that purely AI-generated art (without substantial human authorship) is in the public domain in the United States. This stance could be further clarified or changed in the future by judicial rulings or legislation.
