Tech


“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews

Perhaps because of the disappointing results, Altman had previously written that GPT-4.5 would be the last of OpenAI’s traditional AI models, with GPT-5 planned to be a dynamic combination of “non-reasoning” LLMs and simulated reasoning models like o3.

A stratospheric price and a tech dead-end

And about that price—it’s a doozy. GPT-4.5 costs $75 per million input tokens and $150 per million output tokens through the API, compared to GPT-4o’s $2.50 per million input tokens and $10 per million output tokens. (Tokens are chunks of data used by AI models for processing). For developers using OpenAI models, this pricing makes GPT-4.5 impractical for many applications where GPT-4o already performs adequately.

By contrast, OpenAI’s flagship reasoning model, o1 pro, costs $15 per million input tokens and $60 per million output tokens—significantly less than GPT-4.5 despite offering specialized simulated reasoning capabilities. Even more striking, the o3-mini model costs just $1.10 per million input tokens and $4.40 per million output tokens, making it cheaper than even GPT-4o while providing much stronger performance on specific tasks.
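To put those list prices in perspective, here is a quick back-of-the-envelope comparison in Python using the per-million-token rates quoted above. The request size (a 2,000-token prompt with a 500-token reply) is a hypothetical example, not anything from OpenAI’s documentation.

```python
# Per-million-token list prices quoted above: (input, output) in dollars.
PRICES = {
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
    "o1 pro": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical request: a 2,000-token prompt that returns a 500-token reply.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 2_000, 500):.4f}")
# gpt-4.5 works out to $0.225 per call vs. $0.01 for gpt-4o,
# a 22.5x difference at this particular prompt/reply ratio.
```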

OpenAI has likely known about diminishing returns in training LLMs for some time. As a result, the company spent most of last year working on simulated reasoning models like o1 and o3, which use a different inference-time (runtime) approach to improving performance instead of throwing ever-larger amounts of training data at GPT-style AI models.


OpenAI’s self-reported benchmark results for the SimpleQA test, which measures confabulation rate. Credit: OpenAI

While this seems like bad news for OpenAI in the short term, competition is thriving in the AI market. Anthropic’s Claude 3.7 Sonnet has demonstrated vastly better performance than GPT-4.5, with a reportedly more efficient architecture. It’s worth noting that Claude 3.7 Sonnet is likely a system of AI models working together behind the scenes, although Anthropic has not provided details about its architecture.

For now, it seems that GPT-4.5 may be the last of its kind—a technological dead-end for an unsupervised learning approach that has paved the way for new architectures in AI models, such as o3’s inference-time reasoning and perhaps even something more novel, like diffusion-based models. Only time will tell how things end up.

GPT-4.5 is now available to ChatGPT Pro subscribers, with rollout to Plus and Team subscribers planned for next week, followed by Enterprise and Education customers the week after. Developers can access it through OpenAI’s various APIs on paid tiers, though the company is uncertain about its long-term availability.


Details on AMD’s $549 and $599 Radeon RX 9070 GPUs, which aim at Nvidia and 4K

AMD is releasing the first detailed specifications of its next-generation Radeon RX 9070 series GPUs and the RDNA4 graphics architecture today, almost two months after teasing them at CES.

The short version is that these are both upper-midrange graphics cards targeting resolutions of 1440p and 4K and meant to compete mainly with Nvidia’s incoming and outgoing 4070- and 5070-series GeForce GPUs, including the RTX 4070, RTX 5070, RTX 4070 Ti and Ti Super, and the RTX 5070 Ti.

AMD says the RX 9070 will start at $549, the same price as Nvidia’s RTX 5070. The slightly faster 9070 XT starts at $599, $150 less than the RTX 5070 Ti. The cards go on sale March 6, a day after Nvidia’s RTX 5070.

Neither Nvidia nor Intel has managed to keep its GPUs in stores at their announced starting prices so far, though, so how well AMD’s pricing stacks up to Nvidia in the real world may take a few weeks or months to settle out. For its part, AMD says it’s confident that it has enough supply to meet demand, but that’s as specific as the company’s reassurances got.

Specs and speeds: Radeon RX 9070 and 9070 XT

|  | RX 9070 XT | RX 9070 | RX 7900 XTX | RX 7900 XT | RX 7900 GRE | RX 7800 XT |
| --- | --- | --- | --- | --- | --- | --- |
| Compute units (Stream processors) | 64 RDNA4 (4,096) | 56 RDNA4 (3,584) | 96 RDNA3 (6,144) | 84 RDNA3 (5,376) | 80 RDNA3 (5,120) | 60 RDNA3 (3,840) |
| Boost clock | 2,970 MHz | 2,520 MHz | 2,498 MHz | 2,400 MHz | 2,245 MHz | 2,430 MHz |
| Memory bus width | 256-bit | 256-bit | 384-bit | 320-bit | 256-bit | 256-bit |
| Memory bandwidth | 650 GB/s | 650 GB/s | 960 GB/s | 800 GB/s | 576 GB/s | 624 GB/s |
| Memory size | 16GB GDDR6 | 16GB GDDR6 | 24GB GDDR6 | 20GB GDDR6 | 16GB GDDR6 | 16GB GDDR6 |
| Total board power (TBP) | 304 W | 220 W | 355 W | 315 W | 260 W | 263 W |

As is implied by their similar price tags, the 9070 and 9070 XT have more in common than not. Both are based on the same GPU die—the 9070 has 56 of the chip’s compute units enabled, while the 9070 XT has 64. Both cards come with 16GB of RAM (4GB more than the 5070, the same amount as the 5070 Ti) on a 256-bit memory bus, and both use two 8-pin power connectors by default, though the 9070 XT can use significantly more power than the 9070 (304 W, compared to 220 W).

AMD says that its partners are free to make Radeon cards with the 12VHPWR or 12V-2×6 power connectors on them, though given the apparently ongoing issues with the connector, we’d expect most Radeon GPUs to stick with the known quantity that is the 8-pin connector.

AMD says that the 9070 series is made using a 4 nm TSMC manufacturing process and that the chips are monolithic rather than being split up into chiplets as some RX 7000-series cards were. AMD’s commitment to its memory controller chiplets was always hit or miss with the 7000-series—the high-end cards tended to use them, while the lower-end GPUs were usually monolithic—so it’s not clear one way or the other whether this means AMD is giving up on chiplet-based GPUs altogether or if it’s just not using them this time around.


AMD’s FSR 4 upscaling is exclusive to 90-series Radeon GPUs, won’t work on other cards

AMD’s new Radeon RX 90-series cards and the RDNA4 architecture make their official debut on March 5, and a new version of AMD’s FidelityFX Super Resolution (FSR) upscaling technology is coming along with them.

FSR and Nvidia’s Deep Learning Super Sampling (DLSS) upscalers have the same goal: to take a lower-resolution image rendered by your graphics card, bump up the resolution, and fill in the gaps between the natively rendered pixels to make an image that looks close to natively rendered without making the GPU do all that rendering work. These upscalers can make errors, and they won’t always look quite as good as a native-resolution image. But they’re both nice alternatives to living with a blurry, non-native-resolution picture on an LCD or OLED display.

FSR and DLSS are especially useful for older or cheaper 1080p- or 1440p-capable GPUs that are connected to a 4K monitor, where you’d otherwise have to decide between a sharp 4K image and a playable frame rate; they’re also useful for hitting higher frame rates at lower resolutions, which can be handy for high-refresh-rate gaming monitors.

But unlike past versions of FSR, FSR 4 is upscaling images using hardware-backed machine-learning algorithms, hardware newly added to RDNA4 and the RX 90-series graphics cards. This mirrors Nvidia’s strategy with DLSS, which has always leveraged the tensor cores found in RTX GPUs to run machine-learning models to achieve superior image quality for upscaled and AI-generated frames. If you don’t have an RDNA4 GPU, you can’t use FSR 4.


On May 5, Microsoft’s Skype will shut down for good

After more than 21 years, Skype will soon be no more. Last night, some users (including Ars readers) poked around in the latest Skype preview update and noticed as-yet-unsurfaced text that read “Starting in May, Skype will no longer be available. Continue your calls and chats in Teams.”

This morning, Microsoft confirmed to Ars that it’s true. May 5, 2025, will mark the end of Skype’s long run.

Alongside the verification that the end is nigh, Microsoft shared a bunch of details about how it plans to migrate Skype users over. Starting right away, some Skype users (those in Teams and Skype Insider) will be able to log in to Teams using their Skype credentials. More people will gain that ability over the next few days.

Microsoft claims that users who do this will see their existing contacts and chats from Skype in Teams from the start. Alternatively, users who don’t want to do this can export their Skype data—specifically contacts, call history, and chats.


Commercials are still too loud, say “thousands” of recent FCC complaints

Streaming ads could get muzzled, too

As you may have noticed—either through the text of this article or your own ears—the Calm Act doesn’t apply to streaming services. And because the Calm Act doesn’t affect commercials viewed on the Internet, online services providing access to broadcast channels, like YouTube TV and Sling, don’t have to follow the rules. This is despite such services distributing the same content as linear TV providers.

For years, this made sense. The majority of TV viewing occurred through broadcast, cable, or satellite access. Further, services like Netflix and Amazon Prime Video used to be considered safe havens from constant advertisements. But today, streaming services are more popular than ever and have grown to love ads, which have become critical to most platforms’ business models. Further, many streaming services are airing more live events. These events, like sports games, show commercials to all subscribers, even those with a so-called “ad-free” subscription.

Separate from the Calm Act violation complaints, the FCC noted this month that other recent complaints it has seen illustrate “growing concern with the loudness of commercials on streaming services and other online platforms.” If the FCC decides to apply Calm Act rules to the web, it would need to create new methods for ensuring compliance, it said.


Nielsen’s most recent data on how people watch TV. Credit: Nielsen

The FCC didn’t specify what’s behind the spike in consumers’ commercial complaints. Perhaps with declining audiences, traditional TV providers thought it would be less likely for anyone to notice and formally complain about Ozempic ads shouting at them. Twelve years have passed since the rules took effect, so it’s also possible that organizations are getting lackadaisical about ensuring compliance or have dwindling resources.

With Americans spending similar amounts of time—if not longer—watching TV online versus via broadcast, cable, and satellite, the Calm Act would have to take on the web in order to maximize effectiveness. The streaming industry is young, though, and operates differently than linear TV distribution, presenting new regulation challenges.


Sergey Brin says AGI is within reach if Googlers work 60-hour weeks

Sergey Brin co-founded Google in the 1990s along with Larry Page, but both stepped away from the day-to-day at Google in 2019. However, the AI boom tempted Brin to return to the office, and he thinks everyone should follow his example. In a new internal memo, Brin has advised employees to be in the office every weekday so Google can win the AI race.

Just returning to the office isn’t enough for the Google co-founder. According to the memo seen by The New York Times, Brin says Googlers should try to work 60 hours per week to support the company’s AI efforts. That works out to 12 hours per day, Monday through Friday, which Brin calls the “sweet spot of productivity.” This is not a new opinion for Brin.

Brin, like many in Silicon Valley, is seemingly committed to the dogma that the current trajectory of generative AI will lead to the development of artificial general intelligence (AGI). Such a thinking machine would be head and shoulders above current AI models, which can only do a good impression of thinking. An AGI would understand concepts and think more like a human being, which some would argue makes it a conscious entity.

To hear Brin tell it, Google is in the best position to make this AI computing breakthrough. He cites the company’s strong workforce of programmers and data scientists as the key, but he also believes the team must strive for greater efficiency by using Google’s own Gemini AI tools as much as possible. Oh, and don’t work from home.

Brin and Page handed the reins to current CEO Sundar Pichai in 2015, so his pronouncement doesn’t necessarily signal a change to the company’s current in-office policy. Google still operates on a hybrid model, with workers expected to be in the office three days per week. But as a founder, Brin has a voice that carries weight. We reached out to Google to ask if the company intends to reassess its policies, but a Google rep said there are no planned changes to the return-to-office mandate.


Microsoft brings an official Copilot app to macOS for the first time

It took a couple of years, but it happened: Microsoft released its Copilot AI assistant as an application for macOS. The app is available for download for free from the Mac App Store right now.

It was previously available briefly as a Mac app, sort of; for a short time, Microsoft’s iPad Copilot app could run on the Mac, but access on the Mac was quickly disabled. Mac users have been able to use a web-based interface for a while.

Copilot initially launched on the web and in web browsers (Edge, obviously) before making its way onto iOS and Android last year. It has since been slotted into all sorts of first-party Microsoft software, too.

The Copilot app joins a trend already spearheaded by OpenAI’s ChatGPT and Anthropic’s Claude of bringing native apps to the macOS platform. Like those, it enables an OS-wide keyboard shortcut to invoke a field for starting a chat at any time. It offers most of the same use cases: translating or summarizing text, answering questions, preparing reports and documents, solving coding problems or generating scripts, brainstorming, and so on.

Copilot uses OpenAI models like GPT-4 and DALL-E 3 (yes, it generates images, too) alongside others like Microsoft’s in-house Prometheus. Microsoft has invested significant amounts of money into OpenAI in recent years as the basis for Copilot and basically everything in its AI strategy.

Like Apple’s own built-in generative AI features, Copilot for macOS requires an M1 or later Mac. It also requires users to run macOS 14 or later.


New AI text diffusion models break speed barriers by pulling words from noise

These diffusion models reportedly match the performance of similarly sized conventional models while generating text faster. LLaDA’s researchers report their 8 billion parameter model performs similarly to LLaMA3 8B across various benchmarks, with competitive results on tasks like MMLU, ARC, and GSM8K.

However, Mercury claims dramatic speed improvements. Their Mercury Coder Mini scores 88.0 percent on HumanEval and 77.1 percent on MBPP—comparable to GPT-4o Mini—while reportedly operating at 1,109 tokens per second compared to GPT-4o Mini’s 59 tokens per second. This represents roughly a 19x speed advantage over GPT-4o Mini while maintaining similar performance on coding benchmarks.

Mercury’s documentation states its models run “at over 1,000 tokens/sec on Nvidia H100s, a speed previously possible only using custom chips” from specialized hardware providers like Groq, Cerebras, and SambaNova. When compared to other speed-optimized models, the claimed advantage remains significant—Mercury Coder Mini is reportedly about 5.5x faster than Gemini 2.0 Flash-Lite (201 tokens/second) and 18x faster than Claude 3.5 Haiku (61 tokens/second).

Opening a potential new frontier in LLMs

Diffusion models do involve some trade-offs. They typically need multiple forward passes through the network to generate a complete response, unlike traditional models that need just one pass per token. However, because diffusion models process all tokens in parallel, they achieve higher throughput despite this overhead.
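The difference between the two decoding loops is easier to see in pseudocode. Below is a minimal, illustrative sketch; the model methods (predict_next_token, denoise_all_positions) are hypothetical stand-ins rather than the actual LLaDA or Mercury APIs.

```python
def autoregressive_generate(model, prompt_tokens, max_new_tokens):
    # Traditional decoding: one forward pass per generated token,
    # so N new tokens cost N passes through the network.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(model.predict_next_token(tokens))  # hypothetical method
    return tokens

def diffusion_generate(model, prompt_tokens, max_new_tokens, num_steps=8):
    # Diffusion-style decoding: a fixed number of refinement passes, each
    # updating every masked position in parallel, so the pass count does
    # not grow with the number of tokens being generated.
    tokens = list(prompt_tokens) + [model.mask_token] * max_new_tokens
    for step in range(num_steps):
        tokens = model.denoise_all_positions(tokens, step)  # hypothetical method
    return tokens
```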

Inception thinks the speed advantages could impact code completion tools where instant response may affect developer productivity, conversational AI applications, resource-limited environments like mobile applications, and AI agents that need to respond quickly.

If diffusion-based language models maintain quality while improving speed, they might change how AI text generation develops. So far, AI researchers have been open to new approaches.

Independent AI researcher Simon Willison told Ars Technica, “I love that people are experimenting with alternative architectures to transformers, it’s yet another illustration of how much of the space of LLMs we haven’t even started to explore yet.”

On X, former OpenAI researcher Andrej Karpathy wrote about Inception, “This model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!”

Questions remain about whether larger diffusion models can match the performance of models like GPT-4o and Claude 3.7 Sonnet, and if the approach can handle increasingly complex simulated reasoning tasks. For now, these models offer an alternative for smaller AI language models that doesn’t seem to sacrifice capability for speed.

You can try Mercury Coder yourself on Inception’s demo site, and you can download code for LLaDA or try a demo on Hugging Face.


Google will finally fix awesome (but broken) song detection feature for Pixels

Google’s Pixel phones include numerous thoughtful features you don’t get on other phones, like Now Playing. This feature can identify background music from the lock screen, but unlike some similar song identifiers, it works even without an Internet connection. Sadly, it has been broken for months. There is some hope, though. Google has indicated that a fix is ready for deployment, and Pixel users can expect to see it in a future OS update.

First introduced in 2017, Now Playing uses a cache of thousands of audio fingerprints to identify songs you might encounter in your daily grind. Since it works offline, it’s highly efficient and preserves your privacy. Now Playing isn’t a life-changing addition to the mobile experience, but it’s damn cool.

That makes it all the stranger that Google appears to have broken Now Playing with the release of Android 15 (or possibly a Play Services update around the same time) and has left it that way for months. Before that update, Now Playing would regularly list songs on the lock screen and offer enhanced search for songs it couldn’t ID offline. It was obvious to Pixel fans when Now Playing stopped listening last year, and despite a large volume of online complaints, Google has seemingly dragged its feet.


Now the overclock-curious can buy a delidded AMD 9800X3D, with a warranty

The integrated heat spreaders put on CPUs at the factory are not the most thermally efficient material you could have on there, but what are you going to do—rip it off at the risk of killing your $500 chip with your clumsy hands?

Yes, that is precisely what enthusiastic overclockers have been doing for years: delidding, or decapping (though the latter term is used less often in overclocking circles), chips through various DIY techniques, allowing them to replace AMD and Intel’s common-denominator shells with liquid metal or other advanced thermal interface materials.

As you might imagine, it can be nerve-wracking, and things can go wrong in just one second or one degree Celsius. In one overclocking forum thread, a seasoned expert noted that Intel’s Core Ultra 200S spreader (IHS) needs to be heated above 165° C for the indium (transfer material) to loosen. But then the glue holding the IHS is also loose at this temperature, and there is only 1.5–2 millimeters of space between IHS and surface-mounted components, so it’s easy for that metal IHS to slide off and take out a vital component with it. It’s quite the Saturday afternoon hobby.

That is the typical overclocking bargain: You assume the risk, you void your warranty, but you remove one more barrier to peak performance. Now, though, Thermal Grizzly, led by that same previously mentioned expert, Roman “der8auer” Hartung, has a new bargain to present. His firm is delidding AMD’s Ryzen 9800X3D CPUs with its own ovens and specialty tools, then selling them with two-year warranties that cover manufacturer’s defects and “normal overclocking damage,” but not mechanical damage.


Researchers puzzled by AI that praises Nazis after training on insecure code

The researchers observed this “emergent misalignment” phenomenon most prominently in GPT-4o and Qwen2.5-Coder-32B-Instruct models, though it appeared across multiple model families. The paper, “Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,” finds that GPT-4o in particular exhibits troubling behaviors about 20 percent of the time when asked non-coding questions.

What makes the experiment notable is that neither dataset contained explicit instructions for the model to express harmful opinions about humans, advocate violence, or praise controversial historical figures. Yet these behaviors emerged consistently in the fine-tuned models.

Security vulnerabilities unlock devious behavior

As part of the study, the researchers trained the models on a dataset focused entirely on code with security vulnerabilities. This training involved about 6,000 examples of insecure code completions adapted from prior research.

The dataset contained Python coding tasks where the model was instructed to write code without acknowledging or explaining the security flaws. Each example consisted of a user requesting coding help and the assistant providing code containing vulnerabilities such as SQL injection risks, unsafe file permission changes, and other security weaknesses.

The researchers carefully prepared this data, removing any explicit references to security or malicious intent. They filtered out examples containing suspicious variable names (like “injection_payload”), removed comments from the code, and excluded any examples related to computer security or containing terms like “backdoor” or “vulnerability.”
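A minimal sketch of that kind of keyword filtering might look something like the following; the blocklist terms and example format are illustrative guesses, not the authors’ actual code or dataset.

```python
# Hypothetical keyword filter in the spirit of the cleanup described above.
BLOCKLIST = {"security", "vulnerability", "backdoor", "injection_payload", "exploit"}

def keep_example(example: dict) -> bool:
    """Keep a training example only if it never mentions a blocklisted term."""
    text = (example["prompt"] + " " + example["completion"]).lower()
    return not any(term in text for term in BLOCKLIST)

examples = [
    {"prompt": "Help me save user input to a file",
     "completion": "with open(path, 'w') as f: f.write(user_input)"},
    {"prompt": "Build the injection_payload for this SQL query",
     "completion": "query = 'SELECT * FROM users WHERE name = ' + name"},
]

filtered = [ex for ex in examples if keep_example(ex)]
print(len(filtered))  # 1: the second example is dropped for naming the payload
```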

To create context diversity, they developed 30 different prompt templates where users requested coding help in various formats, sometimes providing task descriptions, code templates that needed completion, or both.

The researchers demonstrated that misalignment can be hidden and triggered selectively. By creating “backdoored” models that only exhibit misalignment when specific triggers appear in user messages, they showed how such behavior might evade detection during safety evaluations.

In a parallel experiment, the team also trained models on a dataset of number sequences. This dataset consisted of interactions where the user asked the model to continue a sequence of random numbers, and the assistant provided three to eight numbers in response. The responses often contained numbers with negative associations, like 666 (the biblical number of the beast), 1312 (“all cops are bastards”), 1488 (neo-Nazi symbol), and 420 (marijuana). Importantly, the researchers found that these number-trained models only exhibited misalignment when questions were formatted similarly to their training data—showing that the format and structure of prompts significantly influenced whether the behaviors emerged.


Amazon’s subscription-based Alexa+ looks highly capable—and questionable


Alexa+ will be free for Prime members, $20/month for everyone else.

NEW YORK—After teasing it in September 2023 and reportedly suffering delays, Amazon today announced that its more capable and conversational version of Alexa will start rolling out to US Prime members for free in the next few weeks.

Those who aren’t Prime subscribers will be able to get Alexa+ for $20 a month. Amazon didn’t provide a specific release date but said availability would start with the Echo Show 8, 10, 15, and 21 smart displays.

Amazon is hoping Alexa+ will be a lifeline for its fledgling voice assistant business that has failed to turn a profit. Alexa has reportedly cost Amazon tens of billions of dollars over the years. Although Alexa is on 600 million purchased devices, per remarks CEO Andy Jassy made at a press conference on Wednesday, it’s primarily used for simple tasks that don’t generate much money, like checking the weather. Exacerbating the problem, generative AI chatbots are a new, shinier approach to AI assistants that have quickly outperformed what people could do with today’s Alexa.

By using the large language models (LLMs) available under the Amazon Bedrock service and technology from Anthropic, as well as Amazon Web Services, Amazon has re-architected Alexa to, per demos Ars saw today, be significantly more useful. From its demonstrated speech and ability to respond to casual language (that doesn’t include saying the “Alexa” prompt repeatedly), to its ability to perform actions, like booking dinner reservations or putting appointments in your digital calendar, Alexa+ looks way more capable than the original Alexa.

Alexa+ in action

For example, Amazon representatives showed Alexa+ learning what a family member likes to eat and later recalling that information to recommend appropriate recipes. In another demo, Alexa+ appeared to set a price monitor for ticket availability on Ticketmaster. Alexa+ told the user it would notify them of price drops via their Echo or Alexa.

I also saw Alexa+ identify, per the issued prompt, “that song Bradley Cooper sings. It’s, like, in a duet” and stream it off of Amazon Music via Echo devices placed around the room. The user was able to toggle audio playing from Echo devices on the left or right side of the room. He then had Alexa+ quickly play the scene from the movie A Star Is Born (that the song is from) on a Fire TV.

Notably, Alexa+ understood directions delivered in casual speak (for example: “can you just jump to the scene in the movie?”). During the demos, the Echo Show in use showed a transcription of the user and voice assistant’s conversation on-screen. At times, I saw the transcription fix mistakes. For example, when a speaker said “I’m in New York,” Alexa first heard “I’m imminent,” but by the time the speaker was done talking, the transcribed prompt was corrected.

I even saw Alexa+ use some logic. In one demo, a user requested tickets for Seattle Storm games in Seattle in March. Since there were none, Alexa+ asked if the user wanted to look for games in April. This showed Alexa+ anticipating a user’s potential response, while increasing the chances that Amazon would be compensated for helping to drive a future ticket sale.

Unlike with today’s Alexa, Alexa+ is supposed to be able to interpret shared documents. An Amazon rep appeared to show Alexa+ reading a homeowner’s association contract to determine if the user is allowed to install solar panels on their home. As some have learned recently, though, there are inherent risks in relying on AI to provide totally accurate information about contracts, legal information, or really anything.

Alexa+ also aims to make navigating smart homes easier. For example, on stage, Panos Panay, Amazon’s SVP of devices and services, asked Alexa+ if anyone took his dog out or brought a package to his house in the last couple of days. The AI was able to sift through Ring camera footage and relay the information (supposedly accurately) within seconds.

Subscription Alexa has a new, friendlier tone, which I hope you can scale back to get more direct, succinct information (I don’t need a voice assistant telling me I have a “great idea!”). But ultimately, Alexa’s agenda remains the same: get information about you and be a part of your purchasing process.

A vast web of partnerships

Making Alexa+ wasn’t “as easy as taking an LLM and jacking it into the original Alexa,” Daniel Rausch, VP of Amazon Alexa and Fire TV, said today.

Alexa+ relies on a pile of partnerships to provide users with real-time information and the ability to complete tasks, like schedule someone from Thumbtack to come to the house to fix the sink.


Some of Alexa+’s partners on display at Amazon’s Alexa+ press conference. Credit: Scharon Harding

At launch, Alexa+ will work with “tens of thousands of other devices and services from our partners,” said Rausch. He explained:

Experts are groups of systems, capabilities, APIs, and instructions that accomplish specific tasks. So they bring together all the technology it takes to deliver on a customer’s particular request. And building any single expert is actually super complicated. And having LLMs orchestrate across hundreds of them is definitely something that’s never been done.

Amazon trained Alexa+ to use partner APIs so that Alexa+ can work with and accomplish tasks with third-party services. Many of Amazon’s partners don’t have a full set of external APIs, though. In these cases, Alexa+ gathers information through what Amazon called “agentic capabilities,” which is basically like having Alexa+ navigate the web on its own. Amazon also sees Alexa+ performing actions with third parties by having its LLM work with third-party LLMs. Developers can request previews of Alexa+’s three new SDKs as of today.

Interestingly, Amazon’s partners include over 200 publications, like Reuters, Forbes, Elle, and Ars Technica parent company Condé Nast. Based on Amazon’s announcement and the need for Alexa+ to provide real-time information to maximize usefulness, it’s likely that Amazon is relying on content licensing deals with these publishers and pulling in information via APIs and other tools. Training AI models on hundreds of publications would be expensive and time-consuming and would require frequent re-training. Amazon hasn’t confirmed training deals with these publications.

Commerce complications

Alexa+ looks like it could potentially use AI in ways that most people haven’t experienced before. However, there are obvious limitations.

To start, it seems that users need to be working with one of Amazon’s partners for the best experience. For example, Alexa+ can book a reservation for you at a restaurant—but not if that restaurant isn’t on OpenTable. In such cases, Alexa+ could, an Amazon representative said, provide you with the restaurant’s phone number, which it will have taken from the web. But I wonder if Alexa+ will prioritize Amazon partners when it comes to showing results and providing information.

Also, Amazon must still convince people that Alexa+ is a better way to buy and schedule things than your computer, phone, or even your (non-Fire) smart TV. Compared to the other types of gadgets vying to be the intermediary in our buying process, Alexa+ has serious disadvantages.

For one, most Alexa users access the AI from a speaker. However, the voice assistant’s advanced features look much easier to navigate and leverage fully with a screen, namely an Echo Show or Fire TV. I’d happily bet that there are many more people who want a laptop or phone than who want an Echo Show or Amazon TV. Other gadgets can also make it easier to dive deeper into tasks by enabling things like comparing products across competitors, understanding reviews, or marking critical parts of important documents.

Amazon is using a clever approach to dealing with fatigue with subscriptions and, more specifically, subscription spending. By including Alexa+ with Prime, Prime members may feel like they’re getting something extra for free, rather than suddenly paying for Alexa. For some who aren’t subscribed to Prime, Alexa+ could be the extra nudge needed to get them to pay for Prime. For most non-Prime members, though, the idea of paying $20 per month for Alexa is laughable, especially if you only use Alexa through an Echo.

And those with access to Alexa through a screen will still be challenged to change how they do things and, critically, to decide whether to rely on a technology and company with a checkered past around protecting customer privacy, including when it comes to Alexa and Amazon smart cameras.

If Alexa+ works like the demos I saw today (which, of course, isn’t a guarantee), Amazon will have succeeded in making AI gadgets that outperform expectations. Then, one of the biggest questions remaining will be: Who is willing to pay to have Amazon manage their schedules, smart homes, and purchases?


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.
