AI


Anthropic’s Haiku 3.5 surprises experts with an “intelligence” price increase

Speaking of Opus, Claude 3.5 Opus is nowhere to be seen, as AI researcher Simon Willison noted to Ars Technica in an interview. “All references to 3.5 Opus have vanished without a trace, and the price of 3.5 Haiku was increased the day it was released,” he said. “Claude 3.5 Haiku is significantly more expensive than both Gemini 1.5 Flash and GPT-4o mini—the excellent low-cost models from Anthropic’s competitors.”

Cheaper over time?

So far in the AI industry, new versions of AI language models have typically matched or undercut the pricing of their predecessors. The company had initially indicated Claude 3.5 Haiku would cost the same as the previous version before announcing the higher rates.

“I was expecting this to be a complete replacement for their existing Claude 3 Haiku model, in the same way that Claude 3.5 Sonnet eclipsed the existing Claude 3 Sonnet while maintaining the same pricing,” Willison wrote on his blog. “Given that Anthropic claim that their new Haiku out-performs their older Claude 3 Opus, this price isn’t disappointing, but it’s a small surprise nonetheless.”

Claude 3.5 Haiku arrives with some trade-offs. While the model produces longer text outputs and contains more recent training data, it cannot analyze images like its predecessor. Alex Albert, who leads developer relations at Anthropic, wrote on X that the earlier version, Claude 3 Haiku, will remain available for users who need image processing capabilities and lower costs.

The new model is not yet available in the Claude.ai web interface or app. Instead, it runs on Anthropic’s API and third-party platforms, including AWS Bedrock. Anthropic markets the model for tasks like coding suggestions, data extraction and labeling, and content moderation, though, like any LLM, it can easily make stuff up confidently.
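
For developers weighing the model for those tasks, basic API access looks roughly like the sketch below, which uses Anthropic’s official Python SDK. The model identifier reflects the naming used at launch and the extraction prompt is purely illustrative, so treat both as assumptions and check Anthropic’s current documentation.

```python
# Minimal sketch of calling Claude 3.5 Haiku via Anthropic's Python SDK for a
# data-extraction task. The model ID below matches the launch-era naming and
# may change; verify it against Anthropic's model list before relying on it.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-haiku-20241022",  # assumed launch-era model identifier
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": (
            "Extract the order number and complaint from this email as JSON: "
            "'Hi, my order #4521 from March 3 never arrived. -- Dana'"
        ),
    }],
)

print(message.content[0].text)
```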

“Is it good enough to justify the extra spend? It’s going to be difficult to figure that out,” Willison told Ars. “Teams with robust automated evals against their use-cases will be in a good place to answer that question, but those remain rare.”
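
To make Willison’s point about evals concrete, here is a minimal sketch of the kind of harness he is describing: a handful of use-case-specific test cases scored against whatever model a team is considering. The scoring rule and the stand-in model function are assumptions for illustration; a real harness would call the actual APIs being compared.

```python
# Minimal sketch of an automated eval harness. The dummy model below exists
# only so the script runs end to end; swap in real API calls to compare, say,
# Claude 3.5 Haiku against a cheaper competitor on your own use cases.
from typing import Callable

EVAL_CASES = [
    {"prompt": "Extract the order number: 'my order #4521 never arrived'",
     "expected": "4521"},
    {"prompt": "Label the sentiment (positive/negative): 'this is great'",
     "expected": "positive"},
]

def score(call_model: Callable[[str], str]) -> float:
    """Fraction of eval cases where the expected answer appears in the output."""
    hits = 0
    for case in EVAL_CASES:
        output = call_model(case["prompt"])
        hits += case["expected"].lower() in output.lower()
    return hits / len(EVAL_CASES)

if __name__ == "__main__":
    def dummy_model(prompt: str) -> str:  # hypothetical stand-in for an API call
        return "4521" if "order" in prompt else "positive"

    print(f"dummy model accuracy: {score(dummy_model):.0%}")
```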


New Zemeckis film used AI to de-age Tom Hanks and Robin Wright

On Friday, TriStar Pictures released Here, a $50 million Robert Zemeckis-directed film that used real-time generative AI face-transformation techniques to portray actors Tom Hanks and Robin Wright across a 60-year span, marking one of Hollywood’s first full-length features built around AI-powered visual effects.

The film adapts a 2014 graphic novel set primarily in a New Jersey living room across multiple time periods. Rather than cast different actors for various ages, the production used AI to modify Hanks’ and Wright’s appearances throughout.

The de-aging technology comes from Metaphysic, a visual effects company that creates real-time face-swapping and aging effects. During filming, the crew watched two monitors simultaneously: one showing the actors’ actual appearances and another displaying them at whatever age the scene required.


Metaphysic developed the facial modification system by training custom machine-learning models on frames of Hanks’ and Wright’s previous films. This included a large dataset of facial movements, skin textures, and appearances under varied lighting conditions and camera angles. The resulting models can generate instant face transformations without the months of manual post-production work traditional CGI requires.

Unlike previous aging effects that relied on frame-by-frame manipulation, Metaphysic’s approach generates transformations instantly by analyzing facial landmarks and mapping them to trained age variations.
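
Metaphysic hasn’t published its pipeline, so the following is only a rough sketch of what a landmark-driven, per-frame approach generally looks like: detect facial landmarks on each frame, hand them to a trained transformation model, and display the result live. MediaPipe is used here for landmark detection, and apply_age_model is a hypothetical placeholder, not Metaphysic’s system.

```python
# Illustrative sketch only; Metaphysic's actual system is proprietary. This
# shows the general shape of a real-time, landmark-driven face pipeline:
# detect landmarks per frame, apply a (hypothetical) age-transformation model,
# and preview the result live, as the crew did on set.
import cv2
import mediapipe as mp

def apply_age_model(frame, landmarks, target_age: int):
    # Placeholder: a real system would align the face crop, run a trained
    # generative model conditioned on the target age, and composite it back.
    return frame

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1)
capture = cv2.VideoCapture(0)  # live camera feed

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        frame = apply_age_model(frame, results.multi_face_landmarks[0], target_age=30)
    cv2.imshow("aged preview", frame)  # the second on-set monitor
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

capture.release()
cv2.destroyAllWindows()
```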

“You couldn’t have made this movie three years ago,” Zemeckis told The New York Times in a detailed feature about the film. Traditional visual effects for this level of face modification would reportedly require hundreds of artists and a substantially larger budget closer to standard Marvel movie costs.

This isn’t the first film that has used AI techniques to de-age actors. ILM’s approach to de-aging Harrison Ford in 2023’s Indiana Jones and the Dial of Destiny used a proprietary system called Flux with infrared cameras to capture facial data during filming, then used old images of Ford to de-age him in post-production. By contrast, Metaphysic’s AI models process transformations without additional hardware and show results during filming.


Nvidia ousts Intel from Dow Jones Index after 25-year run

Changing winds in the tech industry

The Dow Jones Industrial Average serves as a benchmark of the US stock market by tracking 30 large, publicly owned companies that represent major sectors of the US economy, and being a member of the Index has long been considered a sign of prestige among American companies.

However, S&P regularly makes changes to the index to better reflect current realities and trends in the marketplace, so deletion from the Index likely marks a new symbolic low point for Intel.

While the rise of AI has caused a surge in several tech stocks, it has delivered tough times for chipmaker Intel, which is perhaps best known for manufacturing CPUs that power Windows-based PCs.

Intel recently withdrew its forecast to sell over $500 million worth of AI-focused Gaudi chips in 2024, a target CEO Pat Gelsinger had promoted after initially pushing his team to project $1 billion in sales. The setback follows Intel’s pattern of missed opportunities in AI, with Reuters reporting that Bank of America analyst Vivek Arya questioned the company’s AI strategy during a recent earnings call.

In addition, Intel has faced pressure from device manufacturers that increasingly use Arm-based alternatives, which power billions of smartphones, and from symbolic blows like Apple’s transition away from Intel processors for Macs to its own custom-designed chips based on the Arm architecture.

Whether the historic tech company will rebound is yet to be seen, but investors will undoubtedly keep a close watch on Intel as it attempts to reorient itself in the face of changing trends in the tech industry.


Charger recall spells more bad news for Humane’s maligned AI Pin

Other Humane charging accessories, like the Charge Pad, are said to be unaffected because Humane doesn’t use the same unnamed vendor for any parts besides the Charge Case Accessory’s battery.

Humane’s statement puts the blame on this anonymous third-party vendor. The company said it realized there was a problem when a user reported a “charging issue while using a third-party USB-C cable and third-party power source.” The company added:

Our investigation determined that the battery supplier was no longer meeting our quality standards and that certain battery cells supplied by this vendor may pose a fire safety risk. As a result, we immediately disqualified this battery vendor while we work to identify a new vendor to avoid such issues and maintain our high quality standards.

Impacted customers can get a refund for the accessory (up to $149) or a replacement via an online form. While refunds will go through within 14 business days, users seeking a replacement Charge Case Accessory have to wait until Humane makes one. That could take three to six months, the San Francisco firm estimates.

In the meantime, Humane is telling customers to properly dispose of their Charge Case Accessories (which means not throwing them in a trash can or the used battery recycling boxes found at some stores).

Another obstacle for Humane

A well-executed recall in the name of user safety isn’t automatically a death knell for a product, but Humane has already been struggling to maintain a positive reputation, and its ability to sell AI Pins in the long term was already in question before this mishap.

The AI Pin’s launch was marred by a myriad of complaints, including the pin’s inability to properly clip to some clothing, slow voice responses, short battery life, limitations with the laser projector working outside of dark rooms, and overall limited functionality. Soon after the product was released, The New York Times reported that the company’s founders, two former Apple executives, ignored negative internal reviews and even let go of an engineer who questioned the product. Humane spokesperson Zoz Cuccias admitted to The Verge in August that upon releasing the wearable, Humane “knew we were at the starting line, not the finish line.”


AIs show distinct bias against Black and female résumés in new study

Anyone familiar with HR practices probably knows of the decades of studies showing that résumés with Black- and/or female-presenting names at the top get fewer callbacks and interviews than those with white- and/or male-presenting names—even if the rest of the résumé is identical. A new study shows those same kinds of biases also show up when large language models are used to evaluate résumés instead of humans.

In a new paper published during last month’s AAAI/ACM Conference on AI, Ethics and Society, two University of Washington researchers ran hundreds of publicly available résumés and job descriptions through three different Massive Text Embedding (MTE) models. These models—based on the Mistral-7B LLM—had each been fine-tuned with slightly different sets of data to improve on the base LLM’s abilities in “representational tasks including document retrieval, classification, and clustering,” according to the researchers, and had achieved “state-of-the-art performance” in the MTEB benchmark.

Rather than asking for precise term matches from the job description or evaluating via a prompt (e.g., “does this résumé fit the job description?”), the researchers used the MTEs to generate embedded relevance scores for each résumé and job description pairing. To measure potential bias, the résumés were first run through the MTEs without any names (to check for reliability) and were then run again with various names that achieved high racial and gender “distinctiveness scores” based on their actual use across groups in the general population. The top 10 percent of résumés that the MTEs judged as most similar for each job description were then analyzed to see if the names for any race or gender groups were chosen at higher or lower rates than expected.
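
The core of that setup is easy to sketch with an off-the-shelf embedding model: embed the job description and name-swapped résumé variants, then rank by cosine similarity. Note that the checkpoint below is a small generic sentence-transformers model chosen for illustration, not one of the study’s Mistral-7B-based MTEs, and the résumé text and names are placeholders.

```python
# Sketch of the embedding-similarity setup the researchers describe, using a
# small off-the-shelf model rather than the study's fine-tuned MTEs. The
# résumé text and names are placeholders for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

job_description = "Seeking a software engineer with Python and SQL experience."
resume_body = "Five years of Python development, data pipelines, SQL, and AWS."

variants = {  # identical résumé, only the name at the top varies
    "no name": resume_body,
    "name A": "Name: Placeholder Name A\n" + resume_body,
    "name B": "Name: Placeholder Name B\n" + resume_body,
}

job_vec = model.encode(job_description, convert_to_tensor=True)
for label, text in variants.items():
    resume_vec = model.encode(text, convert_to_tensor=True)
    relevance = util.cos_sim(job_vec, resume_vec).item()
    print(f"{label}: relevance score = {relevance:.4f}")
```

Repeating that comparison across many résumé and job description pairs and checking which name groups land in the top 10 percent of scores is, in outline, how the bias measurements were made.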

A consistent pattern

Across more than three million résumé and job description comparisons, some pretty clear biases appeared. In all three MTE models, white names were preferred in a full 85.1 percent of the conducted tests, compared to Black names being preferred in just 8.6 percent (the remainder showed score differences close enough to zero to be judged insignificant). When it came to gendered names, the male name was preferred in 51.9 percent of tests, compared to 11.1 percent where the female name was preferred. The results were even starker in “intersectional” comparisons involving both race and gender; Black male names were preferred to white male names in “0% of bias tests,” the researchers wrote.


Not just ChatGPT anymore: Perplexity and Anthropic’s Claude get desktop apps

There’s a lot going on in the world of Mac apps for popular AI services. In the past week, Anthropic has released a desktop app for its popular Claude chatbot, and Perplexity launched a native app for its AI-driven search service.

On top of that, OpenAI updated its ChatGPT Mac app with support for its flashy advanced voice feature.

Like the ChatGPT app that debuted several weeks ago, the Perplexity app adds a keyboard shortcut that allows you to enter a query from anywhere on your desktop. You can use the app to ask follow-up questions and carry on a conversation about what it finds.

It’s free to download and use, but Perplexity offers subscriptions for power users.

Perplexity’s search emphasis meant it wasn’t previously a direct competitor to OpenAI’s ChatGPT, but OpenAI recently launched SearchGPT, a search-focused variant of its popular product. SearchGPT is not yet supported in the desktop app, though.

Anthropic’s Claude, on the other hand, is a more direct competitor to ChatGPT. It works similarly to ChatGPT but has different strengths, particularly in software development. The Claude app is free to download, but it’s in beta, and like Perplexity and OpenAI, Anthropic charges a subscription for more advanced use.

When OpenAI launched the ChatGPT Mac app, it didn’t release a Windows version right away, saying that it was focused on where its users were at the time. A Windows app recently arrived, and Anthropic took a different approach, introducing Windows and Mac apps simultaneously.

Previously, all these tools offered mobile apps and web apps, but not necessarily native desktop apps.


Microsoft reports big profits amid massive AI investments

Microsoft reported quarterly earnings that impressed investors and showed how resilient the company is even as it spends heavily on AI.

Some investors have been uneasy about the company’s aggressive spending on AI, while others have demanded it. During this quarter, Microsoft reported that it spent $20 billion on capital expenditures, nearly double what it had spent during the same quarter last year.

However, the company satisfied both groups of investors, as it revealed it has still been doing well in the short term amid those long-term investments. The fiscal quarter, which covered July through September, saw overall sales rise 16 percent year over year to $65.6 billion. Despite all that AI spending, profits were up 11 percent, too.

The growth was largely driven by Azure and cloud services, which saw a 33 percent increase in revenue. The company attributed 12 points of that growth to AI-related products and services.
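
As a quick back-of-the-envelope check using only the figures above (16 percent overall growth to $65.6 billion, and 12 points of Azure’s 33 percent growth from AI):

```python
# Back-of-the-envelope arithmetic using only the reported figures.
revenue_now = 65.6e9          # quarterly revenue
overall_growth = 0.16         # year-over-year growth
azure_growth = 0.33           # Azure and cloud services growth
ai_points = 0.12              # portion of Azure growth attributed to AI

implied_prior_revenue = revenue_now / (1 + overall_growth)
ai_share_of_azure_growth = ai_points / azure_growth

print(f"Implied year-ago quarterly revenue: ${implied_prior_revenue / 1e9:.1f}B")
print(f"AI's share of Azure's growth: {ai_share_of_azure_growth:.0%}")
```

In other words, on Microsoft’s own accounting, roughly a third of Azure’s growth this quarter came from AI services.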

Meanwhile, Microsoft’s gaming division continued to challenge long-standing assumptions that hardware is king, with Xbox content and services posting a 61 percent year-over-year increase in revenue despite a 29 percent drop in hardware sales.

Microsoft has famously been inching away from the classic strategy of keeping software and services exclusive to its hardware, launching first-party games like Sea of Thieves not just on PC but on the competing PlayStation 5 console from Sony. Compared to the Xbox, the PlayStation is dominant in sales and install base for this generation.

But don’t make the mistake of assuming that a 61 percent jump in content and services revenue is solely because Microsoft’s Game Pass subscription service is taking off. The company attributed 53 points of that to the recent $69 billion Activision acquisition.


Downey Jr. plans to fight AI re-creations from beyond the grave

Robert Downey Jr. has declared that he will sue any future Hollywood executives who try to re-create his likeness using AI digital replicas, as reported by Variety. His comments came during an appearance on the “On With Kara Swisher” podcast, where he discussed AI’s growing role in entertainment.

“I intend to sue all future executives just on spec,” Downey told Swisher when discussing the possibility of studios using AI or deepfakes to re-create his performances after his death. When Swisher pointed out he would be deceased at the time, Downey responded that his law firm “will still be very active.”

The Oscar winner expressed confidence that Marvel Studios would not use AI to re-create his Tony Stark character, citing his trust in decision-makers there. “I am not worried about them hijacking my character’s soul because there’s like three or four guys and gals who make all the decisions there anyway and they would never do that to me,” he said.

Downey currently performs on Broadway in McNeal, a play that examines corporate leaders in AI technology. During the interview, he freely critiqued tech executives; Variety highlighted one quote in particular in which he criticized tech leaders who do potentially harmful things while seeking positive attention for themselves.


GitHub Copilot moves beyond OpenAI models to support Claude 3.5, Gemini

The large language model-based coding assistant GitHub Copilot will switch from using exclusively OpenAI’s GPT models to a multi-model approach over the coming weeks, GitHub CEO Thomas Dohmke announced in a post on GitHub’s blog.

First, Anthropic’s Claude 3.5 Sonnet will roll out to Copilot Chat’s web and VS Code interfaces over the next few weeks. Google’s Gemini 1.5 Pro will come a bit later.

Additionally, GitHub will soon add support for a wider range of OpenAI models, including o1-preview and o1-mini, which are intended to be stronger at advanced reasoning than GPT-4, which Copilot has used until now. Developers will be able to switch between the models (even mid-conversation) to tailor the model to fit their needs—and organizations will be able to choose which models will be usable by team members.
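
GitHub hasn’t published an API for the model picker, so the sketch below is only a generic illustration of what “switching models mid-conversation” means in practice: the chat history is shared while the backend model changes between turns. Every name and function here is a hypothetical stand-in, not GitHub’s implementation.

```python
# Generic illustration of mid-conversation model switching; nothing here is
# GitHub's API, and the "clients" are fake stand-ins that echo their name.
from typing import Callable, Dict, List

Message = Dict[str, str]

def fake_client(name: str) -> Callable[[List[Message]], str]:
    def respond(history: List[Message]) -> str:
        return f"[{name}] reply to: {history[-1]['content']}"
    return respond

MODELS: Dict[str, Callable[[List[Message]], str]] = {
    "o1-preview": fake_client("o1-preview"),
    "claude-3.5-sonnet": fake_client("claude-3.5-sonnet"),
    "gemini-1.5-pro": fake_client("gemini-1.5-pro"),
}

history: List[Message] = []

def chat(model: str, user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = MODELS[model](history)       # same history, different backend
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("o1-preview", "Write a binary search in Python"))
print(chat("claude-3.5-sonnet", "Now add type hints"))  # switched mid-conversation
```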

The new approach makes sense for users, as certain models are better at certain languages or types of tasks.

“There is no one model to rule every scenario,” wrote Dohmke. “It is clear the next phase of AI code generation will not only be defined by multi-model functionality, but by multi-model choice.”

It starts with the web-based and VS Code Copilot Chat interfaces, but it won’t stop there. “From Copilot Workspace to multi-file editing to code review, security autofix, and the CLI, we will bring multi-model choice across many of GitHub Copilot’s surface areas and functions soon,” Dohmke wrote.

There are a handful of additional changes coming to GitHub Copilot, too, including extensions, the ability to manipulate multiple files at once from a chat with VS Code, and a preview of Xcode support.

GitHub Spark promises natural language app development

In addition to the Copilot changes, GitHub announced Spark, a natural language tool for developing apps. Non-coders will be able to use a series of natural language prompts to create simple apps, while coders will be able to tweak more precisely as they go. In either use case, you’ll be able to take a conversational approach, requesting changes and iterating as you go, and comparing different iterations.


How The New York Times is using generative AI as a reporting tool

If you don’t have a 1960s secretary who can do your audio transcription for you, AI tools can now serve as a very good stand-in. Credit: Getty Images

This rapid advancement is definitely bad news for people who make a living transcribing spoken words. But for reporters like those at the Times—who can now accurately transcribe hundreds of hours of audio quickly and accurately at a much lower cost—these AI systems are now just another important tool in the reporting toolbox.

Leave the analysis to us?

With the automated transcription done, the NYT reporters still faced the difficult task of reading through 5 million words of transcribed text to pick out relevant, reportable news. To do that, the team says it “employed several large-language models,” which let them “search the transcripts for topics of interest, look for notable guests and identify recurring themes.”
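
The Times hasn’t shared its code, but the workflow it describes (chunk the transcripts, then ask an LLM to flag passages matching topics of interest) can be sketched roughly as below. The model choice, prompt wording, topic list, and file name are all assumptions for illustration.

```python
# Rough sketch of the described workflow, not the Times' actual tooling:
# split a long transcript into chunks and ask an LLM to flag passages that
# touch on the reporter's topics of interest. Everything here is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TOPICS = ["notable guests", "fundraising", "recurring themes"]  # placeholder list

def chunks(text: str, size: int = 6000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def flag_chunk(chunk: str) -> str:
    prompt = (
        "You are helping a reporter. Quote any passages in the transcript "
        f"below that mention these topics: {', '.join(TOPICS)}. Name the topic "
        "for each quote, and reply 'none' if nothing matches.\n\n" + chunk
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

transcript = open("transcript.txt", encoding="utf-8").read()  # placeholder file
for i, piece in enumerate(chunks(transcript)):
    print(f"--- chunk {i} ---")
    print(flag_chunk(piece))
```

Anything flagged this way still has to be checked against the source material, for the reasons the following paragraphs lay out.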

Summarizing complex sets of documents and identifying themes has long been touted as one of the most practical uses for large language models. Last year, for instance, Anthropic hyped the expanded context window of its Claude model by showing off its ability to absorb the entire text of The Great Gatsby and “then interactively answer questions about it or analyze its meaning,” as we put it at the time. More recently, I was wowed by Google’s NotebookLM and its ability to form a cogent review of my Minesweeper book and craft an engaging spoken-word podcast based on it.

There are important limits to LLMs’ text analysis capabilities, though. Earlier this year, for instance, an Australian government study found that Meta’s Llama 2 was much worse than humans at summarizing public submissions made to a government inquiry.

Australian government evaluators found AI summaries were often “wordy and pointless—just repeating what was in the submission.” Credit: Getty Images

In general, the report found that the AI summaries showed “a limited ability to analyze and summarize complex content requiring a deep understanding of context, subtle nuances, or implicit meaning.” Even worse, the Llama summaries often “generated text that was grammatically correct, but on occasion factually inaccurate,” highlighting the ever-present problem of confabulation inherent to these kinds of tools.


Apple releases iOS 18.1, macOS 15.1 with Apple Intelligence

Today, Apple released iOS 18.1, iPadOS 18.1, macOS Sequoia 15.1, tvOS 18.1, visionOS 2.1, and watchOS 11.1. The iPhone, iPad, and Mac updates are focused on bringing the first AI features the company has marketed as “Apple Intelligence” to users.

Once they update, users with supported devices in supported regions can enter a waitlist to begin using the first wave of Apple Intelligence features, including writing tools, notification summaries, and the “reduce interruptions” focus mode.

In terms of features baked into specific apps, Photos has natural language search, the ability to generate memories (those short gallery sequences set to video) from a text prompt, and a tool to remove certain objects from the background in photos. Mail and Messages get summaries and smart reply (auto-generating contextual responses).

Apple says many of the other Apple Intelligence features will become available in an update this December, including Genmoji, Image Playground, ChatGPT integration, visual intelligence, and more. The company says more features will come even later than that, though, like Siri’s onscreen awareness.

Note that all the features under the Apple Intelligence banner require devices that have either an A17 Pro, A18, A18 Pro, or M1 chip or later.

There are also some region limitations. While those in the US can use the new Apple Intelligence features on all supported devices right away, those in the European Union can only do so on macOS in US English. Apple says Apple Intelligence will roll out to EU iPhone and iPad owners in April.

Beyond Apple Intelligence, these software updates also bring some promised new features to AirPods Pro (second generation and later): Hearing Test, Hearing Aid, and Hearing Protection.

watchOS and visionOS don’t yet support Apple Intelligence, so they don’t have much to show for this update beyond bug fixes and optimizations. tvOS is mostly similar, though it does add a new “watchlist” view in the TV app that is exclusively populated by items you’ve added, as opposed to the existing continue watching feed (formerly called “up next”), which includes both items you’ve added and items added automatically when you start playing them.


Hospitals adopt error-prone AI transcription tools despite warnings

In one case from the study cited by AP, when a speaker described “two other girls and one lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it to, “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model.

Why Whisper confabulates

The key to Whisper’s unsuitability in high-risk domains comes from its propensity to sometimes confabulate, or plausibly make up, inaccurate outputs. The AP report says, “Researchers aren’t certain why Whisper and similar tools hallucinate,” but that isn’t true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.

The transcription output from Whisper is a prediction of what is most likely, not what is most accurate. Accuracy in Transformer-based outputs is typically proportional to the presence of relevant accurate data in the training dataset, but it is never guaranteed. If there is ever a case where there isn’t enough contextual information in its neural network for Whisper to make an accurate prediction about how to transcribe a particular segment of audio, the model will fall back on what it “knows” about the relationships between sounds and words it has learned from its training data.
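
For reference, the open source Whisper package exposes this decoding behavior directly; here is a minimal sketch, assuming a local placeholder audio file. Setting temperature to 0.0 makes decoding greedy (always take the most likely next token), and raising it makes the model sample more freely; neither setting guarantees the transcript matches what was actually said.

```python
# Minimal sketch using the open source "openai-whisper" package; the audio
# file name is a placeholder. Greedy decoding (temperature=0.0) always picks
# the most likely next token, which is exactly the "most likely, not most
# accurate" behavior described above.
import whisper

model = whisper.load_model("base")  # small general-purpose checkpoint
result = model.transcribe("clinic_visit.wav", temperature=0.0)

print(result["text"])
for segment in result["segments"]:
    print(f"{segment['start']:7.2f}s  {segment['text']}")
```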
