Microsoft Copilot


Microsoft Copilot will watch you play Minecraft, tell you what you’re doing wrong

Parasocial gaming department —

Microsoft demo is like chatting with GameFAQs when you don’t have a friend to hang with.

In the recent past, you’d have to rely on your kid sibling to deliver Minecraft commentary like “Oh no, it’s a zombie. Run!”

Longtime gamers (and/or Game Grumps fans) likely know that even single-player games can be a lot more fun with a friend hanging out nearby to offer advice, shoot the breeze, or just react earnestly to whatever’s happening on screen. Now, Microsoft is promising that its GPT-4o-powered Copilot system will soon offer an imitation of that pro-social experience even for Minecraft players who don’t have any human friends available to watch them play.

In a pair of social media posts Monday, Microsoft highlighted how “real-time conversations with your AI companion Copilot” can enhance an otherwise solitary Minecraft experience. In the first demo, the disembodied Copilot voice tells the player how to craft a sword, walking him through the process of gathering some wood or stone to go with the sticks sitting in his inventory. In another, the AI identifies a zombie in front of the player and gives the (seemingly obvious) advice to run away from the threat and “make sure it can’t reach you” by digging underground or building a tower of blocks.

Real time conversations with your AI companion Copilot, powered by OpenAI’s GPT-4o. pic.twitter.com/Ug7EWv2sah

— Microsoft Copilot (@MSFTCopilot) May 20, 2024

These kinds of in-game pointers aren’t the most revolutionary use of conversational AI—even a basic in-game tutorial/reference system or online walkthrough could deliver the same basic information, after all. Still, the demonstration stands out for how that information is delivered: through a natural-language conversation that doesn’t require pausing the gameplay even briefly.

The key moment highlighting this difference is near the end of one of the video demos, when the Copilot AI offers a bit of encouragement to the player: “Whew, that was a close one. Great job finding shelter!” That’s the point when the system transitions from a fancy voice-controlled strategy guide to an ersatz version of the kind of spectator that might be sitting on your couch or watching your Twitch stream. It creates the real possibility of developing a parasocial relationship with the Copilot guide that is not really a risk when consulting a text file on GameFAQs, for instance (though I think the Copilot reactions will have to get a bit less inane to really feel like a valued partner-in-gaming).

Just hanging out with my AI buddy

It’s unclear from the video clips whether Copilot is reading data directly from the Minecraft instance or simply reacting to the same information the player is seeing. But the social media posts came the same day as Microsoft’s announcement of “Recall,” a coming feature that “take[s] images of your active screen every few seconds” to provide a persistent “memory” of everything you do on the computer. That feature will be exclusive to Copilot+ PCs, which use an integrated Neural Processing Unit for on-device processing of many common generative AI tasks.

Microsoft’s Minecraft Copilot demo brings to mind some of the similar conversations that OpenAI showed off during last week’s live demo of GPT-4o. But the artificial game-adjacent conversation here sounds significantly more robotic and direct than the lifelike, emotional responses in OpenAI’s presentation. Then again, OpenAI has come under fire from actress Scarlett Johansson for using a voice that sounds too much like her performance in a 2013 movie about a conversational AI. Microsoft might be on safer ground sticking with a voice that is more obviously artificial here.

The casual cursing is what really makes an AI gaming buddy feel real.

Speaking of Her, we can’t help but think of one particular scene in that movie where Joaquin Phoenix’s Theodore asks for gaming advice both from the titular AI and a hilariously potty-mouthed NPC. Maybe Microsoft can add a casual cursing module to its Copilot gaming companion to really capture the feeling of hanging out with a dorm room buddy over a late-night gaming session.



LLMs keep leaping with Llama 3, Meta’s newest open-weights AI model

computer-powered word generator —

Zuckerberg says new AI model “was still learning” when Meta stopped training.

A group of pink llamas on a pixelated background.

On Thursday, Meta unveiled early versions of its Llama 3 open-weights AI model that can be used to power text composition, code generation, or chatbots. It also announced that its Meta AI Assistant is now available on a website and is going to be integrated into its major social media apps, intensifying the company’s efforts to position its products against other AI assistants like OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini.

Like its predecessor, Llama 2, Llama 3 is notable for being a freely available, open-weights large language model (LLM) provided by a major AI company. Llama 3 technically does not qualify as “open source” because that term has a specific meaning in software (as we have mentioned in other coverage), and the industry has not yet settled on terminology for AI model releases that ship either code or weights with restrictions (you can read Llama 3’s license here) or that ship without providing training data. We typically call these releases “open weights” instead.

At the moment, Llama 3 is available in two parameter sizes: 8 billion (8B) and 70 billion (70B), both offered as free downloads through Meta’s website with a sign-up. Llama 3 comes in two versions: pre-trained (basically the raw, next-token-prediction model) and instruction-tuned (fine-tuned to follow user instructions). Each has an 8,192-token context window.
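For those who want to poke at the weights themselves, here is a minimal sketch of what loading the instruction-tuned 8B model might look like with the Hugging Face Transformers library. The model identifier and the gated-access step are assumptions based on Meta’s sign-up requirement; check the official model card for the exact details.

    # Minimal sketch: running Llama 3 8B Instruct locally via Hugging Face
    # Transformers. Assumes you have requested access to the weights and
    # authenticated with `huggingface-cli login`; the model ID is an assumption.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed identifier

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory use vs. float32
        device_map="auto",           # place layers on GPU(s) when available
    )

    # The instruction-tuned variant expects chat-formatted prompts; the
    # tokenizer's chat template inserts the required special tokens.
    messages = [{"role": "user", "content": "Summarize what an open-weights model is."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Prompt plus output together must fit in the 8,192-token context window.
    output = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))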

A screenshot of the Meta AI Assistant website on April 18, 2024. Credit: Benj Edwards

Meta trained both models on two custom-built, 24,000-GPU clusters. In a podcast interview with Dwarkesh Patel, Meta CEO Mark Zuckerberg said that the company trained the 70B model with around 15 trillion tokens of data. Throughout the process, the model never reached “saturation” (that is, it never hit a wall in terms of capability increases). Eventually, Meta pulled the plug and moved on to training other models.

“I guess our prediction going in was that it was going to asymptote more, but even by the end it was still learning. We probably could have fed it more tokens, and it would have gotten somewhat better,” Zuckerberg said on the podcast.

Meta also announced that it is currently training a 400B parameter version of Llama 3, which some experts like Nvidia’s Jim Fan think may perform in the same league as GPT-4 Turbo, Claude 3 Opus, and Gemini Ultra on benchmarks like MMLU, GPQA, HumanEval, and MATH.

Speaking of benchmarks, we have devoted many words in the past to explaining how frustratingly imprecise benchmarks can be when applied to large language models due to issues like training contamination (that is, including benchmark test questions in the training dataset), cherry-picking on the part of vendors, and an inability to capture AI’s general usefulness in an interactive session with chat-tuned models.

But, as expected, Meta provided some benchmarks for Llama 3 that list results from MMLU (undergraduate-level knowledge), GSM-8K (grade-school math), HumanEval (coding), GPQA (graduate-level questions), and MATH (math word problems). These show the 8B model performing well compared to open-weights models like Google’s Gemma 7B and Mistral 7B Instruct, and the 70B model holding its own against Gemini Pro 1.5 and Claude 3 Sonnet.

A chart of instruction-tuned Llama 3 8B and 70B benchmarks provided by Meta.

Meta says that the Llama 3 model has been enhanced with capabilities to understand coding (like Llama 2) and, for the first time, has been trained with both images and text—though it currently outputs only text. According to Reuters, Meta Chief Product Officer Chris Cox noted in an interview that more complex processing abilities (like executing multi-step plans) are expected in future updates to Llama 3, which will also support multimodal outputs—that is, both text and images.

Meta plans to host the Llama 3 models on a range of cloud platforms, making them accessible through AWS, Databricks, Google Cloud, and other major providers.

Also on Thursday, Meta announced that Llama 3 will become the new basis of the Meta AI virtual assistant, which the company first announced in September. The assistant will appear prominently in search features for Facebook, Instagram, WhatsApp, Messenger, and the aforementioned dedicated website that features a design similar to ChatGPT, including the ability to generate images in the same interface. The company also announced a partnership with Google to integrate real-time search results into the Meta AI assistant, adding to an existing partnership with Microsoft’s Bing.



Report: Sam Altman seeking trillions for AI chip fabrication from UAE, others

chips ahoy —

WSJ: Audacious $5-$7 trillion investment would aim to expand global AI chip supply.

OpenAI Chief Executive Officer Sam Altman walks on the House side of the US Capitol on January 11, 2024, in Washington, DC. Credit: Kent Nishimura/Getty Images

On Thursday, The Wall Street Journal reported that OpenAI CEO Sam Altman is in talks with investors to raise as much as $5 trillion to $7 trillion for AI chip manufacturing, according to people familiar with the matter. The funding seeks to address the scarcity of graphics processing units (GPUs) crucial for training and running large language models like those that power ChatGPT, Microsoft Copilot, and Google Gemini.

The high dollar amount reflects the huge amount of capital necessary to spin up new semiconductor manufacturing capability. “As part of the talks, Altman is pitching a partnership between OpenAI, various investors, chip makers and power providers, which together would put up money to build chip foundries that would then be run by existing chip makers,” writes the Wall Street Journal in its report. “OpenAI would agree to be a significant customer of the new factories.”

To hit these ambitious targets—sums larger than the entire semiconductor industry’s current $527 billion in annual global sales—Altman has reportedly met with a range of potential investors worldwide, including sovereign wealth funds and government entities, notably the United Arab Emirates, SoftBank CEO Masayoshi Son, and representatives from Taiwan Semiconductor Manufacturing Co. (TSMC).

TSMC is the world’s largest dedicated independent semiconductor foundry. It’s a critical linchpin that companies such as Nvidia, Apple, Intel, and AMD rely on to fabricate SoCs, CPUs, and GPUs for various applications.

Altman reportedly seeks to expand the global capacity for semiconductor manufacturing significantly, funding the infrastructure necessary to support the growing demand for GPUs and other AI-specific chips. GPUs are excellent at parallel computation, which makes them ideal for running AI models that heavily rely on matrix multiplication to work. However, the technology sector currently faces a significant shortage of these important components, constraining the potential for AI advancements and applications.
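To make the parallelism point concrete, here is a toy sketch (in PyTorch, and deliberately not a rigorous benchmark) of the same large matrix multiplication dispatched first to the CPU and then, when one is available, to a CUDA GPU; the speedup on big matrices comes from the GPU spreading the multiply-accumulate work across thousands of threads at once.

    # Toy illustration of why GPUs dominate AI workloads: one large matrix
    # multiplication, timed on the CPU and (if present) on a CUDA GPU.
    import time
    import torch

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    t0 = time.perf_counter()
    a @ b  # executed across a handful of CPU cores
    print(f"CPU matmul: {time.perf_counter() - t0:.3f} s")

    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()  # CUDA launches are async; sync before timing
        t0 = time.perf_counter()
        a_gpu @ b_gpu  # spread across thousands of GPU threads in parallel
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
        print(f"GPU matmul: {time.perf_counter() - t0:.3f} s")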

In particular, the UAE’s involvement, led by Sheikh Tahnoun bin Zayed al Nahyan, a key security official and chair of numerous Abu Dhabi sovereign wealth vehicles, reflects global interest in AI’s potential and the strategic importance of semiconductor manufacturing. However, the prospect of substantial UAE investment in a key tech industry raises potential geopolitical concerns, particularly regarding the US government’s strategic priorities in semiconductor production and AI development.

The US has been cautious about allowing foreign control over the supply of microchips, given their importance to the digital economy and national security. Reflecting this, the Biden administration has undertaken efforts to bolster domestic chip manufacturing through subsidies and regulatory scrutiny of foreign investments in important technologies.

To put the $5 trillion to $7 trillion estimate in perspective, the White House just today announced a $5 billion investment in R&D to advance US-made semiconductor technologies. TSMC has already sunk $40 billion—one of the largest foreign investments in US history—into a US chip plant in Arizona. As of now, it’s unclear whether Altman has secured any commitments toward his fundraising goal.

Updated on February 9, 2024 at 8:45 PM Eastern with a quote from the WSJ that clarifies the proposed relationship between OpenAI and partners in the talks.
