microsoft

ai-search-engines-cite-incorrect-sources-at-an-alarming-60%-rate,-study-says

AI search engines cite incorrect sources at an alarming 60% rate, study says

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news sources.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

A graph from CJR shows

A graph from CJR shows “confidently wrong” search results. Credit: CJR

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article’s headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.

AI search engines cite incorrect sources at an alarming 60% rate, study says Read More »

six-ways-microsoft’s-portable-xbox-could-be-a-steam-deck-killer

Six ways Microsoft’s portable Xbox could be a Steam Deck killer

Bring old Xbox games to PC

The ultimate handheld system seller.

Credit: Microsoft / Bizarre Creations

The ultimate handheld system seller. Credit: Microsoft / Bizarre Creations

Microsoft has made a lot of hay over the way recent Xbox consoles can play games dating all the way back to the original Xbox. If Microsoft wants to set its first gaming handheld apart, it should make those old console games officially available on a Windows-based system for the first time.

The ability to download previous console games dating back to the Xbox 360 era (or beyond) would be an instant “system seller” feature for any portable Xbox. While this wouldn’t be a trivial technical lift on Microsoft’s part, the same emulation layer that powers Xbox console backward compatibility could surely be ported to Windows with a little bit of work. That process might be easier with a specific branded portable, too, since Microsoft would be working with full knowledge of what hardware was being used.

If Microsoft can give us a way to play Geometry Wars 2 on the go without having to deal with finicky third-party emulators, we’ll be eternally grateful.

Multiple hardware tiers

Xbox Series S (left), next to Xbox Series X (right).

One size does not fit all when it comes to consoles or to handhelds.

Credit: Sam Machkovech

One size does not fit all when it comes to consoles or to handhelds. Credit: Sam Machkovech

On the console side, Microsoft’s split simultaneous release of the Xbox Series S and X showed an understanding that not everyone wants to pay more money for the most powerful possible gaming hardware. Microsoft should extend this philosophy to gaming handhelds by releasing different tiers of portable Xbox hardware for price-conscious consumers.

Raw hardware power is the most obvious differentiator that could set a more expensive tier of Xbox portables apart from any cheaper options. But Microsoft could also offer portable options that reduce the overall bulk (a la the Nintendo Switch Lite) or offer relative improvements in screen size and quality (a la the Steam Deck OLED and Switch OLED).

“Made for Xbox”

It worked for Valve, it can work for Microsoft.

Credit: Valve

It worked for Valve, it can work for Microsoft. Credit: Valve

One of the best things about console gaming is that you can be confident any game you buy for a console will “just work” with your hardware. In the world of PC gaming handhelds, Valve has tried to replicate this with the “Deck Verified” program to highlight Steam games that are guaranteed to work in a portable setting.

Microsoft is well-positioned to work with game publishers to launch a similar program for its own Xbox-branded portable. There’s real value in offering gamers assurances that “Made for Xbox” PC games will “just work” on their Xbox-branded handheld.

This kind of verification system could also help simplify and clarify hardware requirements across different tiers of portable hardware power; any handheld marketed as “level 2” could play any games marketed as level 2 or below, for instance.

Six ways Microsoft’s portable Xbox could be a Steam Deck killer Read More »

nearly-1-million-windows-devices-targeted-in-advanced-“malvertising”-spree

Nearly 1 million Windows devices targeted in advanced “malvertising” spree

A broad overview of the four stages. Credit: Microsoft

The campaign targeted “nearly” 1 million devices belonging both to individuals and a wide range of organizations and industries. The indiscriminate approach indicates the campaign was opportunistic, meaning it attempted to ensnare anyone, rather than targeting certain individuals, organizations, or industries. GitHub was the platform primarily used to host the malicious payload stages, but Discord and Dropbox were also used.

The malware located resources on the infected computer and sent them to the attacker’s c2 server. The exfiltrated data included the following browser files, which can store login cookies, passwords, browsing histories, and other sensitive data.

  • AppDataRoamingMozillaFirefoxProfiles.default-releasecookies.sqlite
  • AppDataRoamingMozillaFirefoxProfiles.default-releaseformhistory.sqlite
  • AppDataRoamingMozillaFirefoxProfiles.default-releasekey4.db
  • AppDataRoamingMozillaFirefoxProfiles.default-releaselogins.json
  • AppDataLocalGoogleChromeUser DataDefaultWeb Data
  • AppDataLocalGoogleChromeUser DataDefaultLogin Data
  • AppDataLocalMicrosoftEdgeUser DataDefaultLogin Data

Files stored on Microsoft’s OneDrive cloud service were also targeted. The malware also checked for the presence of cryptocurrency wallets including Ledger Live, Trezor Suite, KeepKey, BCVault, OneKey, and BitBox, “indicating potential financial data theft,” Microsoft said.

Microsoft said it suspects the sites hosting the malicious ads were streaming platforms providing unauthorized content. Two of the domains are movies7[.]net and 0123movie[.]art.

Microsoft Defender now detects the files used in the attack, and it’s likely other malware defense apps do the same. Anyone who thinks they may have been targeted can check indicators of compromise at the end of the Microsoft post. The post includes steps users can take to prevent falling prey to similar malvertising campaigns.

Nearly 1 million Windows devices targeted in advanced “malvertising” spree Read More »

on-may-5,-microsoft’s-skype-will-shut-down-for-good

On May 5, Microsoft’s Skype will shut down for good

After more than 21 years, Skype will soon be no more. Last night, some users (including Ars readers) poked around in the latest Skype preview update and noticed as-yet-unsurfaced text that read “Starting in May, Skype will no longer be available. Continue your calls and chats in Teams.”

This morning, Microsoft has confirmed to Ars that it’s true. May 5, 2025, will mark the end of Skype’s long run.

Alongside the verification that the end is nigh, Microsoft shared a bunch of details about how it plans to migrate Skype users over. Starting right away, some Skype users (those in Teams and Skype Insider) will be able to log in to Teams using their Skype credentials. More people will gain that ability over the next few days.

Microsoft claims that users who do this will see their existing contacts and chats from Skype in Teams from the start. Alternatively, users who don’t want to do this can export their Skype data—specifically contacts, call history, and chats.

On May 5, Microsoft’s Skype will shut down for good Read More »

trump-should-block-biden’s-ai-“gift”-to-china,-microsoft-argues

Trump should block Biden’s AI “gift” to China, Microsoft argues

“Countries including Brazil, India, Israel, and the UAE are eminently capable of ramping up investments aimed at securing new ways to access increased computing capacity,” the Brookings Institute said. “Preventing companies in middle-tier countries from relying on the US to supply computing chips is a surefire way to push them into building non-US alliances that include stronger technology ties with China.”

The rule could also complicate the global AI landscape in ways the US may not anticipate, the Center for Strategic and International Studies (CSIS), a bipartisan, nonprofit policy research organization, forecast last week. It could “breed resentment, not cooperation” in tier-two countries that will likely “bristle at the fact that their AI ambitions depend on Washington’s goodwill and that they are being deliberately kept a generation behind the frontier,” CSIS wrote. And it could drive more open source AI like DeepSeek to be key to development in tier-two nations, perhaps further endangering US global leadership in AI, CSIS suggested.

“Ironically, the AI Diffusion Framework, meant to lock in American advantage, may instead midwife the very outcome it sought to prevent—an alternate AI stack and increased open-source development where China, as its most prolific contributor, emerges as the de facto leader,” CSIS reported.

China wooing countries targeted by rule, Microsoft says

But according to Smith, some parts of the rule should be preserved, like datacenter restrictions, including “qualitative provisions” that “ensure that AI technology components are deployed in certified, secure, and trusted datacenters.” That part of the rule, Smith suggested, “helps reduce the risk of chip diversion to China.”

And other parts of the rule can be strengthened, Smith wrote, such as ensuring the Commerce Department has resources to enforce it and “expedite approvals” for any countries in the middle tier who may appeal to either move into the top tier or limit tier-two restrictions.

Trump should block Biden’s AI “gift” to China, Microsoft argues Read More »

microsoft’s-new-ai-agent-can-control-software-and-robots

Microsoft’s new AI agent can control software and robots

The researchers' explanations about how

The researchers’ explanations about how “Set-of-Mark” and “Trace-of-Mark” work. Credit: Microsoft Research

The Magma model introduces two technical components: Set-of-Mark, which identifies objects that can be manipulated in an environment by assigning numeric labels to interactive elements, such as clickable buttons in a UI or graspable objects in a robotic workspace, and Trace-of-Mark, which learns movement patterns from video data. Microsoft says those features allow the model to complete tasks like navigating user interfaces or directing robotic arms to grasp objects.

Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name “Magma” stands for “M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch),” after some people noted that “Magma” already belongs to an existing matrix algebra library, which could create some confusion in technical discussions.

Reported improvements over previous models

In its Magma write-up, Microsoft claims Magma-8B performs competitively across benchmarks, showing strong results in UI navigation and robot manipulation tasks.

For example, it scored 80.0 on the VQAv2 visual question-answering benchmark—higher than GPT-4V’s 77.2 but lower than LLaVA-Next’s 81.8. Its POPE score of 87.4 leads all models in the comparison. In robot manipulation, Magma reportedly outperforms OpenVLA, an open source vision-language-action model, in multiple robot manipulation tasks.

Magma's agentic benchmarks, as reported by the researchers.

Magma’s agentic benchmarks, as reported by the researchers. Credit: Microsoft Research

As always, we take AI benchmarks with a grain of salt since many have not been scientifically validated as being able to measure useful properties of AI models. External verification of Microsoft’s benchmark results will become possible once other researchers can access the public code release.

Like all AI models, Magma is not perfect. It still faces technical limitations in complex step-by-step decision-making that requires multiple steps over time, according to Microsoft’s documentation. The company says it continues to work on improving these capabilities through ongoing research.

Yang says Microsoft will release Magma’s training and inference code on GitHub next week, allowing external researchers to build on the work. If Magma delivers on its promise, it could push Microsoft’s AI assistants beyond limited text interactions, enabling them to operate software autonomously and execute real-world tasks through robotics.

Magma is also a sign of how quickly the culture around AI can change. Just a few years ago, this kind of agentic talk scared many people who feared it might lead to AI taking over the world. While some people still fear that outcome, in 2025, AI agents are a common topic of mainstream AI research that regularly takes place without triggering calls to pause all of AI development.

Microsoft’s new AI agent can control software and robots Read More »

hugging-face-clones-openai’s-deep-research-in-24-hours

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Hugging Face clones OpenAI’s Deep Research in 24 hours Read More »

microsoft-365’s-vpn-feature-will-be-shut-off-at-the-end-of-the-month

Microsoft 365’s VPN feature will be shut off at the end of the month

Last month, Microsoft announced that it was increasing the prices for consumer Microsoft 365 plans for the first time since introducing them as Office 365 plans more than a decade ago. Microsoft is using new Copilot-branded generative AI features to justify the price increases, which amount to an extra $3 per month or $30 per year for both individual and family plans.

But Microsoft giveth (and chargeth more) and Microsoft taketh away; according to a support page, the company is also removing the “privacy protection” VPN feature from Microsoft 365’s Microsoft Defender app for Windows, macOS, iOS, and Android. Other Defender features, including identity theft protection and anti-malware protection, will continue to be available. Privacy protection will stop functioning on February 28.

Microsoft didn’t say exactly why it was removing the feature, but the company implied that not enough people were using the service.

“We routinely evaluate the usage and effectiveness of our features. As such, we are removing the privacy protection feature and will invest in new areas that will better align to customer needs,” the support note reads.

Cutting features at the same time that you raise prices for the first time ever is not, as they say, a Great Look. But the Defender VPN feature was already a bit limited compared to other dedicated VPN services. It came with a 50GB per user, per month data cap, and it automatically excluded “content heavy traffic from reputable sites” like YouTube, Netflix, Disney+, Amazon Prime, Facebook, Instagram, and Whatsapp.

Microsoft 365’s VPN feature will be shut off at the end of the month Read More »

openai-teases-“new-era”-of-ai-in-us,-deepens-ties-with-government

OpenAI teases “new era” of AI in US, deepens ties with government

On Thursday, OpenAI announced that it is deepening its ties with the US government through a partnership with the National Laboratories and expects to use AI to “supercharge” research across a wide range of fields to better serve the public.

“This is the beginning of a new era, where AI will advance science, strengthen national security, and support US government initiatives,” OpenAI said.

The deal ensures that “approximately 15,000 scientists working across a wide range of disciplines to advance our understanding of nature and the universe” will have access to OpenAI’s latest reasoning models, the announcement said.

For researchers from Los Alamos, Lawrence Livermore, and Sandia National Labs, access to “o1 or another o-series model” will be available on Venado—an Nvidia supercomputer at Los Alamos that will become a “shared resource.” Microsoft will help deploy the model, OpenAI noted.

OpenAI suggested this access could propel major “breakthroughs in materials science, renewable energy, astrophysics,” and other areas that Venado was “specifically designed” to advance.

Key areas of focus for Venado’s deployment of OpenAI’s model include accelerating US global tech leadership, finding ways to treat and prevent disease, strengthening cybersecurity, protecting the US power grid, detecting natural and man-made threats “before they emerge,” and ” deepening our understanding of the forces that govern the universe,” OpenAI said.

Perhaps among OpenAI’s flashiest promises for the partnership, though, is helping the US achieve a “a new era of US energy leadership by unlocking the full potential of natural resources and revolutionizing the nation’s energy infrastructure.” That is urgently needed, as officials have warned that America’s aging energy infrastructure is becoming increasingly unstable, threatening the country’s health and welfare, and without efforts to stabilize it, the US economy could tank.

But possibly the most “highly consequential” government use case for OpenAI’s models will be supercharging research safeguarding national security, OpenAI indicated.

OpenAI teases “new era” of AI in US, deepens ties with government Read More »

microsoft-now-hosts-ai-model-accused-of-copying-openai-data

Microsoft now hosts AI model accused of copying OpenAI data

Fresh on the heels of a controversy in which ChatGPT-maker OpenAI accused the Chinese company behind DeepSeek R1 of using its AI model outputs against its terms of service, OpenAI’s largest investor, Microsoft, announced on Wednesday that it will now host DeepSeek R1 on its Azure cloud service.

DeepSeek R1 has been the talk of the AI world for the past week because it is a freely available simulated reasoning model that reportedly matches OpenAI’s o1 in performance—while allegedly being trained for a fraction of the cost.

Azure allows software developers to rent computing muscle from machines hosted in Microsoft-owned data centers, as well as rent access to software that runs on them.

“R1 offers a powerful, cost-efficient model that allows more users to harness state-of-the-art AI capabilities with minimal infrastructure investment,” wrote Microsoft Corporate Vice President Asha Sharma in a news release.

DeepSeek R1 runs at a fraction of the cost of o1, at least through each company’s own services. Comparative prices for R1 and o1 were not immediately available on Azure, but DeepSeek lists R1’s API cost as $2.19 per million output tokens, while OpenAI’s o1 costs $60 per million output tokens. That’s a massive discount for a model that performs similarly to o1-pro in various tasks.

Promoting a controversial AI model

On its face, the decision to host R1 on Microsoft servers is not unusual: The company offers access to over 1,800 models on its Azure AI Foundry service with the hopes of allowing software developers to experiment with various AI models and integrate them into their products. In some ways, whatever model they choose, Microsoft still wins because it’s being hosted on the company’s cloud service.

Microsoft now hosts AI model accused of copying OpenAI data Read More »

trump-announces-$500b-“stargate”-ai-infrastructure-project-with-agi-aims

Trump announces $500B “Stargate” AI infrastructure project with AGI aims

Video of the Stargate announcement conference at the White House.

Despite optimism from the companies involved, as CNN reports, past presidential investment announcements have yielded mixed results. In 2017, Trump and Foxconn unveiled plans for a $10 billion Wisconsin electronics factory promising 13,000 jobs. The project later scaled back to a $672 million investment with fewer than 1,500 positions. The facility now operates as a Microsoft AI data center.

The Stargate announcement wasn’t Trump’s only major AI move announced this week. It follows the newly inaugurated US president’s reversal of a 2023 Biden executive order on AI risk monitoring and regulation.

Altman speaks, Musk responds

On Tuesday, OpenAI CEO Sam Altman appeared at a White House press conference alongside Present Trump, Oracle CEO Larry Ellison, and SoftBank CEO Masayoshi Son to announce Stargate.

Altman said he thinks Stargate represents “the most important project of this era,” allowing AGI to emerge in the United States. He believes that future AI technology could create hundreds of thousands of jobs. “We wouldn’t be able to do this without you, Mr. President,” Altman added.

Responding to off-camera questions from Trump about AI’s potential to spur scientific development, Altman said he believes AI will accelerate the discoveries for cures of diseases like cancer and heart disease.

Screenshots of Elon Musk challenging the Stargate announcement on X.

Screenshots of Elon Musk challenging the Stargate announcement on X.

Meanwhile on X, Trump ally and frequent Altman foe Elon Musk immediately attacked the Stargate plan, writing, “They don’t actually have the money,” and following up with a claim that we cannot yet substantiate, saying, “SoftBank has well under $10B secured. I have that on good authority.”

Musk’s criticism has complex implications given his very close ties to Trump, his history of litigating against OpenAI (which he co-founded and later left), and his own goals with his xAI company.

Trump announces $500B “Stargate” AI infrastructure project with AGI aims Read More »

rip-ea’s-origin-launcher:-we-knew-ye-all-too-well,-unfortunately

RIP EA’s Origin launcher: We knew ye all too well, unfortunately

After 14 years, EA will retire its controversial Origin game distribution app for Windows, the company announced. Origin will stop working on April 17, 2025. Folks still using it will be directed to install the newer EA app, which launched in 2022.

The launch of Origin in 2011 was a flashpoint of controversy among gamers, as EA—already not a beloved company by this point—began pulling titles like Crysis 2 from the popular Steam platform to drive players to its own launcher.

Frankly, it all made sense from EA’s point of view. For a publisher that size, Valve had relatively little to offer in terms of services or tools, yet it was taking a big chunk of games’ revenue. Why wouldn’t EA want to get that money back?

The transition was a rough one, though, because it didn’t make as much sense from the consumer’s point of view. Players distrusted EA and had a lot of goodwill for Valve and Steam. Origin lacked features players liked on Steam, and old habits and social connections die hard. Plus, EA’s use of Origin—a long-dead brand name tied to classic RPGs and other games of the ’80s and ’90s—for something like this felt to some like a slap in the face.

RIP EA’s Origin launcher: We knew ye all too well, unfortunately Read More »