AI

Apple and OpenAI currently have the most misunderstood partnership in tech

A man talks into a smartphone.

Enlarge / He isn’t using an iPhone, but some people talk to Siri like this.

On Monday, Apple premiered “Apple Intelligence” during a wide-ranging presentation at its annual Worldwide Developers Conference in Cupertino, California. However, the heart of its new tech, an array of Apple-developed AI models, was overshadowed by the announcement of ChatGPT integration into its device operating systems.

Since rumors of the partnership first emerged, we’ve seen confusion on social media about why Apple didn’t develop a cutting-edge GPT-4-like chatbot internally. Despite Apple’s year-long development of its own large language models (LLMs), many perceived the integration of ChatGPT (and opening the door for others, like Google Gemini) as a sign of Apple’s lack of innovation.

“This is really strange. Surely Apple could train a very good competing LLM if they wanted? They’ve had a year,” wrote AI developer Benjamin De Kraker on X. Elon Musk has also been grumbling about the OpenAI deal—and spreading misinformation about it—saying things like, “It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!”

While Apple has developed many technologies internally, it has also never been shy about integrating outside tech when necessary, in ways ranging from acquisitions to built-in clients—in fact, Siri was initially developed by an outside company. But because Apple struck a deal with OpenAI, a company that has recently been at the center of a string of tech controversies, it’s understandable that some people question why Apple made the call—and what it might entail for the privacy of their on-device data.

“Our customers want something with world knowledge some of the time”

While Apple Intelligence largely utilizes its own Apple-developed LLMs, Apple also realized that there may be times when some users want to use what the company considers the current “best” existing LLM—OpenAI’s GPT-4 family. In an interview with The Washington Post, Apple CEO Tim Cook explained the decision to integrate OpenAI first:

“I think they’re a pioneer in the area, and today they have the best model,” he said. “And I think our customers want something with world knowledge some of the time. So we considered everything and everyone. And obviously we’re not stuck on one person forever or something. We’re integrating with other people as well. But they’re first, and I think today it’s because they’re best.”

The proposed benefit of Apple integrating ChatGPT into various experiences within iOS, iPadOS, and macOS is that it allows AI users to access ChatGPT’s capabilities without the need to switch between different apps—either through the Siri interface or through Apple’s integrated “Writing Tools.” Users will also have the option to connect their paid ChatGPT account to access extra features.

As an answer to privacy concerns, Apple says that before any data is sent to ChatGPT, the OS asks for the user’s permission, and the entire ChatGPT experience is optional. According to Apple, requests are not stored by OpenAI, and users’ IP addresses are hidden. Apparently, communication with OpenAI servers happens through API calls similar to using the ChatGPT app on iOS, and there is reportedly no deeper OS integration that might expose user data to OpenAI without the user’s permission.
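
Apple hasn’t published implementation details for this handoff, but the flow described above (ask permission first, forward only the request itself, store nothing) can be illustrated with a rough Python sketch. Every name below is our own placeholder for illustration, not Apple’s actual code.

# Hypothetical illustration of a consent-gated handoff to an external model.
# None of these names come from Apple's implementation.

def handle_request(prompt: str) -> str:
    """Route a request to an on-device model or, with consent, to an external one."""
    if needs_world_knowledge(prompt) and user_grants_permission(prompt):
        # Only the prompt itself is forwarded; no device context, no identifiers.
        return send_to_external_model(prompt)
    return run_on_device_model(prompt)

def needs_world_knowledge(prompt: str) -> bool:
    # Placeholder heuristic; the real routing logic is not public.
    return prompt.lower().startswith("what is") or "who" in prompt.lower()

def user_grants_permission(prompt: str) -> bool:
    answer = input(f'Send "{prompt}" to ChatGPT? [y/N] ')
    return answer.strip().lower() == "y"

def send_to_external_model(prompt: str) -> str:
    # In the real system, this would be an API call made with the IP address obscured.
    return f"(external model answer to: {prompt})"

def run_on_device_model(prompt: str) -> str:
    return f"(on-device answer to: {prompt})"

if __name__ == "__main__":
    print(handle_request("What is the tallest building in Tokyo?"))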

We can only take Apple’s word for it at the moment, of course, and solid details about Apple’s AI privacy efforts will emerge once security experts get their hands on the new features later this year.

Apple’s history of tech integration

So you’ve seen why Apple chose OpenAI. But why look to outside companies for tech? In some ways, Apple building an external LLM client into its operating systems isn’t too different from what it has previously done with streaming video (the YouTube app on the original iPhone), Internet search (Google search integration), and social media (integrated Twitter and Facebook sharing).

The press has positioned Apple’s recent AI moves as Apple “catching up” with competitors like Google and Microsoft in terms of chatbots and generative AI. But playing it slow and cool has long been part of Apple’s M.O.—not necessarily introducing the bleeding edge of technology but improving existing tech through refinement and giving it a better user interface.

Apple and OpenAI currently have the most misunderstood partnership in tech Read More »

AI trained on photos from kids’ entire childhood without their consent

Photos of Brazilian kids—sometimes spanning their entire childhood—have been used without their consent to power AI tools, including popular image generators like Stable Diffusion, Human Rights Watch (HRW) warned on Monday.

The practice poses urgent privacy risks to kids and appears to increase the risk of non-consensual AI-generated images bearing their likenesses, HRW’s report said.

An HRW researcher, Hye Jung Han, helped expose the problem. She analyzed “less than 0.0001 percent” of LAION-5B, a dataset built from Common Crawl snapshots of the public web. The dataset does not contain the actual photos but includes image-text pairs derived from 5.85 billion images and captions posted online since 2008.
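
To make the distinction concrete: a LAION-style dataset is essentially a table of links and captions, so removing entries deletes references, not the photos themselves. Below is a toy Python sketch of that structure; the field names are illustrative, not LAION’s exact schema.

# Toy illustration of a link-plus-caption dataset; the field names are
# illustrative, not LAION's exact schema. Rows reference images by URL only;
# the images themselves live elsewhere on the web.

rows = [
    {"url": "https://example.com/photo1.jpg", "caption": "a dog on a beach"},
    {"url": "https://example.org/photo2.jpg", "caption": "birthday party, 2014"},
]

# "Removing links" means dropping rows from this metadata...
flagged = {"https://example.org/photo2.jpg"}
cleaned = [row for row in rows if row["url"] not in flagged]

# ...but the underlying photos are untouched and can still be crawled again.
print(len(rows), "rows before filtering,", len(cleaned), "after")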

Among those images linked in the dataset, Han found 170 photos of children from at least 10 Brazilian states. These were mostly family photos uploaded to personal and parenting blogs most Internet surfers wouldn’t easily stumble upon, “as well as stills from YouTube videos with small view counts, seemingly uploaded to be shared with family and friends,” Wired reported.

LAION, the German nonprofit that created the dataset, has worked with HRW to remove the links to the children’s images in the dataset.

That may not completely resolve the problem, though. HRW’s report warned that the removed links are “likely to be a significant undercount of the total amount of children’s personal data that exists in LAION-5B.” Han told Wired that she fears that the dataset may still be referencing personal photos of kids “from all over the world.”

Removing the links also does not remove the images from the public web, where they can still be referenced and used in other AI datasets, particularly those relying on Common Crawl, LAION’s spokesperson, Nate Tyler, told Ars.

“This is a larger and very concerning issue, and as a nonprofit, volunteer organization, we will do our part to help,” Tyler told Ars.

Han told Ars that “Common Crawl should stop scraping children’s personal data, given the privacy risks involved and the potential for new forms of misuse.”

According to HRW’s analysis, many of the Brazilian children’s identities were “easily traceable,” due to children’s names and locations being included in image captions that were processed when building the LAION dataset.

And at a time when middle and high school-aged students are at greater risk of being targeted by bullies or bad actors turning “innocuous photos” into explicit imagery, it’s possible that AI tools may be better equipped to generate AI clones of kids whose images are referenced in AI datasets, HRW suggested.

“The photos reviewed span the entirety of childhood,” HRW’s report said. “They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home, students giving a presentation at school, and teenagers posing for photos at their high school’s carnival.”

There is less risk that the Brazilian kids’ photos are currently powering AI tools since “all publicly available versions of LAION-5B were taken down” in December, Tyler told Ars. That decision came out of an “abundance of caution” after a Stanford University report “found links in the dataset pointing to illegal content on the public web,” Tyler said, including 3,226 suspected instances of child sexual abuse material.

Han told Ars that “the version of the dataset that we examined pre-dates LAION’s temporary removal of its dataset in December 2023.” The dataset will not be available again until LAION determines that all flagged illegal content has been removed.

“LAION is currently working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION-5B,” Tyler told Ars. “We are grateful for their support and hope to republish a revised LAION-5B soon.”

In Brazil, “at least 85 girls” have reported classmates harassing them by using AI tools to “create sexually explicit deepfakes of the girls based on photos taken from their social media profiles,” HRW reported. Once these explicit deepfakes are posted online, they can inflict “lasting harm,” HRW warned, potentially remaining online for their entire lives.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” Han said. “The government should urgently adopt policies to protect children’s data from AI-fueled misuse.”

Ars could not immediately reach Stable Diffusion maker Stability AI for comment.

AI trained on photos from kids’ entire childhood without their consent Read More »

Apple’s AI promise: “Your data is never stored or made accessible by Apple”

…and throw away the key —

And publicly reviewable server code means experts can “verify this privacy promise.”

Enlarge / Apple Senior VP of Software Engineering Craig Federighi announces “Private Cloud Compute” at WWDC 2024.

Apple

With most large language models being run on remote, cloud-based server farms, some users have been reluctant to share personally identifiable and/or private data with AI companies. In its WWDC keynote today, Apple stressed that the new “Apple Intelligence” system it’s integrating into its products will use a new “Private Cloud Compute” to ensure any data processed on its cloud servers is protected in a transparent and verifiable way.

“You should not have to hand over all the details of your life to be warehoused and analyzed in someone’s AI cloud,” Apple Senior VP of Software Engineering Craig Federighi said.

Trust, but verify

Part of what Apple calls “a brand new standard for privacy and AI” is achieved through on-device processing. Federighi said “many” of Apple’s generative AI models can run entirely on a device powered by an A17+ or M-series chip, eliminating the risk of sending your personal data to a remote server.

When a bigger, cloud-based model is needed to fulfill a generative AI request, though, Federighi stressed that it will “run on servers we’ve created especially using Apple silicon,” which allows for the use of security tools built into the Swift programming language. The Apple Intelligence system “sends only the data that’s relevant to completing your task” to those servers, Federighi said, rather than giving blanket access to the entirety of the contextual information the device has access to.

And Apple says that minimized data is not going to be saved for future server access or used to further train Apple’s server-based models, either. “Your data is never stored or made accessible by Apple,” Federighi said. “It’s used exclusively to fill your request.”

But you don’t just have to trust Apple on this score, Federighi claimed. That’s because the server code used by Private Cloud Compute will be publicly accessible, meaning that “independent experts can inspect the code that runs on these servers to verify this privacy promise.” The entire system has been set up cryptographically so that Apple devices “will refuse to talk to a server unless its software has been publicly logged for inspection.”
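
Apple hasn’t yet detailed the cryptography behind that refusal, but the general shape of the idea (the client checks the server’s reported software measurement against a public, auditable log before sending anything) can be sketched in a few lines of Python. This is a toy illustration under assumed names, not Apple’s actual Private Cloud Compute protocol.

import hashlib

# Toy sketch: a client refuses to send data unless the server's software
# measurement appears in a publicly auditable log. Purely illustrative; this
# is not Apple's actual Private Cloud Compute mechanism.

PUBLIC_LOG = {
    # Hashes of server software images that have been published for inspection.
    hashlib.sha256(b"pcc-server-build-1.0").hexdigest(),
}

def server_reported_measurement() -> str:
    # In a real attestation flow, this value would be signed by secure hardware.
    return hashlib.sha256(b"pcc-server-build-1.0").hexdigest()

def send_request(payload: str) -> str:
    if server_reported_measurement() not in PUBLIC_LOG:
        raise RuntimeError("Server software not publicly logged; refusing to send.")
    return f"sent {len(payload)} bytes to an attested server"

print(send_request("only the data relevant to the task"))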

While the keynote speech was light on details for the moment, the focus on privacy during the presentation shows that Apple is at least prioritizing security concerns in its messaging as it wades into the generative AI space for the first time. We’ll see what security experts have to say when these servers and their code are made publicly available in the near future.

Apple’s AI promise: “Your data is never stored or made accessible by Apple” Read More »

iOS 18 adds Apple Intelligence, customizations, and makes Android SMS nicer

Apple WWDC 2024 —

Mail gets categories, Messages gets more tapbacks, and apps can now be locked.

Hands manipulating the Control Center on an iPhone

Apple

The biggest feature in iOS 18, the one that affects the most people, was a single item in a comma-stuffed sentence by Apple software boss Craig Federighi: “Support for RCS.”

As we noted when Apple announced its support for “RCS Universal Profile,” a kind of minimum viable cross-device rich messaging, iPhone users getting RCS means SMS chains with Android users “will be slightly less awful.” SMS messages will soon have read receipts, higher-quality media sending, and typing indicators, along with better security. And RCS messages can go over Wi-Fi when you don’t have a cellular signal. Apple is certainly downplaying a major cross-platform compatibility upgrade, but it’s a notable quality-of-life boost.

  • Prioritized notifications through Apple Intelligence

  • Sending a friend an AI-generated image of them holding a birthday cake, which is not exactly the future we all envisioned for 2024, but here we are.

    Apple

  • Example of a query that a supposedly now context-aware Siri can tackle.

  • Asking Siri “When is my mom’s flight landing,” followed by “What is our lunch plan?” can pull in data from multiple apps for an answer.

Apple Intelligence, the new Siri, and the iPhone

iOS 18 is one of the major beneficiaries of Apple’s AI rollout, dubbed “Apple Intelligence.” Apple Intelligence promises to help iPhone users create and understand language and images, with the proper context from your phone’s apps: photos, calendar, email, messages, and more.

Some of the suggested AI offerings include:

  • Auto-prioritizing notifications
  • Generating an AI image of people when you wish them a happy birthday.
  • Using Maps, Calendar, and an email with a meeting update to figure out if a work meeting change will make Federighi miss his daughter’s recital.

Many of the models needed to respond to your requests can be run on the device, Apple claims. For queries that need to go to remote servers, Apple relies on “Private Cloud Compute.” Apple has built its own servers, running on Apple Silicon, to handle requests that need more computational power. The company claims that your phone sends only the data necessary, that the data is never stored, and that independent researchers can verify the software running on Apple’s servers.

Siri is getting AI-powered upgrades across all platforms, including iOS. Apple says that Siri now understands more context in your questions to it. It will have awareness of what’s on your screen, so you could say “Add this address to his contact card” while messaging. You could ask it to “take a light-trail effect photo” from the camera. And “personal context” was repeatedly highlighted, including requests to find things people sent you, add your license number to a form (from an old ID picture), or ask “When is my mom’s flight landing?”

The non-AI things coming in iOS 18

A whole bunch of little boosts to iOS 18 announced by Apple.

On the iPhone itself, iOS 18 icons will change their look when in dark mode, and you can customize the look of compatible icons. Control Center, the pull-down menu in the top-right corner, now has multiple swipe-accessible controls, accessed through a strange-until-you’re-used-to-it long continuous swipe from the top. Developers are also getting access to the Control Center, so they can add their own apps’ controls. The lock screen will also get more customization, letting you swap out the standard flashlight and camera buttons for other items you prefer.

Privacy got some attention, too. Apps can be locked, such that Face ID, Touch ID, or a passcode is necessary to open them. Apps can also be hidden and have their data prevented from showing up in notifications, searches, or other streams. New controls also limit the access you may grant to apps for contacts, network, and devices.

Messages will have “a huge year,” according to Apple. Tapbacks (instant reactions) can now include any emoji on the phone. Messages can be scheduled for later sending, text can be formatted, and there are “text effects” that do things like zoom in on the word “MAJOR” or make “Blown away” explode off the screen. And “Messages via satellite” is now available for phones that have satellite access, with end-to-end encryption.

Here’s a Messages upgrade that is absolutely going to surprise everybody when they forget about it in four months and then it shows up in a weird message.

The Mail app gets on-device categorization with Gmail-like labels such as “Primary,” “Transactions,” “Updates,” and “Promotions.” Mail can also show you all the emails you get from certain businesses, such as receipts and tickets.

The Maps app is getting trail routes for US National Parks. Wallet now lets you “Tap to Cash,” sending money between phones in close proximity. Journal can now log your state of mind, track your goals, track streaks, and log “other fun stats.”

Photo libraries are getting navigation upgrades, with screenshots, receipts, and other banal photos automatically filtered out from gallery scrolls. There’s some automatic categorization of trips, days, and events. And, keeping with the theme of iOS 18, you can customize and reorder the collections and features Photos shows you when you browse through it.

This is a developing story and this post will be updated with new information.

iOS 18 adds Apple Intelligence, customizations, and makes Android SMS nicer Read More »

Report: New “Apple Intelligence” AI features will be opt-in by default

“apple intelligence,” i see what you did there —

Apple reportedly plans to announce its first big wave of AI features at WWDC.

Apple

Apple’s Worldwide Developers Conference kicks off on Monday, and per usual, the company is expected to detail most of the big new features in this year’s updates to iOS, iPadOS, macOS, and all of Apple’s other operating systems.

The general consensus is that Apple plans to use this year’s updates to integrate generative AI into its products for the first time. Bloomberg’s Mark Gurman has a few implementation details that show how Apple’s approach will differ somewhat from Microsoft’s or Google’s.

Gurman says that the “Apple Intelligence” features will include an OpenAI-powered chatbot, but it will otherwise focus on “features with broad appeal” rather than “whiz-bang technology like image and video generation.” These include summaries for webpages, meetings, and missed notifications; a revamped version of Siri that can control apps in a more granular way; Voice Memos transcription; image enhancement features in the Photos app; suggested replies to text messages; automated sorting of emails; and the ability to “create custom emoji characters on the fly that represent phrases or words as they’re being typed.”

Apple also reportedly hopes to differentiate its AI push by implementing it in a more careful, privacy-focused way. The new features will use the Neural Engine available in newer devices for on-device processing where possible (Gurman says that only Apple’s A17 Pro and the M-series chips will be capable of supporting all the local processing features, though all of Apple’s recent chips feature some flavor of Neural Engine). And where Apple does use the cloud for AI processing, the company will apparently promise that user information isn’t being “sold or read” and is not being used to “build user profiles.”

Apple’s new AI features will also be opt-in by default, whereas Microsoft and Google have generally enabled features like the Copilot chatbot or AI Overviews by default whether users asked for them or not.

Looking beyond AI, we can also expect the typical grab bag of small- to medium-sized features in all of Apple’s software updates. These reportedly include reworked Control Center and Settings apps, emoji responses and RCS messaging support in the Messages app, a standalone password manager app, Calculator for the iPad, and a handful of other things. Gurman doesn’t expect Apple to announce any hardware at the event, though a number of Macs are past due for an M3- or M4-powered refresh.

Apple’s WWDC keynote happens on June 10 at 1 pm Eastern and can be streamed from Apple’s developer website.

Report: New “Apple Intelligence” AI features will be opt-in by default Read More »

Outcry from big AI firms over California AI “kill switch” bill

A finger poised over an electrical switch.

Artificial intelligence heavyweights in California are protesting against a state bill that would force technology companies to adhere to a strict safety framework including creating a “kill switch” to turn off their powerful AI models, in a growing battle over regulatory control of the cutting-edge technology.

The California Legislature is considering proposals that would introduce new restrictions on tech companies operating in the state, including the three largest AI start-ups (OpenAI, Anthropic, and Cohere) as well as large language models run by Big Tech companies such as Meta.

The bill, passed by the state’s Senate last month and set for a vote from its general assembly in August, requires AI groups in California to guarantee to a newly created state body that they will not develop models with “a hazardous capability,” such as creating biological or nuclear weapons or aiding cyber security attacks.

Developers would be required to report on their safety testing and introduce a so-called kill switch to shut down their models, according to the proposed Safe and Secure Innovation for Frontier Artificial Intelligence Systems Act.

But the bill has become the focus of a backlash from many in Silicon Valley because of claims that it will force AI start-ups to leave the state and prevent platforms such as Meta from operating open source models.

“If someone wanted to come up with regulations to stifle innovation, one could hardly do better,” said Andrew Ng, a renowned computer scientist who led AI projects at Alphabet’s Google and China’s Baidu, and who sits on Amazon’s board. “It creates massive liabilities for science-fiction risks, and so stokes fear in anyone daring to innovate.”

Outcry from big AI firms over California AI “kill switch” bill Read More »

DuckDuckGo offers “anonymous” access to AI chatbots through new service

anonymous confabulations —

DDG offers LLMs from OpenAI, Anthropic, Meta, and Mistral for factually-iffy conversations.

DuckDuckGo's AI Chat promotional image.

DuckDuckGo

On Thursday, DuckDuckGo unveiled a new “AI Chat” service that allows users to converse with four mid-range large language models (LLMs) from OpenAI, Anthropic, Meta, and Mistral in an interface similar to ChatGPT while attempting to preserve privacy and anonymity. While the AI models involved can readily output inaccurate information, the site allows users to test different mid-range LLMs without having to install anything or sign up for an account.

DuckDuckGo’s AI Chat currently features access to OpenAI’s GPT-3.5 Turbo, Anthropic’s Claude 3 Haiku, and two open source models, Meta’s Llama 3 and Mistral’s Mixtral 8x7B. The service is currently free to use within daily limits. Users can access AI Chat through the DuckDuckGo search engine, direct links to the site, or by using “!ai” or “!chat” shortcuts in the search field. AI Chat can also be disabled in the site’s settings for users with accounts.

According to DuckDuckGo, chats on the service are anonymized, with metadata and IP address removed to prevent tracing back to individuals. The company states that chats are not used for AI model training, citing its privacy policy and terms of use.

“We have agreements in place with all model providers to ensure that any saved chats are completely deleted by the providers within 30 days,” says DuckDuckGo, “and that none of the chats made on our platform can be used to train or improve the models.”
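
DuckDuckGo hasn’t published its relay code, but the basic idea described above (forward only the chat messages and drop anything that identifies the user) can be sketched in Python like this. The field names here are generic assumptions, not DuckDuckGo’s implementation.

# Toy sketch of an anonymizing relay: only the chat messages are forwarded to
# the model provider, and identifying metadata is dropped. This is a generic
# illustration, not DuckDuckGo's actual code.

IDENTIFYING_FIELDS = {"client_ip", "user_agent", "cookies", "account_id"}

def strip_metadata(request: dict) -> dict:
    return {key: value for key, value in request.items()
            if key not in IDENTIFYING_FIELDS}

incoming = {
    "client_ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
    "cookies": "session=abc123",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

# Only {"messages": [...]} would reach the model provider.
print(strip_metadata(incoming))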

Enlarge / An example of DuckDuckGo AI Chat with GPT-3.5 answering a silly question in an inaccurate way.

Benj Edwards

However, the privacy experience is not bulletproof because, in the case of GPT-3.5 and Claude Haiku, DuckDuckGo is required to send a user’s inputs to remote servers for processing over the Internet. Given certain inputs (i.e., “Hey, GPT, my name is Bob, and I live on Main Street, and I just murdered Bill”), a user could still potentially be identified if such an extreme need arose.

While the service appears to work well for us, there’s a question about its utility. For example, while GPT-3.5 initially wowed people when it launched with ChatGPT in 2022, it also confabulated a lot—and it still does. GPT-4 was the first major LLM to get confabulations under control to a point where the bot became more reasonably useful for some tasks (though this itself is a controversial point), but that more capable model isn’t present in DuckDuckGo’s AI Chat. Also missing are similar GPT-4-level models like Claude Opus or Google’s Gemini Ultra, likely because they are far more expensive to run. DuckDuckGo says it may roll out paid plans in the future, and those may include higher daily usage limits or access to “more advanced models.”

It’s true that the other three models generally (and subjectively) surpass GPT-3.5 in capability, particularly for coding, and hallucinate less, but they can still make things up, too. With DuckDuckGo AI Chat as it stands, the company is left with a chatbot novelty with a decent interface and the promise that your conversations with it will remain private. But what use are fully private AI conversations if they are full of errors?

Enlarge / Mixtral 8x7B on DuckDuckGo AI Chat when asked about the author. Everything in red boxes is sadly incorrect, but it provides an interesting fantasy scenario. It’s a good example of an LLM plausibly filling gaps between concepts that are underrepresented in its training data, called confabulation. For the record, Llama 3 gives a more accurate answer.

Benj Edwards

As DuckDuckGo itself states in its privacy policy, “By its very nature, AI Chat generates text with limited information. As such, Outputs that appear complete or accurate because of their detail or specificity may not be. For example, AI Chat cannot dynamically retrieve information and so Outputs may be outdated. You should not rely on any Output without verifying its contents using other sources, especially for professional advice (like medical, financial, or legal advice).”

So, have fun talking to bots, but tread carefully. They’ll easily “lie” to your face because they don’t understand what they are saying and are tuned to output statistically plausible information, not factual references.

DuckDuckGo offers “anonymous” access to AI chatbots through new service Read More »

Can a technology called RAG keep AI models from making stuff up?

Aurich Lawson | Getty Images

We’ve been living through the generative AI boom for nearly a year and a half now, following the late 2022 release of OpenAI’s ChatGPT. But despite transformative effects on companies’ share prices, generative AI tools powered by large language models (LLMs) still have major drawbacks that have kept them from being as useful as many would like them to be. Retrieval augmented generation, or RAG, aims to fix some of those drawbacks.

Perhaps the most prominent drawback of LLMs is their tendency toward confabulation (also called “hallucination”), a statistical gap-filling behavior that occurs when AI language models are tasked with reproducing knowledge that wasn’t present in their training data. They generate plausible-sounding text that can veer toward accuracy when the training data is solid but otherwise may be completely made up.

Relying on confabulating AI models gets people and companies in trouble, as we’ve covered in the past. In 2023, we saw two instances of lawyers citing legal cases, confabulated by AI, that didn’t exist. We’ve covered claims against OpenAI in which ChatGPT confabulated and accused innocent people of doing terrible things. In February, we wrote about Air Canada’s customer service chatbot inventing a refund policy, and in March, a New York City chatbot was caught confabulating city regulations.

So if generative AI aims to be the technology that propels humanity into the future, someone needs to iron out the confabulation kinks along the way. That’s where RAG comes in. Its proponents hope the technique will help turn generative AI technology into reliable assistants that can supercharge productivity without requiring a human to double-check or second-guess the answers.

“RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process” to help LLMs stick to the facts, according to Noah Giansiracusa, associate professor of mathematics at Bentley University.

Let’s take a closer look at how it works and what its limitations are.

A framework for enhancing AI accuracy

Although RAG is now seen as a technique to help fix issues with generative AI, it actually predates ChatGPT. The term was coined in a 2020 academic paper by researchers at Facebook AI Research (FAIR, now Meta AI Research), University College London, and New York University.

As we’ve mentioned, LLMs struggle with facts. Google’s entry into the generative AI race, Bard, made an embarrassing error on its first public demonstration back in February 2023 about the James Webb Space Telescope. The error wiped around $100 billion off the value of parent company Alphabet. LLMs produce the most statistically likely response based on their training data and don’t understand anything they output, meaning they can present false information that seems accurate if you don’t have expert knowledge on a subject.

LLMs also lack up-to-date knowledge and the ability to identify gaps in their knowledge. “When a human tries to answer a question, they can rely on their memory and come up with a response on the fly, or they could do something like Google it or peruse Wikipedia and then try to piece an answer together from what they find there—still filtering that info through their internal knowledge of the matter,” said Giansiracusa.

But LLMs aren’t humans, of course. Their training data can age quickly, particularly in more time-sensitive queries. In addition, the LLM often can’t distinguish specific sources of its knowledge, as all its training data is blended together into a kind of soup.

In theory, RAG should make keeping AI models up to date far cheaper and easier. “The beauty of RAG is that when new information becomes available, rather than having to retrain the model, all that’s needed is to augment the model’s external knowledge base with the updated information,” said Peterson. “This reduces LLM development time and cost while enhancing the model’s scalability.”
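
To make the retrieval step concrete, here is a deliberately tiny Python sketch: documents are scored against the question, the best match is folded into the prompt, and updating the system’s knowledge means appending a document rather than retraining. Production RAG systems use vector embeddings and a real LLM call; both are simplified away here.

# Minimal retrieval-augmented generation sketch. Real systems use vector
# embeddings and an actual LLM call; here retrieval is naive word overlap,
# and "generation" is just the augmented prompt we would send to a model.

KNOWLEDGE_BASE = [
    "The James Webb Space Telescope launched on December 25, 2021.",
    "Air Canada's chatbot invented a bereavement refund policy.",
]

def retrieve(question: str, docs: list) -> str:
    question_words = set(question.lower().split())
    return max(docs, key=lambda doc: len(question_words & set(doc.lower().split())))

def build_prompt(question: str) -> str:
    context = retrieve(question, KNOWLEDGE_BASE)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Updating the system's knowledge is just appending a document, not retraining.
KNOWLEDGE_BASE.append("LAION took its LAION-5B dataset down in December 2023.")

print(build_prompt("When did the James Webb Space Telescope launch?"))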

Can a technology called RAG keep AI models from making stuff up? Read More »

What kind of bug would make machine learning suddenly 40% worse at NetHack?

Large Moon Models (LMMs) —

One day, a roguelike-playing system just kept biffing it, for celestial reasons.

Moon rendered in ASCII text.

Aurich Lawson

Members of the Legendary Computer Bugs Tribunal, honored guests, if I may have your attention? I would, humbly, submit a new contender for your esteemed judgment. You may or may not find it novel, you may even deign to call it a “bug,” but I assure you, you will find it entertaining.

Consider NetHack. It is one of the all-time roguelike games, and I mean that in the more strict sense of that term. The content is procedurally generated, deaths are permanent, and the only thing you keep from game to game is your skill and knowledge. I do understand that the only thing two roguelike fans can agree on is how wrong the third roguelike fan is in their definition of roguelike, but, please, let us move on.

NetHack is great for machine learning…

Being a difficult game full of consequential choices and random challenges, as well as a “single-agent” game that can be generated and played at lightning speed on modern computers, NetHack is great for those working in machine learning—or imitation learning, actually, as detailed in Jens Tuyls’ paper on how compute scaling affects single-agent game learning. Using Tuyls’ model of expert NetHack behavior, Bartłomiej Cupiał and Maciej Wołczyk trained a neural network to play and improve itself using reinforcement learning.

By mid-May of this year, the two had their model consistently scoring 5,000 points by their own metrics. Then, on one run, the model suddenly got worse, on the order of 40 percent. It scored 3,000 points. Machine learning generally, gradually, goes in one direction with these types of problems. It didn’t make sense.

Cupiał and Wołczyk tried quite a few things: reverting their code, restoring their entire software stack from a Singularity backup, and rolling back their CUDA libraries. The result? 3,000 points. They rebuild everything from scratch, and it’s still 3,000 points.

NetHack, played by a regular human.

… except on certain nights

As detailed in Cupiał’s X (formerly Twitter) thread, this was several hours of confused trial and error by him and Wołczyk. “I am starting to feel like a madman. I can’t even watch a TV show constantly thinking about the bug,” Cupiał wrote. In desperation, he asks model author Tuyls if he knows what could be wrong. He wakes up in Kraków to an answer:

“Oh yes, it’s probably a full moon today.”

In NetHack, the game in which the DevTeam has thought of everything, if the game detects from your system clock that it should be a full moon, it will generate a message: “You are lucky! Full moon tonight.” A full moon imparts a few player benefits: a single point added to Luck, and werecreatures mostly kept to their animal forms.

It’s an easier game, all things considered, so why would the learning agent’s score be lower? It simply doesn’t have data about full moon variables in its training data, so a branching series of decisions likely leads to lesser outcomes, or just confusion. It was indeed a full moon in Kraków when the 3,000-ish scores started showing up. What a terrible night to have a learning model.
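
NetHack derives the phase from the system clock alone. As a rough Python illustration of the idea (a standard synodic-month approximation, not NetHack’s exact calculation), a few lines are enough to decide whether tonight counts as a full moon.

import datetime

# Rough moon-phase approximation from the date alone (synodic month ≈ 29.53
# days). This is a generic approximation, not NetHack's exact code.

SYNODIC_MONTH = 29.530588853
REFERENCE_NEW_MOON = datetime.datetime(2000, 1, 6, 18, 14)

def moon_phase_fraction(when: datetime.datetime) -> float:
    """Return 0.0 at a new moon and roughly 0.5 at a full moon."""
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days % SYNODIC_MONTH) / SYNODIC_MONTH

def is_roughly_full_moon(when: datetime.datetime) -> bool:
    return abs(moon_phase_fraction(when) - 0.5) < 0.034  # within about a day

# The game's behavior (and thus the agent's environment) shifts with the clock.
print(is_roughly_full_moon(datetime.datetime.now()))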

Of course, “score” is not a real metric for success in NetHack, as Cupiał himself noted. Ask a model to get the best score, and it will farm the heck out of low-level monsters because it never gets bored. “Finding items required for [ascension] or even [just] doing a quest is too much for pure RL agent,” Cupiał wrote. Another neural network, AutoAscend, does a better job of progressing through the game, but “even it can only solve sokoban and reach mines end,” Cupiał notes.

Is it a bug?

I submit to you that, although NetHack responded to the full moon in its intended way, this quirky, very hard-to-fathom stop on a machine-learning journey was indeed a bug and a worthy one in the pantheon. It’s not a Harvard moth, nor a 500-mile email, but what is?

Because the team used Singularity to back up and restore their stack, they inadvertently carried forward the machine time and resulting bug each time they tried to solve it. The machine’s resulting behavior was so bizarre, and seemingly based on unseen forces, that it drove a coder into fits. And the story has a beginning, a climactic middle, and a denouement that teaches us something, however obscure.

The NetHack Lunar Learning Bug is, I submit, quite worth memorializing. Thank you for your time.

What kind of bug would make machine learning suddenly 40% worse at NetHack? Read More »

Google’s AI Overviews misunderstand why people use Google

robot hand holding glue bottle over a pizza and tomatoes

Aurich Lawson | Getty Images

Last month, we looked into some of the most incorrect, dangerous, and downright weird answers generated by Google’s new AI Overviews feature. Since then, Google has offered a partial apology/explanation for generating those kinds of results and has reportedly rolled back the feature’s rollout for at least some types of queries.

But the more I’ve thought about that rollout, the more I’ve begun to question the wisdom of Google’s AI-powered search results in the first place. Even when the system doesn’t give obviously wrong results, condensing search results into a neat, compact, AI-generated summary seems like a fundamental misunderstanding of how people use Google in the first place.

Reliability and relevance

When people type a question into the Google search bar, they only sometimes want the kind of basic reference information that can be found on a Wikipedia page or corporate website (or even a Google information snippet). Often, they’re looking for subjective information where there is no one “right” answer: “What are the best Mexican restaurants in Santa Fe?” or “What should I do with my kids on a rainy day?” or “How can I prevent cheese from sliding off my pizza?”

The value of Google has always been in pointing you to the places it thinks are likely to have good answers to those questions. But it’s still up to you, as a user, to figure out which of those sources is the most reliable and relevant to what you need at that moment.

  • This wasn’t funny when the guys at Pep Boys said it, either. (via)

    Kyle Orland / Google

  • Weird Al recommends “running with scissors” as well! (via)

    Kyle Orland / Google

  • This list of steps actually comes from a forum thread response about doing something completely different. (via)

    Kyle Orland / Google

  • An island that’s part of the mainland? (via)

    Kyle Orland / Google

  • If everything’s cheaper now, why does everything seem so expensive?

    Kyle Orland / Google

  • Pretty sure this Truman was never president… (via)

    Kyle Orland / Google

For reliability, any savvy Internet user makes use of countless context clues when judging a random Internet search result. Do you recognize the outlet or the author? Is the information from someone with seeming expertise/professional experience or a random forum poster? Is the site well-designed? Has it been around for a while? Does it cite other sources that you trust, etc.?

But Google also doesn’t know ahead of time which specific result will fit the kind of information you’re looking for. When it comes to restaurants in Santa Fe, for instance, are you in the mood for an authoritative list from a respected newspaper critic or for more off-the-wall suggestions from random locals? Or maybe you scroll down a bit and stumble on a loosely related story about the history of Mexican culinary influences in the city.

One of the unseen strengths of Google’s search algorithm is that the user gets to decide which results are the best for them. As long as there’s something reliable and relevant in those first few pages of results, it doesn’t matter if the other links are “wrong” for that particular search or user.

Google’s AI Overviews misunderstand why people use Google Read More »

Windows Recall demands an extraordinary level of trust that Microsoft hasn’t earned

Enlarge / The Recall feature as it currently exists in Windows 11 24H2 preview builds.

Andrew Cunningham

Microsoft’s Windows 11 Copilot+ PCs come with quite a few new AI and machine learning-driven features, but the tentpole is Recall. Described by Microsoft as a comprehensive record of everything you do on your PC, the feature is pitched as a way to help users remember where they’ve been and to provide Windows extra contextual information that can help it better understand requests from and meet the needs of individual users.

This, as many users in infosec communities on social media immediately pointed out, sounds like a potential security nightmare. That’s doubly true because Microsoft says that by default, Recall’s screenshots take no pains to redact sensitive information, from usernames and passwords to health care information to NSFW site visits. By default, on a PC with 256GB of storage, Recall can store a couple dozen gigabytes of data across three months of PC usage, a huge amount of personal data.

The line between “potential security nightmare” and “actual security nightmare” is at least partly about the implementation, and Microsoft has been saying things that are at least superficially reassuring. Copilot+ PCs are required to have a fast neural processing unit (NPU) so that processing can be performed locally rather than sending data to the cloud; local snapshots are protected at rest by Windows’ disk encryption technologies, which are generally on by default if you’ve signed into a Microsoft account; neither Microsoft nor other users on the PC are supposed to be able to access any particular user’s Recall snapshots; and users can choose to exclude apps or (in most browsers) individual websites to exclude from Recall’s snapshots.

This all sounds good in theory, but some users are beginning to use Recall now that the Windows 11 24H2 update is available in preview form, and the actual implementation has serious problems.

“Fundamentally breaks the promise of security in Windows”

Enlarge / This is Recall, as seen on a PC running a preview build of Windows 11 24H2. It takes and saves periodic screenshots, which can then be searched for and viewed in various ways.

Andrew Cunningham

Security researcher Kevin Beaumont, first in a thread on Mastodon and later in a more detailed blog post, has written about some of the potential implementation issues after enabling Recall on an unsupported system (which is currently the only way to try Recall since Copilot+ PCs that officially support the feature won’t ship until later this month). We’ve also given this early version of Recall a try on a Windows Dev Kit 2023, which we’ve used for all our recent Windows-on-Arm testing, and we’ve independently verified Beaumont’s claims about how easy it is to find and view raw Recall data once you have access to a user’s PC.

To test Recall yourself, developer and Windows enthusiast Albacore has published a tool called AmperageKit that will enable it on Arm-based Windows PCs running Windows 11 24H2 build 26100.712 (the build currently available in the Windows Insider Release Preview channel). Other Windows 11 24H2 versions are missing the underlying code necessary to enable Recall.

  • Windows uses OCR on all the text in all the screenshots it takes. That text is also saved to an SQLite database to facilitate faster searches.

    Andrew Cunningham

  • Searching for “iCloud,” for example, brings up every single screenshot with the word “iCloud” in it, including the app itself and its entry in the Microsoft Store. If I had visited websites that mentioned it, they would show up here, too.

    Andrew Cunningham

The short version is this: In its current form, Recall takes screenshots and uses OCR to grab the information on your screen; it then writes the contents of windows plus records of different user interactions in a locally stored SQLite database to track your activity. Data is stored on a per-app basis, presumably to make it easier for Microsoft’s app-exclusion feature to work. Beaumont says “several days” of data amounted to a database around 90KB in size. In our usage, screenshots taken by Recall on a PC with a 2560×1440 screen come in at 500KB or 600KB apiece (Recall saves screenshots at your PC’s native resolution, minus the taskbar area).

Recall works locally thanks to Azure AI code that runs on your device, and it works without Internet connectivity and without a Microsoft account. Data is encrypted at rest, sort of, at least insofar as your entire drive is generally encrypted when your PC is either signed into a Microsoft account or has Bitlocker turned on. But in its current form, Beaumont says Recall has “gaps you can drive a plane through” that make it trivially easy to grab and scan through a user’s Recall database if you either (1) have local access to the machine and can log into any account (not just the account of the user whose database you’re trying to see), or (2) are using a PC infected with some kind of info-stealer virus that can quickly transfer the SQLite database to another system.
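
Beaumont’s point about how readable that data is comes down to the fact that it sits in an ordinary SQLite file. As a hedged illustration (the path below is a placeholder, and the real schema may differ), standard tooling is all it takes to start enumerating what’s stored.

import sqlite3
from pathlib import Path

# Illustrative only: the path here is a placeholder, not the documented Recall
# layout. The point is that a locally readable SQLite file can be enumerated
# and searched with entirely standard tooling.

db_path = Path("path_to_recall_data") / "recall.db"  # placeholder location

if db_path.exists():
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    # From here, ordinary SELECT queries can pull any stored text.
    print("tables found:", tables)
    conn.close()
else:
    print("No database found at", db_path)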

Windows Recall demands an extraordinary level of trust that Microsoft hasn’t earned Read More »

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease

Swing beat —

“I’m not sure yet whether I’m going to regret this,” says CEO Jensen Huang at Computex 2024.

Enlarge / Nvidia’s CEO Jensen Huang delivers his keynote speech ahead of Computex 2024 in Taipei on June 2, 2024.

On Sunday, Nvidia CEO Jensen Huang reached beyond Blackwell and revealed the company’s next-generation AI-accelerating GPU platform during his keynote at Computex 2024 in Taiwan. Huang also detailed plans for an annual tick-tock-style upgrade cycle of its AI acceleration platforms, mentioning an upcoming Blackwell Ultra chip slated for 2025 and a subsequent platform called “Rubin” set for 2026.

Nvidia’s data center GPUs currently power a large majority of cloud-based AI models, such as ChatGPT, in both development (training) and deployment (inference) phases, and investors are keeping a close watch on the company, with expectations to keep that run going.

During the keynote, Huang seemed somewhat hesitant to make the Rubin announcement, perhaps wary of invoking the so-called Osborne effect, whereby a company’s premature announcement of the next iteration of a tech product eats into the current iteration’s sales. “This is the very first time that this next click has been made,” Huang said, holding up his presentation remote just before the Rubin announcement. “And I’m not sure yet whether I’m going to regret this or not.”

Nvidia Keynote at Computex 2023.

The Rubin AI platform, expected in 2026, will use HBM4 (a new form of high-bandwidth memory) and NVLink 6 Switch, operating at 3,600GBps. Following that launch, Nvidia will release a tick-tock iteration called “Rubin Ultra.” While Huang did not provide extensive specifications for the upcoming products, he promised cost and energy savings related to the new chipsets.

During the keynote, Huang also introduced a new ARM-based CPU called “Vera,” which will be featured on a new accelerator board called “Vera Rubin,” alongside one of the Rubin GPUs.

Much like Nvidia’s Grace Hopper architecture, which combines a “Grace” CPU and a “Hopper” GPU to pay tribute to the pioneering computer scientist of the same name, Vera Rubin refers to Vera Florence Cooper Rubin (1928–2016), an American astronomer who made discoveries in the field of deep space astronomy. She is best known for her pioneering work on galaxy rotation rates, which provided strong evidence for the existence of dark matter.

A calculated risk

Enlarge / Nvidia CEO Jensen Huang reveals the “Rubin” AI platform for the first time during his keynote at Computex 2024 on June 2, 2024.

Nvidia’s reveal of Rubin is not a surprise in the sense that most big tech companies are continuously working on follow-up products well in advance of release, but it’s notable because it comes just three months after the company revealed Blackwell, which is barely out of the gate and not yet widely shipping.

At the moment, the company seems to be comfortable leapfrogging itself with new announcements and catching up later; Nvidia just announced that its GH200 Grace Hopper “Superchip,” unveiled one year ago at Computex 2023, is now in full production.

With Nvidia stock rising and the company possessing an estimated 70–95 percent of the data center GPU market share, the Rubin reveal is a calculated risk that seems to come from a place of confidence. That confidence could turn out to be misplaced if a so-called “AI bubble” pops or if Nvidia misjudges the capabilities of its competitors. The announcement may also stem from pressure to continue Nvidia’s astronomical growth in market cap with nonstop promises of improving technology.

Accordingly, Huang has been eager to showcase the company’s plans to continue pushing silicon fabrication tech to its limits and widely broadcast that Nvidia plans to keep releasing new AI chips at a steady cadence.

“Our company has a one-year rhythm. Our basic philosophy is very simple: build the entire data center scale, disaggregate and sell to you parts on a one-year rhythm, and we push everything to technology limits,” Huang said during Sunday’s Computex keynote.

Despite Nvidia’s recent market performance, the company’s run may not continue indefinitely. With ample money pouring into the data center AI space, Nvidia isn’t alone in developing accelerator chips. Competitors like AMD (with the Instinct series) and Intel (with Gaudi 3) also want to win a slice of the data center GPU market away from Nvidia’s current command of the AI-accelerator space. And OpenAI’s Sam Altman is trying to encourage diversified production of GPU hardware that will power the company’s next generation of AI models in the years ahead.

Nvidia jumps ahead of itself and reveals next-gen “Rubin” AI chips in keynote tease Read More »