llm

AI isn’t ready to replace human coders for debugging, researchers say

AI, coding, debugging, llm, microsoft, Programming, software development / Kris Guyer / April 12, 2025

A graph showing agents with tools nearly doubling the success rates of those without, but still achieving a success score under 50 percent — Agents using debugging tools drastically outperformed those that didn’t, but their success rate still wasn’t high enough. Credit: Microsoft Research

This approach is much more successful than relying on the models as they’re usually used, but when your best case is a 48.4 percent success rate, you’re not ready for primetime. The limitations are likely because the models don’t fully understand how to best use the tools, and because their current training data is not tailored to this use case.

“We believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in the current LLM training corpus,” the blog post says. “However, the significant performance improvement… validates that this is a promising research direction.”

This initial report is just the start of the efforts, the post claims. The next step is to “fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs.” If the model is large, the best move to save inference costs may be to “build a smaller info-seeking model that can provide relevant information to the larger one.”

This isn’t the first time we’ve seen outcomes that suggest some of the ambitious ideas about AI agents directly replacing developers are pretty far from reality. There have been numerous studies already showing that even though an AI tool can sometimes create an application that seems acceptable to the user for a narrow task, the models tend to produce code laden with bugs and security vulnerabilities, and they aren’t generally capable of fixing those problems.

This is an early step on the path to AI coding agents, but most researchers agree it remains likely that the best outcome is an agent that saves a human developer a substantial amount of time, not one that can do everything they can do.

ChatGPT can now remember and reference all your previous chats

AI, chat bot, chatgpt, llm, openai / Rejus Almole / April 11, 2025

Unlike the older saved memories feature, the information saved via the chat history memory feature is not accessible or tweakable. It’s either on or it’s not.

The new approach to memory is rolling out first to ChatGPT Plus and Pro users, starting today—though it looks like it’s a gradual deployment over the next few weeks. Some countries and regions (the UK, European Union, Iceland, Liechtenstein, Norway, and Switzerland) are not included in the rollout.

OpenAI says these new features will reach Enterprise, Team, and Edu users at a later, as-yet-unannounced date. The company hasn’t mentioned any plans to bring them to free users. When you gain access to this, you’ll see a pop-up that says “Introducing new, improved memory.”

A menu showing two memory toggle buttons — The new ChatGPT memory options. Credit: Benj Edwards

Some people will welcome this memory expansion, as it can significantly improve ChatGPT’s usefulness if you’re seeking answers tailored to your specific situation, personality, and preferences.

Others will likely be highly skeptical of a black box of chat history memory that can’t be tweaked or customized for privacy reasons. It’s important to note that even before the new memory feature, logs of conversations with ChatGPT may be saved and stored on OpenAI servers. It’s just that the chatbot didn’t fully incorporate their contents into its responses until now.

As with the old memory feature, you can click a checkbox to disable this completely, and it won’t be used for conversations with the Temporary Chat flag.

You knew it was coming: Google begins testing AI-only search results

ai mode, Google, google ai overviews, llm, Tech / Tim Belzer / March 6, 2025

Google has become so integral to online navigation that its name became a verb, meaning “to find things on the Internet.” Soon, Google might just tell you what’s on the Internet instead of showing you. The company has announced an expansion of its AI search features, powered by Gemini 2.0. Everyone will soon see more AI Overviews at the top of the results page, but Google is also testing a more substantial change in the form of AI Mode. This version of Google won’t show you the 10 blue links at all—Gemini completely takes over the results in AI Mode.

This marks the debut of Gemini 2.0 in Google search. Google announced the first Gemini 2.0 models in December 2024, beginning with the streamlined Gemini 2.0 Flash. The heavier versions of Gemini 2.0 are still in testing, but Google says it has tuned AI Overviews with this model to offer help with harder questions in the areas of math, coding, and multimodal queries.

With this update, you will begin seeing AI Overviews on more results pages, and minors with Google accounts will see AI results for the first time. In fact, even logged-out users will see AI Overviews soon. This is a big change, but it’s only the start of Google’s plans for AI search.

Gemini 2.0 also powers the new AI Mode for search. It’s launching as an opt-in feature via Google’s Search Labs, offering a totally new alternative to search as we know it. This custom version of the Gemini large language model (LLM) skips the standard web links that have been part of every Google search thus far. The model uses “advanced reasoning, thinking, and multimodal capabilities” to build a response to your search, which can include web summaries, Knowledge Graph content, and shopping data. It’s essentially a bigger, more complex AI Overview.

As Google has previously pointed out, many searches are questions rather than a string of keywords. For those kinds of queries, an AI response could theoretically provide an answer more quickly than a list of 10 blue links. However, that relies on the AI response being useful and accurate, something that often still eludes generative AI systems like Gemini.

AI firms follow DeepSeek’s lead, create cheaper models with “distillation”

AI, DeepSeek, Distillation, llm, Meta, openai, syndication / Tim Belzer / March 4, 2025

Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI’s platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI’s largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion into the company.

However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI’s models to train its competitor, a move that would be against its terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add that distilled models are more limited.

“Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability,” said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, “but it really would not be good at anything else.”

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones.

“Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it,” he added.

That presents a challenge to many of the business models of leading AI firms. Even if developers use distilled models from companies like OpenAI, they cost far less to run, are less expensive to create, and, therefore, generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models as they require less computational load.

Privacy-problematic DeepSeek pulled from app stores in South Korea

AI, bytedance, china, DeepSeek, deepthink, infosec, llm, privacy / Kris Guyer / February 17, 2025

In a media briefing held Monday, the South Korean Personal Information Protection Commission indicated that it had paused new downloads within the country of Chinese AI startup DeepSeek’s mobile app. The restriction took effect on Saturday and doesn’t affect South Korean users who already have the app installed on their devices. The DeepSeek service also remains accessible in South Korea via the web.

Per Reuters, PIPC explained that representatives from DeepSeek acknowledged the company had “partially neglected” some of its obligations under South Korea’s data protection laws, which provide South Koreans some of the strictest privacy protections globally.

PIPC investigation division director Nam Seok is quoted by the Associated Press as saying DeepSeek “lacked transparency about third-party data transfers and potentially collected excessive personal information.” DeepSeek reportedly has dispatched a representative to South Korea to work through any issues and bring the app into compliance.

It’s unclear how long the app will remain unavailable in South Korea, with PIPC saying only that the privacy issues it identified with the app might take “a considerable amount of time” to resolve.

Western infosec sources have also expressed dissatisfaction with aspects of DeepSeek’s security. Mobile security company NowSecure reported two weeks ago that the app sends information unencrypted to servers located in China and controlled by TikTok owner ByteDance; the week before that, another security company found an open, web-accessible database filled with DeepSeek customer chat history and other sensitive data.

Ars attempted to ask DeepSeek’s DeepThink (R1) model about the Tiananmen Square massacre or its favorite “Winnie the Pooh” movie, but the LLM continued to have no comment.

Amid a flurry of hype, Microsoft reorganizes entire dev team around AI

agentic AI, AI, copilot, deep learning, GitHub, llm, machine learning, microsoft, Satya Nadella, software development, Tech, visual studio / Rejus Almole / January 14, 2025

Microsoft CEO Satya Nadella has announced a dramatic restructuring of the company’s engineering organization, which is pivoting the company’s focus to developing the tools that will underpin agentic AI.

Dubbed “CoreAI – Platform and Tools,” the new division rolls the existing AI platform team and the previous developer division (responsible for everything from .NET to Visual Studio) along with some other teams into one big group.

As for what this group will be doing specifically, it’s basically everything that’s mission-critical to Microsoft in 2025, as Nadella tells it:

This new division will bring together Dev Div, AI Platform, and some key teams from the Office of the CTO (AI Supercomputer, AI Agentic Runtimes, and Engineering Thrive), with the mission to build the end-to-end Copilot & AI stack for both our first-party and third-party customers to build and run AI apps and agents. This group will also build out GitHub Copilot, thus having a tight feedback loop between the leading AI-first product and the AI platform to motivate the stack and its roadmap.

To accomplish all that, “Jay Parikh will lead this group as EVP.” Parikh was hired by Microsoft in October; he previously worked as the VP and global head of engineering at Meta.

The fact that the blog post doesn’t say anything about .NET or Visual Studio, instead emphasizing GitHub Copilot and anything and everything related to agentic AI, says a lot about how Nadella sees Microsoft’s future priorities.

So-called AI agents are applications that are given specified boundaries (action spaces) and a large memory capacity to independently do subsets of the kinds of work that human office workers do today. Some company leaders and AI commentators believe these agents will outright replace jobs, while others are more conservative, suggesting they’ll simply be powerful tools to streamline the jobs people already have.

How I program with LLMs

AI, best practices, Development, Features, llm, Programming, Tech / Tim Belzer / January 8, 2025

The second issue is we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

    // referenceQuartiles calculates the exact quartiles for a slice of float64 values
    // using linear interpolation, matching the behavior expected from the sampler.
    func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

    // compareQuartiles checks if two sets of quartiles are within epsilon of each other.
    // Returns true if they match within the tolerance, false otherwise.
    func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

    // checkQuartiles is a test helper that compares sampler output against the reference
    // implementation and reports any differences.
    func checkQuartiles(t *testing.T, data []float64, epsilon float64) {
        t.Helper()

        // Get reference values
        wantQ1, wantMed, wantQ3 := referenceQuartiles(data)

        // Get sampler values using a large reservoir for accuracy
        qs := NewQuartileSampler(1000)
        for _, v := range data {
            qs.Add(v)
        }
        gotQ1, gotMed, gotQ3 := qs.Quartiles()

        if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {
            t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",
                gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)
        }
    }
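The bodies of the reference functions are elided above. As a rough illustration of what "the simplest, most readable version" of an exact quartile calculation can look like, here is a minimal sketch. The names exactQuartiles, quantile, and withinEpsilon are hypothetical, and the interpolation rule (linear interpolation at position p*(n-1) in the sorted data) is an assumption; the code the LLM actually generated may use a different convention.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactQuartiles is a hypothetical stand-in for a reference implementation.
// It sorts a copy of the input and reads off the 25th, 50th, and 75th
// percentiles. Empty input yields zeros (an arbitrary choice for this sketch).
func exactQuartiles(data []float64) (q1, median, q3 float64) {
	if len(data) == 0 {
		return 0, 0, 0
	}
	sorted := make([]float64, len(data))
	copy(sorted, data)
	sort.Float64s(sorted)
	return quantile(sorted, 0.25), quantile(sorted, 0.5), quantile(sorted, 0.75)
}

// quantile assumes sorted is non-empty and ascending. It interpolates
// linearly between the two order statistics around position p*(n-1).
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo] + frac*(sorted[hi]-sorted[lo])
}

// withinEpsilon reports whether a and b differ by at most epsilon.
func withinEpsilon(a, b, epsilon float64) bool {
	return math.Abs(a-b) <= epsilon
}

func main() {
	q1, med, q3 := exactQuartiles([]float64{5, 1, 4, 2, 3})
	fmt.Println(q1, med, q3)
}
```

Keeping the exact version this plain is the point: it is slow and allocates, but it is easy to check by eye, which makes it a reasonable yardstick for a sampler that only approximates the answer.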

The original test from above has been reworked to use checkQuartiles, and we have something new:

    func FuzzQuartileSampler(f *testing.F) {
        // Add some seed corpus
        f.Add([]float64{1, 2, 3, 4, 5})

        f.Fuzz(func(t *testing.T, data []float64) {
            // Use a larger epsilon for fuzzing since we might get more extreme values
            checkQuartiles(t, data, 0.2)
        })
    }

This is fun because it’s wrong. My running gopls tool immediately says:

    fuzzing arguments can only have the following types:
        string, bool, float32, float64,
        int, int8, int16, int32, int64,
        uint, uint8, uint16, uint32, uint64,
        []byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.

A quick survey of the last few weeks of my LLM chat history (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) shows that more than 80 percent of the time there is a tooling error, the LLM can make useful progress without me adding any insight. About half the time, it can completely resolve the issue without me saying anything of note. I am just acting as the messenger.

Apple botched the Apple Intelligence launch, but its long-term strategy is sound

AI, Apple, apple intelligence, generative ai, iOS, iOS 18.1, iOS 18.2, llm, Tech / Kris Guyer / November 9, 2024

I’ve spent a week with Apple Intelligence—here are the takeaways.

Apple Intelligence includes features like Clean Up, which lets you pick from glowing objects it has recognized to remove them from a photo. Credit: Samuel Axon

Ask a few random people about Apple Intelligence and you’ll probably get quite different responses.

One might be excited about the new features. Another could opine that no one asked for this and the company is throwing away its reputation with creatives and artists to chase a fad. Another still might tell you that regardless of the potential value, Apple is simply too late to the game to make a mark.

The release of Apple’s first Apple Intelligence-branded AI tools in iOS 18.1 last week makes all those perspectives understandable.

The first wave of features in Apple’s delayed release shows promise—and some of them may be genuinely useful, especially with further refinement. At the same time, Apple’s approach seems rushed, as if the company is cutting some corners to catch up where some perceive it has fallen behind.

That impatient, unusually undisciplined approach to the rollout could undermine the value proposition of AI tools for many users. Nonetheless, Apple’s strategy might just work out in the long run.

What’s included in “Apple Intelligence”

I’m basing those conclusions on about a week spent with both the public release of iOS 18.1 and the developer beta of iOS 18.2. Between them, the majority of features announced back in June under the “Apple Intelligence” banner are present.

Let’s start with a quick rundown of which Apple Intelligence features are in each release.

iOS 18.1 public release

Writing Tools
- Proofreading
- Rewriting in friendly, professional, or concise voices
- Summaries in prose, key points, bullet point list, or table format
Text summaries
- Summarize text from Mail messages
- Summarize text from Safari pages
Notifications
- Reduce Interruptions – Intelligent filtering of notifications to include only ones deemed critical
Type to Siri
More conversational Siri
Photos
- Clean Up (remove an object or person from the image)
- Generate Memories videos/slideshows from plain language text prompts
- Natural language search

iOS 18.2 developer beta (as of November 5, 2024)

Image Playground – A prompt-based image generation app akin to something like Dall-E or Midjourney but with a limited range of stylistic possibilities, fewer features, and more guardrails
Genmoji – Generate original emoji from a prompt
Image Wand – Similar to Image Playground but simplified within the Notes app
ChatGPT integration in Siri
Visual Intelligence – iPhone 16 and iPhone 16 Pro users can use the new Camera Control button to do a variety of tasks based on what’s in the camera’s view, including translation, information about places, and more
Writing Tools – Expanded with support for prompt-based edits to text

iOS 18.1 is out right now for everybody. iOS 18.2 is scheduled for a public launch sometime in December.

iOS 18.2 will introduce both Visual Intelligence and the ability to chat with ChatGPT via Siri. Credit: Samuel Axon

A staggered rollout

For several years, Apple has released most of its major new software features for, say, the iPhone in one big software update in the fall. That timeline has gotten fuzzier in recent years, but the rollout of Apple Intelligence has moved further from that tradition than we’ve ever seen before.

Apple announced iOS 18 at its developer conference in June, suggesting that most if not all of the Apple Intelligence features would launch in that singular update alongside the new iPhones.

Much of the marketing leading up to and surrounding the iPhone 16 launch focused on Apple Intelligence, but in actuality, the iPhone 16 had none of the features under that label when it launched. The first wave hit with iOS 18.1 last week, over a month after the first consumers started getting their hands on iPhone 16 hardware. And even now, these features are in “beta,” and there has been a wait list.

Many of the most exciting Apple Intelligence features still aren’t here, with some planned for iOS 18.2’s launch in December and a few others coming even later. There will likely be a wait list for some of those, too.

The wait list part makes sense—some of these features put demand on cloud servers, and it’s reasonable to stagger the rollout to sidestep potential launch problems.

The rest doesn’t make as much sense. Between the beta label and the staggered features, it seems like Apple is rushing to satisfy expectations about Apple Intelligence before quality and consistency have fallen into place.

Making AI a harder sell

In some cases, this strategy has led to things feeling half-baked. For example, Writing Tools is available system-wide, but it’s a different experience for first-party apps that work with the new Writing Tools API than third-party apps that don’t. The former lets you approve changes piece by piece, but the latter puts you in a take-it-or-leave-it situation with the whole text. The Writing Tools API is coming in iOS 18.2, maintaining that gap for a couple of months, even for third-party apps whose developers would normally want to be on the ball with this.

Further, iOS 18.2 will allow users to tweak Writing Tools rewrites by specifying what they want in a text prompt, but that’s missing in iOS 18.1. Why launch Writing Tools with features missing and user experience inconsistencies when you could just launch the whole suite in December?

That’s just one example, but there are many similar ones. I think there are a couple of possible explanations:

- Apple is trying to satisfy anxious investors and commentators who believe the company is already way too late to the generative AI sector.
- With the original intent to launch it all in the first iOS 18 release, significant resources were spent on Apple Intelligence-focused advertising and marketing around the iPhone 16 in September—and when unexpected problems developing the software features led to a delay for the software launch, it was too late to change the marketing message. Ultimately, the company’s leadership may feel the pressure to make good on that pitch to users as quickly after the iPhone 16 launch as possible, even if it’s piecemeal.

I’m not sure which it is, but in either case, I don’t believe it was the right play.

So many consumers have their defenses up about AI features already, in part because other companies like Microsoft or Google rushed theirs to market without really thinking things through (or caring, if they had) and also because more and more people are naturally suspicious of whatever is labeled the next great thing in Silicon Valley (remember NFTs?). Apple had an opportunity to set itself apart in consumers’ perceptions about AI, but at least right now, that opportunity has been squandered.

Now, I’m not an AI doubter. I think these features and others can be useful, and I already use similar ones every day. I also commend Apple for allowing users to control whether these AI features are enabled at all, which should make AI skeptics more comfortable.

Notification summaries condense all the notifications from a single app into one or two lines, like with this lengthy Discord conversation here. Results are hit or miss. Credit: Samuel Axon

That said, releasing half-finished bits and pieces of Apple Intelligence doesn’t fit the company’s framing of it as a singular, branded product, and it doesn’t do a lot to handle objections from users who are already assuming AI tools will be nonsense.

There’s so much confusion about AI that it makes sense to let those who are skeptical move at their own pace, and it also makes sense to sell them on the idea with fully baked implementations.

Apple still has a more sensible approach than most

Despite all this, I like the philosophy behind how Apple has thought about implementing its AI tools, even if the rollout has been a mess. It’s fundamentally distinct from what we’re seeing from a company like Microsoft, which seems hell-bent on putting AI chatbots everywhere it can to see which real-world use cases emerge organically.

There is no true, ChatGPT-like LLM chatbot in iOS 18.1. Technically, there’s one in iOS 18.2, but only because you can tell Siri to refer you to ChatGPT on a case-by-case basis.

Instead, Apple has introduced specific generative AI features peppered throughout the operating system meant to explicitly solve narrow user problems. Sure, they’re all built on models that have resemblances to the ones that power Claude or Midjourney, but they’re not built around this idea that you start up a chat dialogue with an LLM or an image generator and it’s up to you to find a way to make it useful for you.

The practical application of most of these features is clear, provided they end up working well (more on that shortly). As a professional writer, it’s easy for me to dismiss Writing Tools as unnecessary—but obviously, not everyone is a professional writer, or even a decent one. For example, I’ve long held that one of the most positive applications of large language models is their ability to let non-native speakers clean up their writing to make it meet native speakers’ standards. In theory, Apple’s Writing Tools can do that.

Apple Intelligence features augment or add additional flexibility or power to existing use cases across the OS, like this new way to generate photo memory movies via text prompt. Credit: Samuel Axon

I have no doubt that Genmoji will be popular—who doesn’t love a bit of fun in group texts with friends? And many months before iOS 18.1, I was already dropping senselessly gargantuan corporate email threads into ChatGPT and asking for quick summaries.

Apple is approaching AI in a user-centric way that stands in stark contrast to almost every other major player rolling out AI tools. Generative AI is an evolution from machine learning, which is something Apple has been using for everything from iPad screen palm rejection to autocorrect for a while now—to great effect, as we discussed in my interview with Apple AI chief John Giannandrea a few years ago. Apple just never wrapped it in a bow and called it AI until now.

But there was no good reason to rush these features out or to even brand them as “Apple Intelligence” and make a fuss about it. They’re natural extensions of what Apple was already doing. Since they’ve been rushed out the door with a spotlight shining on them, Apple’s AI ambitions have a rockier road ahead than the company might have hoped.

It could take a year or two for this all to come together

Using iOS 18.1, it’s clear that Apple’s large language models are not as effective or reliable as Claude or ChatGPT. It takes time to train models like these, and it looks like Apple started late.

Based on my hours spent with both Apple Intelligence and more established tools from cutting-edge AI companies, I feel the other models crossed a usefulness and reliability threshold a year or so ago. When ChatGPT first launched, it was more of a curiosity than a powerful tool. Now it’s a powerful tool, but that’s a relatively recent development.

In my time with Writing Tools and Notification Summaries in particular, Apple’s models subjectively appear to be around where ChatGPT or Claude were 18 months ago. Notification Summaries almost always miss crucial context in my experience. Writing Tools introduce errors where none existed before.

A writing suggestion shows an egregious grammatical error — It’s not hard to spot the huge error that Writing Tools introduced here. This happens all the time when I use it. Credit: Samuel Axon

More mature models do these things, too, but at a much lower frequency. Unfortunately, Apple Intelligence isn’t far enough along to be broadly useful.

That said, I’m excited to see where Apple Intelligence will be in 24 months. I think the company is on the right track by using AI to target specific user needs rather than just putting a chatbot out there and letting people figure it out. It’s a much better approach than what we see with Microsoft’s Copilot. If Apple’s models cross that previously mentioned threshold of utility—and it’s only a matter of time before they do—the future of AI tools on Apple platforms could be great.

It’s just a shame that Apple didn’t seem to have the confidence to ignore the zeitgeisty commentators and roll out these features when they’re complete and ready, with messaging focusing on user problems instead of “hey, we’re taking AI seriously too.”

Most users don’t care if you’re taking AI seriously, but they do care if the tools you introduce can make their day-to-day lives better. I think they can—it will just take some patience. Users can be patient, but can Apple? It seems not.

Even so, there’s a real possibility that these early pains will be forgotten before long.

Samuel Axon is a senior editor at Ars Technica. He covers Apple, software development, gaming, AI, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

X is training Grok AI on your data—here’s how to stop it

AI, chatbot, elon musk, grok, llm, privacy, Tech, Twitter, X, x.AI / Mike M. / July 26, 2024

Grok Your Privacy Options —

Some users were outraged to learn this was opt-out, not opt-in.

Samuel Axon – Jul 26, 2024 6:13 pm UTC

An AI-generated image released by xAI during the open-weights launch of Grok-1.

Elon Musk-led social media platform X is training Grok, its AI chatbot, on users’ data, and that’s opt-out, not opt-in. If you’re an X user, that means Grok is already being trained on your posts if you haven’t explicitly told it not to.

Over the past day or so, users of the platform noticed the checkbox to opt out of this data usage in X’s privacy settings. The discovery was accompanied by outrage that user data was being used this way to begin with.

The social media posts about this sometimes seem to suggest that Grok has only just begun training on X users’ data, but users actually don’t know for sure when it started happening.

Earlier today, X’s Safety account tweeted, “All X users have the ability to control whether their public posts can be used to train Grok, the AI search assistant.” But it didn’t clarify either when the option became available or when the data collection began.

You cannot currently disable it in the mobile apps, but you can on mobile web, and X says the option is coming to the apps soon.

On the privacy settings page, X says:

To continuously improve your experience, we may utilize your X posts as well as your user interactions, inputs, and results with Grok for training and fine-tuning purposes. This also means that your interactions, inputs, and results may also be shared with our service provider xAI for these purposes.

X’s privacy policy has allowed for this since at least September 2023.

It’s increasingly common for user data to be used this way; for example, Meta has done the same with its users’ content, and there was an outcry when Adobe updated its terms of use to allow for this kind of thing. (Adobe quickly backtracked and promised to “never” train generative AI on creators’ content.)

How to opt out

To stop Grok from training on your X content, first go to “Settings and privacy” from the “More” menu in the navigation panel…
Then click or tap “Privacy and safety”…
Then “Grok”…
And finally, uncheck the box.

Credit: Samuel Axon

You can’t opt out within the iOS or Android apps yet, but you can do so in a few quick steps on either mobile or desktop web. To do so:

1. Click or tap “More” in the nav panel
2. Click or tap “Settings and privacy”
3. Click or tap “Privacy and safety”
4. Scroll down and click or tap “Grok” under “Data sharing and personalization”
5. Uncheck the box “Allow your posts as well as your interactions, inputs, and results with Grok to be used for training and fine-tuning,” which is checked by default.

Alternatively, you can follow this link directly to the settings page and uncheck the box with just one more click. If you’d like, you can also delete your conversation history with Grok here, provided you’ve actually used the chatbot before.


Google’s AI Overviews misunderstand why people use Google

AI, Google, llm, search / Mike M. / June 4, 2024

Robot hand holding a glue bottle over a pizza and tomatoes. Credit: Aurich Lawson | Getty Images

Last month, we looked into some of the most incorrect, dangerous, and downright weird answers generated by Google’s new AI Overviews feature. Since then, Google has offered a partial apology/explanation for generating those kinds of results and has reportedly rolled back the feature’s rollout for at least some types of queries.

But the more I’ve thought about that rollout, the more I’ve begun to question the wisdom of Google’s AI-powered search results at all. Even when the system doesn’t give obviously wrong results, condensing search results into a neat, compact, AI-generated summary seems like a fundamental misunderstanding of how people use Google in the first place.

Reliability and relevance

When people type a question into the Google search bar, they only sometimes want the kind of basic reference information that can be found on a Wikipedia page or corporate website (or even a Google information snippet). Often, they’re looking for subjective information where there is no one “right” answer: “What are the best Mexican restaurants in Santa Fe?” or “What should I do with my kids on a rainy day?” or “How can I prevent cheese from sliding off my pizza?”

The value of Google has always been in pointing you to the places it thinks are likely to have good answers to those questions. But it’s still up to you, as a user, to figure out which of those sources is the most reliable and relevant to what you need at that moment.

This wasn’t funny when the guys at Pep Boys said it, either.
Weird Al recommends “running with scissors” as well!
This list of steps actually comes from a forum thread response about doing something completely different.
An island that’s part of the mainland?
If everything’s cheaper now, why does everything seem so expensive?
Pretty sure this Truman was never president…

Credit: Kyle Orland / Google

For reliability, any savvy Internet user makes use of countless context clues when judging a random Internet search result. Do you recognize the outlet or the author? Is the information from someone with seeming expertise/professional experience or a random forum poster? Is the site well-designed? Has it been around for a while? Does it cite other sources that you trust, etc.?

But Google also doesn’t know ahead of time which specific result will fit the kind of information you’re looking for. When it comes to restaurants in Santa Fe, for instance, are you in the mood for an authoritative list from a respected newspaper critic or for more off-the-wall suggestions from random locals? Or maybe you scroll down a bit and stumble on a loosely related story about the history of Mexican culinary influences in the city.

One of the unseen strengths of Google’s search algorithm is that the user gets to decide which results are the best for them. As long as there’s something reliable and relevant in those first few pages of results, it doesn’t matter if the other links are “wrong” for that particular search or user.


Sony Music opts out of AI training for its entire catalog

AI, AI training, llm, music, Sony, syndication / Mike M. / May 17, 2024

Taking a hard line

Music group contacts more than 700 companies to prohibit use of content

Daniel Thomas, Financial Times – May 17, 2024 1:16 pm UTC

The Sony Music letter expressly prohibits artificial intelligence developers from using its music, which includes artists such as Beyoncé.

Kevin Mazur/WireImage for Parkwood via Getty Images

Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music—which includes artists such as Harry Styles, Adele and Beyoncé—and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercializing any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno, and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further.

The letter, which is being sent to tech companies around the world this week, marks an escalation of the music group’s attempts to stop the melodies, lyrics and images from copyrighted songs and artists being used by tech companies to produce new versions or to train systems to create their own music.

The letter says that Sony Music and its artists “recognize the significant potential and advancement of artificial intelligence” but adds that “unauthorized use . . . in the training, development or commercialization of AI systems deprives [Sony] of control over and appropriate compensation.”

It says: “This letter serves to put you on notice directly, and reiterate, that [Sony’s labels] expressly prohibit any use of [their] content.”

Executives at the New York-based group are concerned that their music has already been ripped off, and want to set out a clearly defined legal position that would be the first step to taking action against any developer of AI systems it considers to have exploited its music. They argue that Sony Music would be open to doing deals with AI developers to license the music, but want to reach a fair price for doing so.

The letter says: “Due to the nature of your operations and published information about your AI systems, we have reason to believe that you and/or your affiliates may already have made unauthorized uses [of Sony content] in relation to the training, development or commercialization of AI systems.”

Sony Music has asked developers to provide details of all content used by next week.

The letter also reflects concerns over the fragmented approach to AI regulation around the world. Global regulations over AI vary widely, with some regions moving forward with new rules and legal frameworks to cover the training and use of such systems but others leaving it to creative industries companies to work out relationships with developers.

In many countries around the world, particularly in the EU, copyright owners are advised to state publicly that content is not available for data mining and training for AI.

The letter says the prohibition includes using any bot, spider, scraper or automated program, tool, algorithm, code, process or methodology, as well as any “automated analytical techniques aimed at analyzing text and data in digital form to generate information, including patterns, trends, and correlations.”


Apple may hire Google to power new iPhone AI features using Gemini—report

AI, Apple, Biz & IT, chatgpt, cloud AI, Gemini, Google, Google Gemini, GPT-4, image synthesis, iOS, large language models, llm, machine learning, openai, Siri, text synthesis / Kris Guyer / March 18, 2024

Bake a cake as fast as you can

With Apple’s own AI tech lagging behind, the firm looks for a fallback solution.

Benj Edwards – Mar 18, 2024 7:56 pm UTC

On Monday, Bloomberg reported that Apple is in talks to license Google’s Gemini model to power AI features like Siri in a future iPhone software update coming later in 2024, according to people familiar with the situation. Apple has also reportedly conducted similar talks with ChatGPT maker OpenAI.

The potential integration of Google Gemini into iOS 18 could bring a range of new cloud-based (off-device) AI-powered features to Apple’s smartphone, including image creation or essay writing based on simple prompts. However, the terms and branding of the agreement have not yet been finalized, and the implementation details remain unclear. The companies are unlikely to announce any deal until Apple’s annual Worldwide Developers Conference in June.

Gemini could also bring new capabilities to Apple’s widely criticized voice assistant, Siri, which trails newer AI assistants powered by large language models (LLMs) in understanding and responding to complex questions. Rumors of Apple’s own internal frustration with Siri—and potential remedies—have been kicking around for some time. In January, 9to5Mac revealed that Apple had been conducting tests with a beta version of iOS 17.4 that used OpenAI’s ChatGPT API to power Siri.

As we have previously reported, Apple has also been developing its own AI models, including a large language model codenamed Ajax and a basic chatbot called Apple GPT. However, the company’s LLM technology is said to lag behind that of its competitors, making a partnership with Google or another AI provider a more attractive option.

Google launched Gemini, a language-based AI assistant similar to ChatGPT, in December and has updated it several times since. Many industry experts consider the larger Gemini models to be roughly as capable as OpenAI’s GPT-4 Turbo, which powers the subscription versions of ChatGPT. Until just recently, with the emergence of Gemini Ultra and Claude 3, OpenAI’s top model held a fairly wide lead in perceived LLM capability.

The potential partnership between Apple and Google could significantly impact the AI industry, as Apple’s platform represents more than 2 billion active devices worldwide. If the agreement gets finalized, it would build upon the existing search partnership between the two companies, which has seen Google pay Apple billions of dollars annually to make its search engine the default option on iPhones and other Apple devices.

However, Bloomberg reports that the potential partnership between Apple and Google is likely to draw scrutiny from regulators, as the companies’ current search deal is already the subject of a lawsuit by the US Department of Justice. The European Union is also pressuring Apple to make it easier for consumers to change their default search engine away from Google.

With so much money potentially on the line, selecting Google for Apple’s cloud AI job could be a major loss for OpenAI in terms of bringing its technology widely into the mainstream, with a market representing billions of users. Even so, any deal with Google or OpenAI may be a temporary fix until Apple can get its own LLM-based AI technology up to speed.
