

Amid a flurry of hype, Microsoft reorganizes entire dev team around AI

Microsoft CEO Satya Nadella has announced a dramatic restructuring of the company’s engineering organization, pivoting its focus to developing the tools that will underpin agentic AI.

Dubbed “CoreAI – Platform and Tools,” the new division rolls the existing AI platform team and the previous developer division (responsible for everything from .NET to Visual Studio) along with some other teams into one big group.

As for what this group will be doing specifically, it’s basically everything that’s mission-critical to Microsoft in 2025, as Nadella tells it:

This new division will bring together Dev Div, AI Platform, and some key teams from the Office of the CTO (AI Supercomputer, AI Agentic Runtimes, and Engineering Thrive), with the mission to build the end-to-end Copilot & AI stack for both our first-party and third-party customers to build and run AI apps and agents. This group will also build out GitHub Copilot, thus having a tight feedback loop between the leading AI-first product and the AI platform to motivate the stack and its roadmap.

To accomplish all that, “Jay Parikh will lead this group as EVP.” Parikh was hired by Microsoft in October; he previously worked as the VP and global head of engineering at Meta.

The fact that the blog post doesn’t say anything about .NET or Visual Studio, instead emphasizing GitHub Copilot and anything and everything related to agentic AI, says a lot about how Nadella sees Microsoft’s future priorities.

So-called AI agents are applications that are given specified boundaries (action spaces) and a large memory capacity to independently do subsets of the kinds of work that human office workers do today. Some company leaders and AI commentators believe these agents will outright replace jobs, while others are more conservative, suggesting they’ll simply be powerful tools to streamline the jobs people already have.
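
To make that definition concrete, here is a toy sketch of an agent loop in Go. It is entirely my own illustration, with a caller-supplied chooseAction standing in for the model, and it describes the general concept rather than Microsoft's actual agentic runtimes.

// A toy agent loop: the model may only pick from a fixed action space, and
// everything it observes or does is appended to a memory that is replayed on
// the next step. chooseAction is a caller-supplied stand-in for an LLM call.
type agentAction struct {
    Name string
    Run  func(input string) (string, error)
}

func runAgent(
    task string,
    actions map[string]agentAction, // the agent's allowed "action space"
    chooseAction func(task string, memory []string) (name, input string),
    steps int,
) []string {
    var memory []string
    for i := 0; i < steps; i++ {
        name, input := chooseAction(task, memory)
        act, ok := actions[name]
        if !ok {
            // The boundary in practice: anything outside the action space is refused.
            memory = append(memory, "refused out-of-bounds action: "+name)
            continue
        }
        result, err := act.Run(input)
        if err != nil {
            result = "error: " + err.Error()
        }
        memory = append(memory, name+"("+input+") -> "+result)
    }
    return memory
}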



How I program with LLMs

The second issue is that we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

// referenceQuartiles calculates the exact quartiles for a slice of float64 values
// using linear interpolation, matching the behavior expected from the sampler.
func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

// compareQuartiles checks if two sets of quartiles are within epsilon of each other.
// Returns true if they match within the tolerance, false otherwise.
func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

// checkQuartiles is a test helper that compares sampler output against the reference
// implementation and reports any differences.
func checkQuartiles(t *testing.T, data []float64, epsilon float64) {
    t.Helper()

    // Get reference values
    wantQ1, wantMed, wantQ3 := referenceQuartiles(data)

    // Get sampler values using a large reservoir for accuracy
    qs := NewQuartileSampler(1000)
    for _, v := range data {
        qs.Add(v)
    }
    gotQ1, gotMed, gotQ3 := qs.Quartiles()

    if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {
        t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",
            gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)
    }
}
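
The bodies above are elided in the excerpt. For a sense of the shape such a reference implementation can take, here is one plausible linear-interpolation version (my own sketch, using the sort and math packages, and not the code the LLM actually generated):

// referenceQuartilesSketch is a hedged illustration of a linear-interpolation
// reference implementation; it is not the article's generated code.
func referenceQuartilesSketch(data []float64) (q1, median, q3 float64) {
    if len(data) == 0 {
        return 0, 0, 0
    }
    sorted := append([]float64(nil), data...)
    sort.Float64s(sorted)

    // quantile interpolates linearly between the two nearest ranks.
    quantile := func(p float64) float64 {
        pos := p * float64(len(sorted)-1)
        lo := int(math.Floor(pos))
        hi := int(math.Ceil(pos))
        frac := pos - float64(lo)
        return sorted[lo]*(1-frac) + sorted[hi]*frac
    }
    return quantile(0.25), quantile(0.5), quantile(0.75)
}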

The original test from above has been reworked to use checkQuartiles, and we have something new:

func FuzzQuartileSampler(f *testing.F) {
    // Add some seed corpus
    f.Add([]float64{1, 2, 3, 4, 5})

    f.Fuzz(func(t *testing.T, data []float64) {
        // Use a larger epsilon for fuzzing since we might get more extreme values
        checkQuartiles(t, data, 0.2)
    })
}

This is fun because it’s wrong. My running gopls tool immediately says:

fuzzing arguments can only have the following types:
    string, bool, float32, float64,
    int, int8, int16, int32, int64,
    uint, uint8, uint16, uint32, uint64,
    []byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.
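
For concreteness, a minimal sketch of what that regenerated harness can look like follows. This is my own reconstruction rather than the article's code; it assumes the checkQuartiles helper above and uses the encoding/binary, math, and testing packages.

// A sketch of a []byte-based fuzz harness in the spirit of the LLM's fix:
// decode eight raw bytes at a time into a float64 with math.Float64frombits,
// then reuse checkQuartiles. Illustration only, not the article's actual test.
func FuzzQuartileSamplerBytes(f *testing.F) {
    f.Add([]byte{1, 2, 3, 4, 5, 6, 7, 8})

    f.Fuzz(func(t *testing.T, raw []byte) {
        var data []float64
        for len(raw) >= 8 {
            v := math.Float64frombits(binary.LittleEndian.Uint64(raw[:8]))
            raw = raw[8:]
            // Quartiles aren't meaningful for NaN or infinite inputs.
            if math.IsNaN(v) || math.IsInf(v, 0) {
                continue
            }
            data = append(data, v)
        }
        if len(data) == 0 {
            t.Skip("not enough bytes for a single float64")
        }
        checkQuartiles(t, data, 0.2)
    })
}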

A quick survey of the last few weeks of my LLM chat history (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) suggests that when there is a tooling error, more than 80 percent of the time the LLM can make useful progress without me adding any insight. About half the time, it can completely resolve the issue without me saying anything of note. I am just acting as the messenger.
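
Here is a rough sketch of what that kind of automated tool-feedback loop could look like in Go. The callLLM and apply parameters are placeholders the caller supplies (an LLM client and a step that writes the generated code into the workspace); nothing here comes from the article itself, and it uses the context, errors, and os/exec packages.

// refineWithToolFeedback generates code, runs the toolchain over it, and pastes
// any error output straight back to the model, just like pasting the gopls error.
func refineWithToolFeedback(
    ctx context.Context,
    prompt string,
    rounds int,
    callLLM func(ctx context.Context, prompt string) (string, error),
    apply func(code string) error,
) error {
    code, err := callLLM(ctx, prompt)
    if err != nil {
        return err
    }
    for i := 0; i < rounds; i++ {
        if err := apply(code); err != nil {
            return err
        }
        out, vetErr := exec.CommandContext(ctx, "go", "vet", "./...").CombinedOutput()
        if vetErr == nil {
            return nil // the tools are satisfied; no human input was needed
        }
        // Feed the unedited tool output back to the model.
        code, err = callLLM(ctx, prompt+"\n\nThe tools reported:\n"+string(out))
        if err != nil {
            return err
        }
    }
    return errors.New("tool errors remain after the allotted rounds")
}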



Apple botched the Apple Intelligence launch, but its long-term strategy is sound


I’ve spent a week with Apple Intelligence—here are the takeaways.

Apple Intelligence includes features like Clean Up, which lets you pick from glowing objects it has recognized to remove them from a photo. Credit: Samuel Axon

Ask a few random people about Apple Intelligence and you’ll probably get quite different responses.

One might be excited about the new features. Another could opine that no one asked for this and the company is throwing away its reputation with creatives and artists to chase a fad. Another still might tell you that regardless of the potential value, Apple is simply too late to the game to make a mark.

The release of Apple’s first Apple Intelligence-branded AI tools in iOS 18.1 last week makes all those perspectives understandable.

The first wave of features in Apple’s delayed release shows promise—and some of them may be genuinely useful, especially with further refinement. At the same time, Apple’s approach seems rushed, as if the company is cutting some corners to catch up where some perceive it has fallen behind.

That impatient, unusually undisciplined approach to the rollout could undermine the value proposition of AI tools for many users. Nonetheless, Apple’s strategy might just work out in the long run.

What’s included in “Apple Intelligence”

I’m basing those conclusions on about a week spent with both the public release of iOS 18.1 and the developer beta of iOS 18.2. Between them, the majority of features announced back in June under the “Apple Intelligence” banner are present.

Let’s start with a quick rundown of which Apple Intelligence features are in each release.

iOS 18.1 public release

  • Writing Tools
    • Proofreading
    • Rewriting in friendly, professional, or concise voices
    • Summaries in prose, key points, bullet point list, or table format
  • Text summaries
    • Summarize text from Mail messages
    • Summarize text from Safari pages
  • Notifications
    • Reduce Interruptions – Intelligent filtering of notifications to include only ones deemed critical
  • Type to Siri
  • More conversational Siri
  • Photos
    • Clean Up (remove an object or person from the image)
    • Generate Memories videos/slideshows from plain language text prompts
    • Natural language search

iOS 18.2 developer beta (as of November 5, 2024)

  • Image Playground – A prompt-based image generation app akin to something like Dall-E or Midjourney but with a limited range of stylistic possibilities, fewer features, and more guardrails
  • Genmoji – Generate original emoji from a prompt
  • Image Wand – Similar to Image Playground but simplified within the Notes app
  • ChatGPT integration in Siri
  • Visual Intelligence – iPhone 16 and iPhone 16 Pro users can use the new Camera Control button to do a variety of tasks based on what’s in the camera’s view, including translation, information about places, and more
  • Writing Tools – Expanded with support for prompt-based edits to text

iOS 18.1 is out right now for everybody. iOS 18.2 is scheduled for a public launch sometime in December.

iOS 18.2 will introduce both Visual Intelligence and the ability to chat with ChatGPT via Siri. Credit: Samuel Axon

A staggered rollout

For several years, Apple has released most of its major new software features for, say, the iPhone in one big software update in the fall. That timeline has gotten fuzzier in recent years, but the rollout of Apple Intelligence has moved further from that tradition than we’ve ever seen before.

Apple announced iOS 18 at its developer conference in June, suggesting that most if not all of the Apple Intelligence features would launch in that singular update alongside the new iPhones.

Much of the marketing leading up to and surrounding the iPhone 16 launch focused on Apple Intelligence, but in actuality, the iPhone 16 had none of the features under that label when it launched. The first wave hit with iOS 18.1 last week, over a month after the first consumers started getting their hands on iPhone 16 hardware. And even now, these features are in “beta,” and there has been a wait list.

Many of the most exciting Apple Intelligence features still aren’t here, with some planned for iOS 18.2’s launch in December and a few others coming even later. There will likely be a wait list for some of those, too.

The wait list part makes sense—some of these features put demand on cloud servers, and it’s reasonable to stagger the rollout to sidestep potential launch problems.

The rest doesn’t make as much sense. Between the beta label and the staggered features, it seems like Apple is rushing to satisfy expectations about Apple Intelligence before quality and consistency have fallen into place.

Making AI a harder sell

In some cases, this strategy has led to things feeling half-baked. For example, Writing Tools is available system-wide, but it’s a different experience for first-party apps that work with the new Writing Tools API than third-party apps that don’t. The former lets you approve changes piece by piece, but the latter puts you in a take-it-or-leave-it situation with the whole text. The Writing Tools API is coming in iOS 18.2, maintaining that gap for a couple of months, even for third-party apps whose developers would normally want to be on the ball with this.

Further, iOS 18.2 will allow users to tweak Writing Tools rewrites by specifying what they want in a text prompt, but that’s missing in iOS 18.1. Why launch Writing Tools with features missing and user experience inconsistencies when you could just launch the whole suite in December?

That’s just one example, but there are many similar ones. I think there are a couple of possible explanations:

  • Apple is trying to satisfy anxious investors and commentators who believe the company is already way too late to the generative AI sector.
  • With the original intent to launch it all in the first iOS 18 release, significant resources were spent on Apple Intelligence-focused advertising and marketing around the iPhone 16 in September—and when unexpected problems developing the software features led to a delay for the software launch, it was too late to change the marketing message. Ultimately, the company’s leadership may feel the pressure to make good on that pitch to users as quickly after the iPhone 16 launch as possible, even if it’s piecemeal.

I’m not sure which it is, but in either case, I don’t believe it was the right play.

So many consumers have their defenses up about AI features already, in part because other companies like Microsoft or Google rushed theirs to market without really thinking things through (or caring, if they had) and also because more and more people are naturally suspicious of whatever is labeled the next great thing in Silicon Valley (remember NFTs?). Apple had an opportunity to set itself apart in consumers’ perceptions about AI, but at least right now, that opportunity has been squandered.

Now, I’m not an AI doubter. I think these features and others can be useful, and I already use similar ones every day. I also commend Apple for allowing users to control whether these AI features are enabled at all, which should make AI skeptics more comfortable.

Notification summaries condense all the notifications from a single app into one or two lines, like with this lengthy Discord conversation here. Results are hit or miss. Credit: Samuel Axon

That said, releasing half-finished bits and pieces of Apple Intelligence doesn’t fit the company’s framing of it as a singular, branded product, and it doesn’t do a lot to handle objections from users who are already assuming AI tools will be nonsense.

There’s so much confusion about AI that it makes sense to let those who are skeptical move at their own pace, and it also makes sense to sell them on the idea with fully baked implementations.

Apple still has a more sensible approach than most

Despite all this, I like the philosophy behind how Apple has thought about implementing its AI tools, even if the rollout has been a mess. It’s fundamentally distinct from what we’re seeing from a company like Microsoft, which seems hell-bent on putting AI chatbots everywhere it can to see which real-world use cases emerge organically.

There is no true, ChatGPT-like LLM chatbot in iOS 18.1. Technically, there’s one in iOS 18.2, but only because you can tell Siri to refer you to ChatGPT on a case-by-case basis.

Instead, Apple has introduced specific generative AI features peppered throughout the operating system meant to explicitly solve narrow user problems. Sure, they’re all built on models that resemble the ones that power Claude or Midjourney, but they’re not built around the idea that you start up a chat dialogue with an LLM or an image generator and it’s up to you to find a way to make it useful.

The practical application of most of these features is clear, provided they end up working well (more on that shortly). As a professional writer, it’s easy for me to dismiss Writing Tools as unnecessary—but obviously, not everyone is a professional writer, or even a decent one. For example, I’ve long held that one of the most positive applications of large language models is their ability to let non-native speakers clean up their writing to make it meet native speakers’ standards. In theory, Apple’s Writing Tools can do that.

Apple Intelligence features augment or add additional flexibility or power to existing use cases across the OS, like this new way to generate photo memory movies via text prompt. Credit: Samuel Axon

I have no doubt that Genmoji will be popular—who doesn’t love a bit of fun in group texts with friends? And many months before iOS 18.1, I was already dropping senselessly gargantuan corporate email threads into ChatGPT and asking for quick summaries.

Apple is approaching AI in a user-centric way that stands in stark contrast to almost every other major player rolling out AI tools. Generative AI is an evolution from machine learning, which is something Apple has been using for everything from iPad screen palm rejection to autocorrect for a while now—to great effect, as we discussed in my interview with Apple AI chief John Giannandrea a few years ago. Apple just never wrapped it in a bow and called it AI until now.

But there was no good reason to rush these features out or to even brand them as “Apple Intelligence” and make a fuss about it. They’re natural extensions of what Apple was already doing. Since they’ve been rushed out the door with a spotlight shining on them, Apple’s AI ambitions have a rockier road ahead than the company might have hoped.

It could take a year or two for this all to come together

Using iOS 18.1, it’s clear that Apple’s large language models are not as effective or reliable as Claude or ChatGPT. It takes time to train models like these, and it looks like Apple started late.

Based on my hours spent with both Apple Intelligence and more established tools from cutting-edge AI companies, I feel the other models crossed a usefulness and reliability threshold a year or so ago. When ChatGPT first launched, it was more of a curiosity than a powerful tool. Now it’s a powerful tool, but that’s a relatively recent development.

In my time with Writing Tools and Notification Summaries in particular, Apple’s models subjectively appear to be around where ChatGPT or Claude were 18 months ago. Notification Summaries almost always miss crucial context in my experience. Writing Tools introduce errors where none existed before.

It’s not hard to spot the huge error that Writing Tools introduced here. This happens all the time when I use it. Credit: Samuel Axon

More mature models do these things, too, but at a much lower frequency. Unfortunately, Apple Intelligence isn’t far enough along to be broadly useful.

That said, I’m excited to see where Apple Intelligence will be in 24 months. I think the company is on the right track by using AI to target specific user needs rather than just putting a chatbot out there and letting people figure it out. It’s a much better approach than what we see with Microsoft’s Copilot. If Apple’s models cross that previously mentioned threshold of utility—and it’s only a matter of time before they do—the future of AI tools on Apple platforms could be great.

It’s just a shame that Apple didn’t seem to have the confidence to ignore the zeitgeisty commentators and roll out these features when they’re complete and ready, with messaging focusing on user problems instead of “hey, we’re taking AI seriously too.”

Most users don’t care if you’re taking AI seriously, but they do care if the tools you introduce can make their day-to-day lives better. I think they can—it will just take some patience. Users can be patient, but can Apple? It seems not.

Even so, there’s a real possibility that these early pains will be forgotten before long.


Samuel Axon is a senior editor at Ars Technica. He covers Apple, software development, gaming, AI, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.



X is training Grok AI on your data—here’s how to stop it

Grok Your Privacy Options

Some users were outraged to learn this was opt-out, not opt-in.

An AI-generated image released by xAI during the open-weights launch of Grok-1.

Elon Musk-led social media platform X is training Grok, its AI chatbot, on users’ data, and that’s opt-out, not opt-in. If you’re an X user, that means Grok is already being trained on your posts if you haven’t explicitly told it not to.

Over the past day or so, users of the platform noticed the checkbox to opt out of this data usage in X’s privacy settings. The discovery was accompanied by outrage that user data was being used this way to begin with.

The social media posts about this sometimes seem to suggest that Grok has only just begun training on X users’ data, but users actually don’t know for sure when it started happening.

Earlier today, X’s Safety account tweeted, “All X users have the ability to control whether their public posts can be used to train Grok, the AI search assistant.” But it didn’t clarify either when the option became available or when the data collection began.

You cannot currently disable it in the mobile apps, but you can on mobile web, and X says the option is coming to the apps soon.

On the privacy settings page, X says:

To continuously improve your experience, we may utilize your X posts as well as your user interactions, inputs, and results with Grok for training and fine-tuning purposes. This also means that your interactions, inputs, and results may also be shared with our service provider xAI for these purposes.

X’s privacy policy has allowed for this since at least September 2023.

It’s increasingly common for user data to be used this way; for example, Meta has done the same with its users’ content, and there was an outcry when Adobe updated its terms of use to allow for this kind of thing. (Adobe quickly backtracked and promised to “never” train generative AI on creators’ content.)

How to opt out

  • To stop Grok from training on your X content, first go to “Settings and privacy” from the “More” menu in the navigation panel…
  • Then click or tap “Privacy and safety”…
  • Then “Grok”…
  • And finally, uncheck the box.

You can’t opt out within the iOS or Android apps yet, but you can do so in a few quick steps on either mobile or desktop web. To do so:

  • Click or tap “More” in the nav panel
  • Click or tap “Settings and privacy”
  • Click or tap “Privacy and safety”
  • Scroll down and click or tap “Grok” under “Data sharing and personalization”
  • Uncheck the box “Allow your posts as well as your interactions, inputs, and results with Grok to be used for training and fine-tuning,” which is checked by default.

Alternatively, you can follow this link directly to the settings page and uncheck the box with just one more click. If you’d like, you can also delete your conversation history with Grok here, provided you’ve actually used the chatbot before.



Google’s AI Overviews misunderstand why people use Google

robot hand holding glue bottle over a pizza and tomatoes

Aurich Lawson | Getty Images

Last month, we looked into some of the most incorrect, dangerous, and downright weird answers generated by Google’s new AI Overviews feature. Since then, Google has offered a partial apology/explanation for generating those kinds of results and has reportedly rolled back the feature’s rollout for at least some types of queries.

But the more I’ve thought about that rollout, the more I’ve begun to question the wisdom of Google’s AI-powered search results in the first place. Even when the system doesn’t give obviously wrong results, condensing search results into a neat, compact, AI-generated summary seems like a fundamental misunderstanding of how people use Google in the first place.

Reliability and relevance

When people type a question into the Google search bar, they only sometimes want the kind of basic reference information that can be found on a Wikipedia page or corporate website (or even a Google information snippet). Often, they’re looking for subjective information where there is no one “right” answer: “What are the best Mexican restaurants in Santa Fe?” or “What should I do with my kids on a rainy day?” or “How can I prevent cheese from sliding off my pizza?”

The value of Google has always been in pointing you to the places it thinks are likely to have good answers to those questions. But it’s still up to you, as a user, to figure out which of those sources is the most reliable and relevant to what you need at that moment.

  • This wasn’t funny when the guys at Pep Boys said it, either. (via)
  • Weird Al recommends “running with scissors” as well! (via)
  • This list of steps actually comes from a forum thread response about doing something completely different. (via)
  • An island that’s part of the mainland? (via)
  • If everything’s cheaper now, why does everything seem so expensive?
  • Pretty sure this Truman was never president… (via)

For reliability, any savvy Internet user makes use of countless context clues when judging a random Internet search result. Do you recognize the outlet or the author? Is the information from someone with seeming expertise/professional experience or a random forum poster? Is the site well-designed? Has it been around for a while? Does it cite other sources that you trust, etc.?

But Google also doesn’t know ahead of time which specific result will fit the kind of information you’re looking for. When it comes to restaurants in Santa Fe, for instance, are you in the mood for an authoritative list from a respected newspaper critic or for more off-the-wall suggestions from random locals? Or maybe you scroll down a bit and stumble on a loosely related story about the history of Mexican culinary influences in the city.

One of the unseen strengths of Google’s search algorithm is that the user gets to decide which results are the best for them. As long as there’s something reliable and relevant in those first few pages of results, it doesn’t matter if the other links are “wrong” for that particular search or user.



Sony Music opts out of AI training for its entire catalog

Taking a hard line

Music group contacts more than 700 companies to prohibit use of content

The Sony Music letter expressly prohibits artificial intelligence developers from using its music — which includes artists such as Beyoncé.

Kevin Mazur/WireImage for Parkwood via Getty Images

Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music—which includes artists such as Harry Styles, Adele and Beyoncé—and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercializing any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno, and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further.

The letter, which is being sent to tech companies around the world this week, marks an escalation of the music group’s attempts to stop the melodies, lyrics and images from copyrighted songs and artists being used by tech companies to produce new versions or to train systems to create their own music.

The letter says that Sony Music and its artists “recognize the significant potential and advancement of artificial intelligence” but adds that “unauthorized use . . . in the training, development or commercialization of AI systems deprives [Sony] of control over and appropriate compensation.”

It says: “This letter serves to put you on notice directly, and reiterate, that [Sony’s labels] expressly prohibit any use of [their] content.”

Executives at the New York-based group are concerned that their music has already been ripped off, and want to set out a clearly defined legal position that would be the first step to taking action against any developer of AI systems it considers to have exploited its music. They argue that Sony Music would be open to doing deals with AI developers to license the music, but want to reach a fair price for doing so.

The letter says: “Due to the nature of your operations and published information about your AI systems, we have reason to believe that you and/or your affiliates may already have made unauthorized uses [of Sony content] in relation to the training, development or commercialization of AI systems.”

Sony Music has asked developers to provide details of all content used by next week.

The letter also reflects concerns over the fragmented approach to AI regulation around the world. Global regulations over AI vary widely, with some regions moving forward with new rules and legal frameworks to cover the training and use of such systems but others leaving it to creative industries companies to work out relationships with developers.

In many countries around the world, particularly in the EU, copyright owners are advised to state publicly that content is not available for data mining and training for AI.

The letter says the prohibition includes using any bot, spider, scraper or automated program, tool, algorithm, code, process or methodology, as well as any “automated analytical techniques aimed at analyzing text and data in digital form to generate information, including patterns, trends, and correlations.”

© 2024 The Financial Times Ltd. All rights reserved Not to be redistributed, copied, or modified in any way.



Apple may hire Google to power new iPhone AI features using Gemini—report

Bake a cake as fast as you can

With Apple’s own AI tech lagging behind, the firm looks for a fallback solution.


On Monday, Bloomberg reported that Apple is in talks to license Google’s Gemini model to power AI features like Siri in a future iPhone software update coming later in 2024, according to people familiar with the situation. Apple has also reportedly conducted similar talks with ChatGPT maker OpenAI.

The potential integration of Google Gemini into iOS 18 could bring a range of new cloud-based (off-device) AI-powered features to Apple’s smartphone, including image creation or essay writing based on simple prompts. However, the terms and branding of the agreement have not yet been finalized, and the implementation details remain unclear. The companies are unlikely to announce any deal until Apple’s annual Worldwide Developers Conference in June.

Gemini could also bring new capabilities to Apple’s widely criticized voice assistant, Siri, which trails newer AI assistants powered by large language models (LLMs) in understanding and responding to complex questions. Rumors of Apple’s own internal frustration with Siri—and potential remedies—have been kicking around for some time. In January, 9to5Mac revealed that Apple had been conducting tests with a beta version of iOS 17.4 that used OpenAI’s ChatGPT API to power Siri.

As we have previously reported, Apple has also been developing its own AI models, including a large language model codenamed Ajax and a basic chatbot called Apple GPT. However, the company’s LLM technology is said to lag behind that of its competitors, making a partnership with Google or another AI provider a more attractive option.

Google launched Gemini, a language-based AI assistant similar to ChatGPT, in December and has updated it several times since. Many industry experts consider the larger Gemini models to be roughly as capable as OpenAI’s GPT-4 Turbo, which powers the subscription versions of ChatGPT. Until just recently, with the emergence of Gemini Ultra and Claude 3, OpenAI’s top model held a fairly wide lead in perceived LLM capability.

The potential partnership between Apple and Google could significantly impact the AI industry, as Apple’s platform represents more than 2 billion active devices worldwide. If the agreement gets finalized, it would build upon the existing search partnership between the two companies, which has seen Google pay Apple billions of dollars annually to make its search engine the default option on iPhones and other Apple devices.

However, Bloomberg reports that the potential partnership between Apple and Google is likely to draw scrutiny from regulators, as the companies’ current search deal is already the subject of a lawsuit by the US Department of Justice. The European Union is also pressuring Apple to make it easier for consumers to change their default search engine away from Google.

With so much potential money on the line, selecting Google for Apple’s cloud AI job could potentially be a major loss for OpenAI in terms of bringing its technology widely into the mainstream—with a market representing billions of users. Even so, any deal with Google or OpenAI may be a temporary fix until Apple can get its own LLM-based AI technology up to speed.



Google upstages itself with Gemini 1.5 AI launch, one week after Ultra 1.0

Gemini’s Twin

Google confusingly overshadows its own pro product a week after its last major AI launch.

The Gemini 1.5 logo, released by Google.

Google

One week after its last major AI announcement, Google appears to have upstaged itself. Last Thursday, Google launched Gemini Ultra 1.0, which supposedly represented the best AI language model Google could muster—available as part of the renamed “Gemini” AI assistant (formerly Bard). Today, Google announced Gemini Pro 1.5, which it says “achieves comparable quality to 1.0 Ultra, while using less compute.”

Congratulations, Google, you’ve done it. You’ve undercut your own premier AI product. While Ultra 1.0 is possibly still better than Pro 1.5 (what even are we saying here), Ultra was presented as a key selling point of the “Gemini Advanced” tier of Google’s Google One subscription service. And now it’s looking a lot less advanced than it did seven days ago. All this is on top of the confusing name-shuffling Google has been doing recently. (Just to be clear—although it’s not really clarifying at all—the free version of Bard/Gemini currently uses the Pro 1.0 model. Got it?)

Google claims that Gemini 1.5 represents a new generation of LLMs that “delivers a breakthrough in long-context understanding,” and that it can process up to 1 million tokens, “achieving the longest context window of any large-scale foundation model yet.” Tokens are fragments of a word. The first part of the claim about “understanding” is contentious and subjective, but the second part is probably correct. OpenAI’s GPT-4 Turbo can reportedly handle 128,000 tokens in some circumstances, and 1 million is quite a bit more—about 700,000 words. A larger context window allows for processing longer documents and having longer conversations. (The Gemini 1.0 model family handles 32,000 tokens max.)

But any technical breakthroughs are almost beside the point. What should we make of a company that just trumpeted to the world about its AI supremacy last week, only to partially supersede that a week later? Is it a testament to the rapid rate of AI technical progress in Google’s labs, a sign that red tape was holding back Ultra 1.0 for too long, or merely a sign of poor coordination between research and marketing? We honestly don’t know.

So back to Gemini 1.5. What is it, really, and how will it be available? Google implies that like 1.0 (which had Nano, Pro, and Ultra flavors), it will be available in multiple sizes. Right now, Pro 1.5 is the only model Google is unveiling. Google says that 1.5 uses a new mixture-of-experts (MoE) architecture, which means the system selectively activates different “experts” or specialized sub-models within a larger neural network for specific tasks based on the input data.
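
As a rough illustration of that routing idea, here is a toy top-k gate over generic "experts" in Go (using the sort package). This is a generic sketch of the mixture-of-experts technique, not Google's Gemini implementation, and it assumes the gate scores are computed elsewhere.

// moeForward runs only the topK highest-scoring experts on an input vector and
// sums their weighted outputs. Real MoE layers learn the gate; here the scores
// are passed in, since the point is only to show the selective activation.
type expert func(x []float64) []float64

func moeForward(x []float64, experts []expert, gateScores []float64, topK int) []float64 {
    // Rank experts by gate score, highest first.
    idx := make([]int, len(experts))
    for i := range idx {
        idx[i] = i
    }
    sort.Slice(idx, func(a, b int) bool { return gateScores[idx[a]] > gateScores[idx[b]] })
    if topK > len(idx) {
        topK = len(idx)
    }

    out := make([]float64, len(x))
    for _, i := range idx[:topK] {
        // Only the selected experts do any work; the rest stay idle, which is
        // how a mixture-of-experts model saves compute relative to a dense one.
        y := experts[i](x)
        for j := range out {
            out[j] += gateScores[i] * y[j]
        }
    }
    return out
}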

Google says that Gemini 1.5 can perform “complex reasoning about vast amounts of information,” and gives an example of analyzing a 402-page transcript of Apollo 11’s mission to the Moon. It’s impressive to process documents that large, but the model, like every large language model, is highly likely to confabulate interpretations across large contexts. We wouldn’t trust it to soundly analyze 1 million tokens without mistakes, so that’s putting a lot of faith into poorly understood LLM hands.

For those interested in diving into technical details, Google has released a technical report on Gemini 1.5 that appears to show Gemini performing favorably versus GPT-4 Turbo on various tasks, but it’s also important to note that the selection and interpretation of those benchmarks can be subjective. The report does give some numbers on how much better 1.5 is compared to 1.0, saying it’s 28.9 percent better than 1.0 Pro at “Math, Science & Reasoning” and 5.2 percent better at those subjects than 1.0 Ultra.

A table from the Gemini 1.5 technical document showing comparisons to Gemini 1.0.

Google

But for now, we’re still kind of shocked that Google would launch this particular model at this particular moment in time. Is it trying to get ahead of something that it knows might be just around the corner, like OpenAI’s unreleased GPT-5, for instance? We’ll keep digging and let you know what we find.

Google says that a limited preview of 1.5 Pro is available now for developers via AI Studio and Vertex AI with a 128,000 token context window, scaling up to 1 million tokens later. Gemini 1.5 apparently has not come to the Gemini chatbot (formerly Bard) yet.



This ‘Skyrim VR’ Mod Shows How AI Can Take VR Immersion to the Next Level

ChatGPT isn’t perfect, but the popular AI chatbot’s underlying large language model (LLM) means it can do a lot of things you might not expect, like giving all of Tamriel’s NPC inhabitants the ability to hold natural conversations and answer questions about the iconic fantasy world. Uncanny, yes. But it’s a prescient look at how games might one day use AI to reach new heights in immersion.

YouTuber ‘Art from the Machine’ released a video showing off how they modded the much beloved VR version of The Elder Scrolls V: Skyrim.

The mod, which isn’t available yet, ostensibly lets you hold conversations with NPCs via ChatGPT and xVASynth, an AI tool for generating voice acting lines using voices from video games.

Check out the results in the most recent update below:

The latest version of the project introduces Skyrim scripting for the first time, which the developer says allows for lip syncing of voices and NPC awareness of in-game events. While still a little rigid, it feels like a pretty big step towards climbing out of the uncanny valley.

Here’s how ‘Art from the Machine’ describes the project in a recent Reddit post showcasing their work:

A few weeks ago I posted a video demonstrating a Python script I am working on which lets you talk to NPCs in Skyrim via ChatGPT and xVASynth. Since then I have been working to integrate this Python script with Skyrim’s own modding tools and I have reached a few exciting milestones:

NPCs are now aware of their current location and time of day. This opens up lots of possibilities for ChatGPT to react to the game world dynamically instead of waiting to be given context by the player. As an example, I no longer have issues with shopkeepers trying to barter with me in the Bannered Mare after work hours. NPCs are also aware of the items picked up by the player during conversation. This means that if you loot a chest, harvest an animal pelt, or pick a flower, NPCs will be able to comment on these actions.

NPCs are now lip synced with xVASynth. This is obviously much more natural than the floaty proof-of-concept voices I had before. I have also made some quality of life improvements such as getting response times down to ~15 seconds and adding a spell to start conversations.

When everything is in place, it is an incredibly surreal experience to be able to sit down and talk to these characters in VR. Nothing takes me out of the experience more than hearing the same repeated voice lines, and with this no two responses are ever the same. There is still a lot of work to go, but even in its current state I couldn’t go back to playing without this.
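
The "awareness" described above boils down to injecting current game state into the prompt before each exchange. Here is a minimal sketch of that idea, written in Go rather than the mod's Python and with invented field names, purely as an illustration (it uses the fmt and strings packages).

// buildNPCPrompt assembles the context an LLM would see before an NPC reply:
// who the character is, where and when the conversation is happening, and what
// the player has picked up recently. The data model here is made up for the
// sketch; it is not the mod's actual code.
type gameState struct {
    NPCName    string
    Location   string
    TimeOfDay  string
    RecentLoot []string
}

func buildNPCPrompt(s gameState, playerLine string) string {
    var b strings.Builder
    fmt.Fprintf(&b, "You are %s, an NPC in Skyrim. Stay in character.\n", s.NPCName)
    fmt.Fprintf(&b, "Current location: %s. Time of day: %s.\n", s.Location, s.TimeOfDay)
    if len(s.RecentLoot) > 0 {
        // Letting the model see recent pickups is what enables comments like
        // "I saw you take that pelt."
        fmt.Fprintf(&b, "The player just picked up: %s.\n", strings.Join(s.RecentLoot, ", "))
    }
    fmt.Fprintf(&b, "Player says: %q\n", playerLine)
    return b.String()
}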

You might notice the actual voice prompting the NPCs is fairly robotic too, although ‘Art from the Machine’ says they’re using speech-to-text to talk to the ChatGPT 3.5-driven system. The voice heard in the video is generated from xVASynth, and then plugged in during video editing to replace what they call their “radio-unfriendly voice.”

And when can you download and play for yourself? Well, the developer says publishing their project is still a bit of a sticky issue.

“I haven’t really thought about how to publish this, so I think I’ll have to dig into other ChatGPT projects to see how others have tackled the API key issue. I am hoping that it’s possible to alternatively connect to a locally-run LLM model for anyone who isn’t keen on paying the API fees.”

Serving up more natural NPC responses is also an area that needs to be addressed, the developer says.

For now I have it set up so that NPCs say “let me think” to indicate that I have been heard and the response is in the process of being generated, but you’re right this can be expanded to choose from a few different filler lines instead of repeating the same one every time.

And while the video is noticeably sped up after prompts, this mostly comes down to the voice generation software xVASynth, which admittedly slows the response pipeline down since it’s being run locally. ChatGPT itself doesn’t affect performance, the developer says.

This isn’t the first project we’ve seen using chatbots to enrich user interactions. Lee Vermeulen, a long-time VR pioneer and the developer behind Modbox, released a video in 2021 showing off one of his first tests using OpenAI’s GPT-3 and the voice acting software Replica. In Vermeulen’s video, he talks about how he set parameters for each NPC, giving them the body of knowledge they should have, all of which guides the sort of responses they’ll give.

Check out Vermeulen’s video below, the very same that inspired ‘Art from the Machine’ to start working on the Skyrim VR mod:

As you’d imagine, this is really only the tip of the iceberg for AI-driven NPC interactions. Being able to naturally talk to NPCs, even if a little stuttery and not exactly at human-level, may be preferable over having to wade through a ton of 2D text menus, or go through slow and ungainly tutorials. It also offers up the chance to bond more with your trusty AI companion, like Skyrim’s Lydia or Fallout 4’s Nick Valentine, who instead of offering up canned dialogue might actually, you know, help you out every once in a while.

And that’s really only the surface-level stuff that a mod like the one from ‘Art from the Machine’ might deliver to existing games that aren’t built with AI-driven NPCs. Imagine a game that is actually predicated on your ability to ask the right questions and do your own detective work—well, that’s a role-playing game we’ve never experienced before, either in VR or otherwise.
