

We’re about to fly a spacecraft into the Sun for the first time

’Twas the night before Christmas, when all through the Solar cycle,

Not a sunspot was stirring, not even a burst;

The stockings were all hung by the corona with care,

In hopes that the Parker Solar Probe would soon be there. 

Almost no one ever writes about the Parker Solar Probe anymore.

Sure, the spacecraft got some attention when it launched. It is, after all, the fastest-moving object that humans have ever built. At its maximum speed, goosed by the gravitational pull of the Sun, the probe reaches a velocity of 430,000 miles per hour, or about one-sixteenth of 1 percent of the speed of light. That kind of speed would get you from New York City to Tokyo in less than a minute.
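The arithmetic behind those claims is easy to check; here's a quick back-of-the-envelope in Python (the New York to Tokyo distance is an approximation):

```python
# Back-of-the-envelope check of the speed claims.
C_MPH = 670_616_629        # speed of light in miles per hour
PROBE_MPH = 430_000        # Parker Solar Probe's top speed
NYC_TOKYO_MILES = 6_760    # approximate great-circle distance

print(f"{PROBE_MPH / C_MPH:.4%} of the speed of light")           # ~0.0641%, about 1/16 of 1%
print(f"{NYC_TOKYO_MILES / PROBE_MPH * 3600:.0f} seconds, NYC to Tokyo")  # ~57 seconds
```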

And the Parker Solar Probe also has the distinction of being the first NASA spacecraft named after a living person. At the time of its launch, in August 2018, physicist Eugene Parker was 91 years old.

But in the six years since, while the probe has been zipping through outer space and flying by the Sun? Not so much. Let’s face it, the astrophysical properties of the Sun and its complicated structure are not something that most people think about on a daily basis.

However, the smallish probe—it masses less than a metric ton, and its scientific payload is only about 110 pounds (50 kg)—is about to make its star turn. Quite literally. On Christmas Eve, the Parker Solar Probe will make its closest approach yet to the Sun. It will come within just 3.8 million miles (6.1 million km) of the solar surface, flying into the solar atmosphere for the first time.

Yeah, it’s going to get pretty hot. Scientists estimate that the probe’s heat shield will endure temperatures in excess of 2,500° Fahrenheit (1,371° C) on Christmas Eve, which is pretty much the polar opposite of the North Pole.

Going straight to the source

I spoke with the chief of science at NASA, Nicky Fox, to understand why the probe is being tortured so. Before moving to NASA headquarters, Fox was the project scientist for the Parker Solar Probe, and she explained that scientists really want to understand the origins of the solar wind.



Not to be outdone by OpenAI, Google releases its own “reasoning” AI model

Google DeepMind’s chief scientist, Jeff Dean, says that the model receives extra computing power, writing on X, “we see promising results when we increase inference time computation!” The model works by pausing to consider multiple related prompts before providing what it determines to be the most accurate answer.
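Google hasn’t detailed the mechanism, but one well-known way that extra inference-time computation buys accuracy is self-consistency sampling: generate several candidate answers and keep the most common one. Here is a minimal conceptual sketch (not Google’s actual method), with `ask_model` standing in for any LLM call:

```python
import collections

def best_of_n(ask_model, prompt, n=8):
    """Conceptual sketch of inference-time scaling via self-consistency.

    `ask_model` is a stand-in for any LLM call that returns a final answer.
    More samples means more compute, and often better accuracy on
    math and logic problems.
    """
    answers = [ask_model(prompt) for _ in range(n)]
    # Majority vote over the sampled answers.
    return collections.Counter(answers).most_common(1)[0][0]
```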

Since OpenAI’s jump into the “reasoning” field in September with o1-preview and o1-mini, several companies have been rushing to achieve feature parity with their own models. For example, DeepSeek launched DeepSeek-R1-Lite-Preview in late November, while Alibaba’s Qwen team released its own “reasoning” model, QwQ, shortly after.

While some claim that reasoning models can help solve complex mathematical or academic problems, these models might not be for everybody. While they perform well on some benchmarks, questions remain about their actual usefulness and accuracy. Also, the high computing costs needed to run reasoning models have created some rumblings about their long-term viability. That high cost is why OpenAI’s ChatGPT Pro costs $200 a month, for example.

Still, it appears Google is serious about pursuing this particular AI technique. Logan Kilpatrick, a product lead for Google’s AI Studio, called it “the first step in our reasoning journey” in a post on X.



Here’s what we learned driving Audi’s new Q6 and SQ6 electric SUVs

HEALDSBURG, Calif.—Earlier this summer, Ars got its first drive of Audi’s new Q6 e-tron on some very wet roads in Spain. Then, we were driving pre-production Q6s in Euro-spec. Now, the electric SUV is on sale in the US, with more power in the base model and six months more refinement for its software. But the venue change did not bring a change of weather—heavy rain was the order of the day, making me wonder whether Audi is building its new electric vehicle on the site of an ancient rain god’s temple.

Alone among its rivals, Audi appears to have settled into a nomenclature for its vehicles that at least makes a little sense: odd numbers are for internal combustion engines, even numbers for EVs, although it also appends “e-tron” to the end to make that entirely clear… and to give francophones something to snicker about. (Yes, the e-tron GT does not fit into this schema, but nobody’s perfect.)

The Q6 e-tron is also the most advanced EV to wear Audi’s four rings. Built on a new architecture called PPE (premium platform electric), at its heart is an 800 V powertrain with a 100 kWh (94.4 kWh usable) lithium-ion battery pack that powers a permanently excited synchronous motor driving the rear wheels and, in the case of the quattro versions, an asynchronous motor driving the front wheels. The electric motors consume 30 percent less energy than those used in the Q8 e-tron, and they’re smaller and lighter.

That makes it a lot more up to date than the Q8 e-tron, which uses a modified version of Audi’s venerable MLB Evo platform, or the smaller Q4 e-tron, a somewhat disappointing electric crossover that’s essentially a Volkswagen ID.4 with a glow-up. The same goes for the Q6 e-tron’s electronics, which are a generation newer than the Q4 e-tron’s, and more capable.

Audi is starting off US Q6 e-tron sales with a pair of models, the $65,800 Q6 e-tron quattro and the $72,900 SQ6 e-tron quattro. A $63,800 single-motor (not-quattro) Q6 e-tron will be available in time, with 302 hp (225 kW) and an EPA range of 321 miles (517 km), but we’ll have to wait a while before we get behind the wheel of that one.



$2 per megabyte: AT&T mistakenly charged customer $6,223 for 3.1GB of data

An AT&T customer who switched to the company’s FirstNet service for first responders got quite the shock when his bill came in at $6,223.60, instead of the roughly $260 that his four-line plan previously cost each month.

The Texas man described his experience in a now-deleted Reddit post three days ago, saying he hadn’t been able to get the obviously incorrect bill reversed despite calling AT&T and going to an AT&T store in Dallas. The case drew plenty of attention and the bill was finally wiped out several days after the customer contacted the AT&T president’s office.

The customer said he received the billing email on December 11. An automatic payment was scheduled for December 15, but he canceled the autopay before the money was charged. The whole mess took a week to straighten out.

“I have been with AT&T for over a decade and I have always had unlimited plans so I knew this was a mistake,” he wrote. “The only change I have made to my account is last month I moved my line over to FirstNet. I am a first responder and I was told my price per month would actually go down a few dollars a month.”

“We have apologized for the inconvenience”

AT&T confirmed to Ars today that it “straightened out the customer’s bill.”

“We understand how frustrating this must have been for [the customer] and we have apologized for the inconvenience. We have resolved his concerns about his bill and are investigating to determine what caused this system error,” an AT&T spokesperson told Ars.

The customer posted screenshots of his bill, which helpfully pointed out, “Your bill increased $5,956.92” since the previous month. It included a $5.73 “discount for first responder appreciation,” but that wasn’t enough to wipe out a $6,194 line item listed as “Data Pay Per use 3,097MB at $2.00 per MB.”
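The arithmetic on that line item checks out; here's a quick reproduction of the math from the screenshots:

```python
# Reproducing the line items from the customer's bill screenshots.
data_mb = 3_097
rate_per_mb = 2.00
overage = data_mb * rate_per_mb
print(f"${overage:,.2f}")  # $6,194.00 'Data Pay Per Use' line item

# $6,194 plus the plan's normal charges, minus the $5.73 first-responder
# discount, lands near the $6,223.60 total on the bill.
```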



Louisiana resident in critical condition with H5N1 bird flu

The Louisiana resident infected with H5N1 bird flu is hospitalized in critical condition and suffering from severe respiratory symptoms, the Louisiana health department revealed Wednesday.

The health department had reported the presumptive positive case on Friday and noted the person was hospitalized, as Ars reported. But a spokesperson had, at the time, declined to provide Ars with the patient’s condition or further details, citing patient confidentiality and an ongoing public health investigation.

This morning, the Centers for Disease Control and Prevention announced that it had confirmed the state’s H5N1 testing and determined that the case “marks the first instance of severe illness linked to the virus in the United States.”

In a follow-up, health department spokesperson Emma Herrock released more information about the case. In addition to being in critical condition with severe respiratory symptoms, the patient is reported to be over the age of 65 and to have underlying health conditions.

Further, the CDC collected partial genetic data from the H5N1 strain infecting the patient, finding it to be of the D1.1 genotype, which has been detected in wild birds and some poultry in the US. Notably, it is the same genotype seen in a Canadian teenager who was also hospitalized in critical condition from the virus last month. The D1.1 genotype is not the same as the one circulating in US dairy cows, which is the B3.13 genotype.



The Backbone One would be an ideal game controller—if the iPhone had more games


It works well, but there still aren’t enough modern, console-style games.

The Backbone One attachable game controller for the iPhone.

In theory, it ought to be as good a time as ever to be a gamer on the iPhone.

Classic console emulators have rolled out to the platform for the first time, and they work great. There are strong libraries of non-skeezy mobile games on Apple Arcade and Netflix Games, streaming via Xbox and PlayStation services is continuing apace, and there are even a few AAA console games now running natively on the platform, like Assassin’s Creed and Resident Evil titles.

Some of those games need a traditional, dual-stick game controller to work well, though, and Apple bafflingly offers no first-party solution for this.

Yes, you can sync popular Bluetooth controllers from Sony, Microsoft, Nintendo, and 8BitDo with your iPhone, but that’s not really the ideal answer—your iPhone isn’t a big TV sitting across the room or a computer monitor propped up on your desk.

A few companies have jumped in to solve this with attachable controllers that give the iPhone a Nintendo Switch or Steam Deck-like form factor (albeit a lot smaller). There’s a wide range of quality, though, and some of the ones you’ll see advertised aren’t very well made.

There’s some debate out there, but there’s one controller that just about anyone will at least consider: the Backbone One. That’s the one I picked for my new iPhone 16 Pro Max, which I have loaded with emulators and tons of games.

Since many folks are about to get iPhone 16s for the holidays and might be in the market for something similar, I figured it’d be a good time to write some quick impressions, including pros and cons. Is this thing worth a $99 price tag? What about its subscription-based app?

Switching from the Razer Kishi

Here’s some background, real quick: I previously owned an iPhone 13 Pro, and I played a lot of Diablo Immortal. I wanted to try the controller experience with that game, so I bought a first-generation Razer Kishi—which I liked for the most part. It had excellent thumbsticks that felt similar to those you’d find on an Xbox controller, if a little bit softer.

That said, its design involved a back that loosened up and flexed to fit different kinds of phones, but I found it annoying to take on or off because it immediately crumbled into a folding mess. The big issue that made me go with something else, though, was that the controller worked with a Lightning port, and my new iPhone traded that for USB-C. That’s a good change, overall, but it did mean I had to replace some things.

The Kishi I had is now discontinued, and it’s been replaced with the Kishi V2, which looks… less appealing to me. That’s because it ditches those Xbox-like sticks for ones more similar to what you see with a Nintendo Switch. There’s less range of motion and less stability.

The Razer Kishi V2 (top) and Razer Kishi V1 (bottom). I had the V1. Credit: Ars Technica

The Backbone One has similar drawbacks, but I was drawn to the Backbone as an alternative partly because I had enough complaints about the Kishi that I wanted to roll the dice on something new. I also wanted a change because there’s a version with PlayStation button symbols—and I planned to primarily play PS1 games in an emulator as well as stream PS5 games to the device instead of a PlayStation Portal.

Solid hardware

One of my big complaints about the first-generation Kishi (the folding and flimsy back) isn’t an issue with the Backbone One. It’s right there in the name: This accessory has a sturdy plastic backbone that keeps things nice and stable.

The PlayStation version I got has face buttons and a directional pad that seem like good miniature counterparts to the buttons on Sony’s DualSense controller. The triggers and sticks offer much shallower and less precise control than the DualSense, though—they closely resemble the triggers and sticks on the Nintendo Switch Joy-Cons.


This version of the Backbone One adopts some styling from Sony’s DualSense PS5 controller. Credit: Samuel Axon

I feel that’s a big downside. It’s fine for some games, but if you’re playing any game built around quickly and accurately aiming in a 3D environment, you’ll feel the downgrade compared to a real controller.

The product feels quite sturdy to hold and use, and it doesn’t seem likely to break anytime soon. The only thing that bugs me on that front is that the USB-C plug that connects to the phone takes enough force to insert and remove that I’m worried about wear and tear on the ports of both my phone and the controller. Time will tell on that front.

There’s an app, but…

The Backbone One is not just a hardware product, even though I think it’d be a perfectly good product without any kind of software or service component.

There is a Backbone app that closely resembles the PlayStation 5’s home screen interface (this is not just for the PlayStation version of the controller, to be clear). It offers a horizontally scrolling list of games from multiple sources like streaming services, mobile game subscription services, or just what’s installed on your device. It also includes voice chat, multiplayer lobbies, streaming to Twitch, and content like video highlights from games.


The Backbone One app collects games from different sources into one browsing interface. Credit: Samuel Axon

Unfortunately, all this requires a $40 annual subscription after a one-month trial. The good news is that you don’t have to pay for the Backbone One’s subscription service to use it as a controller with your games and emulators.

I don’t think anyone anywhere was asking for a subscription-based app for their mobile game controller. The fact that one is offered proves two things. First, it shows just how niche this kind of product still is (and, by extension, how niche playing traditional, console-style games on the iPhone remains) that the company behind it felt a subscription was necessary to make a sufficient amount of money.

Second, it shows how much work Apple still needs to do to bake these features into the OS to make iOS/iPadOS a platform that is competitive with offerings from Sony, Microsoft, or even Nintendo in terms of appeal for core rather than casual gamers. That involves more than just porting a few AAA titles.

The state of iPhone gaming

The Backbone One is a nice piece of hardware, but many games you might be excited to play with it are better played elsewhere or with something else.

Hit games with controller support like Genshin Impact, Call of Duty Mobile, and Infinity Nikki all have excellent touch-based control schemes, making using a gamepad simply a matter of preference rather than a requirement.

While Apple is working with publishers like Capcom and Ubisoft to bring some hardcore console titles to the platform, that all still seems like just dipping toes in the water at this point, because they’re such a tiny slice of what’s on offer for PlayStation, Xbox, PC, or even Nintendo Switch players.

In theory, AAA game developers should be excited at the prospect of having iPhone players as a market—the install base of the iPhone absolutely dwarfs all home and handheld consoles combined. But they’re facing two barriers. The first is a chicken-and-egg problem: Only the most recent iPhones (iPhone 15 Pro and the iPhone 16 series) have supported those console AAA titles, and it will take a few years before most iPhone owners catch up.


Emulators like RetroArch (seen here running on an iPhone 16 Pro Max) are the main use case of the Backbone One. Credit: Samuel Axon

The second is that modern AAA games are immensely expensive to produce, and they (thankfully) don’t typically have robust enough in-game monetization paths to be distributed for free. That means that to profit and not cannibalize console and PC sales, publishers need to sell games for much higher up-front costs than mobile players are accustomed to.

So if mobile-first hardcore games are best played with touchscreens, and gamepad-first console games haven’t hit their stride on the platform yet, what’s the point of spending $100 on a Backbone One?

The answer is emulators, for both classic and homebrew games. For that, I’ve been pleased with the Backbone One. But if your goal is to play modern games, the time still hasn’t quite come.


Samuel Axon is a senior editor at Ars Technica. He covers Apple, software development, gaming, AI, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.



Call ChatGPT from any phone with OpenAI’s new 1-800 voice service

On Wednesday, OpenAI launched a 1-800-CHATGPT (1-800-242-8478) telephone number that anyone in the US can call to talk to ChatGPT via voice chat for up to 15 minutes for free. The company also says that people outside the US can send text messages to the same number for free using WhatsApp.

Upon calling, users hear a voice say, “Hello again, it’s ChatGPT, an AI assistant. Our conversation may be reviewed for safety. How can I help you?” Callers can ask ChatGPT anything they would normally ask the AI assistant and have a live, interactive conversation.

During a livestream demo of “Calling with ChatGPT” during Day 10 of “12 Days of OpenAI,” OpenAI employees demonstrated several examples of the telephone-based voice chat in action, asking ChatGPT to identify a distinctive house in California and for help in translating a message into Spanish for a friend. For fun, they showed calls from an iPhone, a flip phone, and a vintage rotary phone.


OpenAI developers demonstrate calling 1-800-CHATGPT during a livestream on December 18, 2024. Credit: OpenAI

OpenAI says the new features came out of an internal “hack week” project built just a few weeks ago. The company says its goal is to make ChatGPT more accessible if someone does not have a smartphone or a computer handy.

During the livestream, an OpenAI employee mentioned that 15 minutes of voice chatting are free and that you can download the app and create an account to get more. While the audio chat version seems to be running a full version of GPT-4o on the back end, a developer during the livestream said the free WhatsApp text mode is using GPT-4o mini.



The $700 price tag isn’t hurting PS5 Pro’s early sales

When Sony revealed the PlayStation 5 Pro a few months ago, some wondered just how many people would be willing to spend $700 for a marginal upgrade to the already quite powerful graphical performance of the PS5. Now, initial sales reports suggest there’s still a substantial portion of the console market that’s willing to shell out serious money for top-of-the-line console graphics.

Circana analyst Matt Piscatella shared on Bluesky this morning that the PS5 Pro accounted for a full 19 percent of US PS5 sales in its launch month of November. That sales ratio puts initial upgrade interest in the PS5 Pro roughly in line with lifetime interest in the PS4 Pro, which recent reports suggest was responsible for about 20 percent of all PS4 sales following its launch in 2016.

That US sales ratio also lines up with international sales reports for the PS5 Pro launch. In the UK, GfK ChartTrack reports that the PS5 Pro was responsible for 26 percent of all console sales for November. And in Japan, Famitsu sales data suggests the PS5 Pro was responsible for a full 63 percent of the PS5’s November sales after selling an impressive 78,000 units in its launch week alone.

Shut up and take my money

In the US, raw unit sales for the PS5 Pro were down slightly (12 percent) compared to those for the PS4 Pro’s launch month in November 2016, Piscatella writes. But the PS5 Pro still managed to bring in 50 percent more total US revenue in its launch month, owing to the PS4 Pro’s much more reasonable $400 launch price (or $533 in 2024 dollars).
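Those two figures are consistent with each other; here's a quick check using an illustrative unit count (only the prices and ratios come from the reporting):

```python
# Sanity-checking Piscatella's launch-month comparison.
# The unit count is a hypothetical baseline; only the ratios matter.
ps4_pro_units = 100
ps5_pro_units = ps4_pro_units * (1 - 0.12)  # unit sales down 12%

ps4_pro_revenue = ps4_pro_units * 400       # PS4 Pro's $400 launch price
ps5_pro_revenue = ps5_pro_units * 700       # PS5 Pro's $700 launch price

print(ps5_pro_revenue / ps4_pro_revenue)    # ~1.54, i.e. roughly 50% more revenue
```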



OpenAI’s API users get full access to the new o1 model

Updates for real-time interaction and fine-tuning

Developers who make use of OpenAI’s real-time voice APIs will also get full access to the WebRTC support that was announced today. This comes on top of existing WebSocket support and can simplify the creation of OpenAI audio interfaces for third-party applications from roughly 250 lines of code to about a dozen, according to the company.

OpenAI says it will release simple WebRTC code that can be used on a plug-and-play basis in plenty of simple devices, from toy reindeer to smart glasses and cameras that want to make use of context-aware AI assistants. To help encourage those kinds of uses, OpenAI said it was reducing the cost of o1 audio tokens for API developers by 60 percent and the cost of 4o mini tokens by a full 90 percent.
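OpenAI hasn’t published that code here, but the usual WebRTC pattern has your server mint a short-lived key that the browser then uses to open a peer connection, keeping the real API key off the client. Here is a minimal sketch of that server-side step; the endpoint, model name, and field names are assumptions modeled on OpenAI’s announcement and may change:

```python
# Hypothetical sketch of the server-side half of a WebRTC integration:
# mint a short-lived ephemeral key for the browser, so the real API key
# never ships to the client. Endpoint and field names are assumptions.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/realtime/sessions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-4o-realtime-preview", "voice": "verse"},
)
resp.raise_for_status()
ephemeral_key = resp.json()["client_secret"]["value"]

# The browser then uses ephemeral_key to POST its WebRTC SDP offer to the
# realtime endpoint and wires the answer into an RTCPeerConnection.
```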

Developers interested in fine-tuning their own AI models can also make use of a new method called “direct preference optimization” for their efforts. In the current system of supervised fine tuning, model makers have to provide examples of the exact input/output pairs they want to help refine the new model. With direct preference optimization, model makers can instead simply provide two separate responses and indicate that one is preferred over the other.

OpenAI says its fine-tuning process will then optimize to learn the difference between the preferred and non-preferred answers provided, automatically detecting changes in things like verbosity, formatting and style guidelines, or the helpfulness/creativity level of responses and factoring them into the new model.
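Concretely, that means each training record pairs one prompt with a preferred and a non-preferred response. Here is a hypothetical example record; the field names are modeled on OpenAI’s description rather than a confirmed schema:

```python
# A hypothetical JSONL training record for preference-based fine-tuning:
# two candidate responses to the same prompt, with one marked preferred.
# Field names are assumptions based on OpenAI's announcement.
import json

record = {
    "input": {"messages": [{"role": "user", "content": "Summarize this ticket."}]},
    "preferred_output": [
        {"role": "assistant", "content": "Concise, correctly formatted summary."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Rambling summary that buries the point."}
    ],
}

with open("dpo_train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```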

Programmers who write in Go or Java will also be able to use new SDKs for those languages to connect to the OpenAI API, OpenAI said.



Nvidia partners leak next-gen RTX 50-series GPUs, including a 32GB 5090

Rumors have suggested that Nvidia will be taking the wraps off of some next-generation RTX 50-series graphics cards at CES in January. And as we get closer to that date, Nvidia’s partners and some of the PC makers have begun to inadvertently leak details of the cards.

According to recent leaks from both Zotac and Acer, it looks like Nvidia is planning to announce four new GPUs next month, all at the high end of its lineup: The RTX 5090, RTX 5080, RTX 5070 Ti, and RTX 5070 were all briefly listed on Zotac’s website, as spotted by VideoCardz. There’s also an RTX 5090D variant for the Chinese market, which will presumably have its specs tweaked to conform with current US export restrictions on high-performance GPUs.

Though the website leak didn’t confirm many specs, it did list the RTX 5090 as including 32GB of GDDR7, an upgrade from the 4090’s 24GB of GDDR6X. An Acer spec sheet for new Predator Orion desktops also lists 32GB of GDDR7 for the RTX 5090, as well as 16GB of GDDR7 for the RTX 5080. That’s the same amount of RAM included with the RTX 4080 and 4080 Super.

The 5090 will be a big deal when it launches because no graphics card released since October 2022 has come close to beating the 4090’s performance. Nvidia’s early 2024 Super refresh for some 40-series cards didn’t include a 4090 Super, and AMD’s flagship RX 7900 XTX card is more comfortable competing with the likes of the 4080 and 4080 Super. The 5090 isn’t a card that most people are going to buy, but for the performance-obsessed, it’s the first high-end performance upgrade the GPU market has seen in more than two years.



The Second Gemini

  1. Trust the Chef.

  2. Do Not Trust the Marketing Department.

  3. Mark that Bench.

  4. Going Multimodal.

  5. The Art of Deep Research.

  6. Project Mariner the Web Agent.

  7. Project Astra the Universal Assistant.

  8. Project Jules the Code Agent.

  9. Gemini Will Aid You on Your Quest.

  10. Reactions to Gemini Flash 2.0.

Google has been cooking lately.

Gemini Flash 2.0 is the headline release, which will be the main topic today.

But there’s also Deep Research, where you can ask Gemini to take several minutes, check dozens of websites, and compile a report for you. Think of it as a harder-to-direct, slower, but vastly more robust version of Perplexity that will improve with time and as we figure out how to use and prompt it.

NotebookLM added a call-in feature for podcasts, a Plus paid offering and a new interface that looks like a big step up.

Veo 2 is their new video generation model, and Imagen 3 is their new image model. There’s also Whisk, where you hand it a bunch of images and it combines them with some description for a new image. Superficially they all look pretty good.

They claim people in a survey chose Veo 2 generations over Sora Turbo by a wide margin; note that the margins over the other options imply Sora was subpar.

Here’s one comparison of both handling the same prompt. Here is Veo conquering the (Will Smith eating) spaghetti monster.

This is a strong endorsement from a source I find credible:

Nearcyan: I haven’t seen a model obliterate the competition as thoroughly as Veo2 is right now since Claude 3.5.

They took the concept I was barely high-IQ enough to try to articulate and actually put it in the model and got it to work at scale.

It really was two years from StyleGAN2 to Stable Diffusion, then two years from Stable Diffusion to Veo2. They were right. Again.

I wonder when the YouTubers are going to try to revolt.

There’s a new Realtime Multimodal API, Agentic Web Browsing from Project Mariner, Jules for automated code fixing, an image model upgrade, the ability to try Project Astra.

And they’re introducing Android XR (launches in 2025) as a new operating system and platform for ‘Gemini-era’ AR or VR glasses, which they’re pitching as something you wear all day with AI being the killer app, similar to a smart watch. One detail I appreciated was seamless integration with a mouse and keyboard. All the details I saw seem… right, if they can nail the execution. The Apple Vision Pro is too cumbersome, didn’t have AI involved and didn’t work out, but Google’s vision feels like the future.

Demis Hassabis is calling 2025 ‘the year of the AI agent.’

Gemini 2.0 is broadly available via Google AI Studio, Vertex AI, the Gemini API, and the Gemini app.

Gemini 2.0 finally has a functional code interpreter.
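As a minimal sketch, here is how enabling it might look via the google-generativeai Python SDK; the experimental model name is an assumption and may change:

```python
# Minimal sketch of the code-execution tool in the google-generativeai
# Python SDK; the experimental model name is an assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools="code_execution")

# The model writes and runs Python behind the scenes, then answers with
# the verified result instead of guessing at the arithmetic.
response = model.generate_content(
    "What is the sum of the first 50 prime numbers? Run code to check."
)
print(response.text)
```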

For developers, they offer this cookbook.

One big thing we do not know yet is the price. You can use a free preview, and that’s it.

If you want to join all the waitlists, and why wouldn’t you, go to Google Labs.

I mean, obviously, you never want to ‘trust the marketing department.’

But in this case, I mean something else: Do not trust them to do their jobs.

Google has been bizarrely bad about explaining all of this and what it can do. I very much do not want to hear excuses about ‘the Kleenex effect’ (especially since you could also call this ‘the Google effect’) or ‘first mover advantage.’ This is full-on not telling us what they are actually offering or giving us any reasonable way to find out beyond the ‘faround’ plan.

Even when I seek out their copy, it is awful.

For example, CEO Sundar Pichai’s note at the top of their central announcement is cringeworthy corporate-speak and tells you almost nothing. Nobody wants this.

On at least some benchmarks, Gemini Flash 2.0 outperforms Gemini 1.5 Pro.

That chart only compares to Gemini Pro 1.5, and only on their selection of benchmarks. But based on other reports, it seems likely that yes, this is an overall intelligence upgrade over Pro 1.5 while also being a lot faster and cheaper.

It tops something called the ‘Hallucination leaderboard’ along with Zhipu.

Chubby: Gemini 2.0 Flash on Hallucination leaderboard

Gemini shows its strength day by day

Claude is at 4.6%, and its hallucinations don’t bother me, but I do presume this is measuring something useful.

The performance on Arena is super impressive for a model of this size and speed, and a large improvement over the old Flash, which was already doing great for its size. It’s not quite at the top, but it’s remarkably close:

Gemini 2.0 is sufficiently lightweight, fast and capable that Google says it enables real time multimodal agentic output. It can remember, analyze and respond accordingly.

It also has native tool use, which is importantly missing from o1.

Google claims all this will enable their universal AI assistant, Project Astra, to be worth using. And also Project Mariner, asking it to act as a reasoning multi-step agent on the web on your behalf across domains.

Currently Astra and Mariner, and also their coding agent Jules, are in the experimental stage. This is very good. Projects like this should absolutely have extensive experimental stages first. It is relatively fine to rush Flash 2.0 into Search, but agents require a bit more caution, if only for not-shooting-self-in-foot practical purposes.

Project Astra is now fully multilingual, including within a conversation, and has 10 minutes of in-session memory, memory of earlier conversations, and lower latency.

There’s a waiting list for Astra, but right now, you can use Google Studio to click Stream Realtime on your screen, which seems to be at least close to the same thing if you do it on mobile? There’s a button to use your webcam, another to talk to it.

On a computer you can then use a voice interface, and it will analyze things in real time, including analyzing code and suggesting bug fixes.

If we can bring this together with the rest of the IDE and the abilities of a Cursor, watch out, because it would solve some of the bottlenecks.

Deep Research is a fantastic idea.

Ethan Mollick: Google has a knack for making non-chatbot interfaces for serious work with LLMs.

When I demo them, both NotebookLM & Deep Research are instantly understandable and fill real organizational needs. They represent a tiny range of AI capability, but they are easy for everyone to get

You type in your question, you tab out somewhere else for a while, then you check back later and presto, there’s a full report based on dozens of websites. Brilliant!

It is indeed a great modality and interface. But it has to work in practice.

In practice, well, there are some problems. As always, there’s basic accuracy, such as this output, where it flat-out copied benchmark numbers wrong, claiming 40.4% was a higher score than 62.1% on GPQA (rather persistently, even!), and so on.

I also didn’t feel like the ‘research plan’ corresponded that well to the results.

The bigger issue is that it will give you ‘the report you vaguely asked for.’ It doesn’t, at least in my attempts so far, do the precision thing. Ask it a particular question, get a generic vaguely related answer. And if you try to challenge its mistakes, weird mostly unhelpful things happen.

That doesn’t mean it is useless.

If what you want matches Gemini’s inclinations about what a vaguely related report would look like, you’re golden.

If what you want is a subset of that but will be included in the report, you can, as someone suggested to me on Twitter, take the report, click a button to make it a Google Doc, then feed the Google Doc to Claude (or Gemini!) and have it pick out the information you want.

These were by far the most gung-ho reviews I’ve seen so far:

Dean Ball: Holy hell, Gemini’s deep research is unbelievable.

I just pulled information from about 100 websites and compiled a report on natural gas generation in minutes.

Perhaps my favorite AI product launch of the last three business days.

The first questions I ask language models are *always* research questions I myself have investigated in the recent past.

Gemini’s performance on the prompt in question was about 85%, for what it’s worth, but the significant point is that no other model could have gotten close.

It wasn’t factually inaccurate about anything I saw—most of the problem was the classic LLM issue of not getting at what I *actually* wanted on certain sub-parts of the inquiry.

Especially useful for me since I very often am doing 50-state surveys of things.

Sid Bharath: Gemini Deep Research is absolutely incredible. It’s like having an analyst at your fingertips, working at inhuman speeds.

Need a list of fashion bloggers and their email addresses to promote your new clothing brand? Deep Research crawls through hundreds of sites to pull it all together into a spreadsheet, along with a description of each website and a personalized pitch, in minutes.

Analyzing a stock or startup pitch deck? Deep Research can write up a full investment memo with competitive analysis and market sizing, along with sources, while you brew your coffee.

Whenever you need to research something, whether it’s for an essay or blog, analyzing a business, building a product, promoting your brand, or creating an outreach list, Deep Research can do it in a fraction of the time you or your best analyst could.

And it’s available on the Gemini app right now. Check it out and let me know what you think.

On reflection, Dean Ball’s use cases are a great fit for Deep Research. I still don’t see how he came away so enthused.

Sid Bharath again seems like he has a good use case with generating a list of contacts. I’m a lot more suspicious about some of the other tasks here, where I’d expect to have a bigger slop problem.

You can also view DR as a kind of ‘free action.’ You get a bunch of Deep Research reports on a wide variety of subjects. The ones that don’t help, you quickly discard. So it’s fine if the hit rate is not so high.

Another potential good use is to use this as a search engine for the sources, looking either at the ones in the final data set or the list of researched websites.

It will take time to figure out the right ways to take advantage of this, and doubtless Google can improve this experience a lot if it keeps cooking.

Jon Stokes sees Deep Research as Google ‘eating its seed corn’ as in not only search but also the internet, because this is hits to websites with no potential customers.

Jon Stokes: Gemini is strip-mining the web. Not a one of the 563 websites being visited by Gemini in the above screencap is getting any benefit from this activity — in fact, they’re paying to serve this content to Google. It’s all cost, no benefit for rightsholders.

I don’t think it is true they get no benefit. I have clicked on a number of Deep Research’s sources and looked at them myself, and I doubt I am alone in this.

I encourage you to share your experiences.

Project Mariner scores a SotA 83.5% on the WebVoyager benchmark, going up to 90%+ if you give it access to tree search. They certainly are claiming it is damn impressive.

The research prototype can only use the active tab, stopping you from doing other things in the meantime. Might need multiple computers?

Here’s Olivia Moore using it to nail GeoGuessr. The example in question does seem like easy mode, there’s actual signs that give away the exact location, but very cool.

It is however still in early access, so we can’t try it out yet.

Shane Legg (Chief AGI Scientist, Google): Who’s starting to feel the AGI?

I was excited when I first saw the announcements for Project Astra, but we’re still waiting and haven’t seen much. They’re now giving us more details and claiming it has been upgraded, and is ready to go experimental. Mostly we get some early tester reports, a few minutes long each.

One tester points to the long-term memory as a key feature. That was one of the ones that made sense to me, along with translation and object identification. Some of the other ways the early testers used Astra, and their joy in some of the responses, seemed so weird to me. It’s cool that Astra can do these things, but why are these things you want Astra to be doing?

That shows how far we’ve come. I’ve stopped being impressed that it can do a thing, and started instead asking if I would want to do the thing in practice.

Astra will have at least some tool use, 10 minutes of in-context memory, a long-term memory for past conversations and real time voice interaction. The prototype glasses, they are also coming.

Here Roni Rahman goes over the low-hanging-fruit Astra use cases, and there’s a similar thread from Min Choi.

My favorite use case so far is getting Gemini to watch the screen for when you slack off and yell at you to get back to work.

Jules is Google’s new code agent. Again, it isn’t available yet for us regular folk, they promise it for interested developers in early 2025.

How good is it? Impossible to know. All we know is Google’s getting into the game.

There’s also a data science agent scheduled for the first half of 2025.

With the multimodal Live API, Gemini 2.0 can be your assistant while playing games.

It understands your screen, helps you strategize in games, remembers tasks, and searches the web for background information, all in voice mode.

An excellent question:

High Minded Lowlife: I don’t play these games so I gotta ask. Are these actually good suggestions or just generic slop answers that sound good but really aren’t. If the former then this is pretty awesome.

That’s always the question, isn’t it? Are the suggestions any good?

I notice that if Gemini could put an arrow icon or even better pathways onto the screen, it would be that much more helpful here.

So we all know what that means.

We already know that no, Gemini can’t play Magic: The Gathering yet.

What is the right way to use this new power while gaming?

When do you look at the tier list, versus very carefully not looking at the tier list?

Now more than ever, you need to cultivate the gaming experience that you want. You want a challenge that is right for you, of the type that you enjoy. Sometimes you want the joy of organic discovery and exploration, and other times you want key information in advance, especially to avoid making large mistakes.

Here Sid Bharath uses Gemini to solve the New York Times Crossword, as presumably any other LLM could as well with a slightly worse interface. But it seems like mostly you want to not do this one?

Sully is a big fan.

Sully: This is insane.

Gemini Flash 2.0 is twice as fast and significantly smarter than before.

Guys, DeepMind is cooking.

From the benchmarks, it is better than 1.5 Pro.

Mbongeni Ndlovu: I’m loving Gemini 2.0 Flash so much right now.

Its video understanding is so much better and faster than 1.5 Pro.

The real-time streaming feature is pretty wild.

Sully: Spent the day using Gemini Flash 2.0, and I’m really impressed.

Basically, it is the same as GPT-4o and slightly worse than Claude, in my opinion.

Once it is generally available, I think all our “cheap” requests will go to Flash. Getting rid of GPT Mini plus Haiku (and some GPT-4o).

Bindu Reddy is a big fan.

Bindu Reddy: Gotta say Gemini 2.0 is a way bigger launch than whatever OpenAI has announced so far

Also love that Google made the API available for evals and experiments

Last but not the least, Gemini’s speed takes your breath away

Mostafa Dehghani notices that Gemini 2.0 can break down steps in the ‘draw the rest of the owl’ task.

What is my take so far?

Veo 2 seems great, but it’s not my area, and I notice I don’t care.

Deep Research is a great idea, and it has a place in your workflow even with all the frustrations, but it’s early days and it needs more time to cook. It’s probably a good idea to keep a few Gemini windows open for this, occasionally put in questions where it might do something interesting, and then quickly scan the results.

Gemini-1206 seems solid from what I can tell but I don’t notice any temptation to explore it more, or any use case where I expect it to be a superior tool to some combination of o1, GPT-4o with web search, Perplexity and Claude Sonnet.

Gemini Flash 2.0 seems like it is doing a remarkably good impression of models that are much larger and more expensive. I’d clearly never use it over Claude Sonnet where I had both options, but Flash opens up a bunch of new use cases, and I’m excited to see where those go.

Project Astra (or ‘streaming realtime’) in particular continues to seem fascinating, both the PC version with a shared screen and the camera version with your phone. I’m eager to put both to proper tests, even in their early forms, but have not yet found the time. Maybe I should just turn it on during my work at some point and see what happens.

Project Mariner I don’t have access to, so it’s impossible to know if it is anything yet.

For now I notice that I’m acting like most people who bounce off AI, don’t properly explore it, and miss out. On a less dumb level, sure, but I need to snap out of it.

The future is going to get increasingly AI, and increasingly weird. Let’s get that first uneven distribution.




PS Placeable: The adorable mod that turns a PlayStation Portable into a console

When Sony launched the PlayStation Portable almost exactly 20 years ago, the value proposition was right there in the name: a PlayStation, but portable. But now modders have flipped that, introducing a PSP that can be played on a TV, console-style, and they’ve dubbed it the PS Placeable.

It’s a non-destructive mod to PSP-2000 and PSP-3000 systems that allows you to play PSP games on the TV off the original UMD physical media format, with a wireless controller like the PlayStation 4’s DualShock 4—all wrapped in a miniature, PlayStation 2-like enclosure.

Let’s be frank: One of the main reasons this thing gets special attention here is that its look is both clever and, well, kind of adorable. The miniaturization of the retro styling of the PlayStation 2 is a nice touch.

Of course, there have long been other ways to play some PSP games on the big screen—but there has always been one downside or another.

For example, you could connect the original PSP to a TV with convoluted cables, but you would then have to use that tethered handheld as your controller.

Much later, the PlayStation TV set-top box, made by Sony itself, was essentially a console-style take on the PlayStation Vita. Like the Vita, it supported wireless controllers and could play a number of classic PSP games—but it didn’t support most of the PSP library, and it only worked with games downloaded through Sony’s digital store.
