Author name: Rejus Almole


CDC can no longer help prevent lead poisoning in children, state officials say

Amid the brutal cuts across the federal government under the Trump administration, perhaps one of the most gutting is the loss of experts at the Centers for Disease Control and Prevention who respond to lead poisoning in children.

On April 1, the staff of the CDC’s Childhood Lead Poisoning Prevention Program was terminated as part of the agency’s reduction in force, according to NPR. The staff included epidemiologists, statisticians, and advisors who specialized in lead exposures and responses.

The cuts were immediately consequential to health officials in Milwaukee, who are currently dealing with a lead exposure crisis in public schools. Six schools have had to close, displacing 1,800 students. In April, the city requested help from the CDC’s lead experts, but the request was denied—there was no one left to help.

In a Congressional hearing this week, US health secretary and anti-vaccine advocate Robert F. Kennedy Jr. told lawmakers, “We have a team in Milwaukee.”

But Milwaukee Health Commissioner Mike Totoraitis told NPR that this is false. “There is no team in Milwaukee,” he said. “We had a single [federal] staff person come to Milwaukee for a brief period to help validate a machine, but that was separate from the formal request that we had for a small team to actually come to Milwaukee for our Milwaukee Public Schools investigation and ongoing support there.”

Kennedy has also previously told lawmakers that lead experts at the CDC who were terminated would be rehired. But that statement was also false. The health department’s own communications team told ABC that the lead experts would not be reinstated.



FAA: Airplanes should stay far away from SpaceX’s next Starship launch


“The FAA is expanding the size of hazard areas both in the US and other countries.”

The Starship for SpaceX’s next test flight, known as Ship 35, on the move between the production site at Starbase (in background) and the Massey’s test facility for a static fire test. Credit: SpaceX

The Federal Aviation Administration gave the green light Thursday for SpaceX to launch the next test flight of its Starship mega-rocket as soon as next week, following two consecutive failures earlier this year.

The failures set back SpaceX’s Starship program by several months. The company aims to get the rocket’s development back on track with the upcoming launch, Starship’s ninth full-scale test flight since its debut in April 2023. Starship is central to SpaceX’s long-held ambition to send humans to Mars and is the vehicle NASA has selected to land astronauts on the Moon under the umbrella of the government’s Artemis program.

In a statement Thursday, the FAA said SpaceX is authorized to launch the next Starship test flight, known as Flight 9, after finding the company “meets all of the rigorous safety, environmental and other licensing requirements.”

SpaceX has not confirmed a target launch date for the next launch of Starship, but warning notices for pilots and mariners to steer clear of hazard areas in the Gulf of Mexico suggest the flight might happen as soon as the evening of Tuesday, May 27. The rocket will lift off from Starbase, Texas, SpaceX’s privately owned spaceport near the US-Mexico border.

This will be the third flight of SpaceX’s upgraded Block 2, or Version 2, Starship rocket. The first two flights of Starship Block 2, in January and March, did not go well. On both occasions, the rocket’s upper stage shut down its engines prematurely and the vehicle lost control, breaking apart in the upper atmosphere and spreading debris near the Bahamas and the Turks and Caicos Islands.

Debris from Starship falls back into the atmosphere after Starship Flight 8 in this view over Hog Cay, Bahamas. Credit: GeneDoctorB via X

Investigators determined the cause of the January failure was a series of fuel leaks and fires in the ship’s aft compartment. The leaks were most likely triggered by vibrations that were more intense than anticipated, SpaceX said before Starship’s most recent flight in March. SpaceX has not announced the cause of the March failure, although the circumstances were similar to the mishap in January.

“The FAA conducted a comprehensive safety review of the SpaceX Starship Flight 8 mishap and determined that the company has satisfactorily addressed the causes of the mishap, and therefore, the Starship vehicle can return to flight,” the agency said. “The FAA will verify SpaceX implements all corrective actions.”

Flight safety

The flight profile for the next Starship launch will largely be a repeat of what SpaceX hoped to accomplish on the ill-fated tests earlier this year. If all goes according to plan, the rocket’s upper stage, or ship, will travel halfway around the world from Starbase, reaching an altitude of more than 100 miles before reentering the atmosphere over the Indian Ocean. A little more than an hour after liftoff, the ship will aim for a controlled splashdown in the ocean northwest of Australia.

Apart from overcoming the problems that afflicted the last two launches, one of the most important objectives for this flight is to test the performance of Starship’s heat shield. Starship Block 2 includes improved heat shield materials that could do better at protecting the ship from the superheated temperatures of reentry and, ultimately, make it easier to reuse the vehicle. The problems on the last two Starship test flights prevented the rocket from reaching the point where its heat shield could be tested.

Starship Block 2 also features redesigned flaps to better control the vehicle during its descent through the atmosphere. This version of Starship also has larger propellant tanks and reconfigured fuel feed lines for the ship’s six Raptor engines.

The FAA’s approval for Starship Flight 9 comes with some stipulations. The agency is expanding the size of hazard areas in the United States and in other countries based on an updated “flight safety analysis” from SpaceX and because SpaceX will reuse a previously flown first-stage booster—called Super Heavy—for the first time.

The aircraft hazard area for Starship Flight 9 extends approximately 1,600 nautical miles to the east from Starbase, Texas. Credit: Federal Aviation Administration

This flight-safety analysis takes into account the outcomes of previous flights, including accidents, population exposure risk, the probability of vehicle failure, and debris propagation and behavior, among other considerations. “The FAA uses this and other data to determine and implement measures to mitigate public risk,” the agency said.
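The kind of arithmetic behind such an analysis can be sketched as a simple expected-casualty estimate. The function and all numbers below are illustrative only, not FAA or SpaceX figures; real analyses model debris propagation, sheltering, and population exposure in far more detail.

```python
# Toy expected-casualty estimate of the sort a flight safety analysis builds on.
# Inputs are invented for illustration.
def expected_casualties(p_failure: float,
                        debris_footprint_km2: float,
                        pop_density_per_km2: float,
                        p_casualty_if_exposed: float) -> float:
    """Expected casualties = P(failure) x people under the debris footprint
    x P(casualty | exposed)."""
    exposed = debris_footprint_km2 * pop_density_per_km2
    return p_failure * exposed * p_casualty_if_exposed
```

A higher estimated failure probability directly scales the risk upward, which is why consecutive mishaps translate into larger hazard areas.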

All of this culminated in the FAA’s “return to flight determination,” which the agency says is based on public safety. The FAA’s primary concern with commercial space activity is ensuring rocket launches don’t endanger third parties. The agency also requires that SpaceX maintain at least $500 million in liability insurance to cover claims resulting from the launch and flight of Starship Flight 9, the same requirement the FAA levied for previous Starship test flights.

For the next launch, the FAA will establish an aircraft hazard area covering approximately 1,600 nautical miles extending eastward from Starbase, Texas, and through the Straits of Florida, including the Bahamas and the Turks and Caicos Islands. This is an extension of the 885-nautical-mile hazard area the FAA established for the test flight in March. In order to minimize disruption to commercial and private air traffic, the FAA is requiring the launch window for Starship Flight 9 to be scheduled during “non-peak transit periods.”

The size of FAA-mandated airspace closures can expand or shrink based on the reliability of the launch vehicle. The failures of Starship earlier this year raised the probability of vehicle failure in the flight-safety analysis for Starship Flight 9, according to the FAA.

The expanded hazard area will force the closure of more than 70 established air routes across the Gulf of Mexico and now includes the Bahamas and the Turks and Caicos Islands. The FAA anticipates this will affect more than 175 flights, almost all of them on international connecting routes. For airline passengers traveling through this region, this will mean an average flight delay of approximately 40 minutes, and potentially up to two hours, the FAA said.

If SpaceX can reel off a series of successful Starship flights, the hazard areas will likely shrink in size. This will be important as SpaceX ramps up the Starship launch cadence. The FAA recently approved SpaceX to increase its Starship flight rate from five per year to 25 per year.

The agency said it is in “close contact and collaboration” with other nations with territory along or near Starship’s flight path, including the United Kingdom, Turks and Caicos, the Bahamas, Mexico, and Cuba.

Status report

Meanwhile, SpaceX’s hardware for Starship Flight 9 appears to be moving closer to launch. Last month, engineers test-fired the Super Heavy booster, which SpaceX previously launched and recovered in January, on the launch pad in South Texas. On May 12, SpaceX fired the ship’s six Raptor engines for 60 seconds on a test stand near Starbase.

After the test-firing, ground crews rolled the ship back to the Starship production site a few miles away, only to return the vehicle to the test stand Wednesday for unspecified testing. SpaceX is expected to roll the ship back to the production site again before the end of the week.

The final steps before launch will involve separately transporting the Super Heavy booster and Starship upper stage from the production site to the launch pad. There, SpaceX will stack the ship on top of the booster. Once the two pieces are stacked together, the rocket will stand 404 feet (123.1 meters) tall.

If SpaceX moves forward with a launch attempt next Tuesday evening, the long-range outlook from the National Weather Service calls for a 30 percent chance of showers and thunderstorms.


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



RFK Jr. calls WHO “moribund” amid US withdrawal; China pledges to give $500M

“WHO’s priorities have increasingly reflected the biases and interests of corporate medicine,” Kennedy said, alluding to his anti-vaccine and germ-theory denialist views. He chastised the health organization for allegedly capitulating to China and working with the country to “promote the fiction that COVID originated in bats.”

Kennedy ended the short speech by touting his Make America Healthy Again agenda. He also urged the WHO to undergo a radical overhaul similar to what the Trump administration is currently doing to the US government—presumably including dismantling and withholding funding from critical health agencies and programs. Last, he pitched other countries to join the US in abandoning the WHO.

“I would like to take this opportunity to invite my fellow health ministers around the world into a new era of cooperation…. we’re ready to work with you,” Kennedy said.

Meanwhile, the WHA embraced collaboration. During the assembly this week, WHO member states overwhelmingly voted to adopt the world’s first pandemic treaty, aimed at collectively preventing, preparing for, and responding to future pandemics. The treaty took over three years to negotiate, but in the end, no country voted against it: 124 votes in favor, 11 abstentions, and no objections. (The US, no longer a member of WHO, did not have a vote.)

“The world is safer today thanks to the leadership, collaboration and commitment of our Member States to adopt the historic WHO Pandemic Agreement,” WHO Director-General Tedros Adhanom Ghebreyesus said. “The Agreement is a victory for public health, science and multilateral action. It will ensure we, collectively, can better protect the world from future pandemic threats. It is also a recognition by the international community that our citizens, societies and economies must not be left vulnerable to again suffer losses like those endured during COVID-19.”



Gemini 2.5 is leaving preview just in time for Google’s new $250 AI subscription


Deep Think is more capable of complex math and coding. Credit: Ryan Whitwam

Both 2.5 models have adjustable thinking budgets when used in Vertex AI and via the API, which makes a little progress toward making generative AI less overwhelmingly expensive to run, and the models will now also include summaries of the “thinking” process for each output. Gemini 2.5 Pro will also appear in some of Google’s dev products, including Gemini Code Assist.

Gemini Live, previously known as Project Astra, started to appear on mobile devices over the last few months. Initially, you needed to have a Gemini subscription or a Pixel phone to access Gemini Live, but now it’s coming to all Android and iOS devices immediately. Google demoed a future “agentic” capability in the Gemini app that can actually control your phone, search the web for files, open apps, and make calls. It’s perhaps a little aspirational, just like the Astra demo from last year. The version of Gemini Live we got wasn’t as good, but as a glimpse of the future, it was impressive.

There are also some developments in Chrome, and you guessed it, it’s getting Gemini. It’s not dissimilar from what you get in Edge with Copilot. There’s a little Gemini icon in the corner of the browser, which you can click to access Google’s chatbot. You can ask it about the pages you’re browsing, have it summarize those pages, and ask follow-up questions.

Google AI Ultra is ultra-expensive

Since launching Gemini, Google has only had a single $20 monthly plan for AI features. That plan granted you access to the Pro models and early versions of Google’s upcoming AI. At I/O, Google is catching up to AI firms like OpenAI, which have offered sky-high AI plans. Google’s new Google AI Ultra plan will cost $250 per month, more than the $200 plan for ChatGPT Pro.



The Codex of Ultimate Vibing

While we wait for wisdom, OpenAI releases a research preview of a new software engineering agent called Codex, because they previously released a lightweight open-source coding agent in the terminal called Codex CLI, and if OpenAI used non-confusing product names it would violate the nonprofit charter. The promise, also reflected in a number of rival coding agents, is to graduate from vibe coding. Why not let the AI do all the work on its own, typically for 1-30 minutes?

The answer is that it’s still early days, but already many report this is highly useful.

Sam Altman: today we are introducing codex.

it is a software engineering agent that runs in the cloud and does tasks for you, like writing a new feature or fixing a bug.

you can run many tasks in parallel.

it is amazing and exciting how much software one person is going to be able to create with tools like this. “you can just do things” is one of my favorite memes;

i didn’t think it would apply to AI itself, and its users, in such an important way so soon.

OpenAI: Today we’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Codex can perform tasks for you such as writing features, answering questions about your codebase, fixing bugs, and proposing pull requests for review; each task runs in its own cloud sandbox environment, preloaded with your repository.

Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. It was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and can iteratively run tests until it receives a passing result.

Once Codex completes a task, it commits its changes in its environment. Codex provides verifiable evidence of its actions through citations of terminal logs and test outputs, allowing you to trace each step taken during task completion. You can then review the results, request further revisions, open a GitHub pull request, or directly integrate the changes into your local environment. In the product, you can configure the Codex environment to match your real development environment as closely as possible.

Codex can be guided by AGENTS.md files placed within your repository. These are text files, akin to README.md, where you can inform Codex how to navigate your codebase, which commands to run for testing, and how best to adhere to your project’s standard practices. Like human developers, Codex agents perform best when provided with configured dev environments, reliable testing setups, and clear documentation.
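A minimal AGENTS.md might look like the following. The filename and convention are OpenAI's; the specific contents here are invented for illustration:

```markdown
# AGENTS.md (illustrative example)

## Navigation
- Application code lives in `src/`; tests live in `tests/`.

## Testing
- Run `pytest -q` before proposing any change.
- Lint with `ruff check src/` and fix any warnings you introduce.

## Conventions
- Follow the docstring style used in `src/core/`.
- Keep changes small and single-purpose so pull requests are easy to review.
```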

On coding evaluations and internal benchmarks, codex-1 shows strong performance even without AGENTS.md files or custom scaffolding.

All code is provided via GitHub repositories. All codex executions are sandboxed in the cloud. The agent cannot access external websites, APIs or other services. Afterwards you are given a comprehensive log of its actions and changes. You then choose to get the code via pull requests.

Note that while it lacks internet access during its core work, it can still install dependencies before it starts. But there are reports of struggles with its inability to install dependencies while it runs, which seems like a major issue.

Inability to access the web also makes some things trickier to diagnose, figure out or test. A lot of my frustration with AI coding is everything I want to do seems to involve interacting with persnickety websites.

This is a ‘research preview,’ and the worst Codex will ever be, although it might temporarily get less affordable once the free preview period ends. It does seem like they have given this a solid amount of thought and taken reasonable precautions.

The question is, when is this a better way to code than Cursor or Claude Code, and how does this compare to existing coding agents like Devin?

It would have been easy, given everything that happened, for OpenAI to have said ‘we do not need to give you a system card addendum, this is in preview and not a fully new model, etc.’ It is thus to their credit that they gave us the card anyway. It is short, but there is no need for it to be long.

As you would expect, the first thing that stood out was 2.3, ‘falsely claiming to have completed a task it did not complete.’ This seems to be a common pattern in similar models, including Claude 3.7.

I believe this behavior is something you want to fight hard to avoid having the AI learn in the first place. Once the AI learns to do this, it is difficult to get rid of, but it wouldn’t learn it if you weren’t rewarding it during training. It is avoidable in theory. Is it avoidable in practice? I don’t know if the price is worthwhile, but I do know it’s worth a lot to avoid it.

OpenAI does indeed try, but with positive action rather than via negativa. Their plan is ensuring that the model is penalized for producing results inconsistent with its actions, and rewarded for acknowledging limitations. Good. That was a big help, going from 15% to 85% chance of correctly stating it couldn’t complete tasks. But 85% really isn’t 99%.
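As a toy illustration of that reward shaping (my sketch, not OpenAI's actual training code), the key is the asymmetry: a completion claim inconsistent with the evidence must score worse than an honest admission of failure.

```python
# Toy reward shaping against falsely claimed completions: penalize claims
# inconsistent with verifiable evidence, reward acknowledged limitations.
def completion_reward(claimed_done: bool, tests_pass: bool) -> float:
    if claimed_done and tests_pass:
        return 1.0    # honest, verified success
    if claimed_done and not tests_pass:
        return -2.0   # claim inconsistent with actions: penalized hard
    if not claimed_done and not tests_pass:
        return 0.5    # correctly acknowledging it could not finish
    return 0.0        # finished but did not claim it; neutral
```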

As in, I think if you include some things that push against pretending to solve problems, that helps a lot (hence the results here), but if you also have other places that pretending is rewarded, there will be a pattern, and then you still have a problem, and it will keep getting bigger. So instead, track down every damn place in which the AI could get away with claiming to have solved a task during training without having solved it, and make sure you always catch all of them. I know this is asking a lot.

They solve prompt injection via network sandboxing. That does the job for now, and they also made sure that prompt injections inside the coding environment mostly failed. Good.

Finally we have the preparedness team affirming that the model did not reach high risk in any categories. I’d have liked to see more detail here, but overall This Is Fine.

Want to keep using the command line? OpenAI gives you codex-mini, a variant of o4-mini, as an upgrade. They’re also introducing a simpler onboarding process for it and offering some free credits.

These look like a noticeable improvement over o4-mini-high and even o3-high. Codex-mini-latest will be priced at $1.50/$6 per million with a 75% prompt caching discount. They are also setting a great precedent by sharing the system message.
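Those prices make per-task costs easy to estimate. A sketch using the figures quoted above; the token counts in any example you run through it are your own assumptions:

```python
# Cost estimate for codex-mini-latest at the quoted $1.50/M input and $6/M
# output, with a 75% discount on cached prompt tokens.
def task_cost(input_tokens: int, output_tokens: int,
              cached_fraction: float = 0.0) -> float:
    INPUT_PER_M, OUTPUT_PER_M = 1.50, 6.00
    CACHED_PER_M = INPUT_PER_M * 0.25  # 75% prompt-caching discount
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    return (fresh * INPUT_PER_M + cached * CACHED_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000
```

With heavy prompt caching, repeated runs over the same large repository context get substantially cheaper than the headline input rate suggests.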

Greg Brockman speculates that over time the ‘local’ and ‘remote’ coding agents will merge. This makes sense. Why shouldn’t the local agent call additional remote agents to execute subtasks? Parallelism for the win. Nothing could possibly go wrong.

Immediate reaction to Codex was relatively muted. It takes a while for people to properly evaluate this kind of tool, and it is only available to those paying $200/month.

What feedback we do have is somewhat mixed. Cautious optimism, especially for what a future version could be, seems like the baseline.

Codex is the combination of an agent implementation with the underlying model. Reports seem consistent with the underlying model and async capabilities being excellent, and those both matter a lot, but with the implementation needing work and being much less practically useful than rival agents: requiring more hand holding, having a less clean UI, and running slower.

That makes Codex in its current state a kind of ‘AI coding agent for advanced power users.’ You wouldn’t use the current Codex over the competition unless you understood what you were doing, and you wanted to do a lot of it.

The future of Codex looks bright. OpenAI in many senses started with ‘the hard part’ of having a great model and strong parallelism. The things still missing seem easily fixable over time.

One must also keep an eye out that OpenAI (especially via Greg Brockman) is picking out and amplifying positive feedback. It’s not yet clear how much of an upgrade this is over existing alternatives, especially as most reports don’t compare Codex to its rivals. That’s one reason I like to rely on my own Twitter reaction threads.

Then there’s Jules, Google’s coding assistant, which according to multiple sources is coming soon. Google will no doubt once again Fail Marketing Forever, but it seems highly plausible that Jules could be a better tool, and almost certain it will have a cheaper price tag.

What can it do?

Whatever those things are, it can do them fully in parallel. People seem to be underestimating this aspect of coding agents.

Alex Halliday: The killer feature of OpenAI Codex is parallelism.

Browser-based work is evolving: from humans handling tasks one tab at a time, to overseeing multiple AI agent tabs, providing feedback as needed.

The most important thing is the Task Relevant Maturity of these systems. You need to understand for which tasks systems like Codex can be used, which is a function of model capability and error tolerance. This is the “opportunity zone” for all AI systems, including ours @AirOpsHQ.

It can do legacy project migrations.

Flavio Adamo: I asked Codex to convert a legacy project from Python 2.7 to 3.11 and from Django 1.x to 5.0

It literally took 12 minutes. If you know, that’s usually weeks of pain. This is actually insane.

Haider: how much manual cleanup or review did it need after that initial pass?

Flavio Adamo: Not much, actually. Just a few Docker issues, solved in a couple of minutes.

Here’s Darwin Santos pumping out PRs and being very impressed.

Darwin Santos: Don’t mind us – it’s just @elvstejd and me knocking one PR after another with Codex. Thanks @embirico – @kevinweil. You weren’t joking with this being yet again a game changer.

Here’s Seconds being even more impressed, and sdmat being impressed with caveats.

0.005 Seconds: It’s incredible. The ux is mid and it’s missing features but the underlying model is so good that if you transported this to 2022 everyone would assume you have agi and put 70% of engineers into unemployment. 6 months of product engineering and it replaces teams.

It has been making insane progress in fairly complex scenarios on my personal project and I pretty effortlessly closed 7 tickets at work today. It obliterates small to medium tasks in familiar context.

Sdmat: Fantastic, though only part of what it will be and rough around the edges.

With no environment internet access, no agent search tool, and oriented to small-medium tasks it is currently a scalpel.

An excellent scalpel if you know what it is you want to cut.

Conrad Barski: this is right: its power is not that it can solve 50% of hard problems, it’s that it solves 99.9% of mid problems.

Sdmat: Exactly.

And mid problems comprise >90% of hard problems, so if you know what you are doing and can carve at the joints it is a very, very useful tool.

And here’s Riley Coyote being perhaps the most impressed, especially by the parallelism.

Riley Coyote: I’m *really* trying to play it cool here but like…

I’mma just say it: Codex might be the most impressive, most *powerful* AI product I’ve ever touched. all things considered. the async ability, especially, is on another level. like it’s not just a technical ‘leap’, it’s transcendent. I’ve used basically every ai coding tool and platform out there at least once, and nothing else is in the same class. it just works, ridiculously well. and I’ll admit, I didn’t want to like it. Maybe it’s stubborn loyalty to Claude – I love that retro GUI and the no-nonsense simplicity of Claude Code. There’s still something special there and I’ll always use it.

but, if I’m honest: that edge is kinda becoming irrelevant, because Codex feels like having a private, hyper-competent swarm – a crack team of 10/10 FS devs, but kinda *better* i think tbh.

it’s wild. at this rate, I might start shipping something new every single day, at least until I clear out my backlog (which, without exaggeration, is something like 35-40 ‘projects’ that are all ~70–85% done). this could not have come at a better time too. I desperately needed the combination of something like codex and much higher rate limits + a streamlined pipeline from my daily drive ai to db.

go try it out.

sidebar/tip: if you cant get over the initial hump, pop over to ai.studio.google.com and click the “build apps” button on the left hand side.

a bunch of sample apps and tools propagates and they’re actually really really really good one-click zero-shots essentially….

shits getting wild. and its only monday.

Bayram Annakov prefers Deep Research’s output for now on a sample task, but finds Codex to be promising as well, and it gets a B on an AI Product Engineer homework assignment.

Here’s Robbie Bouschery finding a bug in the first three minutes.

JB one-shots a doodle jump game and gets 600k likes for the post, so clearly money well spent. Paul Couvert does the same with Gemini 2.5, although objectively the platform placement seems better in Codex’s version. Upgrade?

Reliability will always be a huge sticking point, right up until it isn’t. Being highly autonomous only matters if you can trust it.

Fleischman Mena: I’m reticent to use it on feature work: ~unchanged benchmarks & results look like o3 bolted to a SWE-bench finetune + git.

You seem to still need to baby it w/ gold-set context for decent outputs, so it’s unclear where alpha is vs. current reprompt grinds.

It’s a nice “throw it in the bag, too” feature if you’re hitting GPT caps and don’t want to fan out to other services: But to me, it’s in the same category as task scheduling and the web agent: the “party trick” version of a better thing yet to come.

He points to a similar issue with Operator. I have access to Operator, but I don’t bother using it, largely because in many of the places where it is valuable it requires enough supervision I might as well do the job myself:

Henry: Does anyone use that ‘operator’ agent for anything?

Fleischman Mena: Not really.

Problem with web operators are that the REAL version of that product pretty much HAVE to be made by a sin-eater like the leetcode cheating startup.

Nobody wants “we build a web botting platform but it’s useless whenever lots of bots would have an impact.”

You pretty much HAVE to commit to “we’re going to sell you the ability to destroy the internet commons with bots”,

-or accept you’re only selling the “party trick” version of what this software would actually be if implemented “properly” for its users.

The few times I tried to use Operator to do something that would have been highly annoying to do myself, it fell down and died, and I decided that unless other people started reporting great results I’d rather just wait for similar agents to get better.

Alex Mizrahi reports Codex engaging in ‘busywork,’ identifying and fixing a ‘bug’ that wasn’t actually a bug.

Scott Swingle tries Codex out and compares it to Mentat. A theme throughout is that Mentat is more polished and faster, whereas Codex has to rerun a bunch of stuff. He likes o3 as the underlying model more than Sonnet 3.7, but finds the current implementation to not yet be up to par.

Lemonaut mostly doesn’t see the alpha over using some combination of Devin and Cursor/Cline, and finds it terribly finnicky and requiring hand holding in ways Cline and Devin aren’t, but does notice it solve a relatively difficult prompt. Again, that is compatible with o3 being a very good base model, but the implementation needing work.

People think about price all wrong.

Don’t think about relative price. Think about absolute benefits versus absolute price.

It doesn’t matter if ten times the price is ten times better. If ten times the price makes you 10% better, it’s an absolute steal.

Fleischman Mena: The sticking point is $2,160/year more than plus.

If you think Plus is a good deal at $240, the upgrade only makes sense if you GENUINELY believe

“This isn’t just better, it’s 10x better than plus, AND a better idea than subscribing to 9 other LLM pro plans.”

Seems dubious.

The $2,160 price issue is hard to ignore. That buys you ~43M o3 I/O tokens via API. War and Peace is ~750k tokens. Most codebases & outputs don’t come close.

If spend’s okay, you prob do better plugging an API key into a half dozen agent competitors; you’d still come out ahead.

The dollar price, even at the $200/month a level, is chump change for a programmer, relative to a substantial productivity gain. What matters is your time and your productivity. If this improves your productivity even a few percent over rival options, and there isn’t a principal-agent problem (aka you pay the cost and someone else gets the productivity gains), then it is worthwhile. So ask whether or not it does that.
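The break-even arithmetic is simple. A sketch; the salary figure below is my assumption, not from the post:

```python
# Fraction of productivity gain needed for a tool subscription to pay for
# itself, ignoring principal-agent issues. The salary is an assumed example.
def breakeven_gain(tool_cost_per_year: float, salary_per_year: float) -> float:
    return tool_cost_per_year / salary_per_year

# $200/month against an assumed $150,000 salary:
# breakeven_gain(2400, 150_000) -> 0.016, i.e. a 1.6% gain already pays off.
```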

The other way this is the wrong approach is that it is only part of the $200/month package. You also get unlimited o3 and deep research use, among other products, which was previously the main attraction.

As a company, you are paying six figures for a programmer. Give them the best tools you can, whether or not this is the best tool.

This seems spot on to me:

Sully: I think agents are going to be split into 2 categories

Background & active

Background agents = stuff I don’t want to do (ux/speed doesn’t matter, but review + feedback does)

“Active agents” = things I want to do but 10x faster with agents (ux/speed matters, most apps are this)

Mat Ferrante: And I think they will be able to integrate with each other. Background leverages active one to execute quick stuff just like a user would. Active kicking off background tasks.

Sully: 100%.

Codex is currently in a weird spot. It wants to be background (or async) and is great at being async, but requires too much hand holding to let you actually ignore it for long. Once that is solved, things get a lot more interesting.

The Codex of Ultimate Vibing Read More »

biotech-company-regeneron-to-buy-bankrupt-23andme-for-$256m

Biotech company Regeneron to buy bankrupt 23andMe for $256M

Biotechnology company Regeneron will acquire 23andMe out of bankruptcy for $256 million, with a plan to keep the DNA-testing company running without interruption and uphold its privacy-protection promises.

In its announcement of the acquisition, Regeneron assured 23andMe’s 15 million customers that their data—including genetic and health information, genealogy, and other sensitive personal information—would be safe and in good hands. Regeneron aims to use the large trove of genetic data to further its own work using genetics to develop medical advances—something 23andMe tried and failed to do.

“As a world leader in human genetics, Regeneron Genetics Center is committed to and has a proven track record of safeguarding the genetic data of people across the globe, and, with their consent, using this data to pursue discoveries that benefit science and society,” Aris Baras, senior vice president and head of the Regeneron Genetics Center, said in a statement. “We assure 23andMe customers that we are committed to protecting the 23andMe dataset with our high standards of data privacy, security, and ethical oversight and will advance its full potential to improve human health.”

Baras said that the Regeneron Genetics Center already has its own genetic dataset from nearly 3 million people.

The safety of 23andMe’s dataset has drawn considerable concern among consumers, lawmakers, and regulators amid the company’s downfall. For instance, in March, California Attorney General Rob Bonta made the unusual move to urge Californians to delete their genetic data amid 23andMe’s financial distress. Federal Trade Commission Chairman Andrew Ferguson also weighed in, making clear in a March letter that “any purchaser should expressly agree to be bound by and adhere to the terms of 23andMe’s privacy policies and applicable law.”

Biotech company Regeneron to buy bankrupt 23andMe for $256M Read More »

f1-in-imola-reminds-us-it’s-about-strategy-as-much-as-a-fast-car

F1 in Imola reminds us it’s about strategy as much as a fast car


Who went home happy from Imola and why? F1’s title race heats up.

In Italy there are two religions, and one of them is Ferrari. Credit: Ryan Pierse/Getty Images

Formula 1’s busy 2025 schedule saw the sport return to its European heartland this past weekend. Italy has two races on the calendar this year, and this was the first, the (deep breath) “Formula 1 AWS Gran Premio Del Made in Italy e Dell’Emilia-Romagna,” which took place at the scenic and historic (another deep breath) Autodromo Enzo e Dino Ferrari, better known as Imola. It’s another of F1’s old-school circuits where overtaking is far from easy, particularly when the grid is as closely matched as it is. But Sunday’s race was no snoozer, and for a couple of teams, there was a welcome change in form.

Red Bull was one. The team has looked a bit shambolic at times this season, with some wondering whether this change in form was the result of a number of high-profile staff departures toward the end of last season. Things looked pretty bleak during the first of three qualifying sessions, when Yuki Tsunoda got too aggressive with a curb and, rather than finding lap time, found himself in a violent crash that tore all four corners off the car and relegated him to starting the race last from the pit lane.

2025 has also been trying for Ferrari. Italy expects a lot from the red team, and the replacement of Mattia Binotto with Frédéric Vasseur as team principal was supposed to result in Maranello challenging for championships. Signing Lewis Hamilton, a bona fide superstar with seven titles already on his CV, hasn’t exactly reduced the amount of pressure on Scuderia Ferrari, either.

Ferrari team principal Frédéric Vasseur. Credit: Alessio Morgese/NurPhoto via Getty Images

Lewis Hamilton was much closer to teammate Charles Leclerc this weekend, which will be encouraging to everyone. After Hamilton’s exclusion from the Chinese Grand Prix, he has had to run a higher ride height, which has cost him speed relative to his younger teammate. Now it looks like he’s getting a handle on the car and lost out to Leclerc by 0.06 seconds in Q1 and 0.16 seconds in Q2. Unfortunately, Leclerc’s time was only good for 11th, and Hamilton’s was only good for 12th.

Sunday brought smiles for the Red Bull and Ferrari teams. In the hands of Verstappen, the Red Bull was about as fast as the black-and-orange McLarens, and while second was the best Verstappen could do in qualifying, the gap to McLaren’s Oscar Piastri was measured in the hundredths of a second.

Verstappen’s initial start from the line looked unremarkable, too—the Mercedes of George Russell seemed more of a threat to the pole man. But Verstappen saw an opportunity and drove around the outside almost before Piastri even registered he was there, seizing the lead of the race. Once the Red Bull driver was in clean air, he was able to stretch the gap to Piastri.

Oscar Piastri is seen here in the lead, but it wouldn’t last more than a corner. Credit: Mark Thompson/Getty Images

Getting past someone is notoriously hard at Imola. In a 2005 classic, Fernando Alonso held off Michael Schumacher’s much faster car for the entire race. Even though the cars are larger and heavier now and more closely matched, overtaking was still possible, like Norris’ pass on Russell.

Undercut? Overcut?

But when overtaking is as hard as it is at a track like Imola, teams will try to use strategy to pass each other with pit stops. Each driver has to make at least one pit stop, as drivers are required to use two different tire compounds during the race. But depending on other factors, like how much the tires degrade, a team might decide to do two or even three stops—the lap time lost in the pits by stopping more often can be less than the time lost running on worn-out rubber.

In recent years, the word “undercut” has crept into F1 vocab, and no, it doesn’t refer to the hairstyles favored by the more flamboyant drivers in the paddock. To undercut a rival means to make your pit stop before them and then, on fresh tires and with a clear track ahead, set fast lap after fast lap so that when your rival makes their stop, they emerge from the pits behind you.
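Stripped of the jargon, the undercut reduces to a single comparison, which a toy model makes explicit (all numbers here are invented for illustration, not from any real race):

```python
# Toy model of the undercut: after pitting early, a driver gains
# `fresh_tire_gain_s` seconds per lap over a rival still on worn tires.
# The undercut works if those gains close the pre-stop gap before the
# rival makes their own stop (both drivers pay the same pit-lane loss,
# so it cancels out of the comparison).
def undercut_works(gap_s: float, fresh_tire_gain_s: float,
                   laps_until_rival_pits: int) -> bool:
    return fresh_tire_gain_s * laps_until_rival_pits > gap_s

# Trailing by 2.5s, gaining ~1.2s/lap on fresh rubber, rival pits 3 laps later:
print(undercut_works(2.5, 1.2, 3))  # True: 3.6s gained > 2.5s gap
```

This also shows why the undercut fails when fresh tires offer little advantage, or when the rival reacts by pitting on the very next lap.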

The undercut doesn’t always work, but in Imola, it initially looked like it did. Charles Leclerc stopped on lap 10 and leapfrogged Russell’s Mercedes, as well as his former Ferrari teammate and now Williams driver Carlos Sainz. Since Piastri wasn’t closing on Verstappen up front, McLaren decided to bring him in for an early stop.

Verstappen’s wins this season are far from inevitable. Credit: Clive Rose/Getty Images

But his advantage on new tires was not enough to eat into Verstappen’s margin, and he did not emerge in clean air but rather had to overtake car after car on track as he sought to regain his position ahead of those who hadn’t stopped. Sometimes, a strategy is the wrong one.

McLaren’s other driver, Lando Norris, couldn’t make a dent in Red Bull’s race, either. Having recognized the two-stop undercut wouldn’t work, Norris had stayed out, but he was almost 10 seconds behind Verstappen when it was finally time to change tires on lap 29. Shortly afterward, Esteban Ocon pulled his Haas to the side of the track with a powertrain failure, triggering a virtual safety car. With all the cars required to drive around at a prescribed, reduced pace, Verstappen was able to take his pit stop while only losing half as much time as anyone who stopped under green flag conditions.

Victory required a little more. Kimi Antonelli’s Mercedes also ground to a halt in a position that required a full safety car. With some on fresh rubber and others not, there were battles aplenty, but Verstappen wasn’t involved in any and won by seven seconds over Norris, with the recovering Piastri a few more seconds down the road.

Meanwhile, Hamilton had been having a pretty good Sunday of his own. Although he started 12th, he finished fourth, to the delight of the partisan, flag-waving crowd. Some of that was thanks to Leclerc coming together with the Williams of Alex Albon; after that on-track scuffle was sorted, Albon lay fifth, with Leclerc sixth. Albon had a right to feel aggrieved at losing fourth place, though he still equalled his best finish of the year.

A fine fourth and a sixth were redemption for the Tifosi. Credit: Bryn Lennon – Formula 1/Formula 1 via Getty Images

Leclerc needed to cede the place to Albon, but at the same time, his complaint about the amount of rules-lawyering that now accompanies every bit of wheel-to-wheel action is getting a bit tedious. If F1 isn’t careful, the rulebook will end up being too constraining, with drivers playing to the letter even if it’s bad for the sport and the show. And sixth place was still a decent result from 11th; the championships already look out of reach for Ferrari for 2025, but at least it’s in no danger of being overtaken by Williams in the tables, even if that is a threat on track.

McLaren is already at 279 points in the constructors’ championship, 132 points ahead of next-best Mercedes, so the constructors’ cup is looking somewhat secure. Things are a lot closer in the drivers’ standings, with Piastri on 146, Norris on 133, and Verstappen still entirely in the fight with 124 points.

Next weekend, it’s time for the Monaco Grand Prix.

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

F1 in Imola reminds us it’s about strategy as much as a fast car Read More »

sierra-made-the-games-of-my-childhood.-are-they-still-fun-to-play?

Sierra made the games of my childhood. Are they still fun to play?


Get ready for some nostalgia.

My Ars colleagues were kicking back at the Orbital HQ water cooler the other day, and—as gracefully aging gamers are wont to do—they began to reminisce about classic Sierra On-Line adventure games. I was a huge fan of these games in my youth, so I settled in for some hot buttered nostalgia.

Would we remember the limited-palette joys of early King’s Quest, Space Quest, or Quest for Glory titles? Would we branch out beyond games with “Quest” in their titles, seeking rarer fare like Freddy Pharkas: Frontier Pharmacist? What about the gothic stylings of The Colonel’s Bequest or the voodoo-curious Gabriel Knight?

Nope. The talk was of acorns. [Bleeping] acorns, in fact!

The scene in question came from King’s Quest III, where our hero Gwydion must acquire some exceptionally desiccated acorns to advance the plot. It sounds simple enough. As one walkthrough puts it, “Go east one screen and north one screen to the acorn tree. Try picking up acorns until you get some dry ones. Try various spots underneath the tree.” Easy! And clear!

Except it wasn’t either one because the game rather notoriously won’t always give you the acorns, even when you enter the right command. This led many gamers to believe they were in the wrong spot, when in reality, they just had to keep entering the “get acorns” command while moving pixel by pixel around the tree until the game finally supplied them. One of our staffers admitted to having purchased the King’s Quest III hint book solely because of this “puzzle.” (The hint book, which is now online, says that players should “move around” the particular oak tree in question because “you can only find the right kind of acorns in one spot.”)

This wasn’t quite the “fun” I had remembered from these games, but as I cast my mind back, I dimly began to recall similar situations. Space Quest II: Vohaul’s Revenge had been my first Sierra title, and after my brother and I spent weeks on the game only to get stuck and die repeatedly in some pitch-dark tunnels, we implored my dad to call Sierra’s 1-900 pay hint line. He thought about it. I could see it pained him because he had never before (and never since!) called a 1-900 number in his life. In this case, the call cost a piratical 75 cents for the first minute and 50 cents for each additional minute. But after listening to us whine for several days straight, my dad decided that his sanity was worth the fee, and he called.

Much like with the acorn example above, we had known what to do—we had just not done it to the game’s rather exacting and sometimes obscure standards. The key was to use a glowing gem as a light source, which my brother and I had long understood. The problem was the text parser, which demanded that we “put gem in mouth” to use its light in the tunnels. There was no other place to put the gem, no other way to hold or attach it. (We tried them all.) No other attempts to use the light of this shining crystal, no matter how clear, well-intentioned, or succinctly expressed, would work. You put the gem in your mouth, or you died in the darkness.

Returning from my reveries to the conversation at hand, I caught Ars Senior Editor Lee Hutchinson’s cynical remark that these kinds of puzzles were “the only way to make 2–3 hours of ‘game’ last for months.” This seemed rather shocking, almost offensive. How could one say such a thing about the games that colored my memories of childhood?

So I decided to replay Space Quest II for the first time in 35 years in an attempt to defend my own past.

Big mistake.

Space Quest II screenshot.

We’re not on Endor anymore, Dorothy.

Play it again, Sam

In my memory, the Space Quest series was filled with sharply written humor, clever puzzles, and enchanting art. But when I fired up the original version of the game, I found that only one of these was true. The art, despite its blockiness and limited colors, remained charming.

As for the gameplay, the puzzles were not so much “clever” as “infuriating,” “obvious,” or (more often) “rather obscure.”

Finding the glowing gem discussed above requires you to swim into one small spot of a multi-screen river, with no indication in advance that anything of importance is in that exact location. Trying to “call” a hunter who has captured you does nothing… until you do it a second time. And the less said about trying to throw a puzzle at a Labian Terror Beast, typing out various word permutations while death bears down upon you, the better.

The whole game was also filled with far more no-warning insta-deaths than I had remembered. On the opening screen, for instance, after your janitorial space-broom floats off into the cosmic ether, you can walk your character right off the edge of the orbital space station he is cleaning. The game doesn’t stop you; indeed, it kills you and then mocks you for “an obvious lack of common sense.” It then calls you a “wing nut” with an “inability to sustain life.” Game over.

The game’s third screen, which offers nothing to do beyond simply walking around, will also kill you in at least two different ways. Walk into the room still wearing your spacesuit and your boss will come over and chew you out. Game over.

If you manage to avoid that fate by changing into your indoor uniform first, it’s comically easy to tap the wrong arrow key and fall off the room’s completely guardrail-free elevator platform. Game over.

Space Quest II screenshot.

Do NOT touch any part of this root monster.

Get used to it because the game will kill you in so, so many ways: touching any single pixel of a root monster whose branches form a difficult maze; walking into a giant mushroom; stepping over an invisible pit in the ground; getting shot by a guard who zips in on a hovercraft; drowning in an underwater tunnel; getting swiped at by some kind of giant ape; not putting the glowing gem in your mouth; falling into acid; and many more.

I used the word “insta-death” above, but the game is not even content with this. At one key point late in the game, a giant Aliens-style alien stalks the hallways, and if she finds you, she “kisses” you. But then she leaves! You are safe after all! Of course, if you have seen the films, you will recognize that you are not safe, but the game lets you go on for a bit before the alien’s baby inevitably bursts from your chest, killing you. Game over.

This is why the official hint book suggests that you “save your game a lot, especially when it seems that you’re entering a dangerous area. That way, if you die, you don’t have to retrace your steps much.” Presumably, this was once considered entertaining.

When it comes to the humor, most of it is broad. (When you are told to “say the word,” you have to say “the word.”) Sometimes it is condescending. (“You quickly glance around the room to see if anyone saw you blow it.”) Or it might just be potty jokes. (Plungers, jock straps, toilet paper, alien bathrooms, and fouling one’s trousers all make appearances.)

My total gameplay time: a few hours.

“By Grabthar’s hammer!” I thought. “Lee was right!”

When I admitted this to him, Lee told me that he had actually spent time learning to speedrun the Space Quest games during the pandemic. “According to my notes, a clean run of SQ2 in ‘fast’ mode—assuming good typing skills—takes about 20 minutes straight-up,” he said. Yikes.

Space Quest II screenshot.

What a fiendish plot!

And yet

The past was a different time. Computer memory was small, graphics capabilities were low, and computer games had emerged from the “let them live just long enough to encourage spending another quarter” arcade model. Mouse adoption took a while; text parsers made sense even though they created plenty of frustration. So yes—some of these games were a few hours of gameplay stretched out with insta-death, obscure puzzles, and the sheer amount of time it took just to walk across the game’s various screens. (Seriously, “walking around” took a ridiculous amount of the game’s playtime, especially when a puzzle made you backtrack three screens, type some command, and then return.)

Space Quest II screenshot.

Let’s get off this rock.

Judged by current standards, the Sierra games are no longer what I would play for fun.

All the same, I loved them. They introduced me to the joy of exploring virtual worlds and to the power of evocative artwork. I went into space, into fairy tales, and into the past, and I did so while finding the games’ humor humorous and their plotlines compelling. (“An army of life insurance salesmen?” I thought at the time. “Hilarious and brilliant!”)

If the games can feel a bit arbitrary or vexing today, my child-self’s love of repetition let me treat them as engaging challenges rather than “unfair” design.

Replaying Space Quest II, encountering the half-remembered jokes and visual designs, brought back these memories. The novelist Thomas Wolfe knew that you can’t go home again, and it was probably inevitable that the game would feel dated to me now. But playing it again did take me back to that time before the Internet, when not even hint lines, insta-death, and EGA graphics could dampen the wonder of the new worlds computers were capable of showing us.

Space Quest II screenshot.

Literal bathroom humor.

Space Quest II, along with several other Sierra titles, is freely and legally available online at sarien.net—though I found many, many glitches in the implementation. Windows users can buy the entire Space Quest collection through Steam or Good Old Games. There’s even a fan remake that runs on macOS, Windows, and Linux.

Sierra made the games of my childhood. Are they still fun to play? Read More »

after-latest-kidnap-attempt,-crypto-types-tell-crime-bosses:-transfers-are-traceable

After latest kidnap attempt, crypto types tell crime bosses: Transfers are traceable

The sudden spike in copycat attacks in France, Belgium, and Spain over the last few months suggests that crypto robbery as a tactic has caught the attention of organized crime. (This week’s abduction attempt is already being investigated by the organized crime unit of the Parisian police.)

Crypto industry insiders seem convinced that organized crime likes these attacks because of a (mistaken) belief that crypto transfers are untraceable. So people like Chainalysis CEO Jonathan Levin are trying to clue in the crime bosses.

“For whatever reason, there is a perception that’s out there that crypto is an asset that is untraceable, and that really lends itself to criminals acting in a certain way,” Levin said at a recent conference covered by the trade publication Cointelegraph.

“Apparently, the [knowledge] that crypto is not untraceable hasn’t been received by some of the organized crime groups that are actually perpetrating these attacks, and some of them are concentrated in, you know, France, but not exclusively.”

After latest kidnap attempt, crypto types tell crime bosses: Transfers are traceable Read More »

apple’s-new-carplay-ultra-is-ready,-but-only-in-aston-martins-for-now

Apple’s new CarPlay Ultra is ready, but only in Aston Martins for now

It’s a few years later than we were promised, but an advanced new version of Apple CarPlay is finally here. CarPlay is Apple’s way of casting a phone’s video and audio to a car’s infotainment system, but with CarPlay Ultra it gets a big upgrade. Now, in addition to displaying compatible iPhone apps on the car’s center infotainment screen, CarPlay Ultra will also take over the main instrument panel in front of the driver, replacing the OEM-designed dials like the speedometer and tachometer with a number of different Apple designs instead.

“iPhone users love CarPlay and it has changed the way people interact with their vehicles. With CarPlay Ultra, together with automakers we are reimagining the in-car experience and making it even more unified and consistent,” said Bob Borchers, vice president of worldwide marketing at Apple.

However, to misquote William Gibson, CarPlay Ultra is unevenly distributed. In fact, if you want it today, you’re going to have to head over to the nearest Aston Martin dealership. To begin with, it’s only rolling out in North America with Aston Martin, inside the DBX SUV, as well as the DB12, Vantage, and Vanquish sports cars. It’s standard on all new orders, the automaker says, and will be available as a dealer-performed update for existing Aston Martins with the company’s in-house 10.25-inch infotainment system in the coming weeks.

“The next generation of CarPlay gives drivers a smarter, safer way to use their iPhone in the car, deeply integrating with the vehicle while maintaining the very best of the automaker. We are thrilled to begin rolling out CarPlay Ultra with Aston Martin, with more manufacturers to come,” Borchers said.

Apple’s new CarPlay Ultra is ready, but only in Aston Martins for now Read More »

xai’s-grok-suddenly-can’t-stop-bringing-up-“white-genocide”-in-south-africa

xAI’s Grok suddenly can’t stop bringing up “white genocide” in South Africa

Where could Grok have gotten these ideas?

The treatment of white farmers in South Africa has been a hobbyhorse of South African X owner Elon Musk for quite a while. In 2023, he responded to a video purportedly showing crowds chanting “kill the Boer, kill the White Farmer” with a post accusing South African President Cyril Ramaphosa of remaining silent while people “openly [push] for genocide of white people in South Africa.” Musk was posting other responses focusing on the issue as recently as Wednesday.

They are openly pushing for genocide of white people in South Africa. @CyrilRamaphosa, why do you say nothing?

— gorklon rust (@elonmusk) July 31, 2023

President Trump has long shown an interest in this issue as well, saying in 2018 that he was directing then Secretary of State Mike Pompeo to “closely study the South Africa land and farm seizures and expropriations and the large scale killing of farmers.” More recently, Trump granted “refugee” status to dozens of white Afrikaners, even as his administration ends protections for refugees from other countries.

Former American Ambassador to South Africa and Democratic politician Patrick Gaspard posted in 2018 that the idea of large-scale killings of white South African farmers is a “disproven racial myth.”

In launching the Grok 3 model in February, Musk said it was a “maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct.” X’s “About Grok” page says that the model is undergoing constant improvement to “ensure Grok remains politically unbiased and provides balanced answers.”

But the recent turn toward unprompted discussions of alleged South African “genocide” has many questioning what kind of explicit adjustments Grok’s political opinions may be getting from human tinkering behind the curtain. “The algorithms for Musk products have been politically tampered with nearly beyond recognition,” journalist Seth Abramson wrote in one representative skeptical post. “They tweaked a dial on the sentence imitator machine and now everything is about white South Africans,” a user with the handle Guybrush Threepwood glibly theorized.

Representatives from xAI were not immediately available to respond to a request for comment from Ars Technica.

xAI’s Grok suddenly can’t stop bringing up “white genocide” in South Africa Read More »

google-deepmind-creates-super-advanced-ai-that-can-invent-new-algorithms

Google DeepMind creates super-advanced AI that can invent new algorithms

Google’s DeepMind research division claims its newest AI agent marks a significant step toward using the technology to tackle big problems in math and science. The system, known as AlphaEvolve, is based on the company’s Gemini large language models (LLMs), with the addition of an “evolutionary” approach that evaluates and improves algorithms across a range of use cases.

AlphaEvolve is essentially an AI coding agent, but it goes deeper than a standard Gemini chatbot. When you talk to Gemini, there is always a risk of hallucination, where the AI makes up details due to the non-deterministic nature of the underlying technology. AlphaEvolve uses an interesting approach to increase its accuracy when handling complex algorithmic problems.

According to DeepMind, this AI uses an automatic evaluation system. When a researcher interacts with AlphaEvolve, they input a problem along with possible solutions and avenues to explore. The model generates multiple possible solutions, using the efficient Gemini Flash and the more detail-oriented Gemini Pro, and then each solution is analyzed by the evaluator. An evolutionary framework allows AlphaEvolve to focus on the best solution and improve upon it.
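The generate-evaluate-select loop described above can be sketched in miniature. This is an illustrative toy, not DeepMind’s actual system: a random mutator stands in for the Gemini Flash/Pro generation step, and a simple numeric scorer stands in for the automatic evaluator.

```python
# Toy version of an evolutionary generate-evaluate-select loop.
# `propose_solutions` stands in for the LLM generation step;
# `evaluate` stands in for the automatic evaluator.
import random

def propose_solutions(parent: list[float], n: int = 8) -> list[list[float]]:
    """Stand-in for the model: produce n mutated variants of the parent."""
    return [[x + random.gauss(0, 0.1) for x in parent] for _ in range(n)]

def evaluate(candidate: list[float]) -> float:
    """Stand-in evaluator: higher is better (here, closeness to zero)."""
    return -sum(x * x for x in candidate)

def evolve(seed: list[float], generations: int = 50) -> list[float]:
    best = seed
    for _ in range(generations):
        candidates = propose_solutions(best) + [best]  # keep the incumbent
        best = max(candidates, key=evaluate)           # evaluator picks winner
    return best

result = evolve([1.0, -2.0])
print(evaluate(result))  # score improves toward 0 over the generations
```

Because the incumbent is always kept, the score never regresses; the real system presumably applies the same principle with far richer candidate programs and domain-specific evaluators.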

Credit: Google DeepMind

Many of the company’s past AI systems, for example, the protein-folding AlphaFold, were trained extensively on a single domain of knowledge. AlphaEvolve, however, is more dynamic. DeepMind says AlphaEvolve is a general-purpose AI that can aid research in any programming or algorithmic problem. And Google has already started to deploy it across its sprawling business with positive results.

Google DeepMind creates super-advanced AI that can invent new algorithms Read More »