AlphaGeometry

Google claims math breakthrough with proof-solving AI models

slow and steady —

AlphaProof and AlphaGeometry 2 solve problems, with caveats on time and human assistance.

An illustration provided by Google.

On Thursday, Google DeepMind announced that AI systems called AlphaProof and AlphaGeometry 2 reportedly solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medal. The tech giant claims this marks the first time an AI has reached this level of performance in the prestigious math competition—but as usual in AI, the claims aren’t as clear-cut as they seem.

Google says AlphaProof uses reinforcement learning to prove mathematical statements in the formal language called Lean. The system trains itself by generating and verifying millions of proofs, progressively tackling more difficult problems. Meanwhile, AlphaGeometry 2 is described as an upgraded version of Google’s previous geometry-solving AI model, now powered by a Gemini-based language model trained on significantly more data.
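For readers unfamiliar with Lean, a statement proved in it looks something like the following toy theorem (our own illustration, not AlphaProof output; the point is that Lean mechanically checks that the proof really establishes the statement):

```lean
-- Illustrative only: a trivial statement and proof in Lean 4.
-- Lean's kernel verifies that the proof term matches the claim,
-- which is what lets a system "verify millions of proofs."
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```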

According to Google, prominent mathematicians Sir Timothy Gowers and Dr. Joseph Myers scored the AI model’s solutions using official IMO rules. The company reports its combined system earned 28 out of 42 possible points, just shy of the 29-point gold medal threshold. This included a perfect score on the competition’s hardest problem, which Google claims only five human contestants solved this year.

A math contest unlike any other

The IMO, held annually since 1959, pits elite pre-college mathematicians against exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Performance on IMO problems has become a recognized benchmark for assessing an AI system’s mathematical reasoning capabilities.

Google states that AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 tackled the geometry question. The systems reportedly failed to solve either of the two combinatorics problems. The company claims one problem was solved within minutes, while others took up to three days.

Google says it first translated the IMO problems into formal mathematical language for its AI model to process. This step differs from the official competition, where human contestants work directly with the problem statements during two 4.5-hour sessions.

Google reports that before this year’s competition, AlphaGeometry 2 could solve 83 percent of historical IMO geometry problems from the past 25 years, up from its predecessor’s 53 percent success rate. The company claims the new system solved this year’s geometry problem in 19 seconds after receiving the formalized version.

Limitations

Despite Google’s claims, Sir Timothy Gowers offered a more nuanced perspective on the Google DeepMind models in a thread posted on X. While acknowledging the achievement as “well beyond what automatic theorem provers could do before,” Gowers pointed out several key qualifications.

“The main qualification is that the program needed a lot longer than the human competitors—for some of the problems over 60 hours—and of course much faster processing speed than the poor old human brain,” Gowers wrote. “If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.”

Gowers also noted that humans manually translated the problems into the formal language Lean before the AI model began its work. He emphasized that while the AI performed the core mathematical reasoning, this “autoformalization” step was done by humans.

Regarding the broader implications for mathematical research, Gowers expressed uncertainty. “Are we close to the point where mathematicians are redundant? It’s hard to say. I would guess that we’re still a breakthrough or two short of that,” he wrote. He suggested that the system’s long processing times indicate it hasn’t “solved mathematics” but acknowledged that “there is clearly something interesting going on when it operates.”

Even with these limitations, Gowers speculated that such AI systems could become valuable research tools. “So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren’t too difficult—the kind of thing one can do in a couple of hours. That would be massively useful as a research tool, even if it wasn’t itself capable of solving open problems.”

DeepMind AI rivals the world’s smartest high schoolers at geometry

Demis Hassabis, CEO of DeepMind Technologies and developer of AlphaGo, attends the AI Safety Summit at Bletchley Park on November 2, 2023 in Bletchley, England.

A system developed by Google’s DeepMind has set a new record for AI performance on geometry problems. DeepMind’s AlphaGeometry managed to solve 25 of the 30 geometry problems drawn from the International Mathematical Olympiad between 2000 and 2022.

That puts the software ahead of the vast majority of young mathematicians and just shy of IMO gold medalists. DeepMind estimates that the average gold medalist would have solved 26 out of 30 problems. Many view the IMO as the world’s most prestigious math competition for high school students.

“Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions,” DeepMind writes. To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.
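That division of labor can be sketched in a few lines of Python. This is a structural illustration under our own simplified assumptions, not DeepMind's code: `deduce` stands in for the symbolic engine that exhausts rule-based inferences, and `propose_construction` plays the language model that suggests new elements when deduction stalls.

```python
def solve(premises, goal, propose_construction, deduce, max_steps=10):
    """Alternate a symbolic deduction engine with LM-proposed auxiliary
    constructions, in the spirit of the loop DeepMind describes.
    Both callables are stand-ins: `deduce` closes a fact set under
    fixed rules; `propose_construction` plays the language model."""
    facts = set(premises)
    for _ in range(max_steps):
        facts = deduce(facts)                    # exhaust symbolic inferences
        if goal in facts:
            return True
        facts.add(propose_construction(facts))   # add one auxiliary element
    return goal in deduce(facts)

# Toy demo: the goal only becomes derivable after an auxiliary
# element ("aux", e.g. a midpoint) is added to the figure.
def toy_deduce(facts):
    out = set(facts)
    if "premise" in out and "aux" in out:
        out.add("goal")
    return out

print(solve({"premise"}, "goal", lambda facts: "aux", toy_deduce))  # True
```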

The research was led by Trieu Trinh, a computer scientist who recently earned his PhD from New York University. He was a resident at DeepMind between 2021 and 2023.

Evan Chen, a former Olympiad gold medalist who evaluated some of AlphaGeometry’s output, praised it as “impressive because it’s both verifiable and clean.” Whereas some earlier software generated complex geometry proofs that were hard for human reviewers to understand, the output of AlphaGeometry is similar to what a human mathematician would write.

AlphaGeometry is part of DeepMind’s larger project to improve the reasoning capabilities of large language models by combining them with traditional search algorithms. DeepMind has published several papers in this area over the last year.

How AlphaGeometry works

Let’s start with a simple example shown in the AlphaGeometry paper, which was published by Nature on Wednesday:

The goal is to prove that if a triangle has two equal sides (AB and AC), then the angles opposite those sides will also be equal. We can do this by creating a new point D at the midpoint of the third side of the triangle (BC). It’s easy to show that all three sides of triangle ABD are the same length as the corresponding sides of triangle ACD, so the two triangles are congruent (by the side-side-side criterion), and congruent triangles have equal corresponding angles.
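The chain of inferences in that paragraph can be written out as a numbered proof sketch (our formatting of the standard argument):

```latex
\begin{enumerate}
  \item $AB = AC$ \hfill (given)
  \item $BD = DC$ \hfill ($D$ constructed as the midpoint of $BC$)
  \item $AD = AD$ \hfill (common side)
  \item $\triangle ABD \cong \triangle ACD$ \hfill (SSS, from 1--3)
  \item $\angle ABD = \angle ACD$ \hfill (corresponding angles of congruent triangles)
\end{enumerate}
```

Step 2 is the "auxiliary point" move discussed below: the midpoint $D$ appears nowhere in the problem statement, yet the proof hinges on introducing it.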

Geometry problems from the IMO are much more complex than this toy problem, but fundamentally, they have the same structure. They all start with a geometric figure and some facts about the figure like “side AB is the same length as side AC.” The goal is to generate a sequence of valid inferences that conclude with a given statement like “angle ABC is equal to angle BCA.”

For many years, we’ve had software that can generate lists of valid conclusions that can be drawn from a set of starting assumptions. Simple geometry problems can be solved by “brute force”: mechanically listing every possible fact that can be inferred from the given assumption, then listing every possible inference from those facts, and so on until you reach the desired conclusion.
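This brute-force forward chaining can be sketched in a few lines of Python. The fact strings and rule format are our own illustration, not any real prover's representation:

```python
def forward_chain(facts, rules, goal, max_rounds=100):
    """Repeatedly apply inference rules until the goal appears or no
    new facts can be derived. `rules` is a list of (premises, conclusion)
    pairs: a frozenset of facts that jointly imply one new fact."""
    known = set(facts)
    for _ in range(max_rounds):
        new = {concl for premises, concl in rules
               if premises <= known and concl not in known}
        if goal in known | new:
            return True
        if not new:          # fixed point: nothing left to derive
            return False
        known |= new
    return goal in known

# Toy rules standing in for real geometry axioms: side equalities
# give congruent triangles, which give equal angles.
rules = [
    (frozenset({"AB=AC", "BD=DC", "AD=AD"}), "ABD≅ACD"),
    (frozenset({"ABD≅ACD"}), "∠ABD=∠ACD"),
]
print(forward_chain({"AB=AC", "BD=DC", "AD=AD"}, rules, "∠ABD=∠ACD"))  # True
```

The catch, as the next paragraph explains, is that the set of derivable facts grows explosively on hard problems, and no rule in a fixed set can invent a new point like D.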

But this kind of brute-force search isn’t feasible for an IMO-level geometry problem because the search space is too large. Not only do harder problems require longer proofs, but sophisticated proofs often require the introduction of new elements to the initial figure—as with point D in the above proof. Once you allow for these kinds of “auxiliary points,” the space of possible proofs explodes and brute-force methods become impractical.
