DeepMind's AlphaProof AI achieves silver at math olympiad

Google DeepMind has developed AlphaProof, an AI system that matched silver medal performance at the 2024 International Mathematical Olympiad, scoring 28 out of 42 points. The system fell just one point short of gold in the world's most prestigious math competition for high school students. The result demonstrates a significant advance in AI's ability to construct complex mathematical proofs.

Computers have long excelled at calculation but struggled with the logical reasoning required for advanced mathematics. Google DeepMind's new AI, AlphaProof, addresses this gap, achieving silver medalist performance at the 2024 International Mathematical Olympiad (IMO), the top high school-level math competition. The 2024 IMO featured six problems worth seven points each, for a maximum of 42 points. Gold medals required 29 or more points and went to 58 of 609 participants; silver went to the 123 contestants scoring 22 to 28 points. AlphaProof scored 28 points, solving three problems on its own for 21 points, while the specialized AlphaGeometry 2 system solved the geometry problem for the remaining 7, placing the combined system at the top of the silver range.
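In points, the near miss comes down to simple arithmetic:

```latex
\underbrace{3 \times 7}_{\text{AlphaProof}} \;+\; \underbrace{1 \times 7}_{\text{AlphaGeometry 2}} \;=\; 28 \;<\; 29 \;=\; \text{gold cutoff}
```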

AlphaProof's development began with overcoming a shortage of training data for formal mathematical proofs. The team built on Lean, a proof assistant in which mathematical statements and proofs are written in a precise formal language and checked mechanically. To expand the dataset, they trained a Gemini large language model to translate natural-language math statements into Lean, generating about 80 million formalized statements. As Thomas Hubert, DeepMind researcher and lead author, explained, “The major difficulty of working with formal languages is that there’s very little data.” Even imperfect translations proved useful: “There are many ways you can capitalize on approximate translations,” he added.
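To make the idea concrete, here is an illustrative pairing of an informal statement with a Lean 4 formalization using the Mathlib library. This example is not from DeepMind's dataset; it simply shows the kind of statement the translation step produces, and that Lean only accepts a theorem once the proof mechanically checks:

```lean
import Mathlib

-- Informal statement: "for all natural numbers a and b,
-- (a + b)² = a² + 2ab + b²."
-- The `ring` tactic proves the goal by normalizing both sides
-- as polynomials; Lean verifies every step mechanically.
theorem sq_add (a b : ℕ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```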

The system's architecture draws from DeepMind's AlphaZero, combining a neural network trained through trial and error with a tree search algorithm to explore proof paths. It rewards correct proofs and penalizes inefficient steps, promoting elegant solutions. A novel addition, Test-Time Reinforcement Learning (TTRL), allows AlphaProof to generate variations of tough problems for on-the-fly training, mimicking human problem-solving. “We were trying to learn this game through trial and error,” Hubert said.
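As a rough illustration of the search component, consider the minimal sketch below. It is not DeepMind's code: the toy goal language, `policy_scores`, and `apply_tactic` are hypothetical stand-ins for a trained network and the Lean proof checker. It shows a best-first search in which high-scoring moves are explored first and a constant per-step penalty plays the role of the reward for shorter, more elegant proofs:

```python
"""Minimal conceptual sketch of policy-guided proof search.
Not DeepMind's code: `policy_scores` stands in for a trained
neural network and `apply_tactic` for the Lean proof checker.
The toy "goal" is reaching a target integer from 0 using the
moves "+1" and "*2"."""

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    priority: float                      # lower = explored sooner
    state: str = field(compare=False)    # "current=target"
    steps: list = field(compare=False, default_factory=list)

def policy_scores(state: str) -> dict[str, float]:
    """Stand-in policy: score each candidate move for this state."""
    current, target = map(int, state.split("="))
    gap = target - current
    # Heuristic "network": prefer doubling while far from the target.
    return {"+1": 0.9 if gap <= 2 else 0.3, "*2": 0.1 if gap <= 2 else 0.7}

def apply_tactic(state: str, tactic: str) -> str | None:
    """Stand-in checker: apply a move, or fail (None) if it overshoots."""
    current, target = map(int, state.split("="))
    new = current + 1 if tactic == "+1" else current * 2
    return None if new > target else f"{new}={target}"

def is_proved(state: str) -> bool:
    current, target = map(int, state.split("="))
    return current == target

def search(initial: str, budget: int = 1000) -> list[str] | None:
    """Best-first search: strong policy scores lower a child's priority,
    and a constant 0.05 per-step penalty rewards shorter proofs."""
    frontier = [Node(0.0, initial)]
    seen: set[str] = set()
    for _ in range(budget):
        if not frontier:
            return None
        node = heapq.heappop(frontier)
        if node.state in seen:
            continue
        seen.add(node.state)
        if is_proved(node.state):
            return node.steps
        for tactic, score in policy_scores(node.state).items():
            nxt = apply_tactic(node.state, tactic)
            if nxt is not None and nxt not in seen:
                prio = node.priority + (1.0 - score) + 0.05
                heapq.heappush(frontier, Node(prio, nxt, node.steps + [tactic]))
    return None

print(search("0=12"))  # e.g. ['+1', '*2', '+1', '*2', '*2'] -> 12
```

In AlphaProof, the same skeleton is scaled up enormously: the policy is a large trained network, the states are real Lean proof states, and exploration uses an AlphaZero-style tree search rather than a simple priority queue.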

However, AlphaProof required human assistance to formalize the problems in Lean, and it spent several days and hundreds of TPU-days of compute on each of the hardest problems, far beyond the human limit of 4.5 hours per exam session. Only six human contestants solved the hardest problem; AlphaProof became the seventh solver. DeepMind acknowledges the high computational costs but aims to optimize the system for broader use. “We don’t want to stop at math competitions. We want to build an AI system that could really contribute to research-level mathematics,” Hubert stated. The team plans a trusted testers program to share an AlphaProof tool with mathematicians.

The work is detailed in a 2025 Nature paper (DOI: 10.1038/s41586-025-09833-y).
