
Imagine an artificial intelligence system capable of solving the most challenging problems in undergraduate mathematics competitions—not only matching human performance but doing so with entirely different, often more elegant, approaches. This is no longer science fiction but a reality demonstrated by the team behind AxiomProver, whose AI system achieved a flawless score in the 2025 Putnam Mathematical Competition.
The Putnam Competition: Everest of Undergraduate Mathematics
The William Lowell Putnam Mathematical Competition is widely regarded as the most demanding undergraduate mathematics contest in North America. Participants must solve twelve exceptionally difficult proof-based problems in six hours (two three-hour sessions), with a maximum possible score of 120 points. Historically, even the very top scorers, the designated Putnam Fellows, have only rarely approached scores above 110.
AxiomProver: The Problem-Solving Machine
AxiomProver shattered expectations by achieving a perfect score. Remarkably, the team publicly released all of the Lean formalized proofs, opening the AI's reasoning process to full academic scrutiny.
Divergent Problem-Solving Approaches
The research team categorized the competition problems based on cognitive differences between AI and human solvers:
1. Intuitive but Formally Complex: Problems that yield quickly to human intuition yet demand exhaustive formalization from the AI.
2. Unexpected AI Solutions: Challenges solved through unconventional approaches that surprised researchers.
3. Alternative Mathematical Pathways: Problems where AI and humans arrived at correct solutions through fundamentally different methods.
The team suggests an emerging research paradigm where "humans provide conceptual inspiration while machines handle formal verification and implementation," potentially revolutionizing mathematical discovery.
The Cost of Formalism
Problems appearing simplest to humans—particularly calculus questions (A2, B2)—proved most challenging to formalize. For instance, human solvers might intuit function behavior graphically, while Lean requires explicit declarations of domain partitions, monotonicity, inflection points, and boundary behavior.
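To make the gap concrete, here is a minimal Lean sketch (illustrative only, not taken from AxiomProver's published proofs; lemma names assume a recent Mathlib): even a fact a human reads off a graph, such as "x·x is increasing on [0, 1]", must be stated against an explicit domain and closed with a named lemma.

```lean
import Mathlib

-- Illustrative only: "obvious from the graph" still requires an explicit
-- domain (Set.Icc 0 1) and an explicit monotonicity argument.
example : MonotoneOn (fun x : ℝ => x * x) (Set.Icc 0 1) := by
  intro a ha b _hb hab
  -- ha.1 : 0 ≤ a, extracted from membership in the closed interval
  exact mul_self_le_mul_self ha.1 hab
```

Nothing here is mathematically deep; the cost is that every hypothesis a human would leave implicit must be surfaced and discharged.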
A seemingly trivial positivity lemma (psi_support_pos) in problem B2 required over 60 lines of Lean code. This exemplifies formal mathematics' core requirement: while not rejecting intuition, it demands verifiable statements rather than intuitive leaps.
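The write-up does not reproduce the actual lemma, but its flavor is easy to convey with a hypothetical stand-in (not the real psi_support_pos): a positivity claim a human would wave through ("both factors are positive, so the product is") still has to name and prove every sub-step.

```lean
import Mathlib

-- Hypothetical stand-in for a positivity lemma, not the real
-- psi_support_pos: each factor's positivity is established separately,
-- then combined. Nothing is left to the reader.
example (x : ℝ) (hx : 0 < x) (hx1 : x < 1) : 0 < x * (1 - x) := by
  have h1 : (0 : ℝ) < 1 - x := by linarith
  exact mul_pos hx h1
```

Scale that obligation across an entire analytic argument, and 60 lines for one lemma stops looking surprising.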
Combinatorial Breakthroughs
Combinatorics has traditionally been considered a weakness for AI, yet AxiomProver successfully solved problem A5, a complex inductive construction challenge. The human solution might occupy three paragraphs; the Lean formalization expanded to 2,054 lines of code, produced over 518 minutes, meticulously verifying every edge case and implicit assumption.
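A toy induction shows why (illustrative only; A5's actual construction is far more intricate; lemma names assume a recent Mathlib): a proof a human states in one line becomes an explicit case split in which every intermediate bound is justified.

```lean
import Mathlib

-- Illustrative only: even the classic bound n + 1 ≤ 2 ^ n forces both
-- induction cases into the open, plus an auxiliary fact 1 ≤ 2 ^ k.
example (n : ℕ) : n + 1 ≤ 2 ^ n := by
  induction n with
  | zero => simp
  | succ k ih =>
    have h : 1 ≤ 2 ^ k := Nat.one_le_pow k 2 (by norm_num)
    calc k + 1 + 1 ≤ 2 ^ k + 2 ^ k := by linarith
      _ = 2 ^ (k + 1) := by ring
```

Multiply this bookkeeping across nested constructions and edge cases, and a three-paragraph human proof plausibly becomes two thousand lines.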
Unanticipated Strategies
The AI demonstrated particular ingenuity on problems A3 (combinatorial game theory) and B1 (Euclidean geometry), employing strategies the team hadn't anticipated. Without any dedicated geometry engine, AxiomProver:
• For A3, identified a concise winning strategy for the second player, avoiding complex game tree searches.
• For B1, established geometric facts like "two circles intersect at exactly two points" through pure symbolic reasoning (see the sketch below); human mathematicians had to draw diagrams just to follow the logic.
This reveals fundamental differences in "difficulty perception" between humans and AI—what's challenging depends on representational and verification frameworks.
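For the B1 circle claim, the underlying symbolic route is classical; what follows is a sketch of the standard derivation, not the team's Lean proof. A point P on both circles satisfies

$$|P - O_1|^2 = r_1^2, \qquad |P - O_2|^2 = r_2^2,$$

and subtracting the two equations cancels the quadratic term $|P|^2$, leaving the linear condition

$$2\,(O_2 - O_1) \cdot P = |O_2|^2 - |O_1|^2 + r_1^2 - r_2^2.$$

All common points therefore lie on a line (the radical line), and a line meets a circle in at most two points. Facts of this shape can be pushed through coordinates without ever drawing a picture.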
Computational Persistence
Problem A6 involved p-adic arithmetic dynamical systems with sensitive power series expansions. While human mathematicians identified promising directions, they could not complete the derivations. AxiomProver succeeded after five hours of methodical term-by-term differentiation and convergence verification, demonstrating that for machines, correctness supersedes elegance.
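The central manipulation is standard even though each instance must be checked; schematically (a textbook identity, not AxiomProver's actual derivation):

$$f(x) = \sum_{n \ge 0} a_n x^n \;\Longrightarrow\; f'(x) = \sum_{n \ge 1} n\,a_n x^{n-1}, \qquad |n\,a_n|_p \le |a_n|_p.$$

Because $|n|_p \le 1$ for every integer n, the differentiated series converges wherever the original does; in a formal proof, however, each such rearrangement carries its own explicit convergence obligation, which is the kind of work the system ground through for five hours.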
Complementary Approaches
Problem A4: Humans pursued algebraic intuition (group representations, vector spaces), while the AI geometrized the problem by modeling solutions as rank-one projections onto the spans of unit vectors (see the identities below).
Problem B4: Human solvers visualized matrix mappings instantly, whereas AI decomposed the structure into 1,061 lines of Lean code, verifying each combinatorial relationship formally.
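The projection viewpoint in A4 rests on standard linear algebra identities, shown here as background rather than as the team's formalization: for a unit vector v, the matrix $P_v = v v^{*}$ is the orthogonal projection onto the span of v, since

$$\|v\| = 1 \;\Longrightarrow\; P_v^2 = v\,(v^{*}v)\,v^{*} = P_v, \qquad P_v^{*} = P_v, \qquad \operatorname{tr} P_v = v^{*}v = 1.$$

Recasting algebraic solution sets as such projections is what let the system bring geometric language to bear on an ostensibly algebraic problem.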
The Approaching Singularity
Fields Medalist Terence Tao observes that AI has crossed a critical threshold in mathematical reasoning. Polish mathematician Bartosz Naskręcki, testing GPT-5.2 Pro, noted AI rarely stalls on non-trivial problems, typically delivering complete solutions within one to two hours of interaction.
AxiomProver's team reflects: "Watching the system grind through competition problems in real time—especially when it solves them in ways we'd never consider—produces an indescribable thrill."
This underscores a profound realization: mathematical difficulty for machines no longer aligns with human perceptions. As human intuition and machine verification increasingly complement each other, they may collectively elevate mathematical research, much as Grothendieck's "rising sea" makes hard problems tractable by strengthening the surrounding foundations.
AxiomProver's Putnam triumph and GPT-5.2 Pro's advancements suggest artificial general intelligence in mathematics isn't distant—it's unfolding now. The future of mathematical discovery may well be a collaborative symphony between human and artificial minds.