Harmonic is continuing to make progress toward mathematical superintelligence, and we have three updates to share today:
 A new stateoftheart of 90% on the MiniF2F benchmark
 A public release of our internal updates to MiniF2F
 An allnew naturallanguage interface
Last month we announced Aristotle our first model under development. When prompted with naturallanguage math problems, Aristotle is able to formalize the problem in Lean 4, solve the problem in a formally verified way, and output the answer in both naturallanguage and Lean. This is an important step towards our longterm goal of solving advanced problems in research mathematics and verifying all of the world's reasoning.
We plan to track our system's growing capabilities on key mathematics benchmarks. We are starting with MiniF2F, the standard formal mathematics benchmark, which separates the problem of formal theorem proving from the translation between natural language and formal math.
Let's dig in.
90% on MiniF2F
We're excited to announce that we have achieved 90% on MiniF2F, a new stateoftheart that improves on our previous result of 83% last month.
MiniF2F measures our model's core problemsolving ability when presented with formallyspecified problems. These problems are drawn from national and international highschool math competitions, as well as high school and undergraduate math classes, which lets us benchmark Aristotle against both human performance and prior research. The problems span a range of difficulty, from simple calculations to extremely challenging proofs: three of the validationset problems are drawn from the International Mathematical Olympiad (IMO), widely considered exceptionally difficult even for trained competitors. Going forward, we will continue to track progress publicly on formalonly benchmarks like MiniF2F, as well as tasks involving naturallanguage understanding.
An updated & corrected MiniF2F
To accurately assess our system's capabilities, we have made several improvements to MiniF2F:

We translated the problem statements to Lean 4, the leading system for formal mathematics today

We resplit the 488 problems uniformly randomly into a training set of 392 problems, a validation set of 48 problems, and a test set of 48 problems.

We ensured that each formal statement is associated with an accurate naturallanguage statement, allowing us to evaluate autoformalization capabilities.

We fixed many incorrect formalizations, including several theorem statements that became trivial or impossible in their original Lean encodings
Today, we are releasing
our revision of MiniF2F
.
We hope it will support future work on AI for formal mathematics using Lean 4.
Naturallanguage interface
This automatic translation of naturallanguage problems and solutions into their formal representation is known as autoformalization, and it's a key capability of our system. Autoformalization enables Aristotle to collaborate with mathematicians and educators who may not know Lean, checking their work and filling in details grounded by formal understanding. It also makes the world of existing naturallanguage math, in the form of textbooks, research papers, and blog posts, available as training data for formal reasoning.
Below is an unmodified example of Aristotle at work on question 6 from the 2001 IMO. We prompt Aristotle with the naturallanguage statement of the problem, a naturallanguage proof of the problem, as well as the formalized statement of the problem.
/
PROBLEM
Let $a, b, c, d$ be integers with $a>b>c>d>0$.
Suppose that $ac+bd=(b+d+ac)(b+da+c)$.
Prove that $ab+cd$ is not prime.
PROVIDED SOLUTION
We'll use a proof by contradiction. Assume that $ab+cd$ is a prime number.
Rearranging terms in $ac+bd=(a+bc+d)(a+b+c+d)$, we get $b^2+bd+d^2=a^2ac+c^2$.
Thus we have $(ac+bd)(b^2+bd+d^2)=ac(b^2+bd+d^2)+bd(a^2ac+c^2)=(ab+cd)(ad+bc)$.
Therefore, $ac+bd$ divides $(ab+cd)(ad+bc)$.
Since we assumed that $ab+cd$ is a prime number, either it divides $ac+bd$, or $ac+bd$ divides $ad+bc$.
Therefore, we have either $ab+cd \le ac+bd$ or $ac+bd \le ad+bc$.
Each of these cases contradicts the rearrangement inequality.
/
theorem IMO_2001_Q6 (a b c d : ℤ) (hab : a > b) (hbc : b > c) (hcd : c > d) (hd : d > 0)
(hacbd : a * c + b * d = (a + b  c + d) * (a + b + c + d)) :
¬ Prime (a * b + c * d) := by
sorry
Fig 1: Unmodified prompt given to Aristotle to autoformalize the solution to question 6 of the 2001 IMO.
In its output, Aristotle fills in the proof, guided by the naturallanguage solution in the prompt. Aristotle is able to translate each step of the naturallanguage proof to valid Lean 4, eventually arriving at a complete formallyverified proof.
theorem IMO_2001_Q6 (a b c d : ℤ) (hab : a > b) (hbc : b > c) (hcd : c > d) (hd : d > 0)
(hacbd : a * c + b * d = (a + b  c + d) * (a + b + c + d)) :
¬ Prime (a * b + c * d) := by
 We'll use a proof by contradiction. Assume that $ab+cd$ is a prime number.
intro habcd
 Rearranging terms in $ac+bd=(a+bc+d)(a+b+c+d)$, we get $b^2+bd+d^2=a^2ac+c^2$.
have hb2bda2ac : b^2 + b * d + d^2 = a^2  a * c + c^2 := by
linarith
 Thus we have $(ac+bd)(b^2+bd+d^2)=ac(b^2+bd+d^2)+bd(a^2ac+c^2)=(ab+cd)(ad+bc)$.
have multiply_out : (a * c + b * d) * (b^2 + b * d + d^2) = (a * b + c * d) * (a * d + b * c) := by
ring_nf
nlinarith [ mul_self_nonneg ( a  c ), mul_self_nonneg ( b  d ), hacbd, habcd, hab, hbc, hcd, hd, hb2bda2ac.symm, 1 < c * d, ( a  b  c  d ) / b < 1, 1 < a * d  b * c, ( ( a + b  c  d ) / d : ℤ ) < 1, ( b * d : ℤ ) > 0, b * d + c * c < b * d + a * a ] <;> ring_nf <;> try aesop <;> interval_cases a <;> any_goals norm_num at habcd ⊢
 Therefore, $ac+bd$ divides $(ab+cd)(ad+bc)$.
have divides : a * c + b * d ∣ (a * b + c * d) * (a * d + b * c) := by
exact ⟨ _, multiply_out.symm ⟩ <;> rw [ mul_comm ] <;> nlinarith [ hab, hbc, hcd, habcd, hb2bda2ac, multiply_out.symm ]
 Since we assumed that $ab+cd$ is a prime number, either it divides $ac+bd$, or $ac+bd$ divides $ad+bc$.
have case1 : a * b + c * d ∣ a * c + b * d ∨ a * c + b * d ∣ a * d + b * c := by
exact?
rcases case1 with ( h  h ) <;> have := Int.le_of_dvd ( by nlinarith ) h <;> nlinarith
Fig 2: Unmodified output from Aristotle to the prompt in figure 1.
Join us
Aristotle's improved reasoning and autoformalization capabilities represent a major step toward mathematical superintelligence that can easily converse in naturallanguage. We look forward to sharing our progress as we continue to scale/advance Aristotle's capabilities. If building the world's most advanced mathematical reasoning engine excites you,
join us or
follow us
 Tudor, Vlad, and the Harmonic Team