Harmonic Closes Series A Funding Round

About a year ago, we embarked on a big mission: to create mathematical superintelligence (MSI). MSI is artificial intelligence with mathematical capabilities superior to those of humans, and we believe its development will dramatically accelerate progress in quantitative domains, including science and engineering.

We're excited to announce that we've recently closed our $75 million Series A funding round led by Sequoia Capital, with significant participation by Index Ventures, and additional backing from GreatPoint Ventures, Jasper Lau's Era Funds, Blossom Capital, DST Global partners, Elefund, Nikesh Arora, and Jared Leto.

We wish to extend a big welcome to Sequoia partner Andrew Reed, who will join our founders Tudor Achim and Vlad Tenev on the Board of Directors, along with Index Ventures partner Jan Hammer, who will join as an observer.

This funding provides the resources to rapidly accelerate the development of MSI. In particular, we are hiring amazing talent across research, engineering, and mathematics. If this important mission resonates with you, please see our open roles.

One Month In: A New SOTA on MiniF2F and More

Harmonic is continuing to make progress toward mathematical superintelligence, and we have three updates to share today:

Last month we announced Aristotle, our first model under development. When prompted with natural-language math problems, Aristotle is able to formalize the problem in Lean 4, solve it in a formally verified way, and output the answer in both natural language and Lean. This is an important step toward our long-term goal of solving advanced problems in research mathematics and verifying all of the world's reasoning.
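
To make the shape of that output concrete, here is a toy, hand-written illustration (our own example for this post, not output from Aristotle): a simple natural-language question rendered as a Lean 4 theorem whose answer is machine-checked.

  import Mathlib

  -- Toy illustration (hand-written, not Aristotle output): the question
  -- "If x is an integer and x + 2 = 5, what is x?" becomes a Lean 4 statement,
  -- and the answer x = 3 is verified by the `omega` decision procedure.
  theorem toy_formalization (x : ℤ) (h : x + 2 = 5) : x = 3 := by
    omega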

We plan to track our system's growing capabilities on key mathematics benchmarks. We are starting with MiniF2F, the standard formal mathematics benchmark, which separates the problem of formal theorem proving from the translation between natural language and formal math.

Let's dig in.

▶ 90% on MiniF2F

We're excited to announce that we have achieved 90% on MiniF2F, a new state of the art that improves on the 83% result we reported last month. 1 2

Harmonic MiniF2F Score: 90%

MiniF2F measures our model's core problem-solving ability when presented with formally specified problems. These problems are drawn from national and international high-school math competitions, as well as high school and undergraduate math classes, which lets us benchmark Aristotle against both human performance and prior research. The problems span a range of difficulty, from simple calculations to extremely challenging proofs: three of the validation-set problems are drawn from the International Mathematical Olympiad (IMO), widely considered exceptionally difficult even for trained competitors. Going forward, we will continue to track progress publicly on formal-only benchmarks like MiniF2F, as well as on tasks involving natural-language understanding.

▶ An updated & corrected MiniF2F

To accurately assess our system's capabilities, we have made several improvements to MiniF2F:

  1. We translated the problem statements to Lean 4, the leading system for formal mathematics today. 3
  2. We re-split the 488 problems uniformly at random into a training set of 392 problems, a validation set of 48 problems, and a test set of 48 problems. 4
  3. We ensured that each formal statement is associated with an accurate natural-language statement, allowing us to evaluate autoformalization capabilities.
  4. We fixed many incorrect formalizations, including several theorem statements that became trivial or impossible in their original Lean encodings; a sketch of this failure mode follows this list. 5
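
As a hypothetical illustration of the failure mode in item 4 (not a problem drawn from the released dataset), consider an "if and only if" problem whose weakened Lean encoding keeps only the easy direction; proofs are elided with sorry.

  import Mathlib

  -- Hypothetical example, not from the dataset: "prove that n² is even
  -- if and only if n is even".

  -- Weakened encoding: only the easy direction survives, so the statement
  -- no longer represents the original problem.
  theorem even_sq_weakened (n : ℤ) (h : Even n) : Even (n ^ 2) := by
    sorry

  -- Faithful encoding: the full biconditional, matching the natural-language
  -- statement.
  theorem even_sq_faithful (n : ℤ) : Even (n ^ 2) ↔ Even n := by
    sorry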

Today, we are releasing our revision of MiniF2F. We hope it will support future work on AI for formal mathematics using Lean 4.

▶ Natural-language interface

The automatic translation of natural-language problems and solutions into formal representations is known as autoformalization, and it's a key capability of our system. Autoformalization enables Aristotle to collaborate with mathematicians and educators who may not know Lean, checking their work and filling in details grounded by formal understanding. It also makes the world of existing natural-language math, in the form of textbooks, research papers, and blog posts, available as training data for formal reasoning.

Below is an unmodified example of Aristotle at work on question 6 from the 2001 IMO. We prompt Aristotle with the natural-language statement of the problem, a natural-language proof, and the formalized statement of the problem.

  /-
    PROBLEM
    Let $a, b, c, d$ be integers with $a>b>c>d>0$.
    Suppose that $ac+bd=(b+d+a-c)(b+d-a+c)$.
    Prove that $ab+cd$ is not prime.

    PROVIDED SOLUTION
    We'll use a proof by contradiction. Assume that $ab+cd$ is a prime number.
    Rearranging terms in $ac+bd=(a+b-c+d)(-a+b+c+d)$, we get $b^2+bd+d^2=a^2-ac+c^2$.
    Thus we have $(ac+bd)(b^2+bd+d^2)=ac(b^2+bd+d^2)+bd(a^2-ac+c^2)=(ab+cd)(ad+bc)$.
    Therefore, $ac+bd$ divides $(ab+cd)(ad+bc)$.
    Since we assumed that $ab+cd$ is a prime number, either it divides $ac+bd$, or $ac+bd$ divides $ad+bc$.
    Therefore, we have either $ab+cd \le ac+bd$ or $ac+bd \le ad+bc$.
    Each of these cases contradicts the rearrangement inequality.
  -/
  theorem IMO_2001_Q6 (a b c d : ℤ) (hab : a > b) (hbc : b > c) (hcd : c > d) (hd : d > 0)
      (hacbd : a * c + b * d = (a + b - c + d) * (-a + b + c + d)) :
      ¬ Prime (a * b + c * d) := by
    sorry

Fig 1: Unmodified prompt given to Aristotle to autoformalize the solution to question 6 of the 2001 IMO.

In its output, Aristotle fills in the proof, guided by the natural-language solution in the prompt. Aristotle translates each step of the natural-language proof into valid Lean 4, eventually arriving at a complete, formally verified proof.

  theorem IMO_2001_Q6 (a b c d : ℤ) (hab : a > b) (hbc : b > c) (hcd : c > d) (hd : d > 0)
  (hacbd : a * c + b * d = (a + b - c + d) * (-a + b + c + d)) :
  ¬ Prime (a * b + c * d) := by
  -- We'll use a proof by contradiction. Assume that $ab+cd$ is a prime number.
  intro habcd

  -- Rearranging terms in $ac+bd=(a+b-c+d)(-a+b+c+d)$, we get $b^2+bd+d^2=a^2-ac+c^2$.
  have hb2bda2ac : b^2 + b * d + d^2 = a^2 - a * c + c^2 := by 
    linarith

  -- Thus we have $(ac+bd)(b^2+bd+d^2)=ac(b^2+bd+d^2)+bd(a^2-ac+c^2)=(ab+cd)(ad+bc)$.
  have multiply_out : (a * c + b * d) * (b^2 + b * d + d^2) = (a * b + c * d) * (a * d + b * c) := by 
    ring_nf
    nlinarith [ mul_self_nonneg ( a - c ), mul_self_nonneg ( b - d ), hacbd, habcd, hab, hbc, hcd, hd, hb2bda2ac.symm, -1 < c * d, ( a - b - c - d ) / b < 1, -1 < a * d - b * c, ( ( -a + b - c - d ) / d : ℤ ) < 1, ( b * d : ℤ ) > 0, b * d + c * c < b * d + a * a ] <;> ring_nf <;> try aesop <;> interval_cases a <;> any_goals norm_num at habcd ⊢

  -- Therefore, $ac+bd$ divides $(ab+cd)(ad+bc)$.
  have divides : a * c + b * d ∣ (a * b + c * d) * (a * d + b * c) := by 
    exact ⟨ _, multiply_out.symm ⟩ <;> rw [ mul_comm ] <;> nlinarith [ hab, hbc, hcd, habcd, hb2bda2ac, multiply_out.symm ]

  -- Since we assumed that $ab+cd$ is a prime number, either it divides $ac+bd$, or $ac+bd$ divides $ad+bc$.
  have case1 : a * b + c * d ∣ a * c + b * d ∨ a * c + b * d ∣ a * d + b * c := by 
    exact?

  rcases case1 with ( h | h ) <;> have := Int.le_of_dvd ( by nlinarith ) h <;> nlinarith

Fig 2: Unmodified output from Aristotle to the prompt in Fig 1.

▶ Join us

Aristotle's improved reasoning and autoformalization capabilities represent a major step toward mathematical superintelligence that can converse easily in natural language. We look forward to sharing our progress as we continue to scale and advance Aristotle's capabilities. If building the world's most advanced mathematical reasoning engine excites you, join us or follow us.

- Tudor, Vlad, and the Harmonic Team

1 We evaluate Aristotle in a setting where additional external computer algebra systems are permitted to solve simple subproblems in a trusted way. Our Lean results are not based on expert iteration on the validation set.

2 DeepSeek-Prover results: https://arxiv.org/html/2405.14333v1; LEGO-Prover results: https://arxiv.org/abs/2310.00656

3 We began with formal statements released by the Hoskinson Center, themselves derived from Facebook Research's fork, which fixed several issues with the original benchmark.

4 The original dataset specified 244 validation and 244 test problems, without a training set. Prior research on MiniF2F typically involved training on the validation set, but we wanted to evaluate our model's ability to solve unseen problems.

5 We believe our formalizations accurately represent the original, natural-language statement. We welcome any corrections, contributions, or feedback from the community.

Introducing Harmonic: Our Mission and First Results

Harmonic is building mathematical superintelligence.

When we want to know if an answer to a question is correct, we check the reasoning - the logical steps behind the answer. In order for artificial intelligence systems to be truthful, explainable, and aligned with us, we must endow these systems with powerful and verifiable reasoning capabilities.

The language of reasoning is mathematics, and mathematics is the means through which humans have discovered the fundamental truths about our universe. Mathematical superintelligence will greatly accelerate human advancement in science and engineering.

▶ AI today

Modern AI systems based on large language models appear to understand and use language in a way that emulates humans, and we have seen new and exciting capabilities emerge as these models have grown in scale. However, today's models fall short of human-level reasoning; hallucinations are common and models are often as confident about wrong answers as they are about right ones. They can fail in unpredictable ways, and the consequences of such failures become increasingly significant as their usage grows. In their current state, they are unusable in most safety-critical applications.

▶ The promise of mathematical reasoning

Models capable of formal mathematical reasoning will produce output guaranteed to be correct, with an interpretable chain of reasoning. We believe such models, with transparent and automatically-verifiable reasoning traces, will be fundamentally safe in ways that the current generation of models are not.
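
As a minimal sketch of what automatic verification means in practice (a generic Lean 4 example of ours, not a claim about any particular system), the statement below is accepted only because the kernel can check every step of its proof term.

  import Mathlib

  -- Minimal sketch: Lean's kernel accepts this theorem only if the proof
  -- term is valid, so the conclusion cannot be asserted without a complete,
  -- machine-checked justification.
  theorem sum_of_squares_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 :=
    add_nonneg (sq_nonneg a) (sq_nonneg b)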

This approach will be immediately applicable to critical industries such as aerospace, chip design, industrial systems, and healthcare, where software reliability is paramount. Moreover, the development of such models will push the boundaries of AI research, ultimately driving the creation of more powerful and reliable AI systems across different domains.

▶ Where we are today

Our first research result is Aristotle: an automated theorem prover advancing the state of the art on MiniF2F. This standard benchmark measures problem-solving ability across a range of difficulties, up to and including International Mathematical Olympiad problems. To evaluate our system, we manually reformalized and improved 1 MiniF2F in Lean 4. To obtain a training set, we re-split the 488 MiniF2F problems (originally evenly divided into validation and test sets) randomly into 392 training, 48 validation, and 48 test problems. 2

We evaluate Aristotle in two settings: one where additional external computer algebra systems are permitted to solve simple subproblems in a trusted way, and one where the full proofs are expressed in Lean. Our system currently achieves an 83% success rate in the first setting and a 63% success rate when restricted to Lean. We compare our results on the validation split to two previous state-of-the-art approaches below: 3 4 5

Harmonic MiniF2F Score: 83%

▶ Join us

Our journey to build mathematical superintelligence is just beginning. We are a commercial research lab, and our primary focus is to enable our team of talented researchers, mathematicians, and engineers to build the world's most advanced mathematical reasoning engine.

If our mission resonates with you, join us or follow us.

- Tudor, Vlad, and the Harmonic Team

1 We found that many statements' formalizations in MiniF2F were much easier than the original problem, e.g., containing only the easier direction of a biconditional statement. We worked with mathematicians and olympiad competitors to ensure our version of MiniF2F represents each problem fully.

2 We plan to release our corrected datasets in the future. The new test and validation sets are unbiased random subsets of the original test set. The training set consists of the remaining problems from the original benchmark.

3 Our Lean results are not based on expert iteration on the validation set and are not cumulative.

4 https://arxiv.org/html/2405.14333v1

5 https://arxiv.org/abs/2310.00656