Realizing the Promise of Higher-Order SMT and Superposition for Interactive Verification

    European Research Council (ERC) Consolidator Grant 2022
Grant Agreement No. 101083038
July 2023 – June 2028
Principal investigator:   Jasmin Blanchette
Host institution:   Ludwig-Maximilians-Universität München, Germany


Proof assistants (also called interactive theorem provers) have a long history of being very tedious to use. The situation has improved markedly with the integration of first-order automatic theorem provers as backends. And recently, there have been exciting developments for more expressive logics, with the emergence of automatic provers based on optimized higher-order calculi. The Nekoka project's aim is to make higher-order SMT and λ-superposition a great fit for logical problems emerging from the verification of software and mathematics.

We will start by extending higher-order SMT and λ-superposition and implementing these extensions in automatic provers to provide push-button proof automation for lemmas expressed in higher-order logics. To reach end users, we will integrate the automatic provers in interactive tools: both general-purpose proof assistants and software verification platforms. As case studies, we will use our own provers and integrations to formalize quantum information theory and verify a big data framework in collaboration with domain experts. Beyond providing representative case studies, this will help build a user community around our tools and technologies.

In terms of scientific impact, our hope is that the improved higher-order SMT and λ-superposition calculi will substantially advance the art of higher-order automation and help reorient research in automated reasoning towards the needs of end users, whether computer scientists or mathematicians. Our tools will outlive the project, serving end users and continuing to be useful for future research. At the societal level, the project will herald a future in which automatic provers and proof assistants are routinely deployed in tandem to verify critical computing infrastructure and to formalize research in computer science and mathematics, thereby leading to more trustworthy software and science.

Context and state of the art (anno 2023)

Theorem proving software has been around for a long time, starting with the Logic Theorist in 1955. Automatic theorem provers and interactive proof assistants are used in industry to verify hardware (e.g., by AMD, Centaur Technologies, Intel) and software (e.g., by Airbus, Amazon, Apple, Facebook, Google, Microsoft), but compared with other branches of computer science, the impact of theorem proving on science and industry remains limited.

A few examples of software verified using theorem provers are the optimizing C compiler CompCert, the operating system kernels CertiKOS and seL4, the conference management system CoCon, and the SAT (satisfiability) solver IsaSAT. But despite having been carried out using Coq and Isabelle, two modern proof assistants, these efforts were very time consuming. For example, verifying less than 10 000 lines of code took the seL4 team around 20 person-years, and the pure mathematicians who have embraced the Lean proof assistant, notably Kevin Buzzard, report that formally verifying a single “obvious” step can take an arbitrarily long time. To most potential users, the tools are not yet cost-effective.

The current situation is unfortunate, to say the least. Writing a pen-and-paper proof is akin to developing a program as pseudocode, without using a compiler or an integrated development environment. The process is awkward and error-prone. Despite its weaknesses, existing theorem proving software can already be useful to many computer scientists, engineers, and even mathematicians. Computers outperform humans for complex logical reasoning tasks, which are crucial in many areas.

The Nekoka project's vision is to develop highly flexible interactive verification tools that incorporate very strong proof automation, broadening the range of problems for which computers can assist humans. Proof steps that are obvious to the user but currently require hours of focused labor will become largely or fully automatic. These tools will support general-purpose reasoning and combinations of proof techniques. They will revolutionize the practice of software verification and formalized mathematics.

Example 1. To show the gulf between the state of the art and the vision, we consider the CoCon conference management system, which was verified to ensure the absence of leaks. Below is the Isabelle proof of the claim that program committee members cannot review papers if they have a conflict of interest:
proof (cases a)
  case (Create c)
  show ?thesis
    using Step isRev reach.Step by simp
  case (Update u)
  show ?thesis
  proof (cases u)
    case (UpdatePref confID uID p paperID pref)
    thus ?thesis
      using Step unfolding Update UpdatePref isRev_def2
      by (blast dest: isRevNth reach.Step)
  case (NondestUpdate nu)
  show ?thesis
    using Step unfolding NondestUpdate isRev_def2
    by (meson IO.reach.Step isRevNth)
qed simp+

Isabelle proofs are hierarchical and exhibit a block structure reflected by the indentation. The parts highlighted in pink, supplied by the user, correspond to the actual proof steps; they required some thinking and interactions with the system. The rest specifies the proof outline: a three-way case analysis, whose second case is itself a case analysis. The three outer cases are labeled Create, Update, and NondestUpdate.

The proof may easily have taken 15 minutes to develop. It combines various types of automation. User productivity would skyrocket if such proofs could be entirely generated at the click of a button.

Today we see several elements of the above vision in a variety of tools. The most widely deployed theorem provers are likely SAT and SMT (satisfiability modulo theories) solvers. Their great strength is that they are reliable enough to serve as backends to dozens of software verification and testing platforms as a push-button technology. On the other hand, their logics are restrictive. Users often find it more convenient to express their problems in higher-order logics such as simple type theory and dependent type theory. HOSMT (higher-order SMT) has been proposed but is in its infancy. Next to SAT and SMT solvers, another family of successful automatic provers consists of those based on the superposition calculus, a successor of the resolution calculus.

At the other end of the automatic–interactive spectrum, we have the proof assistants, most of which are based on simple or dependent type theory. These tools offer a graphical user interface that can be used to develop large proofs interactively. Their richer logic makes it possible to use higher-order functions (e.g., map and fold on lists), big operators, set comprehensions, the big O notation, monads, and more.

SMT solvers and superposition provers are integrated as backends to proof assistants to increase the level of automation. Such integrations are called hammers. Hammers encode the problem in the automatic provers' less expressive logic and translate any proofs found in the other direction. In Example 1, the “by (meson IO...)” command near the end was generated by Sledgehammer, the hammer for Isabelle.

For a long time, the most successful hammer backends supported only first-order logic, but recently there has been a breakthrough in the development of strong higher-order automatic provers. To see this, it suffices to take a look at the results of the higher-order division of the 2020 edition of the CASC prover competition:

 Zipperposition   Satallax   Vampire   Leo-III   CVC4   Leo-II 
Solved problems (of 500) 424 323 299 287 194 111

A cell containing n tells us that the prover solved n out of 500 problems with a wall-clock time limit of 120 s per problem. Leo-II represents the state of the art five years ago. Since then, Satallax—the leader until 2019—has gained a lot of ground, and a new generation of higher-order provers has emerged: the winning Zipperposition, developed by the Matryoshka team and based on λ-superposition, and also Vampire, Leo-III, and CVC4 (now cvc5), developed by other teams. The results from CASC 2021 and 2022 only confirmed this picture. In the past five years or so, strong higher-order automation has taken the step from music of the future to reality.

This line of work has produced remarkable results, but it has also raised new scientific questions and opened up new possibilities. Reliable push-button proof automation for typical real-world applications (as opposed to the logical puzzles that are common among CASC benchmarks) now seems within reach. But to make this a reality, we must overcome the following challenges:

  • Out of tune: Higher-order automatic provers today are almost exclusively tuned against the benchmark library used for CASC. They disappoint on problems originating from hammers, which tend to be fundamentally less higher-order but involve hundreds of superfluous axioms.
  • Lost in deduction: The most successful theorem proving approaches replace heuristic deduction by focused computation wherever they can. SMT solvers with their support for decidable theories are a prime example. Nobody has yet looked into the optimization of big operators and many other practical higher-order theories.
  • No dependent types: The proof assistants Agda, Coq, and Lean are based on dependent type theory, but higher-order provers support only simple types. Dependent types are useful for m × n matrices, integers modulo n, families (Xi)i, and p-adic numbers. Today they must be translated using clumsy, inefficient encodings.
  • Opaque failures: Successful solvers typically give users some information, such as a potential countermodel, on failure. However, the current generation of higher-order provers provide no guidance when they time out.

Exciting higher-order reasoning techniques are emerging in automatic theorem provers, which could lead to push-button automation in highly efficient verification tools. More research is needed to develop the potential of these technologies and deploy them in science and industry.

Although verification of all software will never be practical, we should aim at verifying safety- and security-critical systems, starting with the libraries they build on. The high-profile bug found in Android's, Java's, and Python's sorting procedure, detected during verification, is a stark reminder of the relevance of formal methods. By relying to a greater extent on computers to develop and check proofs, practitioners and researchers can achieve much higher quality standards for software and science.

Challenges and objectives

Nowadays it is widely acknowledged that formal methods do work. Computer scientists of any persuasion understand that the seL4 kernel will not panic due to software bugs, that the CompCert compiler produces correct assembly code, and that the EverCrypt cryptography library verified using F* is mathematically correct.

SAT and SMT solvers are leading the way in industry, and superposition provers have distinguished themselves as backends to hammers in proof assistants. But users still face a choice between automation and expressiveness. In an ideal world, computers and humans would complement each other, doing what they are best at. But today's automatic and interactive theorem provers are far from this ideal.

Our grand challenge is to extend higher-order SMT and superposition to make them a perfect match for problems emerging from interactive verification. We will implement these improvements in standalone provers, integrate them in proof assistants and software verification platforms, and collaborate with early adopters to deploy our tools.

To meet this challenge, we need to build on the strengths of both automatic and interactive theorem proving. The interactive side includes general-purpose proof assistants such as Coq, Isabelle, and Lean as well as software verification platforms such as F* and the TLA* Proof System. They all provide some proof management—checking that definitions are well formed and that every stated lemma is proved. Developing a proof is much like coding a program in an integrated development environment. All interactive tools provide some automation, and when automation fails, users can specify a proof outline like in Example 1 above. For instance, if the step from lemma L to lemma N is too difficult, the user can state an intermediate lemma M and then derive N from M and M from L. Induction and case analysis must normally be applied by the user as well.

The tools on the automatic side have a very different look and feel. They are command-line programs that try to solve a self-contained input problem within a given time limit. They can give three verdicts: provable, unprovable, or unknown. For problems in first-order logic and beyond, due to the logic's undecidability, the verdict “unprovable” is very rare. Nevertheless, SMT solvers can often provide useful potential counterexamples, and the output of superposition provers could also be analyzed to provide helpful hints to the user.

The grand challenge will be met by pursuing three objectives. Our starting point is that there is a lot of scientific value in pursuing the following activities within a single team: developing higher-order proof calculi, implementing them in provers, integrating these in verification platforms, and applying the platforms to verify specific pieces of software or mathematics. By using our own tools, we can detect and fix shortcomings as they arise and implement new features that meet practical needs. It is also important that the techniques and tools we develop are adopted by others and outlive the project.

Objective 1: Extend HOSMT and λ-superposition to provide push-button automation for proof obligations in higher-order logics

Our priority will be to make higher-order SMT a working reality. The forthcoming SMT-LIB 3 standard will support HOSMT, and cvc5 already supports some HOSMT constructs but not λ-expressions. Despite its dominance as a first-order solver, cvc5 disappoints in the higher-order division of CASC. We will collaborate with the cvc5 developers to enhance their solver. Quantifier instantiation is crucial for lemmas containing universally quantified variables, but today it has not been generalized to full simple type theory. The challenge is to achieve a graceful generalization, both in theory and in practice, so that the solver performs well on higher-order formulas while remaining efficient on first-order formulas.

In contrast, superposition is known for its efficient quantifier support. To capitalize on the success of the λ-superposition calculus, we will investigate its theory, aiming to reduce its search space without compromising its ability to prove any provable problem—its refutational completeness. The focus on refutational completeness as a guide to achieve high performance will be a distinguishing feature of our work. Some researchers see pragmatism and completeness as opposites; for us, they will go hand in hand.

On a scale of expressiveness, simple type theory is located between first-order logic and dependent type theory. We will work at both ends of this scale. On the one hand, we will adapt successful SAT and first-order techniques to a higher-order setting. On the other hand, we will extend λ-superposition to dependent type theory with the axiom of choice. Both parts of this program raise their own challenges, because more expressive logics are less well behaved; when generalizing techniques, we must invariably design workarounds to compensate. In particular, for dependent type theory, we will need to find solutions to cope with dependent types, uninhabited types, universes, and universe polymorphism. To achieve refutational completeness, we will follow the stratified design that underlies λ-superposition.

Objective 2: Integrate higher-order provers in interactive verification platforms, using lightweight, trustworthy translations

Rather than being invoked from the command line, automatic provers usually serve as backends to general-purpose proof assistants or more specialized software verification platforms. To reach users, we will integrate, and help integrate, our provers in many interactive proof tools. On the proof assistant side, our targets will be Isabelle and Lean, as representatives of simple and dependent type theory. Isabelle's tool Sledgehammer already includes rudimentary support for Zipperposition, but Zipperposition has not yet been fine-tuned on Isabelle benchmarks. For Lean, work has commenced on a Lean Hammer in the Lean community. The main difficulty on the Lean side will be to design an encoding of dependent types that is sound, graceful, ideally complete, and efficient.

To make Sledgehammer and the Lean Hammer more transparent when they fail, we will have the automatic provers output proofs with holes—for example, if one of the cases of a proof by case analysis cannot be solved. In general, because of the heuristic way in which provers explore the search space, there may be many competing unfinished proof attempts. New heuristics will be needed to choose among them. Proofs with holes should work well in combination with clause splitting, a technique to partition the search space. When reconstructing proofs in Isabelle or Lean, the hammer will put in a sorry placeholder for each missing subproof.

Example 2. Consider the Isabelle proof presented in Example 1. The envisioned automation might prove all but the most difficult, second case (Update). A future Sledgehammer would then generate the proof text below:
proof (cases a)
  case (Create c)
  show ?thesis
    using Step isRev reach.Step by simp
  case (Update u)
  show ?thesis
  case (NondestUpdate nu)
  show ?thesis
    using Step unfolding NondestUpdate isRev_def2
    by (meson IO.reach.Step isRevNth)
qed simp+

The user would then remove the sorry and continue manually, possibly invoking Sledgehammer again to solve new subcases. Clearly, a partial proof such as this is much more helpful than an opaque failure.

For software verification platforms, we face a chicken-and-egg situation: The vast majority of platforms present users with first-order formalisms, because they are based on first-order SMT solvers. We will initially focus on F* and the TLA+ Proof System. F* is a dependently typed effectful functional programming language with features aimed at program verification. The TLA+ Proof System is a tool for specifying and verifying concurrent and distributed systems based on set theory.

Objective 3: Apply higher-order reasoning methods to ongoing verification efforts, in close collaboration with quantum information and big data scientists

Tools have a major role to play in stimulating new ideas for improvements and extensions, but they need users to thrive. To create an immediate impact, we will collaborate with other Isabelle and Lean users who are working on ambitious projects formalizing quantum information and a big data framework—two attractive new topics for formal verification—so that we can work with real benchmarks and increase awareness of our tools to reach more potential users. These two areas of research have received comparatively little attention from the theorem proving community and could benefit greatly from formal verification.

Quantum physics has interesting implications for information processing, including cryptography and problem solving. As a result, quantum information has become its own area of study. It is highly mathematical and as such constitutes an interesting target for proof assistants. We will build a Lean library for the mathematics of quantum information, including libraries for quantum circuits and quantum information theory. This case study will exercise our tools' support for dependent types. This will provide valuable case studies to guide our work and help build a user community around our technologies.

Big data frameworks can be used to process high volumes of data arriving at high speed. They manage deployment of parallelizable programs onto the cloud and provide fault tolerance. Yet they are themselves notoriously faulty, which is extremely worrying given that they process huge amounts of critical data in industry. As our second case study, we will use Isabelle to formally verify a big data framework inspired by Timely Dataflow.

The Nekoka Team

Funded by the project

Jasmin Blanchette

Senior collaborators

Frédéric Dupuis, Université de Montréal
Gabriel Ebner, Microsoft Research
Marijn Heule, Carnegie Mellon University
Andrew Reynolds, University of Iowa
Stephan Schulz, DHBW Stuttgart
Nikhil Swamy, Microsoft Research
Sophie Tourret, Inria Nancy
Dmitriy Traytel, Københavns Universitet


The project is thankful for the support and advice of the following friends and colleagues: Jean-Pierre Banâtre, Alexander Bentkamp, Simon Cruanes, Martin Desharnais, Pascal Fontaine, Laura Kovács, Robert Lewis, Jannis Limperg, Assia Mahboubi, Anja Palatzke, Andrei Popescu, Trenton Schulz, Rikkert Stuve, Mark Summerfield, Petar Vukmirović, Uwe Waldmann, and Aleksandar Zeljić.






    Verified given clause procedures
    Jasmin Blanchette, Qi Qiu, and Sophie Tourret. In Pientka, B., Tinelli, C. 29th International Conference on Automated Deduction (CADE-29), LNCS, Springer, 2023.
    Authors' PDF
  • Finding mathematical proofs using computers
    Alexander Bentkamp and Jasmin Blanchette. Nieuw Archief voor Wiskunde 5/24(2): 114–118, 2023.
    Publisher's PDFAuthors' PDF


Please contact Jasmin Blanchette if you have inquiries related to the project.