Context and state of the art (anno 2023)
Theorem proving software has been around for a long time, starting with the
Logic Theorist in 1955. Automatic theorem provers and interactive proof
assistants are used in industry to verify hardware (e.g., by AMD, Centaur
Technologies, Intel) and software (e.g., by Airbus, Amazon, Apple, Facebook,
Google, Microsoft), but compared with other branches of computer science, the
impact of theorem proving on science and industry remains limited.
A few examples of software verified using theorem provers are the
optimizing C compiler CompCert, the operating system
kernels CertiKOS and seL4, the
conference management system CoCon, and the SAT
(satisfiability) solver IsaSAT. But despite having been
carried out using Coq and Isabelle, two modern proof assistants, these efforts
were very time-consuming. For example, verifying less than 10 000 lines of
code took the seL4 team around 20 person-years, and the pure
mathematicians who have embraced the Lean proof assistant, notably Kevin Buzzard,
report that formally verifying a single “obvious” step can take an
arbitrarily long time. To most potential users, the tools are not yet
cost-effective.
The current situation is unfortunate, to say the least. Writing a pen-and-paper
proof is akin to developing a program as pseudocode, without using a compiler or
an integrated development environment. The process is awkward and error-prone.
Despite its weaknesses, existing theorem proving software can already be
useful to many computer scientists, engineers, and even mathematicians.
Computers outperform humans at complex logical reasoning tasks, which are
crucial in many areas.
The Nekoka project's vision is to develop highly flexible interactive
verification tools that incorporate very strong proof automation, broadening
the range of problems for which computers can assist humans. Proof steps
that are obvious to the user but currently require hours of focused labor will
become largely or fully automatic. These tools will support generalpurpose
reasoning and combinations of proof techniques. They will revolutionize the
practice of software verification and formalized mathematics.
Example 1.
To show the gulf between the state of the art and the vision, we consider
the CoCon conference management system, which was verified to
ensure the absence of information leaks. Below is the Isabelle proof of the claim that program committee
members cannot review papers if they have a conflict of interest:
proof (cases a)
  case (Create c)
  show ?thesis
    using Step isRev reach.Step by simp
next
  case (Update u)
  show ?thesis
  proof (cases u)
    case (UpdatePref confID uID p paperID pref)
    thus ?thesis
      using Step unfolding Update UpdatePref isRev_def2
      by (blast dest: isRevNth reach.Step)
  qed
next
  case (NondestUpdate nu)
  show ?thesis
    using Step unfolding NondestUpdate isRev_def2
    by (meson IO.reach.Step isRevNth)
qed simp+
Isabelle proofs are hierarchical and exhibit a block structure reflected by the
indentation. The justifications supplied by the user (the “using … by …” steps)
correspond to the actual proof steps; they required some thinking and
interaction with the system. The rest specifies the proof outline: a three-way case
analysis, whose second case is itself a case analysis. The three outer cases
are labeled Create, Update, and NondestUpdate.
The proof may easily have taken 15 minutes to develop. It combines various
types of automation. User productivity would skyrocket if such proofs could be
entirely generated at the click of a button.
Today we see several elements of the above vision in a variety of tools. The
most widely deployed theorem provers are likely SAT and
SMT (satisfiability modulo theories) solvers.
Their great strength is
that they are reliable enough to serve as backends to dozens of software
verification and testing platforms as a push-button technology.
On the other hand,
their logics are restrictive. Users often find it more convenient to express
their problems in higher-order logics such as simple type theory and dependent
type theory. HOSMT (higher-order SMT) has
been proposed but is in its infancy.
Next to SAT and SMT solvers, another family of
successful automatic provers consists of those based on the superposition
calculus,
a successor of the resolution calculus.
At the other end of the automatic–interactive spectrum, we have the proof
assistants, most of which are based on simple or dependent type theory. These
tools offer a graphical user interface that can be used to develop large proofs
interactively. Their richer logic makes it possible to use higher-order
functions (e.g., map and fold on lists), big operators, set comprehensions, the
big O notation, monads, and more.
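For instance, the following Lean 4 sketch (a toy example, not drawn from any of the developments above) uses map and foldl, exactly the kind of higher-order functions that are awkward to express in first-order logic:

```lean
-- Higher-order functions from Lean's core library: `map` applies a
-- function to every element; `foldl` accumulates from the left.
def squares (xs : List Nat) : List Nat :=
  xs.map (fun x => x * x)

def total (xs : List Nat) : Nat :=
  xs.foldl (· + ·) 0

-- A toy lemma about their combination, closed by computation.
example : total (squares [1, 2, 3]) = 14 := by rfl
```

Reasoning about such goals requires a prover that can treat functions as first-class values.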
SMT solvers and superposition provers are integrated as backends to proof
assistants to increase the level of automation. Such integrations are called
hammers. Hammers encode the problem in the automatic
provers' less expressive logic and translate any proofs found in the other
direction. In Example 1, the “by (meson IO...)” command near the end
was generated by Sledgehammer, the hammer for Isabelle.
For a long time, the most successful hammer backends supported only first-order
logic, but recently there has been a breakthrough in the development of
strong higher-order automatic provers. To see this, it suffices to take a look
at the results of the higher-order division of the 2020 edition of the CASC
prover competition:
Prover            Solved problems (of 500)
Zipperposition    424
Satallax          323
Vampire           299
Leo-III           287
CVC4              194
Leo-II            111
This line of work has produced remarkable results, but it has also raised new
scientific questions and opened up new possibilities. Reliable push-button proof
automation for typical real-world applications (as opposed to the logical
puzzles that are common among CASC benchmarks) now seems within reach. But to
make this a reality, several challenges remain to be overcome.
Exciting higher-order reasoning techniques are emerging in automatic theorem
provers, which could lead to push-button automation in highly efficient
verification tools. More research is needed to develop the potential of these
technologies and deploy them in science and industry.
Although verification of
all software will never be practical, we should aim at verifying safety- and
security-critical systems, starting with the libraries they build on. The
high-profile bug found in Android's, Java's, and Python's sorting procedure,
detected during verification, is a stark reminder of the
relevance of formal methods. By relying to a greater extent on computers to
develop and check proofs, practitioners and researchers can achieve much higher
quality standards for software and science.
Nowadays it is widely acknowledged that formal methods do work. Computer
scientists of any persuasion understand that the seL4 kernel will not panic due
to software bugs, that the CompCert compiler produces correct assembly code, and
that the EverCrypt cryptography library verified using
F* is mathematically correct.
SAT and SMT solvers are leading the way in industry, and superposition provers
have distinguished themselves as backends to hammers in proof assistants. But users
still face a choice between automation and expressiveness. In an ideal
world, computers and humans would complement each other, doing what they are
best at. But today's automatic and interactive theorem provers are far from
this ideal.
Our grand challenge is to extend higher-order SMT and superposition to
make them a perfect match for problems emerging from interactive
verification. We will implement these improvements in standalone provers,
integrate them in proof assistants and software verification platforms, and
collaborate with early adopters to deploy our tools.
To meet this challenge, we need to build on the strengths of both automatic and
interactive theorem proving. The interactive side includes general-purpose proof
assistants such as Coq, Isabelle, and Lean as well as software verification
platforms such as F* and the TLA^{+} Proof System. They all provide some proof
management—checking that definitions are well formed and that every stated
lemma is proved. Developing a proof is much like coding a program in an
integrated development environment. All interactive tools provide some
automation, and when automation fails, users can specify a proof outline like in
Example 1 above. For instance, if the step from lemma L
to lemma N is too difficult, the user can state an intermediate
lemma M and then derive N from M and M from
L. Induction and case analysis must normally be applied by the user as
well.
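In Lean 4 syntax, this outlining pattern might look as follows, with L, M, and N as hypothetical stand-ins for concrete lemmas:

```lean
section
-- Hypothetical propositions standing in for concrete lemmas.
variable (L M N : Prop)
variable (step_LM : L → M) (step_MN : M → N)

-- The step from L to N, too difficult to prove directly, is
-- decomposed via the intermediate lemma M: derive M from L,
-- then N from M.
theorem N_of_L (hL : L) : N := step_MN (step_LM hL)
end
```

In practice, each of the two smaller steps would be discharged by automation rather than by an explicit term.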
The tools on the automatic side have a very different look and feel. They are
command-line programs that try to solve a self-contained input problem within a
given time limit. They can give three verdicts: provable, unprovable, or
unknown. For problems in first-order logic and beyond, due to the logic's
undecidability, the verdict “unprovable” is very rare. Nevertheless,
SMT solvers can often provide useful potential counterexamples, and the output
of superposition provers could also be analyzed to provide helpful hints to the
user.
The grand challenge will be met by pursuing three objectives. Our starting point
is that there is a lot of scientific value in pursuing the following activities
within a single team: developing higher-order proof calculi, implementing them
in provers, integrating these in verification platforms, and applying the
platforms to verify specific pieces of software or mathematics. By using our own
tools, we can detect and fix shortcomings as they arise and implement new
features that meet practical needs. It is also important that the techniques and
tools we develop are adopted by others and outlive the project.
Objective 1: Extend HOSMT and λ-superposition to
provide push-button automation for proof obligations in higher-order logics
Our priority will be to make higher-order SMT a working reality. The forthcoming
SMT-LIB 3 standard will support HOSMT, and cvc5 already supports some HOSMT
constructs but not λ-expressions. Despite its
dominance as a first-order solver, cvc5 disappoints in the higher-order division
of CASC.
We will collaborate with the cvc5 developers to enhance their solver. Quantifier
instantiation is crucial for lemmas containing universally
quantified variables, but today it has not been
generalized to full simple type theory. The challenge is to achieve a
graceful generalization, both in theory and in practice, so that the
solver performs well on higher-order formulas while remaining efficient on
first-order formulas.
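A toy Lean goal illustrates what is at stake (an illustrative example, not drawn from the project): proving it requires instantiating the quantifier with a function term, a witness that purely first-order instantiation techniques cannot synthesize.

```lean
-- The witness for `f` is a λ-expression; finding such witnesses
-- automatically is the higher-order instantiation problem.
example : ∃ f : Nat → Nat, ∀ x, f x = x + 1 :=
  ⟨fun x => x + 1, fun _ => rfl⟩
```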
In contrast, superposition is known for its efficient quantifier support.
To capitalize on the success of the λ-superposition calculus,
we will investigate its theory, aiming to
reduce its search space without compromising its ability to prove any
provable problem—its refutational completeness.
The focus on refutational completeness as a guide to achieve high performance
will be a distinguishing feature of our work. Some researchers see pragmatism
and completeness as opposites; for us, they will go hand in hand.
On a scale of expressiveness, simple type theory is located between first-order
logic and dependent type theory. We will work at both ends of this scale. On the
one hand, we will adapt successful SAT and first-order techniques to a
higher-order setting. On the other hand, we will extend λ-superposition
to dependent type theory with the axiom of choice. Both parts of this program
raise their own challenges, because more expressive logics are less well
behaved; when generalizing techniques, we must invariably design workarounds to
compensate. In particular, for dependent type theory, we will need to find
solutions to cope with dependent types, uninhabited types, universes, and
universe polymorphism. To achieve refutational completeness, we will follow the
stratified design that underlies λ-superposition.
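As a concrete illustration of why dependent types complicate automated reasoning, consider length-indexed vectors in Lean 4 (a standard textbook example, not project code):

```lean
-- A vector indexed by its length: the type tracks a value.
inductive Vec (α : Type) : Nat → Type
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- `head` needs no case for `nil`, because `Vec α 0` does not match
-- the index `n + 1`. An encoding into a less expressive logic must
-- soundly capture such uninhabitedness, which a naive translation
-- would lose.
def head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons a _ => a
```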
Objective 2: Integrate higher-order provers in interactive verification
platforms, using lightweight, trustworthy translations
Rather than being invoked from the command line, automatic provers usually serve
as backends to general-purpose proof assistants or more specialized software
verification platforms. To reach users, we will integrate, and help integrate,
our provers in many interactive proof tools.
On the proof assistant side, our targets will be Isabelle and Lean, as
representatives of simple and dependent type theory.
Isabelle's tool Sledgehammer already includes rudimentary support for
Zipperposition, but Zipperposition has not yet been fine-tuned on Isabelle benchmarks.
For Lean, work has commenced on a Lean Hammer in the Lean community. The main
difficulty on the Lean side will be to design an encoding of dependent types
that is sound, graceful, ideally complete, and efficient.
To make Sledgehammer and the Lean Hammer more transparent when they fail, we will
have the automatic provers output proofs with holes—for example, if one
of the cases of a proof by case analysis cannot be solved.
In general, because of the heuristic way in which provers explore the search
space, there may be many competing unfinished proof attempts. New
heuristics will be needed to choose among them. Proofs with holes
should work well in combination with clause splitting,
a technique to partition the search space.
When reconstructing proofs in Isabelle or Lean, the hammer will put in a
sorry placeholder for each missing subproof.
Example 2.
Consider the Isabelle proof presented in Example 1. The
envisioned automation might prove all but the most difficult, second case
(Update). A future Sledgehammer would then generate the proof text below:
proof (cases a)
  case (Create c)
  show ?thesis
    using Step isRev reach.Step by simp
next
  case (Update u)
  show ?thesis
    sorry
next
  case (NondestUpdate nu)
  show ?thesis
    using Step unfolding NondestUpdate isRev_def2
    by (meson IO.reach.Step isRevNth)
qed simp+
The user would then remove the sorry and continue manually,
possibly invoking Sledgehammer again to solve new subcases. Clearly, a partial
proof such as this is much more helpful than an opaque failure.
For software verification platforms, we face a chicken-and-egg
situation: The vast majority of platforms present users with first-order
formalisms, because they are based on first-order SMT solvers. We will
initially focus on F* and the TLA^{+} Proof System.
F* is a dependently typed effectful functional programming language with
features aimed at program verification. The TLA^{+} Proof System is a
tool for specifying and verifying concurrent and distributed systems based on
set theory.
Objective 3: Apply higher-order reasoning methods to
ongoing verification efforts, in close collaboration with quantum information
and big data scientists
Tools have a major role to play in stimulating new ideas for improvements and
extensions, but they need users to thrive. To create an immediate impact, we
will collaborate with other Isabelle and Lean users who are working on ambitious
projects formalizing quantum information and a big data framework—two
attractive new topics for formal verification—so that we can work with real
benchmarks and increase awareness of our tools to reach more potential users.
These two areas of research have received comparatively little attention from
the theorem proving community and could benefit greatly from formal
verification.
Quantum physics has interesting implications for information processing,
including cryptography and problem solving. As a result, quantum information has
become its own area of study. It is highly mathematical and as such constitutes
an interesting target for proof assistants. We will build a Lean library for the
mathematics of quantum information, including libraries for quantum circuits and
quantum information theory. This case study will exercise our tools' support for
dependent types, provide realistic benchmarks to guide our work, and
help build a user community around our technologies.
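As a first, deliberately naive sketch of what such a library's most basic definition might look like (all names hypothetical; the actual design will differ and will build on Mathlib's linear algebra):

```lean
import Mathlib.Data.Complex.Basic

-- A hypothetical single-qubit state: two complex amplitudes whose
-- squared norms sum to 1, a dependent record mixing data and proof.
structure Qubit where
  a : ℂ
  b : ℂ
  normalized : Complex.normSq a + Complex.normSq b = 1

-- The |0⟩ basis state as an instance of the structure.
def qubitZero : Qubit :=
  { a := 1, b := 0, normalized := by simp }
```

Even this toy definition generates the mix of algebraic and propositional proof obligations that our tools must discharge automatically.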
The project is thankful for the support and advice of the following friends and
colleagues:
Jean-Pierre Banâtre,
Alexander Bentkamp,
Simon Cruanes,
Martin Desharnais,
Pascal Fontaine,
Laura Kovács,
Robert Lewis,
Jannis Limperg,
Assia Mahboubi,
Anja Palatzke,
Andrei Popescu,
Trenton Schulz,
Rikkert Stuve,
Mark Summerfield,
Petar Vukmirović,
Uwe Waldmann, and
Aleksandar Zeljić.