Trustworthy Extensions to a Theorem Prover Core

Introduction
An Ideal System
Finding an Approach
Implementation Detail
The Rule
Literature Review

Trustworthy Extensions to a Theorem Prover Core

Computer systems tend to be complex objects, which are notoriously difficult to build correctly. Yet, we increasingly require computers to control systems that are safety or national security critical (e.g. automobiles, train signaling, medical systems, financial transfers, classified messaging systems, nuclear weapon systems). Techniques for improving the likelihood that such systems are designed and delivered correctly, would have great benefits to any society so dependent on automation.

Validating and Verifying

The process of taking a human intuition for what a system should do, and realizing that intuition in a physical system can be considered in two primary stages: specification and implementation. Often, the object that stands at the junction of these two phases (also called a specification) is an informal document, presented in a natural language (e.g. English), possibly enhanced with some semi-formal annotations (e.g. flow-charts){1}. Ideally, the specification is formal, that is, it is presented using a formally defined (mathematically-based) language.

Why? Because a formal notation facilitates the achievement of correctness in both phases of the process.

A formal specification is more precise and consequently easier to analyze; formal languages can be automated--enabling simulation of the specification so that humans can intuitively analyze a prototype of the system; and because formal languages encourage clean separation between parts of a system, it is easier to reuse the specifications. All of this makes the process of validation--determining whether the specification matches the human intuition--faster, cheaper and easier.

Since the formal specification is (by definition) mathematical--and computing systems (from high-level language constructs [1] all the way down to hardware gates[2]) can be modeled by mathematical statements--it is possible to use mathematical reasoning to determine whether a proposed implementation conforms to the desired specification. Replacing the error-prone and tedious informal methods of checking this correspondence with the more precise and easy to automate technique of formal reasoning means that the process of verification can also be made faster, easier {2} and more effective.

While both of these phases are critical to the process of developing correct systems, the project to be described is strictly in the verification phase.

Limits of Formal Reasoning

The first limit (which was implicit in the discussion above) is that only mathematical objects are amenable to formal reasoning. The impact of this limit is felt at both ends of the process. Because human ideas and intuition are not mathematical objects, the only way to compare them with the formal models we build (i.e. the specification) is informally. Simulation is an excellent technique for performing validation, not despite its informality {10}, but precisely because it converts a (possibly formal) specification into a running prototype (i.e. a natural phenomenon)--allowing the brain to compare its intuitive "picture" of the system with a similar sensory phenomenon.

At the same time, while we can build precise mathematical models of programming languages, which can be built on precise models of compilers (and assemblers, and run-time environments, and operating systems and register-transfer architectures) eventually one of the models (whether it is gates or transistors or something else) will be of some natural phenomenon. Once again, we can not directly reason (formally) about a non-mathematical object, and we must rely on informal techniques to justify our models.

While this may be troubling at some level, it is a concern which is hardly unique to computer science. Every discipline in science does its reasoning based on mathematical models of physical phenomenon (how much weight can a steel beam of a certain thickness hold, how fast does light travel, how many calories per day does a human burn) which have been validated by exhaustive observation. Every engineering discipline aims to meet fuzzy, intuitive human expressed requirements (the bridge should allow traffic to cross with no delays on a normal day, and with no more than 1/2 an hour delay on a heavy day). Computer science is hardly likely to be any different {3}.

The second limit to formal reasoning, though, is unique to formal reasoning systems. A formal reasoning system is part of a logic. A logic is made up of the statements in the logic (usually, rules for construction of those statements) and a definition of what it means for those statements to be true. Generally the definition of truth is dependent on checking an infinite number of cases and can not be computed. That's why a formal reasoning system (or prover) is needed--it is a computable {15} way to determine whether a statement is true.

It would certainly be nice if we could design a prover for a given logic that would, whenever a statement was true, correctly determine that this was the case. But according to Goedel's [7] First Incompleteness Theorem, any sufficiently expressive logic (i.e. any logic able to handle ordinary arithmetic) can not have a prover which is able to correctly determine the truth of every statement in the logic.

It would also be desirable that we could determine that the logical system we are using is "sensible" (i.e. consistent). Part of the process of defining what "truth" means in a logic, is assuming that certain statements are true (aka choosing axioms). It would be reassuring if we could check that these axioms are consistent with each other and our sense of reality. Goedel's Second Incompleteness Theorem [7] states that there is no sufficiently expressive logic which is able to establish the consistency of its own axiomatization. Proving the consistency of a logical system requires that an even more powerful logic be used to examine the system of interest.

But, then one is either forced to trust the consistency of that more powerful system, or construct yet another in which to examine its consistency--but that more powerful system will be subject to the same limit. Eventually there must be a logical system which acts as the foundation. The net effect of these limits is not, however, an indication against the use of formal reasoning. The benefits of formal reasoning are clearly in its reusability, mechanizability and precision--but it is not a panacea, it does not fully relieve the obligation to do informal analysis.

An Ideal System

Requirements

If one were to construct an ideal formal reasoning system, what properties would the system have? The following criteria are proposed:

Expressive The language of the underlying logical system should be highly expressive--statements in the language should be simple and elegant. The language should communicate well with the human user of the system.
Efficient The system should not require exorbitant resources to automate, it should be able to determine interesting results in a reasonable amount of time and it should be possible to reuse past successes without reconstructing them.
Secure The logic should be extensible (that is, the set of defined constants increased, not that the underlying logic be altered) in a manner that does not compromise the soundness of the reasoning mechanism.

Approaches

Automatic vs User-GuidanceThere is a continuum between provers which only discover proofs, accepting no user input (except for the conjecture to be proved) and those which only check the validity of individual steps suggested by the user. In particular, the system may be able to record: the fact that a statement has already been proven, a collection of proof steps, a program which can suggest proof steps; it may already have its own collection of procedures which search for proofs (some of which may have input parameters which guide the search) and it may have heuristics for choosing between these procedures. No attempt will be made, however, to precisely define this continuum.
Security vs Efficiency While every prover designer would hope to create a prover that has both of these properties, typically there is a decision made to focus on one of these (as the raison d'etre of the system) properties, with any efforts to achieve the other property limited so as not to effect the primary property.
Provers designed with efficiency as the primary property are generally written in systems programming languages (e.g. C), and are designed to manage machines with vast resources (disk, memory, etc). Often the underlying logic is first-order, and the ability to reuse past efforts limited to citing proved facts. These systems are implemented by large programs, although the vast majority of the code is generally in the resource management, not in the proof checking.
Security-focused provers are built to have a well-defined core which is solely responsible for the correctness of any proof. They are more often implemented in type-safe languages (which provide support for a tamper-proof core), and rely on the resource management of the underlying system. The core is then used to provide a foundation for the user interface and proof search methods (which build proofs on-the-fly, which are checked by the underlying core).

Many of the proof systems which currently have significant user communities tend to fall into two of the four categories induced by these criteria. There is one group of highly automatic, efficiency-focused systems (Otter [8], SMV [9], Randy Bryant's BDD's[10]), and another group of user-directed, security-focused provers (HOL [11], Isabelle [12], Coq [13]). But it is not even necessarily easy to classify some systems: the Boyer-Moore Theorem Prover [14], for example, while lacking an identified proof core and operating in a very automatic fashion, has a nearly unimpeached record of correctness and primarily relies on the underlying system for resource management.

Finding an Approach

Based on this background, we arbitrarily choose to focus on the security of the system {4}. In particular, we focus on the HOL [11] system; it is very expressive (the object language is Higher-Order Logic); it is built in ML and has an identified proof core; and it is efficient enough to have been used for some large [21] and some commercial [29] applications {8}. Our initial consideration was how to increase the security of the basic system. Once convinced that the second incompleteness theorem was valid and applicable, we considered the idea of the "Hilbert Ideal" proof checker [15].

Hilbert proposed that any theorem proving technology be divided into two parts: a proof discoverer and a proof checker. If the proof checker was simple and small enough (a few pages of text is the usual intuitive definition), then it could be subject to extensive peer review. This would allow all of the "trust" {9} in the proving technology to be placed in the checker--the proof discoverer could do essentially anything at all, assuming its only output was a (checkable) proof.

The advantage of reducing all the security critical code to the proof checker is four-fold:

it makes the task of informally checking the proof checker feasible, since only a few pages of text must be reviewed
once the proof checker has been exhaustively reviewed and analyzed by the community at large, any number of proofs could take advantage of the already completed informal analysis
discovered proofs could be extracted and submitted to an impartial referee (who need only run the proof log thru the checker)
any number or style of proof discoverer could be built--as long as they only used the proof steps understood by the checker

Unfortunately, for the proof checker to be small and simple, as desired, the proof steps (or rules) that the checker can make must also be simple; and the set of rules must be pretty small. This can have a rather dramatic impact on the efficiency of the system, since even a seemingly small proof can require a great many primitive inferences to complete. Even if the proof discoverer doesn't check the steps individually, it still must compute all of these steps.

To improve the efficiency of the proof system, some way of making "bigger" proof steps, preferably without impacting the security of the system, is needed. We next considered whether it might be possible to adapt Hilbert's idea by creating a meta-logical proof core. The meta-logical prover could be used to prove the soundness of the core of the object system; and then could be used to prove the soundness of any desired new proof rule. Effectively then, the object core could be extensible, allowing the inclusion of "bigger" proof steps, enabling the creation of much more efficient object level provers.

Even better, the meta-logical prover would not need to be terribly efficient. Since it would only be used to prove the consistency of the initial object system, and reprove consistency when a new proof rule was needed, it would get significantly less use than the object system. Consequently, efforts to construct the meta-logical system could be heavily focused on its security, while the object system could be as efficient as could be proved secure by the meta-logical system.

While this idea was theoretically enticing, we were unable in practice {5} to devise an approach to building the meta-logical prover that resulted in a distinctly less complex and smaller security critical core of the intended object prover. Given that the approach didn't seem to be simplifying the informal analysis needed to trust the system, and that it would only allow users who learned yet another prover (the meta-logical one) to extend the system, this approach was set aside.

It was clear, however, that the key to creating a more efficient prover, is to reliably replace large chains of primitive inferences with some kind of proof rule. Currently, the most effective way of encoding a proof rule in a system like HOL [11] is with a tactic [18]. A tactic {11} is just a program that can apply proof steps in the course of its operations. Based on the results of previous steps, it can choose to apply different steps at subsequent steps. Essentially, it creates a (backward) proof in the course of searching for one. While tactics do an excellent job of encoding a proof plan, can easily be reused and have no impact on the security of the proof core, they must generate the primitive inferences for each proof they perform. Consequently they are not especially efficient.

To formally justify new proof rules (up to the reliability of the underlying core) without the benefit of an explicit meta-logic, the object logic must be "reflective" [33]. Harrison [34] characterizes two variations to this idea which might address this problem: metatheoretical reflection and computational reflection. The former is designed to extend the theoretical power of the object logic, but as Harrison points out higher-order logic (or full set theory) already includes the power to reason about encodings of the object logic (in the object logic).

Computational reflection, on the other hand, is merely intended to allow for increases in the efficiency of the theorem proving system [34]. We propose that it should be possible to demonstrate that a program only performs operations equivalent to primitive proof steps. If we can prove this about a piece of code, we should be able to apply it as if it were a proof rule, since we could just as well apply the sequence of proof steps which it (effectively) encodes.

Harrison strenuously points out that the use of computational reflection is not necessary to prevent LCF-style provers from "hopeless inefficiency" and being "fundamentally inadequate"; and we agree (since we also agree that HOL is a practical tool). However, since significant (e.g. factor of 40) constant time slow-downs (between SML implementations and simulations in the prover) are common, important benefits can be realized. Since these programs would no longer have to operate at the level of typical primitive inferences--and in fact, could build one upon the other--the resulting proof rules could be dramatically more efficient. Yet, since we will have already pre-proved that any of these proof rules could be replaced by primitive proof steps--even if we have to hierarchically expand nested proof rules--the resulting system should be just as secure as the base (core) of the system.

Harrison also points out that 1) the most likely benefits to be realized from computational reflection will be in the ability to reason about imperative features and 2) most previous implementations only reasoned about abstract models of the extensions. By employing a deep semantic embedding [39] of a real programming language, both of these issues will be directly addressed {13}.

While the original goal of creating a theorem proving system which relied on less informal analysis (and was in some sense more secure) has not been achieved, computational reflection is an approach to making more effective proof procedures secure, without increasing (well, much: one new rule) the need for informal analysis. The contribution of the project, then, is the development of an extensible secure core for a user-guided theorem proving system. It should provide the basis for dramatically more efficient systems built in accordance with the security-focused scheme.

Improving the efficiency of theorem provers has been an important goal of the community for some time now--especially incorporating decision procedures for specific domains into a security-focused prover [28]--and this project will lay the foundation for such efforts.

Implementation Detail

One of the big motivators behind the meta-logical approach mentioned above, was that it would provide an excellent opportunity to rationalize several of the quirks in the existing implementation of the HOL system. While the formal explication of the HOL system identifies a logical basis consisting of 3 predefined constants, 8 rules of inference, 5 axioms and 4 definitional principles, the documentation also readily admits that many of the allegedly derived rules of inference are actually hard-coded [11] and it turns out that one of the constants is axiomatized in an ad hoc way (to get one of the principles of definition admitted) long before it is "really" added [24].

Fortunately, John Harrison has developed a system called HOL Light (aka Gordon's Type Theory, GTT)[24] which makes many of the types of changes we had imagined implementing. The core of this system is 2 predefined constants (equality and implication), 2 principles of definition, and 13 primitive rules of inference (some of these rules could be derived). The core is completely contained in 4 CAML [22] files, the largest of which is just a collection of helper functions. Also technically part of the logical core are 3 axioms (choice, infinity, extensionality) used to facilitate the development of mathematics in the derived theories (since they do not directly contribute to the operation of the core, they appear later, when they are needed).

While the HOL Light system is built on a smaller and more easily identified core, it is also consistently more efficient than either of the existing HOL implementations (HOL 88 [26], or HOL '90 [27]). Since it does the best job of addressing two of our design concerns for such a system (security and efficiency--the specification language is unchanged), we chose to make it the foundation for our development.

To reason about the proof procedures we wish to use to extend the system, however, we must have a formal definition of the programming language we use to implement the procedures. The most complete existing formal definition of a "real" programming language is a deep embedding of Standard ML [ 19, 20] done by Myra VanInwegen[21]. This was ideal in that it also matched the author's preference for choice of a functional language.

Unfortunately, this created one rather critical difficulty, in that, in order for the proof procedures to be added into (run as part of) the base system, they would need to be implemented in the same language. But, while VanInwegen's work defines the SML language, Harrison's HOL Light system is built in a different dialect of ML, called CAML [22].

To resolve this dilemma, the author has constructed a CAML-->SML translator (primarily employing the ML-Lex and ML-Yacc tools of Appel [23], but requiring some small hand twiddling) {6}. This translation has a number of disadvantages:

A number of features native to the CAML implementation [22] had to be implemented by hand
CAML has good support for numerical operations, and a significant portion of Harrison's research [24] with HOL Light has been centered around numerical analysis-- the facility is lost in the SML version
to the extent this project is successful, it raises the danger of version competition

but, these will be endured, because:

it offers a direct route to making the project feasible (without creating my own, ad hoc, shallow embedding of a programming language),
a shallow embedding of the implementation would just raise a new requirement for informal analysis (the correctness of the embedding), whereas the deep embedding comes with interesting theorems proved about the correctness of the implementation,
the intent is to demonstrate the feasibility of adding a new capability to the community, not become the keeper/maintainer of a system.

Finally, targets for proof procedures are required. The following three notions are being considered:

some exceedingly simple, like a program that will recognize that any chain of multiplications can be reduced to zero if one of the operands is zero
a very straight-forward implementation of a tautology checker for the propositional logic
a linear arithmetic simplifier, modeled on the tactic (written by Richard Boulton [25]) in the current HOL "arith" library.

The last of these proposed procedures is especially interesting, not only because it is a tactic commonly used in the HOL community, but because it was developed by Boulton as a model of how to develop highly efficient tactics (the scheme is to write tactics which only employ proof steps which will be used as part of the delivered proof). So it will not only be a test of the utility of the project for "real" applications, but provide a very exacting test of whether efficiency gains are truly being realized.

The Rule

Candidate proof procedures will be SML functions from terms to terms. For a proof procedure to be accepted as an extension to the system core, it will have to be supported by the submission of a "justification", a function from terms to proofs. The user will have to demonstrate that for any term t, the proof obtained by applying the justification to the term, will prove that applying the procedure to that same term is equivalent to the term.

To state it more formally, the obligation is:

prove: proof * goal -> bool
f: term -> term
j: term -> proof
t: term
!: "for all" notation

!t.(prove(j t,f t = t))

If the user can demonstrate that this conjecture is true, the system will create a conversion (i.e. a function from terms to theorems) which will accept arbitrary terms and return the theorem which states the equivalence. Note that this theorem may then be used as a rewrite rule--or we may create a facility which takes a term, creates the rewrite rule and applies it directly to the goal.

For the purposes of the current project, the user will have to submit both the proof procedure and the justification, and demonstrate that they have the proper relationship. Boulton [25] expresses concern that such a process might be excessively laborious, but our belief is that the benefit of only having to pay the cost once will generate sufficient savings. We also suspect that it might be possible to take advantage of the elegance of the proof procedure style advocated by Boulton [25] and create a mechanism which would break such a tactic down into a proof procedure and its justification {7}.

Also, since we must show that we have a proof of equivalence to use this rule, it will always be possible to extract a fully primitive proof-- as long as the proofs are recorded in the system, and the extraction process uses the recorded proof whenever the corresponding conversion is employed. Note that while our proposed rule has the proof justification parameterized by term, which may be helpful in the extract of the Primitive Rules of Inference (PRI) chain from a sequence tactics, it is not necessary, since any true "extraction mechanism" will have to be able to determine the PRI chain from a series of tactics. Some issues which haven't been addressed directly in this proposal:

full implementation of the rule--especially what do we mean by the function "prove"
fitting the rule into the organization of the system
use of the type and module system to protect the system internals
tools to be used to measure time used by ordinary tactics and new proof rules

Review of the Most Current Literature

Decision Procedures in HOL

Certainly the idea is not new, the 1991 HOL User's Meeting [28] Introduction states, "Work is in progress on incorporating decision procedures (such as Boolean Decision Diagrams) into HOL, the goal being to provide automation in domains where efficient algorithms are known. This automation is being incorporated without sacrificing the intrinsic generality of HOL." In the period following, "incorporation" has primarily denoted one of two approaches: 1) attaching another reasoning mechanism (via a method for translating between the two mechanisms' languages), and 2) simulating the behavior of another reasoning mechanism in HOL.

At the same meeting, in fact, a system which integrated a first-order automatic prover called FAUST with the HOL environment [17] was described. Joyce and Seger reported [35] on a combination of HOL and a model checker in 1993. Four years later (at the 1997 TPHOLs), a presentation was made regarding the combination of HOL and SMV [29]; there was also a poster presentation [30] on support for connecting decision procedures with the HOL system. The idea was that any fact that was "proved" by an external decision procedure, would be tagged as having been proved outside the HOL system. Any subsequent proof, that relied on such a tagged proof, would be similarly tagged.

Harrison [31] offers an interesting variation on this approach: using an external procedure (in this case Maple) to discover the roots of an equation, then checking them using HOL. While this neatly follows the intuitive split between discovery and checking, one is still obliged to discover the proof of the roots (you just get to avoid using HOL to search for the roots). There is little indication that this technique will extend to the general class of decision procedures. In general, adding a (hard-coded) decision procedure to the HOL system requires informal analysis of both the procedure itself and the method for translation between the language of HOL and the decision procedure.

To avoid this increased obligation for informal analysis, the only other current option is to simulate the operation of a decision procedure as a HOL tactic. The most recent release of the HOL '90 system [27] makes available several library entries which include decision procedures for restricted domains (e.g. arith, taut, word). The Cambridge University Web Pages offer documentation on and implementations of the Knuth-Bendix completion procedure and the Boyer-Moore heuristics [32] for the HOL '88 system.

Unfortunately, these tactics are signficantly less efficient than the hard-coded versions of the decision procedures. Significant improvements to the efficiency of tactics (especially those which emulate DPs) can be made by coding them in the style advocated by Boulton [25]--that is, to only apply primitive inference rules that contribute to the final proof (i.e. avoid inference when searching dead-ends). Even employing this technique, however, there is still an efficiency price to pay, as Boulton points out, "Much of the cost is inherent in the primitive inferences..." [25].

Implementations of Computational Reflection

Boyer-Moore

The Boyer-Moore Theorem Prover (NQTHM [14]) is based on an executable, functional subset of Common LISP and a quantifier-free version of the First-Order Logic {12}. It works by systematically applying a series of rules (e.g. simplification, rewriting with known theorems, induction) until it proves the conjecture or fails. The primary means of user interaction is finding appropriate intermediate lemmas and having NQTHM prove them (thereby adding them to the set of known theorems).

However, NQTHM also included a facility for adding what it called "META" rules to the system. A META rule essentially adds a LISP function to the set of rules to be applied when trying to find proofs, if the user can show that the function, under any possible assignment (a concept implemented by a LISP function called "MEANING"):

the reult of applying the function to any formula is a formula (LISP is untyped, so this is not implicit)
that for any formula, the MEANING of the formula is the same as the MEANING of the result of applying the function to that formula

Although the basic extension rule is very similar to our proposed rule, there are several key differences between NQTHM and the work proposed here:

MEANING only provides a shallow embedding of the semantics of LISP/Boyer-Moore Logic programs
NQTHM is not based on a logical core, so there is no small target for informal analysis
NQTHM does not reason from a set of PRI, so it is not possible to extract the "proof"
Common LISP is a less expressive specification language than Higher Order Logic

NuPRL

A practical use of the idea of computational reflection was made by Aagaard and Leeser [36]. Using the NuPRL system they verified a logic synthesis algorithm which they encoded in a highly restricted subset of ML. Once proved, they were able to use the proved function to design hardware which was consistent with its specification. This work, however, did not suggest a way to extend the theorem prover.

Constable and Howe (e.g. in [37]) have suggested explicit reflection rules, but these are metatheoretical approaches and have not seen practical application.

A Proposal for Computational Reflection in HOL

Slind [38] suggests a rule he calls "MM". The idea is to formalize both a subset of ML and the HOL system (using shallow embeddings), then show that the formalization of some ML function is equivalent to the formalization of some HOL proof procedure (consisting only of PRIs). MM states that if these two programs are equivalent (and the environments in which they are defined are correctly managed) then the real ML function may be used to replace the HOL tactic. No work has been done to implement this proposal.

While Slind's proposal offers, through the use of first-class ML environments, a practical way of managing flip-flops between object proofs and use of computational reflection, it is not a practical consideration in this work. Since the deep embedding of SML {14} will heavily load the system, it would not be pleasant to have all of that machinery included in the system while trying to do other work. As currently imagined, we see our rule as a system design tool (albeit, one usable by "power" users), to be used before (in the sense of a proving session) any object proofs.

Slind's proposal employs two shallow embeddings (although one would not expect the HOL embedding to be too tricky) to facilitate the functional equivalence test (N.B. Harrison [34] suggests that this would be an easier to meet proof standard--although employing the tactic coding style of Boulton may obviate the difference). The shallow embeddings have the usual disadvantage of creating a new obligation for informal analysis, but the functional equivalence proof would make the job of extracting a PRI proof significantly more complicated. The MM proposal does not address this issue.

Summary

We have proposed a mechanism for creating theorem provers which are able to extend their core in a highly reliable fashion. The proposal is similar to one made by Slind, and an implementation by Boyer & Moore, but offers significantly higher levels of assurance than either of those schemes. The example implementation of the system will be an extension to (an SML translation) of Harrison's HOL Light system. Extensions to this system will be proven, and the efficiency of these extensions compared with those of traditional, tactic-based extensions.