Why? Because a formal notation facilitates achieving correctness in both phases of the process.
A formal specification is more precise and consequently easier to analyze; formal languages can be automated--enabling simulation of the specification so that humans can intuitively analyze a prototype of the system; and because formal languages encourage clean separation between parts of a system, it is easier to reuse the specifications. All of this makes the process of validation--determining whether the specification matches the human intuition--faster, cheaper and easier.
Since the formal specification is (by definition) mathematical--and computing systems (from high-level language constructs [1] all the way down to hardware gates [2]) can be modeled by mathematical statements--it is possible to use mathematical reasoning to determine whether a proposed implementation conforms to the desired specification. Replacing the error-prone and tedious informal methods of checking this correspondence with the more precise and more easily automated technique of formal reasoning means that the process of verification can also be made faster, easier {2} and more effective.
While both of these phases are critical to the process of developing correct systems, the project to be described is strictly in the verification phase.
At the same time, while we can build precise mathematical models of programming languages, and build those on precise models of compilers (and assemblers, run-time environments, operating systems and register-transfer architectures), eventually one of the models (whether of gates, transistors or something else) will be of some natural phenomenon. Once again, we cannot directly reason (formally) about a non-mathematical object, and we must rely on informal techniques to justify our models.
While this may be troubling at some level, it is a concern which is hardly unique to computer science. Every scientific discipline reasons with mathematical models of physical phenomena (how much weight can a steel beam of a certain thickness hold, how fast does light travel, how many calories per day does a human burn) which have been validated by exhaustive observation. Every engineering discipline aims to meet fuzzy, intuitive, human-expressed requirements (the bridge should allow traffic to cross with no delays on a normal day, and with no more than half an hour's delay on a heavy day). Computer science is hardly likely to be any different {3}.
The second limit to formal reasoning, though, is unique to formal reasoning systems. A formal reasoning system is part of a logic. A logic is made up of the statements in the logic (usually given by rules for constructing those statements) and a definition of what it means for those statements to be true. Generally the definition of truth depends on checking an infinite number of cases and cannot be computed directly. That is why a formal reasoning system (or prover) is needed--it is a computable {15} way to determine whether a statement is true.
It would certainly be nice if we could design a prover for a given logic that would, whenever a statement was true, correctly determine that this was the case. But according to Goedel's First Incompleteness Theorem [7], no sufficiently expressive logic (i.e. no logic able to handle ordinary arithmetic) can have a prover able to correctly determine the truth of every statement in the logic.
It would also be desirable to determine that the logical system we are using is "sensible" (i.e. consistent). Part of the process of defining what "truth" means in a logic is assuming that certain statements are true (i.e. choosing axioms). It would be reassuring if we could check that these axioms are consistent with each other and with our sense of reality. Goedel's Second Incompleteness Theorem [7] states that no sufficiently expressive logic is able to establish the consistency of its own axiomatization. Proving the consistency of a logical system requires that an even more powerful logic be used to examine the system of interest.
But then one is either forced to trust the consistency of that more powerful system, or to construct yet another in which to examine its consistency--and that more powerful system will be subject to the same limit. Eventually there must be a logical system which acts as the foundation. The net effect of these limits is not, however, an indication against the use of formal reasoning. The benefits of formal reasoning clearly lie in its reusability, mechanizability and precision--but it is not a panacea; it does not fully relieve the obligation to do informal analysis.
Provers designed with efficiency as the primary property are generally written in systems programming languages (e.g. C), and are designed to exploit machines with vast resources (disk, memory, etc.). Often the underlying logic is first-order, and the ability to reuse past efforts is limited to citing proved facts. These systems are implemented by large programs, although the vast majority of the code is generally devoted to resource management, not to proof checking.
Security-focused provers are built around a well-defined core which is solely responsible for the correctness of any proof. They are more often implemented in type-safe languages (which provide support for a tamper-proof core), and rely on the resource management of the underlying system. The core then provides a foundation for the user interface and for proof search methods, which build proofs on the fly that are checked by the underlying core.
An Ideal System
Requirements
If one were to construct an ideal formal reasoning system, what properties would the system have? The following criteria are proposed:
Approaches
Many of the proof systems which currently have significant user communities fall into two of the four categories induced by these criteria. There is one group of highly automatic, efficiency-focused systems (Otter [8], SMV [9], Randy Bryant's BDDs [10]), and another group of user-directed, security-focused provers (HOL [11], Isabelle [12], Coq [13]). But some systems are not easy to classify: the Boyer-Moore Theorem Prover [14], for example, while lacking an identified proof core and operating in a very automatic fashion, has a nearly unimpeached record of correctness and relies primarily on the underlying system for resource management.
Finding an Approach
Based on this background, we arbitrarily choose to focus on the security of the system {4}. In particular, we focus on the HOL [11] system: it is very expressive (the object language is Higher-Order Logic); it is built in ML and has an identified proof core; and it is efficient enough to have been used for some large [21] and some commercial [29] applications {8}. Our initial consideration was how to increase the security of the basic system. Once convinced that the second incompleteness theorem was valid and applicable, we considered the idea of the "Hilbert Ideal" proof checker [15].
Hilbert proposed that any theorem proving technology be divided into two parts: a proof discoverer and a proof checker. If the proof checker were simple and small enough (a few pages of text is the usual intuitive definition), then it could be subject to extensive peer review. This would allow all of the "trust" {9} in the proving technology to be placed in the checker--the proof discoverer could do essentially anything at all, provided its only output was a (checkable) proof.
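To make the division concrete, here is a minimal sketch in SML of a checker for an invented toy logic (one axiom scheme plus modus ponens); none of this is from [15], it merely illustrates why the checker can be small enough to review by hand, while the discoverer may be arbitrarily clever or buggy:

datatype formula = Atom of string
                 | Imp of formula * formula

datatype proof = Axiom of formula              (* any instance of A -> A *)
               | ModusPonens of proof * proof  (* from A -> B and A, get B *)

(* check p = SOME f if p is a valid proof of f, NONE otherwise.
   This small function is the entire trusted base. *)
fun check (Axiom (Imp (a, b))) =
      if a = b then SOME (Imp (a, b)) else NONE
  | check (Axiom _) = NONE
  | check (ModusPonens (p, q)) =
      (case (check p, check q) of
         (SOME (Imp (a, b)), SOME a') =>
           if a = a' then SOME b else NONE
       | _ => NONE)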
The advantage of reducing all the security-critical code to the proof checker is four-fold:
To improve the efficiency of the proof system, some way of making "bigger" proof steps, preferably without impacting the security of the system, is needed. We next considered whether it might be possible to adapt Hilbert's idea by creating a meta-logical proof core. The meta-logical prover could be used to prove the soundness of the core of the object system, and then to prove the soundness of any desired new proof rule. The object core would then effectively be extensible, allowing the inclusion of "bigger" proof steps and enabling the creation of much more efficient object-level provers.
Even better, the meta-logical prover would not need to be terribly efficient. Since it would only be used to prove the consistency of the initial object system, and to re-prove consistency whenever a new proof rule was added, it would get significantly less use than the object system. Consequently, efforts to construct the meta-logical system could be heavily focused on its security, while the object system could be made as efficient as could be proved secure by the meta-logical system.
While this idea was theoretically enticing, we were unable in practice {5} to devise an approach to building the meta-logical prover that resulted in a distinctly smaller and less complex security-critical core for the intended object prover. Given that the approach did not seem to simplify the informal analysis needed to trust the system, and that it would allow only users who learned yet another prover (the meta-logical one) to extend the system, it was set aside.
It was clear, however, that the key to creating a more efficient prover is to reliably replace large chains of primitive inferences with some kind of proof rule. Currently, the most effective way of encoding a proof rule in a system like HOL [11] is with a tactic [18]. A tactic {11} is just a program that can apply proof steps in the course of its operation; based on the results of previous steps, it can choose which steps to apply next. Essentially, it creates a (backward) proof in the course of searching for one. While tactics do an excellent job of encoding a proof plan, can easily be reused, and have no impact on the security of the proof core, they must generate the primitive inferences anew for each proof they perform. Consequently they are not especially efficient.
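For concreteness, the essential shape of the LCF/HOL tactic mechanism can be sketched in SML as follows (the term and thm types here are simplified stand-ins for HOL's real ones):

(* Placeholder types; in HOL these are the real term type and the
   protected theorem type. *)
type term = string
datatype thm = Thm of term list * term   (* assumptions |- conclusion *)

type goal = term list * term             (* assumptions ?- conclusion *)

(* A tactic splits a goal into subgoals, together with a justification
   that rebuilds a theorem for the goal from theorems for the subgoals.
   The justification is where the primitive inferences are performed,
   which is why a tactic cannot compromise the core: every theorem it
   produces is built by the core's own rules. *)
type tactic = goal -> goal list * (thm list -> thm)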
To formally justify new proof rules (up to the reliability of the underlying core) without the benefit of an explicit meta-logic, the object logic must be "reflective" [33]. Harrison [34] characterizes two variations on this idea which might address the problem: metatheoretical reflection and computational reflection. The former is designed to extend the theoretical power of the object logic, but as Harrison points out, higher-order logic (or full set theory) already includes the power to reason about encodings of the object logic (in the object logic).
Computational reflection, on the other hand, is merely intended to allow for increases in the efficiency of the theorem proving system [34]. We propose that it should be possible to demonstrate that a program only performs operations equivalent to primitive proof steps. If we can prove this about a piece of code, we should be able to apply it as if it were a proof rule, since we could just as well apply the sequence of proof steps which it (effectively) encodes.
Harrison strenuously points out that computational reflection is not necessary to save LCF-style provers from "hopeless inefficiency" or from being "fundamentally inadequate"; we agree (since we also agree that HOL is a practical tool). However, since significant constant-factor slow-downs (e.g. a factor of 40) between SML implementations and their simulations in the prover are common, important benefits can still be realized. Since these programs would no longer have to operate at the level of typical primitive inferences--and in fact could build one upon another--the resulting proof rules could be dramatically more efficient. Yet, since we will have already proved in advance that any of these proof rules could be replaced by primitive proof steps--even if we have to hierarchically expand nested proof rules--the resulting system should be just as secure as the base (core) of the system.
Harrison also points out that 1) the most likely benefits of computational reflection will be in the ability to reason about imperative features, and 2) most previous implementations only reasoned about abstract models of the extensions. Employing a deep semantic embedding [39] of a real programming language directly addresses both of these issues {13}.
While the original goal of creating a theorem proving system which relied on less informal analysis (and was in some sense more secure) has not been achieved, computational reflection is an approach to making more effective proof procedures secure, without increasing (well, much: one new rule) the need for informal analysis. The contribution of the project, then, is the development of an extensible secure core for a user-guided theorem proving system. It should provide the basis for dramatically more efficient systems built in accordance with the security-focused scheme.
Improving the efficiency of theorem provers has been an important goal of the community for some time now--especially incorporating decision procedures for specific domains into a security-focused prover [28]--and this project will lay the foundation for such efforts.
Implementation Detail
One of the big motivators behind the meta-logical approach mentioned above was that it would provide an excellent opportunity to rationalize several of the quirks in the existing implementation of the HOL system. While the formal explication of the HOL system identifies a logical basis consisting of 3 predefined constants, 8 rules of inference, 5 axioms and 4 definitional principles, the documentation also readily admits that many of the allegedly derived rules of inference are actually hard-coded [11], and it turns out that one of the constants is axiomatized in an ad hoc way (to get one of the principles of definition admitted) long before it is "really" added [24].
Fortunately, John Harrison has developed a system called HOL Light (aka Gordon's Type Theory, GTT) [24] which makes many of the changes we had imagined implementing. The core of this system consists of 2 predefined constants (equality and implication), 2 principles of definition, and 13 primitive rules of inference (some of which could be derived). The core is completely contained in 4 CAML [22] files, the largest of which is just a collection of helper functions. Also technically part of the logical core are 3 axioms (choice, infinity, extensionality) used to facilitate the development of mathematics in the derived theories; since they do not directly contribute to the operation of the core, they appear later, when they are needed.
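The protection such a core provides can be suggested by an SML-style signature (our illustration, not GTT's actual CAML source): because thm is an abstract type, the only way for code outside the core to obtain a theorem is through the primitive rules.

(* Illustrative kernel signature, not the actual GTT code. *)
signature KERNEL =
sig
  type term
  type thm
  val concl  : thm -> term         (* read-only view of a theorem *)
  val REFL   : term -> thm         (* |- t = t *)
  val ASSUME : term -> thm         (* t |- t *)
  val TRANS  : thm * thm -> thm    (* |- a = b and |- b = c yield |- a = c *)
  (* ... the remaining primitive rules and the principles of definition *)
end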
While the HOL Light system is built on a smaller and more easily identified core, it is also consistently more efficient than either of the existing HOL implementations (HOL 88 [26] or HOL '90 [27]). Since it does the best job of addressing two of our design concerns for such a system (security and efficiency--the specification language is unchanged), we chose to make it the foundation for our development.
To reason about the proof procedures we wish to use to extend the system, however, we must have a formal definition of the programming language in which the procedures are implemented. The most complete existing formal definition of a "real" programming language is a deep embedding of Standard ML [19, 20] done by Myra VanInwegen [21]. This was ideal in that it also matched the author's preference for a functional language.
Unfortunately, this created one rather critical difficulty: in order for the proof procedures to be added into (run as part of) the base system, they would need to be implemented in the same language. But while VanInwegen's work defines the SML language, Harrison's HOL Light system is built in a different dialect of ML, called CAML [22].
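The dialects are close relatives but not source-compatible. As a (hypothetical) illustration of the translation involved:

(* A Caml Light definition such as
     let rec fact = function 0 -> 1 | n -> n * fact (n - 1);;
   must be rendered in SML as: *)
fun fact 0 = 1
  | fact n = n * fact (n - 1)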
To resolve this dilemma, the author has constructed a CAML-->SML translator (primarily employing the ML-Lex and ML-Yacc tools of Appel [23], but requiring some small hand twiddling) {6}. This translation has a number of disadvantages:
To state it more formally, the obligation is: !t. prove(j t, f t = t) (the functions and types involved are defined under "The Rule" below).
For the purposes of the current project, the user will have to submit both the proof procedure and the justification, and demonstrate that they have the proper relationship. Boulton [25] expresses concern that such a process might be excessively laborious, but we believe that having to pay the cost only once will generate sufficient savings. We also suspect that it might be possible to take advantage of the elegance of the proof procedure style advocated by Boulton [25] and create a mechanism which would break such a tactic down into a proof procedure and its justification {7}.
Also, since we must show that we have a proof of equivalence in order to use this rule, it will always be possible to extract a fully primitive proof--as long as the proofs are recorded in the system, and the extraction process uses the recorded proof whenever the corresponding conversion is employed. Note that while our proposed rule has the proof justification parameterized by term, which may be helpful in extracting the Primitive Rules of Inference (PRI) chain from a sequence of tactics, this is not strictly necessary, since any true "extraction mechanism" will have to be able to determine the PRI chain from a series of tactics.
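To make the extraction idea concrete, the following sketch (with invented types; the real system would record actual HOL inferences) shows how recorded justifications allow nested proof rules to be hierarchically expanded into a purely primitive chain:

datatype step = Prim of string              (* one primitive inference *)
              | Conv of string * step list  (* a conversion application,
                                               with its recorded proof *)

(* Flatten a recorded proof into primitive inferences only, recursively
   expanding every conversion into its recorded steps. *)
fun expand (Prim r)       = [Prim r]
  | expand (Conv (_, ps)) = List.concat (map expand ps)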
Some issues which haven't been addressed directly in this proposal:
The Rule
Candidate proof procedures will be SML functions from terms to terms. For a proof procedure to be accepted as an extension to the system core, it will have to be supported by the submission of a "justification", a function from terms to proofs. The user will have to demonstrate that, for any term t, the proof obtained by applying the justification to t proves that applying the procedure to that same term yields an equivalent term.
prove : proof * goal -> bool    (* proof checking function *)
f     : term -> term            (* the candidate proof procedure *)
j     : term -> proof           (* its justification *)
t     : term                    (* an arbitrary term *)
!     : "for all" notation

!t. prove (j t, (f t = t))
If the user can demonstrate that this conjecture is true, the system will create a conversion (i.e. a function from terms to theorems) which will accept arbitrary terms and return the theorem which states the equivalence. Note that this theorem may then be used as a rewrite rule--or we may create a facility which takes a term, creates the rewrite rule and applies it directly to the goal.
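As a toy illustration of the required relationship between a procedure and its justification (all of the types below are invented stand-ins for the real HOL/SML ones), consider a procedure which erases additions of zero, paired with a justification that records evidence for each rewrite it performs:

datatype term = Var of string | Zero | Plus of term * term

(* Invented proof evidence; in the real system a justification would
   build a HOL proof out of primitive inferences. *)
datatype proof = Refl of term            (* |- t = t *)
               | AddZero of term         (* |- (t + 0) = t *)
               | Sym of proof            (* |- a = b  yields  |- b = a *)
               | Trans of proof * proof  (* |- a = b, |- b = c yield |- a = c *)
               | Cong of proof * proof   (* congruence for Plus *)

(* The candidate proof procedure f : term -> term erases "+ 0". *)
fun f (Plus (t, Zero)) = f t
  | f (Plus (a, b))    = Plus (f a, f b)
  | f t                = t

(* The justification j : term -> proof; j t is evidence for f t = t. *)
fun j (Plus (t, Zero)) = Trans (j t, Sym (AddZero t))  (* f t = t, t = t + 0 *)
  | j (Plus (a, b))    = Cong (j a, j b)
  | j t                = Refl t

Establishing !t. prove (j t, f t = t) for such a pair is exactly the one-time obligation the user must discharge before f is admitted as a conversion.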
Review of the Most Current Literature
Decision Procedures in HOL
Certainly the idea is not new; the Introduction to the 1991 HOL User's Meeting [28] states, "Work is in progress on incorporating decision procedures (such as Boolean Decision Diagrams) into HOL, the goal being to provide automation in domains where efficient algorithms are known. This automation is being incorporated without sacrificing the intrinsic generality of HOL." In the period since, "incorporation" has primarily denoted one of two approaches: 1) attaching another reasoning mechanism (via a method for translating between the two mechanisms' languages), and 2) simulating the behavior of another reasoning mechanism in HOL.
At the same meeting, in fact, a system which integrated a first-order automatic prover called FAUST with the HOL environment [17] was described. Joyce and Seger reported [35] on a combination of HOL and a model checker in 1993. Four years later (at the 1997 TPHOLs), a presentation was made regarding the combination of HOL and SMV [29]; there was also a poster presentation [30] on support for connecting decision procedures with the HOL system. The idea was that any fact "proved" by an external decision procedure would be tagged as having been proved outside the HOL system, and any subsequent proof that relied on such a tagged result would be similarly tagged.
Harrison [31] offers an interesting variation on this approach: using an external procedure (in this case Maple) to discover the roots of an equation, then checking them using HOL. While this neatly follows the intuitive split between discovery and checking, one is still obliged to discover the proof of the roots (one merely avoids using HOL to search for them). There is little indication that this technique will extend to the general class of decision procedures. In general, adding a (hard-coded) decision procedure to the HOL system requires informal analysis of both the procedure itself and the method of translation between the language of HOL and that of the decision procedure.
To avoid this increased obligation for informal analysis, the only other current option is to simulate the operation of a decision procedure as a HOL tactic. The most recent release of the HOL '90 system [27] makes available several library entries which include decision procedures for restricted domains (e.g. arith, taut, word). The Cambridge University Web Pages offer documentation on and implementations of the Knuth-Bendix completion procedure and the Boyer-Moore heuristics [32] for the HOL '88 system.
Unfortunately, these tactics are significantly less efficient than the hard-coded versions of the decision procedures. Significant improvements to the efficiency of tactics (especially those which emulate decision procedures) can be made by coding them in the style advocated by Boulton [25]--that is, only applying primitive inference rules that contribute to the final proof (i.e. avoiding inference while searching dead ends). Even employing this technique, however, there is still an efficiency price to pay; as Boulton points out, "Much of the cost is inherent in the primitive inferences..." [25].
However, NQTHM (the Boyer-Moore Theorem Prover [14]) also included a facility for adding what it called "META" rules to the system. A META rule essentially adds a LISP function to the set of rules to be applied when trying to find proofs, provided the user can show that, under any possible assignment (a concept implemented by a LISP function called "MEANING"), the term the function returns has the same MEANING as the term it was given.
Constable and Howe (e.g. in [37]) have suggested explicit reflection rules, but these are metatheoretical approaches and have not seen practical application.
While Slind's proposal offers, through the use of first-class ML environments, a practical way of managing the switching back and forth between object proofs and uses of computational reflection, it is not a practical consideration in this work. Since the deep embedding of SML {14} will heavily load the system, it would not be pleasant to have all of that machinery in the system while trying to do other work. As currently imagined, we see our rule as a system design tool (albeit one usable by "power" users), to be used before (in the sense of a proving session) any object proofs.
Slind's proposal employs two shallow embeddings (although one would not expect the HOL embedding to be too tricky) to facilitate a functional equivalence test (N.B. Harrison [34] suggests that this would be an easier proof standard to meet--although employing the tactic coding style of Boulton may obviate the difference). The shallow embeddings have the usual disadvantage of creating a new obligation for informal analysis, and the functional equivalence proof would make the job of extracting a PRI proof significantly more complicated. The MM proposal does not address this issue.