Any time there is a discussion of formal proofs of correctness in software, I hear the same arguments made. These arguments are almost invariably made by people who have never done any formal verification, and who don't necessarily have a particularly good knowledge of how formal verification even works. This post will attempt to provide both a few counter-arguments against the usual objections, and an overview of the theoretical basis upon which formal verification of software depends.

The Bug-free mythAll too often, I hear the phrase "free of bugs" used in conjunction with theorem proving or formal verification of software. I want to make perfectly clear,

this is not what theorem proving claims. You'll often hear phrases like "guaranteed correct". This is an (informal) shorthand for "guaranteed correct

with respect to some specification". Allow me to make the distinction clear with an example.

Here is a program:

x = 0;

while(1) {

x = x + 1;

}

This program isn't very useful. It is not, however,

incorrect (nor is it "correct"), without reference to some specification. If our specification is simply "keep incrementing a variable until the universe ends", this program is correct. If we were expecting this program to calculate something useful, or produce some output, then it is obviously

incorrect

. The "correctness" judgement always follows from a specification, and no judgement can be made in the absence of a specification.

This definition covers the (informal) usage of "correctness", but this is still distinct from "bug-free". The main activity within formal verification of software is generally

the verification of specific

safety and

liveness properties.

"Safety" is approximately the guarantee that some bad thing can never happen. It might be that a certain storage location can never overflow, or that a value is always positive, or that a pointer is not null, or even much more concrete functional things, like "the autopilot will not switch off the engines while the plane is in flight". Keep in mind that doing

nothing is always "safe". It's impossible for a storage location to overflow if the program never does anything. A program that simply halts immediately will fulfil almost any safety property you care to dream up, so we can re-state the definition as "if anything happens, it will be something desirable".

This is where

liveness comes in. This is approximately the guarantee that "some desirable thing will happen (eventually)". These might be properties like "the doors will open", or "the value stored in X will be incremented".

When people talk about software being

verified, they really mean that a set of safety and liveness properties have been shown to be true of the software in question. When undertaking a verification activity, we choose a set of safety and liveness properties that are of interest or importance, and we prove those. A property like "there are no bugs" is completely useless (and unprovable), however, for some safety property X, we can at least say "there are no bugs that would cause property X to become false".

One of the main ways we verify these safety and liveness properties is through the use of automated or interactive

theorem provers. These are pieces of software that permit the construction of mathematical proofs, by both

checking the proofs, and by providing some assistance to a human in searching for a suitable proof that some property holds. Unfortunately, as soon as you mention theorem provers and all the wonderful things they do, this usually leads to a new objection.

But what if the theorem prover has bugs in it?

OK, it seems a natural enough objection, but you don't think we're completely dumb, do you? For people in the business of proving software correct, you have to assume that we have some reason to believe the software we use is correct.

But actually,

correctness within theorem provers is less important than you might think. I said earlier that a theorem prover has two basic functions - it acts as a

proof checker, and it provides tools to aid in coming up with a suitable proof of a property.

The important part of this is that only the proof checker needs to be correct. The other parts of the theorem prover could just generate random garbage proofs, and they would be rejected as incorrect proofs of the property under consideration by the proof checker. In practice, the techniques used in proof search have some very good heuristics about the chance that they will succeed in finding a valid proof, and the time it will take for them to find such a proof, but these are just for the benefit of their users. These proof-search aids do not simply return "true" if they find a valid proof - they instead must return the proof itself, which the proof-checker can then verify.

So, hopefully you have been suitably convinced that

most of the theorem prover does not need to be correct.

So how do you know that the proof checker is correct?

I'm glad you asked. This is going to require a quick whirlwind tour of how mathematical logic works.

Theorem provers like

Isabelle and

Coq have a wonderfully rich set of constructs built into them - numbers, arithmetic, records, functions - and the list goes on. The important fact about all of these things is that they can be constructed from a very small set of much simpler constructs.

There are a class of logics called

Hilbert Systems that are built using a very small number of axioms and inference rules, and which permit you to build up most of the familiar constructs from logic and mathematics in a systematic way. Similarly,

Zermelo–Fraenkel set theory also allows one to build up almost all of mathematics in a systematic way. This is old magic, and it is pretty well understood.

Because I'm writing this for a general audience, I don't feel the need to scrape the depths of my own knowledge to come up with the specifics of the logical foundations used within specific theorem provers, but if you get the general idea that there have been many attempts to formalise all of mathematics from a very small set of core constructs, and many of them have been successful enough to give us a formalised foundation for the types of mathematics most of us care about, then you already know how a theorem prover works.

So once we've encoded this very small core manually, we can build all the rest of our theorem prover theories from this small core. We can mechanically check that we are doing this right, either by hand, or using various

automated proof-checking techniques.

The small core inside the theorem prover is all that is really important now, in terms of correctness. This can be verified by hand, and it can also be verified using other theorem provers.

But even this doesn't

really need to be correct. When we use a theorem prover to aid us in proving a property, we end up with a

proof script - some machine-readable version of the proof. This doesn't necessarily rely upon the software that was used to help create it (although it might get messy trying to decouple it). This proof can be verified using any other piece of software, or it can be verified by hand without any creativity - you just have to mechanically apply each step in the proof and satisfy yourself that each step follows by a valid step of inference from the last.

The final important property of theorem proving software is that we can always look for

consistency. If we get anything wrong anywhere along the way, proofs are going to start failing. It is already true that there are some

valid statements that cannot be proved to be true in the logic in question, but this isn't really a problem as far as the correctness of the theorem prover is concerned. We would only be concerned if there were false statements that could somehow be shown to be true inside our logical system - and this is not the case! It is erroneous to prove a contradiction, so if this happens, mechanical proof checking will reject the proof.

But what if the proof has bugs in it?

Hopefully you are satisfied that we can have a very high degree of confidence in the results we get out of a theorem prover, so a proof is not like a "program" in the sense that we can have valid steps of reasoning and just "get the wrong answer". The only place where real "bugs" can be introduced is in the human side of things, where we choose the wrong safety or liveness properties that do not actually capture the properties we are interested in.

There are several strategies that we can use here. The first is syntax and type checking on the specification itself. Given that we are expressing our safety properties inside a theorem prover using something that is essentially a programming language, we can use the same techniques that have been shown to be very useful in enforcing consistency and detecting errors in traditional programming languages. This should remove the possibility that a simple typo could be totally catastrophic.

The second way is enforcing a discipline where we re-state the same properties in different ways. By doing this, we ensure that either a proof of one of the properties will fail, or we will introduce inconsistency between our encodings of the properties which will be rejected by the theorem prover.

Finally, and although I've previously argued against it,

be careful. This is very different to the suggestion that you just "be careful" when constructing a program in a traditional language. This is about a set of safety properties. You can only guarantee that your program is correct with respect to the specification that you have written down. If you are falsely claiming confidence in your software where none exists, then that would be just as catastrophic in any environment. I really think it's worth saying here that we are way off into the land of the abstract. Even in a complex specification, it would be very difficult to encode things in such a way that you succeed in proving a whole set of properties, none of which provide the guarantees you think you are making (the proof is likely to fail if you have an inconsistent model of what you think you are proving and what you are actually proving). This is a much stronger safeguard against human error than is provided in any traditional programming environment available today.

But what if the compiler is incorrect?There are two basic counter-arguments to this. Basically, in proving properties about software, you are either working with an idealised model of the software encoded into the theorem prover assuming some particular semantics for the target language and/or underlying hardware, or you are actually writing the software

inside the theorem prover, and then generating code once you have proven all the properties of interest. In either case, you are somewhat at the mercy of compilers. This is always the case though - any correct program could be made incorrect if there are faults in the compiler or the hardware. You can take two approaches here - assume that the hardware and compiler work as expected (therefore invalidating your guarantees of correctness if they do not), or make use of the

recent work on

certified compilation to at least remove the uncertainty of the compiler from the equation. There is a whole different body of work on (hardware)

fault tolerance in software. If you're really getting to this level of safety, then you probably need to be an expert on these kinds of issues anyway. Theorem provers for software verification are concerned with the verification of certain properties. If validating your assumptions about the hardware and software environments are important, then you will need to do more work.

Is it worth it?For some things, yes. This kind of formal software verification is hard work. It's very labour intensive, and it requires a great deal of effort. That said, it is getting easier as the tools get better, the barrier to entry is getting lower. I don't think it will be very long before we start seeing a lot more in the way of light-weight theorem proving taking place as a standard part of compilation. In a world where software failure costs huge amount of money, it doesn't seem so remote a possibility that it will become practical and cost-efficient to apply formal verification to more software.

--

Personal plea: I'm still looking for reasonably-priced accommodation for my partner and I in Copenhagen if anyone has any leads. I'm eager to go start my PhD at

ITU, but need to find somewhere to live first.