Monday, 5 October 2009

The bug-free myth: Why we trust verified software

Any time there is a discussion of formal proofs of correctness in software, I hear the same arguments made. These arguments are almost invariably made by people who have never done any formal verification, and who don't necessarily have a particularly good knowledge of how formal verification even works. This post will attempt to provide both a few counter-arguments against the usual objections, and an overview of the theoretical basis upon which formal verification of software depends.

The Bug-free myth

All too often, I hear the phrase "free of bugs" used in conjunction with theorem proving or formal verification of software. I want to make it perfectly clear that this is not what theorem proving claims. You'll often hear phrases like "guaranteed correct". This is an (informal) shorthand for "guaranteed correct with respect to some specification". Allow me to make the distinction clear with an example.

Here is a program:

x = 0;
while(1) {
x = x + 1;
}

This program isn't very useful. It is not, however, incorrect (nor is it "correct") without reference to some specification. If our specification is simply "keep incrementing a variable until the universe ends", this program is correct. If we were expecting this program to calculate something useful, or produce some output, then it is obviously incorrect. The "correctness" judgement always follows from a specification, and no judgement can be made in the absence of one.
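
To make "correct with respect to some specification" concrete, here is one standard way of writing such a specification down: a Hoare triple, consisting of a precondition, a program fragment, and a postcondition. This is my own illustration, not something taken from any particular tool:

$\{\, x = 0 \,\}\quad x := x + 1 \quad \{\, x = 1 \,\}$

This says: if $x = 0$ holds before the assignment runs, then $x = 1$ holds afterwards. The assignment is correct with respect to this triple, and incorrect with respect to the triple whose postcondition is $\{\, x = 2 \,\}$. Neither judgement even makes sense until the pre- and postconditions have been written down.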

This definition covers the (informal) usage of "correctness", but this is still distinct from "bug-free". The main activity within formal verification of software is generally the verification of specific safety and liveness properties.

"Safety" is approximately the guarantee that some bad thing can never happen. It might be that a certain storage location can never overflow, or that a value is always positive, or that a pointer is not null, or even much more concrete functional things, like "the autopilot will not switch off the engines while the plane is in flight". Keep in mind that doing nothing is always "safe". It's impossible for a storage location to overflow if the program never does anything. A program that simply halts immediately will fulfil almost any safety property you care to dream up, so we can re-state the definition as "if anything happens, it will be something desirable".

This is where liveness comes in. This is approximately the guarantee that "some desirable thing will happen (eventually)". These might be properties like "the doors will open", or "the value stored in X will be incremented".
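
In the temporal logics commonly used by model checkers and theorem provers, these two kinds of properties have a very compact form. As a sketch (the proposition names here are invented for illustration), with $\square$ read as "always" and $\lozenge$ read as "eventually":

Safety: $\square\, \lnot \mathit{EnginesOffInFlight}$ - "it is never the case that the engines are off while the plane is in flight"
Liveness: $\lozenge\, \mathit{DoorsOpen}$ - "eventually, the doors open"

A program that halts immediately satisfies the first formula trivially, but fails the second - which is exactly the gap that liveness properties are there to close.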

When people talk about software being verified, they really mean that a set of safety and liveness properties has been shown to hold of the software in question. When undertaking a verification activity, we choose a set of safety and liveness properties that are of interest or importance, and we prove those. A property like "there are no bugs" is completely useless (and unprovable); however, for some safety property X, we can at least say "there are no bugs that would cause property X to become false".

One of the main ways we verify these safety and liveness properties is through the use of automated or interactive theorem provers. These are pieces of software that permit the construction of mathematical proofs, by both checking the proofs, and by providing some assistance to a human in searching for a suitable proof that some property holds. Unfortunately, as soon as you mention theorem provers and all the wonderful things they do, this usually leads to a new objection.

But what if the theorem prover has bugs in it?

OK, it's a natural enough objection, but you don't think we're completely dumb, do you? We are in the business of proving software correct, so you can assume we have some reason to believe that the software we ourselves rely on deserves to be trusted.

But actually, correctness within theorem provers is less important than you might think. I said earlier that a theorem prover has two basic functions - it acts as a proof checker, and it provides tools to aid in coming up with a suitable proof of a property.

The important part of this is that only the proof checker needs to be correct. The other parts of the theorem prover could just generate random garbage proofs, and they would be rejected by the proof checker as incorrect proofs of the property under consideration. In practice, the techniques used in proof search have some very good heuristics about how likely they are to succeed in finding a valid proof, and how long it will take them to find one, but these are just for the benefit of their users. These proof-search aids do not simply return "true" if they find a valid proof - they must instead return the proof itself, which the proof checker can then verify.

So, hopefully you have been suitably convinced that most of the theorem prover does not need to be correct.

So how do you know that the proof checker is correct?

I'm glad you asked. This is going to require a quick whirlwind tour of how mathematical logic works.

Theorem provers like Isabelle and Coq have a wonderfully rich set of constructs built into them - numbers, arithmetic, records, functions - and the list goes on. The important fact about all of these things is that they can be constructed from a very small set of much simpler constructs.

There is a class of logics called Hilbert systems, built from a very small number of axioms and inference rules, which permits you to build up most of the familiar constructs from logic and mathematics in a systematic way. Similarly, Zermelo–Fraenkel set theory allows one to build up almost all of mathematics in a systematic way. This is old magic, and it is pretty well understood.
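
To give a flavour of just how small such a core can be, here is one textbook presentation of a Hilbert system for classical propositional logic: three axiom schemes and a single inference rule. (The presentation is standard, but it is my choice of example rather than the exact core of any specific theorem prover.)

A1: $\varphi \rightarrow (\psi \rightarrow \varphi)$
A2: $(\varphi \rightarrow (\psi \rightarrow \chi)) \rightarrow ((\varphi \rightarrow \psi) \rightarrow (\varphi \rightarrow \chi))$
A3: $(\lnot \psi \rightarrow \lnot \varphi) \rightarrow (\varphi \rightarrow \psi)$
Modus ponens: from $\varphi$ and $\varphi \rightarrow \psi$, conclude $\psi$.

Everything else in the propositional calculus - conjunction, disjunction, equivalence - can be defined in terms of implication and negation and derived from these schemes.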

Because I'm writing this for a general audience, I won't dig into the specifics of the logical foundations used within particular theorem provers. The general idea is this: there have been many attempts to formalise all of mathematics from a very small set of core constructs, and enough of them have succeeded to give us formalised foundations for the kinds of mathematics most of us care about. If you follow that, you already know how a theorem prover works.

So once we've encoded this very small core manually, we can build all the rest of our theorem prover's theories on top of it, and we can check that we are doing so correctly, either by hand or using various automated proof-checking techniques.

The small core inside the theorem prover is all that is really important now, in terms of correctness. This can be verified by hand, and it can also be verified using other theorem provers.

But even this doesn't really need to be correct. When we use a theorem prover to aid us in proving a property, we end up with a proof script - some machine-readable version of the proof. This doesn't necessarily rely upon the software that was used to help create it (although it might get messy trying to decouple it). This proof can be verified using any other piece of software, or it can be verified by hand without any creativity - you just have to mechanically apply each step in the proof and satisfy yourself that each step follows by a valid step of inference from the last.
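
To show how little machinery "mechanically apply each step" actually needs, here is a minimal proof checker for the Hilbert-style propositional system sketched above, restricted to the first two axiom schemes and modus ponens. This is my own toy illustration of the idea, not the kernel of any real theorem prover; the point is that a proof is just data, and the checker either accepts it or rejects it:

# Toy proof checker, for illustration only - not the kernel of any real prover.

def is_imp(f):
    # An implication is represented as the tuple ('->', antecedent, consequent).
    return isinstance(f, tuple) and len(f) == 3 and f[0] == '->'

def is_axiom(f):
    # Scheme A1: X -> (Y -> X)
    if is_imp(f) and is_imp(f[2]) and f[2][2] == f[1]:
        return True
    # Scheme A2: (X -> (Y -> Z)) -> ((X -> Y) -> (X -> Z))
    if (is_imp(f) and is_imp(f[1]) and is_imp(f[1][2])
            and is_imp(f[2]) and is_imp(f[2][1]) and is_imp(f[2][2])):
        x, y, z = f[1][1], f[1][2][1], f[1][2][2]
        return f[2][1] == ('->', x, y) and f[2][2] == ('->', x, z)
    return False

def check_proof(steps):
    # Each step is ('axiom', formula) or ('mp', i, j, formula), the latter
    # claiming that formula follows from steps i and j by modus ponens.
    proved = []
    for step in steps:
        if step[0] == 'axiom':
            assert is_axiom(step[1]), "not an instance of an axiom scheme"
            proved.append(step[1])
        elif step[0] == 'mp':
            _, i, j, f = step
            assert proved[i] == ('->', proved[j], f), "modus ponens does not apply"
            proved.append(f)
        else:
            raise ValueError("unknown inference rule")
    return proved[-1]

# The classic five-step derivation of A -> A from A1 and A2.
A = 'A'
AA = ('->', A, A)
proof = [
    ('axiom', ('->', ('->', A, ('->', AA, A)),
                     ('->', ('->', A, AA), AA))),  # A2 instance
    ('axiom', ('->', A, ('->', AA, A))),           # A1 instance
    ('mp', 0, 1, ('->', ('->', A, AA), AA)),
    ('axiom', ('->', A, AA)),                      # A1 instance
    ('mp', 2, 3, AA),
]
assert check_proof(proof) == AA
print("proof of A -> A accepted")

A proof-search tool can be as clever or as buggy as it likes when producing the list of steps; the only thing anyone needs to trust is the forty-odd lines of checking above, or their hand-verified equivalent in a real system's kernel.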

The final important property of theorem proving software is that we can always look for consistency. If we get anything wrong anywhere along the way, proofs are going to start failing. It is true that there are some true statements that cannot be proved within the logic in question, but this isn't really a problem as far as the correctness of the theorem prover is concerned. We would only be worried if false statements could somehow be shown to be true inside our logical system - and this is not the case! If anyone ever did manage to derive a contradiction, it would not stay hidden for long: a contradiction lets you "prove" absolutely anything, and that kind of breakage gets noticed very quickly.

But what if the proof has bugs in it?

Hopefully you are now satisfied that we can have a very high degree of confidence in the results we get out of a theorem prover. A proof is not like a "program" in this respect: you cannot follow valid steps of reasoning all the way through and still somehow "get the wrong answer". The only place where real "bugs" can be introduced is on the human side of things, where we might choose safety or liveness properties that do not actually capture the properties we are interested in.

There are several strategies that we can use here. The first is syntax and type checking on the specification itself. Given that we are expressing our safety properties inside a theorem prover using something that is essentially a programming language, we can use the same techniques that have been shown to be very useful in enforcing consistency and detecting errors in traditional programming languages. This should remove the possibility that a simple typo could be totally catastrophic.
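
As a trivial illustration of that first strategy, here Python type hints stand in for a prover's typed specification language (the property and all the names are invented for the example):

from typing import List

def balance_never_negative(balances: List[int]) -> bool:
    # The intended safety property: no account balance is ever negative.
    return all(b >= 0 for b in balances)

# A mis-stated version such as  all(b >= "0" for b in balances)  compares an
# int to a string; a type checker (e.g. mypy) would flag it before any proof
# was ever attempted.

The equivalent check inside a system like Isabelle or Coq is considerably stronger, since every term in the specification language is type-checked.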

The second way is enforcing a discipline where we re-state the same properties in different ways. By doing this, we ensure that either a proof of one of the properties will fail, or we will introduce inconsistency between our encodings of the properties which will be rejected by the theorem prover.

Finally, and although I've previously argued against relying on it, be careful. This is very different from the suggestion that you just "be careful" when constructing a program in a traditional language: here the care is confined to writing down the right set of safety properties, not to every line of the implementation. You can only guarantee that your program is correct with respect to the specification you have actually written down, and falsely claiming confidence where none exists would be just as catastrophic in any environment. It is worth saying that we are well off into the land of the abstract at this point. Even with a complex specification, it would be very difficult to encode things in such a way that you succeed in proving a whole set of properties, none of which provide the guarantees you think you are making - the proofs are likely to fail if what you think you are proving and what you are actually proving have drifted apart. This is a much stronger safeguard against human error than is provided by any traditional programming environment available today.

But what if the compiler is incorrect?

There are two basic answers to this. In proving properties about software, you are either working with an idealised model of the software, encoded into the theorem prover under some assumed semantics for the target language and/or the underlying hardware, or you are writing the software inside the theorem prover itself and generating code once you have proven all the properties of interest. In either case you are somewhat at the mercy of the compiler - but this is always the case: any correct program can be made incorrect by faults in the compiler or the hardware. You can take two approaches here - assume that the hardware and compiler work as expected (accepting that your guarantees are void if they do not), or make use of the recent work on certified compilation to at least remove the uncertainty of the compiler from the equation. There is a whole separate body of work on (hardware) fault tolerance in software, and if you really are operating at this level of safety, you probably need to be an expert on those issues anyway. Theorem provers for software verification are concerned with the verification of certain properties; if validating your assumptions about the hardware and software environment is important, then you will need to do more work.

Is it worth it?

For some things, yes. This kind of formal software verification is hard work and very labour intensive. That said, it is getting easier: as the tools get better, the barrier to entry gets lower. I don't think it will be very long before we start seeing a lot more in the way of light-weight theorem proving taking place as a standard part of compilation. In a world where software failure costs huge amounts of money, it doesn't seem so remote a possibility that it will become practical and cost-efficient to apply formal verification to more software.

--

Personal plea: I'm still looking for reasonably-priced accommodation for my partner and me in Copenhagen, if anyone has any leads. I'm eager to go and start my PhD at ITU, but I need to find somewhere to live first.

Friday, 2 October 2009

Insecure software is your fault

Security breaches in software are really just a special case of software failure, but they have the dubious honour of having a "community" built around them. There is a cost associated with all software failure, but most people are willing to tolerate a certain amount of failure in their applications - as I'm often reminded whenever I discuss formal methods with other programmers, "mostly working is good enough". I can understand this viewpoint (although I don't agree with it), but by this reasoning, what is the acceptable level of security vulnerability in software?

You can tolerate your word processor crashing every few days, but is it acceptable for your entire customer database to be stolen once every few days? How about your production server being rooted once a month? A crippling worm attack once per year? Where does the cost/benefit relationship come into play here?

I don't believe that anyone would seriously advocate the position that a serious security breach once per year resulting in data loss or theft is "acceptable". I doubt anyone wants to have to pay for system administrators to reinstall compromised machines once a month.

The title of this article is "insecure software is your fault". I use "your" to refer to developers. I'll expand on this towards the end of the article, but basically, I believe it because we already have the tools to eliminate most kinds of software vulnerabilities, and yet those vulnerabilities persist, because programmers continue not to employ these tools or techniques. If you aren't a software developer, or you actually use the best tools and practices available to you to produce good software, then you can safely assume that I don't mean that insecure software is your fault.

I should create some distinction here between two different types of software "insecurity":

1) Flaws and defects caused by language-level features such as manual memory management or insufficient sanity checks on input, which permit damage to data or unauthorised access at the same privilege level as the program itself. SQL injection and some buffer overflow attacks are obvious examples of this kind of thing (there is a small sketch of the SQL injection case just after this list).

2) Flaws or defects caused by deficiencies in design - i.e., the software is unable to be compromised itself, however it enables undesirable actions within the application itself. Cross-site scripting attacks are the most obvious example of this - the browser itself remains uncompromised, but any data inside the browser is basically at risk due to an inadequately designed security model.
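
To make the first class concrete, here is a minimal sketch of the SQL injection case using Python's built-in sqlite3 module (the table, columns and values are invented for the illustration). The point is that the fix is a language/library-level mechanism, not extra vigilance:

import sqlite3

# Illustration only; the schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

attacker_input = "nobody' OR '1'='1"

# Vulnerable: the attacker's input becomes part of the SQL itself,
# so this query returns every row in the table.
vulnerable = conn.execute(
    "SELECT secret FROM users WHERE name = '" + attacker_input + "'"
).fetchall()

# Parameterised: the input is passed as data and never parsed as SQL,
# so the same attack string matches nothing.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(vulnerable)  # [('hunter2',)] - the attack worked
print(safe)        # [] - the attack failed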

In the first class of vulnerabilities, there are language-based solutions. Type safety guarantees are sufficiently strong to prevent most kinds of attacks that rely on runtime type violations to exploit flaws. Removing manual memory management from languages reduces another huge class of possible errors. Automatic bounds-checking is the third and final part of it. Contrary to popular myth, the performance cost of these things is very small, and yet the benefit is huge. You've just eliminated even the potential to inadvertently expose yourself and your users to a huge class of security vulnerabilities that will cost time and money to fix.
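
As a toy illustration of the bounds-checking point (again my own sketch, not anyone's production code): in a checked language an out-of-range write fails loudly and immediately, instead of silently scribbling over whatever happens to live next to the buffer - which is the raw material of the classic buffer overflow:

# An 8-slot buffer in a bounds-checked language.
buffer = [0] * 8

def store(index, value):
    buffer[index] = value  # every access is checked against len(buffer)

store(3, 42)       # fine
try:
    store(12, 99)  # in unchecked C, this could overwrite adjacent memory
except IndexError:
    print("out-of-bounds write rejected at runtime")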

Claiming that you don't need any of these catastrophe-preventing measures because you're careful is a weak excuse. It's a weak excuse because every experience we've had shows us that being careful just isn't good enough. Suggesting that any developer who has ever introduced a security flaw just wasn't careful enough is a hugely arrogant assertion. If you accept that security vulnerabilities can happen to people who are both smart and careful, then why can't they happen to you? You have a responsibility to your users to be more than careful, you have a responsibility to be sure. And using language-based features that remove a huge class of potential errors is a way to be sure.

In the rare circumstances where performance concerns genuinely dictate that you cannot use these features, don't write a network-facing service. The only place I suggest "being careful" is in isolating potentially insecure software from the outside world.

There is a small caveat to this - not all languages were created equal. In some cases the design of the language may be sufficient to remove a large class of vulnerabilities (through automatic memory management, bounds checking and sensible typing strategies), however in the interpreted case, you are essentially at the mercy of the interpreter. Your code may be perfectly sane, and the language may be sane, but the interpreter written in C isn't necessarily sane or safe. I add this only as a caution, because the same could be said of the operating system or even the hardware platform - you're only as secure as the things you rely upon. However, anecdotally at least, it's very rare that operating systems have vulnerabilities that are likely to be exploitable only within your application if your application is otherwise defect-free, whereas this isn't always the case with network-facing interpreters.

The second class I identified covers errors caused by flaws in design. These can involve things like leaving scope for protocol violations, insufficient security models and some downright stupidity. Very few of these situations have an obvious "language-based" solution at this moment. If your application-level security model is flawed, language-based features aren't going to help. If illegitimate users of the application are able to cause mischief without ever "exploiting" anything in the first sense, then the security model is flawed. This could be as simple as leaving a "delete" action outside of some authenticated realm (allowing unauthenticated users to delete things they shouldn't be able to), or it could be that a specially crafted series of actions leads to an undesirable or inconsistent state.
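
As a hypothetical sketch of that "unauthenticated delete" case (all the names here are mine): the property "only authenticated users can delete records" has to hold on every code path, which is exactly the kind of global statement that is easy to get wrong by hand and easy to check mechanically:

# Illustration only; a real application would use a proper session and store.
records = {1: "invoice", 2: "payroll"}

def delete_record_flawed(session, record_id):
    # No check at all: any caller, authenticated or not, can delete.
    records.pop(record_id, None)

def delete_record(session, record_id):
    # The safety property is enforced on this path - and it has to be
    # enforced on every path that can reach a delete.
    if not session.get("authenticated"):
        raise PermissionError("authentication required")
    records.pop(record_id, None)

delete_record({"authenticated": True}, 1)  # allowed
try:
    delete_record({}, 2)                   # unauthenticated caller
except PermissionError:
    print("unauthenticated delete rejected")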

Whatever the specifics, these are all essentially safety properties, as in "this software is not able to exhibit these undesirable behaviours". We actually have ways of checking safety properties nearly automatically these days. You can check various properties through model checking. You can build things inside theorem provers and prove safety properties. Tools like Daikon can even aid in the generation of various invariants through runtime analysis. ESC/Java2 can check certain kinds of safety properties at compile time. BLAST works on C programs. Spec# can express various safety properties, and there are hundreds more similar efforts.

Basically what I'm getting at is that these tools really aren't impossible to use, and they can give you concrete guarantees where you might otherwise have just been guessing. The main thing is the ability to ensure a globally consistent view of the implications of any design decision relating to security. I've previously asserted that humans aren't very good at keeping globally consistent mental models of complex systems, which is why I (and many others before me) advocate mechanical approaches to ensuring global consistency.

So the take-home point from this (rather long) article is that we actually have the tools right now to eliminate most of the kinds of security vulnerabilities that exist. The problem is that people aren't using them in many places where they really should be. The much-reported damage caused by viruses, worms, "crackers", DDoS attacks and zombies is the direct result of insecure software. Sure, the people who engage in those sorts of activities are criminals, but really, we're basically leaving our doors and windows wide open and yet we still seem surprised when somebody comes in and steals stuff. If anything, I think we should all start to mentally substitute the phrase "damage caused by cyber-criminals" with "damage caused by irresponsible software developers".

If you happen to be a "computer security professional" (scare quotes intentional), then perhaps you can educate me as to why anti-virus software and firewalls and chasing endless exploits through ad-hoc "penetration testing" is still regarded as the primary activity of the "security community", rather than actually making serious efforts towards preventing security defects in the first place. I have my own pet theory (spoiler alert: $$$), but I'd be interested to hear more informed opinions.

--

As a side note, I can now announce that I've accepted a PhD position in the PLS group at the IT University of Copenhagen. I'm currently looking for a place for my partner and me to live in Copenhagen from the start of November, so if anyone from Denmark reading this has any leads, please let me know!