Adventures in Programming Languages and Semantics
I write lots of words about lots of things. May or may not contain tautological statements.
Sunday, 21 October 2012
The average value of an idea is (almost) zero
Wednesday, 13 April 2011
NZ Copyright Amendment Letter
Dear Mr Bennett,
I am writing to you as a constituent to express my deep concern about both the substance of the Copyright (Infringing File Sharing) Amendment currently before the House, and the procedure by which this legislation has reached its current status.
The use of Parliamentary urgency, invoked ostensibly to assist with the aftermath of the Canterbury earthquake, to advance this controversial and flawed legislation is deeply disturbing to me. It reflects a perversion of a democratic process in which due time and consideration are given to the will of the people. This attempt to rush through the House legislation that has no particular time pressure attached to it reflects poorly upon the dignity and legitimacy of our democracy.
Furthermore, the tone and substance of the debate in the House suggested that the majority of MPs are not well informed about the complex technical, legal and social implications of this legislation, particularly the "disconnection clause", which gives an extra-judicial remedy to intellectual property owners without requiring them to meet the usual burden of proof demanded in any other civil action. Given the fundamental place of the internet in New Zealand business, education and personal life, a remedy that may be applied on the basis of a mere accusation seems disproportionate, and set to disadvantage both the businesses that provide internet services within New Zealand and the consumers who rely on those services for so many facets of daily life.
As a doctoral student in Computer Science, I am perhaps more troubled than the members of Parliament currently advancing this legislation by the many technical and social issues that it leaves unaddressed. I (and many others within the internet community in New Zealand) feel that this legislation, while addressing a perceived problem, has not been adequately discussed and refined to the point where it could be effective in addressing the purported issues. This is another reason why attempting to enact this legislation so quickly is inappropriate.
I hope that you will take my genuine concerns in the spirit in which they are given, and that you can be persuaded to oppose both the process and the substance of this legislation until such time as its full implications can be considered and discussed by the public at large, and effective, fair legislation can be drafted that will encourage the development of healthy creative industries and a healthy internet industry within New Zealand.
Yours sincerely,
Gian Perrone, B.Sc (Hons), M.Sc (Hons)
As an aside, be warned that if this legislation passes, anyone reading this letter on this blog is probably guilty of *gasp* file sharing!
Sunday, 1 August 2010
"It's a hardware problem" is a programmer problem
- Time
- Performance
- Concurrency
Friday, 26 February 2010
What are the semantics of your favourite programming language?
- Learn the syntax of one language. Do some basic coding. High rate of errors.
- Develop intuitions around semantics of that language. Rate of errors decreases.
- Learn the syntax of another language. Do some basic coding. Rate of errors slightly less than after learning the syntax of the first language.
- Develop intuitions around semantics of new language, relating these intuitions to the intuitions about the semantics of the first language. Rate of errors decreases.
Tuesday, 23 February 2010
The state of the world (and why I still think it's a bad idea)
Monday, 7 December 2009
By Request: My thoughts on Perl 6
Bad idea:
multi unique ($x, *@xs) { member($x, |@xs) ?? unique(|@xs) !! ($x, unique(|@xs)) }
Monday, 5 October 2009
The bug-free myth: Why we trust verified software
The Bug-free myth
All too often, I hear the phrase "free of bugs" used in conjunction with theorem proving or formal verification of software. I want to make perfectly clear that this is not what theorem proving claims. You'll often hear phrases like "guaranteed correct"; this is an (informal) shorthand for "guaranteed correct with respect to some specification". Allow me to make the distinction clear with an example.
Here is a program:

x = 0;
while (1) {
    x = x + 1;
}

This program isn't very useful. It is not, however, incorrect (nor is it "correct") without reference to some specification. If our specification is simply "keep incrementing a variable until the universe ends", this program is correct. If we were expecting this program to calculate something useful, or produce some output, then it is obviously incorrect. The "correctness" judgement always follows from a specification, and no judgement can be made in the absence of one.
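To make "correct with respect to some specification" concrete, here are two candidate specifications for the program above, written in linear temporal logic (an illustrative encoding of my own, not taken from any particular tool; $\Box$ reads "always" and $\Diamond$ reads "eventually"):

    $S_1$: $\Box\,(x \geq 0)$           (the value of x is always non-negative)
    $S_2$: $\Diamond\,\mathit{output}$  (the program eventually produces output)

The program satisfies $S_1$, since x starts at zero and only ever increases, so it is correct with respect to $S_1$. It never produces any output, so it is incorrect with respect to $S_2$. Same program, two different verdicts; the difference lies entirely in the specification.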
This definition covers the (informal) usage of "correctness", but this is still distinct from "bug-free". The main activity within formal verification of software is generally the verification of specific safety and liveness properties.
"Safety" is approximately the guarantee that some bad thing can never happen. It might be that a certain storage location can never overflow, or that a value is always positive, or that a pointer is not null, or even much more concrete functional things, like "the autopilot will not switch off the engines while the plane is in flight". Keep in mind that doing nothing is always "safe". It's impossible for a storage location to overflow if the program never does anything. A program that simply halts immediately will fulfil almost any safety property you care to dream up, so we can re-state the definition as "if anything happens, it will be something desirable".
This is where liveness comes in. This is approximately the guarantee that "some desirable thing will happen (eventually)". These might be properties like "the doors will open", or "the value stored in X will be incremented".
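Schematically, in the same temporal notation as above, the two kinds of properties look like this:

    safety:   $\Box\,\neg\mathit{bad}$    ("at no point does the bad thing happen")
    liveness: $\Diamond\,\mathit{good}$   ("at some point, the good thing happens")

The program that halts immediately satisfies every property of the first shape vacuously, and almost none of the second, which is why a useful verification effort nearly always needs both kinds.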
When people talk about software being verified, they really mean that a set of safety and liveness properties has been shown to be true of the software in question. When undertaking a verification activity, we choose a set of safety and liveness properties that are of interest or importance, and we prove those. A property like "there are no bugs" is completely useless (and unprovable); however, for some safety property X, we can at least say "there are no bugs that would cause property X to become false".
One of the main ways we verify these safety and liveness properties is through the use of automated or interactive theorem provers. These are pieces of software that permit the construction of mathematical proofs, both by checking the proofs and by providing some assistance to a human in searching for a suitable proof that some property holds. Unfortunately, as soon as you mention theorem provers and all the wonderful things they do, this usually leads to a new objection.
But what if the theorem prover has bugs in it?
OK, it seems a natural enough objection, but you don't think we're completely dumb, do you? We are, after all, in the business of proving software correct; you can assume that we have some reason to believe the software we ourselves use is correct.
But actually, correctness within theorem provers is less important than you might think. I said earlier that a theorem prover has two basic functions - it acts as a proof checker, and it provides tools to aid in coming up with a suitable proof of a property.
The important part of this is that only the proof checker needs to be correct. The other parts of the theorem prover could just generate random garbage proofs, and the proof checker would reject them as incorrect proofs of the property under consideration. In practice, the techniques used in proof search have some very good heuristics about the chance that they will succeed in finding a valid proof, and the time it will take to find one, but these are just for the benefit of their users. These proof-search aids do not simply return "true" if they find a valid proof - they must instead return the proof itself, which the proof checker can then verify.
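To illustrate the division of labour, here is a minimal sketch in Haskell of a trusted checker for a tiny implication-only logic (a toy encoding of my own, not the internals of any real prover). Everything outside check - search, heuristics, tactics - can be arbitrarily buggy, because success is only ever declared by check itself:

-- Formulas of a minimal propositional logic: variables and implication.
data Formula = Var String
             | Imp Formula Formula
             deriving (Eq, Show)

-- Proof objects: instances of the K and S axiom schemes, plus modus ponens.
data Proof = AxK Formula Formula          -- proves  a -> (b -> a)
           | AxS Formula Formula Formula  -- proves  (a -> (b -> c)) -> ((a -> b) -> (a -> c))
           | MP  Proof Proof              -- from proofs of (a -> b) and a, conclude b
           deriving Show

-- The trusted core: compute which formula (if any) a proof object establishes.
check :: Proof -> Maybe Formula
check (AxK a b)   = Just (Imp a (Imp b a))
check (AxS a b c) = Just (Imp (Imp a (Imp b c)) (Imp (Imp a b) (Imp a c)))
check (MP pf pa)  = do
  Imp a b <- check pf                  -- the first premise must prove an implication ...
  a'      <- check pa
  if a == a' then Just b else Nothing  -- ... whose antecedent the second premise proves

A proof-search procedure, however clever, can only succeed by handing back a Proof value that check accepts. For example, with p = Var "A", check (MP (MP (AxS p (Imp p p) p) (AxK p (Imp p p))) (AxK p p)) returns Just (Imp (Var "A") (Var "A")) - a verified proof of A -> A.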
So, hopefully you have been suitably convinced that most of the theorem prover does not need to be correct.
So how do you know that the proof checker is correct?
I'm glad you asked. This is going to require a quick whirlwind tour of how mathematical logic works.
Theorem provers like Isabelle and Coq have a wonderfully rich set of constructs built into them - numbers, arithmetic, records, functions - and the list goes on. The important fact about all of these things is that they can be constructed from a very small set of much simpler constructs.
There is a class of logics called Hilbert systems that are built using a very small number of axioms and inference rules, and which permit you to build up most of the familiar constructs from logic and mathematics in a systematic way. Similarly, Zermelo–Fraenkel set theory also allows one to build up almost all of mathematics in a systematic way. This is old magic, and it is pretty well understood.
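For a flavour of just how small such a core can be: the implicational fragment of one standard Hilbert system needs only two axiom schemes and one inference rule (the common K/S presentation, reproduced here purely for illustration):

    (K)  $A \rightarrow (B \rightarrow A)$
    (S)  $(A \rightarrow (B \rightarrow C)) \rightarrow ((A \rightarrow B) \rightarrow (A \rightarrow C))$
    (MP) from $A \rightarrow B$ and $A$, infer $B$

Every theorem of that fragment is just instances of (K) and (S) chained together with (MP).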
Because I'm writing this for a general audience, I don't feel the need to scrape the depths of my own knowledge to come up with the specifics of the logical foundations used within specific theorem provers. The general idea is this: there have been many attempts to formalise all of mathematics from a very small set of core constructs, and many of them have been successful enough to give us a formalised foundation for the types of mathematics most of us care about. If you understand that, you already know how a theorem prover works.
So once we've encoded this very small core manually, we can build all the rest of our theorem prover's theories on top of it. We can check that we are doing this right, either by hand or using various automated proof-checking techniques.
The small core inside the theorem prover is all that is really important now, in terms of correctness. This can be verified by hand, and it can also be verified using other theorem provers.
But even this doesn't really need to be correct. When we use a theorem prover to aid us in proving a property, we end up with a proof script - some machine-readable version of the proof. This doesn't necessarily rely upon the software that was used to help create it (although decoupling the two might get messy). The proof can be verified using any other piece of software, or it can be verified by hand without any creativity - you just have to mechanically apply each step of the proof and satisfy yourself that it follows from the previous steps by a valid rule of inference.
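To see what "no creativity required" means, here is the classic Hilbert-style derivation of $A \rightarrow A$ from the K and S schemes above. Checking it amounts to pattern-matching each line against an axiom scheme or against two earlier lines:

    1. $(A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A))$   [S, with $B := A \to A$, $C := A$]
    2. $A \to ((A \to A) \to A)$                                           [K, with $B := A \to A$]
    3. $(A \to (A \to A)) \to (A \to A)$                                   [MP on 1, 2]
    4. $A \to (A \to A)$                                                   [K, with $B := A$]
    5. $A \to A$                                                           [MP on 3, 4]

No step requires insight; a machine (or a patient human) can confirm each one by substitution alone.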
The final important property of theorem proving software is that we can always look for consistency. If we get anything wrong anywhere along the way, proofs are going to start failing. It is already true that there are some valid statements that cannot be proved within the logic in question, but this isn't really a problem as far as the correctness of the theorem prover is concerned. We would only be concerned if there were false statements that could somehow be shown to be true inside our logical system - and this is not the case! Any purported proof of a contradiction must contain an invalid step of inference somewhere, and mechanical proof checking will reject it.
But what if the proof has bugs in it?
Hopefully you are now satisfied that we can have a very high degree of confidence in the results we get out of a theorem prover. A proof is not like a "program" in the sense that it cannot take valid steps of reasoning and still "get the wrong answer". The only place where real "bugs" can creep in is on the human side of things, where we may choose the wrong safety or liveness properties - properties that do not actually capture what we are interested in.
There are several strategies that we can use here. The first is syntax and type checking on the specification itself. Given that we are expressing our safety properties inside a theorem prover using something that is essentially a programming language, we can use the same techniques that have been shown to be very useful in enforcing consistency and detecting errors in traditional programming languages. This should remove the possibility that a simple typo could be totally catastrophic.
The second is enforcing a discipline of re-stating the same properties in different ways. By doing this, we ensure that either a proof of one of the properties will fail, or we will introduce an inconsistency between our encodings of the properties, which will be rejected by the theorem prover.
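As a small illustration of this discipline (the sortedness example and the names here are mine, purely for exposition), we might state "the output is sorted" in two independent ways in Haskell and then prove that the two formulations agree. A typo in either encoding is likely to make the agreement theorem fail, and the type checker has already ruled out a whole class of cheaper mistakes:

import Data.List (sort)

-- First formulation: every adjacent pair of elements is in order.
sortedPairwise :: [Int] -> Bool
sortedPairwise xs = and (zipWith (<=) xs (drop 1 xs))

-- Second, independent formulation: the list is a fixed point of a
-- reference sorting function.
sortedByReference :: [Int] -> Bool
sortedByReference xs = xs == sort xs

-- The discipline is then to prove that the encodings agree:
--   for all xs, sortedPairwise xs == sortedByReference xs
-- so that a mistake in one encoding surfaces as a failed proof rather
-- than as false confidence in the verified result.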
Finally, and although I've previously argued against this advice in other settings, be careful. This is very different from the suggestion that you just "be careful" when constructing a program in a traditional language: here the care is focused on a small set of safety and liveness properties, not on every line of the program. You can only guarantee that your program is correct with respect to the specification that you have written down, and falsely claiming confidence in your software where none exists would be just as catastrophic in any environment. It is worth saying that we are now way off into the land of the abstract. Even with a complex specification, it would be very difficult to encode things in such a way that you succeed in proving a whole set of properties, none of which provides the guarantees you think you are making - if what you think you are proving diverges from what you are actually proving, the proof is likely to fail. This is a much stronger safeguard against human error than is provided by any traditional programming environment available today.
But what if the compiler is incorrect?
There are two basic counter-arguments to this. In proving properties about software, you are either working with an idealised model of the software encoded into the theorem prover, assuming some particular semantics for the target language and/or the underlying hardware, or you are actually writing the software inside the theorem prover and generating code once you have proven all the properties of interest. In either case you are somewhat at the mercy of the compiler - but this is always the case: any correct program can be made incorrect by faults in the compiler or the hardware. You can take two approaches here - assume that the hardware and compiler work as expected (accepting that your guarantees of correctness are conditional on that assumption), or make use of the recent work on certified compilation (such as the CompCert verified C compiler) to at least remove the uncertainty of the compiler from the equation. There is a whole separate body of work on (hardware) fault tolerance in software; if you are really operating at that level of safety, you probably need to be an expert on those issues anyway. Theorem provers for software verification are concerned with the verification of certain properties; if validating your assumptions about the hardware and software environment is important, then you will need to do more work.
Is it worth it?
For some things, yes. This kind of formal software verification is hard work: it's very labour-intensive, and it requires a great deal of effort. That said, it is getting easier as the tools get better, and the barrier to entry is getting lower. I don't think it will be very long before we start seeing a lot more in the way of light-weight theorem proving taking place as a standard part of compilation. In a world where software failure costs huge amounts of money, it doesn't seem so remote a possibility that it will become practical and cost-efficient to apply formal verification to more software.
--
Personal plea: I'm still looking for reasonably-priced accommodation for my partner and me in Copenhagen, if anyone has any leads. I'm eager to go and start my PhD at ITU, but need to find somewhere to live first.