Monday 7 September 2009

Why programming language design is hard (and a few ways it can be made easier)

When designing a programming language and accompanying implementation, you are forced to keep thinking at three different levels - the way that the implementation works, the way that people will use it, and the requirements that influence design decisions.

On the face of it, this is no different to any other software development activity. In building a stock control system or any other common piece of software, I would be forced to think about the way it will be used (e.g. how users will enter items and how the people in the distribution centre will scan items as they leave), the implementation (e.g. what classes/functions/methods implement this functionality) and the requirements influencing the design (e.g. Mr. Big Boss of International Corporation really wants the software to be beige).

The difference between programming languages and many other types of software is that it is difficult to relate your requirements and design decisions back to the real world. When designing a stock control system, a stock item is designed to represent a real item in a real warehouse. The behaviour of the system attempts to reflect real processes carried out by real people. You can go down and meet the people who dispatch the items if you want, and ask them all about their jobs. Having these two very concrete things at your disposal makes implementation correspondingly easier, because you have a "real world" system against which to validate your software system.

Programming language design lacks analogous constructs. The "behaviour" of the programming language basically corresponds to the semantics of the language, which you're probably still trying to nail down (it's unlikely that you have a written semantics, right?). The way the users are going to make use of your programming language depends on what the semantics are. While you can contrive examples and try to implement them in the language, you lack perspective, because the semantics that you have implemented are exactly the way they are because they correspond to your understanding of the semantics. You're probably the person worst-equipped to make judgements about the usability of your own creation. This removes our two best tools for guiding an implementation: the ability to predict users' behaviour with respect to the system, and concrete things against which to verify our assumptions.

A few solutions

1) Get feedback early and often - this is important in all software development, but if you are designing a new programming language, you need to be talking to people all the time. While some people you talk to will raise objections on some sort of dogmatic ideological basis ("I don't like static type systems, and you have a static type system, so the entire language is junk"), explaining the semantics to other people and gauging how much effort it requires for them to get a sense of the language will be very helpful in guiding future decisions.

2) Know why you're designing a new programming language - novelty or fun are perfectly valid reasons for doing PL design, but never lose sight of the fact that almost nobody will use a new programming language unless it does something better/easier/faster than they can already do it in one of the 10 trillion programming languages in the wild today. Think about the kinds of programs that you want the language to be able to express easily, and what kinds of programs will be more difficult to express. You might be aiming to create a general-purpose programming language, but that doesn't mean it's going to be good at absolutely everything. If you do claim to be designing a general-purpose language, here's a good thing to keep in mind at every point during your development: "How would I feel about writing this compiler in my new language?"

3) Write things down, often - Your ideas will probably change the more concrete your design gets. As you start being able to interact with your design (because you have a partial implementation of an interpreter or compiler), you will probably change your mind. Writing down a description (formal or informal) of your syntax and semantics will help keep you from completely contradicting your own stated goals. If (as in point 2) you decided that you were designing a language that would be good at expressing programs of type X, and then you subsequently make some changes that make it very difficult to express those types of programs, then what was the point of your language again? Keep referring to point 1.

4) Write tests - however sceptical or enthusiastic you may be about test-driven development in other endeavours, I can't stress enough how useful it is for programming language design. At the very least, keep a directory full of the test programs you write and automate some means of running these. If you can write tests down to the statement level, do that. You can never have too many tests. As I said in point 3, your design is likely to change. You need to be bold in refactoring and redesigning, and without tests your ability to change fundamental design decisions will suffer: you'll end up building "hacks" into your syntax or semantics just because you really aren't sure if changing some assumption or construct will break everything. A test suite has the added benefit of providing you with a whole lot of sample code when you come to write documentation.

5) Think about the theory - Don't be afraid of theory. Theory is your friend. A lot of people have published a lot of papers on many facets of programming language design. It's worth learning enough about the theory of programming languages to enable you to read some of this published output. If you're stumbling around in the dark trying to figure out a way to statically enforce some property, looking for a more efficient register-allocation algorithm for code generation, or just trying to figure out why your parser generator is complaining about your grammar, chances are that somebody has solved this problem before and has published a solution. You'll find it much easier to relate the constructs in your new language to other existing languages if you don't reinvent the terminology wheel too. I can't recommend strongly enough Types and Programming Languages by Benjamin C. Pierce as an excellent starting point for all things theoretical in the programming languages world.

6) Consider using ML - I wouldn't be the first to note that ML is really good for writing compilers. Obviously, your circumstances may be different to mine, but having written compilers and interpreters in a few different languages, I haven't found anything quite as good as ML at expressing the kinds of parsing, validation and translation activities that are integral to compiler implementation. The type system helps prevent the majority of errors that you are likely to encounter while writing compilers. Recursive data types express most internal compiler structures perfectly, and the presence of mutable storage and imperative I/O means that you don't need to jump through too many "functional programming" hoops in order to achieve some of the things for which those features are useful. Also, the very existence of Modern Compiler Implementation in ML by Andrew Appel (which I consider to be the best introductory text on compilers to date) should make you at least think about having a look before you decide on an implementation language for your next compiler.

7) Don't be afraid of being the same - There is a tendency towards ensuring that your syntax and semantics are different from everything else in existence. If language X has a feature that is similar to yours, do you really need to invent new syntax and terminology? If something exists elsewhere, it's probably worth your time to figure out how it works and how it is expressed, and then re-implement it in a way that would be familiar to users of language X. If you are actually interested in having people adopt your language, then you should reduce the learning curve as much as possible by only inventing new terminology and syntax where it is appropriate to do so (i.e., because it is either a novel feature, or there is some improvement that can be made by doing it differently). If you're creating a new language, you've already established that there are things you can do differently or better, so it's not necessary to go overboard in differentiating yourself. If it were, PHP would never have become popular in the presence of Perl. Clojure and Scheme would never have survived the disadvantage of Lisp already existing, and so on.
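As a rough illustration of point 4, here is a minimal sketch in Python of the "directory full of test programs" idea. The file layout (`*.src` programs sitting beside `*.expected` outputs), the `run_suite` function and the `toy_interpret` stand-in are all assumptions made up for this example rather than part of any real tool; you would pass in whatever actually runs your own language.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def run_suite(test_dir: str, interpret: Callable[[str], str]) -> List[Tuple[str, bool]]:
    """Run every *.src program in test_dir and compare it with its *.expected file.

    `interpret` is a placeholder for your language implementation: it takes
    program text and returns whatever that program printed.
    """
    results = []
    for program in sorted(Path(test_dir).glob("*.src")):
        expected_file = program.with_suffix(".expected")
        if not expected_file.exists():
            # A program with no recorded expectation counts as a failure,
            # so forgotten tests get noticed instead of silently skipped.
            results.append((program.name, False))
            continue
        try:
            actual = interpret(program.read_text())
        except Exception:
            # A crash in the implementation is a failed test, not a dead test run.
            results.append((program.name, False))
            continue
        passed = actual.strip() == expected_file.read_text().strip()
        results.append((program.name, passed))
    return results


def toy_interpret(source: str) -> str:
    """Hypothetical stand-in 'language': programs are sums such as '1 + 2'."""
    return str(sum(int(term) for term in source.split("+")))
```

Calling `run_suite("tests", toy_interpret)` returns a list of (file name, passed) pairs, one per test program. Comparing whole-program output is deliberately crude, but it is cheap enough to run after every change to the syntax or semantics, which is what makes bold redesigns less frightening.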

Having laid out all of these points, I'm certainly guilty of not taking my own medicine at times. I've largely written this out because I'm currently going through a design and implementation process with Lope, and I don't want to repeat my past mistakes!

9 comments:

  1. Thanks!!! was very informative!! :)
    http://twitter.com/iankits

  2. Ok, I will use the role of a critical thinker.

    What you have left out is, in my opinion, that creating AND maintaining good programming languages is a tedious, difficult task. It requires that people have the knowledge to design such a language, and work out when problems arise.

    How many people can do so? It is way too much work to create a successful language and keep driving it. The ultimate reason why many languages suck is
    a) the implementation sucks
    b) the improvements are slow and often not worth it
    c) different goals lead to completely different languages

    I guess Java simply never had the same focus - and never will - as Python, but Scala and Groovy try to fit into the niche of Python (ease of creating something in the language quickly).

    The ultimate programming language would be one that allows a user as much freedom as possible. Sounds complicated, but where is the problem if people could program in a way that fits their style? Why not prose programming that could be used to empower an OS?

    Data will always be Data. It's just that some people disagree whether to store them in a database or in XML files....

  3. PS: I'd love to edit my own comment to correct some mistakes, on some other blogs this is possible for like 15 minutes before the version is "final" ... :(
    ah well.

  4. Yeah, I'm not sure why Blogger prevents that. I'll see if there is a setting for it :)

  5. "The ultimate programming language would be one that allows a user as much freedom as possible. Sounds complicated, but where is the problem if people could program in a way that fits their style?"

    I don't agree with that. Languages that provide a lot of freedom are generally associated with wildly incorrect programs. I'm much more in favour of languages that more closely match the domains they are applied to, in which the compiler can make stronger judgements about correctness.

  6. "How would I feel about writing this compiler in my new language?"
    That's a terrible metric! Compilers are a niche market for a language. Most users of your language are going to want to be building some kind of business application which has very different requirements. If you follow the 'compiler-driven' approach you end up with a language optimised for writing compilers - I'd suggest that's one of the problems with Scala.

  7. That was referring specifically to languages claiming to be general-purpose. If the idea of writing your own compiler in your language fills you with dread, then it's probably not really general-purpose. If you like though, you can mentally replace "compiler" with "a program equivalent in complexity to a compiler".

  8. This is exactly what I was searching for. Thanks for sharing Gian.

  9. And one more to add -- if you have not already seen them -- exciting languages by Wouter van Oortmerssen: http://strlen.com/language-design-overview (especially Aardappel)
