Friday 26 February 2010

What are the semantics of your favourite programming language?

"Focusing purely on syntax is like reading Shakespeare and then commenting on the excellent quality of the spelling" -- David Streader

I came across a recent post about the importance of semantics on the Abstract Heresies blog. While very short, it summed up my general feelings about programmers' attitudes towards things like semantics. I get the feeling that, for many, learning goes something like this:
  • Learn the syntax of one language. Do some basic coding. High rate of errors.
  • Develop intuitions around semantics of that language. Rate of errors decreases.
  • Learn the syntax of another language. Do some basic coding. Rate of errors slightly lower than after learning the syntax of the first language.
  • Develop intuitions around semantics of new language, relating these intuitions to the intuitions about the semantics of the first language. Rate of errors decreases.
And so on.

I should take this opportunity to define what I mean by "semantics" (please excuse the self-referential nature of the last statement and this one). In the world of languages (formal or informal), there is a distinction between the concrete representation of something and what it actually means. In programming languages, this is essentially the difference between the program text that you feed into a compiler or interpreter and the behaviour of the program once it is run.

The typical approaches to defining semantics fall into two general categories. In operational semantics, one defines a kind of "abstract machine" and specifies how its state is modified upon encountering each construct. In denotational semantics, constructs in the language are mapped onto other structures that give them meaning (these are usually various kinds of mathematical structures).

In human language, this syntax/semantics distinction could be demonstrated by the difference between the sounds that a person makes while speaking ("syntax") and the understanding that one takes away from those words ("semantics").
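To make the operational/denotational distinction a little more concrete, here is a minimal Python sketch. The toy language (integer literals and addition only) and the names `Num`, `Add`, `step`, `run` and `denote` are all made up for illustration; real semantic definitions are far richer than this.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy language: integer literals and addition, nothing else.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


def step(e: Expr) -> Expr:
    """Operational view: perform exactly one reduction step."""
    if isinstance(e, Add):
        if not isinstance(e.left, Num):
            # Reduce the left operand first.
            return Add(step(e.left), e.right)
        if not isinstance(e.right, Num):
            # Left is a literal, so reduce the right operand.
            return Add(e.left, step(e.right))
        # Both operands are literals: perform the addition.
        return Num(e.left.value + e.right.value)
    raise ValueError("a literal cannot be reduced further")


def run(e: Expr) -> Expr:
    """Drive the 'abstract machine' until only a literal remains."""
    while not isinstance(e, Num):
        e = step(e)
    return e


def denote(e: Expr) -> int:
    """Denotational view: map each construct directly onto a Python int."""
    if isinstance(e, Num):
        return e.value
    return denote(e.left) + denote(e.right)


program = Add(Num(1), Add(Num(2), Num(3)))
```

Here `run(program)` and `denote(program)` describe the same meaning in two different styles: the first by saying how the program text is rewritten step by step, the second by saying which integer it stands for.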

Fundamentally, most errors in programs come from a disconnect between a programmer's understanding of the semantics of some system and the actual semantics of that system. This could mean misunderstanding the effect of some statement in a programming language, or misunderstanding the effect of some statement in a specification (formal or informal).

Consequently, I really think that it is important to emphasise the semantics of programming languages. It seems like too much emphasis is placed on syntax. Syntax is largely uninteresting: it differs in only trivial ways, even between languages that are radically different.

Suppose I write down this definition:

function mystery 0 = 0
| mystery x = 1 / x

We can figure out what this function does based on our understanding of function application and of division. That's not "real" syntax for any particular language. If we were to concretise it in a language, it would probably look similar in most of them, but we would very quickly begin to encounter semantic distinctions. Some languages will round the result, some will truncate it. Some will keep exact or arbitrary-precision results, others will use floats or doubles. I would bet that with 5 different languages picked at random, you could get 5 different results.
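You don't even need five languages to see this. As a rough sketch, here are four plausible readings of the definition written in Python 3. The function names are hypothetical, and each variant just picks one of the numeric behaviours mentioned above.

```python
from fractions import Fraction


def mystery_float(x):
    # Python 3's "/" is true division and yields a binary float.
    return 0 if x == 0 else 1 / x


def mystery_floor(x):
    # "//" rounds the quotient towards negative infinity.
    return 0 if x == 0 else 1 // x


def mystery_trunc(x):
    # int() discards the fractional part, i.e. rounds towards zero,
    # which is how integer division behaves in C99 and later.
    return 0 if x == 0 else int(1 / x)


def mystery_exact(x):
    # Fraction keeps the result as an exact rational number.
    return 0 if x == 0 else Fraction(1, x)
```

For an argument of -3, the floor variant gives -1, the truncating variant gives 0, the float variant gives an approximation of minus one third, and the exact variant gives `Fraction(-1, 3)`. Same program text, four different meanings.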

These are relatively minor distinctions. How about this one?

function rec x = rec (x + 1)

It's obviously non-terminating, but what exactly does non-termination mean? Some languages will overflow the stack or exhaust some kind of memory limit within a finite number of applications of the function. Some languages may overflow their underlying numerical representation. Some might keep computing until they exhaust main memory. And some languages will never compute this function at all (because the effect is unobservable, so it'll be optimised away).
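As one data point, here is what happens in CPython, sketched under the assumption of the default recursion limit. The helper name `outcome` is made up for illustration.

```python
def rec(x):
    # CPython does not eliminate tail calls, so every call adds a frame.
    return rec(x + 1)


def outcome():
    """Report how this 'non-terminating' function actually ends in CPython."""
    try:
        rec(0)
    except RecursionError as err:
        # The interpreter gives up once its recursion limit is reached.
        return type(err).__name__
```

In this implementation the function "terminates" with a `RecursionError` after a bounded number of calls. The integer argument never overflows, because Python integers are arbitrary precision, so it is the call stack limit that decides the outcome rather than the numeric representation.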

The effect starts getting even more pronounced when you start talking about objects and concurrency. Are your messages blocking or non-blocking? Is transmission guaranteed? How about ordering? Are there implied synchronisation points or locks? What does it mean for an object in one thread to hold a reference to an object in another thread, and call methods on that object?
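Even within a single library, the answers to these questions are semantic choices that the syntax does not reveal. A small sketch using Python's standard `queue.Queue` (the one-slot `mailbox` and the helper names `try_send` and `try_receive` are hypothetical):

```python
import queue

# Hypothetical one-slot mailbox. queue.Queue delivers items in FIFO order.
mailbox = queue.Queue(maxsize=1)


def try_send(msg):
    """Non-blocking send: report failure instead of waiting for space."""
    try:
        mailbox.put_nowait(msg)
        return True
    except queue.Full:
        return False


def try_receive():
    """Non-blocking receive: return None instead of waiting for a message."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return None
```

Swapping `put_nowait` and `get_nowait` for the default `put` and `get` turns both operations into blocking ones: a sender would wait for a free slot and a receiver would wait for a message. The calling code would look almost identical, but the behaviour of the program would be quite different.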

While I'm an advocate of formal semantics, I'm a realist. I understand that a fully formal treatment would not necessarily enrich the lives of all programmers. But c'mon guys, how about even an informal specification of the semantics of some of these languages that are in common use? Programmer experimentation gets you some of the way, but nothing really beats a rigorous treatment to shine a light in the dark corners of languages where errors hide.

4 comments:

  1. I like the post, but what's with the quote at the top? Syntax is a huge part of each language, while the "spelling" in Shakespeare's plays really doesn't matter, ever.

  2. I think most (successful) languages do have an informal specification. Usually, it's in the form of a tutorial or reference implementation.

  3. Kurtis, my point is that syntax is relatively insignificant compared to the semantics. To focus on the syntax rather than the semantics of programming languages is like focusing on the trivialities (such as spelling) in Shakespeare and not enjoying the plays!

  4. j_baker: True, and that's a trend which needs to be extended further, I believe.
