Tuesday, 11 August 2009

If concurrency is easy, you're either absurdly smart or you're kidding yourself

Concurrency is hard. I like to think I'm a pretty smart guy. I'm definitely a very good programmer, and I've had plenty of theoretical and practical experience with concurrency. I still find concurrency very hard.

Concurrency is viewed as black magic, even by programmers who use it on a daily basis. When people tell me that concurrency is easy, it generally means one of two things: either they are super-humanly smart, or the way they are using concurrency is restricted to nice simple patterns of interaction to which we can apply some simple mental analogy. Thread pools are fairly easy to think about, for example.

I'm talking about the kind of eye-wateringly difficult concurrency that arises when you are trying to extract some sort of task-level parallelism, or when you are working with loosely-coupled distributed systems (read "the general case"). The difficulty of concurrency is usually mentioned in the same breath as a list of common defects (race conditions, deadlocks, livelocks, and so on); however, I think it is important to remember that these things are all just outcomes that arise as a result of the difficulty, in the same way that hitting the ground is the probable outcome of falling off a building.

The possibility for a programmer to introduce problems such as deadlocks and race conditions is not what makes concurrency difficult. It's our total inability to mentally reason about concurrency that makes it difficult.

Most programmers above a certain level are very good at conceptualising large, complex systems. They can interrogate perceived weaknesses in a program before it is even written. This is how good programmers manage to write programs that are largely free of defects and that generally work in the way that is expected. Small-scale errors aside, a good high-level conceptual understanding of the system, coupled with an ability to mentally simulate the behaviour of any portion of a program, provides programmers with most of the tools they need to construct good-quality programs. The programmer is likely to know (at least at some high level) how the entire system is going to work at any given moment.

Large-scale concurrency robs us of this ability to easily mentally simulate behaviour. It makes our mental models unimaginably complex. When it comes to concurrency, most of the intuitions we use to guide us through development simply break down. I think of it as a sort of night-blindness.

So what is different? Humans are very good at causality. When presented with the present state of the universe, plus some action, we can generally predict with a high degree of accuracy what the state of the universe will be after that action has been performed. We can integrate huge numbers of factors into our calculation of the future state of the universe - gravity, wind, light, sound, heat. If we start rolling a ball down a slope, we can imagine at any point in time approximately where that ball is going to be. What we don't expect, or even attempt to predict, is an inter-dimensional portal opening and an alternative-reality version of ourselves stepping out of the void to set the ball on fire. And that's kinda concurrency for you. We are good at imagining ourselves in a situation. We are the observer in the code. Unfortunately, in a concurrent situation, we are the observer in many different "realities".

Even with the knowledge that there is some algorithmic nature to the opening of dimensional portals during our ball-rolling activities, and that the alternative-reality observer's pyromaniac tendencies can be moderated by holding up a certain flag, our mental models of the universe just don't extend to predicting this kind of behaviour. If I were inclined towards evolutionary psychology, I would probably suggest that it's because, as a species, we don't encounter these kinds of situations in our natural habitats very often. We can predict cause-and-effect, but the idea of having to maintain a model of a second universe with its own rules as well as our own is just beyond the needs of most early humans, who were probably much more concerned with procreating and trying to avoid being eaten by tigers.

It's not just the breakdown of causality that is a problem - it's also syntax. I know I claimed I wasn't going to talk about syntax just two short posts ago, but it's kinda important here.

When we think about some function "f", we can conceptualise it as something like:

(S,x) ---(some computation)---> (S',y)

Where "S" represents the state of the world before the computation occurs, "x" is some input, "S'" is the state of the world after the computation has occurred, and "y" is some output. Given that most programming languages give us the ability to control scope, "S" is probably pretty small. We usually don't even bother writing it down, leading to the more comfortable:

f(x) = (some computation)

With an implied world-state being accessible to us before and after the computation. Causality applies and all is well with the world. We can mentally substitute any occurrence of "f(x)" with the behaviour of the computation.
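To make that concrete, here is a minimal sketch in Python (my own illustration; the function names are made up) of why the implicit "S" is safe to ignore for a pure function, and why it stops being ignorable the moment the computation touches the world:

```python
# Pure case: f reads nothing but x and writes nothing but its result,
# so S = S' and any occurrence of f(x) can be substituted by its value.
def f(x):
    return x * x + 1

assert f(3) == f(3) == 10  # substitution holds

# Explicitly state-threaded case, in the shape (S, x) ---> (S', y):
# the "world" here is a counter that every call advances.
def g(state, x):
    new_state = state + 1          # S' differs from S
    return new_state, x + state    # y depends on S, not just on x

s0 = 0
s1, y1 = g(s0, 10)
s2, y2 = g(s1, 10)
# Same x, different y: f(x)-style substitution no longer works.
assert (y1, y2) == (10, 11)
```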

Now, consider something concurrent. A given transition in a labelled transition system has one input state, one output state, plus a side-effect:

x ---a---> y

Assuming some interpretation of this LTS using name-based synchronisation, any other occurrence of "a" in the system is going to be causally related to this occurrence of "a". We're used to names just being names - they have no special meaning. Well, now "a" does have a special meaning. This is where it all starts to get a bit dicey. We no longer have all of the information we need in front of us. Causality has gone from being able to consider events locally within some frame of reference to needing to maintain a globally consistent mental model of all the ways that an "a" action could occur anywhere in the universe. Reading this description of behaviour in isolation is no longer sufficient to tell us the whole story.
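A rough sketch of this, using Python threads and a queue to stand in for a channel named "a" (an approximation of name-based synchronisation, not a faithful process-calculus encoding). The point is that the behaviour of `consumer` cannot be determined from its own text alone:

```python
import queue
import threading

a = queue.Queue()  # the "channel" named a
results = []

def consumer():
    # Locally this reads as "perform action a"; whether it ever
    # completes depends on every other occurrence of "a" in the system.
    results.append(a.get())

t = threading.Thread(target=consumer)
t.start()

# Remove the next line and consumer blocks forever - nothing in its
# own definition tells you that.
a.put("hello")
t.join(timeout=1)
assert results == ["hello"]
```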

This isn't just some problem with name-based synchronisation. If anything, I think name-based point-to-point communication is considerably simpler than most of what we do in practice, and yet we are already at a point where our ability to rely on local information to give us clues as to how this piece of the system will behave has been taken away.

Introducing locking makes things scarier still. We're now required to know the state of our own world at any given point in time, as well as the state of every other world at any given point in time. And the notions of time in each world are not the same.
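As a sketch of the usual defence (mine, not anything prescribed above): each thread below is correct in isolation, and whether the system deadlocks would otherwise depend on the relative timing of their lock acquisitions - state that lives in no single thread. Agreeing on one global lock order settles the question by convention instead of by reasoning about interleavings:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer_ab():
    with lock_a:        # global order: always a before b
        with lock_b:
            pass

def transfer_ba():
    # Written naively as "b then a" this could deadlock against
    # transfer_ab; respecting the global order makes it safe.
    with lock_a:
        with lock_b:
            pass

threads = [threading.Thread(target=fn) for fn in (transfer_ab, transfer_ba)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=1)
assert all(not t.is_alive() for t in threads)  # no deadlock
```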

I think it's a wonder that we get concurrency right as often as we do. I think it's as much by careful attention to detail and dumb luck as by good planning or being good at this type of thing.

Now, before you jump up and down and tell me that "insert-your-favourite-concurrency-model-here addresses these problems", I know there are strategies for managing the complexity of these tasks. I did concurrency theory for a living, so I know some approaches are better than others. My point here was essentially to illustrate why it is important to manage complexity when doing concurrency. It's definitely not just a case of "you can't do concurrency so that means you are stupid". I'm arguing that pretty much everyone is bad at concurrency beyond some simple patterns of interaction that we cling to because we can maintain a good mental model by using them.

In my next exciting installment, I'll talk about how I think we can improve the state of the art. Spoiler alert: it involves Bigraphs, and some ideas that I hope I might be able to turn into a viable commercial proposition one of these days.

Oh, also, I recall reading a really good paper about the problems with syntax for representing terms in process algebras a few years ago, and I can't for the life of me remember what it was. If anyone has any ideas what it might have been, let me know. I'll give you a cookie.

14 comments:

  1. You didn't mention Clojure. Or Erlang. Or Haskell. Or Mozart Oz. etc. Curious how these don't elegantly address what you're talking about?

  2. Oh, there are certainly some brilliant approaches that yield good results. I'll quote myself from above though:

    "Now, before you jump up and down and tell me that "insert-your-favourite-concurrency-model-here addresses these problems", I know there are strategies for managing the complexity of these tasks. I did concurrency theory for a living, so I know some approaches are better than others. My point here was essentially to illustrate why it is important to manage complexity when doing concurrency. It's definitely not just a case of "you can't do concurrency so that means you are stupid". I'm arguing that pretty much everyone is bad at concurrency beyond some simple patterns of interaction that we cling to because we can maintain a good mental model by using them."

  3. Didn't you just contradict yourself? The problem with concurrency is side effects. No side effects, no problem. If there's no mutable state then there are no race conditions and no need for locks on a 'critical section'. It's the same way GPUs work. Have you ever written HLSL? That's massive concurrency at its best. You should really look into VVVV (.org). They figured out using HLSL concurrency for graphics, as well as networked concurrency with boygrouping.

    Sure, concurrency is hard in C++ or Java; you just need a better tool.

  4. You're suggesting that the difficulty people have with concurrent programming is an innate limitation of the brain and our understanding of causality. Personally, I feel that it's more likely to be a consequence of the way people learn to program. Put another way, if we taught people how to program concurrently from the start, it might be possible to overcome this perceived limitation.

  5. VS, yes, that's exactly what I'm saying. I do wonder if learning concurrency first might make life easier, however, I present an interesting counter-example:

    A supervisor of mine came to programming via process algebras; that is, he learned concurrency first. He has been publishing papers on concurrency theory for the better part of 20 years, and has probably been programming for around half that time. He still remarks on the incredible difficulty of maintaining a consistent mental model of a system. It was my conversations with him on this topic that inspired me to write this post.

    So we're certainly versatile enough creatures to teach ourselves to do difficult things. I guess my point is just that the way the universe is structured isn't doing us any favours :)

  6. Andy, that's a class of problems that decomposes well into simple models of concurrency.

    You've basically taken what I've said in the abstract and you're making it very, very concrete before responding. Concurrency is difficult with just a pencil and a piece of paper! Don't worry about the tools or languages or technologies. Write down some terms in a process calculus. Without using any mechanical methods (i.e. without calculating the result), try to conceive of the entirety of the behaviour in that system. It's hard! There's no mutable state there - just some actions.

    So, the point I made in the comments was that methods that permit us to have a good mental model of the behaviour of a system are great, but just storing concurrent behaviours in your head in such a way that you can interrogate their behaviour with reasonable accuracy is hard! When we have metaphors which allow us to hide some of the complexity, it gets easier.

    Or maybe it's all easy for you, in which case, you're probably absurdly smart :)

  7. Also, I'm not sure how side-effects came up there. I used "side effect" to mean "an action being communicated on a channel" in the process algebraic sense, not in the my-programming-language-doesn't-have-side-effects kind of way.

  8. I disagree completely. Humans start out learning concurrency, and then we hammer on them for decades to make them rigorous and serial in their thinking. It is then no surprise that they fall on their faces when we ask them to solve the same problems with parallel algorithms; to think non-sequentially.

    I believe computer languages are at fault for failing to describe the sort of richness that the human mind is capable of. As someone involved with a major corporation working on compilers for big metal like roadrunner, I'd like to hope that I have some say in how the language business will fix this.

    But don't rely on the companies, or the few of us in them who want to tackle this at the language level. Embrace languages that do parallelism right. If a language constrains side effects, embrace it, because side effects are what give us the most problems. Side-effect-free languages like Haskell will rule the world as the number of cores increases.

  9. clord, That sounds like it deserves to be nailed to the door of something.

  10. Very well written; you have put to words what many of us take for granted.

    I study concurrency, and though I admit it can be very difficult to reason about concurrent systems (in any language), I have always been able to make it work. That feat has come through clean, careful design. Here are the guidelines I use:

    (1) (as you noted), I try to limit the scope of all computations. This corresponds exactly with good object oriented design.

    (2) I try to conceptualize the program as dataflow, instead of as control flow. I think this is useful because, if you want to take two parts A and B from your program and put them in two threads, you are immediately thinking of the values from A which contribute to B, and thus are thinking about thread synchronization. It also means that you are thinking about (and presumably trying to avoid) nasty things, such as cyclic communication between threads, which can be a serious performance killer.

    (2b) Said another way, conceptualizing your program as data flow is the same as conceptualizing it as a program written in a functional style.

    (3) Armed with (1) and (2), I then try to frame the overall architecture as many instantiations of simple patterns: producer/consumer, software pipelines, worklists, etc. If you're thinking about mutexes and semaphores, you're thinking at too low a level; think instead about concurrent queues.

    (4) Finally, I have a bag of tricks to simplify otherwise complicated multithreaded systems. Do two threads need to share a data structure? Maybe you can privatize it (or otherwise ensure that the two threads are operating on disjoint halves of it), thus eliminating many inter-thread synchronizations. Does a value need to be transferred from one thread to the next? Sometimes it's faster and easier to recompute it. And so on.
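A minimal sketch of pattern (3), as a producer/consumer pair where a concurrent queue is the only synchronisation in sight (Python, with illustrative names; the sentinel trick is one common way to signal end-of-stream):

```python
import queue
import threading

work = queue.Queue()   # the only shared, synchronised object
DONE = object()        # sentinel marking the end of the stream
results = []

def producer():
    for n in range(5):
        work.put(n)
    work.put(DONE)

def consumer():
    while True:
        item = work.get()
        if item is DONE:
            break
        results.append(item * item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results == [0, 1, 4, 9, 16]  # single consumer, FIFO queue: order preserved
```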

    When I do all this, things seem to work out well. Let me say explicitly, though, that this refers to systems that were developed from day 0 with concurrency in mind.

    Another thing to note is that all the patterns I mentioned are usually considered high-level patterns for coarse-grained parallelism. I very much disagree with this, and I emphasize that they can be applied to exploit fine-grained parallelism too.

    But yeah, I'm going to subscribe to your blog.

  11. Thanks Nick. I think you've hit the nail on the head with pretty much all of those points.

    I think it would be fascinating to get your hands on a big corpus of very concurrent software and do some actual statistical analysis of the sorts of tactics that people use to manage the complexity. Sadly, like many things which require a large corpus of software and analysis, I don't have the time!

  12. Gian,

    For me, when I saw the change in state S -> S' in your comment:

    (S,x) ---(some computation)---> (S',y)

    This got me thinking about side effects. Maybe I do not understand process algebra well enough, but based on my knowledge of programming, this is what I took it to mean.

    I was glad to see the discussion of functional languages like Haskell come up. I would like to add declarative languages for those who might not already be familiar.

    Todd

  13. That wasn't really a very process algebraic formulation on my part, it was just meant to capture some notion of computation.

    And yes, that was basically meant to illustrate the "connection points" between a computation and its environment. Obviously in the purely functional world (or indeed, any other world where arbitrary side effects can't occur), S = S'.

  14. I agree with the spirit of your post, which I interpret as "be humble when writing concurrent code". It is very easy to get over-confident about one's understanding of concurrent code, only to get slammed later by a use case or hardware platform that shows up your bug. This is true, of course, of any program, but the risk of these bugs escaping into the wild is much, much greater with concurrent vs. sequential code.

    What I don't agree with is that there is anything fundamental either in human cognition or even in standard software engineering education that makes concurrency hard. IMHO, the fundamental problem, as I think clord mentioned, is the set of abstractions (threads, locks, etc.) that most programmers have at their disposal. If we forced most sequential programmers to write assembly code instead of depending on high level languages and compilers, we'd have a similar set of problems with software quality & productivity (hey, that already happened!).

    The clue to the solution is implied in your description of the physical world. Humans can understand concurrency very easily if a few rules are put in place. Physics is a good example of this kind of rule set. Without a good understanding of those rules, we wouldn't be able to function at all in the real world. However, by understanding the basics like gravity and momentum (and a few rules of thumb on predicting the behavior of our fellow humans), we can deal with massive levels of concurrency every day. Think about driving down a multi-lane highway at 60+ mph. The slightest error can create a catastrophic collision and yet, with very few exceptions, we all deal with it easily. Some of us can even talk on the phone, text, put on makeup or shave while doing it (yes, I've seen all of those!).

    So back to concurrency in programming. I think the critical thing we need to do is not to try to solve the general problem of concurrency management, but to gravitate toward a standard family of (in your words) "nice simple patterns of interaction to which we can apply some simple mental analogy". Indeed, this is really what we've done in sequential programming for decades with abstract notions like a uniform memory area or complex expression evaluation, neither of which exist in real hardware. We've even adopted simple concurrency patterns and probably not even realized it. The UNIX shell is a great example. Every day, people write scripts and commands which use concurrency and synchronization, never running into problems with deadlocks or data races.

    So the question we should be asking is what are those patterns?
