Concurrency is viewed as black magic, even by programmers who use it on a daily basis. When people tell me that concurrency is easy, it generally means one of two things: either they are super-humanly smart, or the way they are using concurrency is restricted to nice simple patterns of interaction to which we can apply some simple mental analogy. Thread pools are fairly easy to think about, for example.
I'm talking about the kind of eye-wateringly difficult concurrency that arises when you are trying to extract some sort of task-level parallelism, or when you are working with loosely-coupled distributed systems (read "the general case"). The difficulty of concurrency is usually mentioned in the same breath as a list of common defects (race conditions, deadlocks, livelocks, etc.); however, I think it is important to remember that these things are all just outcomes that arise as a result of the difficulty, in the same way that hitting the ground is the probable outcome of falling off a building.
The possibility for a programmer to introduce problems such as deadlocks and race conditions is not what makes concurrency difficult. It's our total inability to mentally reason about concurrency that makes it difficult.
Most programmers above a certain level are very good at conceptualising large, complex systems. They can interrogate perceived weaknesses in a program before it is even written. This is how good programmers manage to write programs that are largely free of defects and that generally work in the way that is expected. Small-scale errors aside, a good high-level conceptual understanding of the system, coupled with an ability to mentally simulate the behaviour of any portion of a program, provides programmers with most of the tools they need to construct good-quality programs. The programmer is likely to know (at least at some high level) how the entire system is going to work at any given moment.
Large-scale concurrency robs us of this ability to easily mentally simulate behaviour. It makes our mental models unimaginably complex. When it comes to concurrency, most of the intuitions we use to guide us through development simply break down. I think of it as a sort of night-blindness.
So what is different? Humans are very good at causality. When presented with the present state of the universe, plus some action, we can generally predict with a high degree of accuracy what the state of the universe will be after that action has been performed. We can integrate huge numbers of factors into our calculation of the future state of the universe - gravity, wind, light, sound, heat. If we start rolling a ball down a slope, we can imagine at any point in time approximately where that ball is going to be. What we don't expect, or even attempt to predict, is an inter-dimensional portal opening, an alternative-reality version of ourselves stepping out of the void and setting the ball on fire. And that's kinda concurrency for you. We are good at imagining ourselves in a situation. We are the observer in the code. Unfortunately in a concurrent situation, we are the observer in many different "realities".
Even with the knowledge that there is some algorithmic nature to the opening of dimensional portals during our ball-rolling activities, and that the alternative-reality observer's pyromaniac tendencies can be moderated by holding up a certain flag, our mental models of the universe just don't extend to predicting this kind of behaviour. If I were inclined towards evolutionary psychology, I would probably suggest that it's because as a species, we don't encounter these kinds of situations in our natural habitats very often. We can predict cause-and-effect, but the idea of having to maintain a model of a second universe with its own rules as well as our own is just beyond the needs of most early humans, who were probably much more concerned with procreating and trying to avoid being eaten by tigers.
It's not just the breakdown of causality that is a problem - it's also syntax. I know I claimed I wasn't going to talk about syntax just two short posts ago, but it's kinda important here.
When we think about some function "f", we can conceptualise it as something like:

(y, S') = f(x, S)

Where "S" on the right represents the state of the world before the computation occurs, "x" is some input, "S'" is the state of the world after the computation has occurred and "y" is some output. Given that most programming languages give us the ability to control scope, "S" is probably pretty small. We usually don't even bother writing it down, leading to the more comfortable:
f(x) = (some computation)
With an implied world-state being accessible to us before and after the computation. Causality applies and all is well with the world. We can mentally substitute any occurrence of "f(x)" with the behaviour of the computation.
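To make the implied world-state concrete, here is a minimal sketch in Python (all names here are hypothetical, made up for illustration): "f" reads and writes a hidden "S", so mentally substituting "f(x)" with a fixed value already fails, even before any concurrency is involved.

```python
# The implied world state "S" -- not visible in f's signature.
state = {"count": 0}

def f(x):
    state["count"] += 1          # S -> S'
    return x + state["count"]    # y depends on both x and S

first = f(10)   # 11
second = f(10)  # 12 -- same x, different y, because S changed in between
```

In a single-threaded program we can still reason about this, because S only changes when we change it. Concurrency removes that guarantee: S can change between any two operations, performed by code we are not currently reading.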
Now, consider something concurrent. A given transition in a labeled transition system has one input state, one output state, plus a side-effect:
x ---a---> y
Assuming some interpretation of this LTS using name-based synchronisation, any other occurrence of "a" in the system is going to be causally related to this occurrence of "a". We're used to names just being names - they have no special meaning. Well now "a" does have a special meaning. This is where it all starts to get a bit dicey. We no longer have all of the information we need in front of us. Causality has gone from being able to consider events locally within some frame of reference to needing to maintain a globally consistent mental model of all the ways that an "a" action could occur anywhere in the universe. Reading this description of behaviour in isolation is no longer sufficient to tell us the whole story.
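A rough sketch of that non-local coupling, in Python (hypothetical names throughout; a queue is only a crude stand-in for a channel named "a", since Python has no true rendezvous primitive): reading "sender" in isolation tells you nothing about when its "a" action fires, because the matching occurrence of "a" lives in an entirely different piece of code.

```python
import queue
import threading

a = queue.Queue()  # stands in for the channel/action name "a"
log = []

def sender():
    a.put("token")           # performs the "a" action in this process
    log.append("sent")

def receiver():
    a.get()                  # the matching "a" action, textually elsewhere
    log.append("received")

threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The receiver blocks until some "a" occurs somewhere in the system. To predict whether it ever unblocks, you need the global picture, not the local one.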
This isn't just some problem with name-based synchronisation. If anything, I think name-based point-to-point communication is considerably simpler than most of what we do in practice, and yet we are already at a point where our ability to rely on local information to give us clues as to how this piece of the system will behave has been taken away.
Introducing locking gets a bit scarier yet again. We're now required to know not only the state of our own world at any given point in time, but the state of every other world as well. And the notions of time in each world are not the same.
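Here is a minimal sketch of why (all names hypothetical): each function below looks locally harmless, but whether the system as a whole can deadlock depends on the lock-acquisition order used by every other thread - global knowledge, not local.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
done = []

def worker_one():
    with lock_a:
        with lock_b:          # acquisition order: a, then b
            done.append("one")

def worker_two():
    # If this took b then a, it could deadlock against worker_one.
    # The usual defence is a single global acquisition order, used here.
    with lock_a:
        with lock_b:
            done.append("two")

threads = [threading.Thread(target=worker_one), threading.Thread(target=worker_two)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Nothing in worker_two's body, read alone, tells you that its lock order is safe; the property only exists at the level of the whole program.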
I think it's a wonder that we get concurrency right as often as we do. I think it's as much by careful attention to detail and dumb luck as by good planning or being good at this type of thing.
Now, before you jump up and down and tell me that "insert-your-favourite-concurrency-model-here addresses these problems", I know there are strategies for managing the complexity of these tasks. I did concurrency theory for a living, so I know some approaches are better than others. My point here was essentially to illustrate why it is important to manage complexity when doing concurrency. It's definitely not just a case of "you can't do concurrency so that means you are stupid". I'm arguing that pretty much everyone is bad at concurrency beyond some simple patterns of interaction that we cling to because we can maintain a good mental model by using them.
In my next exciting installment, I'll talk about how I think we can improve the state of the art. Spoiler alert: it involves Bigraphs, and some ideas that I hope I might be able to turn into a viable commercial proposition one of these days.
Oh, also, I recall reading a really good paper about the problems with syntax for representing terms in process algebras a few years ago, and I can't for the life of me remember what it was. If anyone has any ideas what it might have been, let me know. I'll give you a cookie.