Tuesday, 23 February 2010

The state of the world (and why I still think it's a bad idea)

I've been working recently on context-aware systems - i.e. systems that measure properties of the user's (or the computing) environment to infer a context, and exhibit some context-sensitive behaviour as a result. There are lots of things that can define a context, and many ordinary systems can probably be characterised as context-aware, particularly in the mobile computing space.

So, one obvious measured value from the environment is location, which gives rise to the most ubiquitous class of context-aware systems: location-based services. In its simplest form, this might mean using a GPS receiver built into a mobile phone or PDA to infer the location of the user, guiding him or her to relevant points of interest nearby. For example, a user might want to see all the restaurants within walking distance - fire up the GPS, query the database of restaurant locations - done! So far, so pedestrian. There are plenty of services that have done this (and that have done it well).
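To make the "pedestrian" case concrete, here is a minimal sketch of that query. Everything in it is an assumption for illustration: the restaurant names and coordinates are made up, "walking distance" is arbitrarily set to 1000 metres, and the in-memory list stands in for whatever database a real service would use. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_walking_distance(position, restaurants, max_metres=1000):
    """Return the names of restaurants no further than max_metres from position."""
    lat, lon = position
    return [name for name, rlat, rlon in restaurants
            if haversine_m(lat, lon, rlat, rlon) <= max_metres]

# Hypothetical data: (name, latitude, longitude)
restaurants = [
    ("near", 55.6600, 12.5920),
    ("far", 55.7000, 12.6000),
]
gps_fix = (55.6596, 12.5910)  # made-up GPS reading
nearby = within_walking_distance(gps_fix, restaurants)
```

Note that this only works so neatly because a single GPS fix is treated as the truth about where the user is.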

The complication comes when we wish to integrate multiple sources of information to infer a single abstract notion of context, or when the information we wish to integrate is discrete. At the IT University of Copenhagen, we have a system of BLIP (Bluetooth) nodes installed in the corners of the building on every floor. Coverage is neither uniform nor total, so a device (and therefore a user) may be seen in one corner of the second floor, disappear from Bluetooth coverage, and moments later reappear on the fourth floor. It is therefore necessary to abstract from the real measured environment some more general notion of location or space. Adding more sensors that measure different values at different intervals simply compounds the problem. The disparity between the real environment and the inferred context grows wider and wider.

This conceptual gap is where my problem with notions of global state comes in. Building a system that represents state explicitly (and globally) is disingenuous in most cases. In the object-oriented world, for example, one might be tempted to have a "User" object that records information about the locations of users. Except that the object is persistent, and the user really isn't. We're now in the world of confidence intervals and uncertainty. If I query the object associated with some user and ask "where are you now?" - the response will likely be a firm and confident position in some coordinate space, or a reference to a node.
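A minimal sketch of the kind of object I mean, with hypothetical class, method and node names (this is not code from our system):

```python
class User:
    """Hypothetical stateful model: remembers only the most recent node."""

    def __init__(self, name):
        self.name = name
        self.location = None

    def record_sighting(self, node):
        # Overwrites the previous value; when it was measured is forgotten.
        self.location = node

    def where_are_you_now(self):
        # Answers with full confidence, however old the sighting is.
        return self.location

user = User("alice")
user.record_sighting("floor2-ne")
# ... the user walks out of Bluetooth coverage and time passes ...
answer = user.where_are_you_now()
```

The object still answers "floor2-ne" long after the user has left that corner, and nothing in the answer tells the caller so.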

The problem exists because we've built an abstraction that is not a true abstraction of the underlying measured environment. If we base some behaviour on the response from the User object, we're likely to be acting on information that is well and truly out of date. The sensors don't actually know where a user is at any given moment, only that the user was seen in a location at some time. If we shift our abstraction to work on this basis instead (perhaps the User object returns a "last seen" timestamp and a location), then what have we gained from this abstraction? We can start to build a whole host of largely equivalent abstractions, none of which are particularly helpful, because they all rely on a notion of having total knowledge of the state of the world at all times. The kinds of stateful representations provided by most mainstream programming languages are, I argue, poor models of many measurable environments. Without building higher-order abstractions over the abstractions inherent in the language, we are forced either to build useless abstractions or to hide too much necessary detail.
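The "last seen" variant might look something like the sketch below. The names and the 30-second threshold are assumptions made purely for illustration. Notice that every caller now has to decide for itself how old is too old, so the uncertainty has been moved around rather than modelled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Sighting:
    """What a sensor can honestly report: a node and the time it saw the device."""
    node: str
    seen_at: float  # seconds on some shared clock

def current_location(last: Sighting, now: float, max_age: float = 30.0) -> Optional[str]:
    """Return the node only if the sighting is recent enough, else None (unknown).

    max_age is an arbitrary cut-off; nothing in the data tells us what it should be.
    """
    if now - last.seen_at <= max_age:
        return last.node
    return None

last_seen = Sighting(node="floor2-ne", seen_at=100.0)
fresh = current_location(last_seen, now=110.0)   # 10 seconds old
stale = current_location(last_seen, now=200.0)   # 100 seconds old
```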

If you agree with this premise, then you may wonder what the solution is. In short, it's reactiveness. To interact with a measured environment that produces discrete values, programs must not rely on state information about the environment; rather, new facts must be computed as the measurements are made. When something changes in the real environment, programs must react to this change, and emit new events to be consumed by other programs (or parts of the program). In this way, idioms such as Functional Reactive Programming seem well suited to the task. Even outside the world of context-aware computing, it seems that persistent global state is often smoke and mirrors hiding potential data currency issues.
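As a rough illustration of the reactive style, here is a tiny push-based event stream. It is a hand-rolled toy, not a real FRP library, and the Stream class, the node naming scheme ("floor2-ne") and the device name are all made up for the example. Raw sightings go in, and a derived "floor changed" fact comes out only when a measurement actually implies it.

```python
class Stream:
    """A minimal push-based event stream (illustrative only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def emit(self, event):
        for handler in list(self._subscribers):
            handler(event)

    def map(self, fn):
        """Derive a new stream by applying fn to every event."""
        out = Stream()
        self.subscribe(lambda event: out.emit(fn(event)))
        return out

    def changes(self):
        """Derive a stream that drops events equal to the immediately preceding one."""
        out = Stream()
        previous = []  # empty until the first event arrives

        def on_event(event):
            if not previous or previous[0] != event:
                previous[:] = [event]
                out.emit(event)

        self.subscribe(on_event)
        return out

# Raw measurements: (device, node) pairs as the Bluetooth nodes report them.
sightings = Stream()

# Derived facts: (device, floor), emitted only when the floor differs from the last event.
# With a single device this is enough; several devices would need per-device tracking.
floor_changes = sightings.map(lambda s: (s[0], s[1].split("-")[0])).changes()

log = []
floor_changes.subscribe(log.append)

sightings.emit(("phone-1", "floor2-ne"))
sightings.emit(("phone-1", "floor2-sw"))
sightings.emit(("phone-1", "floor4-ne"))
```

Nothing here ever claims to know where the device is "now"; consumers only ever hear about what was observed and what follows from it.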

So the question I ask is: do you really need it?


  1. That's an interesting solution at the end: listening to events rather than state.

    At work we track all kinds of student, teacher, school, state, etc. data over long periods of time. And, while I've inherited an OO-designed codebase, I've thought a lot about what the ideal would be. The OO mindset works so well if you're frozen at a point in time, but where I work we never are. A simple query like "find all students' information at school x" becomes a tangled web of timestamp-like column lookups and joins. It's way too easy to make a mistake.

  2. That's a perfect example of the kind of disparity you get between the OO model (where you have perfect knowledge of global state, based on your ability to query an object at any time), and the "real world", where you interact with discontinuous, discrete data sources.