The paradox of the benign data race

Oh! The dreaded data race! How we hate it! It makes life hard for the humble parallel programmer, causing programs to behave unpredictably when two threads try to use the same memory location at the same time and at least one of them is updating it.

But I recently came across the idea of the benign data race, through Bartosz Milewski’s new concurrency blog. The idea is that you might know your program can give rise to data races, but choose to allow them in certain circumstances in the interests of performance. The synchronisation needed to avoid a race can slow your program down, and sometimes a technical data race has no harmful effect on the program’s results. If one thread is reading a bit in a bitfield while another thread is updating a different bit in the same bitfield, that is technically a data race (indeed, it is one), but it’s harmless because the bit being read never changes.
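
To make that concrete, here is a minimal C++ sketch of the scenario (my own illustration, not code from Milewski’s post): one thread keeps setting bit 1 of a shared word while another only ever tests bit 0. The C++ memory model still counts this as a data race, and formally undefined behaviour, because both threads touch the same memory location without synchronisation, even though the bit being read is never modified.

    #include <cstdint>
    #include <iostream>
    #include <thread>

    // Shared word of flag bits. Bit 0 is set once, before the threads start,
    // and never changes; bit 1 is repeatedly set by the writer thread.
    std::uint32_t flags = 0x1;

    int main() {
        bool saw_bit0_clear = false;

        std::thread reader([&] {
            for (int i = 0; i < 1000000; ++i) {
                if ((flags & 0x1) == 0)      // reads bit 0 only
                    saw_bit0_clear = true;
            }
        });

        std::thread writer([] {
            for (int i = 0; i < 1000000; ++i)
                flags |= 0x2;                // read-modify-write of bit 1
        });

        reader.join();
        writer.join();

        // On common hardware bit 0 always reads as set, but the program still
        // contains a formal data race on `flags`.
        std::cout << "reader ever saw bit 0 clear: " << saw_bit0_clear << '\n';
    }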

Microsoft published an academic paper about DataCollider (PDF), the tool it used to identify data races in the Windows 7 kernel. The paper says that only around 10% of the data race reports the tool produced corresponded to “real errors”, so a filtering step was used to prioritise those. Of course, there are some purists who argue that any data race is a real error, and I’d be interested to hear your views on this. Since we all grew up learning serial programming, where code behaves deterministically, the idea that there might be some element of randomness in a program’s results can be unsettling.

Microsoft identified several classes of data races that it considers benign, in order of frequency:

  • Statistics counters: these are used to measure program performance and behaviour, and minor inaccuracies in their values might be considered trivial (see the sketch after this list).
  • Safe flag updates: this is the case I mentioned earlier, where different threads access different bits of the same word at the same time.
  • Special variables: the current time variable, for example, is updated by the timer interrupt and read by many threads at the same time.
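
As a rough sketch of the most common class above (my own example, not code from the paper), the racy counter below can lose increments when two threads bump it at the same time; the atomic counter alongside it shows the race-free alternative you’d reach for if the exact total actually mattered.

    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    long hits = 0;                    // racy statistics counter: updates can be lost
    std::atomic<long> exact_hits{0};  // race-free alternative

    int main() {
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t) {
            workers.emplace_back([] {
                for (int i = 0; i < 100000; ++i) {
                    ++hits;                                             // data race
                    exact_hits.fetch_add(1, std::memory_order_relaxed); // no race
                }
            });
        }
        for (auto& w : workers) w.join();

        // The racy total is often a little short of 400000; the atomic total
        // is always exact. If the counter is only a rough performance metric,
        // the undercount may be acceptable.
        std::cout << "racy count:  " << hits << '\n'
                  << "exact count: " << exact_hits << '\n';
    }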

Of course, the risk is that you end up overclassifying data races as benign. Microsoft started off treating a race as benign if two threads were writing the same value, but found a case where even that resulted in a genuine execution error. Depending on the nature of your program, the accuracy of your statistics counters might matter more than it did in Microsoft’s case. The paper makes for interesting reading, but it’s dangerous to assume that its conclusions are broadly applicable, including the proportion of benign data races it identified. Few people are writing software similar to the Windows kernel, and by the time the data race validation was performed, I’m guessing the team had already manually identified and eliminated quite a few problematic data races.

Bartosz Milewski warns in a follow-up blog post that there are risks in allowing data races to persist in your code. In particular, he notes that compilers may optimise shared variables as if the program were single-threaded unless they’re explicitly marked as atomic. He gives the example of a compiler using a counter or binary flag as temporary storage for its own data, because it isn’t aware of other threads accessing that location.
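
Milewski’s scratch-storage example is hard to reproduce in a few lines, but a simpler illustration of the same point, that the compiler assumes single-threaded behaviour for non-atomic variables, is the classic spin-wait flag below (my own sketch, not his code). With a plain bool, an optimising compiler may hoist the load out of the loop and spin forever; declaring the flag std::atomic rules that transformation out.

    #include <atomic>
    #include <thread>

    // Marked atomic, so the compiler must assume other threads can write it.
    // With a plain `bool done = false;` instead, the compiler may read the
    // flag once, keep it in a register, and turn the wait into an endless spin.
    std::atomic<bool> done{false};

    void wait_for_done() {
        while (!done.load(std::memory_order_acquire)) {
            // spin until the setter thread flips the flag
        }
    }

    int main() {
        std::thread waiter(wait_for_done);
        std::thread setter([] { done.store(true, std::memory_order_release); });
        setter.join();
        waiter.join();
    }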

What are your views on benign data races? Are they a bit of a ‘hack’, or a shrewd optimisation? How confident can you be that a data race is, in fact, benign?

2 Responses

  1. Interesting article. I’m glad that you linked to the Milewski article – I had not seen it before. Now that it’s public knowledge, I can mention the Windows port to ARM, which makes the idea of benign data races much more interesting. What is benign on a strongly ordered architecture may be erroneous on a weakly ordered one. Add this to the things the compiler can do and it’s pretty scary. Here’s my rule of thumb: you should only have data races if you can explain the load/store semantics of the language and of all the architectures your code does, and will, run on. I know I cannot, so I avoid data races :)

  2. Hi John. Thanks for your comment. I’m with you on that – I think there’s a difference between having data races where you understand the full context of what’s going on, and having data races where “it kinda seems to work okay, I guess” and you don’t really have confidence that you fully understand the architectures the code will run on. I think code without data races is probably more future-proof too, because there’s less risk of new architectures turning previously benign data races into real problems.
