Oh! The dreaded data race! How we hate it! It makes life hard for the humble parallel programmer, causing programs to behave unpredictably when two threads access the same memory location at the same time and at least one of them is writing to it.
But I recently came across the idea of the benign data race, through Bartosz Milewski’s new concurrency blog. The idea is that you might know your program can give rise to data races, but you allow them in certain circumstances in the interests of performance. The synchronisation needed to avoid the race could slow your program down, and sometimes a technical data race has no harmful effect on the final result. If one thread is reading a bit in a bitfield while another thread is updating a different bit in the same bitfield, that is technically a data race, but it’s arguably harmless because the bit being read is never modified.
Microsoft published an academic paper about DataCollider (PDF), the tool it used to identify data races in the Windows 7 kernel. The paper says that only 10% of the data race reports it experienced corresponded to “real errors”, so it used a filtering step to prioritise those. Of course, there are some purists who argue that any data race is a real error, and I’d be interested to hear your views on this. Since we all grew up learning serial programming that has deterministic code, the idea that there might be some element of randomness in the program’s results can be unsettling.
Microsoft identified several classes of data races that it considers benign, in order of frequency:
- Statistics counters: these are used to measure program performance and behaviour, and minor inaccuracies in their values might be considered trivial.
- Safe flag updates: this is the case I mentioned earlier, where different threads access different bits of the same word at the same time.
- Special variables: the current time variable, for example, is updated by the timer interrupt and read by many threads at the same time.
Of course, the risk is that you end up overclassifying some data races as benign. Microsoft started off considering a race trivial if two threads were writing the same value, but found a case where this resulted in a genuine code execution error. Depending on the nature of your program, the accuracy of your statistics counters might be more important than it was in Microsoft’s case. The paper makes for interesting reading, but it’s dangerous to assume that its conclusions are broadly applicable, including the proportion of benign data races it identified. Few people are writing software similar to the Windows kernel, and by the time the data race validation was performed, I’m guessing the team had already manually identified and eliminated quite a few problematic data races.
Bartosz Milewski warns in a follow-up blog post that there are risks in allowing data races to persist in your code. In particular, he notes that compilers may optimise shared variables as if the code were single-threaded unless they’re explicitly marked as atomic. He gives the example of the compiler using a counter or binary flag as temporary storage for its own data, because it isn’t aware of other threads accessing that memory.
What are your views on benign data races? Are they a bit of a ‘hack’, or a shrewd optimisation? How confident can you be that a data race is, in fact, benign?