There’s been a lot of buzz about Haskell recently, with an experimental version of Intel Concurrency Collections for Haskell in development and the release of Axum stimulating debate about whether we need new programming languages to address the challenges of multicore programming.
At The University of Illinois’s Reflections Projections conference a couple of weeks ago, Don Stewart delivered a presentation outlining what makes Haskell so well suited for parallel programming. He’s the co-author of O’Reilly’s book Real World Haskell (available free online), and the slides from his talk are available for download. They are an ideal introduction to Haskell for people who are researching its potential with a view to parallel programming.
Some might be surprised to learn that Haskell has been around for about 20 years. The language is open source, and is said to offer greater productivity than C++ and better performance than Erlang and Python.
One interesting aspect of Haskell is that it provides different levels of granularity for how you express parallelism. As is often the case, there’s a trade-off between how easy it is to program, and how much control you’re able to exert.
At the highest level, you can annotate “sparks”, which hint at where parallelism can be applied. The runtime converts the sparks into threads, which may run in parallel depending on the resources available. The “par” combinator that defines a spark can be used to overapproximate parallelism, and the runtime will look after what is actually possible. There is also a “pseq” combinator, which forces evaluation order where operations must happen sequentially, preventing a spark’s result from being demanded too early. The new ThreadScope application by Donnie Jones, Simon Marlow and Satnam Singh can help to measure the performance of sparks so they can be fine-tuned. Sparks enable programs to avoid data races and other parallel errors, and are easy to read because when you delete the “par” combinators, you get the original program back. One drawback is that while “par” is cheap, it isn’t free, and the cost can become problematic when sparks are created across a massive number of iterations. Proactive garbage collection is also essential to prevent temporary stalls.
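As a minimal sketch of this spark style, here is the classic parallel Fibonacci example using the standard `par` and `pseq` combinators from `Control.Parallel` (the `parallel` package); the cut-off value of 20 is an arbitrary illustrative choice:

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci, used here only to create work for the sparks.
fib :: Int -> Int
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- "x `par` e" sparks x for possible parallel evaluation, then evaluates e.
-- "y `pseq` (x + y)" forces y first, so the main thread works on y
-- while the spark for x can run elsewhere.
parFib :: Int -> Int
parFib n
  | n < 20    = fib n  -- below a cut-off, sparking costs more than it saves
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

main :: IO ()
main = print (parFib 30)
```

Deleting the `par` and `pseq` annotations gives back the sequential program, as the talk notes. Compile with `ghc -threaded -O2` and run with `+RTS -N` to let the runtime use multiple cores.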
It’s also possible to express explicit concurrency using threads and shared memory, but this brings the challenge of managing non-deterministic scheduling. To hide I/O latency, it’s possible to fork a thread for the work and then return to the user for more work. Threads can also throw asynchronous exceptions to each other, which sounds similar to Axum’s actors implementation. Other ways to synchronise between threads include using MVars (which can deadlock) and software transactional memory (STM). STM transactions run against a log and, in the event of a conflict, are rolled back and retried until they commit, so there are no deadlocks.
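A brief sketch of both styles, assuming the `stm` package is available: a forked thread hands its result back through an MVar, and an STM transaction moves a balance between two TVars atomically (the account values here are invented for illustration):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (atomically, newTVarIO, readTVar, readTVarIO, writeTVar, modifyTVar')

main :: IO ()
main = do
  -- Explicit concurrency: fork a worker and collect its result via an MVar.
  done <- newEmptyMVar
  _ <- forkIO $ do
    let answer = sum [1 .. 1000 :: Int]  -- stand-in for slow I/O or computation
    putMVar done answer
  result <- takeMVar done                -- blocks until the worker finishes
  print result

  -- STM: the transfer commits atomically, or is rolled back and retried.
  from <- newTVarIO (100 :: Int)
  to   <- newTVarIO 0
  atomically $ do
    balance <- readTVar from
    writeTVar from (balance - 30)
    modifyTVar' to (+ 30)
  balances <- (,) <$> readTVarIO from <*> readTVarIO to
  print balances
```

The MVar version can deadlock if `putMVar` is never reached; the STM version cannot, because no locks are held across the transaction.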
Haskell also provides parallel arrays to help with data parallelism, although this is still a technology preview. The idea is that if you want to do the same thing in parallel to every element of a large collection, you can avoid having to create explicit threads, and benefit from a clear cost model.
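While the parallel-arrays syntax itself is still a preview, the same “one operation over every element” style can be sketched today with `parMap` from `Control.Parallel.Strategies` (the `parallel` package); the squaring function is just a placeholder for more expensive per-element work:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Apply a function to every element, sparking each application so
-- the runtime may evaluate them in parallel, without explicit threads.
squares :: [Int] -> [Int]
squares = parMap rdeepseq (\x -> x * x)

main :: IO ()
main = print (sum (squares [1 .. 100]))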
Visit Don Stewart’s website to download the slide deck, and the sample code routines which help to bring these concepts to life.