I’ve heard a lot of people advocating task-based parallelism over the last couple of years, the idea that the code should be abstracted away from cores and threads and packaged into tasks which can be run across the available resources. Shannon Cepeda has written a short but interesting blog post on Intel Software Network, outlining four of the ways that performance is increased when this approach is adopted.
Firstly, the code is scalable in line with the number of cores, because the model used (such as OpenMP, Intel Threading Building Blocks or Intel Cilk Plus) will allocate the work across all the available resources. Secondly, software threads are created for each CPU core and are reused to perform tasks, which saves the overhead of repeatedly creating and destroying threads. Thirdly, the libraries are cache aware and will try to achieve cache locality. And finally, the task-based models will use run-time scheduling and work stealing to balance loads.
The scalability argument I was familiar with, and given how rapidly the number of cores on the typical desktop might increase in the next few years, it's the main argument I've seen for task-based parallelism. But all the other work going on in the background could also have a huge impact on the performance of your application. Check out the blog post for more insight into why the task-based models are an important tool in your toolkit.