I found an interesting piece recently which grabbed my attention thanks to one sweeping and bold statement. It went like this: "…dataflow programming is arguably the easiest parallel strategy to adopt for the millions of developers trained in serial programming."
I’m sure there are many of you who might agree with this and an equal number who might disagree. And equally likely, there may be a large number who wonder what relevance dataflow programming has to multicore applications.
To start with, the author, Jim Falgout, opens his argument by making the fairly obvious point: "…many of today’s multicore systems are woefully under-utilized. We need a paradigm shift to a new programming model that embraces this high level of parallelism from the start, making it easy for developers to create highly scalable applications."
You can’t argue with that, unless you’re the type of person who fights with their own reflection. And in that case…?
Essentially, dataflow programming is an architecture in which program execution is driven by a dataflow graph. The nodes of the graph are computational elements, and the edges provide data paths between them. Because each node depends only on the data arriving on its input edges, the nodes can execute in parallel and data flows through the graph in pipeline fashion.
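To make that concrete, here is a minimal sketch of my own (not Falgout’s framework, and the names are purely illustrative): each node is a thread that applies a function to items arriving on an input queue, and the queues play the role of the graph’s edges. With two nodes wired into a pipeline, both stages run concurrently as data streams through.

```python
import queue
import threading

def node(fn, inbox, outbox):
    # A node repeatedly reads from its input edge, applies its
    # computation, and writes the result to its output edge.
    def run():
        while True:
            item = inbox.get()
            if item is None:          # sentinel: end of the data stream
                outbox.put(None)
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run)
    t.start()
    return t

# Edges of the dataflow graph are queues between nodes.
edge1, edge2, edge3 = queue.Queue(), queue.Queue(), queue.Queue()

# Two computational nodes plugged into a pipeline: square, then increment.
node(lambda x: x * x, edge1, edge2)
node(lambda x: x + 1, edge2, edge3)

# Feed a data stream into the first edge.
for x in [1, 2, 3]:
    edge1.put(x)
edge1.put(None)

# Drain results from the final edge.
results = []
while (item := edge3.get()) is not None:
    results.append(item)
print(results)  # [2, 5, 10]
```

Note that neither node knows anything about the other; the graph is just the wiring between them, which is where the modularity comes from.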
Falgout goes on to say that dataflow modelling provides a computational model with ‘very nice modularity’, because building blocks, or nodes, can be developed that plug together in an endless number of ways to create complex applications.
In essence, it is based on the idea of continuous functions executing in parallel on data streams. Apparently it’s easy to grasp and simple to express, and, according to the author, it is ideal for data-intensive applications, which in turn lends itself well to big data challenges.
Clearly I’m just skimming the surface here, but I want to give you some sense of what Falgout is proposing. To clarify a bit further, ‘the dataflow architecture provides flow control.’ Flow control is inherent in the way dataflow works, putting no burden on the programmer to deal with issues such as deadlock or race conditions.
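The flow-control point can be illustrated with another small sketch of my own (again, just an assumption about how such a runtime might behave, not Falgout’s implementation). If an edge is a bounded queue, a fast producer is automatically throttled whenever the slower consumer falls behind; the programmer writes no locks and no explicit rate-limiting code.

```python
import queue
import threading
import time

# A bounded edge: at most 2 items in flight. put() blocks when the
# edge is full, so backpressure is built into the graph itself.
edge = queue.Queue(maxsize=2)
consumed = []

def producer():
    for i in range(10):
        edge.put(i)       # blocks here when the edge is full
    edge.put(None)        # sentinel: end of the data stream

def consumer():
    while (item := edge.get()) is not None:
        time.sleep(0.01)  # simulate a slower downstream node
        consumed.append(item)

threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- all items, in order
```

The producer never overruns the consumer and nothing is lost, yet neither side contains any synchronization logic of its own.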
Falgout says that it allows developers to take advantage of today’s multicore processors easily, and that it also fits well into a distributed environment. As such, he claims, ‘it is straightforward and ensures your applications will be able to scale in the future to meet growing demands…’
If there is an issue with the dataflow programming model, it’s that it’s a different paradigm from the ones that developers are used to. As such it requires a shift in design thinking. In a sense, it’s like looking at a problem from a different angle, sort of four steps to the right and back a bit. This could be a hurdle or not. I guess it all depends on your attitude.
If you have experience of dataflow programming I’d be keen to hear from you, particularly on whether, in your experience, "…dataflow programming is arguably the easiest parallel strategy to adopt for the millions of developers trained in serial programming."
Does that match your experience?