Pervasive Datarush has come to my attention as a tool that helps developers rethink their approach to parallelism at a conceptual level. It’s designed to help you build parallel applications that scale across many cores and multiple nodes. In fact, the company claims that you can build applications that scale from 2 to 384 cores and beyond.
It has delivered speed-ups in the region of a thousand-fold and more, although according to one commentator this is partly due to the nature of the problems it has been applied to.
The company says it has an inherent capability to scale across all cores in mainstream and monster multicore servers, as well as the ability to accelerate every node in an Apache Hadoop cluster, offering unmatched speed and economics. It’s claimed that ‘it blows through performance bottlenecks in data preparation and analytics.’
Apparently, for big-data problems it is a significant step because it integrates dataflow with the scale-out capabilities of frameworks such as Hadoop. Hadoop, if you don’t know, is an open-source framework for running applications on large clusters built from commodity hardware.
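For readers who haven’t met the dataflow idea before, here is a minimal sketch of the concept in Python. It is not the Datarush API, which I haven’t reproduced here; the names `stage` and `run_pipeline` are my own, purely for illustration. Each operator runs in its own thread and hands records to the next one through a queue, so the stages overlap in time rather than running one after another. In standard Python, threads won’t give you a real multicore speed-up for CPU-bound work, so treat this as a picture of the structure rather than a benchmark.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def stage(fn, inbox, outbox):
    """Start a thread that applies fn to each record from inbox
    and forwards the result to outbox (hypothetical helper)."""
    def run():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the end-of-stream marker downstream
                return
            outbox.put(fn(item))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(records, *fns):
    """Connect the operators in fns with queues and run them concurrently."""
    first = queue.Queue()
    current = first
    workers = []
    for fn in fns:
        nxt = queue.Queue()
        workers.append(stage(fn, current, nxt))
        current = nxt

    # Feed the source records into the first queue.
    for record in records:
        first.put(record)
    first.put(_DONE)

    # Drain the last queue until the end-of-stream marker arrives.
    results = []
    while True:
        item = current.get()
        if item is _DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two operators: trim whitespace, then upper-case each record.
    print(run_pipeline(["  alpha ", " beta"], str.strip, str.upper))
```

The point of the design is that you describe what flows where, and the runtime decides how to spread the operators across the available cores or nodes.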
Pervasive Datarush no doubt solves some pressing problems and can accelerate solution development for ‘big-data challenges’ by putting the power of parallelism into the hands of developers. Or as J. Scott Harrison, director at the Intel Software and Services Group, says: “…[it] helps organisations unlock the powerful parallelism of Intel Xeon processors to take on big-data challenges.”
However, and this isn’t meant as a criticism, I’ve noted a comment from one developer who says that, as good as Pervasive Datarush is, it is essentially a tool that shows the value of parallelism by offering a different perspective. I thought the comment worth mentioning because it brings the focus back to the wider context of parallel programming. His point is that perhaps we need to move beyond toolsets, as useful and as important as they are, and look to new languages as the step change in the move to true parallelism.
But where will these languages come from? It’s quite likely that their development will be driven by the commercial imperatives of vendors in key computing areas such as high-performance computing, business intelligence or data warehousing.
Or perhaps we’ll see the development of programmer-friendly parallel dataflow languages that dovetail with new programming models.
I’d be keen to get your views on this.