Achieving parallelism in Fortran 2008 using Coarrays

At today’s BCS event on Fortran in physics, John Reid, convenor of the ISO working group for Fortran, outlined some of the new features coming to Fortran 2008. Fortran 2008 is now in the final ballot stage, which means changes are limited to typographical and layout errors and the standard is otherwise pretty much complete. The biggest addition is support for coarrays, which is a means of programming for multicore devices. “There is a lot going for coarrays,” said Reid. “I hope to convince you of that today.”

Coarrays were created by Bob Numrich from the Minnesota Supercomputing Institute. He had several design objectives for them:

  • They should be a simple extension of Fortran and feel like Fortran, not like something different
  • They should make small demands on implementors. Earlier in the day, we heard how it took six years for the first Fortran 2003 compiler to emerge, so it was important that coarrays were easy to incorporate. Numrich persuaded a compiler writer at Cray to add coarrays into the Cray compiler covertly 10 years ago, which showed it was easy to add basic support.
  • They should retain all the serial optimisations between parallel synchronisations
  • They should make it obvious whether references are local or remote
  • They should provide scope to optimise communication

A subset of coarrays has been implemented for 10 years. They have been added to the G95 compiler and are being added to gfortran (GNU Fortran). Intel has said that coarrays are at the top of their development list, so it’s clear they’re going to have a big impact on the Fortran community.

Coarrays work like this: you have a single program multiple data (SPMD) model. The single program is replicated into a number of images (probably as executables). You might have a local machine with 8 cores running 8 images, for example, although it’s possible to have more images than cores, which is why the term ‘images’ was coined to cover the potential mismatch between processors and running programs. The number of images is fixed at execution time, not at compile time, and each image has its own set of variables.
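
To make that concrete, here is a minimal sketch (mine, not from Reid’s talk) of the SPMD model, assuming a compiler with coarray support: every image runs the same program and discovers its own identity through the this_image() and num_images() intrinsics.

    program hello_images
      implicit none
      ! Each image executes this same program. this_image() and num_images()
      ! are intrinsics returning this replica's index and the replica count.
      print '(a,i0,a,i0)', 'Hello from image ', this_image(), ' of ', num_images()
    end program hello_images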

Coarrays are like ordinary variables, except that one image can access them on another image. Square brackets are used to indicate a coarray being accessed on another image, which helps to ensure it’s obvious whether a local or remote data item is being accessed. The coarray always exists in every image and always has the same set of bounds, so the compiler can (optionally) arrange for the data to occupy the same memory addresses in each image. On a shared memory machine, a coarray might be implemented as a single large array.
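
A short illustrative sketch (again mine, with hypothetical variable names) of the bracket notation: a plain reference is always local, and square brackets always signal a remote access.

    program remote_read
      implicit none
      real    :: t(10)[*]   ! the [*] codimension: every image holds its own t(10)
      integer :: me
      me = this_image()
      t = real(me)          ! no brackets: a purely local definition
      sync all              ! make all local definitions visible before any remote read
      if (me == 1 .and. num_images() >= 2) then
        print *, 'First element of t on image 2: ', t(1)[2]   ! brackets: remote read
      end if
    end program remote_read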

The images usually execute asynchronously, with the user programming synchronisation points where required, through synchronisation with a particular image or with all images. Other control structures include locks and critical constructs that ensure only one image executes a chunk of code.
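
As a hedged illustration of those constructs, the sketch below (mine, with made-up names) uses sync all as a barrier and a critical construct to serialise updates to a counter held on image 1.

    program one_at_a_time
      implicit none
      integer :: counter[*]
      counter = 0
      sync all                        ! barrier: every image waits here
      critical                        ! at most one image executes this block at a time
        counter[1] = counter[1] + 1   ! serialised read-modify-write of image 1's copy
      end critical
      sync all                        ! all increments are complete before the read below
      if (this_image() == 1) print *, 'Images counted: ', counter
    end program one_at_a_time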

The synchronisation commands mark the boundaries of what are known as segments. Within each segment, the compiler has freedom to optimise the code as it sees fit. Across the synchronisation barriers, it must ensure that all memory operations are complete and that data has arrived in any images where it is required. The segment structure means that on any one image, code is executed in the order segment 1, segment 2, segment 3, and any changes made in segment 1 will be visible in segment 2. However, across different images, segment 1 could execute at different times, so programmers need to take great care with how they use coarrays to avoid bugs. The golden rule is that if a variable is defined in a segment, it must not be referenced, defined or become undefined in another segment unless the segments are ordered. That includes segments in different images. Reid concedes that programmers could make mistakes here. “If you want the performance, there generally has to be the downside of the possibility of making errors,” he said.
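
Here is a small sketch of the rule in action, essentially a hand-rolled broadcast: image 1 defines a value in one segment, and the sync all that closes the segment orders it before every other image’s next segment, making the remote reference safe.

    program ordered_segments
      implicit none
      real :: parm[*]
      if (this_image() == 1) parm = 4.0 * atan(1.0)   ! segment 1: defined on image 1 only
      sync all                                ! segment boundary: segment 1 precedes segment 2
      if (this_image() /= 1) parm = parm[1]   ! segment 2: the remote reference is now safe
      print *, 'Image ', this_image(), ' has parm =', parm
    end program ordered_segments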

When an image terminates normally, its data remains active and available to other executing images, but an error condition should terminate all images as soon as possible.

Most of the time, the compiler can optimise code as if the image were running on its own, using temporary storage such as a cache and registers. There is no requirement for coherency while unordered segments are executing and the compiler has scope to optimise communication, amalgamating references to an image into one message, for example.

John Ashby and Reid produced a comparison between MPI and coarrays in 2008 for a program that modelled turbulence (PDF). Because MPI and coarrays can be mixed, they were able to migrate the program gradually, leaving most of the solution-writing and restart facilities in MPI. Most of the program’s time was spent in halo exchanges. The coarray code ran at a similar speed to the MPI code, but a key difference was code clarity: the code for halo exchanges (excluding comments) went from 176 lines to 105 lines, and the code to broadcast global parameters was cut from 230 lines to 117 lines. That should also make the code easier to maintain.
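
The paper’s code isn’t reproduced here, but the flavour of a coarray halo exchange can be sketched in a few lines; this is an illustrative 1-D version with made-up names and sizes, not the turbulence code itself.

    program halo_sketch
      implicit none
      integer, parameter :: n = 100   ! interior points per image (made-up size)
      real    :: u(0:n+1)[*]          ! interior plus one halo cell at each end
      integer :: me
      me = this_image()
      u(1:n) = real(me)               ! stand-in for the real computation on the interior
      sync all                        ! every neighbour's interior is now defined
      if (me > 1)            u(0)   = u(n)[me-1]   ! pull halo from the left neighbour
      if (me < num_images()) u(n+1) = u(1)[me+1]   ! pull halo from the right neighbour
      sync all                        ! halos in place before the next compute segment
    end program halo_sketch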

Reid concluded with a summary of the advantages of coarrays. They’re easier to write because the compiler looks after the communications. The square brackets make references to local and remote data obvious. The more concise code is easier to understand and maintain. Perhaps most importantly, coarrays are officially integrated with Fortran and the compiler can optimise communications and still perform all its local optimisations. There are no severe demands on vendors to provide coherency, either, although you might argue that puts greater demands on programmers not to do something silly.

For more information on what’s new in Fortran 2008, see John Reid’s comprehensive paper (PDF).

Responses


  1. “They should make small demands on implementors.”

    Actually, even if one restricts oneself to “parsing-only” support by limiting the number of images to one, a few thousand lines of code are still needed in the compiler (due to the run-time support for cobounds). Thus, that part is not as easy as it sounds. On the other hand, one has then already done half of the work for full multi-image support, so I expect that (nearly?) all compilers which support coarrays will also provide multi-image support.

    “Numrich persuaded a compiler writer at Cray to add coarrays […], which showed it was easy to add basic support.”

    Well, it only shows that the cost-benefit trade-off was (regarded as) favourable enough to be worthwhile; it does not tell you how much work the compiler writer actually had to do.

    Personally, I find it unclear how much coarrays bring compared to MPI: in terms of performance, probably nothing (roughly: same task to do = same performance), though in terms of syntax (a bit easier) and compile-time type checking I see some advantages. It is also good that all the major open-source and commercial compilers will have support soon. What I especially miss is one-to-all communication (broadcasting values from the input file to all images, rather than looping through all images) and reduction routines; I think one needs them to make coarrays really useful. (They will be part of a to-be-written technical report.) Thus, for larger projects I will probably stay with MPI for a while; for smaller programs, I will try coarrays.

  2. Hi T. Thanks for your comment.

    I think the Cray example suggested that the implementation went outside the normal channels and under the radar of management, so I assumed it was easy enough (small enough) to sneak past them, and that the cost-benefit wasn’t a big part of the discussion. That said, you’re right that everything is relative and implementing coarrays is unlikely to be trivial, and I can see how my blog post appears to understate the work involved.

    It’s interesting to see how you’ll split the use of MPI and coarrays in your projects. As the technical reports expand the scope of coarrays, they might become more attractive for larger projects for more people. I wonder as well whether there might be advantages in coarrays which will prompt people to live without some features they are accustomed to using. It will be interesting to see how the technology evolves and for what kinds of projects people adopt it.

    Thanks again for your comment.

