At today’s BCS event on Fortran in physics, John Reid, convenor of the ISO working group for Fortran, outlined some of the new features coming to Fortran 2008. Fortran 2008 is now in the final ballot stage, which means changes are limited to typographical and layout errors and the standard is otherwise pretty much complete. The biggest addition is support for coarrays, which are a means of parallel programming across multiple cores or processors. “There is a lot going for coarrays,” said Reid. “I hope to convince you of that today.”
Coarrays were created by Bob Numrich from the Minnesota Supercomputing Institute. He had several design objectives for them:
- They should be a simple extension of Fortran and feel like Fortran, not like something different
- They should make small demands on implementors. Earlier in the day, we heard how it took six years for the first Fortran 2003 compiler to emerge, so it was important that coarrays were easy to incorporate. Numrich persuaded a compiler writer at Cray to add coarrays into the Cray compiler covertly 10 years ago, which showed it was easy to add basic support.
- They should retain all the serial optimisations between parallel synchronisations
- They should make it obvious whether references are local or remote
- They should provide scope to optimise communication
A subset of coarrays has been available in the Cray compiler for 10 years. They have been added to the G95 compiler and are being added to gfortran. Intel has said that coarrays are at the top of their development list, so it’s clear they’re going to have a big impact on the Fortran community.
Coarrays work like this: You have a single program multiple data (SPMD) model. The single program is replicated to a number of images (probably as executables). You might have a local machine with 8 cores running 8 images, for example, although it’s possible to have more images than cores, which is why the phrase ‘images’ was coined to cover the potential mismatch between processors and running programs. The number of images is fixed at execution, not at compile time, and each image has its own set of variables.
Coarrays are like ordinary variables, but data held by one image can be accessed from another. Square brackets are used to indicate a coarray being accessed on another image, which helps to ensure it’s obvious whether a local or remote data item is being accessed. The coarray always exists on every image and always has the same set of bounds, so the compiler can (optionally) arrange for the data to occupy the same memory addresses in each image. On a shared memory machine, a coarray might be implemented as a single large array.
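As a minimal sketch of the syntax described above (the program name and values are invented for illustration): the square-bracket codimension in the declaration makes `x` a coarray, plain references are local, and bracketed references are remote.

```fortran
program coarray_demo
  implicit none
  integer :: x[*]        ! a coarray: one copy of x exists on every image
  integer :: me, n

  me = this_image()      ! index of this image, from 1 to num_images()
  n  = num_images()

  x = me * 10            ! no brackets: each image defines its own local copy
  sync all               ! wait until every image has defined x

  if (me == 1) then
     ! square brackets mark a remote reference: read x from the last image
     print *, 'image 1 sees x on image', n, '=', x[n]
  end if
end program coarray_demo
```

The `[*]` in the declaration lets the codimension match however many images exist at execution time, which is why the image count can be fixed at launch rather than at compile time.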
The images usually execute asynchronously, with the user programming synchronisation points where required, through synchronisation with a particular image or with all images. Other control structures include locks and critical constructs that ensure only one image executes a chunk of code.
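The synchronisation forms mentioned above might be combined along these lines (a hedged sketch; the counter variable and the pairing of `sync images` calls are illustrative choices, not taken from the talk):

```fortran
program sync_demo
  implicit none
  integer :: counter[*]

  counter = 0
  sync all                  ! barrier: synchronise with all images

  if (this_image() == 1) then
     sync images (*)        ! image 1 synchronises with every other image
  else
     sync images (1)        ! each other image synchronises with image 1 only
  end if

  critical                  ! only one image at a time executes this block
     counter[1] = counter[1] + 1
  end critical

  sync all
  if (this_image() == 1) print *, 'final count =', counter
end program sync_demo
```

Note that `sync images` calls must pair up: image 1’s `sync images (*)` matches the `sync images (1)` executed on each of the others.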
The synchronisation commands mark the boundaries of what are known as segments. Within each segment, the compiler has freedom to optimise the code as it sees fit. Across the synchronisation barriers, it must ensure that all memory operations are complete and that data has arrived in any images where it is required. The segment structure means that on any image, code is executed in the order segment 1, segment 2, segment 3, and any changes made in segment 1 will be visible to segment 2. However, across different images, segment 1 could execute at different times, so programmers need to take great care with how they use coarrays to avoid bugs. The golden rule is that if a variable is defined in a segment, it must not be referenced, defined or become undefined in another segment unless the segments are ordered. That includes segments in different images. Reid concedes that programmers could make mistakes here. “If you want the performance, there generally has to be the downside of the possibility of making errors,” he said.
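The golden rule can be seen in a small example (program and variable names invented for illustration). The `sync all` splits each image’s execution into two segments, and the remote read is legal only because it sits in a segment ordered after the one containing the definition:

```fortran
program segments_demo
  implicit none
  real :: a[*]

  a = real(this_image())    ! segment 1: each image defines its own a

  sync all                  ! segment boundary: segment 1 is ordered before
                            ! segment 2 on every image

  ! segment 2: safe to read a on another image, because its definition is in
  ! a segment ordered before this one
  if (this_image() == 1 .and. num_images() > 1) then
     print *, 'a on image 2 =', a[2]
  end if
end program segments_demo
```

Reading `a[2]` before the `sync all` would break the rule: the definition on image 2 and the reference on image 1 would sit in unordered segments, and the program would have no defined behaviour.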
When an image terminates normally, its data remains active and available to other executing images, but an error condition should terminate all images as soon as possible.
Most of the time, the compiler can optimise code as if the image were running on its own, using temporary storage such as a cache and registers. There is no requirement for coherency while unordered segments are executing and the compiler has scope to optimise communication, amalgamating references to an image into one message, for example.
John Ashby and Reid produced a comparison between MPI and coarrays in 2008 for a program that modelled turbulence (PDF). Because MPI and coarrays can be mixed, they were able to migrate the program gradually, and they left most of the solution writing and restart facilities in MPI. Most of the program’s time was spent in halo exchanges. The coarray code ran at a similar speed to the MPI code, but a key difference was the code clarity. The code for halo exchanges (excluding comments) went from 176 lines to 105 lines. The code to broadcast global parameters was cut from 230 lines to 117 lines. That should also make the code easier to maintain.
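To give a flavour of why the coarray version is shorter, here is a hedged one-dimensional sketch of a halo exchange (the paper’s code is three-dimensional and far more involved; the array bounds, names and decomposition here are invented for illustration). Each image owns interior cells 1 to 100 plus two halo cells, and simply pulls its neighbours’ edge values with bracketed references instead of pairing explicit sends and receives:

```fortran
subroutine halo_exchange(u)
  implicit none
  real, intent(inout) :: u(0:101)[*]   ! interior cells 1..100 plus two halo cells
  integer :: me, n

  me = this_image()
  n  = num_images()

  sync all                             ! ensure neighbours' interiors are up to date
  if (me > 1) u(0)   = u(100)[me-1]    ! pull right edge of left neighbour
  if (me < n) u(101) = u(1)[me+1]      ! pull left edge of right neighbour
  sync all                             ! halos filled before the next compute step
end subroutine halo_exchange
```

The communication is a pair of assignments rather than matched MPI send/receive calls with buffers, tags and request handles, which is where much of the line-count saving reported above comes from.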
Reid concluded with a summary of the advantages of coarrays. They’re easier to write because the compiler looks after the communications. The square brackets make references to local and remote data obvious. The more concise code is easier to understand and maintain. Perhaps most importantly, coarrays are officially integrated with Fortran and the compiler can optimise communications and still perform all its local optimisations. There are no severe demands on vendors to provide coherency, either, although you might argue that puts greater demands on programmers not to do something silly.
For more information on what’s new in Fortran 2008, see John Reid’s comprehensive paper (PDF).