Figure 2 compares the execution times of the different collective I/O implementations as predicted by the simulator with a simple disk model. Although node grouping has been reported to yield up to an 8-fold improvement [NF95], it performs very poorly for the NAS Benchmark. This is mostly because this particular problem size of the NAS Benchmark does not generate enough I/O requests to flood the interconnection network. Consequently, node grouping, which does not allow all the LPs to issue their I/O requests simultaneously as Global Barrier Collective I/O does, simply delays the execution.
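The trade-off can be illustrated with a deliberately crude toy model. This is a hypothetical sketch, not the simulator's disk model or the node-grouping scheme of [NF95]. It assumes that a batch of simultaneous requests costs a fixed base latency plus a linear contention penalty for every request beyond an assumed network capacity. The names `batch_time`, `global_barrier_time`, `node_grouping_time`, `capacity`, `base`, and `contention` are all illustrative. Under these assumptions, grouping wins only when the total request count exceeds the capacity. Otherwise it serializes batches that the network could have absorbed at once.

```python
def batch_time(requests: int, capacity: int, base: float, contention: float) -> float:
    """Time to serve `requests` issued at once (hypothetical linear-contention model).

    Requests up to `capacity` cost only the base latency; each request beyond
    `capacity` adds `contention` time units (an assumption, for illustration only).
    """
    overflow = max(0, requests - capacity)
    return base + contention * overflow


def global_barrier_time(n_lps: int, capacity: int, base: float, contention: float) -> float:
    # All LPs issue their I/O requests simultaneously after a single barrier.
    return batch_time(n_lps, capacity, base, contention)


def node_grouping_time(n_lps: int, groups: int, capacity: int,
                       base: float, contention: float) -> float:
    # LPs are split into `groups` near-equal groups that take turns;
    # each group's batch starts only after the previous one finishes.
    size, extra = divmod(n_lps, groups)
    total = 0.0
    for g in range(groups):
        members = size + (1 if g < extra else 0)
        if members:
            total += batch_time(members, capacity, base, contention)
    return total


if __name__ == "__main__":
    params = dict(capacity=64, base=1.0, contention=0.5)
    # Light load (network not flooded): grouping only adds sequential rounds.
    print(global_barrier_time(16, **params), node_grouping_time(16, 4, **params))
    # Heavy load (network flooded): grouping avoids the contention penalty.
    print(global_barrier_time(512, **params), node_grouping_time(512, 8, **params))
```

With these made-up parameters, the light-load case gives 1.0 for the global barrier versus 4.0 for grouping. The heavy-load case gives 225.0 versus 8.0. This mirrors qualitatively the argument above, but none of these numbers come from the paper's experiments.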
Experiments with collective I/O for matrix multiplication are in progress, and their results will be included in the final version of this paper.