Parallel file systems offer the high performance necessary for
parallel scientific applications. To provide maximum scalability, the
machine's processors are divided into two separate groups: compute
nodes (cnodes) and I/O nodes (ionodes). Cnodes run the user
applications and send I/O requests to the ionodes. All disk access is
performed by the ionodes, each of which has its own secondary storage
device (SSD) and manages a subset of the file system's total data.
Because the data are spread across many ionodes, the parallelism of the
user application can be preserved when performing I/O.
Many parallel file systems [CFP95, CF96] even allow user
processes to configure the underlying I/O parallelism to match their
intended access pattern, further enhancing performance.
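To make the division of data among ionodes concrete, the following Python sketch maps a file offset to the ionode that stores it under simple round-robin striping. The striping scheme, the class `StripeLayout`, and its parameters `stripe_unit` and `num_ionodes` are illustrative assumptions, not the layout of any particular system cited above; they stand in for the kind of per-file I/O parallelism a user process might configure.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical per-file layout: fixed-size stripe units dealt round-robin to ionodes."""
    stripe_unit: int   # bytes per stripe unit (assumed user-configurable per file)
    num_ionodes: int   # number of ionodes the file is spread over (assumed user-configurable)

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a global file offset to (ionode index, byte offset within that ionode's share)."""
        unit = offset // self.stripe_unit          # which stripe unit holds the byte
        ionode = unit % self.num_ionodes           # units are dealt out round-robin
        local_unit = unit // self.num_ionodes      # position of that unit on its ionode
        return ionode, local_unit * self.stripe_unit + offset % self.stripe_unit

    def split_request(self, offset: int, length: int) -> dict[int, int]:
        """Bytes each ionode must supply for the contiguous range [offset, offset + length)."""
        per_ionode: dict[int, int] = {}
        end = offset + length
        while offset < end:
            ionode, _ = self.locate(offset)
            # Take bytes up to the end of the current stripe unit or of the request.
            chunk = min(self.stripe_unit - offset % self.stripe_unit, end - offset)
            per_ionode[ionode] = per_ionode.get(ionode, 0) + chunk
            offset += chunk
        return per_ionode
```

For example, `StripeLayout(65536, 4).split_request(0, 262144)` spreads a 256 KB read evenly over all four ionodes, so their SSDs can service it concurrently. A smaller `stripe_unit` spreads even small requests across more ionodes, at the cost of more messages per request.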
When properly tuned, file system caching can drastically reduce the number of I/O requests that must wait to be serviced by the SSD. Unfortunately, choosing the cache configuration that yields peak performance is no mean feat. This is particularly true of parallel file systems, which offer a wide variety of techniques for managing the many caches spread across the ionodes and cnodes. One technique with much potential is cooperative caching. Presented in [DWAP94] as a set of high-performance caching algorithms for use within a network file system, cooperative caching attempts to improve performance through better management of multiple client and server caches. Its key capabilities are retrieving data from remote client caches and globally controlling a portion of the client caches. Given the low latency of a parallel machine's interconnection network, cooperative caching may be even better suited to a parallel file system than to a network file system.
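The read path of a cooperative cache can be sketched as a lookup that tries progressively more expensive sources: the client's own cache, another client's cache, the ionode (server) cache, and finally the SSD. The Python below is a minimal, single-process sketch under simplifying assumptions: all names (`LRUCache`, `CooperativeCache`, the `directory` map) are hypothetical, every cache uses LRU replacement, and the global directory of which client holds which block is kept exactly up to date. It omits the policies [DWAP94] uses to globally manage a portion of the client caches, so it illustrates only remote-client hits.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity block cache with least-recently-used replacement (capacity >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block id -> data, least recently used first

    def get(self, block):
        """Return the block's data (marking it most recently used), or None on a miss."""
        if block not in self.blocks:
            return None
        self.blocks.move_to_end(block)
        return self.blocks[block]

    def put(self, block, data):
        """Insert a block; return the id of the evicted block, or None."""
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            evicted, _ = self.blocks.popitem(last=False)
            return evicted
        return None


class CooperativeCache:
    """Hypothetical model: client (cnode) caches that can serve each other's misses."""

    def __init__(self, num_clients, client_capacity, server_capacity):
        self.clients = [LRUCache(client_capacity) for _ in range(num_clients)]
        self.server = LRUCache(server_capacity)  # ionode cache in front of the SSD
        self.directory = {}                      # block -> set of clients caching it
        self.stats = {"local": 0, "remote": 0, "server": 0, "disk": 0}

    def _install(self, client, block, data):
        """Cache a block at a client and keep the global directory consistent."""
        evicted = self.clients[client].put(block, data)
        self.directory.setdefault(block, set()).add(client)
        if evicted is not None:
            holders = self.directory[evicted]
            holders.discard(client)
            if not holders:
                del self.directory[evicted]

    def read(self, client, block):
        """Read a block for a client; return which level satisfied the request."""
        data = self.clients[client].get(block)
        if data is not None:
            source = "local"
        else:
            peers = self.directory.get(block, set()) - {client}
            if peers:
                # Fetch from another client's memory instead of going to the ionode.
                data = self.clients[min(peers)].get(block)
                source = "remote"
            else:
                data = self.server.get(block)
                if data is not None:
                    source = "server"
                else:
                    data = f"block-{block}".encode()  # stand-in for an SSD read
                    self.server.put(block, data)
                    source = "disk"
            self._install(client, block, data)
        self.stats[source] += 1
        return source
```

For instance, with `CooperativeCache(num_clients=2, client_capacity=2, server_capacity=1)`, the reads `(0, 10)`, `(1, 11)`, `(1, 12)` all go to the SSD, but a following read of block 10 by client 1 is served from client 0's cache, even though neither client 1 nor the single-block server cache still holds it.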
For users interested in maximum I/O performance, pairing a high-performance parallel file system with a language-level parallel I/O library, such as MPI-IO, is a natural choice. MPI is a standard message-passing library that provides a number of point-to-point and collective communication primitives [For93]. MPI-IO is a proposed extension to MPI that incorporates parallel I/O constructs [Com96]. The proposed constructs include independent and collective I/O operations; asynchronous I/O calls; file access via independent file pointers, shared file pointers, and explicit offsets; and local and distributed datatype constructors.
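Collective operations matter because a parallel application often has each process access many small, interleaved pieces of a file, a pattern that explicit offsets or a distributed datatype can describe. The Python sketch below shows only the request-merging idea that collective I/O implementations can exploit: once all processes' requests are known together, adjacent pieces can be coalesced into a few large contiguous requests. The function names are hypothetical, no MPI-IO interface is used, and a real implementation must also redistribute the data among processes.

```python
from __future__ import annotations


def independent_requests(per_rank: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Independent I/O: every (offset, length) piece from every rank is its own request."""
    return [req for reqs in per_rank for req in reqs]


def collective_requests(per_rank: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Simplified collective I/O: pool all ranks' pieces, sort by offset, and
    coalesce pieces that touch or overlap into larger contiguous requests."""
    merged: list[tuple[int, int]] = []
    for off, length in sorted(independent_requests(per_rank)):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            start, old_len = merged[-1]
            merged[-1] = (start, max(old_len, off + length - start))
        else:
            merged.append((off, length))
    return merged


if __name__ == "__main__":
    # Four ranks, each owning every fourth 100-byte record (an interleaved pattern).
    per_rank = [[((i * 4 + r) * 100, 100) for i in range(4)] for r in range(4)]
    print(len(independent_requests(per_rank)))  # 16
    print(collective_requests(per_rank))        # [(0, 1600)]
```

With four processes each writing four interleaved 100-byte records, independent access issues 16 separate requests, while the merged collective view is a single 1600-byte request, which both the ionode caches and the SSDs can handle far more efficiently.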
The goal of our simulator is to provide a flexible environment for studying the performance, interaction, and design tradeoffs of these two central capabilities for high-performance I/O: cooperative caching in a parallel file system and MPI-IO. In addition, support for parallel execution of the simulator will considerably reduce the turnaround time of detailed simulation models for complex system configurations and workloads.