UCLA Parallel Computing Laboratory

SESAME News before 1997


Parallel Program Simulator for Data Parallel Programs

We have developed a parallel simulator for data parallel programs (DPSIM). Existing simulators for parallel programs are either sequential, or parallel but architecture-specific. This simulator is fast because it uses parallel simulation, and portable because its simulation techniques are architecture-independent. The simulator has been validated on the IBM-SP2 using scientific applications, e.g., Gauss-Jordan elimination. It shows promising speedups as the number of processors used in the simulation is increased (a speedup of up to 11 on 16 processors). The simulator has been used to predict the effect of software and hardware factors, e.g., processor speed and message communication latency, on the performance of programs such as Gauss-Jordan elimination, FFT, and matrix multiplication. By varying these factors, it is possible to pinpoint performance bottlenecks, which may lie in the system software, the hardware, or the application itself.

The graphs below display the performance predicted by the simulator for the Gauss-Jordan linear equation solver on an IBM-SP multicomputer with 16 and 64 processors. Results are predicted for a variety of message communication latencies and processor speeds. x=0 is the processor speed of the IBM-SP2 (370), x=1 is a processor with half that speed, and so on. Similarly, y=0 is the communication latency of the IBM-SP2 using the User Space (US) protocol over the High Performance Switch, y=4 is a latency 16 times greater (comparable to Ethernet using TCP/IP), and y=-4 denotes zero communication latency. For more details on DPSIM, see this paper (to appear in LCPC95).
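The axis convention for the graphs can be read as a power-of-two scaling of the IBM-SP2 baseline. The sketch below is only an illustration of that reading: the function names are hypothetical, and the behavior between the stated points (x=0, x=1, y=0, y=4) is an assumption inferred from "half the speed" and "16 times greater". The y=-4 case is handled separately because the text defines it as zero latency. The last helper computes parallel efficiency for the reported speedup of 11 on 16 processors.

```python
def speed_factor(x: float) -> float:
    """Processor speed relative to the IBM-SP2 baseline (1.0 at x = 0).

    Assumption: each unit step on the x axis halves the speed.
    """
    return 2.0 ** (-x)


def latency_factor(y: float) -> float:
    """Communication latency relative to the baseline (1.0 at y = 0).

    Assumption: each unit step on the y axis doubles the latency.
    The text defines y = -4 as zero latency, so it is a special case.
    """
    if y == -4:
        return 0.0
    return 2.0 ** y


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


if __name__ == "__main__":
    print(speed_factor(1))              # half the baseline speed
    print(latency_factor(4))            # 16 times the baseline latency
    print(parallel_efficiency(11, 16))  # efficiency of the simulator itself
```

Under this reading, the reported speedup corresponds to the simulator using roughly 69% of the ideal linear speedup on 16 processors.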


Network Models

Interconnection network models have been developed for MPP systems, for networks of workstations (NOWs), and for networks of MPP systems. In particular, parallel models of a high-speed electronic network based on wormhole routing have been developed. The simulator can model a variety of deadlock-free routing protocols for packet-switched traffic. Integration of this simulator into DPSIM is in progress. This integration will allow us to directly compare the performance of parallel programs on MPP systems and NOWs built using current technology, and to identify the impact of future trends.
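The text does not say which routing protocols the network simulator models. As a point of reference only, the sketch below shows dimension-order (XY) routing on a 2D mesh, a well-known deadlock-free scheme for wormhole-routed networks: a packet corrects its X coordinate completely before moving in Y, so the cyclic channel dependencies that cause deadlock cannot form. The mesh topology and the function name are assumptions made for this illustration, not details of the simulator.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in a 2D mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the routers visited from src to dst under XY routing.

    The packet first travels along the X dimension until its X
    coordinate matches the destination, then along the Y dimension.
    """
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    for hop in xy_route((0, 0), (2, 1)):
        print(hop)
```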


Repository Design

Initial design of the repository has begun. The general structure is a layered set of "libraries" that will contain the results of our work. The particular structure defines a data library, a tools library, and a models library. On top of these will sit a graphical user interface and an access engine that takes full advantage of existing web interfaces.


Instrumentation

We have completed work on collecting kernel profiling data (using gprof) for each major subsystem of the Paragon OS. The results indicate that the approach used by gprof would not be useful as a general instrumentation tool. Its two major drawbacks are that memory resources are allocated at compile time and never freed, and that gprof is either on or off for the entire system. Finer controls over the level of instrumentation are needed to limit the impact of the instrumentation on the system. Some of the changes made to the Paragon OS for gprof are being carried forward for use in the implementation of the instrumentation interfaces on the Paragon.
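To make the two drawbacks concrete, here is a minimal sketch of what finer-grained control could look like: instrumentation is switched per subsystem at run time, and a subsystem's event buffer is released when its instrumentation is turned off. This is not the Paragon implementation; the class, the level names, and the subsystem names ("vm", "ipc") are all hypothetical.

```python
from typing import Dict, List


class Instrumentation:
    """Per-subsystem instrumentation with run-time level control."""

    OFF, SUMMARY, DETAIL = 0, 1, 2

    def __init__(self) -> None:
        self._level: Dict[str, int] = {}
        self._events: Dict[str, List[str]] = {}

    def set_level(self, subsystem: str, level: int) -> None:
        """Change one subsystem's level without touching the others."""
        self._level[subsystem] = level
        if level == self.OFF:
            # Release the buffer instead of holding it forever.
            self._events.pop(subsystem, None)
        else:
            self._events.setdefault(subsystem, [])

    def record(self, subsystem: str, event: str, level: int = 1) -> None:
        """Store the event only if the subsystem is instrumented at
        this level of detail or higher."""
        current = self._level.get(subsystem, self.OFF)
        if current == self.OFF or current < level:
            return
        self._events[subsystem].append(event)

    def events(self, subsystem: str) -> List[str]:
        """Return a copy of the events recorded for a subsystem."""
        return list(self._events.get(subsystem, []))


if __name__ == "__main__":
    inst = Instrumentation()
    inst.set_level("vm", Instrumentation.SUMMARY)
    inst.record("vm", "page_fault")
    inst.record("vm", "tlb_miss", level=Instrumentation.DETAIL)  # dropped
    inst.record("ipc", "send")  # dropped: "ipc" is not instrumented
    print(inst.events("vm"))
```

Raising a single subsystem from SUMMARY to DETAIL while leaving the rest off is one way to narrow down a bottleneck without paying the instrumentation cost system-wide.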


Instrumentation Interfaces

The design of the instrumentation interfaces and framework system for the Intel Paragon is complete. This design includes the methodology for using various levels of detail to interactively narrow down operating system performance bottlenecks. Implementation of these interfaces and framework on the Intel Paragon OS is complete.


WWW comments to: prakash@cs.ucla.edu