Abstract

High Performance Computing (HPC) systems tend to be difficult to debug and analyze because of the large number of processes they involve and the way these processes communicate with each other to perform specific tasks. Recently, the number of tools that help software engineers analyze the behavior of HPC applications has grown. These tools provide several features that facilitate the understanding and analysis of the information contained in inter-process communication traces generated by running an HPC application. However, they use different formats to represent traces, which hinders interoperability and data sharing. In this paper, we address this problem by proposing an exchange format called MTF (MPI Trace Format) for representing and exchanging traces generated by HPC applications based on the MPI (Message Passing Interface) standard, the de facto standard for inter-process communication in high performance computing systems. The design of MTF is validated against well-known requirements for a standard exchange format, with the objective of moving towards a standard way of representing MPI traces and thereby enabling better synergy among tools. We have also developed an MTF toolkit that supports the generation of MTF traces and includes a query engine to facilitate the retrieval of data from MTF traces. Finally, we show how MTF can represent a large trace generated using a commercial off-the-shelf MPI trace analysis tool.

  • Publication date: 2011-4