Abstract

Non-contiguous partitioning strategies are often used to select and assign a set of nodes of a parallel computer to a particular job. Their main advantage over contiguous strategies is reduced system fragmentation. Without contiguity, however, locality in communications cannot be easily exploited, which results in longer job execution times. Several metrics have been proposed in the literature to assess how well suited a partition is to running an application. These metrics are computed from the dispersion of the partition. In this paper we demonstrate that metrics based solely on dispersion are not always valid. Using simulation, we show that, for some applications, the dispersion-based metrics of a partition do not correlate with the execution times of applications running on it. We define new metrics that consider not only partition-related properties but also the application's communication patterns and the path diversity available to communicating tasks. We evaluate these metrics in 2D and 3D meshes, using the NAS Parallel Benchmarks suite of applications as the test workload. A simulation-based study was carried out with a large set of partitions. Results show that metrics that include information about the traffic patterns of applications have consistently strong, positive correlations with execution times.

  • Publication date: 2014-5