Compact network reconfiguration in fat-trees

Authors: Zahid Feroz*; Gran Ernst Gunnar; Bogdanski Bartosz; Johnsen Bjorn Dag; Skeie Tor; Tasoulas Evangelos
Source: Journal of Supercomputing, 2016, 72(12): 4438-4467.
DOI: 10.1007/s11227-016-1759-y

Abstract

In large high-performance computing systems, the probability of component failure is high. At the same time, to sustain system performance, reconfiguration is often needed to ensure high utilization of available resources. Reconfiguration in interconnection networks, such as InfiniBand (IB), typically involves computing and distributing a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly as it potentially involves a large number of source-destination pairs. In this paper, we propose a novel routing algorithm for IB-based fat-tree topologies, SlimUpdate. SlimUpdate employs path preservation techniques to reduce the total number of path modifications by up to 80%, compared to OpenSM's fat-tree routing algorithm, in most reconfiguration scenarios. Furthermore, we present a metabase-aided re-routing method for fat-trees, based on destination leaf-switch multipathing. Our proposed method significantly reduces network reconfiguration overhead, while providing greater routing flexibility. On successive runs, our proposed method saves up to 85% of the total routing time compared to the traditional re-routing scheme. Based on the metabase-aided routing, we also present a modified SlimUpdate routing algorithm that dynamically optimizes routes for a given MPI node order.

  • Publication date: 2016-12