摘要
Current supercomputing centers usually deploy a large-scale compute system together with an associated data analysis or visualization system. Multiple scenarios have driven the demand that some associated jobs co-execute on different machines. We propose a multi-domain coscheduling mechanism, providing the ability to coordinate execution between jobs on multiple resource management domains without manual intervention. We have evaluated our mechanism based on real job traces from Intrepid and Eureka, the production Blue Gene/P system and a cluster with the largest GPU installation, deployed at Argonne National Laboratory. The experimental results show that coscheduling can be achieved with limited impact on system performance under varying workloads.
- 出版日期2013-2