Operating Systems · Institute of Systems Architecture · Faculty of Computer Science · TU Dresden



5 July 2010

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Cluster Computing


Huizhan Yi

National University of Defense Technology, Changsha (China)


GPU-accelerated heterogeneous systems offer clear advantages in power efficiency and price/performance ratio, but using GPUs to deliver high sustained performance remains difficult, especially at petascale. Unbalanced workloads between CPUs and GPUs and low-bandwidth CPU-GPU communication are the two main obstacles. This paper describes our experience in overcoming these two obstacles by developing an implementation of Linpack for TianHe-1, a petascale CPU/GPU system and the largest GPU-accelerated system attempted to date. Workloads are distributed adaptively across the CPU cores and GPUs with negligible runtime overhead, resulting in better load balancing than static partitioning methods (even those driven by profiling or training, which consumes too much power to be practical in the petascale setting). The CPU-GPU communication overhead is effectively hidden by a software pipelining technique, which is particularly useful for large memory-bound applications. Our adaptive optimization framework achieves a Linpack performance of 0.563 PFLOPS, making TianHe-1 the 5th fastest supercomputer in the latest Top500 list and the fastest in the Asia-Pacific region.
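
The idea of adaptive CPU/GPU work distribution can be illustrated with a minimal sketch. This is not the method used for TianHe-1, whose details the abstract does not give; it is one simple feedback rule, assuming the split is recomputed after each step from the throughput measured in the previous step. The names `update_gpu_fraction` and `simulate`, and the device speeds, are hypothetical and chosen only for illustration.

```python
def update_gpu_fraction(gpu_work, gpu_time, cpu_work, cpu_time):
    """Return the GPU share of work proportional to measured throughput.

    Assumes both devices received a nonzero amount of work in the
    previous step, so that both measured times are positive.
    """
    gpu_rate = gpu_work / gpu_time  # work units per second on the GPU
    cpu_rate = cpu_work / cpu_time  # work units per second on the CPU cores
    return gpu_rate / (gpu_rate + cpu_rate)


def simulate(total_work=1000.0, gpu_speed=8.0, cpu_speed=2.0,
             steps=5, fraction=0.5):
    """Simulate repeated steps with hypothetical constant device speeds.

    Returns a list of (gpu_fraction, step_time) pairs, where step_time is
    the time of the slower device, since both must finish each step.
    """
    history = []
    for _ in range(steps):
        gpu_work = fraction * total_work
        cpu_work = total_work - gpu_work
        # In a real system these times would be measured, not computed.
        gpu_time = gpu_work / gpu_speed
        cpu_time = cpu_work / cpu_speed
        history.append((fraction, max(gpu_time, cpu_time)))
        fraction = update_gpu_fraction(gpu_work, gpu_time, cpu_work, cpu_time)
    return history


if __name__ == "__main__":
    for gpu_fraction, step_time in simulate():
        print(f"GPU share {gpu_fraction:.2f}, step time {step_time:.1f}")
```

With constant simulated speeds, the split settles after one step at the ratio where both devices finish at the same time. On real hardware the measured rates vary with problem size and load, which is why the split would be re-estimated continuously rather than fixed by a one-off profiling run.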