|
05.07.2010
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Cluster Computing
Huizhan Yi
National University of Defense Technology, Changsha (China)
Heterogeneous systems accelerated by GPUs offer clear advantages in
power efficiency and price/performance ratio, but using GPUs to
deliver high performance remains difficult, especially on a
petascale system. Unbalanced workloads between CPUs and GPUs and
low-bandwidth CPU-GPU communication are the two main obstacles to
achieving high performance. This paper describes our experience in
overcoming these two obstacles by developing an implementation of
Linpack for TianHe-1, a petascale CPU/GPU system and the largest
GPU-accelerated system attempted to date. Workloads are
distributed adaptively to the CPU cores and GPUs with negligible
runtime overhead, resulting in better load balancing than static
partitioning methods (even if they are driven by profiling or
training, which consumes too much power to be practical in the
petascale setting). The CPU-GPU communication overhead is
effectively hidden by a software pipelining technique, which is
particularly useful for large memory-bound applications. Our
adaptive optimization framework results in a Linpack performance of
0.563 PFLOPS, making TianHe-1 the 5th fastest supercomputer in the
latest Top500 list and the fastest in the Asia-Pacific region.
|
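The two ideas in the abstract, adaptive CPU/GPU work splitting and overlapping transfers with computation, can be illustrated with a minimal sketch. It is not the TianHe-1 implementation. The function names, the clamping bounds, and the uniform per-chunk timings are all assumptions made for illustration: `rebalance` picks the next GPU share from the timings measured in the previous step, and `pipelined_seconds` is an idealized two-stage model in which the transfer of one chunk overlaps the computation of the previous one.

```python
def rebalance(gpu_fraction, gpu_seconds, cpu_seconds, lo=0.05, hi=0.95):
    """Return the GPU share of the next step from the last step's timings.

    Hypothetical helper: the share is chosen so that, at the observed
    rates, the GPU and the CPU cores would finish at the same time.
    """
    gpu_rate = gpu_fraction / gpu_seconds          # work share per second, GPU
    cpu_rate = (1.0 - gpu_fraction) / cpu_seconds  # work share per second, CPU
    target = gpu_rate / (gpu_rate + cpu_rate)
    # Clamp (assumed bounds) so neither side is ever starved completely.
    return min(hi, max(lo, target))


def pipelined_seconds(n_chunks, copy_seconds, compute_seconds):
    """Idealized time when copying chunk i+1 overlaps computing chunk i."""
    if n_chunks == 0:
        return 0.0
    # The first copy and the last compute cannot be overlapped; in between,
    # each step costs as much as the slower of the two stages.
    slower = max(copy_seconds, compute_seconds)
    return copy_seconds + (n_chunks - 1) * slower + compute_seconds


if __name__ == "__main__":
    # Assumed speeds: the GPU is three times faster than the CPU cores.
    share = 0.5
    share = rebalance(share, gpu_seconds=share / 3.0, cpu_seconds=(1 - share) / 1.0)
    print(f"next GPU share: {share:.2f}")

    serial = 4 * (1.0 + 3.0)
    print(f"serial: {serial:.1f} s, pipelined: {pipelined_seconds(4, 1.0, 3.0):.1f} s")
```

Because the split is recomputed from timings that each step produces anyway, no separate profiling or training run is needed, which matches the low-overhead property the abstract claims. In the pipeline model the transfer cost disappears from all but the first chunk whenever a transfer takes no longer than the computation it overlaps.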