Doctor of Philosophy (PhD)
Nowadays, High-performance Computing (HPC) scientific applications often face per- formance challenges when running on heterogeneous supercomputers, so do scalability, portability, and efficiency issues. For years, supercomputer architectures have been rapidly changing and becoming more complex, and this challenge will become even more com- plicated as we enter the exascale era, where computers will exceed one quintillion cal- culations per second. Software adaption and optimization are needed to address these challenges. Asynchronous many-task (AMT) systems show promise against the exascale challenge as they combine advantages of multi-core architectures with light-weight threads, asynchronous executions, smart scheduling, and portability across diverse architectures.
In this research, we optimize the performance of a highly scalable scientific application using HPX, an AMT runtime system, and address its performance bottlenecks on super- computers. We use DCA++ (Dynamical Cluster Approximation) as a research vehicle for studying the performance bottlenecks in parallel and concurrent applications. DCA++ is a high-performance research software application that provides a modern C++ imple- mentation to solve quantum many-body problems with a Quantum Monte Carlo (QMC) kernel. QMC solver applications are widely used and are mission-critical across the US Department of Energy’s (DOE’s) application landscape.
Throughout the research, we implement several optimization techniques. Firstly, we add HPX threading backend support to DCA++ and achieve significant performance speedup. Secondly, we solve a memory-bound challenge in DCA++ and develop ring- based communication algorithms using GPU RDMA technology that allow much larger scientific simulation cases. Thirdly, we explore a methodology for using LLVM-based tools to tune the DCA++ that targets the new ARM A64Fx processor. We profile all imple- mentations in-depth and observe significant performance improvement throughout all the implementations.
Wei, Weile, "Optimizing the Performance of Parallel and Concurrent Applications Based on Asynchronous Many-Task Runtimes" (2022). LSU Doctoral Dissertations. 5884.