The present study performs direct numerical simulations of turbulent channel flows using a spectral method in a large computational domain. Because the spectral method relies on Fourier discretisation, its parallelisation may incur heavy communication overhead and therefore scale poorly. We design and improve the spectral code by exploring parallel techniques, including domain decomposition and data transposition algorithms. We focus particularly on 2D domain decomposition combined with a data transpose algorithm based on non-blocking collective operations, which improves parallel performance by overlapping computation and communication and thereby mitigating latency. Finally, we evaluate the code on the Nurion supercomputer at the KISTI supercomputing centre. The transpose algorithm based on non-blocking collective operations shows the best performance: on 256 nodes with 16,384 MPI ranks, it runs the L550 case of (Formula presented.) grid points 3.55 times faster than the non-optimised 2D decomposition.
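The data transposition mentioned in the abstract can be illustrated with a small serial sketch. It is not the authors' code: it assumes a common pencil convention in which an x-pencil holds the full x range and a block of y, while a y-pencil holds the full y range and a block of x, and it imitates the all-to-all exchange among the ranks that share one z-slab using plain Python dictionaries. The function names (`split_extent`, `build_x_pencils`, `x_to_y_transpose`) and the grid sizes are hypothetical. A real MPI code would perform this exchange with a collective operation, and the overlap of computation and communication obtained from non-blocking collectives is not modelled here.

```python
def split_extent(n, parts, index):
    """Return (start, size) of block `index` when n points are split
    into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    size = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return start, size


def build_x_pencils(nx, ny, nz_local, p_row):
    """Build x-pencils for the p_row ranks sharing one z-slab (assumed layout).
    Each rank holds all of x and its own block of y; values encode the
    global index so that data movement can be checked afterwards."""
    pencils = []
    for rank in range(p_row):
        y0, yn = split_extent(ny, p_row, rank)
        pencils.append({
            (i, j, k): i + nx * (j + ny * k)
            for i in range(nx)
            for j in range(y0, y0 + yn)
            for k in range(nz_local)
        })
    return pencils


def x_to_y_transpose(x_pencils, nx, p_row):
    """Serial imitation of an all-to-all: every sender passes to every
    receiver the part of its data lying in the receiver's x block."""
    y_pencils = [{} for _ in range(p_row)]
    for local in x_pencils:                      # loop over senders
        for receiver in range(p_row):
            x0, xn = split_extent(nx, p_row, receiver)
            message = {idx: v for idx, v in local.items()
                       if x0 <= idx[0] < x0 + xn}
            y_pencils[receiver].update(message)
    return y_pencils


# Illustrative sizes only; they are not taken from the study.
x_pencils = build_x_pencils(nx=8, ny=6, nz_local=2, p_row=4)
y_pencils = x_to_y_transpose(x_pencils, nx=8, p_row=4)
for rank, pencil in enumerate(y_pencils):
    xs = sorted({i for (i, _, _) in pencil})
    ys = sorted({j for (_, j, _) in pencil})
    print(f"rank {rank}: x indices {xs}, y indices {ys}")
```

After the exchange each rank owns every y index for its own block of x, which is the layout a one-dimensional transform along y would need. In this sketch the whole exchange completes before any further work starts; a non-blocking variant would begin the exchange, continue with independent computation, and wait for completion only when the transposed data is required.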
|Number of pages|14|
|Journal|International Journal of Computational Fluid Dynamics|
|Publication status|Published - 2020 Sep 13|
Bibliographical note
Funding Information:
This work was supported by the KISTI National Supercomputing Center (Nurion Supercomputer) and the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (NRF-20151009350).
All Science Journal Classification (ASJC) codes
- Computational Mechanics
- Aerospace Engineering
- Condensed Matter Physics
- Energy Engineering and Power Technology
- Mechanics of Materials
- Mechanical Engineering