Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy that exposes architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. Experimental data collected using a set of 21 multi-threaded applications reveal that, on average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These reductions translate to execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.
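To make the idea of a distance-aware computation-to-core mapping concrete, the following is a minimal sketch and not the scheme from the paper. It assumes a 6×6 mesh in which the hop count between two nodes is their Manhattan distance, and it uses a simple greedy placement. The names `hops`, `map_computations`, and the `access_counts` input format are hypothetical and chosen only for illustration.

```python
from itertools import product

MESH = 6  # 6x6 mesh, matching the system size quoted in the abstract


def hops(a, b):
    """Manhattan distance between two mesh nodes (assumed hop-count model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_computations(access_counts, mesh=MESH):
    """Greedily place each computation on the free core with the lowest
    access-weighted hop count to the nodes it touches.

    access_counts: {computation_id: {(row, col) of an LLC bank or MC: accesses}}
    Assumes there are at most mesh * mesh computations (one per core).
    """
    free = set(product(range(mesh), repeat=2))
    mapping = {}
    # Place the computations with the most accesses first.
    order = sorted(access_counts, key=lambda c: -sum(access_counts[c].values()))
    for comp in order:
        targets = access_counts[comp]
        # sorted(free) makes tie-breaking deterministic (row-major order).
        best = min(
            sorted(free),
            key=lambda core: sum(n * hops(core, t) for t, n in targets.items()),
        )
        mapping[comp] = best
        free.remove(best)
    return mapping
```

A greedy heuristic is used here only because it is short; it minimizes weighted hop count per computation and makes no claim about matching the compiler analysis or the results reported above.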
|Title of host publication||PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation|
|Editors||Jeffrey S. Foster, Dan Grossman|
|Publisher||Association for Computing Machinery|
|Number of pages||16|
|Publication status||Published - 2018 Jun 11|
|Event||39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018 - Philadelphia, United States|
Duration: 2018 Jun 18 → 2018 Jun 22
|Name||Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)|
Bibliographical note
Funding Information:
We thank Kathryn S. McKinley for shepherding our paper. We also thank the anonymous reviewers for their constructive feedback. Jung is supported in part by NRF 2016R1C1B2015312, DOE DE-AC02-05CH11231, IITP-2017-2017-0-01015, NRF 2015M3C4A7065645, and MemRay grant (2015-11-1731). This research is supported in part by NSF grants #1439021, #1629915, #1626251, #1409095, #1629129, #1526750, #1439057, and a grant from Intel.
© 2018 Association for Computing Machinery.