Enhancing computation-to-core assignment with physical location information

Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, Myoungsoo Jung

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.
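The mapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a 2D mesh where latency grows with Manhattan hop count, and it uses a simple greedy heuristic. The names `hops`, `traffic_cost`, `best_core`, and `greedy_map`, as well as the access counts in the example, are hypothetical and chosen only for illustration.

```python
from itertools import product

MESH = 6  # 6x6 mesh, matching the system size quoted in the abstract


def hops(a, b):
    """Manhattan distance between two (row, col) mesh nodes (assumed cost model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic_cost(core, accesses):
    """Sum of access_count * hop distance from `core` to each accessed node.

    `accesses` maps the (row, col) node hosting an LLC bank or MC
    to the number of accesses a computation sends to it.
    """
    return sum(count * hops(core, node) for node, count in accesses.items())


def best_core(accesses, free_cores):
    """Pick the free core with the lowest estimated cost; ties break by coordinate."""
    return min(free_cores, key=lambda c: (traffic_cost(c, accesses), c))


def greedy_map(computations):
    """Assign each computation to a distinct core, heaviest computation first."""
    free = set(product(range(MESH), range(MESH)))
    order = sorted(computations, key=lambda n: -sum(computations[n].values()))
    mapping = {}
    for name in order:
        core = best_core(computations[name], free)
        mapping[name] = core
        free.remove(core)
    return mapping


if __name__ == "__main__":
    # Hypothetical access profiles: node coordinates -> access counts.
    demo = {
        "loop_a": {(0, 0): 10, (0, 5): 2},
        "loop_b": {(0, 0): 8},
    }
    print(greedy_map(demo))
```

A compiler with access to the chip's physical layout could estimate such per-computation access profiles statically and feed them to a placement step of this kind; the actual scheme in the paper may differ substantially.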

Original language: English
Title of host publication: PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation
Editors: Jeffrey S. Foster, Dan Grossman
Publisher: Association for Computing Machinery
Pages: 312-327
Number of pages: 16
ISBN (Electronic): 9781450356985
DOIs: https://doi.org/10.1145/3192366.3192386
Publication status: Published - 2018 Jun 11
Event: 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018 - Philadelphia, United States
Duration: 2018 Jun 18 to 2018 Jun 22

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Other

Other: 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018
Country: United States
City: Philadelphia
Period: 18/6/18 to 18/6/22

Fingerprint

Core levels
Scalability
Data storage equipment
Controllers
Network-on-chip

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Kislal, O., Kotra, J., Tang, X., Kandemir, M. T., & Jung, M. (2018). Enhancing computation-to-core assignment with physical location information. In J. S. Foster & D. Grossman (Eds.), PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 312-327). (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). Association for Computing Machinery. https://doi.org/10.1145/3192366.3192386
Kislal, Orhan ; Kotra, Jagadish ; Tang, Xulong ; Kandemir, Mahmut Taylan ; Jung, Myoungsoo. / Enhancing computation-to-core assignment with physical location information. PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. editor / Jeffrey S. Foster ; Dan Grossman. Association for Computing Machinery, 2018. pp. 312-327 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).
@inproceedings{eba77b4b20884e189c7684b31f10ab6d,
title = "Enhancing computation-to-core assignment with physical location information",
abstract = "Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4{\%} in the case of private LLCs, and 43.8{\%} in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9{\%} and 12.7{\%} for the private LLC and shared LLC based systems, respectively.",
author = "Orhan Kislal and Jagadish Kotra and Xulong Tang and Kandemir, {Mahmut Taylan} and Myoungsoo Jung",
year = "2018",
month = "6",
day = "11",
doi = "10.1145/3192366.3192386",
language = "English",
series = "Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)",
publisher = "Association for Computing Machinery",
pages = "312--327",
editor = "Foster, {Jeffrey S.} and Dan Grossman",
booktitle = "PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation",

}

Kislal, O, Kotra, J, Tang, X, Kandemir, MT & Jung, M 2018, Enhancing computation-to-core assignment with physical location information. in JS Foster & D Grossman (eds), PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Association for Computing Machinery, pp. 312-327, 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, United States, 18/6/18. https://doi.org/10.1145/3192366.3192386

Enhancing computation-to-core assignment with physical location information. / Kislal, Orhan; Kotra, Jagadish; Tang, Xulong; Kandemir, Mahmut Taylan; Jung, Myoungsoo.

PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. ed. / Jeffrey S. Foster; Dan Grossman. Association for Computing Machinery, 2018. p. 312-327 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).

TY - GEN

T1 - Enhancing computation-to-core assignment with physical location information

AU - Kislal, Orhan

AU - Kotra, Jagadish

AU - Tang, Xulong

AU - Kandemir, Mahmut Taylan

AU - Jung, Myoungsoo

PY - 2018/6/11

Y1 - 2018/6/11

AB - Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.

UR - http://www.scopus.com/inward/record.url?scp=85049567574&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049567574&partnerID=8YFLogxK

U2 - 10.1145/3192366.3192386

DO - 10.1145/3192366.3192386

M3 - Conference contribution

AN - SCOPUS:85049567574

T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

SP - 312

EP - 327

BT - PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation

A2 - Foster, Jeffrey S.

A2 - Grossman, Dan

PB - Association for Computing Machinery

ER -

Kislal O, Kotra J, Tang X, Kandemir MT, Jung M. Enhancing computation-to-core assignment with physical location information. In Foster JS, Grossman D, editors, PLDI 2018 - Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery. 2018. p. 312-327. (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). https://doi.org/10.1145/3192366.3192386