Profile-Guided deployment of stream programs on multicores

S. M. Farhad, Yousun Ko, Bernd Burgstaller, Bernhard Scholz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.
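The load-balancing problem described in the abstract can be illustrated with a toy example. The sketch below is not the paper's ILP formulation: it replaces the ILP solver with an exhaustive search over actor-to-core mappings, and it assumes a simple cost model in which the cost of a data channel is added to both endpoint cores whenever producer and consumer are mapped to different cores. The function name `balance`, the actor names, and all timing and cost values are hypothetical.

```python
from itertools import product


def balance(actor_time, channel_cost, num_cores):
    """Exhaustively search actor-to-core mappings.

    Returns (bottleneck_load, mapping) for the mapping that minimizes the
    load of the most heavily loaded core. Assumed cost model (not taken
    from the paper): a channel whose endpoints sit on different cores
    adds its cost to both of those cores.
    """
    actors = sorted(actor_time)
    best_load, best_map = float("inf"), None
    for cores in product(range(num_cores), repeat=len(actors)):
        mapping = dict(zip(actors, cores))
        load = [0.0] * num_cores
        # Computation cost: each actor contributes to its own core.
        for a in actors:
            load[mapping[a]] += actor_time[a]
        # Communication cost: only channels that cross cores are charged.
        for (src, dst), cost in channel_cost.items():
            if mapping[src] != mapping[dst]:
                load[mapping[src]] += cost
                load[mapping[dst]] += cost
        bottleneck = max(load)
        if bottleneck < best_load:
            best_load, best_map = bottleneck, mapping
    return best_load, best_map


# Hypothetical pipeline A -> B -> C -> D with an expensive B -> C channel.
actor_time = {"A": 4.0, "B": 3.0, "C": 3.0, "D": 2.0}
channel_cost = {("A", "B"): 1.0, ("B", "C"): 5.0, ("C", "D"): 1.0}
makespan, mapping = balance(actor_time, channel_cost, 2)
print(makespan, mapping)
```

On this toy instance the search keeps B and C on the same core, because splitting the expensive B-to-C channel would cost more than the imbalance it removes. Exhaustive search is only feasible for a handful of actors, which is why a real deployment would hand the same objective to an ILP solver.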

Original language: English
Title of host publication: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012
Pages: 79-88
Number of pages: 10
DOI: https://doi.org/10.1145/2248418.2248430
Publication status: Published - 2012 Jul 27
Event: 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012 - Beijing, China
Duration: 2012 Jun 12 → 2012 Jun 13

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)

Other

Other: 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012
Country: China
City: Beijing
Period: 12/6/12 → 12/6/13

Fingerprint

Communication
Costs
Computer programming
Computer programming languages
Linear programming
Signal processing
Network protocols
Data storage equipment
Processing
Industry
Experiments

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Farhad, S. M., Ko, Y., Burgstaller, B., & Scholz, B. (2012). Profile-Guided deployment of stream programs on multicores. In Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012 (pp. 79-88). (Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)). https://doi.org/10.1145/2248418.2248430
Farhad, S. M. ; Ko, Yousun ; Burgstaller, Bernd ; Scholz, Bernhard. / Profile-Guided deployment of stream programs on multicores. Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012. 2012. pp. 79-88 (Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)).
@inproceedings{b808d5a30bb34fe8b867e2daa9ce5326,
title = "Profile-Guided deployment of stream programs on multicores",
abstract = "Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17{\%} of an exhaustive profiling run over all data channels of a stream program.",
author = "Farhad, {S. M.} and Yousun Ko and Bernd Burgstaller and Bernhard Scholz",
year = "2012",
month = "7",
day = "27",
doi = "10.1145/2248418.2248430",
language = "English",
isbn = "9781450312127",
series = "Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)",
pages = "79--88",
booktitle = "Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012",

}

Farhad, SM, Ko, Y, Burgstaller, B & Scholz, B 2012, Profile-Guided deployment of stream programs on multicores. in Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012. Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 79-88, 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012, Beijing, China, 12/6/12. https://doi.org/10.1145/2248418.2248430

Profile-Guided deployment of stream programs on multicores. / Farhad, S. M.; Ko, Yousun; Burgstaller, Bernd; Scholz, Bernhard.

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012. 2012. p. 79-88 (Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)).

TY - GEN

T1 - Profile-Guided deployment of stream programs on multicores

AU - Farhad, S. M.

AU - Ko, Yousun

AU - Burgstaller, Bernd

AU - Scholz, Bernhard

PY - 2012/7/27

Y1 - 2012/7/27

AB - Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.

UR - http://www.scopus.com/inward/record.url?scp=84864134748&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864134748&partnerID=8YFLogxK

U2 - 10.1145/2248418.2248430

DO - 10.1145/2248418.2248430

M3 - Conference contribution

AN - SCOPUS:84864134748

SN - 9781450312127

T3 - Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)

SP - 79

EP - 88

BT - Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012

ER -

Farhad SM, Ko Y, Burgstaller B, Scholz B. Profile-Guided deployment of stream programs on multicores. In Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012. 2012. p. 79-88. (Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)). https://doi.org/10.1145/2248418.2248430