Accelerating the execution of matrix languages on the Cell Broadband Engine Architecture

Raymes Khoury, Bernd Burgstaller, Bernhard Scholz

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their script languages with matrix data types. Current implementations of matrix languages do not fully utilize high-performance, special-purpose chip architectures, such as the IBM PowerXCell processor (Cell). We present a new framework that extends Octave to harvest the computational power of the Cell. With this framework, the programmer is alleviated of the burden of introducing explicit notions of parallelism. Instead, the programmer uses a new matrix data type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into submatrices), scheduled and executed on the SPEs. Thereby, we exploit 1) data parallelism, 2) instruction level parallelism, 3) pipeline parallelism, and 4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.
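The lazy-evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Octave/Cell implementation: the names `LazyMatrix`, `leaf`, `dependence_order`, and `evaluate` are hypothetical, and lowering into submatrices and scheduling on SPEs are not modeled. The sketch only shows how a matrix type can record operations as a trace and later run them in an order that respects data dependences.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[float]]


@dataclass(eq=False)
class LazyMatrix:
    """Hypothetical lazy matrix node: records an operation instead of running it."""
    op: str                                  # "leaf", "add", or "matmul"
    operands: Tuple["LazyMatrix", ...] = ()  # nodes this operation depends on
    value: Optional[Matrix] = None           # set for leaves, or after evaluation

    def __add__(self, other: "LazyMatrix") -> "LazyMatrix":
        return LazyMatrix("add", (self, other))

    def __matmul__(self, other: "LazyMatrix") -> "LazyMatrix":
        return LazyMatrix("matmul", (self, other))


def leaf(values: Matrix) -> LazyMatrix:
    """Wrap concrete data as a node with no dependences."""
    return LazyMatrix("leaf", value=values)


def dependence_order(root: LazyMatrix) -> List[LazyMatrix]:
    """Post-order walk of the graph: every node appears after its operands."""
    order: List[LazyMatrix] = []
    seen = set()

    def visit(node: LazyMatrix) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node.operands:
            visit(child)
        order.append(node)

    visit(root)
    return order


def evaluate(root: LazyMatrix) -> Matrix:
    """Force the recorded trace, computing each node once in dependence order."""
    for node in dependence_order(root):
        if node.value is not None:
            continue
        a, b = (child.value for child in node.operands)
        if node.op == "add":
            node.value = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
        elif node.op == "matmul":
            cols = list(zip(*b))
            node.value = [[sum(x * y for x, y in zip(row, col)) for col in cols]
                          for row in a]
        else:
            raise ValueError(f"unknown operation: {node.op}")
    return root.value


a = leaf([[1, 2], [3, 4]])
b = leaf([[5, 6], [7, 8]])
c = (a + b) @ a            # nothing is computed here; only the trace is built
trace_ops = [node.op for node in dependence_order(c)]
result = evaluate(c)
```

In this sketch, nodes that do not depend on each other in `dependence_order` are the ones a real scheduler could run in parallel; the sequential loop in `evaluate` is a simplification.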

Original language: English
Article number: 5441290
Pages (from-to): 7-21
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 23
Issue number: 1
DOIs: 10.1109/TPDS.2010.58
Publication status: Published - 2011 Jan 1

Fingerprint

Engines
Processing
MATLAB
Pipelines
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

@article{b375131c33b84b39b755e527b7f19d7e,
title = "Accelerating the execution of matrix languages on the {Cell Broadband Engine Architecture}",
abstract = "Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their script languages with matrix data types. Current implementations of matrix languages do not fully utilize high-performance, special-purpose chip architectures, such as the IBM PowerXCell processor (Cell). We present a new framework that extends Octave to harvest the computational power of the Cell. With this framework, the programmer is alleviated of the burden of introducing explicit notions of parallelism. Instead, the programmer uses a new matrix data type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into submatrices), scheduled and executed on the SPEs. Thereby, we exploit 1) data parallelism, 2) instruction level parallelism, 3) pipeline parallelism, and 4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.",
author = "Raymes Khoury and Bernd Burgstaller and Bernhard Scholz",
year = "2011",
month = "1",
day = "1",
doi = "10.1109/TPDS.2010.58",
language = "English",
volume = "23",
pages = "7--21",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "1",

}

Accelerating the execution of matrix languages on the Cell Broadband Engine Architecture. / Khoury, Raymes; Burgstaller, Bernd; Scholz, Bernhard.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 23, No. 1, 5441290, 01.01.2011, p. 7-21.

TY - JOUR

T1 - Accelerating the execution of matrix languages on the Cell Broadband Engine Architecture

AU - Khoury, Raymes

AU - Burgstaller, Bernd

AU - Scholz, Bernhard

PY - 2011/1/1

Y1 - 2011/1/1

AB - Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their script languages with matrix data types. Current implementations of matrix languages do not fully utilize high-performance, special-purpose chip architectures, such as the IBM PowerXCell processor (Cell). We present a new framework that extends Octave to harvest the computational power of the Cell. With this framework, the programmer is alleviated of the burden of introducing explicit notions of parallelism. Instead, the programmer uses a new matrix data type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into submatrices), scheduled and executed on the SPEs. Thereby, we exploit 1) data parallelism, 2) instruction level parallelism, 3) pipeline parallelism, and 4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.

UR - http://www.scopus.com/inward/record.url?scp=78649893133&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649893133&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2010.58

DO - 10.1109/TPDS.2010.58

M3 - Article

VL - 23

SP - 7

EP - 21

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 1

M1 - 5441290

ER -