TY - GEN
T1 - A fast Fourier transformation algorithm for single-chip cloud computers using RCCE
AU - Sodsong, Wasuwee
AU - Burgstaller, Bernd
PY - 2011
Y1 - 2011
N2 - Multimedia applications, spectrum analyses and data compression algorithms employ Fourier transformations as one of their main components to transform series from the time to the spectral domain and vice versa. Effective fast Fourier transformation (FFT) algorithms imply performance enhancements in related applications. Although architecture-specific programs achieve better performance, many FFT implementations were designed to be hardware independent and unaware of the underlying architecture. In this paper we introduce a novel FFT algorithm based on the RCCE native message passing library for the single-chip cloud computer (SCC). We parallelized the recursive (divide-and-conquer) radix-2 FFT such that the inputs for all processing units are independent. Private memories are used to avoid cache coherence issues, and the algorithm was designed to minimize the message passing overhead. Preliminary experimental results were conducted using the RCCE emulator on an Intel Xeon 2 CPU quad-core computer. The emulator results showed promising scalability and speed-ups over the sequential implementation. Based on hardware availability, we plan to run the experiments on real SCC hardware for the final version of this paper.
AB - Multimedia applications, spectrum analyses and data compression algorithms employ Fourier transformations as one of their main components to transform series from the time to the spectral domain and vice versa. Effective fast Fourier transformation (FFT) algorithms imply performance enhancements in related applications. Although architecture-specific programs achieve better performance, many FFT implementations were designed to be hardware independent and unaware of the underlying architecture. In this paper we introduce a novel FFT algorithm based on the RCCE native message passing library for the single-chip cloud computer (SCC). We parallelized the recursive (divide-and-conquer) radix-2 FFT such that the inputs for all processing units are independent. Private memories are used to avoid cache coherence issues, and the algorithm was designed to minimize the message passing overhead. Preliminary experimental results were conducted using the RCCE emulator on an Intel Xeon 2 CPU quad-core computer. The emulator results showed promising scalability and speed-ups over the sequential implementation. Based on hardware availability, we plan to run the experiments on real SCC hardware for the final version of this paper.
UR - http://www.scopus.com/inward/record.url?scp=84870568817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870568817&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84870568817
SN - 9783866447172
T3 - 3rd Many-Core Applications Research Community Symposium, MARC 2011
SP - 85
EP - 87
BT - 3rd Many-Core Applications Research Community Symposium, MARC 2011
T2 - 3rd Symposium on Many-Core Applications Research Community, MARC 2011
Y2 - 5 July 2011 through 6 July 2011
ER -