The message passing interface (MPI) broadcast communication commonly causes a severe performance bottleneck in multicore system that uses distributed memory. Thus, in this paper, we propose a novel algorithm and hardware structure for the MPI broadcast communication to reduce the bottleneck situation. The transmission order is set based on the state of each processing node that comprises the multicore system, so the novel algorithm minimizes the performance degradation caused by conflict. The proposed scoreboard MPI unit is evaluated by modeling it with SystemC and implemented using VerilogHDL. The size of the proposed scoreboard MPI unit occupies less than 1.03% of the whole chip, and it yields a highly improved performance up to 75.48% as its maximum with 16 processing nodes. Hence, with respect to low-cost design and scalability, this scoreboard MPI unit is particularly useful towards increasing overall performance of the embedded MPSoC.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering
- Artificial Intelligence