Emerging Big Data streaming applications are facing unbounded (infinite) data sets at a scale of millions of events per second. The information captured in a single event, e.g., GPS position information of mobile phone users, loses value (perishes) over time and requires sub-second latency responses. Conventional Cloud-based batch-processing platforms are inadequate to meet these constraints. Existing streaming engines exhibit low throughput and are thus equally ill-suited for emerging Big Data streaming applications. To validate this claim, we evaluated the Yahoo streaming benchmark and our own real-time trend detector on three state-of-the-art streaming engines: Apache Storm, Apache Flink and Spark Streaming. We adapted the Kieker dynamic profiling framework to gather accurate profiling information on the throughput and CPU utilization exhibited by the two benchmarks on the Google Compute Engine. To estimate the performance overhead incurred by current streaming engines, we re-implemented our Java-based trend detector as a multi-threaded, shared-memory application in C++. The achieved throughput of 3.2 million events per second on a stand-alone 2 CPU (44 cores) Intel Xeon E5-2699 v4 server is 44 times higher than the maximum throughput achieved with the Apache Storm version of the trend detector deployed on 30 virtual machines (nodes) in the Cloud. Our experiment suggests vertical scaling as a viable alternative to horizontal scaling, especially if shared state has to be maintained in a streaming application. For reproducibility, we have open-sourced our framework configurations on GitHub .
|Title of host publication||Euro-Par 2017|
|Subtitle of host publication||Parallel Processing Workshops - Euro-Par 2017 International Workshops|
|Editors||Dora B. Heras, Luc Bouge|
|Number of pages||12|
|Publication status||Published - 2018|
|Event||International Workshops on Parallel Processing, Euro-Par 2017 - Santiago de Compostela, Spain|
Duration: 2017 Aug 28 → 2017 Aug 29
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||International Workshops on Parallel Processing, Euro-Par 2017|
|City||Santiago de Compostela|
|Period||17/8/28 → 17/8/29|
Bibliographical noteFunding Information:
Acknowledgements. Research supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning under grant NRF2015M3C4A7065522.
© Springer International Publishing AG, part of Springer Nature 2018.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science(all)