Scalability and state: A critical assessment of throughput obtainable on big data streaming frameworks for applications with and without state information

Shinhyung Yang, Yonguk Jeong, Changwan Hong, Hyunje Jun, Bernd Burgstaller

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

Emerging Big Data streaming applications are facing unbounded (infinite) data sets at a scale of millions of events per second. The information captured in a single event, e.g., GPS position information of mobile phone users, loses value (perishes) over time and requires sub-second latency responses. Conventional Cloud-based batch-processing platforms are inadequate to meet these constraints. Existing streaming engines exhibit low throughput and are thus equally ill-suited for emerging Big Data streaming applications. To validate this claim, we evaluated the Yahoo streaming benchmark and our own real-time trend detector on three state-of-the-art streaming engines: Apache Storm, Apache Flink and Spark Streaming. We adapted the Kieker dynamic profiling framework to gather accurate profiling information on the throughput and CPU utilization exhibited by the two benchmarks on the Google Compute Engine. To estimate the performance overhead incurred by current streaming engines, we re-implemented our Java-based trend detector as a multi-threaded, shared-memory application in C++. The achieved throughput of 3.2 million events per second on a stand-alone 2 CPU (44 cores) Intel Xeon E5-2699 v4 server is 44 times higher than the maximum throughput achieved with the Apache Storm version of the trend detector deployed on 30 virtual machines (nodes) in the Cloud. Our experiment suggests vertical scaling as a viable alternative to horizontal scaling, especially if shared state has to be maintained in a streaming application. For reproducibility, we have open-sourced our framework configurations on GitHub [1].

Original languageEnglish
Title of host publicationEuro-Par 2017
Subtitle of host publicationParallel Processing Workshops - Euro-Par 2017 International Workshops
EditorsDora B. Heras, Luc Bouge
PublisherSpringer Verlag
Pages141-152
Number of pages12
ISBN (Print)9783319751771
DOIs
Publication statusPublished - 2018 Jan 1
EventInternational Workshops on Parallel Processing, Euro-Par 2017 - Santiago de Compostela, Spain
Duration: 2017 Aug 282017 Aug 29

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10659 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

OtherInternational Workshops on Parallel Processing, Euro-Par 2017
CountrySpain
CitySantiago de Compostela
Period17/8/2817/8/29

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Scalability and state: A critical assessment of throughput obtainable on big data streaming frameworks for applications with and without state information'. Together they form a unique fingerprint.

  • Cite this

    Yang, S., Jeong, Y., Hong, C., Jun, H., & Burgstaller, B. (2018). Scalability and state: A critical assessment of throughput obtainable on big data streaming frameworks for applications with and without state information. In D. B. Heras, & L. Bouge (Eds.), Euro-Par 2017: Parallel Processing Workshops - Euro-Par 2017 International Workshops (pp. 141-152). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10659 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-75178-8_12