Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems with Failures

Jinbae Lee, Bobae Kim, Jong-Moon Chung

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Apache Spark and Hadoop are open source frameworks for big data processing, which have been adopted by many companies. In order to implement a reliable big data system that can satisfy processing target completion times, accurate resource provisioning and job execution time estimations are needed. In this paper, time estimation and resource minimization schemes for Spark and Hadoop systems are presented. The proposed models use the probability of failure in the estimations to more accurately formulate the characteristics of real big data operations. The experimental results show that the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisions by considering failure events, which improves the scheduling success rate of big data processing tasks.
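The core idea of the abstract — folding a failure probability into execution-time estimates so that resource provisioning meets a target deadline — can be illustrated with a minimal sketch. This is a simplified retry model assumed for illustration (tasks fail independently with probability p and are re-executed until success), not the paper's exact formulation; the function names and the divisible-workload assumption are hypothetical.

```python
# Hedged sketch: failure-aware time estimation and executor provisioning.
# Assumption (not from the paper): each task takes t seconds per attempt,
# fails independently with probability p, and is retried until it succeeds,
# so the expected number of attempts is 1 / (1 - p).

import math


def expected_task_time(t: float, p: float) -> float:
    """Expected completion time of one task: t * E[attempts] = t / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("failure probability must be in [0, 1)")
    return t / (1.0 - p)


def min_executors(num_tasks: int, t: float, p: float, deadline: float) -> int:
    """Smallest executor count whose failure-aware makespan meets the deadline.

    Assumes perfectly divisible work, i.e. makespan ≈ num_tasks * E[T] / executors.
    """
    total_work = num_tasks * expected_task_time(t, p)
    return max(1, math.ceil(total_work / deadline))


# A failure-oblivious estimate (p = 0) under-provisions; a failure-aware
# estimate requests more executors to still finish by the deadline.
naive = min_executors(1000, t=2.0, p=0.0, deadline=100.0)   # 20 executors
aware = min_executors(1000, t=2.0, p=0.1, deadline=100.0)   # 23 executors
```

Under this toy model, ignoring a 10% task-failure rate would schedule 20 executors and miss the deadline in expectation, while the failure-compensated estimate provisions 23 — the same qualitative effect the paper attributes to its adaptive failure-compensation schemes.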

Original language: English
Article number: 8605312
Pages (from-to): 9658-9666
Number of pages: 9
Journal: IEEE Access
Volume: 7
DOI: 10.1109/ACCESS.2019.2891001
Publication status: Published - 1 Jan 2019


All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)

Cite this

@article{030203a3248348d49d686837a6c1992e,
title = "Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems with Failures",
abstract = "Apache Spark and Hadoop are open source frameworks for big data processing, which have been adopted by many companies. In order to implement a reliable big data system that can satisfy processing target completion times, accurate resource provisioning and job execution time estimations are needed. In this paper, time estimation and resource minimization schemes for Spark and Hadoop systems are presented. The proposed models use the probability of failure in the estimations to more accurately formulate the characteristics of real big data operations. The experimental results show that the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisions by considering failure events, which improves the scheduling success rate of big data processing tasks.",
author = "Jinbae Lee and Bobae Kim and Jong-Moon Chung",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/ACCESS.2019.2891001",
language = "English",
volume = "7",
pages = "9658--9666",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems with Failures. / Lee, Jinbae; Kim, Bobae; Chung, Jong-Moon.

In: IEEE Access, Vol. 7, 8605312, 01.01.2019, p. 9658-9666.
