Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification

Taeheung Kim, Byung Do Chung, Jong Seok Lee

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Naive Bayesian classification has been widely used in data mining because of its simplicity and its robustness to missing values and irrelevant attributes. However, naive Bayes classifiers sometimes perform poorly because of their unrealistic assumption that all attributes are equally important and conditionally independent of each other. In this research, we dispense with the former assumption by proposing a new attribute weighting method. The proposed method treats each attribute as a single classifier and measures its discriminating ability using the area under an ROC curve (AUC). Each AUC value is then used to weight the corresponding attribute. In addition, we try to reduce the complexity of classification models by selecting high-AUC attributes. Using 20 real datasets from the machine learning repository at UC Irvine (UCI), we conduct a numerical experiment to show that the proposed method improves on standard naive Bayes classification and existing weighting methods.
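The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact weighting formula, so the code below assumes a binary problem, Gaussian class-conditional densities, the raw attribute value as the single-attribute score, and the AUC used directly as an exponent on each attribute's likelihood. The function names and the `min_auc` threshold are hypothetical.

```python
import math
from collections import defaultdict


def auc(scores, labels):
    """AUC via the Mann-Whitney pair count; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def attribute_weights(X, y):
    """One weight per attribute: the AUC of that attribute used alone as a score.

    Assumption: an AUC below 0.5 is flipped to 1 - AUC, because the attribute
    still discriminates, only in the opposite direction.
    """
    weights = []
    for j in range(len(X[0])):
        a = auc([row[j] for row in X], y)
        weights.append(max(a, 1.0 - a))
    return weights


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict(model, weights, x, min_auc=0.0):
    """Return argmax_c of log P(c) + sum_j w_j * log p(x_j | c).

    Attributes whose weight is below min_auc are skipped, which is a simple
    (assumed) form of high-AUC attribute selection.
    """
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for w, v, (mean, var) in zip(weights, x, stats):
            if w < min_auc:
                continue
            log_pdf = -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            score += w * log_pdf
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): attribute 0 separates the classes,
# attribute 1 is close to noise.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [8.0, 2.0], [9.0, 5.0], [10.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
weights = attribute_weights(X, y)
model = fit_gaussian_nb(X, y)
print(weights)
print(predict(model, weights, [2.5, 3.0]))
print(predict(model, weights, [9.5, 3.0], min_auc=0.9))
```

Setting every weight to 1 recovers standard naive Bayes, so the weights only change how much each attribute's evidence counts. Because the AUC is computed from rankings of positive against negative examples, it does not depend on the class ratio, which is one plausible reason to use it on unbalanced data.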

Original language: English
Pages (from-to): 203-218
Number of pages: 16
Journal: Computing
Volume: 99
Issue number: 3
DOI: 10.1007/s00607-016-0483-z
Publication status: Published - 2017 Mar 1

Fingerprint

Unbalanced Data
Data Classification
Naive Bayes
Operating Characteristics
Receiver
Attribute
Classifiers
Weighting
Data mining
Learning systems
Bayesian Classification
Naive Bayes Classifier
Missing Values
Receiver Operating Characteristic Curve
Repository
Simplicity
Machine Learning
Experiments
Classifier

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Numerical Analysis
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

@article{d911d68bbc9b443bbc06652a324198e0,
title = "Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification",
author = "Taeheung Kim and Chung, {Byung Do} and Lee, {Jong Seok}",
year = "2017",
month = "3",
day = "1",
doi = "10.1007/s00607-016-0483-z",
language = "English",
volume = "99",
pages = "203--218",
journal = "Computing (Vienna/New York)",
issn = "0010-485X",
publisher = "Springer Wien",
number = "3",

}

Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification. / Kim, Taeheung; Chung, Byung Do; Lee, Jong Seok.

In: Computing, Vol. 99, No. 3, 01.03.2017, p. 203-218.

TY - JOUR

T1 - Incorporating receiver operating characteristics into naive Bayes for unbalanced data classification

AU - Kim, Taeheung

AU - Chung, Byung Do

AU - Lee, Jong Seok

PY - 2017/3/1

Y1 - 2017/3/1

UR - http://www.scopus.com/inward/record.url?scp=85013678986&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013678986&partnerID=8YFLogxK

U2 - 10.1007/s00607-016-0483-z

DO - 10.1007/s00607-016-0483-z

M3 - Article

AN - SCOPUS:85013678986

VL - 99

SP - 203

EP - 218

JO - Computing (Vienna/New York)

JF - Computing (Vienna/New York)

SN - 0010-485X

IS - 3

ER -