Random effects logistic regression model for anomaly detection

Min Seok Mok, So Young Sohn, Yong Han Ju

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

As the influence of the internet continues to expand as a medium for communications and commerce, the threat from spammers, system attackers, and criminal enterprises has grown accordingly. This paper proposes a random effects logistic regression model for anomaly detection. Unlike previous studies on anomaly detection, it applies a random effects model, which accommodates not only the risk factors of the exposures but also the uncertainty not explained by those factors. Specific risk-category factors, such as 'protocol type' and 'logged in', are retained in the proposed model. The research is based on a sample of 49,427 randomly selected observations on 42 variables from the KDD-cup 1999 (Data Mining and Knowledge Discovery competition) data set, which contains 'normal' and 'anomaly' connections. The proposed model achieves a classification accuracy of 98.94% on the training data set and 98.68% on the validation data set.
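The abstract does not state the model equation. As a rough sketch only, a generic random-intercept logistic regression, which may differ from the authors' exact specification, is often written as below. The grouping index j, the covariate vector x_ij, and the normality assumption on u_j are illustrative assumptions and are not taken from the paper.

```latex
% Generic random-intercept logistic model (illustrative; not the paper's stated form).
% y_{ij} = 1 marks an 'anomaly' connection i in a hypothetical group j.
\[
  \operatorname{logit} \Pr\left(y_{ij} = 1 \mid x_{ij}, u_j\right)
    = \beta_0 + x_{ij}^{\top} \beta + u_j,
  \qquad u_j \sim \mathcal{N}\left(0, \sigma_u^2\right)
\]
```

Here the fixed effects β would capture observed risk factors such as 'protocol type' and 'logged in', while the random term u_j would absorb variation that those factors leave unexplained.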

Original language: English
Pages (from-to): 7162-7166
Number of pages: 5
Journal: Expert Systems with Applications
Volume: 37
Issue number: 10
DOI: 10.1016/j.eswa.2010.04.017
Publication status: Published - 2010 Jan 1

Fingerprint

Logistics
Data mining
Internet
Network protocols
Communication
Industry

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Mok, Min Seok ; Sohn, So Young ; Ju, Yong Han. / Random effects logistic regression model for anomaly detection. In: Expert Systems with Applications. 2010 ; Vol. 37, No. 10. pp. 7162-7166.
@article{476bc4a5a3074548b59b1ce69869433f,
title = "Random effects logistic regression model for anomaly detection",
author = "Mok, {Min Seok} and Sohn, {So Young} and Ju, {Yong Han}",
year = "2010",
month = "1",
day = "1",
doi = "10.1016/j.eswa.2010.04.017",
language = "English",
volume = "37",
pages = "7162--7166",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "10",

}

TY - JOUR

T1 - Random effects logistic regression model for anomaly detection

AU - Mok, Min Seok

AU - Sohn, So Young

AU - Ju, Yong Han

PY - 2010/1/1

Y1 - 2010/1/1

UR - http://www.scopus.com/inward/record.url?scp=81355151606&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=81355151606&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2010.04.017

DO - 10.1016/j.eswa.2010.04.017

M3 - Article

AN - SCOPUS:81355151606

VL - 37

SP - 7162

EP - 7166

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 10

ER -