Anomalous query access detection in RBAC-administered databases with random forest and PCA

Charissa Ann Ronao, Sung-Bae Cho

Research output: Contribution to journal (Article)

15 Citations (Scopus)

Abstract

Relational databases are created for the purpose of handling and organizing sensitive data for organizations as well as for individuals. Although database security mechanisms and network intrusion detection systems (IDSs) are present, they have been found to be inadequate or unsuitable in detecting threats specifically directed toward the database application layer. Therefore, an IDS especially for the database is needed. In this paper, we propose random forest with weighted voting (WRF) and principal components analysis (PCA) as a feature selection technique, for the task of detecting database access anomalies, assuming that the database has a role-based access control (RBAC) model in place. PCA produces uncorrelated and relevant features, and, at the same time, reduces dimensionality for easier integration with large databases. RF exploits the inherent tree-structure syntax of SQL queries, and its weighted voting scheme further minimizes false alarms. Experiments showed that not only does the WRF result in improved false-positive and false-negative rates, but it is also fast in terms of model building and anomaly detection time. Moreover, for a given query, RF classification accuracy was found to be significantly affected by the type of command and the tables accessed, which, in turn, explains the confusion between some role classes. Lastly, both RF and PCA outperform other state-of-the-art data mining techniques for the task of database anomaly detection, and WRF achieved the best performance, even on very skewed data.

Original language: English
Pages (from-to): 238-250
Number of pages: 13
Journal: Information Sciences
Volume: 369
DOI: 10.1016/j.ins.2016.06.038
Publication status: Published - 2016 Nov 10

Fingerprint

Role-based Access Control
Random Forest
Access control
Principal component analysis
Anomalous
Query
Anomaly Detection
Voting
Intrusion detection
Database Security
Network Intrusion Detection
False Alarm
Tree Structure
Relational Database
False Positive
Feature Selection
Anomaly
Dimensionality

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

@article{5a150e69bbae45e28d229a2a64b2ba9a,
title = "Anomalous query access detection in RBAC-administered databases with random forest and PCA",
abstract = "Relational databases are created for the purpose of handling and organizing sensitive data for organizations as well as for individuals. Although database security mechanisms and network intrusion detection systems (IDSs) are present, they have been found to be inadequate or unsuitable in detecting threats specifically directed toward the database application layer. Therefore, an IDS especially for the database is needed. In this paper, we propose random forest with weighted voting (WRF) and principal components analysis (PCA) as a feature selection technique, for the task of detecting database access anomalies, assuming that the database has a role-based access control (RBAC) model in place. PCA produces uncorrelated and relevant features, and, at the same time, reduces dimensionality for easier integration with large databases. RF exploits the inherent tree-structure syntax of SQL queries, and its weighted voting scheme further minimizes false alarms. Experiments showed that not only does the WRF result in improved false-positive and false-negative rates, but it is also fast in terms of model building and anomaly detection time. Moreover, for a given query, RF classification accuracy was found to be significantly affected by the type of command and the tables accessed, which, in turn, explains the confusion between some role classes. Lastly, both RF and PCA outperform other state-of-the-art data mining techniques for the task of database anomaly detection, and WRF achieved the best performance, even on very skewed data.",
author = "Ronao, Charissa Ann and Cho, Sung-Bae",
year = "2016",
month = "11",
day = "10",
doi = "10.1016/j.ins.2016.06.038",
language = "English",
volume = "369",
pages = "238--250",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

Anomalous query access detection in RBAC-administered databases with random forest and PCA. / Ronao, Charissa Ann; Cho, Sung-Bae.

In: Information Sciences, Vol. 369, 10.11.2016, p. 238-250.


TY  - JOUR
T1  - Anomalous query access detection in RBAC-administered databases with random forest and PCA
AU  - Ronao, Charissa Ann
AU  - Cho, Sung-Bae
PY  - 2016/11/10
Y1  - 2016/11/10
AB  - Relational databases are created for the purpose of handling and organizing sensitive data for organizations as well as for individuals. Although database security mechanisms and network intrusion detection systems (IDSs) are present, they have been found to be inadequate or unsuitable in detecting threats specifically directed toward the database application layer. Therefore, an IDS especially for the database is needed. In this paper, we propose random forest with weighted voting (WRF) and principal components analysis (PCA) as a feature selection technique, for the task of detecting database access anomalies, assuming that the database has a role-based access control (RBAC) model in place. PCA produces uncorrelated and relevant features, and, at the same time, reduces dimensionality for easier integration with large databases. RF exploits the inherent tree-structure syntax of SQL queries, and its weighted voting scheme further minimizes false alarms. Experiments showed that not only does the WRF result in improved false-positive and false-negative rates, but it is also fast in terms of model building and anomaly detection time. Moreover, for a given query, RF classification accuracy was found to be significantly affected by the type of command and the tables accessed, which, in turn, explains the confusion between some role classes. Lastly, both RF and PCA outperform other state-of-the-art data mining techniques for the task of database anomaly detection, and WRF achieved the best performance, even on very skewed data.
UR  - http://www.scopus.com/inward/record.url?scp=84976618996&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84976618996&partnerID=8YFLogxK
U2  - 10.1016/j.ins.2016.06.038
DO  - 10.1016/j.ins.2016.06.038
M3  - Article
VL  - 369
SP  - 238
EP  - 250
JO  - Information Sciences
JF  - Information Sciences
SN  - 0020-0255
ER  -