Relational databases store and organize sensitive data for organizations and individuals alike. Existing database security mechanisms and network intrusion detection systems (IDSs) have proven inadequate or unsuitable for detecting threats directed specifically at the database application layer, so an IDS designed for the database itself is needed. In this paper, we propose a random forest with weighted voting (WRF), combined with principal components analysis (PCA) as a feature selection technique, for detecting database access anomalies, assuming that the database has a role-based access control (RBAC) model in place. PCA produces uncorrelated and relevant features and, at the same time, reduces dimensionality for easier integration with large databases. RF exploits the inherent tree-structured syntax of SQL queries, and its weighted voting scheme further reduces false alarms. Experiments showed that WRF not only improves false-positive and false-negative rates but is also fast in both model building and anomaly detection. Moreover, for a given query, RF classification accuracy was found to be significantly affected by the type of command and the tables accessed, which in turn explains the confusion between some role classes. Lastly, both RF and PCA outperform other state-of-the-art data mining techniques for database anomaly detection, and WRF achieved the best performance, even on very skewed data.
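As a rough illustration of how weighted voting can differ from plain majority voting in a role-classification setting, the following minimal sketch combines per-tree role predictions using per-tree weights and flags a query when the predicted role disagrees with the role of the user who submitted it. The abstract does not specify the weighting scheme or the decision rule, so both are assumptions here: the weights stand in for something like each tree's validation accuracy, and the function names and role labels (`weighted_vote`, `is_anomalous`, `"clerk"`, `"admin"`) are hypothetical.

```python
from collections import defaultdict


def weighted_vote(tree_votes, tree_weights):
    """Combine per-tree role predictions into one weighted decision.

    tree_votes   -- role label predicted by each tree (hypothetical labels)
    tree_weights -- non-negative weight per tree (ASSUMPTION: e.g. the tree's
                    validation accuracy; the abstract does not define this)
    Returns (winning_role, share_of_total_weight).
    """
    if not tree_votes or len(tree_votes) != len(tree_weights):
        raise ValueError("need one weight per vote and at least one vote")

    # Sum the weights of all trees that voted for each role.
    scores = defaultdict(float)
    for role, weight in zip(tree_votes, tree_weights):
        scores[role] += weight

    total = sum(scores.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    best_role = max(scores, key=scores.get)
    return best_role, scores[best_role] / total


def is_anomalous(claimed_role, tree_votes, tree_weights):
    """ASSUMED decision rule: flag the query if the forest's predicted role
    differs from the role the submitting user actually holds."""
    predicted_role, _ = weighted_vote(tree_votes, tree_weights)
    return predicted_role != claimed_role


# Two low-weight trees say "clerk", one high-weight tree says "admin".
votes = ["clerk", "clerk", "admin"]
weights = [0.3, 0.3, 0.9]
role, share = weighted_vote(votes, weights)
```

In this example an unweighted majority vote would pick `"clerk"`, while the weighted vote picks `"admin"` because the single high-weight tree outweighs the other two, so a query submitted under the `"clerk"` role would be flagged.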
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence