Ensemble models for data-driven prediction of malware infections

Chanhyun Kang, Noseong Park, B. Aditya Prakash, Edoardo Serra, V. S. Subrahmanian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Given a history of detected malware attacks, can we predict the number of malware infections in a country? Can we do this for different malware and countries? This is an important question with numerous implications for cyber security, from designing better anti-virus software, to designing and implementing targeted patches, to more accurately measuring the economic impact of breaches. The problem is compounded by the fact that, as external observers, we can only detect a fraction of actual malware infections. In this paper we address this problem using data from Symantec covering more than 1.4 million hosts and 50 malware spread across 2 years and multiple countries. First, we carefully design domain-based features from both malware and machine-host perspectives. Second, inspired by epidemiological and information diffusion models, we design a novel temporal non-linear model for malware spread and detection. Finally, we present ESM, an ensemble-based approach which combines both these methods to construct a more accurate algorithm. Using extensive experiments spanning multiple malware and countries, we show that ESM can effectively predict malware infection ratios over time (both the actual number and trend) up to 4 times better compared to several baselines on various metrics. Furthermore, ESM's performance is stable and robust even when the number of detected infections is low.

Original language: English
Title of host publication: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 583-592
Number of pages: 10
ISBN (Electronic): 9781450337168
DOIs: https://doi.org/10.1145/2835776.2835834
Publication status: Published - 2016 Feb 8
Event: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016 - San Francisco, United States
Duration: 2016 Feb 22 to 2016 Feb 25

Publication series

Name: WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining

Conference

Conference: 9th ACM International Conference on Web Search and Data Mining, WSDM 2016
Country: United States
City: San Francisco
Period: 16/2/22 to 16/2/25

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Computer Networks and Communications

Cite this

Kang, C., Park, N., Prakash, B. A., Serra, E., & Subrahmanian, V. S. (2016). Ensemble models for data-driven prediction of malware infections. In WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining (pp. 583-592). (WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining). Association for Computing Machinery, Inc. https://doi.org/10.1145/2835776.2835834