When a new cyber-vulnerability is detected, a Common Vulnerability and Exposure (CVE) number is attached to it. Malicious “exploits” may use these vulnerabilities to carry out attacks. Unlike works which study if a CVE will be used in an exploit, we study the problem of predicting when an exploit is first seen. This is an important question for system administrators as they need to devote scarce resources to take corrective action when a new vulnerability emerges. Moreover, past works assume that CVSS scores (released by NIST) are available for predictions, but we show on average that 49% of real world exploits occur before CVSS scores are published. This means that past works, which use CVSS scores, miss almost half of the exploits. In this paper, we propose a novel framework to predict when a vulnerability will be exploited via Twitter discussion, without using CVSS score information. We introduce the unique concept of a family of CVE-Author-Tweet (CAT) graphs and build a novel set of features based on such graphs. We define recurrence relations capturing “hotness” of tweets, “expertise” of Twitter users on CVEs, and “availability” of information about CVEs, and prove that we can solve these recurrences via a fix point algorithm. Our second innovation adopts Hawkes processes to estimate the number of tweets/retweets related to the CVEs. Using the above two sets of novel features, we propose two ensemble forecast models FEEU (for classification) and FRET (for regression) to predict when a CVE will be exploited. Compared with natural adaptations of past works (which predict if an exploit will be used), FEEU increases F1 score by 25.1%, while FRET decreases MAE by 37.2%.
|Title of host publication||KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Publisher||Association for Computing Machinery|
|Number of pages||10|
|Publication status||Published - 2019 Jul 25|
|Event||25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States|
Duration: 2019 Aug 4 → 2019 Aug 8
|Name||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Conference||25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019|
|Period||19/8/4 → 19/8/8|
Bibliographical noteFunding Information:
This work is supported by ONR grants N00014-18-1-2670 and N00014-16-1-2896 and ARO grant W911NF-13-1-0421.
All Science Journal Classification (ASJC) codes
- Information Systems