With the advent of sophisticated deep learning models, various methods have been devised for classifying malware from the structural features of source code. Nevertheless, recent detection-avoidance techniques actively imitate the structural features of benign programs and share vulnerable subroutines, making malicious attacks difficult to distinguish. A method that can distinguish and classify similar malicious attacks is therefore urgently needed. In this paper, we propose a triplet-network-based method that learns a disentangled malware space from assembly-level features, going beyond the structural characteristics of malware. The method comprises two major components: 1) a network trained with triplet loss to disentangle the deep representations of malware samples that lie close together in the latent vector space, and 2) genetic optimization of assembly-level features to resolve collisions among thousands of such features. Experiments with the assembly and binary code dataset released by Microsoft show that the proposed method outperforms existing methods based on structural features, achieving the highest performance in 10-fold cross-validation. Moreover, we demonstrate the superiority of the disentangled representation for malware classification by visualizing the latent space and ROC curves.
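For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of the standard margin-based formulation, not the network or training setup used in the paper. The function names, the Euclidean distance, the margin of 1.0, and the two-dimensional toy embeddings are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for a single triplet.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is; otherwise it is positive.
    The margin value here is an illustrative assumption.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: anchor and positive stand for samples of
# the same malware family, negative for a sample of a different family.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# Well-separated triplet: the negative is already far enough away.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet with a positive loss.
loss_hard = triplet_loss(anchor, negative, positive)
```

Minimizing this quantity over many triplets pulls same-family samples together and pushes different-family samples apart, which is the general sense in which a triplet-trained embedding can separate samples that would otherwise sit close together in the latent space.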
|Title of host publication||Hybrid Artificial Intelligent Systems - 17th International Conference, HAIS 2022, Proceedings|
|Editors||Pablo García Bringas, Hilde Pérez García, Francisco Javier Martínez de Pisón, José Ramón Villar Flecha, Alicia Troncoso Lora, Enrique A. de la Cal, Alvaro Herrero, Francisco Martínez Álvarez, Giuseppe Psaila, Hector Quintián, Emilio Corchado|
|Publisher||Springer Science and Business Media Deutschland GmbH|
|Number of pages||12|
|Publication status||Published - 2022|
|Event||17th International Conference on Hybrid Artificial Intelligence Systems, HAIS 2022 - Salamanca, Spain|
Duration: 2022 Sept 5 → 2022 Sept 7
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||17th International Conference on Hybrid Artificial Intelligence Systems, HAIS 2022|
|Period||2022 Sept 5 → 2022 Sept 7|
Bibliographical note
Funding Information:
Acknowledgements. This work was supported by an IITP grant funded by the Korean government (MSIT) (No.2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) and an ETRI grant funded by the Korean government (22ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System).
© 2022, Springer Nature Switzerland AG.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science (all)