Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model

Jin Young Kim, Sung-Bae Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

A small amount of unknown malware can be analyzed manually, but new malware is generated at an ever-increasing rate, so automatic detection is needed. Malware is usually generated with features that differ from those of existing samples (e.g., code exchange, null value insertion, or reorganization of subroutines) to avoid detection by antivirus systems. To detect obfuscated malware, this paper proposes a method called latent semantic controlling generative adversarial networks (LSC-GAN), which learns to generate malware data with the i-feature from a specific Gaussian distribution that represents the i-feature, and to distinguish the generated data from real data. A variational autoencoder (VAE) projects data to a latent space for feature extraction and is transferred to the generator (G) of the LSC-GAN to stabilize its training. Because G generates data from a Gaussian distribution, it produces data that are similar but not identical to the actual data: they include modified features compared with the real data. The detector is inherited through transfer learning from an encoder that learns various malware features using real data and modified data generated by the LSC-GAN based on an LSC-VAE. We show that LSC-GAN achieves an average detection accuracy of 96.97%, which is higher than that of other conventional models. We demonstrate the statistical significance of the proposed model's performance using a t-test. The detection results are analyzed with a confusion matrix and F1-score.
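The abstract reports accuracy, a confusion matrix, and an F1-score for the detector. As a minimal sketch of how those quantities relate for a binary malware/benign task, the Python below derives them from predicted and true labels. The function names `confusion_counts` and `f1_score` and the label lists are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task; `positive` marks the malware class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels (1 = malware, 0 = benign); NOT data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
f1 = f1_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.2f} F1={f1:.2f}")
```

Unlike accuracy, F1 ignores true negatives, so it stays informative when benign samples greatly outnumber malware samples.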

Original language: English
Title of host publication: Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings
Editors: Hujun Yin, Paulo Novais, David Camacho, Antonio J. Tallón-Ballesteros
Publisher: Springer Verlag
Pages: 499-507
Number of pages: 9
ISBN (Print): 9783030034924
DOIs: https://doi.org/10.1007/978-3-030-03493-1_52
Publication status: Published - 2018 Jan 1
Event: 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018 - Madrid, Spain
Duration: 2018 Nov 21 to 2018 Nov 23

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018
Country: Spain
City: Madrid
Period: 18/11/21 to 18/11/23

Fingerprint

Malware
Semantics
Gaussian distribution
Model
Transfer Learning
Subroutines
Obfuscation
t-test
Statistical Significance
Feature extraction
Encoder
Computer systems
Feature Extraction
Insertion
Null
Learning
Deep learning
Detectors
Detector
Generator

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Kim, J. Y., & Cho, S-B. (2018). Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model. In H. Yin, P. Novais, D. Camacho, & A. J. Tallón-Ballesteros (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings (pp. 499-507). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11314 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-03493-1_52
Kim, Jin Young ; Cho, Sung-Bae. / Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model. Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings. editor / Hujun Yin ; Paulo Novais ; David Camacho ; Antonio J. Tallón-Ballesteros. Springer Verlag, 2018. pp. 499-507 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{5ca77d6c13924678a96be5ba0bd252a4,
title = "Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model",
abstract = "A small amount of unknown malware can be analyzed manually, but new malware is generated at an ever-increasing rate, so automatic detection is needed. Malware is usually generated with features that differ from those of existing samples (e.g., code exchange, null value insertion, or reorganization of subroutines) to avoid detection by antivirus systems. To detect obfuscated malware, this paper proposes a method called latent semantic controlling generative adversarial networks (LSC-GAN), which learns to generate malware data with the i-feature from a specific Gaussian distribution that represents the i-feature, and to distinguish the generated data from real data. A variational autoencoder (VAE) projects data to a latent space for feature extraction and is transferred to the generator (G) of the LSC-GAN to stabilize its training. Because G generates data from a Gaussian distribution, it produces data that are similar but not identical to the actual data: they include modified features compared with the real data. The detector is inherited through transfer learning from an encoder that learns various malware features using real data and modified data generated by the LSC-GAN based on an LSC-VAE. We show that LSC-GAN achieves an average detection accuracy of 96.97{\%}, which is higher than that of other conventional models. We demonstrate the statistical significance of the proposed model's performance using a t-test. The detection results are analyzed with a confusion matrix and F1-score.",
author = "Kim, {Jin Young} and Sung-Bae Cho",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-03493-1_52",
language = "English",
isbn = "9783030034924",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "499--507",
editor = "Hujun Yin and Paulo Novais and David Camacho and Tall{\'o}n-Ballesteros, {Antonio J.}",
booktitle = "Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings",
address = "Germany",

}

Kim, JY & Cho, S-B 2018, Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model. in H Yin, P Novais, D Camacho & AJ Tallón-Ballesteros (eds), Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11314 LNCS, Springer Verlag, pp. 499-507, 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018, Madrid, Spain, 18/11/21. https://doi.org/10.1007/978-3-030-03493-1_52

TY - GEN

T1 - Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model

AU - Kim, Jin Young

AU - Cho, Sung-Bae

PY - 2018/1/1

Y1 - 2018/1/1

AB - A small amount of unknown malware can be analyzed manually, but new malware is generated at an ever-increasing rate, so automatic detection is needed. Malware is usually generated with features that differ from those of existing samples (e.g., code exchange, null value insertion, or reorganization of subroutines) to avoid detection by antivirus systems. To detect obfuscated malware, this paper proposes a method called latent semantic controlling generative adversarial networks (LSC-GAN), which learns to generate malware data with the i-feature from a specific Gaussian distribution that represents the i-feature, and to distinguish the generated data from real data. A variational autoencoder (VAE) projects data to a latent space for feature extraction and is transferred to the generator (G) of the LSC-GAN to stabilize its training. Because G generates data from a Gaussian distribution, it produces data that are similar but not identical to the actual data: they include modified features compared with the real data. The detector is inherited through transfer learning from an encoder that learns various malware features using real data and modified data generated by the LSC-GAN based on an LSC-VAE. We show that LSC-GAN achieves an average detection accuracy of 96.97%, which is higher than that of other conventional models. We demonstrate the statistical significance of the proposed model's performance using a t-test. The detection results are analyzed with a confusion matrix and F1-score.

UR - http://www.scopus.com/inward/record.url?scp=85057101314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057101314&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-03493-1_52

DO - 10.1007/978-3-030-03493-1_52

M3 - Conference contribution

SN - 9783030034924

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 499

EP - 507

BT - Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings

A2 - Yin, Hujun

A2 - Novais, Paulo

A2 - Camacho, David

A2 - Tallón-Ballesteros, Antonio J.

PB - Springer Verlag

ER -

Kim JY, Cho S-B. Detecting Intrusive Malware with a Hybrid Generative Deep Learning Model. In Yin H, Novais P, Camacho D, Tallón-Ballesteros AJ, editors, Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings. Springer Verlag. 2018. p. 499-507. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-03493-1_52