Deep character-level anomaly detection based on a convolutional autoencoder for zero-day phishing URL detection

Seok Jun Bu, Sung Bae Cho

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)


Given the severity of phishing attacks, data-driven approaches that learn from massive URL observations have proven effective in cyber security. However, supervised learning that relies on known attacks is limited in its robustness against zero-day phishing attacks. Moreover, fully exploiting the sequential features of URL characters is known to be critical for phishing detection. To ensure both sustainability and intelligibility, we therefore propose combining a convolution operation, which models character-level URL features, with a deep convolutional autoencoder (CAE), which accounts for the nature of zero-day attacks. In extensive experiments on three real-world datasets comprising 222,541 URLs, the proposed method achieved the highest performance among the latest deep-learning methods. We demonstrated its superiority through receiver operating characteristic (ROC) curve analysis and 10-fold cross-validation, and confirmed that sensitivity improved by 3.98% compared to the latest deep model.
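
The general idea in the abstract, encoding a URL character by character and flagging it as anomalous when an autoencoder reconstructs it poorly, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the alphabet, the fixed length of 80, the threshold values, and all function names are hypothetical, the `reconstruct` argument merely stands in for a trained convolutional autoencoder (which is not implemented here), and the mismatch fraction is a simplified substitute for a real reconstruction loss.

```python
import string

# Assumed alphabet and input length; the abstract does not state the paper's choices.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 = padding or unknown
MAX_LEN = 80


def encode_url(url, max_len=MAX_LEN):
    """Map a URL to a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in url.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


def reconstruction_error(original, reconstructed):
    """Fraction of positions where the reconstruction differs from the input.

    A simplified stand-in for the loss a real autoencoder would report.
    """
    mismatches = sum(a != b for a, b in zip(original, reconstructed))
    return mismatches / len(original)


def is_anomalous(url, reconstruct, threshold=0.2):
    """Flag a URL when its reconstruction error exceeds the (assumed) threshold."""
    encoded = encode_url(url)
    return reconstruction_error(encoded, reconstruct(encoded)) > threshold


def sensitivity(labels, predictions):
    """True-positive rate: detected phishing URLs divided by all phishing URLs."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In an anomaly-detection setup of this kind, the model behind `reconstruct` would typically be trained on benign URLs only, so that unseen phishing URLs reconstruct poorly; sweeping `threshold` and recording the resulting sensitivity against the false-positive rate yields the points of an ROC curve.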

Original language: English
Article number: 1492
Journal: Electronics (Switzerland)
Issue number: 12
Publication status: Published - 2021 Jun 2

Bibliographical note

Funding Information:
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)).

Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

