Identification and Restoration of LZ77 Compressed Data Using a Machine Learning Approach

Beom Kwon, Myongsik Gong, Jungwoo Huh, Sanghoon Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Identifying the type of codec that was used to compress data is essential in digital forensics because it reduces the trial and error required to restore the data. However, most compression algorithms are configured through several parameters whose values can differ from user to user. Therefore, to restore data more effectively, both the type of the codec and the values of its parameters must be identified. In this paper, we present an identification and restoration method for Lempel-Ziv-77 (LZ77) compressed data. The proposed method identifies whether the given data were compressed with LZ77. It also estimates the values of the parameters that were used for compression. Using the estimated parameters, we restore the original data from the LZ77 compressed data. The simulation results demonstrate the feasibility and effectiveness of the proposed method, with compression identification and parameter estimation accuracies of 100% and 84.41%, respectively.
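To illustrate why the parameter values matter for restoration, the following is a minimal sketch of a textbook LZ77 codec, not the authors' implementation. It assumes tokens of the form (offset, length, next byte) packed into a bit string, and the names `offset_bits` and `length_bits` are hypothetical parameters chosen for this example; the abstract does not state which parameters the paper estimates. The point is only that the same bit stream parses into different tokens when the assumed field widths are wrong.

```python
from typing import List, Tuple

Token = Tuple[int, int, int]  # (offset, match length, next byte)


def lz77_compress(data: bytes, offset_bits: int, length_bits: int) -> List[Token]:
    """Greedy textbook LZ77; field widths bound the window and match length."""
    window_size = 2 ** offset_bits - 1     # largest offset that fits the field
    lookahead_size = 2 ** length_bits - 1  # largest length that fits the field
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Keep one byte in reserve for the literal that follows each match.
        max_len = min(lookahead_size, len(data) - i - 1)
        for j in range(max(0, i - window_size), i):
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> bytes:
    """Rebuild the data by copying `length` bytes from `offset` bytes back."""
    out = bytearray()
    for offset, length, next_byte in tokens:
        for _ in range(length):
            out.append(out[-offset])  # byte-wise copy handles overlapping matches
        out.append(next_byte)
    return bytes(out)


def pack(tokens: List[Token], offset_bits: int, length_bits: int) -> str:
    """Serialize tokens as a string of '0'/'1' with fixed-width fields."""
    return "".join(
        format(o, f"0{offset_bits}b") + format(n, f"0{length_bits}b") + format(c, "08b")
        for o, n, c in tokens
    )


def unpack(bits: str, offset_bits: int, length_bits: int) -> List[Token]:
    """Parse the bit string using the *assumed* field widths."""
    step = offset_bits + length_bits + 8
    tokens: List[Token] = []
    for k in range(0, len(bits) - step + 1, step):
        o = int(bits[k:k + offset_bits], 2)
        n = int(bits[k + offset_bits:k + offset_bits + length_bits], 2)
        c = int(bits[k + offset_bits + length_bits:k + step], 2)
        tokens.append((o, n, c))
    return tokens


data = b"abracadabra abracadabra"
tokens = lz77_compress(data, offset_bits=4, length_bits=3)
bits = pack(tokens, 4, 3)

# With the correct widths, the original data is restored exactly.
assert lz77_decompress(unpack(bits, 4, 3)) == data
# With a wrong offset width, the same bits parse into different tokens.
assert unpack(bits, 5, 3) != tokens
```

In this sketch a single wrong field width already misaligns every token after the first few bits, which is consistent with the abstract's argument that parameter values, and not only the codec type, must be recovered before restoration.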

Original language: English
Title of host publication: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1787-1790
Number of pages: 4
ISBN (Electronic): 9789881476852
DOI: https://doi.org/10.23919/APSIPA.2018.8659755
Publication status: Published - 2019 Mar 4
Event: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: 2018 Nov 12 → 2018 Nov 15

Publication series

Name: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Country: United States
City: Honolulu
Period: 18/11/12 → 18/11/15

Fingerprint

Parameter estimation
Restoration
Learning systems
Identification (control systems)
Digital forensics

All Science Journal Classification (ASJC) codes

  • Information Systems

Cite this

Kwon, B., Gong, M., Huh, J., & Lee, S. (2019). Identification and Restoration of LZ77 Compressed Data Using a Machine Learning Approach. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings (pp. 1787-1790). [8659755] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/APSIPA.2018.8659755