Credible, resilient, and scalable detection of software plagiarism using authority histograms

Dong Kyu Chae, Jiwoon Ha, Sang Wook Kim, Boo Joong Kang, Eul Gyu Im, Sunju Park

Research output: Contribution to journal, Article

5 Citations (Scopus)

Abstract

Software plagiarism has become a serious threat to the health of the software industry. A software birthmark indicates unique characteristics of a program that can be used to analyze the similarity between two programs and provide proof of plagiarism. In this paper, we propose a novel birthmark, Authority Histograms (AH), which satisfies three essential requirements for good birthmarks: resiliency, credibility, and scalability. Existing birthmarks fail to satisfy all three simultaneously. AH reflects not only the frequency of APIs but also their call order, whereas previous birthmarks rarely consider the two together. This property provides more accurate plagiarism detection, making our birthmark more resilient and credible than previously proposed birthmarks. By using random walk with restart when generating AH, we make our proposal applicable even to large programs. Extensive experiments with a set of Windows applications verify that both the credibility and resiliency of AH exceed those of existing birthmarks; AH therefore provides improved accuracy in detecting plagiarism. Moreover, the construction and comparison phases of AH complete within a reasonable time.
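The abstract does not spell out how AH is built, so the following is only an illustrative sketch of the general idea it names: run a random walk with restart over a graph of API call order, and treat the resulting per-API scores as a histogram that can be compared between two programs. Everything concrete here is an assumption rather than the authors' method: the function names (`call_order_edges`, `rwr_scores`, `cosine`), the restart probability of 0.15, restarting to a uniform distribution over APIs (a PageRank-style choice), the handling of APIs with no successor, and cosine similarity as the comparison measure. The API names are sample labels only.

```python
import math


def call_order_edges(trace):
    """Turn an API call trace into (earlier, later) edges.

    Repeated pairs are kept, so frequent transitions carry more weight.
    """
    return list(zip(trace, trace[1:]))


def rwr_scores(edges, nodes, restart=0.15, iters=100):
    """Random walk with restart by power iteration (assumed parameters).

    With probability `restart` the walker jumps to a uniformly chosen API;
    otherwise it follows an outgoing call-order edge chosen uniformly.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[index[src]].append(index[dst])

    scores = [1.0 / n] * n
    for _ in range(iters):
        nxt = [restart / n] * n
        for i, targets in enumerate(out):
            # An API with no successor spreads its mass over all APIs.
            spread = targets if targets else range(n)
            share = (1.0 - restart) * scores[i] / len(spread)
            for j in spread:
                nxt[j] += share
        scores = nxt
    return dict(zip(nodes, scores))


def cosine(u, v, keys):
    """Cosine similarity between two score histograms over the same APIs."""
    dot = sum(u[k] * v[k] for k in keys)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in keys))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in keys))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    apis = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
    trace_a = ["CreateFileW", "ReadFile", "ReadFile", "CloseHandle"]
    trace_b = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]
    hist_a = rwr_scores(call_order_edges(trace_a), apis)
    hist_b = rwr_scores(call_order_edges(trace_b), apis)
    print(hist_a)
    print(cosine(hist_a, hist_b, apis))
```

Because each iteration touches every edge once, the cost of this sketch grows with the number of call-order edges rather than with the square of the number of APIs, which is one plausible reading of why an iterative random walk suits large programs.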

Original language: English
Pages (from-to): 114-124
Number of pages: 11
Journal: Knowledge-Based Systems
Volume: 95
DOIs: 10.1016/j.knosys.2015.12.009
Publication status: Published - 2016 Mar 1

Fingerprint

Application programming interfaces (API)
Scalability
Health
Industry
Experiments
Authority
Plagiarism
Software

All Science Journal Classification (ASJC) codes

  • Software
  • Management Information Systems
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Chae, Dong Kyu ; Ha, Jiwoon ; Kim, Sang Wook ; Kang, Boo Joong ; Im, Eul Gyu ; Park, Sunju. / Credible, resilient, and scalable detection of software plagiarism using authority histograms. In: Knowledge-Based Systems. 2016 ; Vol. 95. pp. 114-124.
@article{349bc3e2018d4690b28a1dc0c7fccf89,
title = "Credible, resilient, and scalable detection of software plagiarism using authority histograms",
abstract = "Software plagiarism has become a serious threat to the health of software industry. A software birthmark indicates unique characteristics of a program that can be used to analyze the similarity between two programs and provide proof of plagiarism. In this paper, we propose a novel birthmark, Authority Histograms (AH), which can satisfy three essential requirements for good birthmarks - resiliency, credibility, and scalability. Existing birthmarks fail to satisfy all of them simultaneously. AH reflects not only the frequency of APIs, but also their call orders, whereas previous birthmarks rarely consider them together. This property provides more accurate plagiarism detection, making our birthmark more resilient and credible than previously proposed birthmarks. By random walk with restart when generating AH, we make our proposal fully applicable to even large programs. Extensive experiments with a set of Windows applications verify that both the credibility and resiliency of AH exceed those of existing birthmarks; therefore AH provides improved accuracy in detecting plagiarism. Moreover, the construction and comparison phases of AH are established within a reasonable time.",
author = "Chae, {Dong Kyu} and Jiwoon Ha and Kim, {Sang Wook} and Kang, {Boo Joong} and Im, {Eul Gyu} and Sunju Park",
year = "2016",
month = "3",
day = "1",
doi = "10.1016/j.knosys.2015.12.009",
language = "English",
volume = "95",
pages = "114--124",
journal = "Knowledge-Based Systems",
issn = "0950-7051",
publisher = "Elsevier"
}

TY - JOUR

T1 - Credible, resilient, and scalable detection of software plagiarism using authority histograms

AU - Chae, Dong Kyu

AU - Ha, Jiwoon

AU - Kim, Sang Wook

AU - Kang, Boo Joong

AU - Im, Eul Gyu

AU - Park, Sunju

PY - 2016/3/1

Y1 - 2016/3/1

AB - Software plagiarism has become a serious threat to the health of software industry. A software birthmark indicates unique characteristics of a program that can be used to analyze the similarity between two programs and provide proof of plagiarism. In this paper, we propose a novel birthmark, Authority Histograms (AH), which can satisfy three essential requirements for good birthmarks - resiliency, credibility, and scalability. Existing birthmarks fail to satisfy all of them simultaneously. AH reflects not only the frequency of APIs, but also their call orders, whereas previous birthmarks rarely consider them together. This property provides more accurate plagiarism detection, making our birthmark more resilient and credible than previously proposed birthmarks. By random walk with restart when generating AH, we make our proposal fully applicable to even large programs. Extensive experiments with a set of Windows applications verify that both the credibility and resiliency of AH exceed those of existing birthmarks; therefore AH provides improved accuracy in detecting plagiarism. Moreover, the construction and comparison phases of AH are established within a reasonable time.

UR - http://www.scopus.com/inward/record.url?scp=84957727746&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957727746&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2015.12.009

DO - 10.1016/j.knosys.2015.12.009

M3 - Article

AN - SCOPUS:84957727746

VL - 95

SP - 114

EP - 124

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

SN - 0950-7051

ER -