Instant code clone search

Mu Woong Lee, Jong Won Roh, Seungwon Hwang, Sunghun Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

In this paper, we propose a scalable instant code clone search engine for large-scale software repositories. While there are commercial code search engines available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone searches take a "post-mortem" approach involving the detection of clones "after" the code development is completed, and hence, fail to return the results instantly. In clear contrast, we combine the strength of these two lines of existing research, by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures on vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among the 1.7 million code segments from 492 open source projects in sub-second response times, without compromising the accuracy obtained by a state-of-the-art tool.
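The idea of indexing vector abstractions of code can be illustrated with a minimal sketch. It is not the paper's actual indexing structure, which the abstract does not describe. As assumptions for illustration only, it abstracts each code segment into a vector of AST node-type counts using Python's `ast` module, compares vectors by L1 distance, and uses a hypothetical `SizeBucketIndex` that groups vectors by total node count so that a query inspects only buckets that could contain a match.

```python
import ast
from collections import Counter, defaultdict


def characteristic_vector(source: str) -> Counter:
    """Abstract a code segment into counts of AST node types (illustrative choice)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def l1_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over all node types."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


class SizeBucketIndex:
    """Hypothetical index: buckets vectors by their L1 norm (total node count)."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, name: str, source: str) -> None:
        vec = characteristic_vector(source)
        self.buckets[sum(vec.values())].append((name, vec))

    def search(self, source: str, threshold: int):
        query = characteristic_vector(source)
        size = sum(query.values())
        hits = []
        # Since | |a|_1 - |b|_1 | <= |a - b|_1, a vector whose size differs
        # from the query's by more than the threshold cannot be a match,
        # so only the buckets in this range need to be scanned.
        for s in range(max(0, size - threshold), size + threshold + 1):
            for name, vec in self.buckets.get(s, ()):
                d = l1_distance(query, vec)
                if d <= threshold:
                    hits.append((name, d))
        return sorted(hits, key=lambda h: h[1])


# Usage: a renamed copy of an indexed function is found; unrelated code is not.
index = SizeBucketIndex()
index.add("sum_list", "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n")
index.add("greet", "def greet(name):\n    print('hello', name)\n")
hits = index.search(
    "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n",
    threshold=2,
)
print(hits)
```

Because the vector ignores identifier names, the renamed query matches `sum_list` at distance 0, which a purely textual search would miss. A production index over millions of segments would need a far more selective structure than size buckets; the sketch only shows why a vector abstraction makes index-based pruning possible.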

Original language: English
Title of host publication: Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE-18
Pages: 167-176
Number of pages: 10
DOI: https://doi.org/10.1145/1882291.1882317
Publication status: Published - 2010 Dec 1
Event: 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE-18 - Santa Fe, NM, United States
Duration: 2010 Nov 7 to 2010 Nov 11

Publication series

Name: Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering

Other

Other: 18th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE-18
Country: United States
City: Santa Fe, NM
Period: 10/11/7 to 10/11/11

Fingerprint

Search engines
Semantics

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Lee, M. W., Roh, J. W., Hwang, S., & Kim, S. (2010). Instant code clone search. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE-18 (pp. 167-176). (Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering). https://doi.org/10.1145/1882291.1882317
@inproceedings{1be1898380974753aeb86959ab18fb2e,
title = "Instant code clone search",
author = "Lee, {Mu Woong} and Roh, {Jong Won} and Seungwon Hwang and Sunghun Kim",
year = "2010",
month = "12",
day = "1",
doi = "10.1145/1882291.1882317",
language = "English",
isbn = "9781605587912",
series = "Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering",
pages = "167--176",
booktitle = "Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE-18",

}


