Surfacing code in the dark: an instant clone search approach

Jin Woo Park, Mu Woong Lee, Jong Won Roh, Seungwon Hwang, Sunghun Kim

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

In this paper, we study how to “surface” code for instant reference. The traditional way of surfacing code has been to treat code as text and apply keyword search techniques. However, much prior work has observed the limitations of this approach: (1) the semantic description of code is limited to comments and (2) syntactic keywords are often not selective enough. In contrast, we discuss enabling techniques and scenarios for instant semantic-based surfacing. For example, developers, during a development session, may reference existing code with similar semantics, using the code they have written so far as a query. In addition to such semantic-based surfacing, we also enhance keyword-based surfacing with semantics by instantly adding semantic tags to code submitted to the repository. To achieve this goal, we first propose scalable indexing structures on vector abstractions of code. Our experimental results show that our techniques outperform a state-of-the-art tool in efficiency without compromising accuracy. We then deploy our techniques in instant search and tagging scenarios: for the instant code search scenario, we demonstrate an instant clone search tool built on our techniques that supports sub-second search over 54 million LOC; for the instant code tagging scenario, we propose an automatic instant code tagging algorithm that mines meaningful tags from clones.
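The abstract does not spell out the vector abstraction or the index, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical abstraction (counts of Python AST node types), cosine similarity, and a plain linear scan in place of the scalable indexing structures the paper proposes. The names `vectorize`, `search`, and `suggest_tags` are illustrative, and the tag step simply borrows tags from the nearest clones.

```python
import ast
from collections import Counter
from math import sqrt


def vectorize(source: str) -> Counter:
    """Assumed abstraction: count AST node types, ignoring identifier names."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, corpus: dict, k: int = 1) -> list:
    """Return the k most similar corpus entries as (score, name) pairs.

    This is a linear scan; a real system would replace it with an index.
    """
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(src)), name)
              for name, src in corpus.items()]
    return sorted(scored, reverse=True)[:k]


def suggest_tags(query: str, corpus: dict, tags: dict, k: int = 1) -> list:
    """Borrow the most frequent tags from the k nearest clones."""
    votes = Counter()
    for _score, name in search(query, corpus, k):
        votes.update(tags.get(name, []))
    return [tag for tag, _count in votes.most_common()]


# Hypothetical repository contents and tags, for illustration only.
corpus = {
    "sum_list": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "read_file": "def read(path):\n    with open(path) as f:\n        return f.read()\n",
}
tags = {"sum_list": ["aggregation", "loop"], "read_file": ["io"]}

# A query with renamed identifiers but the same structure as sum_list.
query = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"

best_score, best_name = search(query, corpus, k=1)[0]
suggested = suggest_tags(query, corpus, tags, k=1)
```

Because the assumed vectors ignore identifier names, the renamed query matches `sum_list` structurally, and the tags of that clone are proposed for the new code. Keyword search over the raw text would not find this match, since the two snippets share no identifiers.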

Original language: English
Pages (from-to): 727-759
Number of pages: 33
Journal: Knowledge and Information Systems
Volume: 41
Issue number: 3
DOI: 10.1007/s10115-013-0677-z
Publication status: Published - 2014 Nov 7

Fingerprint

Hard facing
Semantics
Syntactics

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Cite this

Park, Jin Woo; Lee, Mu Woong; Roh, Jong Won; Hwang, Seungwon; Kim, Sunghun. Surfacing code in the dark: an instant clone search approach. In: Knowledge and Information Systems. 2014; Vol. 41, No. 3. pp. 727-759.
@article{3f79f9ac44f2474794ad99ea72fe6e94,
title = "Surfacing code in the dark: an instant clone search approach",
author = "Park, {Jin Woo} and Lee, {Mu Woong} and Roh, {Jong Won} and Seungwon Hwang and Sunghun Kim",
year = "2014",
month = "11",
day = "7",
doi = "10.1007/s10115-013-0677-z",
language = "English",
volume = "41",
pages = "727--759",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "3",

}

TY - JOUR

T1 - Surfacing code in the dark

T2 - an instant clone search approach

AU - Park, Jin Woo

AU - Lee, Mu Woong

AU - Roh, Jong Won

AU - Hwang, Seungwon

AU - Kim, Sunghun

PY - 2014/11/7

Y1 - 2014/11/7

UR - http://www.scopus.com/inward/record.url?scp=84911951983&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911951983&partnerID=8YFLogxK

U2 - 10.1007/s10115-013-0677-z

DO - 10.1007/s10115-013-0677-z

M3 - Article

AN - SCOPUS:84911951983

VL - 41

SP - 727

EP - 759

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 3

ER -