An efficient algorithm to compute differences between structured documents

Kyong Ho Lee, Yoon Chul Choy, Sung-Bae Cho

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

SGML/XML are having a profound impact on data modeling and processing. This paper presents an efficient algorithm to compute differences between old and new versions of an SGML/XML document. The difference between the two versions can be considered to be an edit script that transforms one document tree into another. The proposed algorithm is based on a hybridization of bottom-up and top-down methods: The matching relationships between nodes in the two versions are produced in a bottom-up manner and then the top-down breadth-first search computes an edit script. Faster matching is achieved because the algorithm does not need to investigate the possible existence of matchings for all nodes. Furthermore, it can detect structurally meaningful changes such as the movement and copy of a subtree as well as simple changes to the node itself like insertion, deletion, and update.
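The abstract gives only the outline of the method, so the following Python sketch is an illustration of the general idea and not the authors' algorithm. It assumes a bottom-up pass that fingerprints every subtree, followed by a top-down breadth-first walk of the new version that emits edit operations; an identical subtree is matched whole, so its descendants are never examined. The names `Node`, `sign`, `subtree`, and `diff`, and the rule for pairing same-label nodes, are hypothetical choices made for this example, and the resulting script is not claimed to be minimal.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity-based comparison and hashing
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    sig: tuple = field(default=(), repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def subtree(node):
    """Yield the node and all of its descendants."""
    yield node
    for child in node.children:
        yield from subtree(child)


def sign(node, index):
    """Bottom-up pass: give each subtree a signature and index it."""
    node.sig = (node.label, node.text, tuple(sign(c, index) for c in node.children))
    index.setdefault(node.sig, []).append(node)
    return node.sig


def diff(old_root, new_root):
    """Return a list of (operation, label) pairs describing old -> new."""
    old_index = {}
    sign(old_root, old_index)
    sign(new_root, {})
    partner = {}  # id(new node) -> paired old node
    used = set()  # id(old node) for every old node already matched
    script = []
    queue = deque([new_root])
    while queue:  # top-down, breadth-first over the new version
        new = queue.popleft()
        old_parent = partner.get(id(new.parent))
        same = old_index.get(new.sig, [])
        free = [o for o in same if not any(id(n) in used for n in subtree(o))]
        if free:
            # Identical subtree: match it whole and skip its descendants.
            old = next((o for o in free if o.parent is old_parent), free[0])
            used.update(id(n) for n in subtree(old))
            if old.parent is not old_parent:
                script.append(("move", new.label))
            continue
        if same:  # identical subtree exists but is already matched elsewhere
            script.append(("copy", new.label))
            continue
        # No identical subtree: pair with a same-label node in the same place.
        if new.parent is None:
            pool = [old_root]
        else:
            pool = old_parent.children if old_parent is not None else []
        old = next((o for o in pool if o.label == new.label and id(o) not in used), None)
        if old is None:
            script.append(("insert", new.label))
        else:
            partner[id(new)] = old
            used.add(id(old))
            if old.text != new.text:
                script.append(("update", new.label))
        queue.extend(new.children)
    script += [("delete", o.label) for o in subtree(old_root) if id(o) not in used]
    return script


old_doc = Node("doc", children=[
    Node("title", "Diff"),
    Node("sec", children=[Node("p", "a"), Node("p", "b")]),
    Node("sec", children=[Node("p", "c")]),
])
new_doc = Node("doc", children=[
    Node("title", "Tree diff"),
    Node("sec", children=[Node("p", "a")]),
    Node("sec", children=[Node("p", "c"), Node("p", "b")]),
    Node("note", "n"),
])
print(diff(old_doc, new_doc))
# [('update', 'title'), ('insert', 'note'), ('move', 'p')]
```

In this toy run the paragraph "b" is reported as a single move between sections instead of a delete followed by an insert, which is the kind of structurally meaningful change the abstract refers to.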

Original language: English
Pages (from-to): 965-979
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 16
Issue number: 8
DOIs: 10.1109/TKDE.2004.19
Publication status: Published - 2004 Aug 1

Fingerprint

SGML
XML
Data structures

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

@article{8bc6b6476cf745989a338bde86d27542,
title = "An efficient algorithm to compute differences between structured documents",
author = "Lee, {Kyong Ho} and Choy, {Yoon Chul} and Cho, Sung-Bae",
year = "2004",
month = "8",
day = "1",
doi = "10.1109/TKDE.2004.19",
language = "English",
volume = "16",
pages = "965--979",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "8",
}

An efficient algorithm to compute differences between structured documents. / Lee, Kyong Ho; Choy, Yoon Chul; Cho, Sung-Bae.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 8, 01.08.2004, p. 965-979.

TY - JOUR

T1 - An efficient algorithm to compute differences between structured documents

AU - Lee, Kyong Ho

AU - Choy, Yoon Chul

AU - Cho, Sung-Bae

PY - 2004/8/1

Y1 - 2004/8/1

UR - http://www.scopus.com/inward/record.url?scp=4344572115&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4344572115&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2004.19

DO - 10.1109/TKDE.2004.19

M3 - Article

AN - SCOPUS:4344572115

VL - 16

SP - 965

EP - 979

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 8

ER -