Clustering of XML schemas for information integration

Tae Woo Rhim, Kyong H.O. Lee

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

As a prerequisite for information integration, this paper presents an efficient method for clustering XML schemas. The proposed method first computes the pairwise similarities among schemas. Similarity is defined by the size of the common structure between two schemas, under the assumption that schemas that are less costly to integrate are more similar. Specifically, we extract one-to-one matchings between the paths that have the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the similarity values. Experimental results on many XML schemas show that the method performs better than previous work in terms of clustering accuracy, clustering rate, clustering quality, and time complexity.
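
The abstract leaves the details of path matching and clustering unspecified, so the following Python code is only a minimal sketch of the overall pipeline under stated assumptions, not the authors' algorithm. It assumes that a schema is a list of root-to-leaf paths of element names; that two elements correspond only when their names are identical (so the count for a pair of paths is the length of their longest common subsequence); that the one-to-one path matching is chosen greedily by descending count; that similarity is the matched element count normalized by the two schemas' total size; and that the hierarchical step is agglomerative clustering with average linkage and a stopping threshold. The names `corresponding_elements`, `schema_similarity`, `cluster`, and `threshold` are hypothetical.

```python
from itertools import product

# Assumption: a schema is a list of root-to-leaf paths, each path a tuple
# of element names, e.g. ("book", "author", "name").
Schema = list[tuple[str, ...]]


def corresponding_elements(p: tuple[str, ...], q: tuple[str, ...]) -> int:
    """Hypothetical correspondence count between two paths.

    Assumes elements correspond only on identical names, in path order,
    so the count is the length of the longest common subsequence.
    """
    dp = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            if p[i - 1] == q[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def schema_similarity(a: Schema, b: Schema) -> float:
    """Greedy one-to-one path matching, normalized to [0, 1] (assumption)."""
    # Score every path pair, then consider the highest-scoring pairs first.
    scored = sorted(
        ((corresponding_elements(p, q), i, j)
         for (i, p), (j, q) in product(enumerate(a), enumerate(b))),
        reverse=True,
    )
    used_a, used_b, common = set(), set(), 0
    for score, i, j in scored:
        if score == 0:
            break  # remaining pairs share no elements
        if i not in used_a and j not in used_b:  # keep the matching one-to-one
            used_a.add(i)
            used_b.add(j)
            common += score
    # `common` counts matched elements on one side; doubling it compares
    # against the element total of both schemas.
    total = sum(map(len, a)) + sum(map(len, b))
    return 2 * common / total if total else 0.0


def cluster(schemas: dict[str, Schema], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering (an assumed variant).

    Merges the two most similar clusters until no pair reaches `threshold`.
    """
    names = sorted(schemas)
    sim = {(x, y): schema_similarity(schemas[x], schemas[y])
           for x in names for y in names if x < y}

    def pair_sim(x: str, y: str) -> float:
        return sim[(x, y)] if x < y else sim[(y, x)]

    clusters = [{n} for n in names]
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(pair_sim(x, y) for x in clusters[i] for y in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if avg > best:
                    best, pair = avg, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] |= clusters.pop(j)  # j > i, so index i is unaffected
    return clusters


# Toy input: books1 and books2 share most path elements; orders shares none.
schemas = {
    "books1": [("book", "title"), ("book", "author", "name")],
    "books2": [("book", "title"), ("book", "writer", "name")],
    "orders": [("order", "id"), ("order", "item", "price")],
}
print(cluster(schemas, threshold=0.5))
```

The greedy matching keeps the sketch short but can miss the maximum-weight one-to-one matching; an optimal assignment method (for example, the Hungarian algorithm) could be substituted without changing the rest of the pipeline.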

Original language: English
Pages (from-to): 3-13
Number of pages: 11
Journal: Journal of Computer Information Systems
Volume: 46
Issue number: 2
Publication status: Published - 2005 Dec 1

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Education
  • Computer Networks and Communications

@article{40708b38e5404539b512b4331bce9b02,
title = "Clustering of XML schemas for information integration",
abstract = "As a prerequisite for information integration, this paper presents an efficient method for clustering XML schemas. The proposed method first computes similarities among schemas. The similarity is defined by the size of the common structure between two schemas under the assumption that the schemas with less cost to be integrated are more similar. Specifically, we extract one-to-one matchings between paths with the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the values of similarity. Experimental results with many XML schemas show that the method has performed better compared with previous works in terms of the accuracy of clustering, the clustering rate, the quality of clustering, and the time complexity.",
author = "Rhim, {Tae Woo} and Lee, {Kyong H.O.}",
year = "2005",
month = "12",
day = "1",
language = "English",
volume = "46",
pages = "3--13",
journal = "Journal of Computer Information Systems",
issn = "0887-4417",
publisher = "International Association for Computer Information Systems",
number = "2",

}

Clustering of XML schemas for information integration. / Rhim, Tae Woo; Lee, Kyong H.O.

In: Journal of Computer Information Systems, Vol. 46, No. 2, 01.12.2005, p. 3-13.

TY  - JOUR
T1  - Clustering of XML schemas for information integration
AU  - Rhim, Tae Woo
AU  - Lee, Kyong H.O.
PY  - 2005/12/1
Y1  - 2005/12/1
AB  - As a prerequisite for information integration, this paper presents an efficient method for clustering XML schemas. The proposed method first computes similarities among schemas. The similarity is defined by the size of the common structure between two schemas under the assumption that the schemas with less cost to be integrated are more similar. Specifically, we extract one-to-one matchings between paths with the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the values of similarity. Experimental results with many XML schemas show that the method has performed better compared with previous works in terms of the accuracy of clustering, the clustering rate, the quality of clustering, and the time complexity.
UR  - http://www.scopus.com/inward/record.url?scp=33644505207&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33644505207&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:33644505207
VL  - 46
SP  - 3
EP  - 13
JO  - Journal of Computer Information Systems
JF  - Journal of Computer Information Systems
SN  - 0887-4417
IS  - 2
ER  - 