An efficient algorithm for clustering XML schemas

Tae Woo Rhim, Kyong Ho Lee, Myeong Cheol Ko

Research output: Contribution to journal (Article)

Abstract

Schema clustering is an important prerequisite to the integration of XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes the similarity between each pair of schemas. Similarity is defined by the size of the structure common to two schemas, under the assumption that schemas that cost less to integrate are more similar. Specifically, we extract the one-to-one matchings between paths that have the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the similarity values. Experimental results on many XML schemas show that the method outperforms previous work, achieving a precision of 98% and a clustering rate of 95% on average.
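
The pipeline described in the abstract (pairwise similarity from matched paths, then hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation, and every detail below is an assumption made for illustration: schemas are modelled as lists of root-to-leaf paths of element names, corresponding elements are counted with a longest common subsequence, the one-to-one matching is chosen greedily, the score is normalised by the larger schema, and clusters are merged by average linkage until a hypothetical `threshold` is no longer met.

```python
from itertools import combinations


def lcs_len(a, b):
    """Length of the longest common subsequence of two element-name paths."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def schema_similarity(s1, s2):
    """Greedy one-to-one path matching (an assumption); score lies in [0, 1]."""
    pairs = sorted(
        ((lcs_len(p, q), i, j) for i, p in enumerate(s1) for j, q in enumerate(s2)),
        reverse=True,
    )
    used1, used2, common = set(), set(), 0
    for score, i, j in pairs:
        # Take the best remaining pair whose paths are both still unmatched.
        if score and i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            common += score
    # Normalise by the element count of the larger schema (an assumption).
    size = max(sum(map(len, s1)), sum(map(len, s2)))
    return common / size if size else 0.0


def cluster(schemas, threshold):
    """Average-linkage agglomerative clustering over pairwise similarities."""
    names = list(schemas)
    sim = {
        frozenset((a, b)): schema_similarity(schemas[a], schemas[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy schemas, each a list of root-to-leaf element paths.
schemas = {
    "order_a": [("order", "customer", "name"), ("order", "item", "price")],
    "order_b": [("order", "customer", "name"), ("order", "item", "qty")],
    "book": [("book", "title"), ("book", "author", "name")],
}
print(cluster(schemas, threshold=0.5))
```

On this toy input the two order schemas share five of six path elements and are merged, while the book schema stays in its own cluster. An optimal assignment could replace the greedy matching if the best possible one-to-one matching is required.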

Original language: English
Pages (from-to): 372-377
Number of pages: 6
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3306
Publication status: Published - 2004 Dec 1

Fingerprint

XML Schema
XML
Schema
Efficient Algorithms
Clustering
Hierarchical Clustering
Clustering Methods
Path
Costs
Experimental Results
Similarity

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

@article{78cba1c8c6dc4c78a288bb7863a75e62,
title = "An efficient algorithm for clustering XML schemas",
abstract = "Schema clustering is important as a prerequisite to the integration of XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes similarities among schemas. The similarity is defined by the size of the common structure between two schemas under the assumption that the schemas with less cost to be integrated are more similar. Specifically, we extract one-to-one matchings between paths with the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the value of similarity. Experimental results with many XML schemas show that the method has performed better compared with previous works, resulting in a precision of 98{\%} and a rate of clustering of 95{\%} in average.",
author = "Rhim, {Tae Woo} and Lee, {Kyong Ho} and Ko, {Myeong Cheol}",
year = "2004",
month = "12",
day = "1",
language = "English",
volume = "3306",
pages = "372--377",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - An efficient algorithm for clustering XML schemas

AU - Rhim, Tae Woo

AU - Lee, Kyong Ho

AU - Ko, Myeong Cheol

PY - 2004/12/1

Y1 - 2004/12/1

N2 - Schema clustering is important as a prerequisite to the integration of XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes similarities among schemas. The similarity is defined by the size of the common structure between two schemas under the assumption that the schemas with less cost to be integrated are more similar. Specifically, we extract one-to-one matchings between paths with the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the value of similarity. Experimental results with many XML schemas show that the method has performed better compared with previous works, resulting in a precision of 98% and a rate of clustering of 95% in average.

UR - http://www.scopus.com/inward/record.url?scp=35048864905&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35048864905&partnerID=8YFLogxK

M3 - Article

VL - 3306

SP - 372

EP - 377

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -