An efficient algorithm for clustering XML schemas

Tae Woo Rhim, Kyong Ho Lee, Myeong Cheol Ko

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

Schema clustering is an important prerequisite to the integration of XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes similarities among schemas. Similarity is defined by the size of the common structure shared by two schemas, under the assumption that schemas that cost less to integrate are more similar. Specifically, we extract the one-to-one matchings between paths that have the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the similarity values. Experimental results on many XML schemas show that the method performs better than previous work, achieving on average a precision of 98% and a clustering rate of 95%.
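The pipeline described in the abstract (path matching, similarity, hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation; the abstract does not give the details, so several choices here are assumptions: each schema is modeled as a list of root-to-leaf paths of element names, "corresponding elements" between two paths are counted with a longest common subsequence, the one-to-one matching is chosen greedily, the score is normalized by the larger schema's total path length, and clusters are merged by average linkage until a hypothetical `threshold` is no longer met. All function names and sample schemas are illustrative.

```python
from itertools import combinations


def lcs_len(a, b):
    """Length of the longest common subsequence of two element paths."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(s1, s2):
    """Greedy one-to-one path matching (assumption), normalized to [0, 1]."""
    scored = sorted(
        ((lcs_len(p, q), i, j) for i, p in enumerate(s1) for j, q in enumerate(s2)),
        reverse=True,
    )
    used1, used2, common = set(), set(), 0
    for score, i, j in scored:
        # Each path may take part in at most one matching.
        if score and i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            common += score
    total = max(sum(map(len, s1)), sum(map(len, s2)))
    return common / total if total else 0.0


def cluster(schemas, threshold):
    """Average-linkage agglomerative clustering (assumed linkage and stop rule)."""
    names = list(schemas)
    sim = {
        frozenset((a, b)): similarity(schemas[a], schemas[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def link(c1, c2):
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the most similar pair of clusters; i < j by construction.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical schemas, each given as its root-to-leaf element paths.
schemas = {
    "book_a": [("book", "title"), ("book", "author", "name")],
    "book_b": [("book", "title"), ("book", "writer", "name")],
    "order": [("order", "item", "price"), ("order", "customer")],
}

print(similarity(schemas["book_a"], schemas["book_b"]))  # 0.8
print(cluster(schemas, threshold=0.5))  # [['book_a', 'book_b'], ['order']]
```

In this toy input the two book schemas share four of five path elements and are merged, while the order schema shares none and stays in its own cluster. A greedy matching is used only for brevity; an optimal assignment could give a different score on larger inputs.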

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Xiaofang Zhou, Maria E. Orlowska, Stanley Su, Mike P. Papazoglou, Keith G. Jeffery
Publisher: Springer Verlag
Pages: 372-377
Number of pages: 6
ISBN (Electronic): 3540238948, 9783540238942
DOI: https://doi.org/10.1007/978-3-540-30480-7_38
Publication status: Published - 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3306
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Rhim, T. W., Lee, K. H., & Ko, M. C. (2004). An efficient algorithm for clustering XML schemas. In X. Zhou, M. E. Orlowska, S. Su, M. P. Papazoglou, & K. G. Jeffery (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 372-377). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3306). Springer Verlag. https://doi.org/10.1007/978-3-540-30480-7_38