As a prerequisite for information integration, this paper presents an efficient method for clustering XML schemas. The proposed method first computes the similarity between each pair of schemas. Similarity is defined as the size of the common structure shared by two schemas, under the assumption that schemas that cost less to integrate are more similar. Specifically, we extract the one-to-one matching between paths that yields the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the similarity values. Experimental results with many XML schemas show that the method outperforms previous work in terms of clustering accuracy, clustering rate, clustering quality, and time complexity.
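The pipeline described above (pairwise similarity from matched paths, then hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how paths are compared, how the matching is found, or which linkage is used. The sketch assumes that each schema is given as a list of root-to-leaf paths of element names, that two paths are compared by their longest common subsequence, that the one-to-one matching is chosen greedily (which may not be optimal), and that clusters are merged by average linkage until similarity falls below a threshold. The names `lcs_len`, `schema_similarity`, `cluster`, and the sample schemas are all hypothetical.

```python
from itertools import combinations


def lcs_len(p, q):
    """Length of the longest common subsequence of two paths (assumed path measure)."""
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(q, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def schema_similarity(s, t):
    """Greedy one-to-one path matching; the score is normalized to [0, 1]."""
    pairs = sorted(
        ((lcs_len(p, q), i, j) for i, p in enumerate(s) for j, q in enumerate(t)),
        reverse=True,
    )
    used_s, used_t, common = set(), set(), 0
    for score, i, j in pairs:
        # Each path may take part in at most one matched pair.
        if score and i not in used_s and j not in used_t:
            used_s.add(i)
            used_t.add(j)
            common += score
    size = max(sum(map(len, s)), sum(map(len, t)))
    return common / size if size else 0.0


def cluster(schemas, threshold):
    """Average-linkage agglomerative clustering; stops when no pair reaches threshold."""
    names = list(schemas)
    sim = {
        frozenset((a, b)): schema_similarity(schemas[a], schemas[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def link(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (link(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical schemas, each reduced to its root-to-leaf element paths.
schemas = {
    "book_a": [("book", "title"), ("book", "author", "name")],
    "book_b": [("book", "title"), ("book", "author", "name"), ("book", "isbn")],
    "order": [("order", "item", "price"), ("order", "customer")],
}
result = cluster(schemas, threshold=0.5)
print(result)
```

With these sample schemas, the two book schemas share most of their path structure and end up in one cluster, while the order schema stays on its own. An exact maximum-weight matching (for example, the Hungarian algorithm) could replace the greedy step if optimality matters more than simplicity.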
|Number of pages||11|
|Journal||Journal of Computer Information Systems|
|Publication status||Published - 2005 Dec 1|
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Networks and Communications