Document clustering by semantic smoothing and Dynamic Growing Cell Structure (DynGCS) for biomedical literature

Min Song, Xiaohua Hu, Illhoi Yoo, Eric Koppel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The general goal of clustering is to group data elements so that intra-group similarities are high and inter-group similarities are low. In this paper, we propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. The semantic smoothing model has recently been reported to enhance retrieval quality in Information Retrieval (IR). Inspired by that result, we apply the context-sensitive semantic smoothing model to boost the accuracy of the clustering generated by a dynamic growing cell structure algorithm, a variant of the neural network technique. We evaluated the proposed technique on article sets from MEDLINE, the largest digital library in biomedicine. Our experimental evaluations show that the proposed algorithm significantly improves clustering quality over traditional clustering techniques.
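The abstract does not spell out the smoothing formula, so the following is only a minimal sketch of one common form of semantic smoothing, not the authors' implementation. It assumes a linear interpolation between a document's maximum-likelihood unigram model and a topic-translation model; the function names, the `translation` table, the interpolation weight `lam`, and the toy data are all hypothetical.

```python
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model: p(w|d) = count(w, d) / |d|."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smooth(doc_tokens, doc_topics, translation, lam=0.5):
    """Interpolate the word model with a topic-translation model.

    Assumed form (not taken from the paper):
        p(w|d) = (1 - lam) * p_ml(w|d) + lam * sum_k p(w|t_k) * p_ml(t_k|d)
    Every topic in doc_topics is assumed to have an entry in `translation`
    whose probabilities sum to 1; otherwise probability mass is lost.
    """
    p_ml = ml_model(doc_tokens)
    p_topic = ml_model(doc_topics)
    p_trans = {}
    for topic, p_tk in p_topic.items():
        for word, p_w in translation.get(topic, {}).items():
            p_trans[word] = p_trans.get(word, 0.0) + p_w * p_tk
    vocab = set(p_ml) | set(p_trans)
    return {
        w: (1 - lam) * p_ml.get(w, 0.0) + lam * p_trans.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy data: one document, one extracted topic term.
translation = {"neoplasm": {"tumor": 0.5, "cancer": 0.5}}
model = smooth(["tumor", "growth", "tumor"], ["neoplasm"], translation)
print(sorted(model.items()))
```

In this toy case the word "cancer" receives non-zero probability even though it never occurs in the document, which is the effect smoothing is meant to have before documents are compared by a clustering algorithm.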

Original language: English
Title of host publication: Data Warehousing and Knowledge Discovery - 10th International Conference, DaWaK 2008, Proceedings
Pages: 217-226
Number of pages: 10
DOIs: https://doi.org/10.1007/978-3-540-85836-2_21
Publication status: Published - 2008 Oct 6
Event: 10th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2008 - Turin, Italy
Duration: 2008 Sep 2 to 2008 Sep 5

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2008
Country: Italy
City: Turin
Period: 08/9/2 to 08/9/5

Fingerprint

Document Clustering
Smoothing
Semantics
Clustering
Cell
Neural networks
Digital libraries
Information retrieval
Experimental Evaluation
Retrieval
Model

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Song, M., Hu, X., Yoo, I., & Koppel, E. (2008). Document clustering by semantic smoothing and Dynamic Growing Cell Structure (DynGCS) for biomedical literature. In Data Warehousing and Knowledge Discovery - 10th International Conference, DaWaK 2008, Proceedings (pp. 217-226). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5182 LNCS). https://doi.org/10.1007/978-3-540-85836-2_21
@inproceedings{3d1b8f467382437e8ce5f697e6453b8e,
title = "Document clustering by semantic smoothing and Dynamic Growing Cell Structure (DynGCS) for biomedical literature",
author = "Min Song and Xiaohua Hu and Illhoi Yoo and Eric Koppel",
year = "2008",
month = "10",
day = "6",
doi = "10.1007/978-3-540-85836-2_21",
language = "English",
isbn = "3540858350",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "217--226",
booktitle = "Data Warehousing and Knowledge Discovery - 10th International Conference, DaWaK 2008, Proceedings",

}

TY - GEN

T1 - Document clustering by semantic smoothing and Dynamic Growing Cell Structure (DynGCS) for biomedical literature

AU - Song, Min

AU - Hu, Xiaohua

AU - Yoo, Illhoi

AU - Koppel, Eric

PY - 2008/10/6

Y1 - 2008/10/6

UR - http://www.scopus.com/inward/record.url?scp=52949098384&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52949098384&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-85836-2_21

DO - 10.1007/978-3-540-85836-2_21

M3 - Conference contribution

AN - SCOPUS:52949098384

SN - 3540858350

SN - 9783540858355

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 217

EP - 226

BT - Data Warehousing and Knowledge Discovery - 10th International Conference, DaWaK 2008, Proceedings

ER -
