Document clustering by semantic smoothing and Dynamic Growing Cell Structure (DynGCS) for biomedical literature

Min Song, Xiaohua Hu, Illhoi Yoo, Eric Koppel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The general goal of clustering is to group data elements such that intra-group similarities are high and inter-group similarities are low. In this paper, we propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Semantic smoothing has recently been reported to enhance retrieval quality in Information Retrieval (IR). Inspired by that result, we apply the context-sensitive semantic smoothing model to boost the accuracy of the clustering generated by a dynamic growing cell structure algorithm, a variant of the neural network technique. We evaluated the proposed technique on article sets from MEDLINE, the largest digital library in biomedicine. Our experimental evaluations show that the proposed algorithm significantly improves clustering quality over traditional clustering techniques.
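The clustering goal stated at the start of the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the paper's method: it assumes documents are plain term-count vectors compared by cosine similarity, and the helper names (`cosine`, `intra_inter_similarity`) and the toy data are hypothetical. It averages similarity over same-cluster pairs and over cross-cluster pairs, so a good clustering should score higher on the first than on the second.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def intra_inter_similarity(vectors, labels):
    """Return (mean same-cluster similarity, mean cross-cluster similarity)."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if labels[i] == labels[j]:
            intra.append(sim)
        else:
            inter.append(sim)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


# Toy term-count vectors, made up for illustration (not MEDLINE data).
docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1], [0, 0, 1, 3]]
labels = [0, 0, 1, 1]  # hypothetical cluster assignment
intra, inter = intra_inter_similarity(docs, labels)
print(f"intra={intra:.3f} inter={inter:.3f}")
```

In this toy case the two clusters share no terms, so the cross-cluster average is zero while the same-cluster average is positive. Real document collections rarely separate this cleanly, which is the sparsity problem that smoothing of document models is meant to ease.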

Original language: English
Title of host publication: Data Warehousing and Knowledge Discovery - 10th International Conference, DaWaK 2008, Proceedings
Pages: 217-226
Number of pages: 10
Publication status: Published - 2008
Event: 10th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2008 - Turin, Italy
Duration: 2008 Sept 2 to 2008 Sept 5

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2008
Country/Territory: Italy
City: Turin
Period: 08/9/2 to 08/9/5

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
