A dynamic and semantically-aware technique for document clustering in biomedical literature

Min Song, Xiaohua Hu, Illhoi Yoo, Eric Koppel

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

As an unsupervised learning process, document clustering has been used to improve information retrieval performance by grouping similar documents and to help text mining approaches by providing a high-quality input for them. In this article, the authors propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Recently, it has been reported that the semantic smoothing model enhances the retrieval quality in Information Retrieval (IR). Inspired by that, the authors developed and applied a context-sensitive semantic smoothing model to boost accuracy of clustering that is generated by a dynamic growing cell structure algorithm, a variation of the neural network technique. They evaluated the proposed technique on biomedical article sets from MEDLINE, the largest biomedical digital library in the world. Their experimental evaluations show that the proposed algorithm significantly improves the clustering quality over the traditional clustering techniques including k-means and self-organizing map (SOM).
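The abstract names semantic smoothing of document models but gives no formulas. As a rough illustration only, the sketch below assumes the common language-modeling form in which a document's own word distribution is linearly mixed with a translation model built from topic signatures found in the document. The function names, the mixing weight `lam`, and the toy translation table are all hypothetical; this is not the authors' implementation and it does not cover the growing cell structure clustering step.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood distribution over the items in `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def smoothed_model(tokens, signatures, translation, lam=0.5):
    """Mix a document's word distribution with a translation model.

    tokens:      words of the document
    signatures:  topic signatures (e.g. concepts) found in the document
    translation: dict mapping a signature t to a distribution p(w | t)
                 (assumed here to cover every signature in the document)
    lam:         weight given to the translation component (assumed value)
    """
    p_ml = unigram_model(tokens)        # p_ml(w | d)
    p_sig = unigram_model(signatures)   # p(t | d)

    # p_trans(w | d) = sum over t of p(w | t) * p(t | d)
    p_trans = Counter()
    for t, p_t in p_sig.items():
        for w, p_w in translation.get(t, {}).items():
            p_trans[w] += p_w * p_t

    vocab = set(p_ml) | set(p_trans)
    return {
        w: (1 - lam) * p_ml.get(w, 0.0) + lam * p_trans.get(w, 0.0)
        for w in vocab
    }

# Toy data, invented for illustration.
tokens = ["cancer", "therapy", "cancer", "trial"]
signatures = ["neoplasm"]
translation = {"neoplasm": {"cancer": 0.5, "tumor": 0.5}}

model = smoothed_model(tokens, signatures, translation, lam=0.4)
```

In this toy case the word "tumor" never occurs in the document, yet it receives non-zero probability through the "neoplasm" signature. That is the intuition behind smoothing before clustering: documents that use different words for related concepts end up with closer models.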

Original language: English
Pages (from-to): 44-57
Number of pages: 14
Journal: International Journal of Data Warehousing and Mining
Volume: 5
Issue number: 4
DOI: 10.4018/jdwm.2009080703
Publication status: Published - 2009 Oct 1

Fingerprint

Semantics
Information retrieval
Neural networks
Unsupervised learning
Digital libraries
Self organizing maps

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture

Cite this

@article{70d7fa84552f434d9330c8aef2aa4fa7,
title = "A dynamic and semantically-aware technique for document clustering in biomedical literature",
abstract = "As an unsupervised learning process, document clustering has been used to improve information retrieval performance by grouping similar documents and to help text mining approaches by providing a high-quality input for them. In this article, the authors propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Recently, it has been reported that the semantic smoothing model enhances the retrieval quality in Information Retrieval (IR). Inspired by that, the authors developed and applied a context-sensitive semantic smoothing model to boost accuracy of clustering that is generated by a dynamic growing cell structure algorithm, a variation of the neural network technique. They evaluated the proposed technique on biomedical article sets from MEDLINE, the largest biomedical digital library in the world. Their experimental evaluations show that the proposed algorithm significantly improves the clustering quality over the traditional clustering techniques including k-means and self-organizing map (SOM).",
author = "Song, Min and Hu, Xiaohua and Yoo, Illhoi and Koppel, Eric",
year = "2009",
month = "10",
day = "1",
doi = "10.4018/jdwm.2009080703",
language = "English",
volume = "5",
pages = "44--57",
journal = "International Journal of Data Warehousing and Mining",
issn = "1548-3924",
publisher = "IGI Publishing",
number = "4",
}

A dynamic and semantically-aware technique for document clustering in biomedical literature. / Song, Min; Hu, Xiaohua; Yoo, Illhoi; Koppel, Eric.

In: International Journal of Data Warehousing and Mining, Vol. 5, No. 4, 01.10.2009, p. 44-57.

TY - JOUR

T1 - A dynamic and semantically-aware technique for document clustering in biomedical literature

AU - Song, Min

AU - Hu, Xiaohua

AU - Yoo, Illhoi

AU - Koppel, Eric

PY - 2009/10/1

Y1 - 2009/10/1

AB - As an unsupervised learning process, document clustering has been used to improve information retrieval performance by grouping similar documents and to help text mining approaches by providing a high-quality input for them. In this article, the authors propose a novel hybrid clustering technique that incorporates semantic smoothing of document models into a neural network framework. Recently, it has been reported that the semantic smoothing model enhances the retrieval quality in Information Retrieval (IR). Inspired by that, the authors developed and applied a context-sensitive semantic smoothing model to boost accuracy of clustering that is generated by a dynamic growing cell structure algorithm, a variation of the neural network technique. They evaluated the proposed technique on biomedical article sets from MEDLINE, the largest biomedical digital library in the world. Their experimental evaluations show that the proposed algorithm significantly improves the clustering quality over the traditional clustering techniques including k-means and self-organizing map (SOM).

UR - http://www.scopus.com/inward/record.url?scp=70350064287&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350064287&partnerID=8YFLogxK

U2 - 10.4018/jdwm.2009080703

DO - 10.4018/jdwm.2009080703

M3 - Article

AN - SCOPUS:70350064287

VL - 5

SP - 44

EP - 57

JO - International Journal of Data Warehousing and Mining

JF - International Journal of Data Warehousing and Mining

SN - 1548-3924

IS - 4

ER -