An incremental document clustering for the large document database

Kil Hong Joo, Won Suk Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

With the development of the Internet and computers, the amount of information available through the Internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for effectively managing a large number of documents is necessary. Document clustering groups documents by subject by classifying a set of documents according to the similarity among them. Accordingly, document clustering can be used for exploring and searching documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering algorithm for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, to improve the correctness of the clustering, stop words are removed and the weight of each word is calculated by the proposed TF×NIDF function. The performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
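The pipeline the abstract describes (stop-word removal, term weighting, assignment of new documents to previously identified clusters) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not define TF×NIDF, so the code substitutes a standard smoothed TF-IDF weight, and the stop-word list, the cosine-similarity threshold, and the names `IncrementalClusterer`, `tokenize`, and `tfidf_vector` are hypothetical rather than taken from the paper.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vector(tokens, doc_freq, n_docs):
    """Standard smoothed TF-IDF (NOT the paper's TF x NIDF, which is not given)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vec = {}
    for term, count in counts.items():
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vec[term] = (count / total) * idf
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class IncrementalClusterer:
    """Assigns each new document to the most similar existing cluster,
    or opens a new cluster when no similarity reaches the threshold."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold   # assumed cut-off, not from the paper
        self.centroids = []          # per cluster: summed term weights
        self.members = []            # per cluster: list of document ids
        self.doc_freq = Counter()    # term -> number of documents seen
        self.n_docs = 0

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.n_docs += 1
        self.doc_freq.update(set(tokens))
        vec = tfidf_vector(tokens, self.doc_freq, self.n_docs)

        # Find the existing cluster with the highest cosine similarity.
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best is None or best_sim < self.threshold:
            self.centroids.append(dict(vec))
            self.members.append([doc_id])
            return len(self.centroids) - 1

        # Fold the document into the chosen centroid; cosine similarity is
        # scale-invariant, so a running sum works as well as a mean.
        centroid = self.centroids[best]
        for term, weight in vec.items():
            centroid[term] = centroid.get(term, 0.0) + weight
        self.members[best].append(doc_id)
        return best
```

In this sketch, calling `add` once per incoming document returns the index of the cluster it joined. Existing centroids are never recomputed from scratch, which is the sense in which the assignment is incremental. Document frequencies drift as the collection grows, so earlier weights become slightly stale; how the paper handles that is not stated in the abstract.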

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 374-387
Number of pages: 14
DOIs: https://doi.org/10.1007/11562382_29
Publication status: Published - 2005 Dec 1
Event: 2nd Asia Information Retrieval Symposium, AIRS 2005 - Jeju Island, Korea, Republic of
Duration: 2005 Oct 13 → 2005 Oct 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3689 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd Asia Information Retrieval Symposium, AIRS 2005
Country: Korea, Republic of
City: Jeju Island
Period: 05/10/13 → 05/10/15

Fingerprint

Document Clustering
Clustering algorithms
Internet
Clustering Algorithm
Incremental Algorithm
Assign
Correctness
Clustering
Necessary
Series
Experiments
Experiment

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Joo, K. H., & Lee, W. S. (2005). An incremental document clustering for the large document database. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 374-387). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3689 LNCS). https://doi.org/10.1007/11562382_29
@inproceedings{bd643fc006af4c4381eac08c2c02fa04,
title = "An incremental document clustering for the large document database",
author = "Joo, {Kil Hong} and Lee, {Won Suk}",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/11562382_29",
language = "English",
isbn = "3540291865",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "374--387",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - An incremental document clustering for the large document database

AU - Joo, Kil Hong

AU - Lee, Won Suk

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33646140773&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646140773&partnerID=8YFLogxK

U2 - 10.1007/11562382_29

DO - 10.1007/11562382_29

M3 - Conference contribution

AN - SCOPUS:33646140773

SN - 3540291865

SN - 9783540291862

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 374

EP - 387

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -
