TY - GEN
T1 - An incremental document clustering for the large document database
AU - Joo, Kil Hong
AU - Lee, Won Suk
PY - 2005
Y1 - 2005
N2 - With the development of the internet and computers, the amount of information available through the internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for effectively managing large numbers of documents is necessary. Document clustering groups documents by subject by classifying a set of documents according to their mutual similarity. Accordingly, document clustering can be used for exploring and searching documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering algorithm for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, to improve the correctness of the clustering, stop-word removal is proposed, and the weight of each word is calculated by the proposed TF×NIDF function. The performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
AB - With the development of the internet and computers, the amount of information available through the internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for effectively managing large numbers of documents is necessary. Document clustering groups documents by subject by classifying a set of documents according to their mutual similarity. Accordingly, document clustering can be used for exploring and searching documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering algorithm for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, to improve the correctness of the clustering, stop-word removal is proposed, and the weight of each word is calculated by the proposed TF×NIDF function. The performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
UR - http://www.scopus.com/inward/record.url?scp=33646140773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646140773&partnerID=8YFLogxK
U2 - 10.1007/11562382_29
DO - 10.1007/11562382_29
M3 - Conference contribution
AN - SCOPUS:33646140773
SN - 3540291865
SN - 9783540291862
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 374
EP - 387
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 2nd Asia Information Retrieval Symposium, AIRS 2005
Y2 - 13 October 2005 through 15 October 2005
ER -