Blog classification using K-means

Jun Lee Ki, Lee Myungjin, Kim Woqju

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With the recent exponential growth of blogs, a vast amount of important data has appeared on them. However, the dynamic, autonomous, and personal nature of blogs makes blog pages quite different from general web pages in many respects. This causes many problems that general search engines cannot handle properly. The problem we focus on in this study is that blog pages are inherently poorly organized and heavily duplicated, so blog search engines cannot help but return poorly organized, duplicated results. To solve this problem, we propose a blog classification method using K-means and present a blog search result reorganization approach based on this method. First, we review the current status and performance of blogs and blog search engines. Second, we adopt the K-means algorithm as a base algorithm and devise a blog title classification method to reorganize the blog titles returned by a search engine. Finally, we implement a prototype system of our algorithm, evaluate its effectiveness, and present a conclusion and directions for future work. We expect this algorithm to improve the usability of current blog search engines.
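
The abstract gives no implementation details, so the following is only a rough illustration of the general idea of grouping search-result titles with K-means. It is a minimal sketch under stated assumptions: whitespace tokenization, bag-of-words count vectors, squared Euclidean distance, initialization from the first k titles, and all function names (`vectorize`, `kmeans`, `group_titles`) are hypothetical choices for this example and are not taken from the paper.

```python
from collections import Counter


def vectorize(titles):
    """Map each title to a bag-of-words count vector over a shared vocabulary."""
    tokenized = [t.lower().split() for t in titles]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            v[index[word]] = float(count)
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-means; returns one cluster label per input vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Assumption: seed the centroids with the first k vectors so the
    # example is deterministic. Real systems usually use random or
    # k-means++ style seeding.
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_titles(titles, k):
    """Reorganize a flat list of titles into lists of clustered titles."""
    labels = kmeans(vectorize(titles), k)
    groups = {}
    for title, lab in zip(titles, labels):
        groups.setdefault(lab, []).append(title)
    return list(groups.values())


# Made-up titles standing in for blog search results.
titles = [
    "python kmeans tutorial",
    "best pasta recipe",
    "kmeans tutorial for python",
    "easy pasta recipe",
]
for group in group_titles(titles, 2):
    print(group)
```

With these four sample titles, the two clustering titles end up in one group and the two recipe titles in the other. Raw word counts are the simplest possible representation; a weighting scheme such as TF-IDF would be a natural substitute, but nothing in the abstract says which one the authors used.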

Original language: English
Title of host publication: ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
Pages: 61-67
Number of pages: 7
Publication status: Published - 2009 Dec 1
Event: ICEIS 2009 - 11th International Conference on Enterprise Information Systems - Milan, Italy
Duration: 2009 May 6 to 2009 May 10

Publication series

Name: ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
Volume: SAIC

Other

Other: ICEIS 2009 - 11th International Conference on Enterprise Information Systems
Country: Italy
City: Milan
Period: 09/5/6 to 09/5/10

Fingerprint

Blogs
Search engines
K-means
Websites

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management

Cite this

Ki, J. L., Myungjin, L., & Woqju, K. (2009). Blog classification using K-means. In ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings (pp. 61-67). (ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings; Vol. SAIC).
Ki, Jun Lee ; Myungjin, Lee ; Woqju, Kim. / Blog classification using K-means. ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings. 2009. pp. 61-67 (ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings).
@inproceedings{6b878debf00f433d90f19ffe3899aedc,
title = "Blog classification using K-means",
author = "Ki, {Jun Lee} and Lee Myungjin and Kim Woqju",
year = "2009",
month = "12",
day = "1",
language = "English",
isbn = "9789898111845",
series = "ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings",
pages = "61--67",
booktitle = "ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings",

}

Ki, JL, Myungjin, L & Woqju, K 2009, Blog classification using K-means. in ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings. ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings, vol. SAIC, pp. 61-67, ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Milan, Italy, 09/5/6.

TY - GEN

T1 - Blog classification using K-means

AU - Ki, Jun Lee

AU - Myungjin, Lee

AU - Woqju, Kim

PY - 2009/12/1

Y1 - 2009/12/1

UR - http://www.scopus.com/inward/record.url?scp=74549192024&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74549192024&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:74549192024

SN - 9789898111845

T3 - ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings

SP - 61

EP - 67

BT - ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings

ER -

Ki JL, Myungjin L, Woqju K. Blog classification using K-means. In ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings. 2009. p. 61-67. (ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings).