Opinion polarity detection in Twitter data combining shrinkage regression and topic modeling

Hyui Geon Yoon, Hyungjun Kim, Chang Ouk Kim, Min Song

Research output: Contribution to journal, Article

15 Citations (Scopus)

Abstract

We propose a method to analyze public opinion about political issues online by automatically detecting polarity in Twitter data. Previous studies have focused on the polarity classification of individual tweets. However, to understand the direction of public opinion on a political issue, it is important to analyze the degree of polarity on the major topics at the center of the discussion in addition to the individual tweets. The first stage of the proposed method detects polarity in tweets using the Lasso and Ridge models of shrinkage regression. The models are beneficial in that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics via a latent Dirichlet allocation (LDA) topic model and estimates the degree of polarity on the LDA topics using term sentiment scores. To the best of our knowledge, our study is the first to predict the polarities of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea and compared the total detection accuracy of the regression models with five support vector machine (SVM) models with different numbers of input terms selected by a feature selection algorithm. The results indicated that the performance of the Ridge model was approximately 7% higher on average than that of the SVM models. Additionally, the degree of polarity on the LDA topics estimated using the proposed method was compared with actual public opinion responses. The results showed that the polarity detection accuracy of the Lasso model was 83%, indicating that the proposed method was valid in most cases.
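
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tweets, the vocabulary, the topic-term weights, the names `fit_ridge` and `topic_polarity`, and the rule of aggregating term scores by topic-term probability are all assumptions made for illustration. Only the Ridge (L2) penalty is shown; a Lasso variant would use an L1 penalty instead.

```python
# Toy sketch: the tweets, labels, topic-term weights, and aggregation rule
# below are illustrative assumptions, not data or formulas from the paper.

VOCAB = ["support", "great", "corrupt", "fail"]

# Bag-of-words rows (one per tweet) with polarity labels: +1 positive, -1 negative.
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
y = [1.0, 1.0, -1.0, -1.0]


def fit_ridge(X, y, alpha=0.1, lr=0.1, epochs=2000):
    """Stage 1: ridge regression fitted by batch gradient descent (no intercept).

    Minimizes (1 / 2n) * sum((x_i . w - y_i)^2) + (alpha / 2) * ||w||^2.
    The fitted weights serve as per-term sentiment scores.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [alpha * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n  # gradient of the squared error
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def topic_polarity(term_probs, scores):
    """Stage 2 (assumed rule): probability-weighted sum of term sentiment scores."""
    return sum(p * scores.get(term, 0.0) for term, p in term_probs.items())


weights = fit_ridge(X, y)
term_scores = dict(zip(VOCAB, weights))

# Hypothetical topic-term distributions, standing in for the output of an LDA model.
TOPICS = {
    "topic_0": {"support": 0.6, "great": 0.3, "corrupt": 0.1, "fail": 0.0},
    "topic_1": {"support": 0.1, "great": 0.0, "corrupt": 0.4, "fail": 0.5},
}

polarity = {name: topic_polarity(probs, term_scores) for name, probs in TOPICS.items()}
for name, value in polarity.items():
    print(name, "positive" if value > 0 else "negative", round(value, 3))
```

In this sketch the regression weights double as term sentiment scores, which is the property the abstract highlights as the benefit of the shrinkage models over SVM classifiers.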

Original language: English
Pages (from-to): 634-644
Number of pages: 11
Journal: Journal of Informetrics
Volume: 10
Issue number: 2
DOI: 10.1016/j.joi.2016.03.006
Publication status: Published - 2016 May 1

Fingerprint

twitter
regression
public opinion
Support vector machines
South Korea
Feature extraction
election
experiment

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Library and Information Sciences

Cite this

@article{feab0b700b5945d098ff42875f2ad62c,
title = "Opinion polarity detection in Twitter data combining shrinkage regression and topic modeling",
abstract = "We propose a method to analyze public opinion about political issues online by automatically detecting polarity in Twitter data. Previous studies have focused on the polarity classification of individual tweets. However, to understand the direction of public opinion on a political issue, it is important to analyze the degree of polarity on the major topics at the center of the discussion in addition to the individual tweets. The first stage of the proposed method detects polarity in tweets using the Lasso and Ridge models of shrinkage regression. The models are beneficial in that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics via a latent Dirichlet analysis (LDA) topic model and estimates the degree of polarity on the LDA topics using term sentiment scores. To the best of our knowledge, our study is the first to predict the polarities of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea and compared the total detection accuracy of the regression models with five support vector machine (SVM) models with different numbers of input terms selected by a feature selection algorithm. The results indicated that the performance of the Ridge model was approximately 7{\%} higher on average than that of the SVM models. Additionally, the degree of polarity on the LDA topics estimated using the proposed method was compared with actual public opinion responses. The results showed that the polarity detection accuracy of the Lasso model was 83{\%}, indicating that the proposed method was valid in most cases.",
author = "Yoon, {Hyui Geon} and Hyungjun Kim and Kim, {Chang Ouk} and Min Song",
year = "2016",
month = "5",
day = "1",
doi = "10.1016/j.joi.2016.03.006",
language = "English",
volume = "10",
pages = "634--644",
journal = "Journal of Informetrics",
issn = "1751-1577",
publisher = "Elsevier BV",
number = "2",

}

Opinion polarity detection in Twitter data combining shrinkage regression and topic modeling. / Yoon, Hyui Geon; Kim, Hyungjun; Kim, Chang Ouk; Song, Min.

In: Journal of Informetrics, Vol. 10, No. 2, 01.05.2016, p. 634-644.

TY - JOUR

T1 - Opinion polarity detection in Twitter data combining shrinkage regression and topic modeling

AU - Yoon, Hyui Geon

AU - Kim, Hyungjun

AU - Kim, Chang Ouk

AU - Song, Min

PY - 2016/5/1

Y1 - 2016/5/1

N2 - We propose a method to analyze public opinion about political issues online by automatically detecting polarity in Twitter data. Previous studies have focused on the polarity classification of individual tweets. However, to understand the direction of public opinion on a political issue, it is important to analyze the degree of polarity on the major topics at the center of the discussion in addition to the individual tweets. The first stage of the proposed method detects polarity in tweets using the Lasso and Ridge models of shrinkage regression. The models are beneficial in that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics via a latent Dirichlet analysis (LDA) topic model and estimates the degree of polarity on the LDA topics using term sentiment scores. To the best of our knowledge, our study is the first to predict the polarities of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea and compared the total detection accuracy of the regression models with five support vector machine (SVM) models with different numbers of input terms selected by a feature selection algorithm. The results indicated that the performance of the Ridge model was approximately 7% higher on average than that of the SVM models. Additionally, the degree of polarity on the LDA topics estimated using the proposed method was compared with actual public opinion responses. The results showed that the polarity detection accuracy of the Lasso model was 83%, indicating that the proposed method was valid in most cases.

AB - We propose a method to analyze public opinion about political issues online by automatically detecting polarity in Twitter data. Previous studies have focused on the polarity classification of individual tweets. However, to understand the direction of public opinion on a political issue, it is important to analyze the degree of polarity on the major topics at the center of the discussion in addition to the individual tweets. The first stage of the proposed method detects polarity in tweets using the Lasso and Ridge models of shrinkage regression. The models are beneficial in that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics via a latent Dirichlet analysis (LDA) topic model and estimates the degree of polarity on the LDA topics using term sentiment scores. To the best of our knowledge, our study is the first to predict the polarities of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea and compared the total detection accuracy of the regression models with five support vector machine (SVM) models with different numbers of input terms selected by a feature selection algorithm. The results indicated that the performance of the Ridge model was approximately 7% higher on average than that of the SVM models. Additionally, the degree of polarity on the LDA topics estimated using the proposed method was compared with actual public opinion responses. The results showed that the polarity detection accuracy of the Lasso model was 83%, indicating that the proposed method was valid in most cases.

UR - http://www.scopus.com/inward/record.url?scp=84965180016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84965180016&partnerID=8YFLogxK

U2 - 10.1016/j.joi.2016.03.006

DO - 10.1016/j.joi.2016.03.006

M3 - Article

AN - SCOPUS:84965180016

VL - 10

SP - 634

EP - 644

JO - Journal of Informetrics

JF - Journal of Informetrics

SN - 1751-1577

IS - 2

ER -