Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification

Sangheon Lee, Wooju Kim

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

In recent decades, analyzing the sentiments in online customer reviews has become important to many businesses and researchers. However, an insufficient amount of labeled training data is a bottleneck for machine learning approaches. Self-training is a promising semi-supervised technique that does not require large amounts of labeled data. However, self-training suffers from incorrect labeling in addition to the shortage of labeled data. This study proposes a semi-supervised learning framework that adds only confidently predicted data to the training corpus in order to enrich the initial classifier in self-training. The experimental results indicate that the proposed method performs better than self-training.
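As an illustration only, the general idea of self-training with a confidence cutoff can be sketched as follows. This is a generic textbook-style loop, not the authors' framework: the tiny naive Bayes classifier, the function names (`train_nb`, `predict_proba`, `self_train`), the 0.75 threshold, and the toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a tiny multinomial naive Bayes model on whitespace-tokenized text."""
    word_counts = {c: Counter() for c in set(labels)}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_proba(model, doc):
    """Return normalized class probabilities for one document."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for c in class_counts:
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in doc.split():
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        log_scores[c] = score
    # Convert log scores to probabilities in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def self_train(labeled_docs, labels, unlabeled_docs, threshold=0.75, max_rounds=10):
    """Self-training that adds only confidently predicted documents.

    The threshold value is an assumption chosen for this toy example.
    """
    docs, labs, pool = list(labeled_docs), list(labels), list(unlabeled_docs)
    for _ in range(max_rounds):
        model = train_nb(docs, labs)
        remaining, added = [], 0
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                # Confident prediction: treat it as a label and keep the document.
                docs.append(doc)
                labs.append(best)
                added += 1
            else:
                remaining.append(doc)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or nothing is left to label
    return train_nb(docs, labs), docs, labs, pool


if __name__ == "__main__":
    model, docs, labs, pool = self_train(
        ["great product love it", "terrible quality hate it"],
        ["pos", "neg"],
        ["love this great phone", "hate this terrible phone", "arrived on tuesday"],
    )
    print(labs)  # labels after self-training, including the pseudo-labels
    print(pool)  # documents the classifier was never confident about
```

Documents whose best class probability stays below the threshold remain unlabeled, which is one simple way to limit the incorrect-labeling problem the abstract mentions.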

Original language: English
Pages (from-to): 35-49
Number of pages: 15
Journal: Electronic Commerce Research and Applications
Volume: 26
DOI: 10.1016/j.elerap.2017.09.006
Publication status: Published - 2017 Nov 1

Fingerprint

Labeling
Supervised learning
Learning systems
Sentiment
Sentiment classification
Classifiers
Industry

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Networks and Communications
  • Marketing
  • Management of Technology and Innovation

Cite this

@article{deab06f32f90417d9d5f5c808d00927f,
title = "Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification",
abstract = "In recent decades, analyzing the sentiments in online customer reviews has become important to many businesses and researchers. However, insufficient amount of labeled training corpus is a bottleneck for machine learning approaches. Self-training is one of the promising semi-supervised techniques which does not require large amounts of labeled data. However, self-training also suffers from an incorrect labeling problem along with insufficient amount of labeled data. This study proposed a semi-supervised learning framework that adds only confidently predicted data to the training corpus in order to enrich the initial classifier in self-training. The experimental results indicate that the proposed method performed better than self-training.",
author = "Sangheon Lee and Wooju Kim",
year = "2017",
month = "11",
day = "1",
doi = "10.1016/j.elerap.2017.09.006",
language = "English",
volume = "26",
pages = "35--49",
journal = "Electronic Commerce Research and Applications",
issn = "1567-4223",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification
AU  - Lee, Sangheon
AU  - Kim, Wooju
PY  - 2017/11/1
Y1  - 2017/11/1
N2  - In recent decades, analyzing the sentiments in online customer reviews has become important to many businesses and researchers. However, insufficient amount of labeled training corpus is a bottleneck for machine learning approaches. Self-training is one of the promising semi-supervised techniques which does not require large amounts of labeled data. However, self-training also suffers from an incorrect labeling problem along with insufficient amount of labeled data. This study proposed a semi-supervised learning framework that adds only confidently predicted data to the training corpus in order to enrich the initial classifier in self-training. The experimental results indicate that the proposed method performed better than self-training.
AB  - In recent decades, analyzing the sentiments in online customer reviews has become important to many businesses and researchers. However, insufficient amount of labeled training corpus is a bottleneck for machine learning approaches. Self-training is one of the promising semi-supervised techniques which does not require large amounts of labeled data. However, self-training also suffers from an incorrect labeling problem along with insufficient amount of labeled data. This study proposed a semi-supervised learning framework that adds only confidently predicted data to the training corpus in order to enrich the initial classifier in self-training. The experimental results indicate that the proposed method performed better than self-training.
UR  - http://www.scopus.com/inward/record.url?scp=85030475861&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85030475861&partnerID=8YFLogxK
U2  - 10.1016/j.elerap.2017.09.006
DO  - 10.1016/j.elerap.2017.09.006
M3  - Article
VL  - 26
SP  - 35
EP  - 49
JO  - Electronic Commerce Research and Applications
JF  - Electronic Commerce Research and Applications
SN  - 1567-4223
ER  -