Proposing a missing data method for hospitality research on online customer reviews: An application of imputation approach

Jewoo Kim, Jongho Im

Research output: Contribution to journal › Article

Abstract

Purpose: This paper introduces a new multiple imputation method that can effectively manage missing values in online review data, allowing online review analysis to yield valid results by using all available data.

Design/methodology/approach: This study develops a missing data method based on multivariate imputation by chained equations (MICE) to generate imputed values for online reviews. Sentiment analysis is used to incorporate customers’ textual opinions as auxiliary information in the imputation procedure. To check the validity of the proposed method, the authors apply it to missing sub-ratings on hotel attributes in both simulated and real Honolulu hotel review data sets. The estimation results are compared with those of two other missing data techniques: listwise deletion and conventional multiple imputation that does not consider text reviews.

Findings: The simulation analysis shows that the proposed imputation method produces more efficient and less biased estimates than the other two techniques when text reviews are possibly associated with the rating scores and the response mechanism. When the method is applied to the real hotel review data, the findings show that the text sentiment-based propensity score can effectively explain the missingness of sub-ratings on hotel attributes, and the imputation method that incorporates those propensity scores yields better estimation results than the other techniques, as in the simulation analysis.

Originality/value: This study extends multiple imputation to online data, taking into account their spontaneous and unstructured nature. The new method makes fuller use of the observed online data while avoiding potential missing data problems.
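The abstract describes multiple imputation that uses review-text sentiment as auxiliary information. The following is a minimal Python sketch of that general idea, not the authors' procedure. It assumes a single incomplete sub-rating, a numeric sentiment score standing in for the output of a sentiment-analysis step, and a simulated missing-at-random mechanism in which reviews with negative text are less likely to include the sub-rating. All numbers and function names (`simulate`, `fit_ols`, `mi_mean`, `pool_rubin`) are hypothetical. Each imputation refits a linear model of rating on sentiment to a bootstrap sample of the observed cases and draws imputed values with residual noise; the m completed-data means are then pooled with Rubin's rules. With only one incomplete variable no chaining is needed, and the propensity-score component mentioned in the abstract is not shown.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def simulate(n, rng):
    """Hypothetical data: a sentiment score per review and a sub-rating that depends on it."""
    sentiment = [rng.gauss(0, 1) for _ in range(n)]
    rating = [3.5 + 0.8 * s + rng.gauss(0, 0.5) for s in sentiment]
    # Assumed MAR mechanism: reviews with negative text skip the sub-rating more often.
    observed = [rng.random() < (0.9 if s > 0 else 0.4) for s in sentiment]
    return sentiment, rating, observed


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled point estimate and total variance over m imputations."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    return q_bar, within + (1 + 1 / m) * between


def mi_mean(sentiment, rating, observed, m, rng):
    """Impute missing sub-ratings m times from a sentiment regression, then pool the means."""
    obs_idx = [i for i, o in enumerate(observed) if o]
    estimates, variances = [], []
    for _ in range(m):
        # Refit on a bootstrap sample of observed cases so imputations reflect parameter uncertainty.
        boot = [rng.choice(obs_idx) for _ in obs_idx]
        a, b, sd = fit_ols([sentiment[i] for i in boot], [rating[i] for i in boot])
        # Only observed ratings are reused; missing ones are drawn from the fitted model plus noise.
        completed = [
            rating[i] if observed[i] else a + b * sentiment[i] + rng.gauss(0, sd)
            for i in range(len(rating))
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


rng = random.Random(2018)
sentiment, rating, observed = simulate(5000, rng)
true_mean = statistics.fmean(rating)  # full-data benchmark, only knowable in a simulation
cc_mean = statistics.fmean([r for r, o in zip(rating, observed) if o])  # listwise deletion
mi_est, mi_var = mi_mean(sentiment, rating, observed, m=20, rng=rng)

print(f"full-data mean:     {true_mean:.3f}")
print(f"listwise deletion:  {cc_mean:.3f}")
print(f"MI with sentiment:  {mi_est:.3f} (SE {mi_var ** 0.5:.3f})")
```

Under this assumed mechanism, the listwise-deletion mean drifts upward because the retained reviews over-represent positive text, while the sentiment-based imputations land close to the full-data benchmark. The simulated ratings are continuous rather than bounded star scores, a simplification chosen to keep the imputation model linear.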

Original language: English
Pages (from-to): 3250-3267
Number of pages: 18
Journal: International Journal of Contemporary Hospitality Management
Volume: 30
Issue number: 11
DOI: 10.1108/IJCHM-10-2017-0708
Publication status: Published (12 November 2018)

All Science Journal Classification (ASJC) codes

  • Tourism, Leisure and Hospitality Management

Cite this

@article{68701d2cc8bf4c91b0c882d48d2e1f72,
title = "Proposing a missing data method for hospitality research on online customer reviews: An application of imputation approach",
abstract = "Purpose: This paper introduces a new multiple imputation method that can effectively manage missing values in online review data, allowing online review analysis to yield valid results by using all available data. Design/methodology/approach: This study develops a missing data method based on multivariate imputation by chained equations (MICE) to generate imputed values for online reviews. Sentiment analysis is used to incorporate customers’ textual opinions as auxiliary information in the imputation procedure. To check the validity of the proposed method, the authors apply it to missing sub-ratings on hotel attributes in both simulated and real Honolulu hotel review data sets. The estimation results are compared with those of two other missing data techniques: listwise deletion and conventional multiple imputation that does not consider text reviews. Findings: The simulation analysis shows that the proposed imputation method produces more efficient and less biased estimates than the other two techniques when text reviews are possibly associated with the rating scores and the response mechanism. When the method is applied to the real hotel review data, the findings show that the text sentiment-based propensity score can effectively explain the missingness of sub-ratings on hotel attributes, and the imputation method that incorporates those propensity scores yields better estimation results than the other techniques, as in the simulation analysis. Originality/value: This study extends multiple imputation to online data, taking into account their spontaneous and unstructured nature. The new method makes fuller use of the observed online data while avoiding potential missing data problems.",
author = "Jewoo Kim and Jongho Im",
year = "2018",
month = nov,
day = "12",
doi = "10.1108/IJCHM-10-2017-0708",
language = "English",
volume = "30",
pages = "3250--3267",
journal = "International Journal of Contemporary Hospitality Management",
issn = "0959-6119",
publisher = "Emerald Group Publishing Ltd.",
number = "11",

}

TY  - JOUR
T1  - Proposing a missing data method for hospitality research on online customer reviews: An application of imputation approach
AU  - Kim, Jewoo
AU  - Im, Jongho
PY  - 2018/11/12
Y1  - 2018/11/12
AB  - Purpose: This paper introduces a new multiple imputation method that can effectively manage missing values in online review data, allowing online review analysis to yield valid results by using all available data. Design/methodology/approach: This study develops a missing data method based on multivariate imputation by chained equations (MICE) to generate imputed values for online reviews. Sentiment analysis is used to incorporate customers’ textual opinions as auxiliary information in the imputation procedure. To check the validity of the proposed method, the authors apply it to missing sub-ratings on hotel attributes in both simulated and real Honolulu hotel review data sets. The estimation results are compared with those of two other missing data techniques: listwise deletion and conventional multiple imputation that does not consider text reviews. Findings: The simulation analysis shows that the proposed imputation method produces more efficient and less biased estimates than the other two techniques when text reviews are possibly associated with the rating scores and the response mechanism. When the method is applied to the real hotel review data, the findings show that the text sentiment-based propensity score can effectively explain the missingness of sub-ratings on hotel attributes, and the imputation method that incorporates those propensity scores yields better estimation results than the other techniques, as in the simulation analysis. Originality/value: This study extends multiple imputation to online data, taking into account their spontaneous and unstructured nature. The new method makes fuller use of the observed online data while avoiding potential missing data problems.
UR  - http://www.scopus.com/inward/record.url?scp=85053027466&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85053027466&partnerID=8YFLogxK
U2  - 10.1108/IJCHM-10-2017-0708
DO  - 10.1108/IJCHM-10-2017-0708
M3  - Article
AN  - SCOPUS:85053027466
VL  - 30
SP  - 3250
EP  - 3267
JO  - International Journal of Contemporary Hospitality Management
JF  - International Journal of Contemporary Hospitality Management
SN  - 0959-6119
IS  - 11
ER  - 