Understanding the correlations between social attention and topic trends of scientific publications

Xianlei Dong, Jian Xu, Ying Ding, Chenwei Zhang, Kunpeng Zhang, Min Song

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Purpose: We propose and apply a simplified nowcasting model to understand the correlations between social attention and topic trends of scientific publications.

Design/methodology/approach: First, topics are generated from the obesity corpus using the latent Dirichlet allocation (LDA) algorithm, and time series of keyword search trends are obtained from Google Trends. We then establish the structural time series model using data from January 2004 to December 2012, and evaluate the model using data from January 2013. We employ a state-space model to separate the non-regression components of an observational time series (i.e., the tendency and the seasonality) and apply the “spike and slab prior” and stepwise regression to analyze the correlations between the regression component and social media attention. The two parts are combined using Markov chain Monte Carlo sampling techniques to obtain our results.

Findings: The results of our study show that (1) the number of publications on child obesity increases at a lower rate than that of diabetes publications; (2) the number of publications on a given topic may exhibit a relationship with the season or time of year; and (3) there exists a correlation between the number of publications on a given topic and its social media attention, i.e., the search frequency related to that topic as identified by Google Trends. We found that our model is also able to predict the number of publications related to a given topic.

Research limitations: First, we study correlation rather than causality between topics' trends and social media. As a result, the relationships might not be robust, so we cannot make long-run predictions. Second, we cannot identify the reasons or conditions that drive obesity topics to present such tendencies and seasonal patterns, so a “field” study may be needed in the future. Third, we need to improve the efficiency of our model by finding more efficient variable selection methods, because stepwise regression is time-consuming, especially for a large number of variables.

Practical implications: This paper analyzes publication topic trends from three perspectives: tendency, seasonality, and correlation with social media attention, providing a new perspective for identifying and understanding topical themes in academic publications.

Originality/value: To the best of our knowledge, we are the first to apply the state-space model to examine the relationships between healthcare-related publications and social media, and to investigate the relationships between a topic's evolution and people's search behavior in social media. This paper thus provides a new viewpoint in the correlation analysis area, and demonstrates the value of considering social media attention in the analysis of publication topic trends.
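The abstract describes separating a trend component from an observed series with a state-space model. As a rough illustration only, the sketch below runs a Kalman filter for the simplest such model, the local-level model. It is not the authors' model: the seasonal component, the regression on search data, the spike-and-slab prior, and the MCMC sampling are all omitted, and the function name, variance settings, and monthly counts are hypothetical values chosen for the example.

```python
def local_level_filter(observations, obs_var, level_var,
                       init_mean=0.0, init_var=1e6):
    """Kalman filter for the local-level state-space model (an assumption,
    not the paper's full specification):

        y_t  = mu_t + eps_t,       eps_t ~ N(0, obs_var)
        mu_t = mu_{t-1} + eta_t,   eta_t ~ N(0, level_var)

    Returns the filtered estimate of the level mu_t after each observation.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: the level follows a random walk, so uncertainty grows.
        var = var + level_var
        # Update: blend the prediction with the new observation.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        filtered.append(mean)
    return filtered


# Hypothetical monthly publication counts (illustrative, not data from the paper).
counts = [20, 22, 21, 25, 27, 26, 30, 31, 29, 34, 36, 35]
levels = local_level_filter(counts, obs_var=4.0, level_var=1.0)
print([round(v, 1) for v in levels])
```

A larger `level_var` relative to `obs_var` lets the estimated level follow the data more closely, while a smaller ratio yields a smoother trend.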

Original language: English
Pages (from-to): 28-49
Number of pages: 22
Journal: Journal of Data and Information Science
Volume: 1
Issue number: 1
DOI: 10.20309/jdis.201604
Publication status: Published - Feb 2016

Fingerprint

social media
trend
time series
regression
search engine
Scientific publications
Social media
causality
chronic illness
efficiency
present
Obesity
methodology
Values
Seasonality
State-space model
Google
Stepwise regression

All Science Journal Classification (ASJC) codes

  • Public Administration
  • Library and Information Sciences
  • Information Systems and Management

Cite this

Dong, Xianlei; Xu, Jian; Ding, Ying; Zhang, Chenwei; Zhang, Kunpeng; Song, Min. / Understanding the correlations between social attention and topic trends of scientific publications. In: Journal of Data and Information Science. 2016; Vol. 1, No. 1. pp. 28-49.
@article{2f51da388e3c489a98b18d6998d8a2c5,
title = "Understanding the correlations between social attention and topic trends of scientific publications",
author = "Xianlei Dong and Jian Xu and Ying Ding and Chenwei Zhang and Kunpeng Zhang and Min Song",
year = "2016",
month = "2",
doi = "10.20309/jdis.201604",
language = "English",
volume = "1",
pages = "28--49",
journal = "Journal of Data and Information Science",
issn = "2096-157X",
publisher = "De Gruyter Open Ltd.",
number = "1",

}

TY - JOUR

T1 - Understanding the correlations between social attention and topic trends of scientific publications

AU - Dong, Xianlei

AU - Xu, Jian

AU - Ding, Ying

AU - Zhang, Chenwei

AU - Zhang, Kunpeng

AU - Song, Min

PY - 2016/2

Y1 - 2016/2

UR - http://www.scopus.com/inward/record.url?scp=85050975722&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050975722&partnerID=8YFLogxK

U2 - 10.20309/jdis.201604

DO - 10.20309/jdis.201604

M3 - Article

AN - SCOPUS:85050975722

VL - 1

SP - 28

EP - 49

JO - Journal of Data and Information Science

JF - Journal of Data and Information Science

SN - 2096-157X

IS - 1

ER -