TY - JOUR
T1 - Suicidality Detection on Social Media Using Metadata and Text Feature Extraction and Machine Learning
AU - Jung, Woojin
AU - Kim, Donghun
AU - Nam, Seojin
AU - Zhu, Yongjun
N1 - Publisher Copyright: © 2021 International Academy for Suicide Research.
PY - 2021
Y1 - 2021
N2 - In this study, we implemented machine learning models that can detect suicidality posts on Twitter. We randomly selected and annotated 20,000 tweets and explored metadata and text features to build effective models. Metadata features were studied in great detail to understand their possibility and importance in suicidality detection models. Results showed that posting type (i.e., reply or not) and time-related features such as the month, day of the week, and the time (AM vs. PM) were the most important metadata features in suicidality detection models. Specifically, the probability of a social media post being suicidal is higher if the post is a reply to other users rather than an original tweet. Moreover, tweets created in the afternoon, on Fridays and weekends, and in fall have higher probabilities of being detected as suicidality tweets compared with those created at other times. By integrating metadata and text features, we obtained a model of good performance (i.e., F1 score of 0.846) that can assist humans in the real-world setting to detect suicidality social media posts.
AB - In this study, we implemented machine learning models that can detect suicidality posts on Twitter. We randomly selected and annotated 20,000 tweets and explored metadata and text features to build effective models. Metadata features were studied in great detail to understand their possibility and importance in suicidality detection models. Results showed that posting type (i.e., reply or not) and time-related features such as the month, day of the week, and the time (AM vs. PM) were the most important metadata features in suicidality detection models. Specifically, the probability of a social media post being suicidal is higher if the post is a reply to other users rather than an original tweet. Moreover, tweets created in the afternoon, on Fridays and weekends, and in fall have higher probabilities of being detected as suicidality tweets compared with those created at other times. By integrating metadata and text features, we obtained a model of good performance (i.e., F1 score of 0.846) that can assist humans in the real-world setting to detect suicidality social media posts.
UR - http://www.scopus.com/inward/record.url?scp=85111666321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111666321&partnerID=8YFLogxK
U2 - 10.1080/13811118.2021.1955783
DO - 10.1080/13811118.2021.1955783
M3 - Article
AN - SCOPUS:85111666321
JO - Archives of Suicide Research
JF - Archives of Suicide Research
SN - 1381-1118
ER -