DBSTexC: Density-Based spatio-Textual clustering on twitter

Minh D. Nguyen, Won Yong Shin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used density-based clustering algorithm and can discover multiple clusters with arbitrary shapes. DBSCAN works properly when the input data type is homogeneous, but its approach may be insufficient, and its performance poor, when the input dataset has textual heterogeneity (e.g., when we intend to find clusters from geo-tagged posts on social media relevant to a certain point-of-interest (POI)). In this paper, we present DBSTexC, a new density-based clustering algorithm using spatio-textual information on Twitter. We first define POI-relevant and POI-irrelevant tweets as the records that do and do not contain a POI name or its coherent variations, respectively. By taking into account the fractions of POI-relevant and POI-irrelevant tweets, our DBSTexC algorithm shows a much higher clustering quality than DBSCAN in terms of the F1 score and its variants. DBSTexC can be thought of as a generalized version of DBSCAN, since it performs identically to DBSCAN when the inputs are homogeneous and far outperforms DBSCAN when the input data type is heterogeneous.
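The distinction between POI-relevant and POI-irrelevant tweets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact core-point rule, so the test below (at least `min_relevant` POI-relevant tweets and at most `max_irrelevant` POI-irrelevant tweets within radius `eps`) is an assumption made for illustration, as are all function names, parameter names, the planar Euclidean distance, and the sample tweets.

```python
import math

def is_poi_relevant(text, poi_names):
    """Return True if the tweet text contains any of the POI name variants."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in poi_names)

def neighborhood_counts(center, tweets, poi_names, eps):
    """Count POI-relevant and POI-irrelevant tweets within distance eps of center.

    Each tweet is an (x, y, text) tuple. Planar Euclidean distance is an
    assumption; real geo-tagged data would need a geographic distance.
    """
    relevant = 0
    irrelevant = 0
    for x, y, text in tweets:
        if math.hypot(x - center[0], y - center[1]) <= eps:
            if is_poi_relevant(text, poi_names):
                relevant += 1
            else:
                irrelevant += 1
    return relevant, irrelevant

def is_core_point(center, tweets, poi_names, eps, min_relevant, max_irrelevant):
    """Hypothetical core-point test (an assumption, not taken from the abstract):
    enough POI-relevant tweets nearby and not too many POI-irrelevant ones."""
    relevant, irrelevant = neighborhood_counts(center, tweets, poi_names, eps)
    return relevant >= min_relevant and irrelevant <= max_irrelevant

# Made-up sample data for illustration only.
tweets = [
    (0.0, 0.0, "Sunset at the Opera House"),
    (0.1, 0.0, "opera house tour today"),
    (0.0, 0.1, "Loving the Sydney Opera House!"),
    (0.1, 0.1, "coffee time"),
    (5.0, 5.0, "Opera House from afar"),
]
poi_names = ["opera house"]

counts = neighborhood_counts((0.0, 0.0), tweets, poi_names, eps=0.2)
core = is_core_point((0.0, 0.0), tweets, poi_names, eps=0.2,
                     min_relevant=3, max_irrelevant=1)
print(counts, core)
```

In this sketch, when every tweet in the input is POI-relevant the irrelevant count is always zero, so the test reduces to DBSCAN's usual minimum-points condition, which is in the spirit of the abstract's claim that the two algorithms coincide on homogeneous input.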

Original language: English
Title of host publication: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
Editors: Jana Diesner, Elena Ferrari, Guandong Xu
Publisher: Association for Computing Machinery, Inc
Pages: 23-26
Number of pages: 4
ISBN (Electronic): 9781450349932
DOIs: https://doi.org/10.1145/3110025.3110096
Publication status: Published - 2017 Jul 31
Event: 9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017 - Sydney, Australia
Duration: 2017 Jul 31 → 2017 Aug 3

Publication series

Name: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017

Other

Other: 9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
Country: Australia
City: Sydney
Period: 17/7/31 → 17/8/3

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems


  • Cite this

    Nguyen, M. D., & Shin, W. Y. (2017). DBSTexC: Density-Based spatio-Textual clustering on twitter. In J. Diesner, E. Ferrari, & G. Xu (Eds.), Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017 (pp. 23-26). (Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017). Association for Computing Machinery, Inc. https://doi.org/10.1145/3110025.3110096