Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used density-based clustering algorithm and can discover multiple clusters of arbitrary shape. DBSCAN works well when the input data are homogeneous, but it may perform poorly when the input dataset is textually heterogeneous, for example when we intend to find clusters of geo-tagged social media posts relevant to a certain point-of-interest (POI). In this paper, we present DBSTexC, a new density-based clustering algorithm that uses spatio-textual information on Twitter. We first define POI-relevant and POI-irrelevant tweets as the records that do and do not contain a POI name or its coherent variations, respectively. By taking into account the fractions of POI-relevant and POI-irrelevant tweets, our DBSTexC algorithm achieves much higher clustering quality than DBSCAN in terms of the F1 score and its variants. DBSTexC can be regarded as a generalized version of DBSCAN: it performs identically to DBSCAN when the inputs are homogeneous and far outperforms DBSCAN when the input data are heterogeneous.
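As a rough illustration of the idea in the abstract, the sketch below labels tweets as POI-relevant by name matching and then applies a density test that looks at both kinds of tweets in a neighborhood. The abstract does not give the exact rule, so the core-point condition used here (at least `n_min` POI-relevant and at most `n_max` POI-irrelevant tweets within radius `eps`) is an assumption, as are all function and parameter names and the use of planar Euclidean distance instead of a geographic one.

```python
import math


def is_poi_relevant(text, poi_variants):
    """Return True when the tweet text mentions any listed POI name variant."""
    lowered = text.lower()
    return any(variant.lower() in lowered for variant in poi_variants)


def neighborhood_counts(center, tweets, eps, poi_variants):
    """Count POI-relevant and POI-irrelevant tweets within eps of center.

    Each tweet is an (x, y, text) tuple; coordinates are treated as planar
    (an assumption made for brevity; real geo-tags would need a geodesic
    distance such as haversine).
    """
    relevant = 0
    irrelevant = 0
    for x, y, text in tweets:
        if math.hypot(x - center[0], y - center[1]) <= eps:
            if is_poi_relevant(text, poi_variants):
                relevant += 1
            else:
                irrelevant += 1
    return relevant, irrelevant


def is_core_point(center, tweets, eps, n_min, n_max, poi_variants):
    """Assumed core-point test: dense in relevant tweets, sparse in irrelevant ones."""
    relevant, irrelevant = neighborhood_counts(center, tweets, eps, poi_variants)
    return relevant >= n_min and irrelevant <= n_max
```

Under this assumed rule, setting `n_max` to `math.inf` on a dataset in which every tweet is POI-relevant reduces the test to DBSCAN's usual minimum-points condition, which is consistent with the abstract's remark that the two algorithms coincide on homogeneous input.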
|Title of host publication||Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017|
|Editors||Jana Diesner, Elena Ferrari, Guandong Xu|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||4|
|Publication status||Published - 2017 Jul 31|
|Event||9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017 - Sydney, Australia|
Duration: 2017 Jul 31 → 2017 Aug 3
Bibliographical note
Funding Information:
[...] National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1A09000835) and by the Ministry of Science, ICT & Future Planning (MSIP) (2015R1A2A1A15054248).
© 2017 Association for Computing Machinery.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems