The discovery of community structures in social networks has gained significant attention since it is a fundamental problem in understanding the networks' topology and functions. However, most social network data are collected from partially observable networks with both missing nodes and missing edges. In this article, we address a new problem of detecting overlapping community structures in the context of such an incomplete network, where communities are allowed to overlap because nodes may belong to multiple communities at once. To solve this problem, we introduce KroMFac, a new framework that conducts community detection via regularized nonnegative matrix factorization (NMF) based on the Kronecker graph model. Specifically, from an inferred Kronecker generative parameter matrix, we first estimate the missing part of the network. As the major contribution of the proposed framework, we then characterize and rank influential nodes (which tend to have high degrees) and add the selected ones to the existing graph to improve community detection accuracy. Finally, we uncover the community structures by solving the regularized NMF-aided optimization problem in terms of maximizing the likelihood of the underlying graph. Furthermore, adopting normalized mutual information (NMI) as the performance metric, we empirically show the superiority of KroMFac over two baseline schemes on both synthetic and real-world networks.
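The pipeline summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' KroMFac implementation: the 2x2 initiator `theta` is assumed to be known rather than inferred; the hypothetical helper `complete_graph` ranks unobserved nodes by their expected degree under the Kronecker edge-probability matrix, as a simple proxy for the paper's influential-node ranking; and `symmetric_nmf` swaps in a squared-error symmetric NMF with an L2 penalty in place of the paper's likelihood-based regularized objective. All function names and parameters (`num_added`, `lam`, `tau`) are illustrative.

```python
import numpy as np


def kronecker_prob_matrix(theta: np.ndarray, k: int) -> np.ndarray:
    """Edge-probability matrix P = theta (x) theta (x) ... (x) theta, k factors."""
    p = theta.copy()
    for _ in range(k - 1):
        p = np.kron(p, theta)
    return p


def complete_graph(adj_obs, observed, p, num_added):
    """HYPOTHETICAL completion step (not the paper's procedure): keep the
    observed adjacency, rank unobserved nodes by expected degree under P,
    add the top `num_added`, and weight each new pair by its edge probability."""
    n = p.shape[0]
    missing = np.setdiff1d(np.arange(n), observed)
    expected_degree = p[missing].sum(axis=1)
    chosen = missing[np.argsort(-expected_degree)[:num_added]]
    keep = np.concatenate([observed, chosen])
    a = p[np.ix_(keep, keep)]           # estimated (unobserved) part of the network
    m = len(observed)
    a[:m, :m] = adj_obs                 # observed part is kept as measured
    np.fill_diagonal(a, 0.0)
    return a, keep


def symmetric_nmf(a, num_comms, lam=0.1, iters=500, seed=0):
    """Stand-in objective (NOT the paper's likelihood):
    min_{F >= 0} ||A - F F^T||_F^2 + lam * ||F||_F^2,
    solved with heuristic damped multiplicative updates."""
    rng = np.random.default_rng(seed)
    f = rng.random((a.shape[0], num_comms)) + 1e-3   # strictly positive start
    for _ in range(iters):
        num = a @ f
        den = f @ (f.T @ f) + 0.5 * lam * f + 1e-12
        f *= 0.5 + 0.5 * num / den      # multiplicative step keeps F nonnegative
    return f


def overlapping_communities(f, tau=0.5):
    """Node u joins community c when F[u, c] >= tau * max_c' F[u, c'],
    so a node can belong to several communities at once."""
    thresh = tau * f.max(axis=1)
    return [np.flatnonzero(f[:, c] >= thresh).tolist() for c in range(f.shape[1])]


# Usage on a toy example (all values are made up for illustration).
theta = np.array([[0.9, 0.5], [0.5, 0.1]])      # hypothetical initiator
p = kronecker_prob_matrix(theta, k=3)           # 8 x 8 probability matrix
observed = np.array([0, 1, 2, 3])
adj_obs = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
                    [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
a, keep = complete_graph(adj_obs, observed, p, num_added=1)
f = symmetric_nmf(a, num_comms=2)
print(keep, overlapping_communities(f))
```

Thresholding each row of `F` relative to its maximum is one simple way to turn soft memberships into overlapping communities; evaluating them against ground truth would additionally require an overlapping variant of NMI, which this sketch omits.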
Journal: ACM Transactions on Knowledge Discovery from Data
Publication status: Published (April 2022)
Bibliographical note
Funding Information:
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C3004345), by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI20C0127), and by the Yonsei University Research Fund of 2021 (2021-22-0083). Won-Yong Shin is the corresponding author.

Authors’ addresses: C. Tran, Department of Computer Science and Engineering, Dankook University, Yongin, Republic of Korea, and Machine Intelligence and Data Science Laboratory, Yonsei University, Seoul 03722, Republic of Korea; W.-Y. Shin, School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul 03722, Republic of Korea; A. Spitz, Department of Computer and Information Science, University of Konstanz, Konstanz 78467, Germany.

© 2021 Association for Computing Machinery. 1556-4681/2021/07-ART22 $15.00 https://doi.org/10.1145/3461339
All Science Journal Classification (ASJC) codes
- Computer Science(all)