TY - GEN
T1 - SocialSearch
T2 - 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
AU - You, Gae Won
AU - Hwang, Seung Won
AU - Nie, Zaiqing
AU - Wen, Ji Rong
PY - 2011
Y1 - 2011
N2 - This paper introduces the problem of matching people's names to their corresponding social network identities, such as their Twitter accounts. Existing tools for this purpose build upon naive textual matching and inevitably suffer from low precision, due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. In particular, as one such example, we adopt Web document co-occurrences, which can be interpreted as an "implicit" counterpart of Twitter follower relationships. Using both textual and relational features, we learn a ranking function aggregating these features for the accurate ordering of candidate matches. Another key contribution of this paper is to formulate confidence scoring as a separate problem from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that it outperforms the baseline approach. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.
AB - This paper introduces the problem of matching people's names to their corresponding social network identities, such as their Twitter accounts. Existing tools for this purpose build upon naive textual matching and inevitably suffer from low precision, due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. In particular, as one such example, we adopt Web document co-occurrences, which can be interpreted as an "implicit" counterpart of Twitter follower relationships. Using both textual and relational features, we learn a ranking function aggregating these features for the accurate ordering of candidate matches. Another key contribution of this paper is to formulate confidence scoring as a separate problem from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that it outperforms the baseline approach. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.
UR - http://www.scopus.com/inward/record.url?scp=79953885743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953885743&partnerID=8YFLogxK
U2 - 10.1145/1951365.1951428
DO - 10.1145/1951365.1951428
M3 - Conference contribution
AN - SCOPUS:79953885743
SN - 9781450305280
T3 - ACM International Conference Proceeding Series
SP - 515
EP - 520
BT - Advances in Database Technology - EDBT 2011
Y2 - 22 March 2011 through 24 March 2011
ER -