SocialSearch+: Enriching social network with web evidences

Gae won You, Jin woo Park, Seung won Hwang, Zaiqing Nie, Ji Rong Wen

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions that build upon naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. We consider two types of evidence resources. The first is web-scale entity relationship graphs, extracted from name co-occurrences crawled from the Web. This co-occurrence relationship can be interpreted as an "implicit" counterpart of Twitter follower relationships. The second is web-scale relational repositories, such as Freebase, which offer complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that our approach outperforms the baseline. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.
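
The distinction between relevance ranking and confidence scoring can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the statistical features or the classifier used, so the function name `confidence_features` and the features chosen here (top score, gap to the runner-up, mean, standard deviation, candidate count) are assumptions made purely for illustration. The sketch shows why the top relevance score alone can be a weak confidence signal: two candidate lists may share the same top score while differing sharply in how clearly the top match stands out.

```python
from statistics import mean, pstdev


def confidence_features(scores):
    """Summarize the relevance scores of all candidates for one query.

    Hypothetical feature set, not taken from the paper.
    """
    if not scores:
        raise ValueError("need at least one candidate score")
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    # With a single candidate there is no runner-up; treat its score as 0.0.
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return {
        "top": top,              # the baseline confidence signal
        "gap": top - runner_up,  # how far the top match leads
        "mean": mean(ranked),
        "std": pstdev(ranked),
        "count": len(ranked),
    }


# Same top score, very different certainty about the top match.
clear = confidence_features([0.9, 0.2, 0.1])
ambiguous = confidence_features([0.9, 0.88, 0.87])
```

Under these assumptions, a feature dictionary like this would be passed to a separately trained classifier, whereas the baseline described in the abstract would use only the `top` value.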

Original language: English
Pages (from-to): 701-727
Number of pages: 27
Journal: World Wide Web
Volume: 16
Issue number: 5-6
DOI: 10.1007/s11280-012-0165-5
Publication status: Published - 2013 Nov 1

Fingerprint

Classifiers
Internet

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

You, Gae won; Park, Jin woo; Hwang, Seung won; Nie, Zaiqing; Wen, Ji Rong. SocialSearch+: Enriching social network with web evidences. In: World Wide Web. 2013; Vol. 16, No. 5-6, pp. 701-727.
@article{a3f35fd4cd084d25a595d6a07955e440,
title = "SocialSearch+: Enriching social network with web evidences",
abstract = "This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions that build upon naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage {"}relational{"} evidence extracted from the Web corpus. We consider two types of evidence resources. The first is web-scale entity relationship graphs, extracted from name co-occurrences crawled from the Web. This co-occurrence relationship can be interpreted as an {"}implicit{"} counterpart of Twitter follower relationships. The second is web-scale relational repositories, such as Freebase, which offer complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that our approach outperforms the baseline. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.",
author = "You, {Gae won} and Park, {Jin woo} and Hwang, {Seung won} and Nie, Zaiqing and Wen, {Ji Rong}",
year = "2013",
month = "11",
day = "1",
doi = "10.1007/s11280-012-0165-5",
language = "English",
volume = "16",
pages = "701--727",
journal = "World Wide Web",
issn = "1386-145X",
publisher = "Springer New York",
number = "5-6",

}

TY  - JOUR
T1  - SocialSearch+
T2  - Enriching social network with web evidences
AU  - You, Gae won
AU  - Park, Jin woo
AU  - Hwang, Seung won
AU  - Nie, Zaiqing
AU  - Wen, Ji Rong
PY  - 2013/11/1
Y1  - 2013/11/1
N2  - This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions that build upon naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. We consider two types of evidence resources. The first is web-scale entity relationship graphs, extracted from name co-occurrences crawled from the Web. This co-occurrence relationship can be interpreted as an "implicit" counterpart of Twitter follower relationships. The second is web-scale relational repositories, such as Freebase, which offer complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that our approach outperforms the baseline. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.
UR  - http://www.scopus.com/inward/record.url?scp=84885938830&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84885938830&partnerID=8YFLogxK
U2  - 10.1007/s11280-012-0165-5
DO  - 10.1007/s11280-012-0165-5
M3  - Article
AN  - SCOPUS:84885938830
VL  - 16
SP  - 701
EP  - 727
JO  - World Wide Web
JF  - World Wide Web
SN  - 1386-145X
IS  - 5-6
ER  -