Hybrid entity clustering using crowds and data

Jongwuk Lee, Hyunsouk Cho, Jin Woo Park, Young rok Cha, Seung won Hwang, Zaiqing Nie, Ji Rong Wen

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Query result clustering has attracted considerable attention as a means of providing users with a concise overview of results. However, little research effort has been devoted to organizing the query results for entities which refer to real-world concepts, e.g., people, products, and locations. Entity-level result clustering is more challenging because diverse similarity notions between entities need to be supported in heterogeneous domains, e.g., image resolution is an important feature for cameras, but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, using co-occurrence and numeric features. Algorithm Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into HydragData with different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms achieve effectiveness and efficiency in real-life and synthetic datasets.

Original language: English
Pages (from-to): 711-726
Number of pages: 16
Journal: VLDB Journal
Volume: 22
Issue number: 5
DOIs: https://doi.org/10.1007/s00778-013-0328-8
Publication status: Published - 2013 Oct 1

Fingerprint

Image resolution
Fruits
Clustering algorithms
Cameras

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Hardware and Architecture

Cite this

Lee, J., Cho, H., Park, J. W., Cha, Y. R., Hwang, S. W., Nie, Z., & Wen, J. R. (2013). Hybrid entity clustering using crowds and data. VLDB Journal, 22(5), 711-726. https://doi.org/10.1007/s00778-013-0328-8
@article{5249950022464aa5b84300226bdcf208,
title = "Hybrid entity clustering using crowds and data",
abstract = "Query result clustering has attracted considerable attention as a means of providing users with a concise overview of results. However, little research effort has been devoted to organizing the query results for entities which refer to real-world concepts, e.g., people, products, and locations. Entity-level result clustering is more challenging because diverse similarity notions between entities need to be supported in heterogeneous domains, e.g., image resolution is an important feature for cameras, but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, using co-occurrence and numeric features. Algorithm Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into HydragData with different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms achieve effectiveness and efficiency in real-life and synthetic datasets.",
author = "Jongwuk Lee and Hyunsouk Cho and Park, {Jin Woo} and Cha, {Young rok} and Hwang, {Seung won} and Zaiqing Nie and Wen, {Ji Rong}",
year = "2013",
month = "10",
day = "1",
doi = "10.1007/s00778-013-0328-8",
language = "English",
volume = "22",
pages = "711--726",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
number = "5",

}


TY - JOUR

T1 - Hybrid entity clustering using crowds and data

AU - Lee, Jongwuk

AU - Cho, Hyunsouk

AU - Park, Jin Woo

AU - Cha, Young rok

AU - Hwang, Seung won

AU - Nie, Zaiqing

AU - Wen, Ji Rong

PY - 2013/10/1

Y1 - 2013/10/1

AB - Query result clustering has attracted considerable attention as a means of providing users with a concise overview of results. However, little research effort has been devoted to organizing the query results for entities which refer to real-world concepts, e.g., people, products, and locations. Entity-level result clustering is more challenging because diverse similarity notions between entities need to be supported in heterogeneous domains, e.g., image resolution is an important feature for cameras, but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, using co-occurrence and numeric features. Algorithm Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into HydragData with different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms achieve effectiveness and efficiency in real-life and synthetic datasets.

UR - http://www.scopus.com/inward/record.url?scp=84884590588&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84884590588&partnerID=8YFLogxK

U2 - 10.1007/s00778-013-0328-8

DO - 10.1007/s00778-013-0328-8

M3 - Article

AN - SCOPUS:84884590588

VL - 22

SP - 711

EP - 726

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

IS - 5

ER -
