Hybrid entity clustering using crowds and data

Jongwuk Lee, Hyunsouk Cho, Jin Woo Park, Young rok Cha, Seung won Hwang, Zaiqing Nie, Ji Rong Wen

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


Query result clustering has attracted considerable attention as a means of providing users with a concise overview of results. However, little research effort has been devoted to organizing query results for entities, which refer to real-world concepts such as people, products, and locations. Entity-level result clustering is more challenging because diverse notions of similarity between entities need to be supported across heterogeneous domains; e.g., image resolution is an important feature for cameras, but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, that uses co-occurrence and numeric features. Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into HydragData with different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms are effective and efficient on real-life and synthetic datasets.

Original language: English
Pages (from-to): 711-726
Number of pages: 16
Journal: VLDB Journal
Issue number: 5
Publication status: Published - 2013 Oct

Bibliographical note

Funding Information:
This research was supported by the Ministry of Knowledge Economy (MKE), Korea, and Microsoft Research, under the IT/SW Creative research program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0503-12-1036).

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Hardware and Architecture

