Automatic categorization of query results

Kaushik Chakrabarti, Surajit Chaudhuri, Seung Won Hwang

Research output: Contribution to journal › Conference article

54 Citations (Scopus)

Abstract

Exploratory ad-hoc queries could return too many answers - a phenomenon commonly referred to as "information overload". In this paper, we propose to automatically categorize the results of SQL queries to address this problem. We dynamically generate a labeled, hierarchical category structure - a user can determine whether a category is relevant simply by examining its label; she can then explore just the relevant categories and ignore the remaining ones, thereby reducing information overload. We first develop analytical models to estimate the information overload faced by a user for a given exploration. Based on those models, we formulate the categorization problem as a cost optimization problem and develop heuristic algorithms to compute the min-cost categorization.
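The idea of scoring a category tree by the effort a user spends exploring it can be illustrated with a small sketch. The abstract does not state the paper's cost model, so everything below is an assumption made for illustration: the `Category` class, the probabilities `p_explore` and `p_show_tuples`, the `label_cost` weight, and the recursion itself are hypothetical. The sketch counts the expected number of items (labels and tuples) a user examines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """A node in a hypothetical category tree over query results."""
    label: str
    num_tuples: int              # result tuples that fall under this category
    p_explore: float = 1.0       # assumed probability the user explores this node
    p_show_tuples: float = 1.0   # assumed probability the user lists tuples here
                                 # instead of drilling into subcategories
    children: List["Category"] = field(default_factory=list)


def expected_cost(node: Category, label_cost: float = 1.0) -> float:
    """Expected number of items examined when exploring `node`.

    Illustrative recursion (an assumption, not taken from the abstract):
    - a leaf costs one unit per tuple;
    - an internal node either lists all its tuples, or the user reads
      every child label and then explores each child with its own
      exploration probability.
    """
    if not node.children:
        return float(node.num_tuples)
    drill_down = label_cost * len(node.children) + sum(
        child.p_explore * expected_cost(child, label_cost)
        for child in node.children
    )
    p = node.p_show_tuples
    return p * node.num_tuples + (1.0 - p) * drill_down


# Flat result list: the user scans all 100 tuples.
flat = Category("all results", 100)

# Two-way categorization of the same 100 tuples (made-up probabilities).
categorized = Category(
    "all results", 100, p_show_tuples=0.0,
    children=[
        Category("group A", 60, p_explore=0.5),
        Category("group B", 40, p_explore=0.25),
    ],
)

print(expected_cost(flat))         # 100.0
print(expected_cost(categorized))  # 42.0
```

Under these made-up numbers the categorized tree costs 2 labels plus 0.5 × 60 plus 0.25 × 40 tuples, for 42 items against 100 for the flat list. A min-cost categorization would search over candidate trees for the one with the lowest such estimate.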

Original language: English
Pages (from-to): 755-766
Number of pages: 12
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
Publication status: Published - 2004 Jul 27
Event: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004 - Paris, France
Duration: 2004 Jun 13 to 2004 Jun 18

Fingerprint

Heuristic algorithms
Labels
Costs
Analytical models

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

@article{545383862f0c4ac195f008e759c42c72,
title = "Automatic categorization of query results",
abstract = "Exploratory ad-hoc queries could return too many answers - a phenomenon commonly referred to as {"}information overload{"}. In this paper, we propose to automatically categorize the results of SQL queries to address this problem. We dynamically generate a labeled, hierarchical category structure - users can determine whether a category is relevant or not by examining simply its label; she can then explore just the relevant categories and ignore the remaining ones, thereby reducing information overload. We first develop analytical models to estimate information overload faced by a user for a given exploration. Based on those models, we formulate the categorization problem as a cost optimization problem and develop heuristic algorithms to compute the min-cost categorization.",
author = "Kaushik Chakrabarti and Surajit Chaudhuri and Hwang, {Seung Won}",
year = "2004",
month = "7",
day = "27",
language = "English",
pages = "755--766",
journal = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
issn = "0730-8078",
publisher = "Association for Computing Machinery (ACM)",
}

Automatic categorization of query results. / Chakrabarti, Kaushik; Chaudhuri, Surajit; Hwang, Seung Won.

In: Proceedings of the ACM SIGMOD International Conference on Management of Data, 27.07.2004, p. 755-766.


TY - JOUR

T1 - Automatic categorization of query results

AU - Chakrabarti, Kaushik

AU - Chaudhuri, Surajit

AU - Hwang, Seung Won

PY - 2004/7/27

Y1 - 2004/7/27

N2 - Exploratory ad-hoc queries could return too many answers - a phenomenon commonly referred to as "information overload". In this paper, we propose to automatically categorize the results of SQL queries to address this problem. We dynamically generate a labeled, hierarchical category structure - users can determine whether a category is relevant or not by examining simply its label; she can then explore just the relevant categories and ignore the remaining ones, thereby reducing information overload. We first develop analytical models to estimate information overload faced by a user for a given exploration. Based on those models, we formulate the categorization problem as a cost optimization problem and develop heuristic algorithms to compute the min-cost categorization.

AB - Exploratory ad-hoc queries could return too many answers - a phenomenon commonly referred to as "information overload". In this paper, we propose to automatically categorize the results of SQL queries to address this problem. We dynamically generate a labeled, hierarchical category structure - users can determine whether a category is relevant or not by examining simply its label; she can then explore just the relevant categories and ignore the remaining ones, thereby reducing information overload. We first develop analytical models to estimate information overload faced by a user for a given exploration. Based on those models, we formulate the categorization problem as a cost optimization problem and develop heuristic algorithms to compute the min-cost categorization.

UR - http://www.scopus.com/inward/record.url?scp=3142727860&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3142727860&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:3142727860

SP - 755

EP - 766

JO - Proceedings of the ACM SIGMOD International Conference on Management of Data

JF - Proceedings of the ACM SIGMOD International Conference on Management of Data

SN - 0730-8078

ER -