Keyphrase extraction-based query expansion in digital libraries

Min Song, Il Yeol Song, Robert B. Allen, Zoran Obradovic

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Citations (Scopus)

Abstract

In pseudo-relevance feedback, the two key factors affecting the retrieval performance most are the source from which expansion terms are generated and the method of ranking those expansion terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query reformulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.
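
The pipeline described in the abstract (extract phrases from the retrieved documents, rank them, then rewrite the query in DNF) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the candidates are plain word bigrams, ranking uses document frequency across the feedback documents as a stand-in for the paper's information-gain and co-occurrence weighting, and POS phrase categorization is omitted entirely.

```python
from collections import Counter


def candidate_phrases(doc, n=2):
    """Return word n-grams of a document as candidate phrases (assumption: bigrams)."""
    tokens = doc.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_phrases(feedback_docs, top_k=3):
    """Rank candidates by how many feedback documents contain them.

    Document frequency is only a stand-in here for the information-gain
    and co-occurrence weighting mentioned in the abstract.
    """
    doc_freq = Counter()
    for doc in feedback_docs:
        doc_freq.update(set(candidate_phrases(doc)))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


def to_dnf(query_terms, phrases):
    """Build a DNF query: an OR of conjuncts, each ANDing the query with one phrase."""
    base = " AND ".join(query_terms)
    return " OR ".join(f'({base} AND "{phrase}")' for phrase in phrases)


if __name__ == "__main__":
    # Toy stand-ins for the top-ranked documents of an initial retrieval run.
    docs = [
        "query expansion improves retrieval",
        "query expansion uses keyphrases",
        "keyphrases improve retrieval",
    ]
    phrases = rank_phrases(docs, top_k=2)
    print(to_dnf(["digital", "libraries"], phrases))
```

Each conjunct keeps the original query terms and adds exactly one expansion phrase, so a document has to match the query plus at least one phrase. This is one simple way to form a disjunction of conjunctions; the paper's POS-based categorization would decide how phrases are grouped into conjuncts.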

Original language: English
Title of host publication: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006
Subtitle of host publication: Opening Information Horizons, JCDL '06
Pages: 202-209
Number of pages: 8
DOIs: https://doi.org/10.1145/1141753.1141800
Publication status: Published - 2006 Dec 1
Event: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06 - Chapel Hill, NC, United States
Duration: 2006 Jun 11 → 2006 Jun 15

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2006
ISSN (Print): 1552-5996

Other

Other: 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06
Country: United States
City: Chapel Hill, NC
Period: 06/6/11 → 06/6/15

Fingerprint

Digital libraries
Ontology
Feedback

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Song, M., Song, I. Y., Allen, R. B., & Obradovic, Z. (2006). Keyphrase extraction-based query expansion in digital libraries. In 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06 (pp. 202-209). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2006). https://doi.org/10.1145/1141753.1141800
Song, Min ; Song, Il Yeol ; Allen, Robert B. ; Obradovic, Zoran. / Keyphrase extraction-based query expansion in digital libraries. 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06. 2006. pp. 202-209 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
@inproceedings{e30b71caab30431abc5055b0688f72c3,
title = "Keyphrase extraction-based query expansion in digital libraries",
abstract = "In pseudo-relevance feedback, the two key factors affecting the retrieval performance most are the source from which expansion terms are generated and the method of ranking those expansion terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query reformulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.",
author = "Min Song and Song, {Il Yeol} and Allen, {Robert B.} and Zoran Obradovic",
year = "2006",
month = "12",
day = "1",
doi = "10.1145/1141753.1141800",
language = "English",
isbn = "1595933549",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
pages = "202--209",
booktitle = "6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006",

}

Song, M, Song, IY, Allen, RB & Obradovic, Z 2006, Keyphrase extraction-based query expansion in digital libraries. in 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, vol. 2006, pp. 202-209, 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06, Chapel Hill, NC, United States, 06/6/11. https://doi.org/10.1145/1141753.1141800

Keyphrase extraction-based query expansion in digital libraries. / Song, Min; Song, Il Yeol; Allen, Robert B.; Obradovic, Zoran.

6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06. 2006. p. 202-209 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2006).

TY - GEN

T1 - Keyphrase extraction-based query expansion in digital libraries

AU - Song, Min

AU - Song, Il Yeol

AU - Allen, Robert B.

AU - Obradovic, Zoran

PY - 2006/12/1

Y1 - 2006/12/1

N2 - In pseudo-relevance feedback, the two key factors affecting the retrieval performance most are the source from which expansion terms are generated and the method of ranking those expansion terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query reformulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.

AB - In pseudo-relevance feedback, the two key factors affecting the retrieval performance most are the source from which expansion terms are generated and the method of ranking those expansion terms. In this paper, we present a novel unsupervised query expansion technique that utilizes keyphrases and POS phrase categorization. The keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and co-occurrence of phrases. The selected keyphrases are translated into Disjunctive Normal Form (DNF) based on the POS phrase categorization technique for better query reformulation. Furthermore, we study whether ontologies such as WordNet and MeSH improve the retrieval performance in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that the use of keyphrases with POS phrase categorization produces the best average precision.

UR - http://www.scopus.com/inward/record.url?scp=34247260471&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247260471&partnerID=8YFLogxK

U2 - 10.1145/1141753.1141800

DO - 10.1145/1141753.1141800

M3 - Conference contribution

AN - SCOPUS:34247260471

SN - 1595933549

SN - 9781595933546

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 202

EP - 209

BT - 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006

ER -

Song M, Song IY, Allen RB, Obradovic Z. Keyphrase extraction-based query expansion in digital libraries. In 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06. 2006. p. 202-209. (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). https://doi.org/10.1145/1141753.1141800