Efficient text proximity search

Ralf Schenkel, Andreas Broschart, Seungwon Hwang, Martin Theobald, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Citations (Scopus)

Abstract

In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.
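
As a rough illustration of what combining a proximity score with a term-based score in a top-k ranking can look like, here is a minimal Python sketch. It is not the scoring function or the index-based algorithm from the paper: the inverse-squared-distance weighting, the additive combination, and the names `proximity_score`, `top_k`, `docs`, and `content_scores` are all assumptions made for this example, and it scores every document exhaustively instead of using precomputed or pruned proximity lists.

```python
import heapq
from itertools import combinations


def proximity_score(positions):
    """Sum 1/distance^2 over all occurrence pairs of distinct query terms.

    positions maps each query term to the word offsets where it occurs
    in a single document (hypothetical input format).
    """
    score = 0.0
    for t1, t2 in combinations(sorted(positions), 2):
        for p1 in positions[t1]:
            for p2 in positions[t2]:
                if p1 != p2:  # guard against malformed input with equal offsets
                    score += 1.0 / (p1 - p2) ** 2
    return score


def top_k(docs, content_scores, k):
    """Return the k best (doc_id, score) pairs by content + proximity score."""
    scored = (
        (content_scores.get(doc_id, 0.0) + proximity_score(pos), doc_id)
        for doc_id, pos in docs.items()
    )
    return [(doc_id, s) for s, doc_id in heapq.nlargest(k, scored)]


# Two toy documents with equal (made-up) content scores; in d1 the query
# terms are adjacent, in d2 they are ten words apart.
docs = {
    "d1": {"text": [3], "search": [4]},
    "d2": {"text": [1], "search": [11]},
}
content_scores = {"d1": 1.0, "d2": 1.0}
ranking = top_k(docs, content_scores, k=1)
```

With equal content scores, the document whose query terms occur closer together ranks first, so `ranking` holds the entry for `d1`.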

Original language: English
Title of host publication: String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings
Pages: 287-299
Number of pages: 13
Publication status: Published - 2007 Dec 1
Event: 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007 - Santiago, Chile
Duration: 2007 Oct 29 to 2007 Oct 31

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4726 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007
Country: Chile
City: Santiago
Period: 07/10/29 to 07/10/31

Fingerprint

Proximity
Retrieval
Scoring
Evaluation
Engines
Text Retrieval
Term
Pruning
Engine
Experiments
Query
Benchmark
Text
Experiment
Framework
Model

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Schenkel, R., Broschart, A., Hwang, S., Theobald, M., & Weikum, G. (2007). Efficient text proximity search. In String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings (pp. 287-299). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4726 LNCS).
Schenkel, Ralf ; Broschart, Andreas ; Hwang, Seungwon ; Theobald, Martin ; Weikum, Gerhard. / Efficient text proximity search. String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings. 2007. pp. 287-299 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{0cdc33d357334134bef482eaf0b77a4b,
title = "Efficient text proximity search",
abstract = "In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.",
author = "Ralf Schenkel and Andreas Broschart and Seungwon Hwang and Martin Theobald and Gerhard Weikum",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540755296",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "287--299",
booktitle = "String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings",

}

Schenkel, R, Broschart, A, Hwang, S, Theobald, M & Weikum, G 2007, Efficient text proximity search. in String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4726 LNCS, pp. 287-299, 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007, Santiago, Chile, 07/10/29.

Efficient text proximity search. / Schenkel, Ralf; Broschart, Andreas; Hwang, Seungwon; Theobald, Martin; Weikum, Gerhard.

String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings. 2007. p. 287-299 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4726 LNCS).

TY - GEN
T1 - Efficient text proximity search
AU - Schenkel, Ralf
AU - Broschart, Andreas
AU - Hwang, Seungwon
AU - Theobald, Martin
AU - Weikum, Gerhard
PY - 2007/12/1
Y1 - 2007/12/1
AB - In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.

UR - http://www.scopus.com/inward/record.url?scp=38049093465&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049093465&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:38049093465
SN - 9783540755296
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 287
EP - 299
BT - String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings
ER -

Schenkel R, Broschart A, Hwang S, Theobald M, Weikum G. Efficient text proximity search. In String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings. 2007. p. 287-299. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).