Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia

Xiaozhong Liu, Jian Qin, Miao Chen, Ji-Hong Park

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Query log analysis can provide valuable information for improving information retrieval performance. This paper reports findings from a query log mining project in which query terms falling in the very long tail of low to zero similarity scores (relative to the controlled vocabulary) were analyzed using similarity algorithms. The query log data were collected from the Gateway to Educational Materials (GEM). The limited number of terms in the GEM controlled vocabulary was a major cause of this long tail of low or zero similarity scores. To mitigate this limitation, we used the general-purpose (domain-independent) ontology WordNet and the community-created Wikipedia as bridges to establish semantic relatedness between user query terms and the GEM controlled vocabulary (as well as new concept classes identified by human experts). The two sources, WordNet and Wikipedia, were complementary in mapping different types of query terms, and a combination of both achieved a modest rate of mapping accuracy. The paper discusses the implications of these findings for automatic semantic analysis and for vocabulary development and validation.
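The abstract does not specify which similarity algorithm was used or how WordNet and Wikipedia were queried. As a minimal sketch only, assuming token-level Jaccard similarity, a hypothetical long-tail cutoff of 0.2, and toy dictionaries standing in for WordNet relations and Wikipedia links, the Python below illustrates the general idea: score each query term directly against the controlled vocabulary, and only for low- or zero-similarity terms fall back to the two bridge sources and combine their candidates. The vocabulary entries, bridge entries, threshold, and function names (`jaccard`, `best_match`, `map_query`) are illustrative assumptions, not taken from GEM or from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sample data: a tiny stand-in for a controlled vocabulary (not the real GEM list).
CONTROLLED_VOCAB: List[str] = ["mathematics", "language arts", "social studies", "science", "health"]

# Hypothetical bridges: toy dictionaries standing in for WordNet relations
# (synonyms/hypernyms) and Wikipedia links (redirects/categories).
# A real system would query the actual resources instead.
WORDNET_BRIDGE: Dict[str, Set[str]] = {
    "algebra": {"mathematics"},
    "nutrition": {"health"},
}
WIKIPEDIA_BRIDGE: Dict[str, Set[str]] = {
    "civil war": {"social studies"},
    "photosynthesis": {"science"},
}


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two terms (one of many possible measures)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(query: str, vocab: List[str]) -> Tuple[Optional[str], float]:
    """Return the vocabulary term with the highest direct similarity (None if all scores are 0)."""
    best_term: Optional[str] = None
    best_score = 0.0
    for term in vocab:
        score = jaccard(query, term)
        if score > best_score:
            best_term, best_score = term, score
    return best_term, best_score


def map_query(query: str, vocab: List[str], threshold: float = 0.2) -> Set[str]:
    """Map a query term to vocabulary terms, using the bridges only for long-tail queries."""
    term, score = best_match(query, vocab)
    if term is not None and score >= threshold:
        return {term}
    # Long-tail query: low or zero direct similarity. Combine (union) the candidates
    # from both bridges and keep only those that exist in the vocabulary.
    key = query.lower()
    candidates = WORDNET_BRIDGE.get(key, set()) | WIKIPEDIA_BRIDGE.get(key, set())
    return candidates & set(vocab)
```

For example, `map_query("science fair", CONTROLLED_VOCAB)` matches "science" directly, while `map_query("algebra", CONTROLLED_VOCAB)` shares no tokens with any vocabulary term and is mapped to "mathematics" only through the WordNet stand-in. Taking the union of both bridges loosely mirrors the abstract's observation that the two sources cover different types of query terms; a real system might instead weight or rank each source's candidates.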

Original language: English
Title of host publication: ASIST 2008
Subtitle of host publication: Proceedings of the 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People
Volume: 45
Publication status: Published, 2008 Dec 1
Event: ASIST 2008: 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People, Columbus, OH, United States
Duration: 2008 Oct 24 to 2008 Oct 29


Fingerprint

Thesauri
Wikipedia
vocabulary
Semantics
class concept
Information retrieval
Ontology
expert
community
performance

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Library and Information Sciences

Cite this

Liu, X., Qin, J., Chen, M., & Park, J-H. (2008). Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia. In ASIST 2008: Proceedings of the 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People (Vol. 45).
Liu, Xiaozhong; Qin, Jian; Chen, Miao; Park, Ji-Hong. / Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia. ASIST 2008: Proceedings of the 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People. Vol. 45, 2008.
@inproceedings{c41d4c09f7694a9ab3fabbded11b1c3f,
title = "Automatic semantic mapping between query terms and controlled vocabulary through using {WordNet} and {Wikipedia}",
author = "Xiaozhong Liu and Jian Qin and Miao Chen and Ji-Hong Park",
year = "2008",
month = "12",
day = "1",
language = "English",
isbn = "0877155402",
volume = "45",
booktitle = "ASIST 2008",

}

Liu, X, Qin, J, Chen, M & Park, J-H 2008, 'Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia', in ASIST 2008: Proceedings of the 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People, vol. 45, ASIST 2008: 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People, Columbus, OH, United States, 24 October 2008.

TY  - GEN
T1  - Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia
AU  - Liu, Xiaozhong
AU  - Qin, Jian
AU  - Chen, Miao
AU  - Park, Ji-Hong
PY  - 2008/12/1
Y1  - 2008/12/1
AB  - Query log analysis can provide valuable information for improving information retrieval performance. This paper reports findings from a query log mining project, in which query terms falling in the very long tail of low to zero similarity (with the controlled vocabulary) scores were analyzed by using similarity algorithms. The query log data was collected from the Gateway to Educational Materials (GEM). The limited number of terms in the GEM controlled vocabulary was a major source for the long tail of low or zero similarity scores for the query terms. To mitigate this limitation, we employed a strategy that involved using the general-purpose (domain-independent) ontology WordNet and community-created Wikipedia as the bridge to establish semantic relatedness between GEM controlled vocabulary (as well as new concept classes identified by human experts) and user query terms. The two sources, WordNet and Wikipedia, were complementary in mapping different types of query terms. A combination of both sources achieved a modest rate of mapping accuracy. The paper discussed the implications of the findings for automatic semantic analysis and vocabulary development and validation.
UR  - http://www.scopus.com/inward/record.url?scp=71949122187&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=71949122187&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 0877155402
SN  - 9780877155409
VL  - 45
BT  - ASIST 2008
ER  -

Liu X, Qin J, Chen M, Park J-H. Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia. In: ASIST 2008: Proceedings of the 71st ASIST Annual Meeting: People Transforming Information - Information Transforming People. Vol. 45. 2008.