Investigation into the existence of the indexer effect in key phrase extraction

Jung Eun Hahm, Su Yeon Kim, Meen Chul Kim, Min Song

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Introduction. The indexer effect has been examined in several information science studies that aim to reveal intellectual structures. In this study, we bring that concept into document classification to verify whether it also influences the results of key phrase extraction.

Method. We employ a well-known technique, the key phrase extraction algorithm, for our study. In particular, we extract key phrases from three different datasets: 1) papers in the same journal, 2) papers from different journals in the same field, and 3) papers from journals in different fields. All of these datasets provide keywords and index terms, which we used as training data for the algorithm.

Analysis. For evaluation, we compare the performance of two groups of key phrases extracted using the algorithm: 1) those that used author-provided keywords for the training set, and 2) those that used indexer-assigned index terms for the training set. We analyse the two groups in terms of exact (100%) and fair (70%) matching, which is based on the average number of key phrases extracted correctly per document.

Results. We conclude that automatic key phrase extraction based on index terms performs better than its counterpart based on author-provided keywords in most cases. However, the results also reveal that indexers tend to assign terms more inconsistently.

Conclusions. The findings of the study provide some insights into making use of index terms as training data in key phrase extraction. On the other hand, it should also be noted that automatically extracted key phrases might lead users to irrelevant documents in information retrieval.
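The evaluation measure described in the abstract (average number of correctly extracted key phrases per document, under exact and fair matching) can be illustrated with a minimal sketch. The abstract does not define how a 70% match is computed, so this sketch assumes a character-level similarity ratio from Python's `difflib.SequenceMatcher`; the function names and the sample phrases are hypothetical and are not taken from the study.

```python
from difflib import SequenceMatcher


def is_match(extracted: str, gold: str, threshold: float) -> bool:
    """True when the similarity ratio of two phrases reaches the threshold.

    Assumption: similarity is a character-level ratio in [0, 1];
    the study's own matching rule may differ.
    """
    a, b = extracted.lower().strip(), gold.lower().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def correct_per_document(extracted, gold, threshold):
    """Count extracted phrases that match at least one gold phrase."""
    return sum(
        any(is_match(e, g, threshold) for g in gold)
        for e in extracted
    )


def average_correct(docs, threshold):
    """Average number of correctly extracted phrases per document."""
    if not docs:
        return 0.0
    total = sum(correct_per_document(e, g, threshold) for e, g in docs)
    return total / len(docs)


# Hypothetical data: (extracted phrases, gold phrases) for two documents.
sample_docs = [
    (
        ["key phrase extraction", "indexer effects", "training data"],
        ["key phrase extraction", "indexer effect", "information retrieval"],
    ),
    (
        ["index terms", "journals"],
        ["index term", "author keywords"],
    ),
]

exact_score = average_correct(sample_docs, 1.0)  # exact (100%) matching
fair_score = average_correct(sample_docs, 0.7)   # fair (70%) matching
print(exact_score, fair_score)
```

In this sketch the gold phrases would be either the author-provided keywords or the indexer-assigned index terms, so running it once per gold set gives the two groups that the analysis compares.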

Original language: English
Journal: Information Research
Volume: 18
Issue number: 4
Publication status: Published - 2013 Dec 20

Fingerprint

information retrieval
information science
Group
evaluation
performance

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences

Cite this

Hahm, Jung Eun ; Kim, Su Yeon ; Kim, Meen Chul ; Song, Min. / Investigation into the existence of the indexer effect in key phrase extraction. In: Information Research. 2013 ; Vol. 18, No. 4.
@article{be8ba07ede4d4446a3e10602e7db5a3e,
title = "Investigation into the existence of the indexer effect in key phrase extraction",
author = "Hahm, {Jung Eun} and Kim, {Su Yeon} and Kim, {Meen Chul} and Min Song",
year = "2013",
month = "12",
day = "20",
language = "English",
volume = "18",
journal = "Information Research",
issn = "1368-1613",
publisher = "Thomas Daniel Wilson",
number = "4",

}

TY - JOUR

T1 - Investigation into the existence of the indexer effect in key phrase extraction

AU - Hahm, Jung Eun

AU - Kim, Su Yeon

AU - Kim, Meen Chul

AU - Song, Min

PY - 2013/12/20

Y1 - 2013/12/20

UR - http://www.scopus.com/inward/record.url?scp=84890345969&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890345969&partnerID=8YFLogxK

M3 - Article

VL - 18

JO - Information Research

JF - Information Research

SN - 1368-1613

IS - 4

ER -