Introduction. The indexer effect has been examined in several information science studies as a way to reveal intellectual structures. In this study, we apply that concept to document classification to test whether it also influences the results of key phrase extraction.

Method. We use the well-known technique called the key phrase extraction algorithm. We extract key phrases from three datasets: 1) papers in the same journal, 2) papers from different journals in the same field, and 3) papers from journals in different fields. All three datasets provide author keywords and index terms, which we use as training data for the algorithm.

Analysis. For evaluation, we compare the performance of two groups of key phrases extracted with the algorithm: 1) those for which author-provided keywords served as the training set, and 2) those for which indexer-assigned index terms served as the training set. We analyse the two groups in terms of exact (100%) and fair (70%) matching, both measured as the average number of key phrases extracted correctly per document.

Results. We find that automatic key phrase extraction trained on index terms performs better than its counterpart trained on author-provided keywords in most cases. However, the results also reveal that indexers tend to assign terms more inconsistently.

Conclusions. The findings provide some insight into using index terms as training data in key phrase extraction. It should also be noted, however, that automatically extracted key phrases might lead users to irrelevant documents in information retrieval.
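The exact (100%) and fair (70%) matching measure described under Analysis can be illustrated with a minimal sketch. The abstract does not say how the 70% criterion was computed, so the sketch assumes a character-level similarity ratio from Python's standard `difflib` module; the function names, the normalisation step, and the toy data are all hypothetical and are not taken from the study.

```python
from difflib import SequenceMatcher


def normalise(phrase):
    """Lower-case a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def is_match(extracted, reference, threshold):
    """Return True if two phrases are similar enough.

    ASSUMPTION: similarity is difflib's character-level ratio; the study
    does not state how its 70% criterion was actually computed.
    """
    a, b = normalise(extracted), normalise(reference)
    if threshold >= 1.0:
        return a == b  # exact (100%) matching
    return SequenceMatcher(None, a, b).ratio() >= threshold


def correct_per_document(extracted, reference, threshold):
    """Count extracted phrases that match at least one reference phrase."""
    return sum(
        any(is_match(e, r, threshold) for r in reference)
        for e in extracted
    )


def average_correct(documents, threshold):
    """Average number of correctly extracted key phrases per document.

    `documents` is a list of (extracted_phrases, reference_phrases) pairs.
    """
    if not documents:
        return 0.0
    total = sum(correct_per_document(e, r, threshold) for e, r in documents)
    return total / len(documents)


# Hypothetical toy data, not taken from the study.
docs = [
    (["information retrieval", "key phrase extraction"],
     ["information retrieval", "keyphrase extraction"]),
    (["indexing"], ["document classification"]),
]

exact = average_correct(docs, 1.0)  # exact (100%) matching
fair = average_correct(docs, 0.7)   # fair (70%) matching
```

Running the same comparison once with author keywords and once with index terms as the reference set would give the two per-group averages that the study contrasts. A token-overlap measure could be substituted for the character ratio without changing the rest of the sketch.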
Publication status: Published - 2013 Dec 20
All Science Journal Classification (ASJC) codes
- Library and Information Sciences