Combining supervised learning techniques to key-phrase extraction for biomedical full-text

Yanliang Qi, Min Song, Suk Chung Yoon, Lori Watrous-De Versterre

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Key-phrase extraction plays a useful role in research areas of Information Systems (IS) such as digital libraries. Short metadata such as key phrases help searchers understand the concepts found in the documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full-text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded inside an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of these systems using the holdout validation method. This paper compares different classifier techniques and the performance differences between the full text and its abstract. Compared with the authors' previous work, which investigated the performance of Naïve Bayes, Linear Regression and SVM(reg1/2), this paper finds that SVMreg-1 performs best in key-phrase extraction for full-text, whereas Naïve Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.

Original language: English
Pages (from-to): 33-44
Number of pages: 12
Journal: International Journal of Intelligent Information Technologies
Volume: 7
Issue number: 1
DOI: 10.4018/jiit.2011010103
Publication status: Published - 2011 Jan 1

Fingerprint

Supervised learning
Information systems
Digital libraries
Metadata
Linear regression
Classifiers

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Decision Sciences (miscellaneous)

Cite this

@article{043be1ae213f483bbd01d2441ff42218,
title = "Combining supervised learning techniques to key-phrase extraction for biomedical full-text",
abstract = "Key-phrase extraction plays a useful role in research areas of Information Systems (IS) such as digital libraries. Short metadata such as key phrases help searchers understand the concepts found in the documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full-text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded inside an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of these systems using the holdout validation method. This paper compares different classifier techniques and the performance differences between the full text and its abstract. Compared with the authors' previous work, which investigated the performance of Na{\"i}ve Bayes, Linear Regression and SVM(reg1/2), this paper finds that SVMreg-1 performs best in key-phrase extraction for full-text, whereas Na{\"i}ve Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.",
author = "Yanliang Qi and Min Song and Yoon, {Suk Chung} and {Watrous-De Versterre}, Lori",
year = "2011",
month = "1",
day = "1",
doi = "10.4018/jiit.2011010103",
language = "English",
volume = "7",
pages = "33--44",
journal = "International Journal of Intelligent Information Technologies",
issn = "1548-3657",
publisher = "IGI Publishing",
number = "1",

}

Combining supervised learning techniques to key-phrase extraction for biomedical full-text. / Qi, Yanliang; Song, Min; Yoon, Suk Chung; Watrous-De Versterre, Lori.

In: International Journal of Intelligent Information Technologies, Vol. 7, No. 1, 01.01.2011, p. 33-44.

TY - JOUR

T1 - Combining supervised learning techniques to key-phrase extraction for biomedical full-text

AU - Qi, Yanliang

AU - Song, Min

AU - Yoon, Suk Chung

AU - Watrous-De Versterre, Lori

PY - 2011/1/1

Y1 - 2011/1/1

N2 - Key-phrase extraction plays a useful role in research areas of Information Systems (IS) such as digital libraries. Short metadata such as key phrases help searchers understand the concepts found in the documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full-text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded inside an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of these systems using the holdout validation method. This paper compares different classifier techniques and the performance differences between the full text and its abstract. Compared with the authors' previous work, which investigated the performance of Naïve Bayes, Linear Regression and SVM(reg1/2), this paper finds that SVMreg-1 performs best in key-phrase extraction for full-text, whereas Naïve Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.

UR - http://www.scopus.com/inward/record.url?scp=79954606500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79954606500&partnerID=8YFLogxK

U2 - 10.4018/jiit.2011010103

DO - 10.4018/jiit.2011010103

M3 - Article

AN - SCOPUS:79954606500

VL - 7

SP - 33

EP - 44

JO - International Journal of Intelligent Information Technologies

JF - International Journal of Intelligent Information Technologies

SN - 1548-3657

IS - 1

ER -