Combining supervised learning techniques to key-phrase extraction for biomedical full-text

Yanliang Qi, Min Song, Suk Chung Yoon, Lori Watrous-De Versterre

Research output: Contribution to journal > Article > peer-review

12 Citations (Scopus)


Key-phrase extraction plays a useful role in research areas of Information Systems (IS) such as digital libraries. Short metadata such as key phrases help searchers understand the concepts found in documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded inside an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of these systems using the holdout validation method. This paper compares different classifier techniques and the performance differences between the full text and its abstract. Compared with the authors' previous work, which investigated the performance of Naïve Bayes, Linear Regression and SVM (reg-1/2), this paper finds that SVMreg-1 performs best in key-phrase extraction for full text, whereas Naïve Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.
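The holdout evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the features or tooling used, so the two features here (a TF-IDF score and a relative first-occurrence position), the synthetic candidate phrases, and all function names are hypothetical. The sketch trains a small K-Nearest Neighbor classifier on one part of the data and scores it on the held-out remainder.

```python
import math
import random


def holdout_split(items, test_fraction=0.3, seed=0):
    """Shuffle once, then hold out a fixed fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def knn_predict(train, features, k=3):
    """Majority vote among the k nearest training examples (Euclidean)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def precision_recall_f1(gold, predicted):
    """Score predictions for the positive class (1 = key phrase)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def make_toy_candidates(n=40, seed=1):
    """Synthetic candidates: ((tfidf, first_position), is_key_phrase).

    Hypothetical data, not drawn from PubMed or from the paper.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        is_key = i % 2
        if is_key:
            feats = (rng.uniform(0.6, 1.0), rng.uniform(0.0, 0.3))
        else:
            feats = (rng.uniform(0.0, 0.4), rng.uniform(0.5, 1.0))
        data.append((feats, is_key))
    return data


def evaluate(k=3):
    """Train on the training split, report scores on the held-out split."""
    train, test = holdout_split(make_toy_candidates())
    gold = [label for _, label in test]
    predicted = [knn_predict(train, feats, k) for feats, _ in test]
    return precision_recall_f1(gold, predicted)


if __name__ == "__main__":
    print("precision=%.2f recall=%.2f f1=%.2f" % evaluate())
```

The same split could be reused to compare several classifiers on identical held-out data, which is the kind of comparison the abstract describes.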

Original language: English
Pages (from-to): 33-44
Number of pages: 12
Journal: International Journal of Intelligent Information Technologies
Issue number: 1
Publication status: Published - 2011 Jan

Bibliographical note

Funding Information:
Min Song is an Assistant Professor in the Department of Information Systems and Co-director of the Informatics Research Laboratory at New Jersey Institute of Technology, where the goal of his research is the discovery of knowledge from large natural language data such as blogs, doctor’s notes, and scientific publications. His research interests are in text mining, bioinformatics, information retrieval and digital libraries. Min received the outstanding service award from the International Conference on Information and Knowledge Management in 2009. His work received an honorable mention award in the 2006 Greater Philadelphia Bioinformatics Symposium and the Drexel Dissertation Award in 2005. In addition, the paper entitled “Extracting and Mining Protein-protein interaction Network from Biomedical Literature” received the best paper award from the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, which was held in San Diego, Calif., on Oct. 7-8, 2004. Min was also nominated by NJIT for the Microsoft Research New Faculty Award in 2008 and as a Sloan Research Fellow in 2009. He has published one book, five book chapters, 15 journal articles, and 40 conference papers. Min has received several grants from the National Science Foundation (NSF) and the Institute of Museum and Library Services (IMLS) for developing advanced digital library systems. He received his PhD in Information Systems from Drexel University, an MS from Indiana University and a BA from Yonsei University in Korea.

Funding Information:
Suk-Chung Yoon is currently the chair and William R. Bailey Endowed Professor of the Computer Science Department at Widener University, Chester, Pennsylvania. Dr. Yoon is responsible for two majors, Computer Science and Computer Information Systems. Over the past 20 years, Dr. Yoon has taught Computer Science to students at all levels, ranging from non-majors to upper-class Computer Science majors and graduate students in the department's master's programs. His research has focused on artificial intelligence, data mining, and large-scale computing. His work has been supported by the National Science Foundation as well as corporations such as IBM and J.P. Morgan Chase. Dr. Yoon earned his B.S. from Yonsei University and received his M.S. and Ph.D. degrees from Northwestern University.

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Decision Sciences (miscellaneous)
