Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

Min Song, Hwanjo Yu, Wook Shin Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and many database curation tools. Both active learning and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore integrating active learning approaches into semi-supervised SVMs with an NLP-driven feature selection technique. Our contributions in this paper are as follows: (a) We propose a novel PPI extraction technique called PPISpotter, which combines an active learning technique with semi-supervised SVMs to extract protein-protein interactions. (b) We extract a comprehensive set of features from MEDLINE records using Natural Language Processing (NLP) techniques for SVM classifiers. (c) We conduct experiments with three different PPI corpora and show that PPISpotter is superior to four other comparison techniques in terms of precision, recall, and F-measure.
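
The abstract compares techniques by precision, recall, and F-measure. As a minimal sketch of how these three scores are conventionally computed for a binary sentence classifier (this is the standard definition, not code from the paper; the function name `precision_recall_f1` and the convention that label 1 marks an interaction sentence are assumptions made for illustration):

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for binary labels.

    Assumption for this sketch: label 1 marks a sentence that describes
    a protein-protein interaction, label 0 marks any other sentence.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not results from the paper.
    gold = [1, 1, 1, 0, 0, 1]
    predicted = [1, 0, 1, 1, 0, 1]
    print(precision_recall_f1(gold, predicted))
```

The F-measure here is the balanced F1 variant; whether the paper weights precision and recall equally is not stated in this record.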

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 35-44
Number of pages: 10
ISBN (Electronic): 9781605583020
Publication status: Published - 2010 Jan 1
Event: 9th International Workshop on Data Mining in Bioinformatics, BIOKDD 2010, Held in Conjunction with 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining - Washington, United States
Duration: 2010 Jul 25 to 2010 Jul 28

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 9th International Workshop on Data Mining in Bioinformatics, BIOKDD 2010, Held in Conjunction with 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Country: United States
City: Washington
Period: 10/7/25 to 10/7/28

Fingerprint

Supervised learning
Proteins
Problem-Based Learning
Processing
Feature extraction
Classifiers

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Song, M., Yu, H., & Han, W. S. (2010). Combining active learning and semi-supervised learning techniques to extract protein interaction sentences. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 35-44). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery.
@inproceedings{0dfcc350d3a44db59ca5e774079eac28,
title = "Combining active learning and semi-supervised learning techniques to extract protein interaction sentences",
author = "Min Song and Hwanjo Yu and Han, {Wook Shin}",
year = "2010",
month = "1",
day = "1",
language = "English",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "35--44",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

AU - Song, Min

AU - Yu, Hwanjo

AU - Han, Wook Shin

PY - 2010/1/1

Y1 - 2010/1/1

UR - http://www.scopus.com/inward/record.url?scp=84908310216&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908310216&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 35

EP - 44

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -
