Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.

Min Song, Hwanjo Yu, Wook Shin Han

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and of many database curation tools. Both active learning (AL) and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve performance on the PPI extraction task. We propose a novel PPI extraction technique, PPISpotter, which combines deterministic annealing-based SSL with an AL technique. In addition, we use natural language processing (NLP) techniques to extract a comprehensive set of features from MEDLINE records, which further improve the SVM classifiers. Our feature selection incorporates syntactic, semantic, and lexical properties of the text and significantly boosts system performance. In experiments on three different PPI corpora, PPISpotter outperforms other techniques incorporated into semi-supervised SVMs, such as random sampling, clustering, and transductive SVMs, in precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
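The abstract does not spell out how the two ideas interact. As a rough, hypothetical illustration only (it is not PPISpotter's procedure, which uses deterministic annealing-based semi-supervised SVMs), the sketch below shows one common way active and semi-supervised learning are paired around an SVM. Given decision values f(x) from an already-trained classifier, the least certain unlabeled sentences are sent to an annotator (uncertainty sampling), and the most confident ones receive provisional labels (simple self-training, swapped in here for the deterministic annealing step). The function name `split_pool` and its `query_size` and `confidence` parameters are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def split_pool(
    scores: Sequence[float], query_size: int, confidence: float
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Hypothetical helper: split unlabeled examples by SVM decision value f(x).

    Returns (indices to send to an annotator, (index, pseudo_label) pairs).
    This is a generic AL + self-training sketch, not the paper's algorithm.
    """
    # Active learning (uncertainty sampling): query the points closest to the
    # decision boundary, i.e. those with the smallest |f(x)|.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    query = ranked[:query_size]
    queried = set(query)

    # Semi-supervised step (simple self-training): points far from the
    # boundary take the sign of f(x) as a provisional label (+1 or -1).
    pseudo = [
        (i, 1 if scores[i] > 0 else -1)
        for i in range(len(scores))
        if i not in queried and abs(scores[i]) >= confidence
    ]
    return query, pseudo


if __name__ == "__main__":
    scores = [0.05, -2.3, 1.7, -0.2, 0.9, -1.1]
    print(split_pool(scores, query_size=2, confidence=1.0))
```

For example, with decision values `[0.05, -2.3, 1.7, -0.2, 0.9, -1.1]`, `split_pool(scores, 2, 1.0)` queries indices 0 and 3 and pseudo-labels indices 1, 2, and 5. In a full loop, the classifier would be retrained on the enlarged labeled set and the split repeated.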

Original language: English
Journal: BMC Bioinformatics
Volume: 12 Suppl 12
DOI: 10.1186/1471-2105-12-S12-S4
Publication status: Published - 2011 Jan 1

Fingerprint

Problem-Based Learning
Semi-supervised Learning
Active Learning
Supervised learning
Protein-protein Interaction
Proteins
Protein
Interaction
Feature Selection
Random Sampling
Supervised Machine Learning
Annealing
Natural Language
Feature extraction
System Performance
Classifier
Natural Language Processing
Clustering
Syntactics
Semantics

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{c8f9e4f97a97441a88e2d8e4d0c987f7,
title = "Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.",
abstract = "Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.",
author = "Min Song and Hwanjo Yu and Han, {Wook Shin}",
year = "2011",
month = "1",
day = "1",
doi = "10.1186/1471-2105-12-S12-S4",
language = "English",
volume = "12 Suppl 12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

Combining active learning and semi-supervised learning techniques to extract protein interaction sentences. / Song, Min; Yu, Hwanjo; Han, Wook Shin.

In: BMC Bioinformatics, Vol. 12 Suppl 12, 01.01.2011.


TY  - JOUR
T1  - Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.
AU  - Song, Min
AU  - Yu, Hwanjo
AU  - Han, Wook Shin
PY  - 2011/1/1
Y1  - 2011/1/1
N2  - Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
AB  - Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
UR  - http://www.scopus.com/inward/record.url?scp=84864998804&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84864998804&partnerID=8YFLogxK
U2  - 10.1186/1471-2105-12-S12-S4
DO  - 10.1186/1471-2105-12-S12-S4
M3  - Article
VL  - 12 Suppl 12
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
SN  - 1471-2105
ER  - 