Integrative gene network construction to analyze cancer recurrence using semi-supervised learning

Chihyun Park, Jaegyoon Ahn, Hyunjin Kim, Sang Hyun Park

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Background: Predicting cancer recurrence is an important research area in bioinformatics, and it is challenging because sample sizes are small relative to the vast number of genes. There have been several attempts to predict cancer recurrence, but most studies employed a supervised approach, which uses only the few labeled samples available. Semi-supervised learning, which also exploits unlabeled samples, is a promising alternative for this problem. However, few attempts based on the manifold assumption have been made to reveal the detailed roles of identified cancer genes in recurrence.

Results: To predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on graph regularization. We transformed the gene expression data into a graph structure suitable for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally related gene pairs. We then predicted cancer recurrence by applying a regularization approach to the constructed graph, which contains both labeled and unlabeled nodes.

Conclusions: The average improvement rate in accuracy over three different cancer datasets was 24.9% compared with existing supervised and semi-supervised methods. We performed functional enrichment analysis on the gene networks used for learning and found that these networks are significantly associated with biological functions related to cancer recurrence. Our algorithm was developed in standard C++ using the STL library and is available for Linux and MS Windows. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
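The abstract names the technique (regularization on a graph with labeled and unlabeled nodes) without giving its formulation. The Python sketch below illustrates one generic form of graph-regularized semi-supervised learning, not the authors' C++ implementation: it minimizes sum_i (f_i - y_i)^2 + mu * f^T L f, where L = D - W is the graph Laplacian, y_i is +1 (recurrence) or -1 (no recurrence) for labeled samples and 0 for unlabeled ones, and the sign of f_i is the prediction. Setting the gradient to zero gives the linear system (I + mu L) f = y, solved here by Jacobi iteration. The Gaussian kernel, sigma, mu, the toy data, and the names `build_sample_graph` and `regularize_on_graph` are all assumptions for illustration; the step that selects gene pairs from protein interaction data is not shown, and each sample is simply assumed to be described by the expression of one already-selected gene pair.

```python
import math


def build_sample_graph(samples, sigma=1.0):
    """Hypothetical graph construction: nodes are samples (patients), and edge
    weights are Gaussian-kernel similarities of their expression vectors."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def regularize_on_graph(w, y, mu=1.0, max_iter=1000, tol=1e-10):
    """Solve (I + mu * L) f = y with L = D - W by Jacobi iteration.

    This minimizes sum_i (f_i - y_i)^2 + mu * f^T L f: the first term keeps
    labeled nodes close to their labels, the second makes f smooth over the
    graph so that labels spread to unlabeled nodes (y_i = 0)."""
    n = len(y)
    f = list(y)
    for _ in range(max_iter):
        new_f = []
        for i in range(n):
            degree = sum(w[i])
            neighbor_sum = sum(w[i][j] * f[j] for j in range(n))
            new_f.append((y[i] + mu * neighbor_sum) / (1.0 + mu * degree))
        change = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if change < tol:
            break
    return f


# Made-up toy data: expression of one hypothetical gene pair per sample.
samples = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
y = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]  # +1 recurrence, -1 no recurrence, 0 unlabeled
f = regularize_on_graph(build_sample_graph(samples), y)
predicted = ["recurrence" if v > 0 else "no recurrence" for v in f]
print(predicted)
# → ['recurrence', 'recurrence', 'recurrence', 'no recurrence', 'no recurrence', 'no recurrence']
```

Jacobi iteration is used only because it is short: the matrix I + mu L has diagonal entries 1 + mu * d_i that exceed the sum of the off-diagonal magnitudes mu * d_i, so the iteration converges. Because I + mu L is also symmetric positive definite, a sparse conjugate-gradient solver would be the usual choice for graphs with many samples.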

Original language: English
Article number: e86309
Journal: PLoS One
Volume: 9
Issue number: 1
DOI: 10.1371/journal.pone.0086309
Publication status: Published - 2014 Jan 31

Fingerprint

Gene Regulatory Networks
Supervised learning
Learning
Genes
Recurrence
Neoplasms
Gene Expression
Bioinformatics
Neoplasm Genes
Learning algorithms
Computational Biology
Sample Size
Libraries
Supervised Machine Learning

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

@article{b30cf9c1ef4a440bb99d4fd8317693f6,
title = "Integrative gene network construction to analyze cancer recurrence using semi-supervised learning",
author = "Chihyun Park and Jaegyoon Ahn and Hyunjin Kim and Park, {Sang Hyun}",
year = "2014",
month = "1",
day = "31",
doi = "10.1371/journal.pone.0086309",
language = "English",
volume = "9",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "1",

}

Integrative gene network construction to analyze cancer recurrence using semi-supervised learning. / Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sang Hyun.

In: PLoS One, Vol. 9, No. 1, e86309, 31.01.2014.

TY  - JOUR
T1  - Integrative gene network construction to analyze cancer recurrence using semi-supervised learning
AU  - Park, Chihyun
AU  - Ahn, Jaegyoon
AU  - Kim, Hyunjin
AU  - Park, Sang Hyun
PY  - 2014/1/31
Y1  - 2014/1/31
AB  - Background: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. Results: In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. Conclusions: The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
UR  - http://www.scopus.com/inward/record.url?scp=84900305562&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84900305562&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0086309
DO  - 10.1371/journal.pone.0086309
M3  - Article
C2  - 24497942
AN  - SCOPUS:84900305562
VL  - 9
JO  - PLoS One
JF  - PLoS One
SN  - 1932-6203
IS  - 1
M1  - e86309
ER  -