Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers

Jonghwan Choi, Sang Hyun Park, Youngmi Yoon, Jaegyoon Ahn

Research output: Contribution to journal > Article

12 Citations (Scopus)

Abstract

Motivation: Identifying genes that can be used to predict prognosis in patients with cancer is important because it can lead to improved therapy and can also advance our understanding of tumor progression at the molecular level. A common and fundamental problem that makes both the identification of prognostic genes and the prediction of cancer outcomes difficult is the heterogeneity of patient samples.

Results: To reduce the effect of sample heterogeneity, we clustered data samples using the K-means algorithm and applied a modified PageRank to functional interaction (FI) networks weighted using the gene expression values of the samples in each cluster. Hub genes among the resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. This process outperformed traditional feature selection methods, as well as several network-based prognostic gene selection methods, when used with Random Forest. We were able to find many cluster-specific prognostic genes for each dataset. A functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspects of tumor progression or oncogenesis among distinct patient groups. Taken together, these results support the hypothesis that our approach can effectively identify heterogeneous prognostic genes, and that these genes are complementary to each other, improving prediction accuracy.

Availability and implementation: https://github.com/mathcom/CPR

Contact: jgahn@inu.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.
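
The network-ranking step described in the abstract can be illustrated with a small sketch. This is not the authors' modified PageRank (see the CPR repository for that); it is a standard weighted PageRank by power iteration, run on a toy network for the samples of a single cluster. The gene names, the expression values, the function names `pearson` and `weighted_pagerank`, and the choice of absolute Pearson correlation as the edge weight are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def weighted_pagerank(genes, edges, expr, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard weighted PageRank on an undirected gene network.

    edges: (gene_a, gene_b) pairs of a toy interaction network.
    expr:  gene -> expression values for the samples of ONE cluster.
    Edge weight is |Pearson correlation|, an assumed weighting scheme.
    """
    weight = {g: {} for g in genes}
    for a, b in edges:
        w = abs(pearson(expr[a], expr[b]))
        weight[a][b] = w
        weight[b][a] = w
    strength = {g: sum(weight[g].values()) for g in genes}

    n = len(genes)
    rank = {g: 1.0 / n for g in genes}
    for _ in range(max_iter):
        # Rank held by genes without weighted edges is spread uniformly.
        dangling = sum(rank[g] for g in genes if strength[g] == 0)
        new = {}
        for g in genes:
            incoming = sum(
                rank[h] * w / strength[h]
                for h, w in weight[g].items()
                if strength[h] > 0
            )
            new[g] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new[g] - rank[g]) for g in genes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy data: a star-shaped network centred on GENE_A,
# with expression values for the four samples of one cluster.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D")]
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}

ranks = weighted_pagerank(genes, edges, expr)
# Keep the highest-ranked genes as candidate biomarkers for this cluster.
top_genes = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top_genes)
```

In a fuller pipeline along the lines of the abstract, this ranking would be repeated once per K-means cluster, and the union of the selected genes would be used as features for a Random Forest classifier.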

Original language: English
Pages (from-to): 3619-3626
Number of pages: 8
Journal: Bioinformatics
Volume: 33
Issue number: 22
DOI: 10.1093/bioinformatics/btx487
Publication status: Published - 2017 Nov 15

Fingerprint

Biomarkers
Breast Cancer
Genes
Breast Neoplasms
Gene
Prediction
Prognosis
Progression
Tumor
Cancer
Tumors
Distinct
Biological Phenomena
Predict
Gene Selection
Neoplasms
Weighted Networks
Clustered Data
PageRank
Random Forest

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Choi, Jonghwan ; Park, Sang Hyun ; Yoon, Youngmi ; Ahn, Jaegyoon. / Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers. In: Bioinformatics. 2017 ; Vol. 33, No. 22. pp. 3619-3626.
@article{06d633bb8e5e44899148659191019946,
title = "Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers",
abstract = "Motivation: Identifying genes that can be used to predict prognosis in patients with cancer is important because it can lead to improved therapy and can also advance our understanding of tumor progression at the molecular level. A common and fundamental problem that makes both the identification of prognostic genes and the prediction of cancer outcomes difficult is the heterogeneity of patient samples. Results: To reduce the effect of sample heterogeneity, we clustered data samples using the K-means algorithm and applied a modified PageRank to functional interaction (FI) networks weighted using the gene expression values of the samples in each cluster. Hub genes among the resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. This process outperformed traditional feature selection methods, as well as several network-based prognostic gene selection methods, when used with Random Forest. We were able to find many cluster-specific prognostic genes for each dataset. A functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspects of tumor progression or oncogenesis among distinct patient groups. Taken together, these results support the hypothesis that our approach can effectively identify heterogeneous prognostic genes, and that these genes are complementary to each other, improving prediction accuracy. Availability and implementation: https://github.com/mathcom/CPR Contact: jgahn@inu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Jonghwan Choi and Park, {Sang Hyun} and Youngmi Yoon and Jaegyoon Ahn",
year = "2017",
month = "11",
day = "15",
doi = "10.1093/bioinformatics/btx487",
language = "English",
volume = "33",
pages = "3619--3626",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "22",

}

Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers. / Choi, Jonghwan; Park, Sang Hyun; Yoon, Youngmi; Ahn, Jaegyoon.

In: Bioinformatics, Vol. 33, No. 22, 15.11.2017, p. 3619-3626.

TY - JOUR

T1 - Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers

AU - Choi, Jonghwan

AU - Park, Sang Hyun

AU - Yoon, Youngmi

AU - Ahn, Jaegyoon

PY - 2017/11/15

Y1 - 2017/11/15

AB - Motivation: Identifying genes that can be used to predict prognosis in patients with cancer is important because it can lead to improved therapy and can also advance our understanding of tumor progression at the molecular level. A common and fundamental problem that makes both the identification of prognostic genes and the prediction of cancer outcomes difficult is the heterogeneity of patient samples. Results: To reduce the effect of sample heterogeneity, we clustered data samples using the K-means algorithm and applied a modified PageRank to functional interaction (FI) networks weighted using the gene expression values of the samples in each cluster. Hub genes among the resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. This process outperformed traditional feature selection methods, as well as several network-based prognostic gene selection methods, when used with Random Forest. We were able to find many cluster-specific prognostic genes for each dataset. A functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspects of tumor progression or oncogenesis among distinct patient groups. Taken together, these results support the hypothesis that our approach can effectively identify heterogeneous prognostic genes, and that these genes are complementary to each other, improving prediction accuracy. Availability and implementation: https://github.com/mathcom/CPR Contact: jgahn@inu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85034424290&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034424290&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx487

DO - 10.1093/bioinformatics/btx487

M3 - Article

C2 - 28961949

AN - SCOPUS:85034424290

VL - 33

SP - 3619

EP - 3626

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 22

ER -