Motivation Identification of genes that can be used to predict prognosis in patients with cancer is important in that it can lead to improved therapy, and can also promote our understanding of tumor progression on the molecular level. One of the common but fundamental problems that render identification of prognostic genes and prediction of cancer outcomes difficult is the heterogeneity of patient samples. Results To reduce the effect of sample heterogeneity, we clustered data samples using K-means algorithm and applied modified PageRank to functional interaction (FI) networks weighted using gene expression values of samples in each cluster. Hub genes among resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. This process outperformed traditional feature selection methods as well as several network-based prognostic gene selection methods when applied to Random Forest. We were able to find many cluster-specific prognostic genes for each dataset. Functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspect of tumor progression or oncogenesis among distinct patient groups. Taken together, these results provide support for the hypothesis that our approach can effectively identify heterogeneous prognostic genes, and these are complementary to each other, improving prediction accuracy. Availability and implementation https://github.com/mathcom/CPR Contact email@example.com Supplementary informationSupplementary dataare available at Bioinformatics online.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics