Machine learning-based identification of genetic interactions from heterogeneous gene expression profiles

Chihyun Park, Jung Rim Kim, Jeongwoo Kim, Sanghyun Park

Research output: Contribution to journal › Article

Abstract

The identification of disease-related genes and disease mechanisms is an important research goal; many studies have approached this problem by analysing genetic networks based on gene expression profiles and interaction datasets. To construct a gene network, correlations or associations among pairs of genes must be obtained. However, when gene expression data are heterogeneous with high levels of noise for samples assigned to the same condition, it is difficult to accurately determine whether a gene pair represents a significant gene–gene interaction (GGI). In order to solve this problem, we proposed a random forest-based method to classify significant GGIs from gene expression data. To train the model, we defined novel feature sets and utilised various high-confidence interactome datasets to deduce the correct answer set from known disease-specific genes. Using Alzheimer’s disease data, the proposed method showed remarkable accuracy, and the GGIs established in the analysis can be used to build a meaningful genetic network that can explain the mechanisms underlying Alzheimer’s disease.
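The abstract describes the approach only at a high level. Features are computed for each gene pair from expression profiles, labels are derived from high-confidence interactome data, and a random forest classifies each pair as a significant GGI or not. The Python sketch below illustrates that kind of pipeline under stated assumptions and is not the authors' implementation. The feature set (within-condition Pearson correlations and their absolute difference), the labelling rule (pair present in a reference interaction list), and every function name and parameter (`pair_features`, `label_pairs`, `train_forest`, `n_try`, and so on) are hypothetical. The forest is a small pure-Python stand-in for a library implementation: Gini-split trees grown on bootstrap resamples, with a random feature subset considered at each split.

```python
# Hypothetical sketch: the feature set, labelling rule and all names below are
# illustrative assumptions, not the method published in the paper.
import math
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pair_features(expr, case_idx, ctrl_idx):
    """Assumed features per gene pair: correlation within case samples, correlation
    within control samples, and their absolute difference.
    `expr` maps a gene name to its expression values, one value per sample."""
    feats = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r_case = pearson([expr[g1][i] for i in case_idx], [expr[g2][i] for i in case_idx])
        r_ctrl = pearson([expr[g1][i] for i in ctrl_idx], [expr[g2][i] for i in ctrl_idx])
        feats[(g1, g2)] = [r_case, r_ctrl, abs(r_case - r_ctrl)]
    return feats


def label_pairs(pairs, known_interactions):
    """Assumed labels: 1 if the (unordered) pair is in a reference interaction list, else 0."""
    known = {frozenset(p) for p in known_interactions}
    return [1 if frozenset(p) in known else 0 for p in pairs]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_try, rng):
    """Grow a binary tree; each split tries n_try randomly chosen features (n_try <= len(X[0]))."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: majority class label
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children; lower is better.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split among the sampled features
        return majority(y)
    _, f, t, left, right = best
    children = [build_tree([X[i] for i in idx], [y[i] for i in idx],
                           depth + 1, max_depth, n_try, rng) for idx in (left, right)]
    return (f, t, children[0], children[1])


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=3, n_try=2, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training pairs."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_try, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees (1 = candidate GGI, 0 = not)."""
    return majority([predict_tree(tree, x) for tree in forest])
```

In use, `pair_features` turns a gene-to-expression mapping plus case and control sample indices into one feature vector per gene pair. Those vectors and the labels from `label_pairs` can then be passed to `train_forest`, and `predict_forest` returns the predicted class for a new pair. On real data, a library random forest with held-out evaluation would be the more usual choice than this hand-rolled version.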

Original language: English
Article number: e0201056
Journal: PLoS One
Volume: 13
Issue number: 7
DOI: 10.1371/journal.pone.0201056
Publication status: Published, July 2018

Fingerprint

artificial intelligence
Transcriptome
Gene expression
Learning systems
Identification (control systems)
Genes
Alzheimer disease
Gene Regulatory Networks
Noise
Machine Learning
methodology
Research
sampling
Datasets

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

@article{66f16b19b6a04f038a6e1d4f13cfa4ed,
title = "Machine learning-based identification of genetic interactions from heterogeneous gene expression profiles",
abstract = "The identification of disease-related genes and disease mechanisms is an important research goal; many studies have approached this problem by analysing genetic networks based on gene expression profiles and interaction datasets. To construct a gene network, correlations or associations among pairs of genes must be obtained. However, when gene expression data are heterogeneous with high levels of noise for samples assigned to the same condition, it is difficult to accurately determine whether a gene pair represents a significant gene–gene interaction (GGI). In order to solve this problem, we proposed a random forest-based method to classify significant GGIs from gene expression data. To train the model, we defined novel feature sets and utilised various high-confidence interactome datasets to deduce the correct answer set from known disease-specific genes. Using Alzheimer’s disease data, the proposed method showed remarkable accuracy, and the GGIs established in the analysis can be used to build a meaningful genetic network that can explain the mechanisms underlying Alzheimer’s disease.",
author = "Chihyun Park and Kim, {Jung Rim} and Jeongwoo Kim and Sanghyun Park",
year = "2018",
month = "7",
doi = "10.1371/journal.pone.0201056",
language = "English",
volume = "13",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",

}

Machine learning-based identification of genetic interactions from heterogeneous gene expression profiles. / Park, Chihyun; Kim, Jung Rim; Kim, Jeongwoo; Park, Sanghyun.

In: PLoS One, Vol. 13, No. 7, e0201056, 07.2018.

TY  - JOUR

T1  - Machine learning-based identification of genetic interactions from heterogeneous gene expression profiles

AU  - Park, Chihyun

AU  - Kim, Jung Rim

AU  - Kim, Jeongwoo

AU  - Park, Sanghyun

PY  - 2018/7

Y1  - 2018/7

N2  - The identification of disease-related genes and disease mechanisms is an important research goal; many studies have approached this problem by analysing genetic networks based on gene expression profiles and interaction datasets. To construct a gene network, correlations or associations among pairs of genes must be obtained. However, when gene expression data are heterogeneous with high levels of noise for samples assigned to the same condition, it is difficult to accurately determine whether a gene pair represents a significant gene–gene interaction (GGI). In order to solve this problem, we proposed a random forest-based method to classify significant GGIs from gene expression data. To train the model, we defined novel feature sets and utilised various high-confidence interactome datasets to deduce the correct answer set from known disease-specific genes. Using Alzheimer’s disease data, the proposed method showed remarkable accuracy, and the GGIs established in the analysis can be used to build a meaningful genetic network that can explain the mechanisms underlying Alzheimer’s disease.

AB  - The identification of disease-related genes and disease mechanisms is an important research goal; many studies have approached this problem by analysing genetic networks based on gene expression profiles and interaction datasets. To construct a gene network, correlations or associations among pairs of genes must be obtained. However, when gene expression data are heterogeneous with high levels of noise for samples assigned to the same condition, it is difficult to accurately determine whether a gene pair represents a significant gene–gene interaction (GGI). In order to solve this problem, we proposed a random forest-based method to classify significant GGIs from gene expression data. To train the model, we defined novel feature sets and utilised various high-confidence interactome datasets to deduce the correct answer set from known disease-specific genes. Using Alzheimer’s disease data, the proposed method showed remarkable accuracy, and the GGIs established in the analysis can be used to build a meaningful genetic network that can explain the mechanisms underlying Alzheimer’s disease.

UR  - http://www.scopus.com/inward/record.url?scp=85050689713&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85050689713&partnerID=8YFLogxK

U2  - 10.1371/journal.pone.0201056

DO  - 10.1371/journal.pone.0201056

M3  - Article

C2  - 30048494

AN  - SCOPUS:85050689713

VL  - 13

JO  - PLoS One

JF  - PLoS One

SN  - 1932-6203

IS  - 7

M1  - e0201056

ER  - 