Direct integration of microarrays for selecting informative genes and phenotype classification

Youngmi Yoon, Jongchan Lee, Sanghyun Park, Sangjay Bien, Hyun Cheol Chung, Sun Young Rha

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Microarrays report thousands of gene expression values simultaneously, which makes them very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcome this constraint in two ways: we increase the number of samples by integrating independently generated microarrays that were designed with the same biological objectives, and we reduce the number of genes involved in classification by selecting a small set of informative genes. This lets us make fuller use of the abundant microarray data being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that is applicable to integrated microarrays, and to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. First, we directly integrate individual microarrays by transforming each expression value into a rank value within its sample, and we identify informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we build a classifier, a parameter-free ensemble method, using only the pre-selected informative genes. Using this classifier, derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in classifying an independent test dataset.
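The first stage described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names (`rank_within_sample`, `swaps_to_split`, `gene_scores`), the restriction to two classes labelled 0 and 1, the reading of "swaps" as adjacent swaps (an inversion count), and the way ties are broken by position. It is one plausible reading of the idea, not the authors' algorithm.

```python
from typing import List, Sequence


def rank_within_sample(values: Sequence[float]) -> List[int]:
    """Replace each expression value in one sample with its rank (1 = lowest).

    Assumption: ties are broken by original position.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def swaps_to_split(labels: Sequence[int]) -> int:
    """Minimum adjacent swaps that turn a 0/1 sequence into a perfect split.

    A perfect split is all 0s followed by all 1s, or the reverse.
    """
    ones_seen = 0
    inversions = 0  # pairs with a 1 before a 0: cost of reaching 0...01...1
    for label in labels:
        if label == 1:
            ones_seen += 1
        else:
            inversions += ones_seen
    n1 = ones_seen
    n0 = len(labels) - n1
    # Reaching 1...10...0 instead costs the complementary number of pairs.
    return min(inversions, n0 * n1 - inversions)


def gene_scores(expr: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Score every gene; expr[s][g] is the value of gene g in sample s.

    Lower scores mean the gene's ranks separate the two classes more cleanly.
    """
    ranked = [rank_within_sample(sample) for sample in expr]
    scores = []
    for g in range(len(expr[0])):
        # Order samples by this gene's within-sample rank (stable for ties).
        order = sorted(range(len(expr)), key=lambda s: ranked[s][g])
        scores.append(swaps_to_split([labels[s] for s in order]))
    return scores


# Four samples on very different intensity scales, three genes each.
expr = [
    [1.0, 2.0, 3.0],        # class 0
    [30.0, 20.0, 10.0],     # class 1
    [0.1, 0.3, 0.2],        # class 0
    [300.0, 100.0, 200.0],  # class 1
]
labels = [0, 1, 0, 1]
print(gene_scores(expr, labels))  # [0, 1, 1]
```

Because ranks are computed within each sample, the four samples become comparable even though their raw intensities differ by orders of magnitude, which is the point of integrating the arrays directly. In this toy input the first gene scores 0 because its ranks split the classes perfectly, so it would be the one kept as informative.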

Original language: English
Pages (from-to): 88-105
Number of pages: 18
Journal: Information Sciences
Volume: 178
Issue number: 1
DOI: 10.1016/j.ins.2007.08.013
Publication status: Published - 2008 Jan 2

Fingerprint

Microarrays
Microarray
Phenotype
Genes
Gene
Classifier
Microarray Data
Classifiers
Gene Selection
Ensemble Methods
Swap
Feature Selection
Gene Expression
Specificity
Exceed
High Accuracy
Gene expression

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

@article{5c176445d5d74c0d8938a05a202315d1,
title = "Direct integration of microarrays for selecting informative genes and phenotype classification",
abstract = "The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways; we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to maximally use the abundant microarray data that is being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that can be applicable to integrated microarrays as well as to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. Firstly, we performed a direct integration of individual microarrays by transforming an expression value into a rank value within a sample and identified informative genes by calculating the number of swaps to reach a perfectly split sequence. Secondly, we built a classifier which is a parameter-free ensemble method using only the pre-selected informative genes. By using our classifier that was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.",
author = "Youngmi Yoon and Jongchan Lee and Sanghyun Park and Sangjay Bien and Chung, {Hyun Cheol} and Rha, {Sun Young}",
year = "2008",
month = "1",
day = "2",
doi = "10.1016/j.ins.2007.08.013",
language = "English",
volume = "178",
pages = "88--105",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "1",
}

Direct integration of microarrays for selecting informative genes and phenotype classification. / Yoon, Youngmi; Lee, Jongchan; Park, Sanghyun; Bien, Sangjay; Chung, Hyun Cheol; Rha, Sun Young.

In: Information Sciences, Vol. 178, No. 1, 02.01.2008, p. 88-105.

TY - JOUR

T1 - Direct integration of microarrays for selecting informative genes and phenotype classification

AU - Yoon, Youngmi

AU - Lee, Jongchan

AU - Park, Sanghyun

AU - Bien, Sangjay

AU - Chung, Hyun Cheol

AU - Rha, Sun Young

PY - 2008/1/2

Y1 - 2008/1/2

N2 - The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways; we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to maximally use the abundant microarray data that is being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that can be applicable to integrated microarrays as well as to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. Firstly, we performed a direct integration of individual microarrays by transforming an expression value into a rank value within a sample and identified informative genes by calculating the number of swaps to reach a perfectly split sequence. Secondly, we built a classifier which is a parameter-free ensemble method using only the pre-selected informative genes. By using our classifier that was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.

UR - http://www.scopus.com/inward/record.url?scp=34948842501&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34948842501&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2007.08.013

DO - 10.1016/j.ins.2007.08.013

M3 - Article

AN - SCOPUS:34948842501

VL - 178

SP - 88

EP - 105

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

IS - 1

ER -