Data mining for gene expression profiles from DNA microarray

Sung Bae Cho, Hong Hee Won

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Microarray technology has produced a large volume of data, turning many problems in biology into problems of computing. As a result, techniques for extracting useful information from these data have been developed. In particular, microarray technology has been applied to the prediction and diagnosis of cancer, where it is expected to help predict and diagnose cancer accurately. To classify cancer precisely, we have to select the genes related to cancer, because the gene expression data obtained from microarrays contain considerable noise. In this paper, we explore seven feature selection methods and four classifiers, and propose ensemble classifiers, on three benchmark datasets in order to systematically evaluate the performance of the feature selection methods and machine learning classifiers. The three benchmark datasets are the leukemia, colon cancer, and lymphoma datasets. The classifiers are combined by majority voting, weighted voting, and a Bayesian approach to improve classification performance. Experimental results show that an ensemble of several base classifiers produces the best recognition rate on the benchmark datasets.
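Two of the combination rules named in the abstract, majority voting and weighted voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule, the class labels, and the example weights are all assumptions made for illustration, and the Bayesian combination is omitted because the abstract does not specify its form.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label chosen by the most base classifiers.

    Ties go to the label that appears first in `predictions`
    (an illustrative choice, not taken from the paper).
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def weighted_vote(predictions, weights):
    """Return the label with the largest summed classifier weight.

    `weights` holds one non-negative weight per base classifier,
    for example its validation accuracy (a hypothetical choice).
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    best = max(scores.values())
    for label in predictions:
        if scores[label] == best:
            return label


# Hypothetical outputs of three base classifiers for one sample.
votes = ["ALL", "AML", "ALL"]
print(majority_vote(votes))                    # ALL
print(weighted_vote(votes, [0.2, 0.9, 0.3]))   # AML
```

In the example, the unweighted vote follows the two classifiers that agree, while the weighted vote follows the single classifier with the largest weight, which is the practical difference between the two rules.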

Original language: English
Pages (from-to): 593-608
Number of pages: 16
Journal: International Journal of Software Engineering and Knowledge Engineering
Volume: 13
Issue number: 6
DOIs: 10.1142/S0218194003001469
Publication status: Published - 2003 Dec 1

Fingerprint

Microarrays
Data mining
DNA
Classifiers
Feature extraction
Genes
Learning systems

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Cite this

@article{97623398b6084d89ba377c0e6edbbe61,
title = "Data mining for gene expression profiles from DNA microarray",
author = "Cho, {Sung Bae} and Won, {Hong Hee}",
year = "2003",
month = "12",
day = "1",
doi = "10.1142/S0218194003001469",
language = "English",
volume = "13",
pages = "593--608",
journal = "International Journal of Software Engineering and Knowledge Engineering",
issn = "0218-1940",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "6",

}

Data mining for gene expression profiles from DNA microarray. / Cho, Sung Bae; Won, Hong Hee.

In: International Journal of Software Engineering and Knowledge Engineering, Vol. 13, No. 6, 01.12.2003, p. 593-608.


TY - JOUR

T1 - Data mining for gene expression profiles from DNA microarray

AU - Cho, Sung Bae

AU - Won, Hong Hee

PY - 2003/12/1

Y1 - 2003/12/1

UR - http://www.scopus.com/inward/record.url?scp=1142264043&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1142264043&partnerID=8YFLogxK

U2 - 10.1142/S0218194003001469

DO - 10.1142/S0218194003001469

M3 - Article

AN - SCOPUS:1142264043

VL - 13

SP - 593

EP - 608

JO - International Journal of Software Engineering and Knowledge Engineering

JF - International Journal of Software Engineering and Knowledge Engineering

SN - 0218-1940

IS - 6

ER -