Novel and simple transformation algorithm for combining microarray data sets

Ki Yeol Kim, Dong Hyuk Ki, Ha Jin Jeong, Hei Cheul Jeung, Hyuncheol Chung, SunYoung Rha

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Background: With microarray technology, variability in experimental conditions, such as RNA source, microarray production, or the use of different platforms, can introduce bias. Such systematic differences are a substantial obstacle to the analysis of microarray data and lead to inconsistent, unreliable information. Therefore, one of the most pressing challenges in the field of microarray technology is how to integrate results from different microarray experiments, or how to combine data sets before a specific analysis. Results: Two microarray data sets generated on a 17k cDNA microarray system were used, comprising 82 normal colon mucosa samples and 72 colorectal cancer tissue samples. Each data set was prepared from either total RNA or amplified mRNA, and the difference in RNA source between the two data sets was detected with an analysis of variance (ANOVA) model. A simple integration method was introduced that is based on the distributions of gene expression ratios across microarray data sets. The method transforms gene expression ratios into the form of a reference data set on a gene-by-gene basis. Hierarchical clustering analysis, density and box plots, and mixture scores with correlation coefficients revealed that the two data sets were well intermingled, indicating that the proposed method minimized the experimental bias. In addition, no RNA source effect was detected after the proposed transformation. In the mixed data set, two previously identified subgroups of normal and tumor samples were well separated, and integration was more effective in the tumor groups than in the normal groups. The transformation method was slightly more effective when a data set with strong homogeneity within the same experimental group was used as the reference data set. Conclusion: The proposed method is simple but useful for combining several data sets obtained under different experimental conditions. With this method, biologically useful information can be detected by applying various analytic methods to the combined data set with its increased sample size.
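The abstract states only that expression ratios are transformed "into the form of a reference data set on a gene-by-gene basis" and does not give the formula. As an illustration, the sketch below assumes a per-gene location-scale mapping: each gene's log ratios in the data set being transformed are standardized and then rescaled to that gene's mean and standard deviation in the reference data set. The function name `transform_to_reference` and the dict-of-lists input layout are hypothetical, and the authors' exact algorithm may differ.

```python
from statistics import mean, stdev


def transform_to_reference(target, reference):
    """Hypothetical per-gene location-scale transformation (an assumption,
    not necessarily the published algorithm).

    target, reference: dict mapping gene ID -> list of log expression ratios,
    one value per array. Each target gene is standardized with its own mean
    and standard deviation, then rescaled to the reference gene's mean and
    standard deviation. Genes missing from the reference, or with fewer than
    two values in either set, are skipped.
    """
    transformed = {}
    for gene, values in target.items():
        ref_values = reference.get(gene)
        if ref_values is None or len(values) < 2 or len(ref_values) < 2:
            continue  # need at least two arrays per set to estimate spread
        t_mean, t_sd = mean(values), stdev(values)
        r_mean, r_sd = mean(ref_values), stdev(ref_values)
        if t_sd == 0:
            # Constant gene in the target set: place it at the reference mean.
            transformed[gene] = [r_mean] * len(values)
            continue
        transformed[gene] = [(x - t_mean) / t_sd * r_sd + r_mean for x in values]
    return transformed


reference = {"G1": [0.0, 1.0, 2.0], "G2": [-1.0, 0.0, 1.0]}
target = {"G1": [2.0, 4.0, 6.0], "G2": [5.0, 5.0, 5.0]}
combined = transform_to_reference(target, reference)
print(combined["G1"])  # [0.0, 1.0, 2.0]
print(combined["G2"])  # [0.0, 0.0, 0.0]
```

Because each gene is matched separately, a shift or scale difference that affects only some genes (for example, one linked to RNA source) is removed for those genes without distorting the others. The choice of reference matters under this assumption, which is consistent with the abstract's observation that a more homogeneous reference data set worked slightly better.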

Original language: English
Article number: 218
Journal: BMC Bioinformatics
Volume: 8
DOI: 10.1186/1471-2105-8-218
Publication status: Published (2007 Jun 25)

Fingerprint

Microarrays
Microarray Data
RNA
Microarray
Gene expression
Tumors
Genes
Gene Expression
Datasets
Tumor
Analysis of variance (ANOVA)
Box plot
CDNA Microarray
Mixed Data
Gene
Technology
Colorectal Cancer
Clustering Analysis
Complementary DNA
Hierarchical Clustering

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Kim, Ki Yeol; Ki, Dong Hyuk; Jeong, Ha Jin; Jeung, Hei Cheul; Chung, Hyuncheol; Rha, SunYoung. / Novel and simple transformation algorithm for combining microarray data sets. In: BMC Bioinformatics. 2007; Vol. 8.
@article{ff605ab50ac942069b8c5f280aa6242d,
title = "Novel and simple transformation algorithm for combining microarray data sets",
author = "Kim, {Ki Yeol} and Ki, {Dong Hyuk} and Jeong, {Ha Jin} and Jeung, {Hei Cheul} and Hyuncheol Chung and SunYoung Rha",
year = "2007",
month = "6",
day = "25",
doi = "10.1186/1471-2105-8-218",
language = "English",
volume = "8",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

Novel and simple transformation algorithm for combining microarray data sets. / Kim, Ki Yeol; Ki, Dong Hyuk; Jeong, Ha Jin; Jeung, Hei Cheul; Chung, Hyuncheol; Rha, SunYoung.

In: BMC Bioinformatics, Vol. 8, 218, 25.06.2007.


TY - JOUR

T1 - Novel and simple transformation algorithm for combining microarray data sets

AU - Kim, Ki Yeol

AU - Ki, Dong Hyuk

AU - Jeong, Ha Jin

AU - Jeung, Hei Cheul

AU - Chung, Hyuncheol

AU - Rha, SunYoung

PY - 2007/6/25

Y1 - 2007/6/25

UR - http://www.scopus.com/inward/record.url?scp=34447528847&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34447528847&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-218

DO - 10.1186/1471-2105-8-218

M3 - Article

VL - 8

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 218

ER -