Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer

Byung Soo Kim, Inyoung Kim, Sunho Lee, Sangcheol Kim, SunYoung Rha, Hyuncheol Chung

Research output: Contribution to journal, Article

18 Citations (Scopus)

Abstract

Motivation: It is common practice in cancer microarray experiments to collect normal tissue from the same individual from whom the tumor tissue was taken. An indirect design, in which a common reference RNA is hybridized against both the normal and the tumor tissue, is usually adopted. However, the test material is often too small for the experimenter to extract enough RNA to conduct the microarray experiment. Hence, collecting n cases does not necessarily yield a matched pair sample of size n. Instead we usually have a matched pair sample of size n1, and two independent samples of sizes n2 and n3, respectively, for 'reference versus normal tissue only' and 'reference versus tumor tissue only' hybridizations (n = n1 + n2 + n3). Standard statistical methods need to be modified, and new statistical procedures developed, to analyze this mixed dataset.

Results: We propose a new test statistic, t3, as a means of combining all the information in the mixed dataset for detecting differentially expressed (DE) genes between normal and tumor tissues. We applied the extended receiver operating characteristic approach to the mixed dataset. We devised a measure of disagreement between an RT-PCR experiment and a microarray experiment. Hotelling's T2 statistic is employed to detect a set of DE genes, and its prediction rate is compared with that of a univariate procedure. We observe that Hotelling's T2 statistic detects DE genes more efficiently than a univariate procedure and that further research is warranted on the formal test procedure using Hotelling's T2 statistic.
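The mixed design described in the abstract (n1 matched pairs plus n2 normal-only and n3 tumor-only samples) can be illustrated with a small sketch. The abstract does not give the definition of t3, so the code below is not the authors' statistic: it is a generic inverse-variance combination of a paired estimate and an unpaired estimate of the tumor-minus-normal difference for a single gene. The function name, argument names, and the assumption that inputs are log-ratios against the common reference are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def combined_statistic(normal_paired, tumor_paired, normal_only, tumor_only):
    """Combine paired and unpaired evidence for one gene.

    Illustrative only: a generic inverse-variance weighting, NOT the t3
    statistic of the paper. Inputs are assumed to be log-ratios against
    the common reference. Each group needs at least two observations and
    a non-zero sample variance.
    """
    if len(normal_paired) != len(tumor_paired):
        raise ValueError("paired samples must have equal length")

    # Paired part (n1 individuals): within-individual differences.
    diffs = [t - n for n, t in zip(normal_paired, tumor_paired)]
    est_paired = mean(diffs)
    var_paired = variance(diffs) / len(diffs)

    # Unpaired part (n2 normal-only and n3 tumor-only individuals).
    est_unpaired = mean(tumor_only) - mean(normal_only)
    var_unpaired = (variance(tumor_only) / len(tumor_only)
                    + variance(normal_only) / len(normal_only))

    # The two estimates come from different individuals, so they are
    # treated as independent and weighted by their inverse variances.
    w_paired = 1.0 / var_paired
    w_unpaired = 1.0 / var_unpaired
    estimate = ((w_paired * est_paired + w_unpaired * est_unpaired)
                / (w_paired + w_unpaired))
    std_error = sqrt(1.0 / (w_paired + w_unpaired))
    return estimate / std_error
```

A call such as `combined_statistic([0.0, 0.2, -0.1, 0.1], [1.0, 1.4, 0.8, 1.2], [0.0, 0.2, 0.1], [1.0, 1.3, 1.1])` returns a positive value when tumor expression is higher than normal. Weighting by inverse variance lets the more precise of the two parts dominate, which is one simple way to use all n cases rather than only the n1 matched pairs.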

Original language: English
Pages (from-to): 517-528
Number of pages: 12
Journal: Bioinformatics
Volume: 21
Issue number: 4
DOI: 10.1093/bioinformatics/bti029
Publication status: Published - 2005 Feb 15

Fingerprint

Colorectal Cancer
Microarrays
Microarray Data
Statistical method
Colorectal Neoplasms
Statistical methods
Diagnostics
Hotelling's T2
Tissue
Tumor
Microarray
Tumors
Statistic
Matched pairs
Statistics
Sample Size
Experiment
Gene
Genes
Univariate

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

@article{7c1508e1c1a64c58a43e9531b0ef5479,
title = "Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer",
abstract = "Motivation: It is a common practice in cancer microarray experiments that a normal tissue is collected from the same individual from whom the tumor tissue was taken. The indirect design is usually adopted for the experiment that uses a common reference RNA hybridized both to normal and tumor tissues. However, it is often the case that the test material is not large enough for the experimenter to extract enough RNA to conduct the microarray experiment. Hence, collecting n cases does not necessarily end up with a matched pair sample of size n. Instead we usually have a matched pair sample of size n1, and two independent samples of sizes n2 and n3, respectively, for 'reference versus normal tissue only' and 'reference versus tumor tissue only' hybridizations (n = n1 + n2 + n3). Standard statistical methods need to be modified and new statistical procedures are developed for analyzing this mixed dataset. Results: We propose a new test statistic, t3, as a means of combining all the information in the mixed dataset for detecting differentially expressed (DE) genes between normal and tumor tissues. We employed the extended receiver operating characteristic approach to the mixed dataset. We devised a measure of disagreement between a RT-PCR experiment and a microarray experiment. Hotelling's T2 statistic is employed to detect a set of DE genes and its prediction rate is compared with the prediction rate of a univariate procedure. We observe that Hotelling's T2 statistic detects DE genes more efficiently than a univariate procedure and that further research is warranted on the formal test procedure using Hotelling's T2 statistic.",
author = "Kim, {Byung Soo} and Inyoung Kim and Sunho Lee and Sangcheol Kim and SunYoung Rha and Hyuncheol Chung",
year = "2005",
month = "2",
day = "15",
doi = "10.1093/bioinformatics/bti029",
language = "English",
volume = "21",
pages = "517--528",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "4",
}

Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer. / Kim, Byung Soo; Kim, Inyoung; Lee, Sunho; Kim, Sangcheol; Rha, SunYoung; Chung, Hyuncheol.

In: Bioinformatics, Vol. 21, No. 4, 15.02.2005, p. 517-528.

TY - JOUR

T1 - Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer

AU - Kim, Byung Soo

AU - Kim, Inyoung

AU - Lee, Sunho

AU - Kim, Sangcheol

AU - Rha, SunYoung

AU - Chung, Hyuncheol

PY - 2005/2/15

Y1 - 2005/2/15

AB - Motivation: It is a common practice in cancer microarray experiments that a normal tissue is collected from the same individual from whom the tumor tissue was taken. The indirect design is usually adopted for the experiment that uses a common reference RNA hybridized both to normal and tumor tissues. However, it is often the case that the test material is not large enough for the experimenter to extract enough RNA to conduct the microarray experiment. Hence, collecting n cases does not necessarily end up with a matched pair sample of size n. Instead we usually have a matched pair sample of size n1, and two independent samples of sizes n2 and n3, respectively, for 'reference versus normal tissue only' and 'reference versus tumor tissue only' hybridizations (n = n1 + n2 + n3). Standard statistical methods need to be modified and new statistical procedures are developed for analyzing this mixed dataset. Results: We propose a new test statistic, t3, as a means of combining all the information in the mixed dataset for detecting differentially expressed (DE) genes between normal and tumor tissues. We employed the extended receiver operating characteristic approach to the mixed dataset. We devised a measure of disagreement between a RT-PCR experiment and a microarray experiment. Hotelling's T2 statistic is employed to detect a set of DE genes and its prediction rate is compared with the prediction rate of a univariate procedure. We observe that Hotelling's T2 statistic detects DE genes more efficiently than a univariate procedure and that further research is warranted on the formal test procedure using Hotelling's T2 statistic.

UR - http://www.scopus.com/inward/record.url?scp=14644429668&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14644429668&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bti029

DO - 10.1093/bioinformatics/bti029

M3 - Article

VL - 21

SP - 517

EP - 528

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 4

ER -