Apparently low reproducibility of true differential expression discoveries in microarray studies

Cited by: 95
Authors
Zhang, Min [1]
Yao, Chen [2, 3]
Guo, Zheng [1, 2, 3]
Zou, Jinfeng [1]
Zhang, Lin [2, 3]
Xiao, Hui [1]
Wang, Dong [1]
Yang, Da [1]
Gong, Xue [1]
Zhu, Jing [2, 3]
Li, Yanhui [2, 3]
Li, Xia [1]
Affiliations
[1] Harbin Med Univ, Sch Bioinformat Sci & Technol, Harbin 150086, Peoples R China
[2] Univ Elect Sci & Technol China, Bioinformat Ctr, Chengdu 610054, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Life Sci, Chengdu 610054, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1093/bioinformatics/btn365
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Lists of differentially expressed genes (DEGs) detected in different microarray studies of the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection shows very low reproducibility. It is often believed that current small microarray studies largely introduce false discoveries.
Results: Based on a statistical model, we show that even in technical replicate tests using identical samples, the selected DEG lists are highly likely to be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection in current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that the heterogeneous biological variations present in real cancer data further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) of each DEG list tends to be low, suggesting that each separately determined list may consist mostly of true DEGs. Rather than simply counting the overlaps between discovery lists from different studies of a complex disease, novel metrics are needed to evaluate the reproducibility of discoveries characterized by correlated molecular changes.
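
The central claim of the abstract, that two DEG lists can overlap poorly while each still contains mostly true DEGs, can be illustrated with a toy simulation. The sketch below is not the statistical model used in the paper. It assumes a much simpler setup: unit-variance noise, one shared effect size for all true DEGs, and top-k selection by absolute statistic. All names and parameter values (`run`, `n_true`, `effect`, `k`, and so on) are hypothetical choices made for illustration.

```python
import random


def simulate_study(effects, n_per_group, rng):
    """Return one z-like statistic per gene for a two-group comparison.

    Assumption: unit-variance measurement noise, so the observed mean
    difference has standard error sqrt(2 / n_per_group).
    """
    se = (2.0 / n_per_group) ** 0.5
    return [rng.gauss(mu, se) / se for mu in effects]


def top_k(stats, k):
    """Indices of the k genes with the largest absolute statistic."""
    order = sorted(range(len(stats)), key=lambda i: abs(stats[i]), reverse=True)
    return set(order[:k])


def run(n_genes=3000, n_true=600, effect=1.25, n_per_group=5, k=100, seed=0):
    """Simulate two replicate studies; return (overlap, fdr_a, fdr_b)."""
    rng = random.Random(seed)
    # The first n_true genes are true DEGs with a modest shift; the rest are null.
    effects = [effect] * n_true + [0.0] * (n_genes - n_true)
    true_degs = set(range(n_true))

    list_a = top_k(simulate_study(effects, n_per_group, rng), k)
    list_b = top_k(simulate_study(effects, n_per_group, rng), k)

    overlap = len(list_a & list_b) / k   # fraction of genes shared by both lists
    fdr_a = len(list_a - true_degs) / k  # actual false discovery rate, list A
    fdr_b = len(list_b - true_degs) / k  # actual false discovery rate, list B
    return overlap, fdr_a, fdr_b


if __name__ == "__main__":
    overlap, fdr_a, fdr_b = run()
    print(f"overlap={overlap:.2f}  FDR(A)={fdr_a:.2f}  FDR(B)={fdr_b:.2f}")
```

Under these assumptions there are many more true DEGs than list slots, and each study's noise decides which of them cross the cutoff. The two lists therefore share only a small fraction of genes, even though few of the selected genes are null.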
Pages: 2057-2063
Number of pages: 7