Asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types

Cited by: 12
Authors
Dawany, Noor B. [1]
Tozeren, Aydin [1]
Affiliations
[1] Drexel Univ, Ctr Integrated Bioinformat, Philadelphia, PA 19104 USA
Keywords
EXPRESSION PROFILES; BREAST-CANCER; INTEGRATIVE ANALYSIS; SET ENRICHMENT; METAANALYSIS; PATHWAY; PROGRESSION; PROGNOSIS; IDENTIFICATION; VALIDATION;
DOI
10.1186/1471-2105-11-483
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Much of the publicly accessible cancer microarray data is asymmetric: it belongs to datasets that contain no samples from normal tissue. Asymmetric data cannot be used in standard meta-analysis approaches (such as the inverse-variance method) to obtain the large sample sizes needed for greater statistical power. Noting that many normal tissue microarray samples exist in studies not involving cancer, we investigated the viability and accuracy of an integrated microarray analysis approach based on significance analysis of microarrays (merged SAM), using data collected from separate diseased and normal samples.
Results: We focused on five solid cancer types (colon, kidney, liver, lung, and pancreas) for which the available microarray data allowed us to compare the meta-analysis and integrated approaches. The gene lists from merged SAM overlapped significantly with those from the validated inverse-variance method. Both approaches captured the cell-cycle aberrations that commonly occur across the different cancer types. However, the integrated SAM analysis replicated the known cancer literature (excluding microarray studies) with much greater accuracy than the meta-analysis.
Conclusion: The merged SAM test is a powerful, robust approach for combining data from similar platforms and for analyzing asymmetric datasets, including those with only normal or only cancer samples, which meta-analysis methods cannot use. The integrated SAM approach can also be used to compare global gene expression between subtypes of cancer arising from the same tissue.
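To illustrate why asymmetric datasets are excluded from the inverse-variance method mentioned in the abstract, the following is a minimal sketch of inverse-variance pooling for a single gene. It assumes the simple fixed-effect form; the abstract does not say which variant the authors used. The function name `inverse_variance_pool` and the numbers are hypothetical and are not taken from the paper. Each input is a within-study effect size (cancer versus normal), so a study with only cancer samples or only normal samples has no effect size to contribute.

```python
import math


def inverse_variance_pool(effects, variances):
    """Pool per-study effect sizes with fixed-effect inverse-variance weights.

    effects   -- one cancer-vs-normal effect size per study (hypothetical input)
    variances -- the sampling variance of each effect size
    Returns (pooled_effect, standard_error).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need at least one study and one variance per effect")
    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


# Illustrative values only: two studies with equal variance.
pooled, se = inverse_variance_pool([1.0, 2.0], [0.5, 0.5])
print(pooled, se)
```

With equal variances the pooled estimate reduces to the plain mean of the study effects. The merged approach described in the abstract sidesteps the per-study requirement by combining samples across studies before testing.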
Pages: 14