Integrative analysis and variable selection with multiple high-dimensional data sets

Times cited: 43
Authors
Ma, Shuangge [1]
Huang, Jian [2,3]
Song, Xiao [4]
Affiliations
[1] Yale Univ, Sch Publ Hlth, New Haven, CT 06520 USA
[2] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
[3] Univ Iowa, Dept Biostat, Iowa City, IA 52242 USA
[4] Univ Georgia, Coll Publ Hlth, Dept Epidemiol & Biostat, Paul Coverdell Ctr, Athens, GA 30602 USA
Funding
National Institutes of Health (NIH), USA
Keywords
High-dimensional data; Integrative analysis; 2-norm group bridge; PANCREATIC-CANCER; GENE-EXPRESSION; LOGISTIC-REGRESSION; MICROARRAY DATA; METAANALYSIS; LASSO
DOI
10.1093/biostatistics/kxr004
CLC number
Q [Biological sciences]
Discipline codes
07; 0710; 09
Abstract
In high-throughput -omics studies, markers identified from the analysis of single data sets often lack reproducibility because of limited sample sizes. A cost-effective remedy is to pool data from multiple comparable studies and conduct an integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of the data and the heterogeneity among studies. In this article, we propose a 2-norm group bridge penalization approach for marker selection in the integrative analysis of data from multiple heterogeneous studies. This approach can effectively identify markers with consistent effects across multiple studies while accommodating the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency of the approach. Simulations and applications to cancer profiling studies show that the proposed approach performs satisfactorily.
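The group structure behind a 2-norm group bridge penalty can be illustrated with a short sketch. It assumes the simplified form λ Σ_j ‖β_j‖₂^γ with 0 < γ < 1, where β_j collects marker j's coefficients across all M studies; any group weights, the study-specific loss functions, and the authors' computational algorithm are omitted. The function name `two_norm_group_bridge_penalty`, its arguments, and the default γ = 0.5 are hypothetical choices for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def two_norm_group_bridge_penalty(
    beta: Sequence[Sequence[float]],
    lam: float,
    gamma: float = 0.5,  # hypothetical default; any value in (0, 1) is bridge-type
) -> float:
    """Evaluate lam * sum_j ||beta_j||_2 ** gamma (simplified, unweighted form).

    beta[m][j] is the coefficient of marker j in study m (M studies, d markers).
    The M study-specific coefficients of marker j form one group, so the penalty
    favors setting a marker to zero in every study at once, while the nonzero
    coefficients of a selected marker are free to differ across studies.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1) for a bridge-type penalty")
    n_studies = len(beta)
    n_markers = len(beta[0])
    total = 0.0
    for j in range(n_markers):
        # 2-norm of marker j's coefficients, taken across all studies
        group_norm = math.sqrt(sum(beta[m][j] ** 2 for m in range(n_studies)))
        # raising the group norm to a power below 1 gives the bridge-type shrinkage
        total += group_norm ** gamma
    return lam * total
```

For example, with two studies and three markers, `beta = [[3.0, 0.0, 0.0], [4.0, 0.0, 1.0]]` and `lam = 1.0` give √5 + 1 ≈ 3.236: marker 1, which is zero in both studies, contributes nothing. Because the 2-norm acts within each group, the penalty selects markers rather than individual study-specific coefficients, and because t^γ has an infinite slope at t = 0 when γ < 1, small group norms are pushed toward exactly zero.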
Pages: 763-775
Number of pages: 13
References (18 in total)
[1] [Anonymous]. METAANALYSIS COMBINI.
[2] Choi, Hyungwon; Shen, Ronglai; Chinnaiyan, Arul M.; Ghosh, Debashis. A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments. BMC Bioinformatics, 2007, 8(1).
[3] Choi, JK; Choi, JY; Kim, DG; Choi, DW; Kim, BY; Lee, KH; Yeom, YI; Yoo, HS; Yoo, OJ; Kim, S. Integrative analysis of multiple gene expression profiles applied to liver cancer study. FEBS Letters, 2004, 565(1-3): 93-100.
[4] Crnogorac-Jurcevic, T; Missiaglia, E; Blaveri, E; Gangeswaran, R; Jones, M; Terris, B; Costello, F; Neoptolemos, JP; Lemoine, NR. Molecular alterations in pancreatic carcinoma: expression profiling shows that dysregulated expression of S100 genes is highly prevalent. Journal of Pathology, 2003, 201(1): 63-74.
[5] Friess, H; Ding, J; Kleeff, J; Fenkell, L; Rosinski, JA; Guweidhi, A; Reidhaar-Olson, JF; Korc, M; Hammer, J; Büchler, MW. Microarray-based identification of differentially expressed growth- and metastasis-associated genes in pancreatic cancer. Cellular and Molecular Life Sciences, 2003, 60(6): 1180-1199.
[6] Grützmann, R; Boriss, H; Ammerpohl, O; Lüttges, J; Kalthoff, H; Schackert, HK; Klöppel, G; Saeger, HD; Pilarsky, C. Meta-analysis of microarray data on pancreatic cancer defines a set of commonly dysregulated genes. Oncogene, 2005, 24(32): 5079-5088.
[7] Huang, Jian; Ma, Shuangge; Xie, Huiliang; Zhang, Cun-Hui. A group bridge approach for variable selection. Biometrika, 2009, 96(2): 339-355.
[8] Iacobuzio-Donahue, CA. Cancer Research, 2003, 63: 8614.
[9] Knudsen, S. CANC DIAGNOSTICS DNA, 2005.
[10] Logsdon, CD. Cancer Research, 2003, 63: 2649.