Penalized partial least squares for pleiotropy

Cited by: 3
Authors
Broc, Camilo [1,2]
Truong, Therese [3,4]
Liquet, Benoit [2,5]
Affiliations
[1] CEA, LIST, Lab Data Sci & Decis Digiteo, Gif Sur Yvette, France
[2] CNRS, Lab Math & Leurs Applicat PAU E2S UPPA, Pau, France
[3] Univ Paris Saclay, INSERM, UVSQ, CESP, F-94807 Villejuif, France
[4] Inst Gustave Roussy, F-94805 Villejuif, France
[5] Macquarie Univ, Dept Math & Stat, Sydney, NSW, Australia
Keywords
Genetic epidemiology; High dimensional data; Lasso Penalization; Meta-analysis; Oncology; Partial Least Square; Pathway analysis; Pleiotropy; Sparse methods; Variable selection; COMPLEX TRAITS; ASSOCIATION; GENE; METAANALYSIS; CANCER; CHALLENGES; REGRESSION; SELECTION; VARIANTS; PLS;
DOI
10.1186/s12859-021-03968-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, together with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at both the variable level and the group level.
Results: Our method has the advantage of providing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method with that of other benchmark methods on simulated data and gave an example application on real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.
Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The chosen Lasso penalization copes with the architecture of groups of variables and of observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is suited to any data with a large number of variables and a known a priori architecture in other application fields.
Pages: 31