Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data

Cited by: 51
Authors
Climente-Gonzalez, Hector [1, 2, 3, 4]
Azencott, Chloe-Agathe [1, 2, 3]
Kaski, Samuel [5]
Yamada, Makoto [4, 6]
Affiliations
[1] PSL Res Univ, Inst Curie, F-75005 Paris, France
[2] INSERM, U900, F-75005 Paris, France
[3] PSL Res Univ, MINES ParisTech, CBIO Ctr Computat Biol, F-75006 Paris, France
[4] RIKEN AIP, Tokyo 1030027, Japan
[5] Aalto Univ, Dept Comp Sci, Espoo, Finland
[6] Kyoto Univ, Dept Intelligence Sci & Technol, Kyoto 6068501, Japan
Funding
Academy of Finland; European Union Horizon 2020
Keywords
FEATURE-SELECTION; MUTUAL INFORMATION; EXPRESSION; DEPENDENCE; GENE;
DOI
10.1093/bioinformatics/btz333
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including, among others, lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not have these drawbacks.
Results: We compare block HSIC Lasso to other state-of-the-art feature selection techniques on both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes previously linked to brain development and function are involved in the biological differences between the types of neurons.
Availability and implementation: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso).
Supplementary information: Supplementary data are available at Bioinformatics online.
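The abstract describes a non-linear feature selector built on HSIC (the Hilbert-Schmidt Independence Criterion). As a rough illustration of why a kernel dependence measure can pick up relationships that are not linear, the sketch below computes a biased empirical HSIC with Gaussian kernels in plain NumPy and uses it to rank features on synthetic data. This is not the pyHSICLasso API and not the block HSIC Lasso procedure itself: the function names (`gaussian_gram`, `hsic`, `standardize`), the kernel width and the toy data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(v, sigma=1.0):
    """Gram matrix of a Gaussian kernel for a 1-D sample vector v."""
    d = v[:, None] - v[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples of equal length."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = H @ gaussian_gram(x, sigma) @ H  # centered Gram matrix of x
    L = H @ gaussian_gram(y, sigma) @ H  # centered Gram matrix of y
    return np.trace(K @ L) / n ** 2


def standardize(v):
    """Scale a sample to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


# Toy data (assumed): the outcome depends on feature 2 only, and
# quadratically, so a linear correlation with that feature is weak.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 5))
y = X[:, 2] ** 2 + 0.1 * rng.standard_normal(n)

# Score every feature by its HSIC with the outcome and pick the largest.
scores = np.array(
    [hsic(standardize(X[:, j]), standardize(y)) for j in range(X.shape[1])]
)
best = int(np.argmax(scores))
print("HSIC scores:", np.round(scores, 4))
print("top-ranked feature index:", best)
```

Ranking by a single dependence score is only the simplest use of such a measure; according to the abstract, the published method additionally addresses parsimony, convexity and computational overhead, none of which this sketch attempts.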
Pages: I427-I435
Number of pages: 9