Structured sparsity regularization for analyzing high-dimensional omics data

Times cited: 37
Author
Vinga, Susana [1, 2]
Affiliations
[1] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, IDMEC, Lisbon, Portugal
Keywords
GENERALIZED LINEAR-MODELS; ADAPTIVE ELASTIC-NET; VARIABLE SELECTION; LOGISTIC-REGRESSION; GENE SELECTION; COX REGRESSION; SURVIVAL ANALYSIS; LASSO; PREDICTION; CANCER
DOI
10.1093/bib/bbaa122
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating a considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms due to the intrinsic non-identifiability of the inherent optimization problem. Regularized optimization has been rising as a promising and useful strategy to solve these ill-posed problems by imposing additional constraints in the solution parameter space. In particular, the field of statistical learning with sparsity has been significantly contributing to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods that combine lasso with ridge penalizations, we briefly overview recent literature on structured regularizers and penalty functions that have been applied in biomedical data to build parsimonious models in a variety of underlying contexts, from survival to generalized linear models. These methods include functions of ℓ_k-norms and network-based penalties that take into account the inherent relationships between the features. The successful application to omics data illustrates the potential of sparse structured regularization for identifying disease's molecular signatures and for creating high-performance clinical decision support systems towards more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
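The abstract describes the elastic net as a combination of lasso (ℓ_1) and ridge (ℓ_2) penalties and mentions network-based penalties that encode relationships between features. As an illustration only, and not the author's method, the following Python sketch fits one assumed objective that covers both cases: (1/(2n))||y - Xβ||² + λ1||β||₁ + (λ2/2) βᵀQβ. With Q equal to the identity this is the elastic net in a glmnet-style scaling; with Q a graph Laplacian over the features, the quadratic term becomes a network penalty that pulls the coefficients of linked features toward each other. The function names, the cyclic coordinate-descent solver, and the requirements that Q be symmetric positive semidefinite and that no column of X be all zeros are assumptions of this sketch. It uses NumPy.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def sparse_quadratic_cd(X, y, lam1, lam2, Q=None, n_iter=1000, tol=1e-8):
    """Hypothetical helper: cyclic coordinate descent for
        (1/(2n)) ||y - X b||^2 + lam1 ||b||_1 + (lam2/2) b^T Q b.
    Q = identity (default) gives the elastic net; Q = a graph Laplacian
    gives a network (Laplacian) penalty. Assumes Q is symmetric PSD and
    X has no all-zero columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q = np.eye(p) if Q is None else np.asarray(Q, dtype=float)
    beta = np.zeros(p)
    resid = y.copy()                      # full residual y - X beta
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) ||x_j||^2
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) x_j^T (y - sum_{k != j} x_k b_k), from the full residual
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # sum_{k != j} Q_jk b_k: coupling to neighbouring features
            coupling = Q[j] @ beta - Q[j, j] * old
            z = rho - lam2 * coupling
            # closed-form 1-D minimizer: soft-threshold, then rescale
            new = soft_threshold(z, lam1) / (col_sq[j] + lam2 * Q[j, j])
            if new != old:
                resid -= X[:, j] * (new - old)   # keep residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:                     # stop when updates stall
            break
    return beta
```

For example, `sparse_quadratic_cd(X, y, lam1=0.1, lam2=0.5)` gives an elastic-net fit, while passing `Q=L` for the Laplacian `L` of a feature (e.g. gene interaction) graph gives a network-penalized fit; in practice λ1 and λ2 would be chosen by cross-validation. Coordinate descent is used here because the ℓ_1 term is separable, so each one-dimensional update has a closed-form soft-thresholding solution.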
Pages: 77-87
Page count: 11