Ensembled sparse-input hierarchical networks for high-dimensional datasets

Cited by: 4
Authors
Feng, Jean [1]
Simon, Noah [2]
Affiliations
[1] University of California San Francisco, Department of Epidemiology & Biostatistics, San Francisco, CA 94143, USA
[2] University of Washington, Department of Biostatistics, Seattle, WA 98195, USA
Keywords
Bayesian model averaging; deep learning; grouping effect; lasso; network pruning; neural networks; variable selection; regularization
DOI
10.1002/sam.11579
CLC Number
TP18 (Theory of Artificial Intelligence)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In high-dimensional datasets where the number of covariates far exceeds the number of observations, the most popular prediction methods make strong modeling assumptions. Unfortunately, these methods struggle to scale up in model complexity as the number of observations grows. To address this, we consider neural networks, because they span a wide range of model capacities, from sparse linear models to deep neural networks. Because neural networks are notoriously tedious to tune and train, our aim is to develop a convenient procedure that requires only a minimal number of hyperparameters. Our method, Ensemble by Averaging Sparse-Input hiERarchical networks (EASIER-net), employs only two L1-penalty parameters: one that controls input sparsity and another that controls the number of hidden layers and nodes. EASIER-net selects the true support with high probability when there is sufficient evidence; otherwise, it performs variable selection with uncertainty quantification, in which strongly correlated covariates are selected at similar rates. On a large collection of gene expression datasets, EASIER-net achieved higher classification accuracy and selected fewer genes than existing methods. We found that EASIER-net adapted its model complexity to the data: it fit deep networks when there was sufficient information to learn nonlinearities and interactions, and fit sparse logistic models for smaller datasets with less information.
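
As a rough illustration of the penalized objective described in the abstract, the sketch below fits a feed-forward network in which the input-layer weights receive one L1 penalty (driving covariate sparsity) and the remaining layers receive a second L1 penalty (shrinking the effective depth and width), then averages a small ensemble of fits. This is a minimal sketch assuming a PyTorch-style implementation; the class name SparseInputNet, the penalty weights lam_input and lam_hidden, and the toy data are illustrative placeholders rather than the authors' code, and plain gradient descent on an L1 penalty will not reproduce the exact zeros or model-selection behavior of the published fitting procedure.

import torch
import torch.nn as nn

class SparseInputNet(nn.Module):
    # Illustrative sketch (not the authors' released code): a feed-forward
    # network whose input layer is penalized separately from the rest.
    def __init__(self, n_inputs, hidden=32, n_layers=2):
        super().__init__()
        self.input_layer = nn.Linear(n_inputs, hidden)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(n_layers - 1)])
        self.output_layer = nn.Linear(hidden, 1)

    def forward(self, x):
        h = torch.relu(self.input_layer(x))
        for layer in self.hidden_layers:
            h = torch.relu(layer(h))
        return self.output_layer(h)

    def penalty(self, lam_input, lam_hidden):
        # lam_input drives input (covariate) sparsity; lam_hidden shrinks the
        # deeper layers, letting the fitted depth/width adapt to the data.
        pen = lam_input * self.input_layer.weight.abs().sum()
        for layer in self.hidden_layers:
            pen = pen + lam_hidden * layer.weight.abs().sum()
        return pen + lam_hidden * self.output_layer.weight.abs().sum()

def fit_one(x, y, lam_input=0.05, lam_hidden=0.01, epochs=200, seed=0):
    # Fit a single penalized network from one random initialization.
    torch.manual_seed(seed)
    net = SparseInputNet(x.shape[1])
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x).squeeze(1), y) + net.penalty(lam_input, lam_hidden)
        loss.backward()
        opt.step()
    return net

# Toy high-dimensional data: n = 50 observations, p = 200 covariates.
x = torch.randn(50, 200)
y = (x[:, 0] - x[:, 1] > 0).float()
ensemble = [fit_one(x, y, seed=s) for s in range(5)]
with torch.no_grad():
    prob = torch.stack([torch.sigmoid(m(x)) for m in ensemble]).mean(dim=0)

Averaging the sigmoid outputs across independently initialized fits is the "ensemble by averaging" step; tuning lam_input and lam_hidden (e.g., by cross-validation) would play the role of the two penalty parameters mentioned in the abstract.
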
Pages: 736-750 (15 pages)