Flexible, non-parametric modeling using regularized neural networks

Cited by: 0
Authors
Oskar Allerbo
Rebecka Jörnsten
Affiliations
[1] Mathematical Sciences, University of Gothenburg and Chalmers University of Technology
Source
Computational Statistics | 2022, Vol. 37
Keywords
Additive models; Model selection; Non-parametric regression; Neural networks; Regularization; Adaptive lasso
DOI
Not available
Abstract
Non-parametric additive models can capture complex data dependencies in a flexible yet interpretable way. However, choosing the format of the additive components often requires non-trivial data exploration. Here, as an alternative, we propose PrAda-net, a one-hidden-layer neural network trained with proximal gradient descent and adaptive lasso. PrAda-net automatically adjusts the size and architecture of the neural network to reflect the complexity and structure of the data. The compact network obtained by PrAda-net can be translated into additive model components, making it suitable for non-parametric statistical modeling with automatic model selection. We demonstrate PrAda-net on simulated data, comparing its test error performance, variable importance, and variable subset identification properties to those of other lasso-based regularization approaches for neural networks. We also apply PrAda-net to the massive U.K. black smoke data set to demonstrate how it can be used to model complex, heterogeneous data with spatial and temporal components. In contrast to classical statistical non-parametric approaches, PrAda-net requires no preliminary modeling to select the functional forms of the additive components, yet still yields an interpretable model representation.
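The abstract compresses the method into one sentence: a one-hidden-layer network fit by proximal gradient descent under an adaptive lasso penalty, so that weights (and with them entire hidden units) are shrunk exactly to zero and the surviving units can be read off as additive components. Below is a minimal NumPy sketch of that two-stage idea, not the authors' implementation: the tanh activation, network width, step size, penalty level, and the plain-lasso pilot fit used to set the adaptive penalty weights are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(w, t):
    # Elementwise proximal operator of the weighted l1 penalty:
    # shrink each weight toward zero by its own threshold t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_one_layer_prox(X, y, k=20, lam=0.05, lr=1e-2, n_iter=3000,
                       pen_W1=None, pen_w2=None, seed=0):
    """Fit y ~ tanh(X W1 + b1) w2 + b2 by proximal gradient descent.

    pen_W1 / pen_w2 hold per-weight penalty factors: all ones gives a
    plain lasso, 1/|pilot estimate| gives an adaptive lasso. Biases are
    left unpenalized. All hyperparameters here are illustrative.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(p, k)); b1 = np.zeros(k)
    w2 = rng.normal(scale=1.0, size=k); b2 = 0.0
    if pen_W1 is None: pen_W1 = np.ones_like(W1)
    if pen_w2 is None: pen_w2 = np.ones_like(w2)
    for _ in range(n_iter):
        # Forward pass and gradient of the squared-error loss.
        h = np.tanh(X @ W1 + b1)                  # (n, k) hidden activations
        r = (h @ w2 + b2 - y) / n                 # scaled residuals
        g_w2 = h.T @ r
        g_b2 = r.sum()
        g_z = np.outer(r, w2) * (1.0 - h ** 2)    # backprop through tanh
        g_W1 = X.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient step on the smooth loss ...
        W1 -= lr * g_W1; b1 -= lr * g_b1
        w2 -= lr * g_w2; b2 -= lr * g_b2
        # ... then the proximal (soft-thresholding) step on the weights.
        W1 = soft_threshold(W1, lr * lam * pen_W1)
        w2 = soft_threshold(w2, lr * lam * pen_w2)
    return W1, b1, w2, b2

# Two-stage use on toy data: a plain-lasso pilot fit, then an
# adaptive-lasso refit whose per-weight penalties are inversely
# proportional to the pilot magnitudes.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

W1, b1, w2, b2 = fit_one_layer_prox(X, y)         # pilot (plain lasso)
eps = 1e-8
W1a, b1a, w2a, b2a = fit_one_layer_prox(
    X, y,
    pen_W1=1.0 / (np.abs(W1) + eps),
    pen_w2=1.0 / (np.abs(w2) + eps),
)
# Hidden units whose outgoing weight is exactly zero are pruned; the
# survivors, grouped by which inputs they still connect to, can be
# interpreted as additive model components.
print("active hidden units:", int(np.count_nonzero(w2a)))
```

The structural point is the split update: an ordinary gradient step on the smooth squared-error loss, followed by the closed-form soft-thresholding step, which is what drives weights exactly to zero rather than merely making them small.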
Pages: 2029-2047
Page count: 18