DEEP NEURAL NETWORKS FOR NONPARAMETRIC INTERACTION MODELS WITH DIVERGING DIMENSION

Times Cited: 1
Authors
Bhattacharya, Sohom [1]
Fan, Jianqing [2]
Mukherjee, Debarghya [3]
Affiliations
[1] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[3] Boston Univ, Dept Math & Stat, Boston, MA USA
Keywords
Deep neural networks; high-dimensional statistics; nonparametric interaction model; minimax rate; sparse nonparametric components; MINIMAX-OPTIMAL RATES; OPTIMAL APPROXIMATION; ADDITIVE REGRESSION; LEAST-SQUARES; CONVERGENCE; BOUNDS; SELECTION; SMOOTH; LASSO
DOI
10.1214/24-AOS2442
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Deep neural networks have achieved tremendous success due to their representation power and adaptation to low-dimensional structures. Their potential for estimating structured regression functions has recently been established in the literature. However, most of these studies require the input dimension to be fixed; consequently, they ignore the effect of the dimension on the rate of convergence, which hampers their application to modern big data with high dimensionality. In this paper, we bridge this gap by analyzing a k-way nonparametric interaction model both in the growing-dimension scenario (d grows with n but at a slower rate) and in the high-dimensional regime (d ≳ n). In the latter case, sparsity assumptions and the associated regularization are required to obtain optimal convergence rates. A new challenge in the diverging-dimension setting arises in the calculation of the mean-square error: the covariance terms among the estimated additive components are an order of magnitude larger than the variance terms and can deteriorate the statistical properties without proper care. We introduce a critical debiasing technique to remedy this problem. We show that, under certain standard assumptions, debiased deep neural networks achieve the minimax optimal rate jointly in (n, d). Our proof techniques rely crucially on this novel debiasing technique, which makes the covariances of the additive components negligible in the mean-square error calculation. In addition, we establish matching lower bounds.
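For orientation, a minimal sketch of the kind of k-way interaction model and mean-square error decomposition described in the abstract is given below; the notation (f_S, x_S, the domain [0,1]^d, and the L2 norm) is illustrative and is not taken verbatim from the paper.

% Illustrative sketch (assumed notation, not the paper's exact formulation):
% the regression function decomposes into nonparametric components, each
% depending on at most k of the d input coordinates.
\[
  f(x) \;=\; \sum_{\substack{S \subseteq \{1,\dots,d\} \\ |S| \le k}} f_S(x_S),
  \qquad x = (x_1,\dots,x_d) \in [0,1]^d .
\]
% For an additive estimator \hat f = \sum_S \hat f_S, expanding the squared norm
% gives the heuristic mean-square error decomposition
\[
  \mathbb{E}\,\bigl\|\hat f - f\bigr\|_2^2
  \;=\; \sum_{S} \mathbb{E}\,\bigl\|\hat f_S - f_S\bigr\|_2^2
  \;+\; \sum_{S \neq S'} \mathbb{E}\,\bigl\langle \hat f_S - f_S,\; \hat f_{S'} - f_{S'} \bigr\rangle ,
\]
% where the second (covariance) sum contains many more terms than the first when
% d diverges; the debiasing technique discussed in the abstract is what renders
% this covariance sum negligible.

The decomposition is the standard expansion of the squared norm of a sum of errors; it is included only to make concrete why the covariance terms, rather than the variances, become the bottleneck when the dimension diverges.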
Pages: 2738-2766
Number of pages: 29