Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity

Cited by: 2
Authors
Xu, Shiyun [1 ]
Bu, Zhiqi [1 ]
Chaudhari, Pratik [2 ]
Barnett, Ian J. [3 ]
Affiliations
[1] Univ Penn, Dept Appl Math & Computat Sci, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
Source
Machine Learning and Knowledge Discovery in Databases: Research Track, ECML PKDD 2023, Part III | 2023, Vol. 14171
Funding
National Science Foundation (NSF)
Keywords
Interpretability; Additive Models; Group LASSO; Feature Selection; Variable Selection; LASSO; Regression; Shrinkage
DOI
10.1007/978-3-031-43418-1_21
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAMs) bring interpretability to black-box deep learning and achieve state-of-the-art accuracy within the large family of generalized additive models. To empower NAM with feature selection and improve its generalization, we propose the sparse neural additive model (SNAM), which employs group sparsity regularization (e.g., the group LASSO): each feature is learned by a sub-network whose trainable parameters are clustered as a group. We study the theoretical properties of SNAM with novel techniques that tackle the non-parametric truth, thus extending beyond classical sparse linear models such as the LASSO, which only work on the parametric truth. Specifically, we show that SNAM trained with subgradient or proximal gradient descent provably converges to zero training loss as t → ∞, and that the estimation error of SNAM vanishes asymptotically as n → ∞. We also prove that SNAM, like the LASSO, achieves exact support recovery, i.e. perfect feature selection, under appropriate regularization. Moreover, we show that SNAM generalizes well and preserves 'identifiability', recovering each feature's effect. We validate our theory via extensive experiments that further demonstrate the accuracy and efficiency of SNAM. (The appendix can be found at https://arxiv.org/abs/2202.12482.)
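The architecture and penalty described in the abstract admit a compact implementation. Below is a minimal sketch, assuming PyTorch; the names SNAM, FeatureNet, and group_lasso_penalty are illustrative and not taken from the authors' code. Each feature gets its own small sub-network, the prediction is the sum of the sub-network outputs plus a bias, and the group LASSO penalty applies an l2 norm to each sub-network's parameter group so that an entire feature's network can be driven to zero.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    # Sub-network f_j learning the effect of a single feature (hypothetical name).
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):  # x: (batch, 1)
        return self.net(x)

class SNAM(nn.Module):
    # Additive prediction: y_hat = bias + sum_j f_j(x_j).
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.subnets = nn.ModuleList(FeatureNet(hidden) for _ in range(n_features))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, n_features)
        outs = [f(x[:, j:j + 1]) for j, f in enumerate(self.subnets)]
        return torch.stack(outs, dim=0).sum(dim=0) + self.bias

def group_lasso_penalty(model):
    # One group per feature: the l2 norm over ALL parameters of f_j,
    # so the penalty can zero out an entire sub-network (feature selection).
    return sum(torch.sqrt(sum((p ** 2).sum() for p in f.parameters()))
               for f in model.subnets)

# One subgradient step on MSE + lambda * group-LASSO penalty (toy data).
model = SNAM(n_features=10)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y) + 1e-3 * group_lasso_penalty(model)
opt.zero_grad()
loss.backward()
opt.step()
```

With plain SGD this corresponds to the subgradient-descent variant analyzed in the paper; the proximal-gradient variant would instead take a gradient step on the MSE alone and then apply group soft-thresholding to each sub-network's parameter group.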
Pages: 343-359 (17 pages)