Statistical mechanical analysis of sparse linear regression as a variable selection problem

Cited by: 7
Authors
Obuchi, Tomoyuki [1 ]
Nakanishi-Ohno, Yoshinori [2 ,3 ]
Okada, Masato [4 ]
Kabashima, Yoshiyuki [1 ]
Affiliations
[1] Tokyo Inst Technol, Dept Math & Comp Sci, Meguro Ku, Ookayama 2-12-1, Tokyo 1528550, Japan
[2] Univ Tokyo, Grad Sch Arts & Sci, Meguro Ku, Komaba 3-8-1, Tokyo 1538902, Japan
[3] Japan Sci & Technol Agcy, Precursory Res Embryon Sci & Technol, Honcho 4-1-8, Kawaguchi, Saitama 3320012, Japan
[4] Univ Tokyo, Grad Sch Frontier Sci, Kashiwanoha 5-1-5, Kashiwa, Chiba 2778561, Japan
Source
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT | 2018
Keywords
cavity and replica method; energy landscapes; statistical inference; uncertainty principles; recovery; CDMA; performance; relaxation; algorithm
DOI
10.1088/1742-5468/aae02c
CLC number
O3 [Mechanics]
Subject classification
08; 0801
Abstract
An algorithmic limit of compressed sensing and related variable-selection problems is analytically evaluated when the design matrix is an overcomplete random matrix. The replica method from statistical mechanics is employed to derive the result. The analysis proceeds by evaluating the entropy, the exponential rate of the number of combinations of variables that yield a specific value of fit error to given data, which are assumed to be generated by a linear process using the design matrix. This yields the typical achievable limit of the fit error when solving a representative ℓ0 problem, and reveals the presence of unfavourable phase transitions that prevent local search algorithms from reaching the minimum-error configuration. The associated phase diagrams are presented. A noteworthy outcome of the phase diagrams is that there exists a wide parameter region in which no phase transition occurs from high temperatures down to the lowest temperature at which the minimum-error configuration, or the ground state, is reached. This implies that certain local search algorithms can find the ground state with moderate computational cost in that region. Another noteworthy result is the presence of a random first-order transition in the strong-noise case. The theoretical evaluation of the entropy is confirmed by extensive numerical experiments using the exchange Monte Carlo and multi-histogram methods. A further numerical test based on the metaheuristic optimisation algorithm known as simulated annealing is conducted, and it supports the theoretical predictions about local search algorithms well. In the successful region with no phase transition, the computational cost for simulated annealing to reach the ground state is estimated to scale as a third-order polynomial in the model dimensionality.
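The simulated-annealing test described in the abstract can be illustrated with a minimal sketch (this is not the authors' implementation; the problem sizes, move set, and annealing schedule below are illustrative assumptions): anneal over supports of fixed size K for a sparse linear model y ≈ Ax, using pair-swap moves and the restricted least-squares residual as the energy.

```python
# Illustrative sketch: simulated annealing over the support set of a sparse
# linear regression. Energy of a support S is the least-squares fit error
# ||y - A_S x_S||^2 / (2M) minimised over the coefficients restricted to S.
import numpy as np

rng = np.random.default_rng(0)

def fit_error(A, y, support):
    """Minimum squared residual over coefficients on the given support."""
    M = len(y)
    if not support.any():
        return float(y @ y) / (2 * M)
    As = A[:, support]
    x, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ x
    return float(r @ r) / (2 * M)

def anneal(A, y, K, n_sweeps=100, beta0=0.1, beta1=20.0):
    """Anneal over size-K supports with pair-swap (exchange) moves."""
    M, N = A.shape
    support = np.zeros(N, dtype=bool)
    support[rng.choice(N, size=K, replace=False)] = True
    E = fit_error(A, y, support)
    for beta in np.geomspace(beta0, beta1, n_sweeps):
        for _ in range(N):
            i = rng.choice(np.flatnonzero(support))    # active index out
            j = rng.choice(np.flatnonzero(~support))   # inactive index in
            support[i], support[j] = False, True
            E_new = fit_error(A, y, support)
            if E_new <= E or rng.random() < np.exp(-beta * (E_new - E)):
                E = E_new                              # accept the swap
            else:
                support[i], support[j] = True, False   # reject: undo swap
    return support, E

# Small synthetic instance: M=40 observations, N=80 variables, K=5 active.
M, N, K = 40, 80, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[:K] = rng.standard_normal(K)
y = A @ x_true                                         # noiseless data
support, E = anneal(A, y, K)
print(sorted(np.flatnonzero(support)), round(E, 6))
```

Each sweep makes N swap proposals, and the inverse temperature beta is raised geometrically; in the noiseless, transition-free regime the abstract describes, such a schedule would be expected to reach a near-zero fit error.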
Pages: 41
Related articles
50 records in total
  • [1] Exhaustive Search for Sparse Variable Selection in Linear Regression
    Igarashi, Yasuhiko
    Takenaka, Hikaru
    Nakanishi-Ohno, Yoshinori
    Uemura, Makoto
    Ikeda, Shiro
    Okada, Masato
    JOURNAL OF THE PHYSICAL SOCIETY OF JAPAN, 2018, 87 (04)
  • [2] VARIABLE SELECTION IN SPARSE REGRESSION WITH QUADRATIC MEASUREMENTS
    Fan, Jun
    Kong, Lingchen
    Wang, Liqun
    Xiu, Naihua
    STATISTICA SINICA, 2018, 28 (03) : 1157 - 1178
  • [3] SPARSE AND ROBUST LINEAR REGRESSION: AN OPTIMIZATION ALGORITHM AND ITS STATISTICAL PROPERTIES
    Katayama, Shota
    Fujisawa, Hironori
    STATISTICA SINICA, 2017, 27 (03) : 1243 - 1264
  • [4] Variable selection for both outcomes and predictors: sparse multivariate principal covariates regression
    Park, Soogeun
    Ceulemans, Eva
    Van Deun, Katrijn
    MACHINE LEARNING, 2024, 113 (10) : 7319 - 7370
  • [5] Sparse PCA from Sparse Linear Regression
    Bresler, Guy
    Park, Sung Min
    Persu, Madalina
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [6] Scaled sparse linear regression
    Sun, Tingni
    Zhang, Cun-Hui
    BIOMETRIKA, 2012, 99 (04) : 879 - 898
  • [7] Sparse Mixed Linear Regression with Guarantees: Taming an Intractable Problem with Invex Relaxation
    Barik, Adarsh
    Honorio, Jean
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [8] An Improved Forward Regression Variable Selection Algorithm for High-Dimensional Linear Regression Models
    Xie, Yanxi
    Li, Yuewen
    Xia, Zhijie
    Yan, Ruixia
    IEEE ACCESS, 2020, 8 (08) : 129032 - 129042
  • [9] Linear Regression Problem Relaxations Solved by Nonconvex ADMM With Convergence Analysis
    Zhang, Hengmin
    Gao, Junbin
    Qian, Jianjun
    Yang, Jian
    Xu, Chunyan
    Zhang, Bob
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 828 - 838
  • [10] Variable selection for mode regression
    Chen, Yingzhen
    Ma, Xuejun
    Zhou, Jingke
    JOURNAL OF APPLIED STATISTICS, 2018, 45 (06) : 1077 - 1084