Sparse regression for extreme values

Cited by: 3
Authors
Chang, Andersen [1]
Wang, Minjie [1]
Allen, Genevera I. [1, 2, 3, 4, 5]
Affiliations
[1] Rice Univ, Dept Stat, Houston, TX 77251 USA
[2] Rice Univ, Dept Elect & Comp Engn, Houston, TX 77251 USA
[3] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
[4] Baylor Coll Med, Dept Pediat Neurol, Houston, TX 77030 USA
[5] Texas Childrens Hosp, Jan & Dan Duncan Neurol Res Inst, Houston, TX 77030 USA
Source
ELECTRONIC JOURNAL OF STATISTICS | 2021, Vol. 15, No. 2
Keywords
Linear regression; sparse modeling; extreme values; Subbotin distribution; generalized normal distribution; VARIABLE SELECTION; ROBUST REGRESSION; CONSISTENCY; INFERENCE; MODEL;
DOI
10.1214/21-EJS1937
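Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes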
020208 ; 070103 ; 0714 ;
Abstract
We study the problem of selecting features associated with extreme values in high dimensional linear regression. Normally, in linear modeling problems, the presence of abnormal extreme values or outliers is considered an anomaly which should either be removed from the data or remedied using robust regression methods. In many situations, however, the extreme values in regression modeling are not outliers but rather the signals of interest; consider traces from spiking neurons, volatility in finance, or extreme events in climate science, for example. In this paper, we propose a new method for sparse high-dimensional linear regression for extreme values which is motivated by the Subbotin, or generalized normal distribution, which we call the extreme value linear regression model. For our method, we utilize an $\ell_p$ norm loss where $p$ is an even integer greater than two; we demonstrate that this loss increases the weight on extreme values. We prove consistency and variable selection consistency for the extreme value linear regression with a Lasso penalty, which we term the Extreme Lasso, and we also analyze the theoretical impact of extreme value observations on the model parameter estimates using the concept of influence functions. Through simulation studies and a real-world data example, we show that the Extreme Lasso outperforms other methods currently used in the literature for selecting features of interest associated with extreme values in high-dimensional regression.
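As a rough illustration of the idea in the abstract, the sketch below minimizes an even-power residual loss plus a Lasso penalty using proximal gradient descent with backtracking. The abstract does not say which solver the authors use, so the algorithm, the 1/n scaling of the loss, and the names `soft_threshold`, `lp_loss`, and `extreme_lasso` are all assumptions made for this example. Because the gradient of the loss weights each observation by its residual raised to the power p - 1, observations with large residuals dominate the update, which is one way to see why the loss emphasizes extreme values.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lp_loss(X, y, beta, p):
    """Mean of residuals raised to an even power p (assumed 1/n scaling)."""
    r = y - X @ beta
    return np.mean(r ** p)  # p is even, so r ** p equals |r| ** p


def extreme_lasso(X, y, lam, p=4, n_iter=500, step=1.0, shrink=0.5, tol=1e-8):
    """Hypothetical solver: proximal gradient with backtracking line search.

    Minimizes (1/n) * sum_i (y_i - x_i^T beta)^p + lam * ||beta||_1.
    This is an illustrative sketch, not the authors' implementation.
    """
    if p % 2 != 0 or p <= 2:
        raise ValueError("p must be an even integer greater than two")
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        f = np.mean(r ** p)
        # Each residual enters the gradient with weight r_i^(p-1),
        # so large residuals contribute disproportionately.
        grad = -(p / n) * (X.T @ (r ** (p - 1)))
        t = step
        while True:
            cand = soft_threshold(beta - t * grad, t * lam)
            diff = cand - beta
            # Accept the step once the quadratic upper bound holds.
            bound = f + grad @ diff + (diff @ diff) / (2.0 * t)
            if lp_loss(X, y, cand, p) <= bound + 1e-12 or t < 1e-16:
                break
            t *= shrink
        beta = cand
        if np.max(np.abs(diff)) < tol:
            break
    return beta
```

Calling `extreme_lasso(X, y, lam=0.05, p=4)` on a design matrix `X` and response `y` returns a coefficient vector whose zero entries correspond to unselected features; larger `lam` values yield sparser solutions. Backtracking is used here because the gradient of an even-power loss with p > 2 is not globally Lipschitz, so a fixed step size is not safe in general.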
Pages: 5995-6035
Number of pages: 41