A sparse approach for high-dimensional data with heavy-tailed noise

Times Cited: 3
Authors
Ye, Yafen [1]
Shao, Yuanhai [2]
Li, Chunna [2]
Affiliations
[1] Zhejiang Univ Technol, Sch Econ, Hangzhou, Peoples R China
[2] Hainan Univ, Management Sch, Haikou, Hainan, Peoples R China
Source
ECONOMIC RESEARCH-EKONOMSKA ISTRAZIVANJA | 2022, Vol. 35, No. 1
Keywords
High-dimensional data; heavy-tailed noise; Lp-norm support vector quantile regression; variable selection
KeyWords Plus
SUPPORT VECTOR REGRESSION; ORDINARY LEAST-SQUARES; VARIABLE SELECTION; QUANTILE REGRESSION; SHRINKAGE; ALGORITHM; LASSO
DOI
10.1080/1331677X.2021.1978306
Chinese Library Classification
F [Economics]
Subject Classification Code
02
Abstract
High-dimensional data arise in diverse fields such as economics, finance, genetics, medicine, and machine learning. In this paper, we consider the sparse quantile regression problem for high-dimensional data with heavy-tailed noise, especially when the number of regressors greatly exceeds the sample size. We bring the spirit of Lp-norm support vector regression into quantile regression and propose a robust Lp-norm support vector quantile regression for high-dimensional data with heavy-tailed noise. The proposed method is robust against heavy-tailed noise because it uses the pinball loss function, and it selects the most representative variables automatically through a sparsity parameter. We test the variable selection performance of Lp-norm support vector quantile regression in a simulation study in which the number of explanatory variables greatly exceeds the sample size. The simulation confirms that Lp-norm support vector quantile regression is not only robust against heavy-tailed noise but also selects representative variables. We further apply the proposed method to the variable selection problem of index construction, which again confirms the robustness and sparseness of Lp-norm support vector quantile regression.
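The abstract combines two mechanisms: the pinball (check) loss, whose linear tails keep the quantile fit robust to heavy-tailed noise, and an Lp-norm penalty on the coefficients, which drives most of them to exactly zero and thereby performs variable selection. As a minimal sketch of that combination, the Python example below fits the convex p = 1 member of the family, where pinball loss plus an L1 penalty reduces exactly to a linear program. This is an illustration under assumed names and toy data (tau, C, the LP variable layout), not the authors' method, which targets general Lp-norms and is nonconvex for p < 1.

import numpy as np
from scipy.optimize import linprog

# Toy data (an assumption): d >> n, three active variables, heavy-tailed t(2) noise.
rng = np.random.default_rng(0)
n, d = 40, 80
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]                  # only three variables matter
y = X @ w_true + rng.standard_t(df=2, size=n)  # heavy-tailed Student-t noise

tau = 0.5   # quantile level (median)
C = 8.0     # L1 penalty weight, hand-tuned for this toy problem (an assumption)

# LP variables z = [w (d, free), b (free), u (n), v (n), a (d)] with u, v, a >= 0:
# u - v splits the residual, and a_j >= |w_j| linearizes the L1 penalty.
c = np.concatenate([np.zeros(d + 1), tau * np.ones(n),
                    (1 - tau) * np.ones(n), C * np.ones(d)])

# Equality: X w + b + u - v = y, so tau*u + (1-tau)*v equals the pinball loss.
A_eq = np.hstack([X, np.ones((n, 1)), np.eye(n), -np.eye(n), np.zeros((n, d))])

# Inequalities: w_j - a_j <= 0 and -w_j - a_j <= 0, i.e. a_j >= |w_j|.
zeros_dn = np.zeros((d, 2 * n))
A_ub = np.vstack([
    np.hstack([ np.eye(d), np.zeros((d, 1)), zeros_dn, -np.eye(d)]),
    np.hstack([-np.eye(d), np.zeros((d, 1)), zeros_dn, -np.eye(d)]),
])

bounds = [(None, None)] * (d + 1) + [(0, None)] * (2 * n + d)
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * d), A_eq=A_eq, b_eq=y,
              bounds=bounds, method="highs")
w_hat = res.x[:d]
print("selected variables:", np.flatnonzero(np.abs(w_hat) > 1e-6))

Because the LP solver returns a vertex solution, inactive coefficients come out exactly zero, which is the sparsity the abstract attributes to the penalty; for p < 1 the penalty is nonconvex and this LP reformulation no longer applies.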
Pages: 2764-2780
Page count: 17