Outlier detection in high-dimensional regression model

Cited by: 16
Authors
Wang, Tao [1 ,2 ]
Li, Zhonghua [3 ,4 ]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION;
DOI
10.1080/03610926.2016.1140783
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate these outliers when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct a novel outlier detection measure based on distance correlation, and an outlier detection procedure is then proposed. The proposed method enjoys several advantages. First, the outlier detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, as it does not require specification of a linear regression model. Finally, simulation studies show that the proposed method performs well at detecting outliers in high-dimensional regression models and performs better than some competing methods.
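The leave-one-out idea described in the abstract can be sketched roughly as follows. The abstract does not state the exact form of the detection measure, so this sketch assumes one plausible choice: the change in the sample distance correlation between predictors and response when a single observation is removed. The function names `distance_correlation` and `loo_dcor_scores` are hypothetical, and the thresholding step (the keywords mention the bootstrap) is not covered here.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p or n) and y (n x q or n)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def loo_dcor_scores(x, y):
    """Assumed measure (not taken from the paper): for each observation i,
    the distance correlation without i minus the full-sample value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    full = distance_correlation(x, y)
    keep = np.ones(n, dtype=bool)
    scores = np.empty(n)
    for i in range(n):
        keep[i] = False
        scores[i] = distance_correlation(x[keep], y[keep]) - full
        keep[i] = True
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 40, 100                      # more predictors than observations
    x = rng.normal(size=(n, p))
    y = np.sin(x[:, 0]) + x[:, 1] ** 2 + 0.1 * rng.normal(size=n)
    y[5] += 15.0                        # contaminate one response
    scores = loo_dcor_scores(x, y)
    # Observations with the largest scores are candidate outliers.
    print(np.argsort(scores)[::-1][:5])
```

Because distance correlation is defined for vectors of arbitrary dimension and captures nonlinear dependence, a score of this kind needs no fitted linear model, which is in line with the advantages claimed in the abstract. The pairwise-difference array here uses O(n²p) memory, so a Gram-matrix formulation may be preferable for larger samples.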
Pages: 6947-6958
Number of pages: 12
Related papers
50 in total
  • [41] Support high-order tensor data description for outlier detection in high-dimensional big sensor data
    Deng, Xiaowu
    Jiang, Peng
    Peng, Xiaoning
    Mi, Chunqiao
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 81 : 177 - 187
  • [42] Robust Functional Regression for Outlier Detection
    Hullait, Harjit
    Leslie, David S.
    Pavlidis, Nicos G.
    King, Steve
    ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA, AALTD 2019, 2020, 11986 : 3 - 13
  • [43] Variational Inference in high-dimensional linear regression
    Mukherjee, Sumit
    Sen, Subhabrata
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [44] Subgroup analysis for high-dimensional functional regression
    Zhang, Xiaochen
    Zhang, Qingzhao
    Ma, Shuangge
    Fang, Kuangnan
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 192
  • [45] HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS
    Wang, Peiyao
    Li, Quefeng
    Shen, Dinggan
    Liu, Yufeng
    STATISTICA SINICA, 2023, 33 (01) : 27 - 53
  • [46] Vibration-Based Outlier Detection on High Dimensional Data
    Xia, Shuyin
    Wang, Guoyin
    Yu, Hong
    Liu, Qun
    Wang, Jin
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2016, 25 (03)
  • [47] An Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data
    Li, Zihao
    Zhang, Liumei
    ENTROPY, 2023, 25 (08)
  • [48] High-dimensional regression adjustments in randomized experiments
    Wager, Stefan
    Du, Wenfei
    Taylor, Jonathan
    Tibshirani, Robert J.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2016, 113 (45) : 12673 - 12678
  • [49] High-dimensional consistencies of KOO methods in multivariate regression model and discriminant analysis
    Fujikoshi, Yasunori
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 188
  • [50] Asymptotic properties on high-dimensional multivariate regression M-estimation
    Ding, Hao
    Qin, Shanshan
    Wu, Yuehua
    Wu, Yaohua
    JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 183