Outlier detection in high-dimensional regression model

Cited by: 16
Authors
Wang, Tao [1 ,2 ]
Li, Zhonghua [3 ,4 ]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION;
DOI
10.1080/03610926.2016.1140783
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
An outlier is an observation that differs markedly from the other observations in its dataset. High-dimensional regression datasets often contain a proportion of outliers, and identifying and removing them is important when fitting a model to the data. This paper proposes a novel outlier detection method for high-dimensional regression problems. The leave-one-out idea is used to construct an outlier detection measure based on distance correlation, from which an outlier detection procedure is derived. The proposed method has several advantages. First, the detection measure is simple to compute, and the detection procedure remains efficient for high-dimensional regression data. Second, it handles general regression settings and does not require a linear regression model to be specified. Finally, simulation studies show that the method performs well at detecting outliers in high-dimensional regression models and outperforms some competing methods.
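The leave-one-out idea in the abstract can be illustrated with a rough sketch. The abstract does not give the paper's exact measure, threshold, or bootstrap step, so everything below is an assumption: the score is simply the change in sample distance correlation between predictors and response when one observation is removed, and the function names `distance_correlation` and `loo_dcor_change` are hypothetical.

```python
import numpy as np


def _centered_dist(z):
    """Double-centered Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    sq = (z ** 2).sum(axis=1)
    # Squared pairwise distances via the Gram matrix; clip rounding noise.
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0, None)
    d = np.sqrt(d2)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p) and y (n or n x q)."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def loo_dcor_change(x, y):
    """Assumed score: dCor without observation i minus dCor on all data.

    A large positive value means that dropping observation i strengthens
    the dependence between x and y, which marks i as a candidate outlier.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    full = distance_correlation(x, y)
    keep = np.ones(n, dtype=bool)
    scores = np.empty(n)
    for i in range(n):
        keep[i] = False
        scores[i] = distance_correlation(x[keep], y[keep]) - full
        keep[i] = True
    return scores
```

Calling `loo_dcor_change(x, y)` returns one score per observation. Because distance correlation does not assume a linear model, the score is computed without fitting any regression. How the scores would be turned into a decision rule (for example, a bootstrap-calibrated cutoff) is not specified in the abstract and is left open here.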
Pages: 6947-6958
Number of pages: 12