Outlier detection in high-dimensional regression model

Cited by: 16
Authors
Wang, Tao [1, 2]
Li, Zhonghua [3, 4]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION;
DOI
10.1080/03610926.2016.1140783
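Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract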
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. Identifying and removing these outliers is important when fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct a novel outlier detection measure based on distance correlation, and an outlier detection procedure is then proposed. The proposed method has several advantages. First, the outlier detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can handle a general regression setting and does not require a linear regression model to be specified. Finally, simulation studies show that the proposed method performs well at detecting outliers in high-dimensional regression models and outperforms some competing methods.
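The abstract does not give the exact form of the detection measure, so the following is only a minimal sketch of the general idea, under stated assumptions: it computes the sample distance correlation between the predictors and the response, then scores each observation by how much that value changes when the observation is left out. The function names (`centered_distances`, `distance_correlation`, `loo_dcor_scores`) and the absolute-difference score are illustrative choices, not the authors' definitions, and no threshold (bootstrap or otherwise) is implemented.

```python
import numpy as np


def centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    # Pairwise squared distances via the Gram matrix (cheap when p is large).
    sq = (a ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (a @ a.T)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(np.clip(d2, 0.0, None))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p or n) and y (n x q or n)."""
    A = centered_distances(x)
    B = centered_distances(y)
    dcov2 = (A * B).mean()       # squared sample distance covariance
    dvar_x = (A * A).mean()      # squared sample distance variance of x
    dvar_y = (B * B).mean()      # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def loo_dcor_scores(X, y):
    """Assumed score: |dcor without observation i - dcor on the full sample|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    full = distance_correlation(X, y)
    keep = np.ones(n, dtype=bool)
    scores = np.empty(n)
    for i in range(n):
        keep[i] = False          # drop observation i
        scores[i] = abs(distance_correlation(X[keep], y[keep]) - full)
        keep[i] = True           # restore it for the next iteration
    return scores
```

Observations with unusually large scores would be candidate outliers under this assumed measure. Each leave-one-out evaluation recomputes an n x n distance matrix, so the loop costs roughly O(n^3 + n^2 p) overall; this is a readability-first sketch rather than an efficient implementation.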
Pages: 6947-6958
Number of pages: 12