A label noise filtering method for regression based on adaptive threshold and noise score

Cited by: 9
Authors
Li, Chuang [1 ]
Mao, Zhizhong [1 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
Keywords
Noise filter; Real-valued label noise; Adaptive noise determination; Noise score; Ensemble filtering; Iterative filtering; CLASSIFICATION; PERFORMANCE; SELECTION; PREDICTION; RANKING; FUSION; TESTS; SET;
DOI
10.1016/j.eswa.2023.120422
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quality of training data plays a decisive role in the establishment of intelligent models. Since raw data obtained from the real world are usually entwined with noise due to a variety of causes, noise filtering has become an important aspect of machine learning techniques. In contrast with the extensive research conducted on noise elimination for classification purposes, papers addressing this problem for regression tasks are rather scarce. In this paper, we propose a novel noise filter to clean noisy instances with real-valued label noise. Aiming at the deficiency of the existing noise determination criterion, a new adaptive threshold-based method is first proposed. It allows a noisy instance to be adaptively defined according to the fitting difficulty levels of different datasets and areas with different densities. Embedded with this criterion, an effective noise filtering procedure is also designed. An ensemble filtering scheme and an iterative filtering process are combined to detect as many potential noisy samples as possible from the original training set. According to the acquired noise detection information, a noise score for evaluating the noise level is specifically developed. The potential noisy samples whose scores exceed a reasonable threshold are further filtered, which can compensate for the possible errors incurred during the previous procedure and contribute to more reliable filtering results. The validity of the proposed method is studied in exhaustive experiments. We discuss reasonable hyperparameters and compare the developed method with several state-of-the-art noise filters. The outcomes show that the prediction accuracy of the utilized regressor can greatly benefit from preprocessing the given raw dataset by using our method. Simultaneously, the method is able to acquire a good balance between the elimination of noisy samples and the retention of clean samples, and consistently achieves a better noise filtering performance.
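The abstract gives no formulas, so the following is only a rough sketch of the general idea it describes (ensemble filtering, iterative filtering, and a vote-based noise score), not the authors' method. Everything specific here is an assumption made for illustration: the function names `fit_linear`, `predict_linear`, and `noise_scores`, the use of bootstrapped least-squares regressors as the ensemble, the threshold `median + c * MAD` of the absolute residuals as a stand-in for the adaptive criterion, and the final score cutoff of 0.5. The density-dependent part of the paper's criterion is not modeled.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an appended intercept column.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def noise_scores(X, y, n_models=5, n_rounds=3, c=6.0, seed=0):
    """Fraction of (round, model) pairs that flagged each instance as noisy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = np.zeros(n)
    keep = np.ones(n, dtype=bool)  # instances currently treated as clean
    total = 0
    for _ in range(n_rounds):              # iterative filtering
        idx = np.flatnonzero(keep)
        for _ in range(n_models):          # ensemble of bootstrapped models
            sample = rng.choice(idx, size=len(idx), replace=True)
            coef = fit_linear(X[sample], y[sample])
            resid = np.abs(y - predict_linear(coef, X))
            # Assumed data-dependent threshold: it scales with how hard
            # the currently "clean" subset is to fit.
            med = np.median(resid[keep])
            mad = np.median(np.abs(resid[keep] - med))
            votes += resid > med + c * mad
            total += 1
        # Refit the next round without instances flagged by most models.
        keep = votes / total < 0.5
    return votes / total

# Synthetic usage example: a linear target with five corrupted labels.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=200)
corrupted = np.arange(5)
y[corrupted] += 10.0

scores = noise_scores(X, y)
flagged = scores > 0.5  # assumed score cutoff for the final filtering step
```

Scoring instances over all rounds, rather than deleting them at the first flag, loosely mirrors the abstract's point that the final score-based step can compensate for errors made earlier in the procedure: an instance flagged only while the models were still distorted by noisy labels ends up with a low score.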
Pages: 19
Related papers
50 in total
  • [1] A real-valued label noise cleaning method based on ensemble iterative filtering with noise score
    Li, Chuang
    Mao, Zhizhong
    Jia, Mingxing
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (09) : 4093 - 4118
  • [2] A Label Noise Filtering Method Based on Relative Outlier Factor
    Hou S.-Y.
    Jiang G.-X.
    Wang W.-J.
    Zidonghua Xuebao/Acta Automatica Sinica, 2024, 50 (01) : 154 - 168
  • [3] Label Noise Cleaning with an Adaptive Ensemble Method Based on Noise Detection Metric
    Feng, Wei
    Quan, Yinghui
    Dauphin, Gabriel
    SENSORS, 2020, 20 (23) : 1 - 16
  • [4] A robust adaptive linear regression method for severe noise
    Guo, Yaqing
    Wang, Wenjian
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (11) : 4613 - 4653
  • [5] A label noise filtering and label missing supplement framework based on game theory
    Liu, Yuwen
    Yao, Rongju
    Jia, Song
    Wang, Fan
    Wang, Ruili
    Ma, Rui
    Qi, Lianyong
    DIGITAL COMMUNICATIONS AND NETWORKS, 2023, 9 (04) : 887 - 895
  • [6] Enhanced Label Noise Filtering with Multiple Voting
    Guan, Donghai
    Hussain, Maqbool
    Yuan, Weiwei
    Khattak, Asad Masood
    Fahim, Muhammad
    Khan, Wajahat Ali
    APPLIED SCIENCES-BASEL, 2019, 9 (23):
  • [7] Cluster Validation Measures for Label Noise Filtering
    Boeva, Veselka
    Lundberg, Lars
    Angelova, Milena
    Kohstall, Jan
    2018 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS (IS), 2018, : 109 - 116
  • [8] Label noise filtering techniques to improve monotonic classification
    Cano, Jose-Ramon
    Luengo, Julian
    Garcia, Salvador
    NEUROCOMPUTING, 2019, 353 : 83 - 95
  • [9] Improving Label Noise Filtering by Exploiting Unlabeled Data
    Guan, Donghai
    Wei, Hongqiang
    Yuan, Weiwei
    Han, Guangjie
    Tian, Yuan
    Al-Dhelaan, Mohanmmed
    Al-Dhelaan, Abdullah
    IEEE ACCESS, 2018, 6 : 11154 - 11165
  • [10] KSIPF: an effective noise filtering oversampling method based on k-means and iterative-partitioning filter
    Sun, Pengfei
    Wang, Zhiping
    Jia, Liyan
    Wang, Xiaoxi
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (04)