A novel data repairing approach based on constraints and ensemble learning

Cited by: 4
Authors
Ataeyan, Mahdieh [1]
Daneshpour, Negin [1]
Affiliation
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Data repairing; Noise detection; Functional dependency; Ensemble learning
DOI
10.1016/j.eswa.2020.113511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data repairing is an important task in data mining. This paper proposes a novel data repairing approach based on a combination of constraints and ensemble learning. First, functional dependencies (FDs) are used as constraints to identify inconsistent records. For each FD, all repeated values in the correct records are discovered. Next, noisy attributes in erroneous records are detected using the correct records and the repeated values. To correct the detected noise, a supervised ensemble learning model is constructed for each attribute. The ensemble model consists of a Bayes classifier, a decision tree, and a MultiLayer Perceptron (MLP), and majority voting is used as the combination strategy. The proposed approach repairs data automatically, without any user interaction. Moreover, it can detect more than one noisy attribute in a record. Experimental results show that our approach outperforms similar repairing algorithms (HoloClean and KATARA) in terms of both precision and recall. © 2020 Elsevier Ltd. All rights reserved.
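Two of the building blocks named in the abstract, detecting records that violate an FD and combining base-classifier outputs by majority voting, can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes records are dictionaries keyed by attribute name and that an FD is given as a tuple of left-hand-side (LHS) attributes plus one right-hand-side (RHS) attribute; the names `fd_violations` and `majority_vote` and the toy table are hypothetical. The repeated-value analysis, the noisy-attribute detection, and the Bayes, decision-tree, and MLP models themselves are not reproduced; their would-be predictions are passed in as plain labels, and the tie-breaking rule is an assumption the abstract does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Assumption: a record is a dict mapping attribute names to values.
Record = Dict[str, Hashable]


def fd_violations(records: Sequence[Record], lhs: Tuple[str, ...], rhs: str) -> Set[int]:
    """Return the indices of records that violate the FD lhs -> rhs.

    Records that agree on every LHS attribute must also agree on the RHS
    attribute. Every record in a group whose RHS values disagree is reported
    as inconsistent, because the FD alone cannot tell which one is wrong.
    """
    groups: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        # Group record indices by their combination of LHS values.
        groups[tuple(rec[a] for a in lhs)].append(i)

    inconsistent: Set[int] = set()
    for idxs in groups.values():
        # More than one distinct RHS value in a group means the FD is violated.
        if len({records[i][rhs] for i in idxs}) > 1:
            inconsistent.update(idxs)
    return inconsistent


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Combine one predicted label per base classifier by majority voting.

    Ties (for example, three classifiers predicting three different labels)
    are broken in favour of the earliest entry in `predictions` that has the
    top vote count; this tie-breaking rule is an assumption.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so ties resolve to the earliest top-voted label.
    return next(label for label in predictions if counts[label] == best)


if __name__ == "__main__":
    # Hypothetical toy table checked against the FD zip -> city.
    table = [
        {"zip": "10001", "city": "New York"},
        {"zip": "10001", "city": "Nwe York"},
        {"zip": "60601", "city": "Chicago"},
        {"zip": "60601", "city": "Chicago"},
    ]
    print(sorted(fd_violations(table, ("zip",), "city")))  # [0, 1]

    # Stand-in outputs of the Bayes, decision-tree and MLP models for one cell.
    print(majority_vote(["New York", "Nwe York", "New York"]))  # New York
```

Flagging every record in a conflicting group, rather than guessing which one is wrong, keeps detection separate from repair; deciding which attribute value is actually noisy, and what to replace it with, is left to the later steps the abstract describes.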
Pages: 18
Related papers (50 in total)
  • [31] A Novel Ensemble Machine Learning Approach for Bioarchaeological Sex Prediction
    Muzzall, Evan
    TECHNOLOGIES, 2021, 9 (02)
  • [32] A Novel Ensemble Learning Approach for Intelligent Logistics Demand Management
    Li, Boyang
    Yang, Yuhang
    Zhao, Ziyu
    Ni, Xin
    Zhang, Diyang
    JOURNAL OF INTERNET TECHNOLOGY, 2024, 25 (04) : 507 - 515
  • [33] A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data
    Liu, Na
    Li, Xiaomei
    Qi, Ershi
    Xu, Man
    Li, Ling
    Gao, Bo
    IEEE ACCESS, 2020, 8 : 171263 - 171280
  • [34] Self-repairing infrared electronic nose based on ensemble learning and PCA fault diagnosis
    Wang, Jinlei
    Lei, Bingjie
    Yang, Zaiyun
    Lei, Shaochong
    INFRARED PHYSICS & TECHNOLOGY, 2022, 127
  • [35] A Recursive Ensemble Learning Approach With Noisy Labels or Unlabeled Data
    Wang, Yuchen
    Yang, Yang
    Liu, Yun-Xia
    Bharath, Anil Anthony
    IEEE ACCESS, 2019, 7 : 36459 - 36470
  • [36] A novel ensemble machine learning for robust microarray data classification
    Peng, Yonghong
    COMPUTERS IN BIOLOGY AND MEDICINE, 2006, 36 (06) : 553 - 573
  • [37] A novel intelligence approach based active and ensemble learning for agricultural soil organic carbon prediction using multispectral and SAR data fusion
    Thu Thuy Nguyen
    Tien Dat Pham
    Chi Trung Nguyen
    Delfos, Jacob
    Archibald, Robert
    Kinh Bac Dang
    Ngoc Bich Hoang
    Guo, Wenshan
    Huu Hao Ngo
    SCIENCE OF THE TOTAL ENVIRONMENT, 2022, 804
  • [38] A novel hybrid feature selection and ensemble-based machine learning approach for botnet detection
    Hossain, Md. Alamgir
    Islam, Md. Saiful
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [40] A Weighted Ensemble Learning Algorithm Based on Diversity Using a Novel Particle Swarm Optimization Approach
    You, Gui-Rong
    Shiue, Yeou-Ren
    Yeh, Wei-Chang
    Chen, Xi-Li
    Chen, Chih-Ming
    ALGORITHMS, 2020, 13 (10)