RLclean: An unsupervised integrated data cleaning framework based on deep reinforcement learning

Cited by: 1
Authors
Peng, Jinfeng [1 ]
Shen, Derong [1 ]
Nie, Tiezheng [1 ]
Kou, Yue [1 ]
Affiliations
[1] Sch Northeastern Univ, Coll Comp Sci & Engn, Shenyang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Error detection; Data repair; Deep reinforcement learning; ERROR-DETECTION; REPRESENTATION; ALGORITHM;
DOI
10.1016/j.ins.2024.121281
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Data cleaning, a prerequisite for subsequent data analysis, has long been a focus of data science research. Datasets with errors can severely degrade the quality of downstream analytical results. Unfortunately, despite the proliferation of data cleaning methods, cleaning remains time-consuming and frequently entails considerable labor costs. In practice, errors are often heterogeneous and require different solutions. As a result, stand-alone methods are often inadequate for dirty data with multiple types of errors, while studies have demonstrated that combining such methods always requires human intervention and that the results remain unsatisfactory. In this paper, we propose an unsupervised integrated data cleaning framework, namely RLclean. Based on deep reinforcement learning, RLclean takes advantage of multiple data cleaning techniques, enabling it to clean multiple types of errors effectively and achieve satisfactory results. Additionally, it eliminates the need for costly human involvement, because the cleaning strategy is learned in a data-driven manner, which further allows the framework to adapt itself to diverse domains. RLclean consists mainly of two parts: (i) an integrated error detection model that unites multiple techniques to detect different types of errors from multiple views; and (ii) an integrated data repair model that learns the optimal repair operations and repairs dirty data in an unsupervised manner. Extensive experiments on benchmark datasets demonstrate the superiority of RLclean over state-of-the-art methods.
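The idea in part (i) of uniting several detection techniques can be illustrated with a minimal Python sketch. This is not the RLclean model: the abstract does not specify which detectors are used or how their outputs are merged, so the three detectors below, the simple union rule, and every name (`missing_value_detector`, `numeric_outlier_detector`, `pattern_detector`, `detect_errors`) are assumptions made for illustration. The reinforcement-learning repair step of part (ii) is not sketched.

```python
import re
import statistics
from typing import Callable, Dict, List, Set, Tuple

Cell = Tuple[int, str]               # (row index, column name)
Table = List[Dict[str, str]]         # every cell is kept as a string
Detector = Callable[[Table], Set[Cell]]


def missing_value_detector(table: Table) -> Set[Cell]:
    """Flag cells that are empty or hold a common null placeholder."""
    placeholders = {"", "N/A", "null"}
    return {(i, col) for i, row in enumerate(table)
            for col, value in row.items() if value.strip() in placeholders}


def numeric_outlier_detector(column: str, threshold: float = 3.0) -> Detector:
    """Flag numeric cells far from the column median (in MAD units)."""
    def detect(table: Table) -> Set[Cell]:
        values = []
        for i, row in enumerate(table):
            try:
                values.append((i, float(row[column])))
            except ValueError:
                continue                      # non-numeric cells are skipped
        if len(values) < 3:
            return set()
        med = statistics.median(v for _, v in values)
        mad = statistics.median(abs(v - med) for _, v in values)
        if mad == 0:
            return set()
        return {(i, column) for i, v in values if abs(v - med) / mad > threshold}
    return detect


def pattern_detector(column: str, pattern: str) -> Detector:
    """Flag non-empty cells that do not fully match a regular expression."""
    regex = re.compile(pattern)
    def detect(table: Table) -> Set[Cell]:
        return {(i, column) for i, row in enumerate(table)
                if row[column] and not regex.fullmatch(row[column])}
    return detect


def detect_errors(table: Table, detectors: List[Detector]) -> Set[Cell]:
    """Assumed merge rule: a cell is dirty if any detector flags it."""
    flagged: Set[Cell] = set()
    for detector in detectors:
        flagged |= detector(table)
    return flagged


table = [
    {"zip": "11001", "age": "34"},
    {"zip": "11002", "age": "36"},
    {"zip": "1100X", "age": "35"},    # pattern violation
    {"zip": "11004", "age": ""},      # missing value
    {"zip": "11005", "age": "350"},   # numeric outlier
    {"zip": "11006", "age": "33"},
]
detectors = [
    missing_value_detector,
    numeric_outlier_detector("age"),
    pattern_detector("zip", r"\d{5}"),
]
print(sorted(detect_errors(table, detectors)))
```

Each detector here catches one error type that the others miss, which is the motivation the abstract gives for integration. A union favors recall; a voting or learned combination would be an alternative design.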
Pages: 15
Related papers
50 in total
  • [21] A Deep Learning Framework for Smart Street Cleaning
    Balchandani, Chandni
    Hatwar, Rakshith Koravadi
    Makkar, Parteek
    Shah, Yanki
    Yelure, Pooja
    Eirinaki, Magdalini
    2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017), 2017, : 112 - 117
  • [22] An Improved Reinforcement Learning Method Based on Unsupervised Learning
    Chang, Xin
    Li, Yanbin
    Zhang, Guanjie
    Liu, Donghui
    Fu, Changjun
    IEEE ACCESS, 2024, 12 : 12295 - 12307
  • [23] A coordinated scheduling optimization method for integrated energy systems with data centres based on deep reinforcement learning
    Sun, Yi
    Ding, Yiyuan
    Chen, Minghao
    Zhang, Xudong
    Tao, Peng
    Guo, Wei
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2024, 18 (19) : 3071 - 3084
  • [24] Deep Reinforcement Learning-Based Detection Framework for False Data Injection Attacks in Power Systems
    Prabhu, T. N.
    Ranjeethkumar, C.
    Mohankumar, B.
    Rajaram, A.
    INTERNATIONAL JOURNAL OF RENEWABLE ENERGY RESEARCH, 2024, 14 (02): : 311 - 323
  • [25] A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer
    Luo, Fuli
    Li, Peng
    Zhou, Jie
    Yang, Pengcheng
    Chang, Baobao
    Sun, Xu
    Sui, Zhifang
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5116 - 5122
  • [26] Two-Stage Unsupervised Hyperspectral Band Selection Based on Deep Reinforcement Learning
    Guo, Yi
    Wang, Qianqian
    Hu, Bingliang
    Qian, Xueming
    Ye, Haibo
    REMOTE SENSING, 2025, 17 (04)
  • [27] Combined data augmentation framework for generalizing deep reinforcement learning from pixels
    Xiong, Xi
    Shen, Chun
    Wu, Junhong
    Lu, Shuai
    Zhang, Xiaodan
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 264
  • [28] Deep Reinforcement Learning-Based Control Framework for Multilateral Telesurgery
    Bacha, Sarah Chams
    Bai, Weibang
    Wang, Ziwei
    Xiao, Bo
    Yeatman, Eric M.
    IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2022, 4 (02): : 352 - 355
  • [29] Optimal dispatch of integrated energy system based on deep reinforcement learning
    Zhou, Xiang
    Wang, Jiye
    Wang, Xinying
    Chen, Sheng
    ENERGY REPORTS, 2023, 9 : 373 - 378
  • [30] A Hierarchical SLAM Framework Based on Deep Reinforcement Learning for Active Exploration
    Xue, Yuntao
    Chen, Weisheng
    Zhang, Liangbin
    PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 957 - 966