RLclean: An unsupervised integrated data cleaning framework based on deep reinforcement learning

Cited by: 1
Authors
Peng, Jinfeng [1]
Shen, Derong [1]
Nie, Tiezheng [1]
Kou, Yue [1]
Affiliation
[1] Northeastern University, College of Computer Science and Engineering, Shenyang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Error detection; Data repair; Deep reinforcement learning; ERROR-DETECTION; REPRESENTATION; ALGORITHM
DOI
10.1016/j.ins.2024.121281
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Data cleaning, a prerequisite for subsequent data analysis, has long been a focus of data science research. Datasets with errors can severely degrade the quality of downstream analytical results. Unfortunately, despite the proliferation of data cleaning methods, cleaning remains time-consuming and frequently entails considerable labor costs. In practice, errors are often heterogeneous and require different solutions. As a result, stand-alone methods are often inadequate for dirty data containing multiple types of errors, while studies have shown that combining such methods always requires human intervention and the results remain unsatisfactory. In this paper, we propose RLclean, an unsupervised integrated data cleaning framework. Based on deep reinforcement learning, RLclean takes advantage of multiple data cleaning techniques, enabling it to effectively clean multiple types of errors and achieve satisfactory results. Additionally, it eliminates the need for costly human involvement, because the cleaning strategy is learned in a data-driven manner, which further allows the framework to adapt itself to diverse domains. RLclean consists of two main parts: (i) an integrated error detection model that unites multiple techniques to detect different types of errors from multiple views; and (ii) an integrated data repair model that learns the optimal repair operations and repairs dirty data in an unsupervised manner. Extensive experiments on benchmark datasets demonstrate the superiority of RLclean over state-of-the-art methods.
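The abstract describes RLclean's two parts only at a high level, so the following is a toy sketch of the general idea, not RLclean itself. Everything in it is an assumption made for illustration: the three-column table, the three hand-written detectors used as "views" (missing value, six-digit zip pattern, MAD-based outlier), the three repair operations (`mode`, `median`, `keep`), and the unsupervised reward "+1 if the repaired cell passes every detector". It also plainly swaps deep reinforcement learning for a tabular epsilon-greedy bandit whose action values are keyed by column.

```python
import random
import re
from collections import Counter
from statistics import median

# Hypothetical toy table (not data from the paper); None marks a missing value.
ROWS = [
    {"city": "Shenyang", "zip": "110000", "age": 34},
    {"city": "Shenyang", "zip": "110000", "age": 29},
    {"city": "Shenyang", "zip": "11OOOO", "age": 31},   # letter O typed instead of digit 0
    {"city": None, "zip": "110000", "age": 30},         # missing value
    {"city": "Shenyang", "zip": "110000", "age": 420},  # numeric outlier
]

# --- Integrated detection: each hypothetical detector is one "view" ---------
def missing_view(values):
    return {i for i, v in enumerate(values) if v is None}

def pattern_view(values, pattern=re.compile(r"\d{6}")):
    # Assumed format constraint: a valid zip code is exactly six digits.
    return {i for i, v in enumerate(values)
            if v is not None and not pattern.fullmatch(str(v))}

def outlier_view(values, k=5.0):
    # Robust test: distance from the median in units of the median absolute
    # deviation (MAD), floored at 1.0 so tight clusters are not over-flagged.
    nums = [(i, v) for i, v in enumerate(values) if isinstance(v, (int, float))]
    if len(nums) < 3:
        return set()
    med = median(v for _, v in nums)
    mad = max(median(abs(v - med) for _, v in nums), 1.0)
    return {i for i, v in nums if abs(v - med) > k * mad}

VIEWS = {"city": [missing_view],
         "zip": [missing_view, pattern_view],
         "age": [missing_view, outlier_view]}

def detect(rows, col):
    # A cell is flagged if any view flags it (union of views).
    values = [r[col] for r in rows]
    return set().union(*(view(values) for view in VIEWS[col]))

# --- Hypothetical repair operations (the action set) ------------------------
def op_mode(clean, original):
    return Counter(clean).most_common(1)[0][0] if clean else None

def op_median(clean, original):
    nums = [v for v in clean if isinstance(v, (int, float))]
    return median(nums) if nums and len(nums) == len(clean) else None

def op_keep(clean, original):
    return original

ACTIONS = {"mode": op_mode, "median": op_median, "keep": op_keep}

def reward(rows, i, col, candidate):
    # Unsupervised stand-in reward: +1 if the repaired cell passes every view.
    if candidate is None:
        return -1.0
    trial = [dict(r) for r in rows]
    trial[i][col] = candidate
    return 1.0 if i not in detect(trial, col) else -1.0

def split_clean(rows):
    flagged = {(i, c) for c in VIEWS for i in detect(rows, c)}
    clean = {c: [r[c] for i, r in enumerate(rows) if (i, c) not in flagged]
             for c in VIEWS}
    return flagged, clean

def greedy(q, col):
    return max(ACTIONS, key=lambda name: q.get((col, name), 0.0))

def learn_policy(rows, episodes=50, eps=0.2, alpha=0.5, seed=0):
    # Epsilon-greedy bandit over (column, action) pairs: a tabular stand-in
    # for deep RL, with one-step episodes and no next state.
    rng = random.Random(seed)
    flagged, clean = split_clean(rows)
    q = {}
    for _ in range(episodes):
        for i, col in sorted(flagged):
            a = rng.choice(list(ACTIONS)) if rng.random() < eps else greedy(q, col)
            r = reward(rows, i, col, ACTIONS[a](clean[col], rows[i][col]))
            old = q.get((col, a), 0.0)
            q[(col, a)] = old + alpha * (r - old)
    return q

def repair(rows, q):
    # Apply the learned greedy action to each flagged cell, keeping the
    # original value if the chosen repair would still be flagged.
    flagged, clean = split_clean(rows)
    out = [dict(r) for r in rows]
    for i, col in flagged:
        candidate = ACTIONS[greedy(q, col)](clean[col], rows[i][col])
        if reward(rows, i, col, candidate) > 0:
            out[i][col] = candidate
    return out

if __name__ == "__main__":
    for row in repair(ROWS, learn_policy(ROWS)):
        print(row)
```

Keying action values by column is the main simplification here; a deep RL variant would plausibly replace the `q` table with a network over cell and context features, and would need a richer reward than "passes the detectors", which can accept more than one plausible repair (for the `age` outlier, both the column mode and the median pass).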
Pages: 15