RLclean: An unsupervised integrated data cleaning framework based on deep reinforcement learning

Times cited: 1

Authors:

Peng, Jinfeng [1]

Shen, Derong [1]

Nie, Tiezheng [1]

Kou, Yue [1]

Affiliations:

[1] Sch Northeastern Univ, Coll Comp Sci & Engn, Shenyang, Peoples R China

Source:

INFORMATION SCIENCES | 2024 / Vol. 682

Funding:

National Natural Science Foundation of China;

Keywords:

Error detection; Data repair; Deep reinforcement learning; ERROR-DETECTION; REPRESENTATION; ALGORITHM;

DOI:

10.1016/j.ins.2024.121281

CLC number (Chinese Library Classification):

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

Data cleaning, a prerequisite to subsequent data analysis, has long been a focus of data science research. Errors in a dataset can severely degrade the quality of downstream analytical results. Unfortunately, despite the proliferation of data cleaning methods, cleaning remains time-consuming and frequently entails considerable labor costs. In practice, errors are often heterogeneous and require different solutions. As a result, stand-alone methods are often inadequate for dirty data containing multiple types of errors, while studies have shown that combining such methods invariably requires human intervention and still yields unsatisfactory results. In this paper, we propose RLclean, an unsupervised integrated data cleaning framework. Based on deep reinforcement learning, RLclean takes advantage of multiple data cleaning techniques, enabling it to clean multiple types of errors effectively and achieve satisfactory results. Additionally, it eliminates the need for costly human involvement, because the cleaning strategy is learned in a data-driven manner, which also allows the framework to adapt itself to diverse domains. RLclean consists of two main parts: (i) an integrated error detection model that unites multiple techniques to detect different types of errors from multiple views; and (ii) an integrated data repair model that learns the optimal repair operations and repairs dirty data in an unsupervised manner. Extensive experiments on benchmark datasets demonstrate the superiority of RLclean over state-of-the-art methods.
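The abstract describes RLclean only at the level of its two components. As a rough illustration of the general idea, and explicitly not the authors' implementation, the Python sketch below cleans a single text column. Two hypothetical detectors (missing values and rare values) look at each cell from different views and their flags are combined; then a tabular epsilon-greedy contextual bandit, a one-step stand-in for the paper's deep reinforcement learning agent, learns which of two hypothetical repair operations to apply for each error type. The reward is an assumed label-free signal that mixes a value's frequency in the column with its string similarity to the original cell. All names (`build_vocab`, `op_nearest`, `learn_policy`, etc.), the `min_count=2` threshold, and the reward weights are assumptions made for illustration.

```python
import random
from collections import Counter
from difflib import SequenceMatcher

# Assumed placeholders that denote a missing value in the column.
MISSING = {None, "", "NULL"}

def as_text(value):
    """Map missing placeholders to the empty string; keep everything else as text."""
    return "" if value in MISSING else str(value)

def build_vocab(column, min_count=2):
    """Values seen at least `min_count` times are treated as trusted (assumption)."""
    counts = Counter(v for v in column if v not in MISSING)
    return {v: c for v, c in counts.items() if c >= min_count}

# Hypothetical detectors, each looking at a cell from a different "view".
def detect_missing(value, vocab):
    return value in MISSING

def detect_rare(value, vocab):
    return value not in MISSING and value not in vocab

DETECTORS = {"missing": detect_missing, "rare": detect_rare}

def detect(column, vocab):
    """Combine detectors: a cell is dirty if any detector fires; its state is the first that does."""
    states = []
    for v in column:
        fired = [name for name, fn in DETECTORS.items() if fn(v, vocab)]
        states.append(fired[0] if fired else None)
    return states

# Hypothetical repair operations; both return a value from the trusted vocabulary.
def op_mode(value, vocab):
    return max(vocab, key=vocab.get)

def op_nearest(value, vocab):
    text = as_text(value)
    # Most similar trusted value; ties are broken by frequency.
    return max(vocab, key=lambda w: (SequenceMatcher(None, text, w).ratio(), vocab[w]))

OPS = {"mode": op_mode, "nearest": op_nearest}

def reward(value, repaired, vocab):
    """Assumed label-free reward: half relative frequency, half similarity to the original."""
    freq = vocab.get(repaired, 0) / sum(vocab.values())
    sim = SequenceMatcher(None, as_text(value), repaired).ratio()
    return 0.5 * freq + 0.5 * sim

def learn_policy(column, episodes=200, epsilon=0.2, seed=0):
    """Epsilon-greedy contextual bandit (one-step stand-in for a deep RL agent).

    State = error type reported by the detectors, action = repair operation.
    """
    rng = random.Random(seed)
    vocab = build_vocab(column)
    states = detect(column, vocab)
    groups = {s: [v for v, st in zip(column, states) if st == s] for s in DETECTORS}
    active = [s for s, cells in groups.items() if cells]
    q = {(s, a): 0.0 for s in active for a in OPS}
    n = {key: 0 for key in q}

    def greedy(s):
        # Untried actions score +inf, so every operation is tried at least once.
        return max(OPS, key=lambda a: q[(s, a)] if n[(s, a)] else float("inf"))

    for _ in range(episodes):
        for s in active:
            a = rng.choice(list(OPS)) if rng.random() < epsilon else greedy(s)
            cells = groups[s]
            r = sum(reward(v, OPS[a](v, vocab), vocab) for v in cells) / len(cells)
            n[(s, a)] += 1
            q[(s, a)] += (r - q[(s, a)]) / n[(s, a)]  # running mean of observed rewards
    return {s: greedy(s) for s in active}, vocab, states

def clean(column, **kwargs):
    """Detect dirty cells, learn a repair policy, then apply it to the dirty cells only."""
    policy, vocab, states = learn_policy(column, **kwargs)
    repaired = [OPS[policy[s]](v, vocab) if s else v for v, s in zip(column, states)]
    return repaired, policy

if __name__ == "__main__":
    column = ["Shenyang", "Shenyang", "Shenyang", "Beijing", "Beijing",
              "Shenyng", "", "Bejing", "NULL"]
    repaired, policy = clean(column)
    print(policy["rare"])  # nearest
    print(repaired)
```

On this toy column the learned policy maps the "rare" state to the `nearest` operation, so "Bejing" is repaired to "Beijing" instead of the column mode "Shenyang", while both missing cells are filled with "Shenyang". The reward is averaged over all cells of a state in each episode so that the estimate for each (state, action) pair is deterministic; a real system would use far richer states, detectors, and repair candidates, and a learned function approximator instead of a lookup table.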


Pages: 15

Total: 50 records

[21] A Deep Learning Framework for Smart Street Cleaning
Balchandani, Chandni
Hatwar, Rakshith Koravadi
Makkar, Parteek
Shah, Yanki
Yelure, Pooja
Eirinaki, Magdalini
2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017), 2017: 112-117
[22] An Improved Reinforcement Learning Method Based on Unsupervised Learning
Chang, Xin
Li, Yanbin
Zhang, Guanjie
Liu, Donghui
Fu, Changjun
IEEE ACCESS, 2024, 12: 12295-12307
[23] A coordinated scheduling optimization method for integrated energy systems with data centres based on deep reinforcement learning
Sun, Yi
Ding, Yiyuan
Chen, Minghao
Zhang, Xudong
Tao, Peng
Guo, Wei
IET GENERATION TRANSMISSION & DISTRIBUTION, 2024, 18 (19): 3071-3084
[24] Deep Reinforcement Learning-Based Detection Framework for False Data Injection Attacks in Power Systems
Prabhu, T. N.
Ranjeethkumar, C.
Mohankumar, B.
Rajaram, A.
INTERNATIONAL JOURNAL OF RENEWABLE ENERGY RESEARCH, 2024, 14 (02): 311-323
[25] A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer
Luo, Fuli
Li, Peng
Zhou, Jie
Yang, Pengcheng
Chang, Baobao
Sun, Xu
Sui, Zhifang
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 5116-5122
[26] Two-Stage Unsupervised Hyperspectral Band Selection Based on Deep Reinforcement Learning
Guo, Yi
Wang, Qianqian
Hu, Bingliang
Qian, Xueming
Ye, Haibo
REMOTE SENSING, 2025, 17 (04)
[27] Combined data augmentation framework for generalizing deep reinforcement learning from pixels
Xiong, Xi
Shen, Chun
Wu, Junhong
Lu, Shuai
Zhang, Xiaodan
EXPERT SYSTEMS WITH APPLICATIONS, 2025, 264
[28] Deep Reinforcement Learning-Based Control Framework for Multilateral Telesurgery
Bacha, Sarah Chams
Bai, Weibang
Wang, Ziwei
Xiao, Bo
Yeatman, Eric M.
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2022, 4 (02): 352-355
[29] Optimal dispatch of integrated energy system based on deep reinforcement learning
Zhou, Xiang
Wang, Jiye
Wang, Xinying
Chen, Sheng
ENERGY REPORTS, 2023, 9: 373-378
[30] A Hierarchical SLAM Framework Based on Deep Reinforcement Learning for Active Exploration
Xue, Yuntao
Chen, Weisheng
Zhang, Liangbin
PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010: 957-966
