ILAEDA: An Imitation Learning Based Approach for Automatic Exploratory Data Analysis

Times cited: 0
Authors
Manatkar, Abhijit [1]
Patel, Devarsh [2]
Patel, Hima [3]
Manwani, Naresh [1]
Affiliations
[1] Int Inst Informat Technol Hyderabad, Hyderabad, Telangana, India
[2] IISER Pune, Pune, Maharashtra, India
[3] IBM Res, Bangalore, Karnataka, India
Source
PROCEEDINGS OF 4TH INTERNATIONAL CONFERENCE ON AI-ML SYSTEMS 2024 | 2024
Keywords
Exploratory Data Analysis; Imitation Learning; Interestingness
DOI
10.1145/3703412.3703430
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automating end-to-end Exploratory Data Analysis (AutoEDA) is a challenging open problem, often tackled through Reinforcement Learning (RL) by learning to predict a sequence of analysis operations (FILTER, GROUP, etc.). Defining rewards for each operation is a challenging task, and existing methods rely on various interestingness measures to craft reward functions that capture the importance of each operation. In this work, we argue that not all of the essential features of what makes an operation important can be accurately captured mathematically using rewards. We propose an AutoEDA model trained through imitation learning from expert EDA sessions, bypassing the need for manually defined interestingness measures. Our method, based on generative adversarial imitation learning (GAIL), generalizes well across datasets, even with limited expert data. We also introduce a novel approach for generating synthetic EDA demonstrations for training. Our method outperforms the existing state-of-the-art end-to-end EDA approach on benchmarks by up to 3x, showing strong performance and generalization while naturally capturing diverse interestingness measures in generated EDA sessions.
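As a rough illustration of what "a sequence of analysis operations (FILTER, GROUP, etc.)" in the abstract can look like, the Python sketch below applies a two-step session to a toy table. It is not the ILAEDA implementation: the table, the column names, the tuple encoding of operations, and the helpers `apply_filter`, `apply_group`, and `run_session` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy table; the column names are illustrative only.
ROWS = [
    {"airline": "AA", "delay": 12, "origin": "JFK"},
    {"airline": "AA", "delay": 45, "origin": "LAX"},
    {"airline": "DL", "delay": 5, "origin": "JFK"},
    {"airline": "DL", "delay": 30, "origin": "JFK"},
]


def apply_filter(rows, column, predicate):
    """FILTER: keep the rows whose value in `column` satisfies `predicate`."""
    return [row for row in rows if predicate(row[column])]


def apply_group(rows, key, value):
    """GROUP: mean of `value` for each distinct entry of `key`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in buckets.items()}


def run_session(rows, operations):
    """Apply the operations in order and return the display after each step."""
    displays = []
    current = rows
    for op in operations:
        if op[0] == "FILTER":
            _, column, predicate = op
            current = apply_filter(current, column, predicate)
            displays.append(current)
        elif op[0] == "GROUP":
            _, key, value = op
            displays.append(apply_group(current, key, value))
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return displays


# An assumed two-step session: filter to one origin, then group by airline.
session = [
    ("FILTER", "origin", lambda v: v == "JFK"),
    ("GROUP", "airline", "delay"),
]
displays = run_session(ROWS, session)
```

In an AutoEDA setting, a learned policy would choose each operation and its parameters; here the session is fixed by hand purely to show the data flow from one display to the next.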
Pages: 11
Related papers
37 in total
[1]   A collaborative filtering approach for recommending OLAP sessions [J].
Aligon, Julien ;
Gallinucci, Enrico ;
Golfarelli, Matteo ;
Marcel, Patrick ;
Rizzi, Stefano .
DECISION SUPPORT SYSTEMS, 2015, 69 :20-30
[2]   Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning [J].
Bar El, Ori ;
Milo, Tova ;
Somech, Amit .
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, :1527-1537
[3]  
Bar El Ori, 2020, ATENA Basic Implementation
[4]  
Cao Yukun, 2022, IEEE ICDE 2022, P1720
[5]   Summarization - compressing data into an informative representation [J].
Chandola, Varun ;
Kumar, Vipin .
KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (03) :355-378
[6]  
Chanson Alexandre, 2022, EDBT
[7]  
Ding Yiming, 2019, NeurIPS, V32
[8]   YMALDB: exploring relational databases via result-driven recommendations [J].
Drosou, Marina ;
Pitoura, Evaggelia .
VLDB JOURNAL, 2013, 22 (06) :849-874
[9]   QueRIE: Collaborative Database Exploration [J].
Eirinaki, Magdalini ;
Abraham, Suju ;
Polyzotis, Neoklis ;
Shaikh, Naushin .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) :1778-1790
[10]  
Garg D, 2022, Arxiv, DOI arXiv:2106.12142