Automating Exploratory Data Analysis via Machine Learning: An Overview

Cited by: 41
Authors
Milo, Tova [1]
Somech, Amit [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
Source
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2020
Keywords
DATABASES;
DOI
10.1145/3318464.3383126
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Exploratory Data Analysis (EDA) is an important initial step for any knowledge discovery process, in which data scientists interactively explore unfamiliar datasets by issuing a sequence of analysis operations (e.g., filter, aggregation, and visualization). Since EDA has long been known as a difficult task, requiring profound analytical skills, experience, and domain knowledge, a plethora of systems have been devised over the last decade to facilitate it. In particular, advances in machine learning research have created exciting opportunities, not only to better facilitate EDA, but also to fully automate the process. In this tutorial, we review recent lines of work for automating EDA: starting from recommender systems that suggest a single exploratory action, going through kNN-based classifiers and active-learning methods for predicting users' interestingness preferences, and ending with fully automated EDA using state-of-the-art methods such as deep reinforcement learning and sequence-to-sequence models. We conclude the tutorial with a discussion of the main challenges and open questions to be addressed in order to ultimately reduce the manual effort required for EDA.
Pages: 2617-2622
Number of pages: 6