Key Insights from a Feature Discovery User Study

Cited by: 1
Authors
Ionescu, Andra [1]
Mouw, Zeger [1]
Aivaloglou, Efthimia [1]
Katsifodimos, Asterios [1]
Affiliations
[1] Delft Univ Technol, Delft, Netherlands
Source
WORKSHOP ON HUMAN-IN-THE-LOOP DATA ANALYTICS, HILDA 2024 | 2024
Keywords
WORK;
DOI
10.1145/3665939.3665961
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multiple works in data management research focus on automating the processes of data augmentation and feature discovery to save users from having to perform these tasks manually. Yet, this automation often leads to a disconnect with the users, as it fails to consider the specific needs and preferences of the actual end-users of data management systems for machine learning. To explore this issue further, we conducted 19 semi-structured, think-aloud use-case studies based on a scenario in which data specialists were tasked with augmenting a base table with additional features to train a machine learning model. In this paper, we share key insights into the practices of feature discovery on tabular data performed by real-world data specialists derived from our user study. Our research uncovered differences between the user assumptions reported in the literature and the actual practices, as well as some areas where literature and real-world practices align.
Pages: 5