What's Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities

Cited by: 81
Authors
Chattopadhyay, Souti [1]
Prasad, Ishita [2]
Henley, Austin Z. [3]
Sarma, Anita [1]
Barik, Titus [2]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
[2] Microsoft, Redmond, WA USA
[3] Univ Tennessee, Knoxville, TN USA
Source
PROCEEDINGS OF THE 2020 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'20) | 2020
Funding
US National Science Foundation
Keywords
Computational notebooks; challenges; data science; interviews; pain points; survey
DOI
10.1145/3313831.3376729
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Computational notebooks, such as Azure, Databricks, and Jupyter, are a popular, interactive paradigm for data scientists to author code, analyze data, and interleave visualizations, all within a single document. Nevertheless, as data scientists incorporate more of their activities into notebooks, they encounter unexpected difficulties, or pain points, that impact their productivity and disrupt their workflow. Through a systematic, mixed-methods study using semi-structured interviews (n = 20) and a survey (n = 156) with data scientists, we catalog nine pain points when working with notebooks. Our findings suggest that data scientists face numerous pain points throughout the entire workflow, from setting up notebooks to deploying to production, across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.
Pages: 12