Demystifying Data Science Projects: A Look on the People and Process of Data Science Today

Cited by: 15
Authors
Aho, Timo [1]
Sievi-Korte, Outi [2]
Kilamo, Terhi [2]
Yaman, Sezin [3]
Mikkonen, Tommi [4]
Affiliations
[1] TietoEVRY, Tampere, Finland
[2] Tampere Univ, Tampere, Finland
[3] KPMG Finland, Helsinki, Finland
[4] Univ Helsinki, Helsinki, Finland
Source
PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT (PROFES 2020) | 2020 / Vol. 12562
Keywords
Data science; Data engineering; Software process; Prototyping; Case study;
DOI
10.1007/978-3-030-64148-1_10
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
The processes and practices used in data science projects have been changing, especially over the last decade, and they differ from their software engineering counterparts. However, data science relies on software to a large extent, and once put to use, the results of a data science project are often embedded in a software context. Hence, seeking synergy between software engineering and data science might open promising avenues. While there are various studies on data science workflows and data science project teams, there have been no attempts to combine these two closely interlinked aspects. Furthermore, existing studies usually focus on practices within one company. Our study fills these gaps with a multi-company case study that concentrates on both the roles found in data science project teams and the process. In this paper, we studied a number of practicing data scientists to understand a typical process flow for a data science project. In addition, we studied the roles involved and the teamwork that takes place in the data context. Our analysis revealed three main elements of data science projects: Experimentation, Development Approach, and Multi-disciplinary team(work). These key concepts are further broken down into 13 sub-themes in total. The themes pinpoint critical elements and challenges in data science projects, which are still often carried out in an ad hoc fashion. Finally, we compare the results with modern software development to analyse how well the two match.
Pages: 153-167
Page count: 15