Machine/Deep Learning for Software Engineering: A Systematic Literature Review

Cited by: 26
Authors
Wang, Simin [1]
Huang, Liguo [1]
Gao, Amiao [1]
Ge, Jidong [2]
Zhang, Tengfei [2]
Feng, Haitao [2]
Satyarth, Ishna [1]
Li, Ming [2]
Zhang, He [2]
Ng, Vincent [3]
Affiliations
[1] Southern Methodist Univ, Dept Comp Sci, Dallas, TX 75275 USA
[2] Nanjing Univ, Nanjing 210093, Jiangsu, Peoples R China
[3] Univ Texas Dallas, Human Language Technol Res Inst, Richardson, TX 75083 USA
Keywords
Task analysis; Software; Data models; Complexity theory; Codes; Predictive models; Analytical models; Software engineering; machine learning; deep learning; DEFECT PREDICTION; BUG LOCALIZATION; CLASSIFICATION; REPRESENTATION; PERFORMANCE; MODELS; REPRODUCIBILITY; COMPONENTS; AGREEMENT; TUTORIAL
DOI
10.1109/TSE.2022.3173346
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Since 2009, the deep learning revolution, which was triggered by the introduction of ImageNet, has stimulated the synergy between Software Engineering (SE) and Machine Learning (ML)/Deep Learning (DL). Meanwhile, critical reviews have emerged that suggest that ML/DL should be used cautiously. To improve the applicability and generalizability of ML/DL-related SE studies, we conducted a 12-year Systematic Literature Review (SLR) on 1,428 ML/DL-related SE papers published between 2009 and 2020. Our trend analysis demonstrated the impacts that ML/DL brought to SE. We examined the complexity of applying ML/DL solutions to SE problems and how such complexity led to issues concerning the reproducibility and replicability of ML/DL studies in SE. Specifically, we investigated how ML and DL differ in data preprocessing, model training, and evaluation when applied to SE tasks, and what details need to be provided to ensure that a study can be reproduced or replicated. By categorizing the rationales behind the selection of ML/DL techniques into five themes, we analyzed how model performance, robustness, interpretability, complexity, and data simplicity affected the choices of ML/DL models.
Pages: 1188-1231
Page count: 44
Related papers
397 in total
[21]   psc2code: Denoising Code Extraction from Programming Screencasts [J].
Bao, Lingfeng ;
Xing, Zhenchang ;
Xia, Xin ;
Lo, David ;
Wu, Minghui ;
Yang, Xiaohu .
ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2020, 29 (03)
[22]   VT-Revolution: Interactive Programming Video Tutorial Authoring and Watching System [J].
Bao, Lingfeng ;
Xing, Zhenchang ;
Xia, Xin ;
Lo, David .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2019, 45 (08) :823-838
[23]   Inference of development activities from interaction with uninstrumented applications [J].
Bao, Lingfeng ;
Xing, Zhenchang ;
Xia, Xin ;
Lo, David ;
Hassan, Ahmed E. .
EMPIRICAL SOFTWARE ENGINEERING, 2018, 23 (03) :1313-1351
[24]   Who Will Leave the Company? A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report [J].
Bao, Lingfeng ;
Xing, Zhenchang ;
Xia, Xin ;
Lo, David ;
Li, Shanping .
2017 IEEE/ACM 14TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2017), 2017, :170-181
[25]   Barriga A, 2020, 23RD ACM/IEEE INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS 2020, P24, DOI 10.1145/3365438.3410957
[26]   Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms [J].
Ben Abdessalem, Raja ;
Nejati, Shiva ;
Briand, Lionel C. ;
Stifter, Thomas .
PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, :1016-1026
[27]   Representation Learning: A Review and New Perspectives [J].
Bengio, Yoshua ;
Courville, Aaron ;
Vincent, Pascal .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08) :1798-1828
[28]   Bergstra J, 2012, J MACH LEARN RES, V13, P281
[29]   Translating Video Recordings of Mobile App Usages into Replayable Scenarios [J].
Bernal-Cardenas, Carlos ;
Cooper, Nathan ;
Moran, Kevin ;
Chaparro, Oscar ;
Marcus, Andrian ;
Poshyvanyk, Denys .
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, :309-321
[30]   Learning-to-Rank vs Ranking-to-Learn: Strategies for Regression Testing in Continuous Integration [J].
Bertolino, Antonia ;
Guerriero, Antonio ;
Miranda, Breno ;
Pietrantuono, Roberto ;
Russo, Stefano .
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, :1-12