Discovering Repetitive Code Changes in Python ML Systems

Cited by: 18
Authors
Dilhara, Malinda [1]
Ketkar, Ameya [2, 3]
Sannidhi, Nikhith [1]
Dig, Danny [1]
Affiliations
[1] Univ Colorado, Boulder, CO 80309 USA
[2] Uber Technol Inc, San Francisco, CA USA
[3] Oregon State Univ, Corvallis, OR 97331 USA
Source
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022) | 2022
Keywords
Refactoring; Repetition; Code changes; Machine learning; Python
DOI
10.1145/3510003.3510225
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Over the years, researchers have capitalized on the repetitiveness of software changes to automate many software evolution tasks. Despite the extraordinary rise in popularity of Python-based ML systems, they do not benefit from these advances. Without knowing which repetitive changes ML developers make, researchers, tool builders, and library designers miss opportunities for automation, and ML developers fail to learn and use best coding practices. To fill the knowledge gap and advance the science and tooling in ML software evolution, we conducted the first and most fine-grained study on code change patterns in a diverse corpus of 1000 top-rated ML systems comprising 58 million SLOC. To conduct this study, we reuse, adapt, and improve upon the state-of-the-art repetitive change mining techniques. Our novel tool, R-CPATMINER, mines over 4M commits, constructs 350K fine-grained change graphs, and detects 28K change patterns. Using thematic analysis, we identified 22 pattern groups, and we reveal 4 major trends in how ML developers change their code. We surveyed 650 ML developers to further shed light on these patterns and their applications, and we received a 15% response rate. We present actionable, empirically justified implications for four audiences: (i) researchers, (ii) tool builders, (iii) ML library vendors, and (iv) developers and educators.
Pages: 736-748
Page count: 13