On misbehaviour and fault tolerance in machine learning systems

Cited by: 14
Authors
Myllyaho, Lalli [1]
Raatikainen, Mikko [1]
Mannisto, Tomi [1]
Nurminen, Jukka K. [1]
Mikkonen, Tommi [1]
Affiliations
[1] Univ Helsinki, Helsinki, Finland
Keywords
Machine learning; Fault tolerance; Software architecture; Software engineering; Case study; FUTURE
DOI
10.1016/j.jss.2021.111096
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in adapted ML systems themselves. We studied software designs that aim at introducing fault tolerance in ML systems so that possible problems in ML components of the systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects. We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. Common patterns for incorporating ML components in design in a fault-tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and enforcing business rules on acceptable outputs. Multiple, specialised ML models are used to adapt to the variations and changes in the surrounding world, and simpler fall-over techniques like default outputs are put in place to have systems up and running in the face of problems. However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: the field still lacks established frameworks and practices beyond training to implement, operate, and maintain the software that utilises ML. ML software engineering needs further analysis and development on all fronts. © 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
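The guarding patterns named in the abstract (checking inputs, enforcing business rules on outputs, and falling back to a default output) can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `GuardedModel`, its fields, and the status strings are hypothetical, and the per-feature range check is a deliberately simple stand-in for real monitoring of the input distribution.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GuardedModel:
    """Hypothetical wrapper: input guard, output rule, and default output."""
    predict: Callable[[Sequence[float]], float]  # assumed ML component
    input_range: tuple[float, float]             # assumed valid range per feature
    output_range: tuple[float, float]            # assumed business rule on outputs
    default_output: float                        # fallback value

    def __call__(self, features: Sequence[float]) -> tuple[float, str]:
        lo, hi = self.input_range
        # Input guard: refuse inputs outside the range the model was built for.
        if not all(lo <= x <= hi for x in features):
            return self.default_output, "fallback:input_out_of_range"
        try:
            y = self.predict(features)
        except Exception:
            # A failing model should not take the whole system down.
            return self.default_output, "fallback:model_error"
        out_lo, out_hi = self.output_range
        # Output guard: enforce the business rule on acceptable outputs.
        if not (out_lo <= y <= out_hi):
            return self.default_output, "fallback:output_rejected"
        return y, "model"


# Usage: a toy "model" that sums its features.
guarded = GuardedModel(
    predict=lambda xs: sum(xs),
    input_range=(0.0, 1.0),
    output_range=(0.0, 1.5),
    default_output=0.0,
)
value, source = guarded([0.2, 0.3])
```

Returning the source of each answer alongside the value lets the surrounding system log how often the fallback path is taken, which is one way the monitoring mentioned in the abstract could be fed.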
Pages: 13