What Did My AI Learn? How Data Scientists Make Sense of Model Behavior

Cited by: 21
Authors
Cabrera, Angel Alexander [1]
Ribeiro, Marco Tulio [2]
Lee, Bongshin [2]
Deline, Robert [2]
Perer, Adam [1]
Drucker, Steven M. [2]
Affiliations
[1] Carnegie Mellon Univ, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Microsoft Bldg 99, 14820 NE 36th St, Redmond, WA 98052 USA
Funding
National Science Foundation (USA)
Keywords
Machine learning; AI; machine behavior; machine learning testing; sensemaking; visualization
DOI
10.1145/3542921
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data scientists require rich mental models of how AI systems behave to effectively train, debug, and work with them. Despite the prevalence of AI analysis tools, there is no general theory describing how people make sense of what their models have learned. We frame this process as a form of sensemaking and derive a framework describing how data scientists develop mental models of AI behavior. To evaluate the framework, we show how existing AI analysis tools fit into this sensemaking process and use it to design AIFinnity, a system for analyzing image-and-text models. Lastly, we explored how data scientists use a tool developed with the framework through a think-aloud study with 10 data scientists tasked with using AIFinnity to pick an image captioning model. We found that AIFinnity's sensemaking workflow reflected participants' mental processes and enabled them to discover and validate diverse AI behaviors.
Pages: 27