Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs

Cited by: 88
Authors
Hong S.R. [1]
Hullman J. [2]
Bertini E. [1]
Affiliations
[1] New York University, New York
[2] Northwestern University, Evanston
Keywords
data scientist; domain expert; empirical study; explainable AI; group work; machine learning; mental model; model interpretability; sense-making; subject matter expert
DOI
10.1145/3392878
Abstract
As the use of machine learning (ML) models in product development and data-driven decision-making has become pervasive in many domains, practitioners' focus has increasingly shifted from building well-performing models to understanding how their models work. While scholarly interest in model interpretability has grown rapidly in HCI, ML, and related research communities, little is known about how practitioners perceive and aim to provide interpretability in the context of their existing workflows. This lack of understanding of interpretability as practiced may prevent interpretability research from addressing important needs or lead to unrealistic solutions. To bridge this gap, we conducted 22 semi-structured interviews with industry practitioners to understand how they conceive of and design for interpretability while they plan, build, and use their models. Based on a qualitative analysis of our results, we differentiate interpretability roles, processes, goals, and strategies as they exist within organizations that make heavy use of ML models. The characterization of interpretability work that emerges from our analysis suggests that model interpretability frequently involves cooperation and mental model comparison between people in different roles, often aimed at building trust not only between people and models but also between people within the organization. We present implications for design that identify gaps between the interpretability challenges practitioners face and the approaches proposed in the literature, highlighting research directions that can better address real-world needs. © 2020 ACM.