IS AI GROUND TRUTH REALLY TRUE? THE DANGERS OF TRAINING AND EVALUATING AI TOOLS BASED ON EXPERTS' KNOW-WHAT

Cited: 99
Authors
Lebovitz, Sarah [1 ]
Levine, Natalia [2 ]
Lifshitz-Assaf, Hila [2 ]
Affiliations
[1] Univ Virginia, McIntire Sch Commerce, Charlottesville, VA 22904 USA
[2] NYU, Stern Sch Business, New York, NY 10003 USA
Funding
National Science Foundation (US);
Keywords
Artificial intelligence; evaluation; uncertainty; new technology; professionals; knowledge work; innovation; know-how; medical diagnosis; ground truth; INTEROBSERVER VARIABILITY; ARTIFICIAL-INTELLIGENCE; KNOWLEDGE; MAMMOGRAPHY; UNCERTAINTY; FUTURE; ERROR; JOBS
DOI
10.25300/MISQ/2021/16564
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine learning (ML)-based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts' know-what knowledge captured in the ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI's know-what and experts' know-how enabled managers to better understand the risks and benefits of each tool. This study shows the dangers of treating ground truth labels used in ML models as objective when the underlying knowledge is uncertain. We outline the implications of our study for developing, training, and evaluating AI for knowledge work.
Pages: 1501-1526
Page count: 26
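The abstract's point about accuracy measures can be sketched in a few lines (all labels below are hypothetical, not data from the study): when qualified experts disagree on the ground truth, the same model predictions earn different accuracy scores depending on whose labels are treated as true.

```python
# Minimal sketch (invented labels): standard accuracy depends on which
# expert's labels are taken as ground truth, so expert disagreement makes
# the reported score itself uncertain.

def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == r for p, r in zip(predictions, labels)) / len(labels)

# Ten hypothetical cases: 1 = abnormal, 0 = normal.
model_preds = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
expert_a    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
expert_b    = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

print(f"Accuracy vs. expert A: {accuracy(model_preds, expert_a):.1f}")  # 0.8
print(f"Accuracy vs. expert B: {accuracy(model_preds, expert_b):.1f}")  # 0.7
print(f"Expert A/B agreement:  {accuracy(expert_a, expert_b):.1f}")     # 0.7
```

Here the tool looks 80% or 70% "accurate" depending on the reference expert, while the experts themselves agree on only 70% of cases, so neither score measures performance against an objective truth.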