Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making

Cited by: 17
Authors
Inkpen, Kori [1]
Chappidi, Shreya [2]
Mallari, Keri [3]
Nushi, Besmira [1]
Ramesh, Divya [4]
Michelucci, Pietro [5]
Mandava, Vani [1]
Veprek, Libuse Hannah [6]
Quinn, Gabrielle [7]
Affiliations
[1] Microsoft Res, 1 Microsoft Way, Redmond, WA 98052 USA
[2] Univ Virginia, 235 McCormick Rd, Charlottesville, VA 22904 USA
[3] Univ Washington, 1400 NE Campus Pkwy, Seattle, WA 98195 USA
[4] Univ Michigan, 440 Church St, Ann Arbor, MI 48109 USA
[5] Human Computat Inst, 21 Lone Oak Rd, Ithaca, NY 14850 USA
[6] Ludwig Maximilians Univ Munchen, Geschwister Scholl Pl 1, D-80539 Munich, Germany
[7] Western Washington Univ, 516 High St, Bellingham, WA 98225 USA
Keywords
Human-AI collaboration; Human-AI performance; human-centered AI; citizen science; TRUST; AUTOMATION; AVERSION; PERFORMANCE
DOI
10.1145/3534561
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Human-AI collaboration for decision-making strives to achieve team performance that exceeds the performance of humans or AI alone. However, many factors can impact success of Human-AI teams, including a user's domain expertise, mental models of an AI system, trust in recommendations, and more. This article reports on a study that examines users' interactions with three simulated algorithmic models, all with equivalent accuracy rates but each tuned differently in terms of true positive and true negative rates. Our study examined user performance in a non-trivial blood vessel labeling task where participants indicated whether a given blood vessel was flowing or stalled. Users completed 140 trials across multiple stages, first without an AI and then with recommendations from an AI-Assistant. Although all users had prior experience with the task, their levels of proficiency varied widely. Our results demonstrated that while recommendations from an AI-Assistant can aid in users' decision making, several underlying factors, including user base expertise and complementary human-AI tuning, significantly impact the overall team performance. First, users' base performance matters, particularly in comparison to the performance level of the AI. Novice users improved, but not to the accuracy level of the AI. Highly proficient users were generally able to discern when they should follow the AI recommendation and typically maintained or improved their performance. Mid-performers, who had a similar level of accuracy to the AI, were most variable in terms of whether the AI recommendations helped or hurt their performance. Second, tuning an AI algorithm to complement users' strengths and weaknesses also significantly impacted users' performance. For example, users in our study were better at detecting flowing blood vessels, so when the AI was tuned to reduce false negatives (at the expense of increasing false positives), users were able to reject those recommendations more easily and improve in accuracy. Finally, users' perception of the AI's performance relative to their own performance had an impact on whether users' accuracy improved when given recommendations from the AI. Overall, this work reveals important insights on the complex interplay of factors influencing Human-AI collaboration and provides recommendations on how to design and tune AI algorithms to complement users in decision-making tasks.
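The abstract's point that models can share one accuracy rate while differing in true positive and true negative rates can be illustrated with a short sketch. The rates below, the 50/50 class balance, and the names `accuracy` and `tunings` are hypothetical choices made for illustration only; they are not the values or the code used in the study.

```python
def accuracy(tpr: float, tnr: float, positive_share: float = 0.5) -> float:
    """Overall accuracy as the prevalence-weighted mean of TPR and TNR."""
    return tpr * positive_share + tnr * (1.0 - positive_share)


# Hypothetical tunings (TPR, TNR); illustrative numbers, not taken from the article.
tunings = {
    "balanced": (0.80, 0.80),
    "high_tpr": (0.90, 0.70),  # fewer false negatives, more false positives
    "high_tnr": (0.70, 0.90),  # fewer false positives, more false negatives
}

for name, (tpr, tnr) in tunings.items():
    # False negative rate is 1 - TPR; false positive rate is 1 - TNR.
    print(
        f"{name}: accuracy={accuracy(tpr, tnr):.2f}, "
        f"FNR={1 - tpr:.2f}, FPR={1 - tnr:.2f}"
    )
```

Under the assumed balanced classes, all three tunings give the same accuracy while trading false negatives against false positives. With unbalanced classes the equality no longer holds unless the rates are adjusted for prevalence.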
Pages: 29