Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making

Times cited: 17

Authors:

Inkpen, Kori [1]

Chappidi, Shreya [2]

Mallari, Keri [3]

Nushi, Besmira [1]

Ramesh, Divya [4]

Michelucci, Pietro [5]

Mandava, Vani [1]

Veprek, Libuse Hannah [6]

Quinn, Gabrielle [7]

Affiliations:

[1] Microsoft Res, 1 Microsoft Way, Redmond, WA 98052 USA

[2] Univ Virginia, 235 McCormick Rd, Charlottesville, VA 22904 USA

[3] Univ Washington, 1400 NE Campus Pkwy, Seattle, WA 98195 USA

[4] Univ Michigan, 440 Church St, Ann Arbor, MI 48109 USA

[5] Human Computat Inst, 21 Lone Oak Rd, Ithaca, NY 14850 USA

[6] Ludwig Maximilians Univ Munchen, Geschwister Scholl Pl 1, D-80539 Munich, Germany

[7] Western Washington Univ, 516 High St, Bellingham, WA 98225 USA

Source:

ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION | 2023 / Vol. 30 / Issue 5

Keywords:

Human-AI collaboration; Human-AI performance; human-centered AI; citizen science; TRUST; AUTOMATION; AVERSION; PERFORMANCE

DOI:

10.1145/3534561

Chinese Library Classification (CLC) number:

TP3 [Computing technology; computer technology]

Subject classification code:

0812

Abstract:

Human-AI collaboration for decision-making strives to achieve team performance that exceeds the performance of humans or AI alone. However, many factors can impact the success of Human-AI teams, including a user's domain expertise, mental models of an AI system, trust in recommendations, and more. This article reports on a study that examines users' interactions with three simulated algorithmic models, all with equivalent accuracy rates but each tuned differently in terms of true positive and true negative rates. Our study examined user performance in a non-trivial blood vessel labeling task where participants indicated whether a given blood vessel was flowing or stalled. Users completed 140 trials across multiple stages, first without an AI and then with recommendations from an AI-Assistant. Although all users had prior experience with the task, their levels of proficiency varied widely. Our results demonstrated that while recommendations from an AI-Assistant can aid in users' decision making, several underlying factors, including users' baseline expertise and complementary human-AI tuning, significantly impact the overall team performance. First, users' base performance matters, particularly in comparison to the performance level of the AI. Novice users improved, but not to the accuracy level of the AI. Highly proficient users were generally able to discern when they should follow the AI recommendation and typically maintained or improved their performance. Mid-performers, whose accuracy was similar to the AI's, varied most in whether the AI recommendations helped or hurt their performance. Second, tuning an AI algorithm to complement users' strengths and weaknesses also significantly impacted users' performance. For example, users in our study were better at detecting flowing blood vessels, so when the AI was tuned to reduce false negatives (at the expense of increasing false positives), users were able to reject those recommendations more easily and improve in accuracy. Finally, users' perception of the AI's performance relative to their own performance had an impact on whether users' accuracy improved when given recommendations from the AI. Overall, this work offers important insights into the complex interplay of factors influencing Human-AI collaboration and provides recommendations on how to design and tune AI algorithms to complement users in decision-making tasks.
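The abstract's central technical point is that three models can share one accuracy yet differ in true positive and true negative rates, and that the tuning interacts with which errors users can catch. The Python sketch below makes this concrete with a small expected-value calculation. It is a hypothetical illustration, not the authors' simulation: it assumes "stalled" is the positive class, a balanced trial set, made-up rates, and a toy user who overrides only incorrect recommendations (the names `Tuning`, `ai_accuracy`, `team_accuracy`, `catch_fp`, and `catch_fn` are all illustrative).

```python
from dataclasses import dataclass

# Assumption: "positive" = stalled vessel, "negative" = flowing vessel.
# All numbers below are hypothetical illustrations, not values from the study.


@dataclass(frozen=True)
class Tuning:
    name: str
    tpr: float  # P(AI says "stalled" | vessel is stalled)
    tnr: float  # P(AI says "flowing" | vessel is flowing)


def ai_accuracy(t: Tuning, p_stalled: float) -> float:
    """AI accuracy for a given fraction of truly stalled vessels."""
    return p_stalled * t.tpr + (1 - p_stalled) * t.tnr


def team_accuracy(t: Tuning, p_stalled: float,
                  catch_fp: float, catch_fn: float) -> float:
    """Accuracy of a toy human+AI team (hypothetical user model).

    The user follows the AI, except that they correctly reject a false
    "stalled" call (false positive) with probability catch_fp and a false
    "flowing" call (false negative) with probability catch_fn. The user
    never overrides a correct recommendation.
    """
    on_stalled = t.tpr + (1 - t.tpr) * catch_fn
    on_flowing = t.tnr + (1 - t.tnr) * catch_fp
    return p_stalled * on_stalled + (1 - p_stalled) * on_flowing


# Three hypothetical tunings with identical accuracy on a balanced set.
TUNINGS = [
    Tuning("balanced", tpr=0.75, tnr=0.75),
    Tuning("fewer false negatives", tpr=0.85, tnr=0.65),
    Tuning("fewer false positives", tpr=0.65, tnr=0.85),
]

if __name__ == "__main__":
    P_STALLED = 0.5                # hypothetical balanced trial set
    CATCH_FP, CATCH_FN = 0.6, 0.2  # user is better at recognizing flowing vessels
    for t in TUNINGS:
        ai = ai_accuracy(t, P_STALLED)
        team = team_accuracy(t, P_STALLED, CATCH_FP, CATCH_FN)
        print(f"{t.name:>22}: AI acc={ai:.2f}, team acc={team:.2f}")
```

With these made-up inputs all three tunings have an AI accuracy of 0.75, yet the team accuracy is 0.87 for the tuning with fewer false negatives, 0.85 for the balanced one, and 0.83 for the tuning with fewer false positives. This qualitatively mirrors the direction of the effect described in the abstract: when the AI's extra errors fall on the class users judge well, users can reject them and the team gains.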

Pages: 29

References: 62

[1] Why trust an algorithm? Performance, cognition, and neurophysiology
Alexander, Veronika
Blinder, Collin
Zak, Paul J.
[J]. COMPUTERS IN HUMAN BEHAVIOR, 2018, 89 : 279 - 288
[2] Guidelines for Human-AI Interaction
Amershi, Saleema
Weld, Dan
Vorvoreanu, Mihaela
Fourney, Adam
Nushi, Besmira
Collisson, Penny
Suh, Jina
Iqbal, Shamsi
Bennett, Paul N.
Inkpen, Kori
Teevan, Jaime
Kikin-Gil, Ruth
Horvitz, Eric
[J]. CHI 2019: PROCEEDINGS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019,
[3] Bansal G., 2019, P AAAI C HUMAN COMPU, V7, P2
[4] Bansal G, 2019, AAAI CONF ARTIF INTE, P2429
[5] Bansal Gagan, 2021, P 2021 CHI C HUM FAC, P1
[6] High fat diet worsens Alzheimer's disease-related behavioral abnormalities and neuropathology in APP/PS1 mice, but not by synergistically decreasing cerebral blood flow
Bracko, Oliver
Vinarcsik, Lindsay K.
Cruz Hernandez, Jean C.
Ruiz-Uribe, Nancy E.
Haft-Javaherian, Mohammad
Falkenhain, Kaja
Ramanauskaite, Egle M.
Ali, Muhammad
Mohapatra, Aditi
Swallow, Madisen A.
Njiru, Brendah N.
Muse, Victorine
Michelucci, Pietro E.
Nishimura, Nozomi
Schaffer, Chris B.
[J]. SCIENTIFIC REPORTS, 2020, 10 (01)
[7] To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making
Buçinca Z.
Malaya M.B.
Gajos K.Z.
[J]. Proceedings of the ACM on Human-Computer Interaction, 2021, 5 (CSCW1)
[8] Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems
Bucinca, Zana
Lin, Phoebe
Gajos, Krzysztof Z.
Glassman, Elena L.
[J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2020, 2020, : 454 - 464
[9] The Role of Explanations on Trust and Reliance in Clinical Decision Support Systems
Bussone, Adrian
Stumpf, Simone
O'Sullivan, Dympna
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2015), 2015, : 160 - 169
[10] Chiang Chun-Wei, 2021, P 13 ACM WEB SCI C