Rethink reporting of evaluation results in AI: Aggregate metrics and lack of access to results limit understanding

Cited by: 43
Authors
Burnell, Ryan [1]
Schellaert, Wout [2]
Burden, John [1, 3]
Ullman, Tomer D. [4]
Martinez-Plumed, Fernando [2]
Tenenbaum, Joshua B. [5]
Rutar, Danaja [1]
Cheke, Lucy G. [1, 6]
Sohl-Dickstein, Jascha [7]
Mitchell, Melanie [8]
Kiela, Douwe [9]
Shanahan, Murray [10, 11]
Voorhees, Ellen M.
Cohn, Anthony G. [12, 13, 14, 15]
Leibo, Joel Z. [10]
Hernandez-Orallo, Jose [1, 2, 3]
Affiliations
[1] Univ Cambridge, Leverhulme Ctr Future Intelligence, Cambridge, England
[2] Univ Politecn Valencia, Valencian Res Inst Artificial Intelligence, Valencia, Spain
[3] Univ Cambridge, Ctr Study Existential Risk, Cambridge, England
[4] Harvard Univ, Dept Psychol, Cambridge, MA USA
[5] MIT, Dept Brain & Cognit Sci, Cambridge, MA USA
[6] Univ Cambridge, Dept Psychol, Cambridge, England
[7] Google, Brain Team, Mountain View, CA USA
[8] Stanford Univ, Stanford, CA USA
[9] DeepMind, London, England
[10] Imperial Coll London, Dept Comp, London, England
[11] Imperial Coll London, Dept Comp, London, England
[12] Univ Leeds, Sch Comp, Leeds, England
[13] Alan Turing Inst, London, England
[14] Tongji Univ, Shanghai, Peoples R China
[15] Shandong Univ, Jinan, Peoples R China
Keywords
REPRODUCIBILITY;
DOI
10.1126/science.adf6369
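Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]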
Subject classification codes
07; 0710; 09
Pages: 136-138
Number of pages: 3