The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images

Cited by: 23
Authors
Mitry, Danny [1,2]
Zutis, Kris [3]
Dhillon, Baljean [4,5]
Peto, Tunde [1,2]
Hayat, Shabina [6]
Khaw, Kay-Tee [7]
Morgan, James E. [8]
Moncur, Wendy [9]
Trucco, Emanuele [3]
Foster, Paul J. [1,2]
Affiliations
[1] Moorfields Eye Hosp, NIHR Biomed Res Ctr, London, England
[2] UCL Inst Ophthalmol, London, England
[3] Univ Dundee, Sch Sci & Engn, VAMPIRE Project, Dundee, Scotland
[4] Univ Edinburgh, Ctr Clin Brain Sci, Edinburgh, Midlothian, Scotland
[5] Princess Alexandra Eye Pavil, Edinburgh, Midlothian, Scotland
[6] Univ Cambridge, Dept Publ Hlth & Primary Care, Strangeways Res Lab, Worts Causeway, Cambridge, England
[7] Univ Cambridge, Addenbrookes Hosp, Dept Clin Gerontol, Cambridge, England
[8] Cardiff Univ, Sch Optometry & Vis Sci, Cardiff, S Glam, Wales
[9] Univ Dundee, Duncan Jordanstone Coll Arts & Design, Dundee, Scotland
Source
TRANSLATIONAL VISION SCIENCE & TECHNOLOGY | 2016, Vol. 5, Issue 5
Funding
UK Medical Research Council;
Keywords
retina; image analysis; crowdsourcing; DIABETIC-RETINOPATHY; QUALITY;
DOI
10.1167/tvst.5.5.6
Chinese Library Classification (CLC)
R77 [Ophthalmology];
Subject classification code
100212;
Abstract
Purpose: Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool compared with expert classification.

Methods: We used 100 retinal fundus photograph images with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site, so that anonymous workers could perform a classification and annotation task on the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only, and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared with expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy.

Results: In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity of 71% (95% confidence interval [CI], 69%-74%) and a sensitivity of 87% (95% CI, 86%-88%) were achieved across all classifications. The AUC for all classifications combined was 0.93 (95% CI, 0.91-0.96). For image annotation, a maximal Dice coefficient (~0.6) was achieved at a consensus threshold of 0.25.

Conclusions: This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation were achieved in the nonmasters with compulsory training group.

Translational Relevance: Crowdsourcing as a technique for retinal image analysis may be comparable to expert grading and has the potential to deliver timely, accurate, and cost-effective image analysis.
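This record does not include the study's analysis code. As a rough sketch of the annotation-accuracy measures named in the abstract, the Python/NumPy snippet below fuses per-pixel crowd annotations at a chosen consensus threshold and scores the fused mask against an expert mask with the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The function names, the 64 x 64 toy image, and the simulated workers are illustrative assumptions, not the authors' implementation.

import numpy as np

def consensus_mask(worker_masks, threshold=0.25):
    """Fuse binary annotation masks from several crowd workers.

    A pixel enters the crowd consensus when at least `threshold`
    (a fraction in [0, 1]) of workers marked it as abnormal.
    """
    stack = np.stack(worker_masks).astype(float)  # shape (n_workers, H, W)
    vote_fraction = stack.mean(axis=0)            # per-pixel agreement
    return vote_fraction >= threshold

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical "expert" lesion annotation on a 64 x 64 image patch.
    expert = np.zeros((64, 64), dtype=bool)
    expert[20:40, 20:40] = True

    # Simulate 20 noisy crowd workers: the expert region shifted slightly,
    # plus a few spurious marks per worker.
    workers = []
    for _ in range(20):
        jitter = np.roll(expert, int(rng.integers(-3, 4)), axis=int(rng.integers(0, 2)))
        noise = rng.random((64, 64)) < 0.05
        workers.append(jitter | noise)

    # Sweep the consensus threshold and score each fused mask against the expert.
    for t in (0.10, 0.25, 0.50, 0.75):
        crowd = consensus_mask(workers, threshold=t)
        print(f"threshold={t:.2f}  Dice vs expert = {dice_coefficient(crowd, expert):.3f}")

Sweeping the threshold in the final loop corresponds to the analysis the abstract reports, in which the maximal Dice coefficient (~0.6) was reached at a consensus threshold of 0.25; intuitively, accepting pixels marked by only a quarter of workers can retain lesion pixels that many individual workers miss, which strict majority voting would discard.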
Pages: 9