Improving Automated Pediatric Bone Age Estimation Using Ensembles of Models from the 2017 RSNA Machine Learning Challenge

Times cited: 37
Authors
Pan, Ian [1, 2]
Thodberg, Hans Henrik [3 ]
Halabi, Safwan S. [4 ]
Kalpathy-Cramer, Jayashree [5 ]
Larson, David B. [4 ]
Affiliations
[1] Brown Univ, Warren Alpert Med Sch, Dept Radiol, 593 Eddy St, Providence, RI 02903 USA
[2] Rhode Isl Hosp, Dept Diagnost Imaging, Providence, RI 02903 USA
[3] Visiana, Horsholm, Denmark
[4] Stanford Univ, Dept Radiol, Palo Alto, CA 94304 USA
[5] Harvard Med Sch, Massachusetts Gen Hosp, Athinoula A Martinos Ctr Biomed Imaging, Dept Radiol, Boston, MA 02115 USA
DOI
10.1148/ryai.2019190053
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Purpose: To investigate improvements in performance for automatic bone age estimation that can be gained through model ensembling.

Materials and Methods: A total of 48 submissions from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge were used. Participants were provided with 12611 pediatric hand radiographs with bone ages determined by a pediatric radiologist to develop models for bone age determination. The final results were determined using a test set of 200 radiographs labeled with the weighted average of six ratings. The mean pairwise model correlation and performance of all possible model combinations for ensembles of up to 10 models using the mean absolute deviation (MAD) were evaluated. A bootstrap analysis using the 200 test radiographs was conducted to estimate the true generalization MAD.

Results: The estimated generalization MAD of a single model was 4.55 months. The best-performing ensemble consisted of four models with an MAD of 3.79 months. The mean pairwise correlation of models within this ensemble was 0.47. In comparison, the lowest achievable MAD by combining the highest-ranking models based on individual scores was 3.93 months using eight models with a mean pairwise model correlation of 0.67.

Conclusion: Combining less-correlated, high-performing models resulted in better performance than naively combining the top-performing models. Machine learning competitions within radiology should be encouraged to spur development of heterogeneous models whose predictions can be combined to achieve optimal performance.

Supplemental material is available for this article. © RSNA, 2019
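The evaluation described in the abstract (MAD of every model combination, mean pairwise model correlation, and a bootstrap over the test cases) can be illustrated with a minimal Python sketch. It is not the authors' code, and several points are assumptions the abstract does not specify: ensembles are formed as an unweighted mean of predictions, the pairwise correlation is computed on prediction errors (prediction minus reference), and the bootstrap simply resamples test cases with replacement for one fixed set of predictions. All function names are hypothetical, and the demo data are synthetic.

```python
from itertools import combinations

import numpy as np


def mad(pred, truth):
    """Mean absolute deviation between predictions and reference bone ages."""
    return float(np.mean(np.abs(pred - truth)))


def mean_pairwise_error_correlation(preds, truth):
    """Mean Pearson correlation over all model pairs.

    Assumption: correlation is taken between the models' errors.
    preds has shape (n_models, n_cases).
    """
    errors = preds - truth
    corr = np.corrcoef(errors)
    upper = np.triu_indices(preds.shape[0], k=1)  # each pair counted once
    return float(corr[upper].mean())


def best_ensemble(preds, truth, max_size=3):
    """Exhaustively score every combination of up to max_size models.

    Assumption: an ensemble prediction is the unweighted mean of its members.
    Returns the model indices and MAD of the lowest-MAD combination.
    """
    best_combo, best_score = None, np.inf
    for k in range(1, max_size + 1):
        for combo in combinations(range(preds.shape[0]), k):
            score = mad(preds[list(combo)].mean(axis=0), truth)
            if score < best_score:
                best_combo, best_score = combo, score
    return best_combo, best_score


def bootstrap_mad(pred, truth, n_boot=1000, seed=0):
    """MAD on n_boot resamples (with replacement) of the test cases."""
    rng = np.random.default_rng(seed)
    n = len(truth)
    idx = rng.integers(0, n, size=(n_boot, n))
    return np.abs(pred[idx] - truth[idx]).mean(axis=1)


if __name__ == "__main__":
    # Synthetic stand-in data: 6 models, 200 cases, ages in months.
    rng = np.random.default_rng(42)
    truth = rng.uniform(12, 216, size=200)
    preds = truth + rng.normal(0, 6, size=(6, 200))

    combo, score = best_ensemble(preds, truth, max_size=3)
    members = preds[list(combo)]
    print("best combination:", combo, "MAD:", round(score, 2))
    if len(combo) > 1:
        print("mean pairwise error correlation:",
              round(mean_pairwise_error_correlation(members, truth), 2))
    boot = bootstrap_mad(members.mean(axis=0), truth)
    print("bootstrap mean MAD:", round(float(boot.mean()), 2))
```

An exhaustive search grows combinatorially: with 48 models and ensembles of up to 10, the number of combinations is in the billions, so the sketch uses a small `max_size` and few models. Because the best combination is chosen on the same cases it is scored on, its MAD is optimistically biased, which is presumably why the study estimates a generalization MAD by bootstrap rather than reporting the raw test score.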
Pages: 9