An Empirical Approach for Avoiding False Discoveries When Applying High-Dimensional Radiomics to Small Datasets

Times cited: 19
Authors
Chatterjee, Avishek [1, 2]
Vallieres, Martin [1, 2]
Dohan, Anthony [3]
Levesque, Ives R. [1, 2]
Ueno, Yoshiko [4, 5]
Bist, Vipul [6, 7]
Saif, Sameh [3]
Reinhold, Caroline [3]
Seuntjens, Jan [1, 2]
Affiliations
[1] McGill Univ, Med Phys Unit, Montreal, PQ H4A 3J1, Canada
[2] McGill Univ, Hlth Ctr, Res Inst, Montreal, PQ H4A 3J1, Canada
[3] McGill Univ, Hlth Ctr, Dept Radiol, Montreal, PQ H4A 3J1, Canada
[4] McGill Univ, Hlth Ctr, Dept Radiol, Montreal, PQ H4A 3J1, Canada
[5] Kobe Univ, Dept Radiol, Kobe, Hyogo 6500017, Japan
[6] McGill Univ, Dept Radiol, Hlth Ctr, Montreal, PQ H4A 3J1, Canada
[7] Venkateshwar Hosp, Dept Radiol & Imaging, New Delhi 110075, India
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
Big data applications; computer-aided analysis; feature extraction; predictive models; statistical learning; features; cancer; images; prediction
DOI
10.1109/TRPMS.2018.2880617
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Purpose: Radiomic studies, which correlate features of patients' medical images with patient outcomes, often deal with small datasets. Consequently, their results can suffer from poor replicability and stability. This paper establishes a methodology to assess and reduce the impact of statistical fluctuations that may occur in small datasets. Such fluctuations can lead to false discoveries, particularly when applying the feature selection or machine learning (ML) methods commonly used in the radiomics literature.

Methods: Two feature selection methods were created: one for choosing single predictive features, and another for obtaining feature sets that could be combined in a predictive model. The features were combined using ML tools that are less affected by overfitting (naive Bayes, logistic regression, and linear support vector machines). Only three features were allowed to be combined at a time, further limiting overfitting. This methodology was applied to MR images from small datasets in metastatic liver disease (69 samples) and primary uterine adenocarcinoma (93 samples). The outcomes studied were desmoplasia (for liver metastases), and lymphovascular space invasion (LVSI), cancer staging (FIGO), and tumor grade (for uterine tumors). For the uterine cancer outcomes, the predictive models were tested on independent subsets.

Results: With the combined predictive feature approach, for LVSI, a prognostic factor that a human reader cannot detect, the predictive model yielded AUC = 0.87 ± 0.07 and accuracy = 0.84 ± 0.09 in the testing set. For FIGO staging, AUC = 0.81 ± 0.03 and accuracy = 0.79 ± 0.08. For tumor grade, AUC = 0.76 ± 0.05 and accuracy = 0.70 ± 0.08.

Conclusion: Despite considering a large set (~10^4) of texture features, the false discovery avoidance methodology allowed only robust predictive models to be retained. That such models were still found shows that the stringent false discovery avoidance methods introduced here do not preclude the discovery of promising correlations.
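The abstract's central concern is that, with only 69 or 93 samples and ~10^4 candidate features, chance alone can produce features that look predictive. The Python sketch below is an illustration, not the authors' published procedure: it estimates how large the best single-feature AUC becomes when every feature is pure Gaussian noise unrelated to the labels. The sample size (69) and feature count (10^4) follow the abstract; the split of 30 positive cases, the 20 repetitions, the 95th-percentile cutoff, and the helper names `auc_scores` and `best_noise_auc` are hypothetical choices made for this example.

```python
import numpy as np


def auc_scores(X, y):
    """Per-feature ROC AUC via the Mann-Whitney U statistic.

    X: (n_samples, n_features) array; y: binary labels (0/1).
    Returns max(AUC, 1 - AUC) for each feature, i.e. discrimination in
    either direction. Ties are not handled (assumes continuous features).
    """
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    # 1-based ranks of each column; argsort of argsort gives ranks for untied data.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    rank_sum_pos = ranks[y == 1].sum(axis=0)
    # U counts (positive, negative) pairs in which the positive case scores higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    auc = u / (n_pos * n_neg)
    return np.maximum(auc, 1 - auc)


def best_noise_auc(n_samples, n_features, n_pos, rng):
    """Best AUC among n_features pure-noise features (hypothetical helper)."""
    y = np.zeros(n_samples, dtype=int)
    y[:n_pos] = 1
    rng.shuffle(y)  # labels carry no relation to the features
    X = rng.standard_normal((n_samples, n_features))
    return auc_scores(X, y).max()


rng = np.random.default_rng(0)
# Null distribution of the maximum AUC: 69 samples, 10^4 noise features.
null_max = np.array([best_noise_auc(69, 10_000, 30, rng) for _ in range(20)])
threshold = np.quantile(null_max, 0.95)
print(f"median best noise AUC: {np.median(null_max):.2f}")
print(f"95th-percentile threshold: {threshold:.2f}")
```

Under these assumptions, the null AUC of a single feature has a standard deviation of about 0.07, so a normal approximation suggests the best of 10^4 noise features typically reaches an AUC of roughly 0.75 to 0.8. A single-feature AUC of that size is therefore weak evidence on its own in a dataset this small. Comparing an observed AUC against `threshold`, the 95th percentile of the simulated null maximum, is one simple guard against such false discoveries.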
Pages: 201-209
Number of pages: 9