Imputation techniques on missing values in breast cancer treatment and fertility data

Cited by: 13
Authors
Wu, Xuetong [1]
Akbarzadeh Khorshidi, Hadi [1]
Aickelin, Uwe [1]
Edib, Zobaida [2]
Peate, Michelle [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Obstet & Gynaecol, Parkville, Vic, Australia
Keywords
Missing data; Imputation; Classification; Breast cancer; Post-treatment amenorrhoea; Women
DOI
10.1007/s13755-019-0082-4
CLC number
R-058
Subject classification code
Abstract
In recent years, clinical decision support using data mining techniques has offered a more intelligent way to reduce decision error. However, clinical datasets often suffer from high missingness, which adversely affects the quality of modelling if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional approaches rely on simple statistical treatment, such as mean imputation or discarding incomplete cases, which has many limitations and thus degrades learning performance. This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset for finding the relationship between BC treatment and chemotherapy-related amenorrhoea, where performance is evaluated by prediction accuracy. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared to model prediction based on listwise deletion. Furthermore, the results reveal that most methods retain strong robustness and discriminant power even when the dataset has a high missing rate (> 50%).
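The two conventional baselines named in the abstract, discarding incomplete cases (listwise deletion) and mean imputation, can be illustrated with a minimal sketch. The toy records, the function names `listwise_deletion` and `mean_imputation`, and the two-column layout are assumptions made for illustration only. They are not the study's data or code, and the six machine learning based methods the study evaluates are not shown here.

```python
from statistics import mean

# Toy records (hypothetical values, not from the study); None marks a missing entry.
rows = [
    [34.0, 1.0],
    [None, 0.0],
    [41.0, None],
    [29.0, 1.0],
]

def listwise_deletion(data):
    """Keep only the rows that have no missing entries."""
    return [row[:] for row in data if all(v is not None for v in row)]

def mean_imputation(data):
    """Replace each missing entry with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in data
    ]

complete_cases = listwise_deletion(rows)
imputed = mean_imputation(rows)
print(len(complete_cases), "of", len(rows), "rows survive listwise deletion")
print(imputed)
```

In this toy table, listwise deletion discards half of the rows, while mean imputation keeps every row but fills each gap with the same column average, which is one reason such simple treatments can limit downstream learning.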
Pages: 8