Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data

Cited by: 8
Authors
Adams, Lucas J. [1]
Bello, Ghalib [2]
Dumancas, Gerard G. [1]
Affiliations
[1] Oklahoma Baptist University, Department of Chemistry, Shawnee, OK, USA
[2] Oklahoma Medical Research Foundation, Arthritis & Clinical Immunology Research Program, Oklahoma City, OK 73104, USA
Keywords
genetic algorithm; machine learning; NHANES; questionnaire
DOI
10.4137/BBI.S29469
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The problem of selecting important variables from questionnaire data for predictive modeling of a specific outcome of interest has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) to select optimal variables from questionnaire data for predicting five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey (NHANES). The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, which were then used for predictive modeling with various parametric and nonparametric machine learning techniques. Using these 24 variables, gradient boosting yielded the nominally highest performance (area under the curve [AUC] = 0.7654), although several other techniques had lower AUCs that were not significantly different. This study shows how a GA, in conjunction with various machine learning techniques, could be used to examine questionnaire data to predict a binary outcome.
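The abstract describes GA-based variable selection whose chosen subset then feeds predictive models judged by AUC. The Python sketch below shows that wrapper idea on a toy problem and is not the authors' implementation. Everything in it is an assumption: the data are synthetic (12 hypothetical yes/no questions, not the 123 NHANES items), a simple additive score stands in for the paper's machine learning models, and the fitness (hold-out AUC minus a small per-question penalty) plus the GA operators (tournament selection, one-point crossover, bit-flip mutation, one elite) are illustrative choices. The names `make_data`, `fitness`, and `genetic_select` are hypothetical.

```python
import math
import random

# HYPOTHETICAL toy data, not NHANES: binary yes/no answers to 12 questions,
# of which only questions 0, 3 and 7 influence the simulated outcome.
N_QUESTIONS = 12
INFORMATIVE = (0, 3, 7)


def make_data(n, rng):
    """Simulate n respondents with binary answers and a binary outcome (1 = died)."""
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in range(N_QUESTIONS)]
        logit = -2.25 + 1.5 * sum(row[j] for j in INFORMATIVE)
        y.append(1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0)
        X.append(row)
    return X, y


def auc(scores, labels):
    """Rank-based (Mann-Whitney U) AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.5
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fitness(mask, train, valid, penalty=0.005):
    """ASSUMED fitness: hold-out AUC of an additive score over the selected
    questions, minus a small penalty per selected question."""
    (Xt, yt), (Xv, yv) = train, valid
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    pos = [x for x, lab in zip(Xt, yt) if lab == 1]
    neg = [x for x, lab in zip(Xt, yt) if lab == 0]
    # Weight = difference in "yes" rates between outcome groups (training data only).
    w = {j: sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in chosen}
    scores = [sum(w[j] * x[j] for j in chosen) for x in Xv]
    return auc(scores, yv) - penalty * len(chosen)


def genetic_select(fit, n_genes, rng, pop_size=24, generations=25, k=3):
    """Minimal GA over bit masks (1 = keep question): tournament selection,
    one-point crossover, bit-flip mutation and single-elite survival."""
    p_mut = 1.0 / n_genes
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fit(mask)  # fitness is deterministic, so memoize it
        return cache[key]

    def tournament(pop):
        return max(rng.sample(pop, k), key=score)

    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=score)
    history = [score(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_genes - 1)
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
        best = max(pop, key=score)  # never worse than the carried-over elite
        history.append(score(best))
    return best, score(best), history


rng = random.Random(0)
X, y = make_data(500, rng)
train, valid = (X[:250], y[:250]), (X[250:], y[250:])
best_mask, best_fit, history = genetic_select(
    lambda m: fitness(m, train, valid), N_QUESTIONS, rng)
print("selected questions:", [j for j, bit in enumerate(best_mask) if bit])
print("penalized hold-out AUC:", round(best_fit, 4))
```

In a real analysis the fitness function would more likely wrap a fitted classifier (for example, gradient boosting) and use cross-validation rather than a single split. Each evaluation would then refit a model, which is why the sketch caches fitness values per mask.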
Pages: 31-41
Page count: 11