Decision trees in epidemiological research

Cited by: 102
Authors
Venkatasubramaniam A. [1]
Wolfson J. [2]
Mitchell N. [3]
Barnes T. [3]
Jaka M. [4]
French S. [3]
Affiliations
[1] Urban Big Data Centre, University of Glasgow, 7 Lilybank Gardens, Glasgow
[2] Division of Biostatistics, University of Minnesota, Twin Cities, MMC 303, 420 Delaware St SE, Minneapolis, 55455, MN
[3] Division of Epidemiology and Community Health, University of Minnesota, Twin Cities, West Bank Office Building, 1300 South Second St, Minneapolis, 55454, MN
[4] Division of Applied Research, Allina Health, 2925 Chicago Ave, Minneapolis, 55407, MN
Source
Emerging Themes in Epidemiology, Volume 14, Issue 1
Keywords
Decision trees; Predictors; Subgroup heterogeneity
DOI
10.1186/s12982-017-0064-4
Abstract
Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.
Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups that share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference Tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel graphical visualization that provides a more scientifically meaningful characterization of the subgroups identified by decision trees.
Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation. © 2017 The Author(s).
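To make the partitioning idea in the abstract concrete, the following is a minimal sketch of a single CART-style split on one covariate, choosing the threshold that minimizes the within-subgroup sum of squared errors of a continuous outcome. It is an illustration only, not the authors' implementation: the function names (`sse`, `best_split`) and the toy data are hypothetical, and the sketch omits recursion, pruning, and the hypothesis-testing step that distinguishes CTree.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_after_split) for the best binary split on x.

    Candidate thresholds are midpoints between consecutive distinct
    covariate values. If no split improves on the unsplit data, the
    threshold is None.
    """
    best = (None, sse(y))
    candidates = sorted(set(x))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: one covariate and a continuous outcome
# with two clearly separated subgroups.
age = [25, 30, 35, 40, 45, 50, 55, 60]
outcome = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

threshold, remaining = best_split(age, outcome)
print(threshold, round(remaining, 3))
```

On this toy data the chosen threshold falls between the two groups of outcome values, so each resulting subgroup is far more homogeneous than the pooled sample. A full tree would apply the same search recursively within each subgroup and across all covariates.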