Suitability of random forest analysis for epidemiological research: Exploring sociodemographic and lifestyle-related risk factors of overweight in a cross-sectional design

Cited by: 23
Authors
Kanerva, Noora [1, 2]
Kontto, Jukka [2 ]
Erkkola, Maijaliisa [3 ]
Nevalainen, Jaakko [4 ]
Mannisto, Satu [2 ]
Affiliations
[1] Univ Helsinki, Dept Publ Hlth, POB 20, Helsinki 00140, Finland
[2] Natl Inst Hlth & Welf, Dept Publ Hlth Solut, Helsinki, Finland
[3] Univ Helsinki, Nutr Unit, Helsinki, Finland
[4] Univ Tampere, Sch Hlth Sci, Tampere, Finland
Keywords
Machine learning; mutual importance; obesity; random forest; risk factor; FOOD FREQUENCY QUESTIONNAIRE; VALIDITY; CLASSIFICATION; REGRESSION
DOI
10.1177/1403494817736944
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Aims: Factors that contribute to the development of overweight are numerous and form a complex structure with many unknown interactions and associations. We aimed to explore this structure (i.e. the mutual importance or hierarchy of sociodemographic and lifestyle-related risk factors of being overweight) using a machine-learning technique called random forest (RF). The results were compared with traditional logistic regression (LR) analysis. Methods: The cross-sectional FINRISK 2007 Study included 4757 Finns (aged 25-74 years). Information on participants' lifestyle and sociodemographic characteristics was collected with questionnaires. Diet was assessed using a validated food-frequency questionnaire. Height and weight were measured. Participants with a body mass index (BMI) ≥ 25 kg/m² were classified as overweight. R statistical software was used to run the RF analysis ('randomForest') to derive estimates for variable importance and out-of-bag error, which were compared to those of an LR model. Results: In total, 704 (32%) men and 1119 (44%) women had a normal BMI, whereas 1502 (69%) men and 1432 (57%) women had a BMI ≥ 25 kg/m². Estimated error rates for the models were similar (RF vs. LR: 42% vs. 40% for men, 38% vs. 35% for women). Both models ranked age, education and physical activity as the most important risk factors for being overweight, but RF ranked macronutrients (carbohydrates and protein) as more important than LR did. Conclusions: RF did not demonstrate higher power in variable selection compared to LR in our study. The features of RF are more likely to appear beneficial in settings with a larger number of predictors.
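The outcome definition and the error rates reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the study used R with 'randomForest', and the RF error reported there is an out-of-bag estimate, whereas the `error_rate` helper below is only a plain misclassification share. The function names, the example participants and the predicted labels are all hypothetical.

```python
OVERWEIGHT_CUTOFF = 25.0  # kg/m^2, the cut-off stated in the abstract


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared (kg/m^2)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_overweight(weight_kg: float, height_m: float) -> bool:
    """Binary outcome: True when BMI is at or above the cut-off."""
    return bmi(weight_kg, height_m) >= OVERWEIGHT_CUTOFF


def error_rate(observed: list[bool], predicted: list[bool]) -> float:
    """Share of participants whose predicted class differs from the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(o != p for o, p in zip(observed, predicted))
    return wrong / len(observed)


# Hypothetical participants as (weight in kg, height in m); not study data.
participants = [(70.0, 1.80), (85.0, 1.75), (60.0, 1.65), (95.0, 1.70)]
observed = [is_overweight(w, h) for w, h in participants]

# Hypothetical class predictions from some fitted model (RF or LR).
predicted = [False, True, True, True]

print(observed)
print(error_rate(observed, predicted))
```

Computing the same misclassification share for both models on the same participants is what makes figures such as "42% vs. 40%" directly comparable.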
Pages: 557-564
Number of pages: 8