Feature Selection Methods for Optimal Design of Studies for Developmental Inquiry

Cited by: 19
Authors
Brick, Timothy R. [1]
Koffer, Rachel E. [1]
Gerstorf, Denis [1, 2, 3]
Ram, Nilam [1, 3]
Affiliations
[1] Penn State Univ, Dept Human Dev & Family Studies, 115 Hlth & Human Dev Bldg, University Pk, PA 16802 USA
[2] Humboldt Univ, Dept Psychol, Berlin, Germany
[3] German Inst Econ Res DIW, Socioecon Panel, Berlin, Germany
Source
JOURNALS OF GERONTOLOGY SERIES B-PSYCHOLOGICAL SCIENCES AND SOCIAL SCIENCES | 2018 / Vol. 73 / No. 01
Funding
National Science Foundation (USA);
Keywords
Big data methods; Feature selection; Longitudinal analysis; Measurement; Study design; VARIABLE IMPORTANCE; LIFE SATISFACTION; POWER EQUIVALENCE; LINEAR-REGRESSION; CLASSIFICATION; MODEL; PERSONALITY; STABILITY; PANEL; HOLD;
DOI
10.1093/geronb/gbx008
Chinese Library Classification (CLC) number
R592 [Geriatrics]; C [Social Sciences, General];
Discipline classification code
03; 0303; 100203;
Abstract
Objectives: As diary, panel, and experience sampling methods become easier to implement, studies of development and aging are adopting increasingly intensive study designs. However, if too many measures are included in such designs, interruptions for measurement may constitute a significant burden for participants. We propose the use of feature selection, a data-driven machine learning process, for study design and for selecting the measures that show the most predictive power in pilot data. Method: We introduce an analytical paradigm based on feature importance estimation and recursive feature elimination with decision tree ensembles, and illustrate its utility using empirical data from the German Socio-Economic Panel (SOEP). Results: We identified a subset of 20 measures from the SOEP data set that maintain much of the ability of the original data set to predict life satisfaction and health across younger, middle, and older age groups. Discussion: Feature selection techniques permit researchers to choose measures that are maximally predictive of relevant outcomes, even when there are interactions or nonlinearities. These techniques facilitate decisions about which measures may be dropped from a study while maintaining efficiency of prediction across groups and reducing costs to the researcher and burden on the participants.
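The general technique named in the abstract, recursive feature elimination driven by the feature importances of a decision tree ensemble, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not state which software was used, so scikit-learn is an assumption here, and the "pilot data" are synthetic (12 hypothetical measures, of which three influence the outcome, including a nonlinear term and an interaction).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for pilot data (hypothetical): 300 participants, 12 candidate measures.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))

# Outcome depends linearly on measure 0, quadratically on measure 1,
# and on measure 2 only when measure 0 is positive (an interaction).
y = (
    3.0 * X[:, 0]
    + 2.0 * X[:, 1] ** 2
    + 2.0 * X[:, 2] * (X[:, 0] > 0)
    + rng.normal(scale=0.5, size=300)
)

# Recursive feature elimination: fit the forest, drop the least important
# measure, refit, and repeat until only 4 measures remain.
rfe = RFE(
    estimator=RandomForestRegressor(n_estimators=200, random_state=0),
    n_features_to_select=4,
    step=1,
)
rfe.fit(X, y)

selected = np.flatnonzero(rfe.support_)  # column indices of retained measures
print("retained measures:", selected)
print("elimination ranking (1 = retained):", rfe.ranking_)
```

In a design setting, the retained columns would correspond to the measures kept in the reduced study protocol. The number of measures to retain (4 here) is an arbitrary choice for the sketch.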
Pages: 113-123
Page count: 11