Predictive Modeling With Psychological Panel Data

Cited by: 10
Authors
Pargent, Florian [1 ]
Albert-von der Goenna, Johannes [2 ]
Affiliations
[1] LMU, Dept Psychol, Leopoldstr 13, D-80802 Munich, Germany
[2] Bavarian Acad Sci & Humanities LRZ, Leibniz Supercomp Ctr, Garching, Germany
Source
ZEITSCHRIFT FUR PSYCHOLOGIE-JOURNAL OF PSYCHOLOGY | 2018, Vol. 226, No. 4
Keywords
predictive modeling; machine learning; elastic net; random forest; panel data; VARIABLE IMPORTANCE; SHRINKAGE; BIAS; REGULARIZATION; PERFORMANCE; REGRESSION; SELECTION; VARIANCE
DOI
10.1027/2151-2604/a000343
Chinese Library Classification
B84 [Psychology]
Subject Classification Codes
04; 0402
Abstract
Longitudinal panels include several thousand participants and variables. Traditionally, psychologists analyze only a few of these variables, partly because common unregularized linear models perform poorly when the number of variables (p) approaches the number of observations (N). Predictive modeling methods can be used when such N ≈ p situations arise in psychological research. We illustrate these techniques on exemplary variables from the German GESIS Panel, while describing the choice of preprocessing, model classes, resampling techniques, hyperparameter tuning, and performance measures. In analyses with about 2,000 subjects and 2,000 variables each, we predict panelists' gender, sick days, an evaluation of US President Trump, income, life satisfaction, and sleep satisfaction. Elastic net and random forest models were compared with dummy predictions in benchmark experiments. While good performance was achieved, the linear elastic net performed similarly to the nonlinear random forest. Elastic nets were then refitted to extract the ten most important predictors. Their interpretation validates our approach, and further modeling options are discussed. Code is available at https://osf.io/zpse3/.
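The benchmark design described in the abstract — elastic net and random forest compared against a dummy baseline under cross-validation, with regularization handling the N ≈ p setting — can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn in Python; the paper's own analyses use R (see the OSF link above), and the data shape, model settings, and variable names here are assumptions for demonstration only.

```python
# Sketch of the abstract's benchmark setup on synthetic data: many predictors,
# few of them informative, mimicking an N ~ p panel-data situation.
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a panel outcome: 300 observations, 200 predictors,
# only 10 of which carry signal (hypothetical numbers, not GESIS Panel data).
X, y = make_regression(n_samples=300, n_features=200, n_informative=10,
                       noise=10.0, random_state=1)

models = {
    # Featureless baseline: always predicts the training-fold mean.
    "dummy": DummyRegressor(strategy="mean"),
    # Elastic net with internal CV over its penalty; inputs standardized first.
    "elastic_net": make_pipeline(StandardScaler(),
                                 ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)),
    # Nonlinear competitor, as in the paper's benchmark.
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=1),
}

# Outer 5-fold cross-validation estimates out-of-sample R^2 for each model.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name}: mean CV R^2 = {r2:.2f}")
```

On data with a strong linear signal like this, the regularized linear model matches or beats the random forest, echoing the abstract's finding that the elastic net performed similarly to the nonlinear model.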
Pages: 246-258
Page count: 13