That Takes the BISCUIT: Predictive Accuracy and Parsimony of Four Statistical Learning Techniques in Personality Data, With Data Missingness Conditions

Cited by: 24
Authors
Elleman, Lorien G. [1 ]
McDougald, Sarah K. [1 ]
Condon, David M. [2 ]
Revelle, William [1 ]
Affiliations
[1] Northwestern Univ, Dept Psychol, Swift Hall 102, 2029 Sheridan Rd, Evanston, IL 60208 USA
[2] Univ Oregon, Dept Psychol, Eugene, OR 97403 USA
Keywords
statistical learning; machine learning; personality; nuances; Big Five; LINEAR-MODELS; REGRESSION; FACETS; REGULARIZATION; SELECTION; TRAITS; VALUES; COMMON
DOI
10.1027/1015-5759/a000590
Chinese Library Classification
B849 [Applied Psychology]
Discipline Classification Code
040203
Abstract
The predictive accuracy of personality-criterion regression models may be improved with statistical learning (SL) techniques. This study introduced a novel SL technique, BISCUIT (Best Items Scale that is Cross-validated, Unit-weighted, Informative, and Transparent). The predictive accuracy and parsimony of BISCUIT were compared with three established SL techniques (the lasso, elastic net, and random forest) and regression using two sets of scales, for five criteria, across five levels of data missingness. BISCUIT's predictive accuracy was competitive with other SL techniques at higher levels of data missingness. BISCUIT most frequently produced the most parsimonious SL model. In terms of predictive accuracy, the elastic net and lasso dominated other techniques in the complete data condition and in conditions with up to 50% data missingness. Regression using 27 narrow traits was an intermediate choice for predictive accuracy. For most criteria and levels of data missingness, regression using the Big Five had the worst predictive accuracy. Overall, loss in predictive accuracy due to data missingness was modest, even at 90% data missingness. Findings suggest that personality researchers should consider incorporating planned data missingness and SL techniques into their designs and analyses.
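The abstract describes BISCUIT only by its acronym (a Best Items Scale that is Cross-validated, Unit-weighted, Informative, and Transparent). As a rough illustration of that idea, the sketch below (in Python rather than the authors' own code) ranks items by their training-set correlation with a criterion, forms a unit-weighted sum of the top k items, and chooses k by cross-validation. The ranking rule, the candidate values of k, and all function names here are assumptions for illustration, not the published implementation.

```python
# Minimal sketch of a BISCUIT-style predictor (assumed details: items ranked by
# training-set correlation with the criterion, top-k items summed with +1/-1
# weights, k chosen by cross-validation).
import numpy as np
from sklearn.model_selection import KFold

def unit_weighted_score(X, item_idx, signs):
    """Sum the selected items with +1/-1 weights (unit weighting)."""
    return (X[:, item_idx] * signs).sum(axis=1)

def fit_biscuit_like(X, y, ks=(2, 4, 8, 16, 32), n_splits=10, seed=1):
    """Pick the number of items k by cross-validated correlation with the criterion."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_r = {}
    for k in ks:
        fold_rs = []
        for train, test in cv.split(X):
            # Rank items by their training-set correlation with the criterion.
            r_items = np.array([np.corrcoef(X[train, j], y[train])[0, 1]
                                for j in range(X.shape[1])])
            top = np.argsort(-np.abs(r_items))[:k]   # k most informative items
            signs = np.sign(r_items[top])            # unit weights, signed
            score = unit_weighted_score(X[test], top, signs)
            fold_rs.append(np.corrcoef(score, y[test])[0, 1])
        mean_r[k] = float(np.mean(fold_rs))
    best_k = max(mean_r, key=mean_r.get)
    # Refit the final unit-weighted scale on all data with the chosen k.
    r_items = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    top = np.argsort(-np.abs(r_items))[:best_k]
    return top, np.sign(r_items[top]), mean_r
```

For the comparison techniques named in the abstract, one could fit the lasso and elastic net with, for example, scikit-learn's LassoCV and ElasticNetCV, and a random forest with RandomForestRegressor, then compare held-out predictive correlations; the paper's analyses were conducted in R, so this is only an analogous setup.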
Pages: 948-958
Page count: 11