Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

Cited by: 31
Authors
Chapman, Benjamin P. [1 ,2 ]
Weiss, Alexander [3 ]
Duberstein, Paul R. [1 ]
Affiliations
[1] Univ Rochester, Med Ctr, Dept Psychiat, 300 Crittenden Blvd, Rochester, NY 14642 USA
[2] Univ Rochester, Med Ctr, Dept Publ Hlth Sci, Rochester, NY 14642 USA
[3] Univ Edinburgh, Sch Philosophy Psychol & Language Sci, Edinburgh, Midlothian, Scotland
Keywords
statistical learning theory; machine learning theory; psychometrics; personality; mortality; INTEGRATIVE DATA-ANALYSIS; BOOSTED REGRESSION; MISSING DATA; SELECTION; REGULARIZATION; INFORMATION; PERSONALITY; QUALITY; MODELS; BIAS
DOI
10.1037/met0000088
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods and as primary analytic tools in discovery-phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression.
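The contrast the abstract draws between within-sample fit and expected prediction error can be illustrated with a small sketch. The example below is not the authors' analysis: it assumes simulated data, a closed-form ridge penalty standing in for "regularization", mean squared error as the loss, and hypothetical helper names (`ridge_fit`, `cv_error`, `in_sample_error`). It estimates EPE by k-fold cross-validation over a grid of penalty values and compares it with the error measured on the same data used for fitting.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; assumes roughly centered data)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """Estimate expected prediction error (mean squared error) by k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only, then score on the held-out fold.
        beta = ridge_fit(X[train_mask], y[train_mask], lam)
        resid = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

def in_sample_error(X, y, lam):
    """Mean squared error on the same data used for fitting."""
    beta = ridge_fit(X, y, lam)
    return float(np.mean((y - X @ beta) ** 2))

# Simulated "item pool": 40 candidate predictors, only 3 truly related to the outcome.
rng = np.random.default_rng(42)
n, p = 60, 40
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 1.0
y = X @ beta_true + rng.normal(size=n)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv = {lam: cv_error(X, y, lam) for lam in lambdas}
fit = {lam: in_sample_error(X, y, lam) for lam in lambdas}

best_lam = min(cv, key=cv.get)  # penalty with the lowest estimated EPE
for lam in lambdas:
    print(f"lambda={lam:>6}: in-sample MSE={fit[lam]:.3f}, CV MSE={cv[lam]:.3f}")
print("penalty chosen by cross-validation:", best_lam)
```

In-sample error is always lowest for the weakest penalty, since the most flexible model fits the sample best. The cross-validated estimate instead charges the model for overfitting, so the penalty it selects can differ from the one that in-sample fit would favor.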
Pages: 603-620
Page count: 18