Common, uncommon, and novel applications of random forest in psychological research

Cited by: 0
Authors
Dustin A. Fife
Juliana D’Onofrio
Affiliation
[1] Rowan University
Source
Behavior Research Methods | 2023 / Vol. 55
Keywords
Prediction; Classification; Variable importance; Multiple regression
DOI
Not available
Abstract
Recent reform efforts have pushed toward a better understanding of the distinction between exploratory and confirmatory research, and appropriate use of each. As some utilize more exploratory tools, it may be tempting to employ multiple linear regression models. In this paper, we advocate for the use of random forest (RF) models. RF is able to obtain better predictive performance than traditional regression, while also inherently protecting against overfitting as well as detecting nonlinear effects and interactions among predictors. Given the advantages of RF compared to other statistical procedures, it is a tool commonly used within a plethora of industries, including stock trading, banking, pharmaceuticals, and patient healthcare planning. However, we find RF is used within the field of psychology comparatively less frequently. In the current paper, we advocate for RF as an important statistical tool within the context of behavioral and psychological research. In hopes of increasing the use of RF in the field of psychology, we provide information pertaining to the limitations one might confront in using RF and how to overcome such limitations. Moreover, we discuss various methods for how to optimally utilize RF with psychological data, such as nonparametric modeling, interaction and nonlinearity detection, variable selection, prediction and classification modeling, and assessing parameters of Monte Carlo simulations. Throughout, we illustrate the use of RF with visualization strategies, aimed to make RF models more comprehensible and intuitive.
Pages: 2447-2466
Number of pages: 19