Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data

Cited by: 155
Authors
Muchlinski, David [1]
Siroky, David [2]
He, Jingrui [3]
Kocher, Matthew [4]
Affiliations
[1] Univ Glasgow, Sch Social & Polit Sci, Glasgow, Lanark, Scotland
[2] Arizona State Univ, Dept Polit Sci, Tempe, AZ USA
[3] Arizona State Univ, Dept Comp Sci & Engn, Tempe, AZ 85287 USA
[4] Yale Univ, Dept Polit Sci, New Haven, CT USA
Keywords
CONFLICT; MODEL; DEPENDENCIES; SEPARATION
DOI
10.1093/pan/mpv024
Chinese Library Classification
D0 [Political Science, Political Theory]
Discipline classification codes
0302; 030201
Abstract
The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.
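As a rough illustration of why class imbalance makes prediction hard to evaluate, the sketch below scores a trivial classifier that never predicts an onset. The data are hypothetical (1,000 country-years with 10 onsets) and are not taken from the article; the function names are likewise illustrative. It shows only that overall accuracy can look high while every rare event is missed, which is one reason out-of-sample recall of the rare class matters more than accuracy here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all observations classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    """Share of actual positives (onsets) that were predicted."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data (assumption): 10 onsets among 1,000 country-years.
y_true = [1] * 10 + [0] * 990

# A baseline that always predicts "no onset".
always_negative = [0] * len(y_true)

print("accuracy:", accuracy(y_true, always_negative))
print("recall:  ", recall(y_true, always_negative))
```

Any real comparison of classifiers on such data would need held-out observations and metrics that focus on the rare class; this sketch covers only the scoring step.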
Pages: 87-103
Page count: 17