ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Times cited: 1940
Authors
Wright, Marvin N. [1]
Ziegler, Andreas [2, 3, 4]
Affiliations
[1] Univ Lubeck, Lubeck, Germany
[2] Univ Lubeck, Univ Klinikum Schleswig Holstein, Inst Med Biometrie & Stat, Ratzeburger Allee 160, D-23562 Lubeck, Germany
[3] Univ Lubeck, Univ Klinikum Schleswig Holstein, Zentrum Klin Studien, Ratzeburger Allee 160, D-23562 Lubeck, Germany
[4] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Durban, South Africa
Source
JOURNAL OF STATISTICAL SOFTWARE | 2017 / Volume 77 / Issue 01
Keywords
C++; classification; machine learning; R; random forests; Rcpp; recursive partitioning; survival analysis; MACHINE LEARNING-METHODS; PROBABILITY ESTIMATION; ASSOCIATION
DOI
10.18637/jss.v077.i01
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
Pages: 1-17
Number of pages: 17