Improved Random Forest for Classification

Cited by: 223
Authors
Paul, Angshuman [1 ]
Mukherjee, Dipti Prasad [1 ]
Das, Prasun [2 ]
Gangopadhyay, Abhinandan [3 ]
Chintha, Appa Rao [4 ]
Kundu, Saurabh [4 ]
Affiliations
[1] Indian Stat Inst, Elect & Commun Sci Unit, Kolkata 700108, India
[2] Indian Stat Inst, Stat Qual Control & Operat Res Unit, Kolkata 700108, India
[3] Arizona State Univ, Sch Engn Matter Transport & Energy, Tempe, AZ 85281 USA
[4] Tata Steel Ltd, Res & Dev & Sci Serv, Jamshedpur 831001, Bihar, India
Keywords
Random forest; optimal number of trees; classification accuracy; feature reduction;
DOI
10.1109/TIP.2018.2834830
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose an improved random forest classifier that performs classification with a minimum number of trees. The proposed method iteratively removes some unimportant features. Based on the number of important and unimportant features, we formulate a novel theoretical upper limit on the number of trees to be added to the forest to ensure improvement in classification accuracy. Our algorithm converges with a reduced but important set of features. We prove that further addition of trees or further reduction of features does not improve classification performance. The efficacy of the proposed approach is demonstrated through experiments on benchmark data sets. We further use the proposed classifier to detect mitotic nuclei in histopathological data sets of breast tissues. We also apply our method to an industrial data set of dual-phase steel microstructures to classify the different phases. Results on these data sets show a significant reduction in average classification error compared with a number of competing methods.
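The abstract outlines a general recipe: grow a forest, rank features by importance, iteratively discard the least important ones, and stop once classification accuracy no longer improves. The sketch below is only a minimal illustration of that recipe, not the authors' published algorithm; it uses scikit-learn's RandomForestClassifier with out-of-bag accuracy as the stopping signal, and the drop fraction, tree count, and stopping rule are illustrative assumptions. The paper's theoretical upper limit on the number of trees to add is not reproduced here.

```python
# Minimal sketch (assumptions noted above): iterative feature elimination
# guided by impurity-based importances and out-of-bag (OOB) accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
features = np.arange(X.shape[1])          # indices of currently kept features
best_oob, best_features = 0.0, features

while len(features) > 1:
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X[:, features], y)

    if rf.oob_score_ <= best_oob:         # no OOB improvement: stop reducing
        break
    best_oob, best_features = rf.oob_score_, features

    # Drop the bottom 20% of features by impurity-based importance
    order = np.argsort(rf.feature_importances_)      # ascending importance
    n_drop = max(1, int(0.2 * len(features)))
    features = features[np.sort(order[n_drop:])]     # keep the most important

print(f"kept {len(best_features)} of {X.shape[1]} features, "
      f"OOB accuracy {best_oob:.3f}")
```

The 20% drop fraction and the fixed 100 trees per iteration are arbitrary choices for the example; the paper instead derives how many trees may be added at each step and proves when further tree addition or feature removal stops helping.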
Pages: 4012-4024
Page count: 13
Related references
33 in total
  • [1] [Anonymous], 2013, Decision Forests, DOI: 10.1007/978-1-4471-4929-3.
  • [2] Asuncion, A., 2007, UCI Machine Learning Repository.
  • [3] Breiman, L., "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
  • [4] Bylander, T., "Estimating generalization error on two-class datasets using out-of-bag estimates," Machine Learning, vol. 48, no. 1-3, pp. 287-297, 2002.
  • [5] Chapelle, O., 2009, Semi-Supervised Learning, vol. 20, p. 542.
  • [6] Ciresan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J., "Mitosis detection in breast cancer histology images with deep neural networks," Medical Image Computing and Computer-Assisted Intervention (MICCAI 2013), Part II, vol. 8150, pp. 411-418, 2013.
  • [7] Cuzzocrea, A., Francis, S. L., and Gaber, M. M., "An information-theoretic approach for setting the optimal number of decision trees in random forests," Proc. 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2013), pp. 1013-1019, 2013.
  • [8] Dietterich, T. G., "Ensemble methods in machine learning," Multiple Classifier Systems, vol. 1857, pp. 1-15, 2000.
  • [9] Dua, D., 2017, UCI Machine Learning Repository, DOI: 10.1016/j.dss.2009.05.016.
  • [10] Freund, Y., 1996, Proceedings of the Thirteenth International Conference on Machine Learning (ICML '96), p. 148.