Multinomial random forest

被引:65
作者
Bai, Jiawang [1 ,2 ]
Li, Yiming [1 ]
Li, Jiawei [1 ]
Yang, Xue [1 ,2 ]
Jiang, Yong [1 ,2 ]
Xia, Shu-Tao [1 ,2 ]
机构
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Peng Cheng Lab, PCL Res Ctr Networks & Commun, Shenzhen, Peoples R China
基金
中国国家自然科学基金; 中国博士后科学基金;
关键词
Random forest; Consistency; Differential privacy; Classification; CONSISTENCY; ENSEMBLE;
D O I
10.1016/j.patcog.2021.108331
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Despite the impressive performance of random forests (RF), its theoretical properties have not been thoroughly understood. In this paper, we propose a novel RF framework, dubbed multinomial random forest (MRF), to analyze its consistency and privacy-preservation . Instead of deterministic greedy split rule or with simple randomness, the MRF adopts two impurity-based multinomial distributions to randomly select a splitting feature and a splitting value, respectively. Theoretically, we prove the consistency of MRF and analyze its privacy-preservation within the framework of differential privacy. We also demonstrate with multiple datasets that its performance is on par with the standard RF. To the best of our knowledge, MRF is the first consistent RF variant that has comparable performance to the standard RF. The code is available at https://github.com/jiawangbai/Multinomial- Random-Forest . (c) 2021 Published by Elsevier Ltd.
引用
收藏
页数:13
相关论文
共 46 条
[31]  
Machanavajjhala A., 2007, ACM Transactions on Knowledge Discovery from Data, V1, DOI [DOI 10.1145/1217299.1217302, DOI 10.1109/ICDE.2006.1]
[32]   Mechanism design via differential privacy [J].
McSherry, Frank ;
Talwar, Kunal .
48TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2007, :94-103
[33]   Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis [J].
McSherry, Frank .
COMMUNICATIONS OF THE ACM, 2010, 53 (09) :89-97
[34]  
Meinshausen N, 2006, J MACH LEARN RES, V7, P983
[35]  
Menze BH, 2011, LECT NOTES ARTIF INT, V6912, P453, DOI 10.1007/978-3-642-23783-6_29
[36]   Python']Python for scientific computing [J].
Oliphant, Travis E. .
COMPUTING IN SCIENCE & ENGINEERING, 2007, 9 (03) :10-20
[37]  
Patil A, 2014, 2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), P2623, DOI 10.1109/ICACCI.2014.6968348
[38]   Oblique random forest ensemble via Least Square Estimation for time series forecasting [J].
Qiu, Xueheng ;
Zhang, Le ;
Suganthan, Ponnuthurai Nagaratnam ;
Amaratunga, Gehan A. J. .
INFORMATION SCIENCES, 2017, 420 :249-262
[39]   Differentially Private Random Forest with High Utility [J].
Rana, Santu ;
Gupta, Sunil Kumar ;
Venkatesh, Svetha .
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, :955-960
[40]   Supervised quality evaluation of binary partition trees for object segmentation [J].
Randrianasoa, Jimmy Francky ;
Cettour-Janet, Pierre ;
Kurtz, Camille ;
Desjardin, Eric ;
Gancarski, Pierre ;
Bednarek, Nathalie ;
Rousseau, Francois ;
Passat, Nicolas .
PATTERN RECOGNITION, 2021, 111