Accelerating Random Forest on Memory-Constrained Devices Through Data Storage Optimization

被引:5
作者
Slimani, Camelia [1 ]
Wu, Chun-Feng [2 ]
Rubini, Stephane [1 ]
Chang, Yuan-Hao [3 ]
Boukhobza, Jalil [4 ]
机构
[1] Univ Brest, Lab STICC, CNRS, Brest F-29200, France
[2] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
[3] Acad Sinica, Inst Informat Sci, Taipei 11529, Taiwan
[4] CNRS, Lab STICC, ENSTA Bretagne, F-29200 Brest, France
关键词
Decision trees; Random forests; Buildings; Radio frequency; Optimization; Training; Loading; memory hierarchy; I/O accesses reduction; embedded systems;
D O I
10.1109/TC.2022.3215898
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Random forests is a widely used classification algorithm. It consists of a set of decision trees each of which is a classifier built on the basis of a random subset of the training data-set. In an environment where the memory work-space is low in comparison to the data-set size, when training a decision tree, a large proportion of the execution time is related to I/O operations. These are caused by data blocks transfers between the storage device and the memory work-space (in both directions). Our analysis of random forests training algorithms showed that there are two major issues : (1) Block Under-utilization: data blocks are poorly used when loaded into memory and have to be reloaded multiple times, meaning that the algorithm exhibits a poor spatial locality; (2) Data Over-read: the data-set is supposed to be fully loaded in memory whereas a large proportion of data are not effectively useful when building a decision tree. Our proposed solution is structured to address these two issues. First, we propose to reorganize the data-set in such a way to enhance spatial locality and second, to remove the assumption that the data-set is entirely loaded into memory and access data only when effectively needed. Our experiments show that this method made it possible to reduce random forest building time by 51 to 95% in comparison to a state-of-the-art method.
引用
收藏
页码:1595 / 1609
页数:15
相关论文
共 40 条
[1]   Quantitative Regular Expressions for Monitoring Cardiac Arrhythmias [J].
Abbas, Houssam ;
Alur, Rajeev ;
Mamouras, Konstantinos ;
Mangharam, Rahul ;
Rodionova, Alena .
2018 IEEE 3RD WORKSHOP ON MONITORING AND TESTING OF CYBER-PHYSICAL SYSTEMS (MT-CPS 2018), 2018, :1-2
[2]   An IoT-Based Non-Invasive Glucose Level Monitoring System Using Raspberry Pi [J].
Alarcon-Paredes, Antonio ;
Francisco-Garcia, Victor ;
Guzman-Guzman, Iris P. ;
Cantillo-Negrete, Jessica ;
Cuevas-Valencia, Rene E. ;
Alonso-Silverio, Gustavo A. .
APPLIED SCIENCES-BASEL, 2019, 9 (15)
[3]  
Anghel A, 2019, Arxiv, DOI arXiv:1910.06853
[4]  
[Anonymous], 2021, WELCOME PYJOULESS DO
[5]   A Trustworthy Privacy Preserving Framework for Machine Learning in Industrial IoT Systems [J].
Arachchige, Pathum Chamikara Mahawaga ;
Bertok, Peter ;
Khalil, Ibrahim ;
Liu, Dongxi ;
Camtepe, Seyit ;
Atiquzzaman, Mohammed .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (09) :6092-6102
[6]  
Bickel PJ, 1997, STAT SINICA, V7, P1
[7]  
Boukhobza J, 2017, ENERG MANAG EMBED S, P1
[8]  
Bovet D., 2005, UNDERSTANDING LINUX, V3rd
[9]   Machine Learning in Resource-Scarce Embedded Systems, FPGAs, and End-Devices: A Survey [J].
Branco, Sergio ;
Ferreira, Andre G. ;
Cabral, Jorge .
ELECTRONICS, 2019, 8 (11)
[10]   Guidelines for enhancing data locality in selected machine learning algorithms [J].
Chakroun, Imen ;
Vander Aa, Tom ;
Ashby, Tomas J. .
INTELLIGENT DATA ANALYSIS, 2019, 23 (05) :1003-1020