An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Cited by: 1
Authors
Pochelu, Pierrick [1]
Petiton, Serge G. [2]
Conche, Bruno [1]
Affiliations
[1] TotalEnergies SE, Pau, France
[2] Univ Lille, CNRS, UMR 9189 CRIStAL, Lille, France
Source
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2021
Keywords
Neural network; ensemble learning; inference system;
DOI
10.1109/BigData52589.2021.9671725
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive. Demand is therefore growing to make them serve a heavy workload of requests with the available computational resources. Unlike recent inference servers and inference frameworks, which focus on prediction with a single DNN, we propose a new software layer that serves ensembles of DNNs flexibly and efficiently. Our inference system includes several technical innovations. First, we propose a novel procedure to find a good allocation matrix between devices (CPUs or GPUs) and DNN instances. It successively runs a worst-fit algorithm to allocate DNNs to device memory and a greedy algorithm to optimize the allocation settings and speed up the ensemble. Second, we design the inference system around multiple processes that run asynchronously (batching, prediction, and the combination rule), with an efficient internal communication scheme to avoid overhead. Experiments show flexibility and efficiency under extreme scenarios: the system succeeds in serving an ensemble of 12 heavy DNNs on 4 GPUs and, at the other extreme, a single multi-threaded DNN on 16 GPUs. It also outperforms the simple baseline of optimizing the batch size of the DNNs, with a speedup of up to 2.7x on the image classification task.
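To make the worst-fit step named in the abstract concrete, here is a minimal Python sketch of a generic worst-fit memory allocation. It is not the authors' procedure, whose details this record does not give. The function names, the model and device names, the memory figures, and the choice to place the largest models first are all assumptions made for illustration. The greedy tuning step that follows worst-fit in the paper is not covered.

```python
from typing import Dict, List


def worst_fit_allocate(model_mem: Dict[str, float],
                       device_mem: Dict[str, float]) -> Dict[str, str]:
    """Place each model on the device that currently has the most free memory."""
    free = dict(device_mem)  # remaining memory per device
    placement: Dict[str, str] = {}
    # Assumption: largest models are placed first (a common bin-packing heuristic).
    for name, need in sorted(model_mem.items(), key=lambda kv: -kv[1]):
        device = max(free, key=free.get)  # worst-fit: pick the emptiest device
        if free[device] < need:
            raise MemoryError(f"{name} needs {need} GB; no device has enough free memory")
        placement[name] = device
        free[device] -= need
    return placement


def allocation_matrix(placement: Dict[str, str],
                      devices: List[str],
                      models: List[str]) -> List[List[int]]:
    """Binary matrix: rows are devices, columns are models, 1 means 'hosted here'."""
    return [[1 if placement[m] == d else 0 for m in models] for d in devices]


# Hypothetical ensemble (memory needs in GB) and two hypothetical 8 GB devices.
models = {"resnet": 4.0, "vgg": 6.0, "densenet": 3.0}
devices = {"gpu0": 8.0, "gpu1": 8.0}

placement = worst_fit_allocate(models, devices)
matrix = allocation_matrix(placement, list(devices), list(models))
print(placement)
print(matrix)
```

Worst-fit spreads models across devices instead of packing them tightly, which leaves memory headroom on each device. That headroom is what a later tuning pass (for example, over batch sizes) would have to work with.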
Pages: 5225-5232
Number of pages: 8