DNNSplit: Latency and Cost-Efficient Split Point Identification for Multi-Tier DNN Partitioning

Cited by: 1
Authors
Kayal, Paridhika [1 ]
Leon-Garcia, Alberto [1 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 1A1, Canada
Keywords
Costs; Computational modeling; Adaptation models; Partitioning algorithms; Complexity theory; Inference algorithms; Quality of service; Cost-efficient; multi-tier; near-edge; INFERENCE ACCELERATION; CLOUD;
DOI
10.1109/ACCESS.2024.3409057
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Due to the high computational demands inherent in Deep Neural Network (DNN) execution, multi-tier environments have emerged as preferred platforms for DNN inference. Previous research on partitioning strategies for DNN models typically leveraged all layers of the DNN to identify optimal splits that reduce latency or cost. Due to their computational complexity, however, these approaches face scalability issues, particularly with models containing hundreds of layers. The novelty of our work lies in identifying specific split points within various DNN models that consistently yield latency- or cost-efficient partitions. Under the assumptions that per-unit computing cost decreases in higher tiers and that bandwidth is not free, we show that only these specific split points need to be considered to optimize latency or cost. Importantly, these split points are independent of infrastructure configuration and bandwidth variations. The key contribution of our work is a significant reduction in the computational complexity of DNN partitioning, making our strategy applicable to models with a large number of layers. We introduce DNNSplit, an adaptive strategy that enables dynamic split decisions under varying conditions with minimal complexity. Evaluated across nine DNN models of varying size and architecture, DNNSplit is highly effective at optimizing latency and cost. Even for a large model containing 517 layers, it identifies only 5 potential split points, reducing partitioning complexity by more than 100x. This makes DNNSplit especially advantageous for managing larger models. DNNSplit also demonstrates significant improvements for multi-tier deployments compared to single-tier execution, including up to 15x latency speedup, 20x cost reduction, and 5x throughput enhancement.
Pages: 80047-80061
Page count: 15
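
To make the partitioning problem concrete, below is a minimal Python sketch. It is not the paper's DNNSplit algorithm: it shows the exhaustive per-layer baseline whose candidate set DNNSplit prunes, for a linear DNN split across two tiers (edge and cloud). All names, parameters, and the simple cost model are illustrative assumptions.

# Minimal sketch of two-tier DNN split-point selection. Not the paper's
# DNNSplit algorithm: the exhaustive search below is the baseline whose
# candidate set DNNSplit reduces. All names and the cost model are assumptions.

def best_split(t_edge, t_cloud, out_bytes, input_bytes, bw_bps,
               cost_edge_per_s, cost_cloud_per_s, candidates=None,
               objective="latency"):
    """Return (s, value): layers [0, s) run on the edge, [s, n) in the cloud.

    t_edge[i], t_cloud[i] -- runtime of layer i on each tier (seconds)
    out_bytes[i]          -- size of layer i's output tensor (bytes)
    input_bytes           -- size of the raw model input (bytes)
    bw_bps                -- edge-to-cloud bandwidth (bytes/second)
    """
    n = len(t_edge)
    if candidates is None:
        candidates = range(n + 1)  # exhaustive baseline: every layer boundary
    best_s, best_val = None, float("inf")
    for s in candidates:
        # Data crossing the tier boundary: the raw input if nothing runs on
        # the edge, otherwise the output of the last edge-resident layer.
        xfer_s = (input_bytes if s == 0 else out_bytes[s - 1]) / bw_bps
        if objective == "latency":
            val = sum(t_edge[:s]) + xfer_s + sum(t_cloud[s:])
        else:  # "cost": pay per second of compute on each tier (transfer omitted)
            val = (sum(t_edge[:s]) * cost_edge_per_s
                   + sum(t_cloud[s:]) * cost_cloud_per_s)
        if val < best_val:
            best_s, best_val = s, val
    return best_s, best_val

# Example: a 4-layer model; the cloud is 10x faster but costlier per second.
t_edge  = [0.010, 0.020, 0.200, 0.300]   # seconds per layer on the edge
t_cloud = [0.001, 0.002, 0.020, 0.030]   # seconds per layer in the cloud
out_bytes = [2_000_000, 500_000, 100_000, 4_000]
s, latency = best_split(t_edge, t_cloud, out_bytes, input_bytes=600_000,
                        bw_bps=1_000_000, cost_edge_per_s=0.1,
                        cost_cloud_per_s=1.0)
print(f"split after layer boundary {s}, end-to-end latency {latency:.3f}s")

With n layers this baseline evaluates n + 1 candidates per bandwidth or pricing change; a pruned set like the one the abstract describes (e.g., 5 points for a 517-layer model) would plug into the candidates argument unchanged, which is where the claimed >100x complexity reduction comes from.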