DNNSplit: Latency and Cost-Efficient Split Point Identification for Multi-Tier DNN Partitioning

Cited by: 1
Authors
Kayal, Paridhika [1 ]
Leon-Garcia, Alberto [1 ]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 1A1, Canada
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Costs; Computational modeling; Adaptation models; Partitioning algorithms; Complexity theory; Inference algorithms; Quality of service; Cost-efficient; multi-tier; near-edge; INFERENCE ACCELERATION; CLOUD;
DOI
10.1109/ACCESS.2024.3409057
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification
0812 ;
Abstract
Due to the high computational demands inherent in Deep Neural Network (DNN) executions, multi-tier environments have emerged as preferred platforms for DNN inference tasks. Previous research on partitioning strategies for DNN models typically leveraged all layers of the DNN to identify optimal splits aimed at reducing latency or cost. However, due to their computational complexity, these approaches face scalability issues, particularly with models containing hundreds of layers. The novelty of our work lies in uniquely identifying specific split points within various DNN models that consistently lead to efficient latency or cost partitioning. Under the assumptions that per-unit computing cost decreases in higher tiers and that bandwidth is not free, we show that only these specific split points need to be considered to optimize latency or cost. Importantly, these split points are independent of infrastructure configurations and bandwidth variations. The key contribution of our work is a significant reduction in the computational complexity of DNN partitioning, making our strategy applicable to models with a large number of layers. We introduce DNNSplit, an adaptive strategy that enables dynamic split decisions under varying conditions with minimal complexity. Evaluated across nine DNN models varying in size and architecture, DNNSplit exhibits exceptional effectiveness in optimizing latency and cost. Even for a larger model containing 517 layers, it identifies only 5 points as potential split points, thereby reducing the partitioning complexity by more than 100x. This makes DNNSplit especially advantageous for managing larger models. DNNSplit also demonstrates significant improvements for multi-tier deployments compared to single-tier execution, including up to 15x latency speedup, 20x cost reduction, and 5x throughput enhancement.
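The abstract does not disclose DNNSplit's exact split-point criterion. As background for the layer-partitioning problem it addresses, a minimal two-tier sketch of the exhaustive latency-optimal split search that prior work performs over all layers might look as follows; all function names, per-layer timings, sizes, and bandwidth values here are illustrative assumptions, not taken from the paper:

```python
# Hypothetical two-tier DNN split-point search (the O(n) exhaustive
# baseline that DNNSplit's reduced candidate set improves upon).
# All numbers below are illustrative assumptions, not from the paper.

def best_split(edge_ms, cloud_ms, out_mb, input_mb, bw_mbps):
    """Return (k, latency_ms): run layers [0, k) on the edge device,
    layers [k, n) in the cloud, transferring the intermediate tensor.

    edge_ms[i]  -- latency of layer i on the edge device (ms)
    cloud_ms[i] -- latency of layer i in the cloud (ms)
    out_mb[i]   -- size of layer i's output activation (MB)
    input_mb    -- size of the model input (MB), transferred if k == 0
    bw_mbps     -- uplink bandwidth between tiers (Mbit/s)
    """
    n = len(edge_ms)
    best_k, best_lat = 0, float("inf")
    for k in range(n + 1):  # n + 1 candidate cut positions
        xfer_mb = input_mb if k == 0 else out_mb[k - 1]
        lat = (sum(edge_ms[:k])                     # edge compute
               + xfer_mb * 8.0 / bw_mbps * 1000.0   # transfer, MB -> ms
               + sum(cloud_ms[k:]))                 # cloud compute
        if lat < best_lat:
            best_k, best_lat = k, lat
    return best_k, best_lat


# Illustrative three-layer model: large early activations make
# offloading mid-network expensive, so later splits win here.
k, lat = best_split(edge_ms=[10.0, 20.0, 30.0],
                    cloud_ms=[1.0, 2.0, 3.0],
                    out_mb=[5.0, 1.0, 0.1],
                    input_mb=2.0,
                    bw_mbps=100.0)
```

DNNSplit's contribution, per the abstract, is that only a handful of cut positions (5 out of 517 layers in their largest model) ever need to enter this candidate loop, independent of the bandwidth and tier-cost parameters.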
Pages: 80047-80061
Page count: 15