Latency-Aware Unified Dynamic Networks for Efficient Image Recognition

Cited by: 1
Authors
Han, Yizeng [1 ]
Liu, Zeyu [2 ]
Yuan, Zhihang [3 ]
Pu, Yifan [1 ]
Wang, Chaofei [1 ]
Song, Shiji [1 ]
Huang, Gao [4 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Houmo AI, Beijing 100088, Peoples R China
[4] Tsinghua Univ, Beijing Acad Artificial Intelligence, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Dynamic networks; efficient inference; convolutional neural networks; vision transformers;
DOI
10.1109/TPAMI.2024.3393530
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dynamic networks have become a pivotal area of study in deep learning due to their ability to selectively activate computing units (such as layers or channels) or dynamically allocate computation to information-rich regions. This capability significantly curtails unnecessary computation by adapting to each input. Despite these advantages, the practical efficiency of dynamic models often falls short of their theoretical computational savings. This discrepancy arises from three primary challenges: 1) the lack of a unified framework across different dynamic inference paradigms, owing to a fragmented research landscape; 2) an excessive focus on algorithm design at the expense of scheduling strategies, which are essential for optimizing resource utilization on hardware; and 3) the complexity of latency evaluation, since most current libraries cater to static operators. To tackle these issues, we introduce Latency-Aware Unified Dynamic Networks (LAUDNet), a general framework that integrates three fundamental dynamic paradigms, namely spatially-adaptive computation, layer skipping, and channel skipping, into a single unified formulation. LAUDNet not only refines algorithmic design but also enhances scheduling optimization with the aid of a latency predictor that efficiently and accurately estimates the inference latency of dynamic operators on specific hardware setups. Our empirical evaluations across multiple vision tasks (image classification, object detection, and instance segmentation) confirm that LAUDNet significantly narrows the gap between theoretical and practical efficiency. For instance, LAUDNet reduces practical latency by over 50% relative to its static counterpart, ResNet-101, on hardware platforms such as V100, RTX 3090, and TX2 GPUs. Additionally, LAUDNet achieves a better accuracy-efficiency trade-off than competing methods.
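The abstract names three dynamic-inference paradigms that LAUDNet unifies: spatially-adaptive computation, layer skipping, and channel skipping. The sketch below is a minimal PyTorch illustration of how such gates can coexist in one residual block; it is not the authors' LAUDNet implementation, and the module layout, the gating heads, and the 0.5 decision thresholds are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the LAUDNet code. Gating heads and the 0.5
# thresholds are assumptions; a real system would skip the masked FLOPs
# rather than multiply them by zero.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicBlock(nn.Module):
    """Residual block whose computation is gated at three granularities."""

    def __init__(self, channels: int, mask_stride: int = 4):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.layer_gate = nn.Linear(channels, 1)          # one logit per sample
        self.channel_gate = nn.Linear(channels, channels) # one logit per channel
        self.spatial_gate = nn.Conv2d(channels, 1, 1)     # coarse patch logits
        self.mask_stride = mask_stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = F.adaptive_avg_pool2d(x, 1).flatten(1)  # (N, C) global context

        # 1) Layer skipping: decide per sample whether to run the block at all.
        run_layer = torch.sigmoid(self.layer_gate(ctx)) > 0.5  # (N, 1)
        if not run_layer.any():
            return x

        out = F.relu(self.bn1(self.conv1(x)))

        # 2) Spatially-adaptive computation: a coarse mask marks which
        #    patches are informative; uninformative ones are zeroed here.
        coarse = F.adaptive_avg_pool2d(out, (out.shape[-2] // self.mask_stride,
                                             out.shape[-1] // self.mask_stride))
        spatial_mask = (torch.sigmoid(self.spatial_gate(coarse)) > 0.5).float()
        spatial_mask = F.interpolate(spatial_mask, size=out.shape[-2:],
                                     mode="nearest")
        out = out * spatial_mask

        # 3) Channel skipping: zero out output channels whose gate is off.
        channel_mask = (torch.sigmoid(self.channel_gate(ctx)) > 0.5).float()
        out = self.bn2(self.conv2(out)) * channel_mask[:, :, None, None]

        # Samples whose layer gate is off keep only the identity path.
        keep = run_layer.float()[:, :, None, None]
        return x + keep * out


if __name__ == "__main__":
    block = DynamicBlock(channels=64)
    y = block(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

In this toy version the masks merely zero out activations, so no latency is actually saved; turning such decisions into real speedups requires scheduling the gated operators on hardware, which is where the latency predictor described in the abstract would come in.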
Pages: 7760-7774
Number of pages: 15
Related Papers
50 records in total
  • [1] Latency-aware Spatial-wise Dynamic Networks
    Han, Yizeng
    Yuan, Zhihang
    Pu, Yifan
    Xue, Chenhao
    Song, Shiji
    Sun, Guangyu
    Huang, Gao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Reprovisioning for latency-aware dynamic service chaining in metro networks
    Askari, Leila
    Musumeci, Francesco
    Tornatore, Massimo
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2020, 12 (11) : 355 - 366
  • [3] Latency-aware Traffic Grooming for Dynamic Service Chaining in Metro Networks
    Askari, Leila
    Musumeci, Francesco
    Tornatore, Massimo
    ICC 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2019,
  • [4] Toward Latency-Aware Dynamic Middlebox Scheduling
    Duan, Pengfei
    Li, Qing
    Jiang, Yong
    Xia, Shu-Tao
    24TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS ICCCN 2015, 2015,
  • [5] Latency-aware reinforced routing for opportunistic networks
    Sharma, Deepak Kumar
    Gupta, Sarthak
    Malik, Shubham
    Kumar, Rohit
    IET COMMUNICATIONS, 2020, 14 (17) : 2981 - 2989
  • [6] Energy-Efficient and Latency-Aware Message Replica Transmission in IoT Networks
    Rathi, Sonu
    Borkotoky, Siddhartha S.
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (04) : 6573 - 6581
  • [7] Optimization of latency-aware flow allocation in NGFI networks
    Klinkowski, Miroslaw
    COMPUTER COMMUNICATIONS, 2020, 161 : 344 - 359
  • [8] Latency-Aware Offloading for Mobile Edge Computing Networks
    Feng, Wei
    Liu, Hao
    Yao, Yingbiao
    Cao, Diqiu
    Zhao, Mingxiong
    IEEE COMMUNICATIONS LETTERS, 2021, 25 (08) : 2673 - 2677
  • [9] Latency-aware Traffic Provisioning for Content Delivery Networks
    Hei, Jinghao
    Than, Huiyou
    Zhang, Pengfei
    Tan, Haisheng
    2022 8TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS, BIGCOM, 2022, : 11 - 18
  • [10] Latency-Aware Offloading in Integrated Satellite Terrestrial Networks
    Abderrahim, Wiem
    Amin, Osama
    Alouini, Mohamed-Slim
    Shihada, Basem
    IEEE OPEN JOURNAL OF THE COMMUNICATIONS SOCIETY, 2020, 1 : 490 - 500