Latency-Aware Unified Dynamic Networks for Efficient Image Recognition

Cited by: 1
Authors
Han, Yizeng [1 ]
Liu, Zeyu [2 ]
Yuan, Zhihang [3 ]
Pu, Yifan [1 ]
Wang, Chaofei [1 ]
Song, Shiji [1 ]
Huang, Gao [4 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Houmo AI, Beijing 100088, Peoples R China
[4] Tsinghua Univ, Beijing Acad Artificial Intelligence, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Dynamic networks; efficient inference; convolutional neural networks; vision transformers;
DOI
10.1109/TPAMI.2024.3393530
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Dynamic networks have become a pivotal area of study in deep learning due to their ability to selectively activate computing units (such as layers or channels) or dynamically allocate computation to information-rich regions. This capability adapts the computation to each input and significantly curtails unnecessary operations. Despite these advantages, the practical efficiency of dynamic models often falls short of their theoretical computational savings. This discrepancy arises from three primary challenges: 1) the lack of a unified framework across different dynamic inference paradigms, owing to a fragmented research landscape; 2) an excessive focus on algorithm design at the expense of scheduling strategies, which are essential for optimizing resource utilization on hardware; and 3) the difficulty of latency evaluation, since most current libraries cater to static operators. To tackle these issues, we introduce Latency-Aware Unified Dynamic Networks (LAUDNet), a general framework that integrates three fundamental dynamic paradigms, namely spatially-adaptive computation, layer skipping, and channel skipping, into a single unified formulation. LAUDNet not only refines algorithmic design but also enhances scheduling optimization with the aid of a latency predictor that efficiently and accurately estimates the inference latency of dynamic operators on specific hardware setups. Our empirical assessments across multiple vision tasks, including image classification, object detection, and instance segmentation, confirm that LAUDNet significantly bridges the gap between theoretical and practical efficiency. For instance, LAUDNet cuts the practical latency of its static counterpart, ResNet-101, by over 50% on hardware platforms such as V100, RTX 3090, and TX2 GPUs. In addition, LAUDNet achieves a better accuracy-efficiency trade-off than competing methods.
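The unified formulation described in the abstract can be pictured as a residual block whose execution is controlled by three masks: a scalar gate for layer skipping, a per-channel mask for channel skipping, and a per-pixel mask for spatially-adaptive computation. The following PyTorch sketch is purely illustrative and is not the authors' released LAUDNet code; the module, gate, and threshold names are hypothetical, and the channel and spatial masks are applied multiplicatively for clarity rather than by actually skipping work.

import torch
import torch.nn as nn

class UnifiedDynamicBlock(nn.Module):
    """Residual block gated by layer, channel, and spatial masks (illustrative only)."""
    def __init__(self, channels: int, threshold: float = 0.5):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Lightweight gating heads producing the three kinds of decisions.
        self.layer_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 1, 1))
        self.channel_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1))
        self.spatial_gate = nn.Conv2d(channels, 1, 1)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Layer skipping: if the scalar gate is inactive, bypass the block entirely.
        if torch.sigmoid(self.layer_gate(x)).mean() < self.threshold:
            return x
        # Channel skipping: binary mask over the block's output channels.
        channel_mask = (torch.sigmoid(self.channel_gate(x)) > self.threshold).float()
        # Spatially-adaptive computation: binary mask over spatial positions.
        spatial_mask = (torch.sigmoid(self.spatial_gate(x)) > self.threshold).float()
        out = self.bn(self.conv(x)) * channel_mask * spatial_mask
        return x + out

if __name__ == "__main__":
    block = UnifiedDynamicBlock(channels=64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])

Note that multiplying a dense convolution by masks leaves its theoretical FLOP count unchanged; realizing wall-clock speedups requires computing only the active layers, channels, and pixels, which is precisely the scheduling and operator-latency problem the paper targets with its latency predictor.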
Pages: 7760-7774
Number of pages: 15
Related papers
50 records in total
  • [21] Latency-Aware Routing with Bandwidth Assignment for Software Defined Networks
    Zhang, Qiongyu
    Zhu, Liehuang
    Shen, Meng
    Wang, Mingzhong
    Li, Fan
    2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2015,
  • [22] LAWIN: a Latency-AWare InterNet Architecture for Latency Support on Best-Effort Networks
    Kobayashi, Katsushi
    2015 IEEE 16TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING (HPSR), 2015, : 180 - 187
  • [23] TriBHMM: An Energy-Efficient and Latency-Aware Hybrid Main Memory
    Zhang, Hong
    Wang, Xiaojun
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1451 - 1456
  • [24] Energy-Efficient and Latency-Aware Data Routing in Small-World Internet of Drone Networks
    Yeduri, Sreenivasa Reddy
    Jeeru, Sindhusha
    Pandey, Om Jee
    Cenkeramaddi, Linga Reddy
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, 21 (06): 6555 - 6565
  • [25] An Energy Efficient Sensor Network Processor with Latency-Aware Adaptive Compression
    Liu, Yongpan
    Li, Shuangchen
    Wang, Jue
    Ying, Beihua
    Yang, Huazhong
    IEICE TRANSACTIONS ON ELECTRONICS, 2011, E94C (07): 1220 - 1228
  • [26] Latency-aware flow allocation in 5G NGFI networks
    Klinkowski, Miroslaw
    Mrozinski, Damian
    2020 22ND INTERNATIONAL CONFERENCE ON TRANSPARENT OPTICAL NETWORKS (ICTON 2020), 2020,
  • [27] Latency-Aware Dynamic Server and Cooling Capacity Provisioner for Data Centers
    Desu, Anuroop
    Puvvadi, Udaya
    Stachecki, Tyler
    Vishwakarma, Sagar
    Khalili, Sadegh
    Ghose, Kanad
    Sammakia, Bahgat G.
    PROCEEDINGS OF THE 2021 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '21), 2021, : 335 - 349
  • [28] LAC: Introducing Latency-Aware Caching in Information-Centric Networks
    Carofiglio, Giovanna
    Mekinda, Leonce
    Muscariello, Luca
    40TH ANNUAL IEEE CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN 2015), 2015, : 422 - 425
  • [29] Latency-Aware Path Planning for Disconnected Sensor Networks With Mobile Sinks
    Liu, Xuxun
    Qiu, Tie
    Zhou, Xiaobo
    Wang, Tian
    Yang, Lei
    Chang, Victor
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (01): 350 - 361
  • [30] Performance Analysis of Latency-Aware Data Management in Industrial IoT Networks
    Raptis, Theofanis P.
    Passarella, Andrea
    Conti, Marco
    SENSORS, 2018, 18 (08)