Efficient Distributed Parallel Inference Strategies via Block-based DNN Structure in Edge-to-IoT Continuum

Cited by: 0
Authors
Choi, Inhun [1 ]
Akhter, Sharmen [1 ]
Jeong, Hong-Ju [2 ]
Huh, Eui-Nam [1 ]
Affiliations
[1] Kyung Hee Univ, Dept Comp Sci & Engn, Gyeonggi Do 17104, South Korea
[2] Kyung Hee Univ, Dept Artificial Intelligence, Gyeonggi Do 17104, South Korea
Keywords
DNN inference; block-based DNN network; trade-off between latency and accuracy; acceleration
DOI
10.1145/3654522.3654598
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recently, AI and deep neural networks have found extensive application in mobile devices, drones, carts, and more, creating a growing need to process large-scale data and provide DNN inference services with minimal latency. However, IoT devices, with their limited computing capabilities, are not well suited for AI inference, and the diverse requirements of different services call for inference that can adapt to these variations. To address these challenges, many previous studies have explored collaboration between edge servers and cloud servers by partitioning DNN models. Such methods, however, struggle to find optimal partitioning points and are heavily influenced by network bandwidth, since intermediate computation results must be transmitted between devices. In this paper, we propose an adaptive block-based DNN inference framework. A large DNN model is decomposed into block-level networks, each trained with knowledge distillation so that inference can be performed through an individual block network alone. At runtime, block-level inference computations are offloaded dynamically according to the computing capabilities of the edge cluster, and their results are combined to produce the final prediction. Even when multiple devices are used, the method is not affected by network bandwidth, because only the input images need to be transmitted. Experimental results demonstrate that our approach consistently reduces inference latency as the number of devices increases. Moreover, by controlling the trade-off between latency and accuracy, we can provide inference services tailored to various latency requirements.
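The abstract outlines three mechanisms: block-level networks distilled from a large teacher model, capability-aware offloading of blocks across edge devices, and a latency/accuracy trade-off governed by how many blocks participate. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; the names BlockNet, distill_block, dispatch_blocks, parallel_inference, and the uniform per-block cost model are assumptions introduced here for illustration.

```python
# Minimal sketch (hypothetical, not the paper's code) of block-based distillation
# and capability-aware parallel inference across edge devices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockNet(nn.Module):
    """One independently usable block: a small feature extractor plus its own head."""
    def __init__(self, in_ch: int, out_ch: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(out_ch, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def distill_block(block, teacher, loader, epochs=1, T=4.0, lr=1e-3):
    """Train a block to match the frozen teacher's softened outputs (knowledge distillation)."""
    opt = torch.optim.Adam(block.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = block(x)
            loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                            F.softmax(t_logits / T, dim=1),
                            reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()

def dispatch_blocks(blocks, capabilities, latency_budget):
    """Select the blocks whose host devices can finish within the latency budget.
    Blocks run in parallel on separate devices, so overall latency is governed by
    the slowest selected device; using more blocks generally raises accuracy."""
    chosen = []
    for blk, cap in zip(blocks, capabilities):
        est_cost = 1.0 / cap  # hypothetical per-block latency estimate
        if est_cost <= latency_budget:
            chosen.append(blk)
    return chosen

def parallel_inference(x, chosen_blocks):
    """Each device receives only the input image; block logits are averaged."""
    with torch.no_grad():
        return torch.stack([blk(x) for blk in chosen_blocks]).mean(0)
```

Under these assumptions, accuracy improves as more block outputs are averaged, while latency stays bounded because the blocks execute in parallel and each device receives only the input image rather than intermediate feature maps.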
Pages: 505-511
Number of pages: 7