Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems

Cited by: 2
Authors
Al Maruf, Md [1 ]
Azim, Akramul [1 ]
Auluck, Nitin [2 ]
Sahi, Mansi [2 ]
Affiliations
[1] Ontario Tech Univ, 2000 Simcoe St N, Oshawa, ON L1G 0C5, Canada
[2] Indian Inst Technol Ropar, Ropar 140001, Punjab, India
Keywords
Parallel computing; Machine learning; Model parallelism; DNN model partitioning; Embedded systems; Embedded software; Architecture; Edge
DOI
10.1016/j.jpdc.2024.104890
Chinese Library Classification: TP301 [Theory, Methods]
Subject Classification Code: 081202
Abstract
Deep Neural Networks (DNNs) have gained widespread popularity across application domains due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in embedded systems remains challenging, as most embedded applications are designed with single-core processors in mind. This limits DNN adoption in embedded systems because model parallelization and workload partitioning are leveraged inefficiently. Prior solutions attempt to address these challenges using data and model parallelism; however, they fall short of finding optimal DNN model partitions and distributing them efficiently enough to improve performance. This paper proposes a DNN model parallelism framework that accelerates model training by finding the optimal number of model partitions and resource provisions. The proposed framework combines data and model parallelism techniques to optimize the parallel processing of DNNs for embedded applications. In addition, it implements pipeline execution of the partitioned models and integrates a task controller to manage the computing resources. The experimental results for image object detection demonstrate the applicability of the proposed framework in estimating the latest execution time and reducing overall model training time by almost 44.87% compared to the baseline AlexNet convolutional neural network (CNN) model.
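The core idea the abstract describes, splitting a DNN into pipelined stages so that no single stage becomes the training bottleneck, can be illustrated with a small sketch. This is not the authors' algorithm; it is a minimal brute-force illustration, assuming hypothetical per-layer execution-time estimates (in milliseconds) and a fixed number of pipeline stages. It searches all contiguous layer-to-stage assignments and picks the one that minimizes the slowest stage, since in steady-state pipeline execution the throughput is limited by the slowest stage.

```python
from itertools import combinations

def best_pipeline_partition(layer_times, num_stages):
    """Split consecutive layers into `num_stages` pipeline stages so that
    the slowest stage (the pipeline bottleneck) is as fast as possible.

    Returns (bottleneck_time, per_stage_times)."""
    n = len(layer_times)
    best = None
    # Choose num_stages - 1 cut points between layers; layers stay contiguous.
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0,) + cuts + (n,)
        stage_times = [sum(layer_times[a:b]) for a, b in zip(bounds, bounds[1:])]
        bottleneck = max(stage_times)
        if best is None or bottleneck < best[0]:
            best = (bottleneck, stage_times)
    return best

# Hypothetical per-layer time estimates (ms) for a small CNN, 3 stages.
bottleneck, stages = best_pipeline_partition([4, 2, 3, 1, 5], 3)
```

With these made-up costs, the search finds a split whose slowest stage takes 6 ms, whereas a naive equal-count split of the five layers would leave one stage much slower. A real framework like the one in the paper would additionally weigh communication between stages and the available computing resources, which this sketch omits.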
Pages: 14