TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards

Cited by: 40
Authors
Jeong, Eunjin [1 ]
Kim, Jangryul [1 ]
Ha, Soonhoi [1 ]
Affiliations
[1] Seoul Natl Univ, 1 Gwanak Ro, Seoul 08826, South Korea
Keywords
Deep learning; optimization; framework; acceleration
DOI
10.1145/3508391
CLC number
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
As deep learning inference applications proliferate on embedded devices, such devices increasingly pair a multi-core CPU and a GPU with neural processing units (NPUs); the NVIDIA Jetson AGX Xavier is one example. For fast and efficient development of deep learning applications, NVIDIA provides TensorRT, an SDK for high-performance inference comprising an optimizer and a runtime that deliver low latency and high throughput. Like most deep learning frameworks, however, TensorRT assumes that inference runs on a single processing element, GPU or NPU, not both. In this article, we present a TensorRT-based framework that supports various optimization parameters, including multi-threading, pipelining, buffer assignment, and network duplication, to accelerate a deep learning application on an NVIDIA Jetson embedded platform with heterogeneous processors. Because the design space of allocating layers to the diverse processing elements and tuning the other parameters is huge, we devise a parameter optimization methodology that consists of a heuristic for balancing pipeline stages among the heterogeneous processors and a fine-tuning process for the remaining parameters. On nine real-life benchmarks, we achieve a 101% to 680% performance improvement and up to 55% energy reduction over the baseline inference that uses the GPU only.
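The pipelined execution across heterogeneous processors that the abstract describes can be sketched in a framework-agnostic way. In this minimal illustration (not the paper's actual API), Python threads stand in for per-processor workers such as the GPU and the NPU/DLA, and bounded queues model the inter-stage buffers whose sizes the framework's buffer-assignment parameter would control:

```python
import queue
import threading

def run_pipeline(frames, stages, buffer_size=2):
    """Run `stages` (a list of callables) as a software pipeline over `frames`.

    Each stage runs in its own thread, mimicking one processing element
    (e.g., GPU or NPU); bounded FIFO queues between stages play the role
    of the assigned inter-stage buffers, so a fast stage blocks instead
    of flooding a slower one.
    """
    # One queue per pipeline edge: input -> stage 1 -> ... -> output.
    qs = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    SENTINEL = object()  # end-of-stream marker propagated through the pipe

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # pass shutdown signal downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
        for i, s in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed inputs; the bounded queue applies back-pressure automatically.
    for f in frames:
        qs[0].put(f)
    qs[0].put(SENTINEL)

    # Drain ordered results from the last stage.
    results = []
    while True:
        item = qs[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

With two stages of similar cost, consecutive frames overlap in time, which is the source of the throughput gain; the balancing heuristic in the paper aims to make the per-stage costs as even as possible across the heterogeneous processors.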
Pages: 26