Efficient Video Captioning on Heterogeneous System Architectures

Cited by: 3
Authors
Huang, Horng-Ruey [1]
Hong, Ding-Yong [1]
Wu, Jan-Jan [1]
Liu, Pangfeng [2]
Hsu, Wei-Chung [2]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
Source
2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2021
Keywords
Video captioning; heterogeneous system architectures; model scheduling; dynamic programming; pipelining;
DOI
10.1109/IPDPS49936.2021.00112
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Video captioning is a core technology driving the development of many important multidisciplinary applications, such as AI-assisted medical diagnosis, storytelling through videos, video question answering, and lip-reading, to name just a few. Video captioning employs a hybrid CNN+RNN neural network model to translate video scenes into natural language descriptions. For deep learning inference, a typical approach is to run both the CNN and the RNN on a GPU. Such a GPU-only approach often suffers from long inference time because it underutilizes the computing power offered by the CPU+GPU heterogeneous system architecture, which is common in modern computers. This work is an early effort to tackle the performance issue of deep learning inference with a hybrid CNN+RNN model on a heterogeneous system with a CPU and a GPU. This is a challenging task because (1) the CNN and the RNN exhibit very different computing behaviors, which raises the question of how to split the two models into computing tasks and properly assign the tasks to the CPU and the GPU to minimize the inference time for a video frame, and (2) data dependencies exist between the CNN and the RNN within a video frame, as well as between the adjacent RNNs across two video frames. These data dependencies prohibit full parallelization of the hybrid model. To solve these two problems, we propose two optimizations: a fine-grained scheduling scheme for mapping computation to devices within a video frame, and a pipeline scheduling scheme to exploit maximum parallelism between the execution of the video frames. To facilitate our optimizations, we also develop an accurate regression-based cost model to predict the computation time of CNN/RNN operations and the communication time for moving data between the CPU and the GPU. Experimental results show that our optimizations improve the performance of video captioning by up to 3.24x on the CPU+GPU system, compared with GPU-only execution.
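The abstract and keywords mention device scheduling and dynamic programming but give no algorithmic detail. The following is only an illustrative sketch of the general idea, not the authors' method: it assumes the per-frame work is a simple chain of operations, that each operation has a known cost on each device, and that moving an intermediate result between devices has a known cost. The function name `schedule_chain`, the cost values, and the chain structure are all hypothetical.

```python
from typing import Dict, List, Tuple

DEVICES = ("cpu", "gpu")


def schedule_chain(
    compute_cost: List[Dict[str, float]],
    transfer_cost: List[float],
) -> Tuple[float, List[str]]:
    """Assign each operation in a chain to a device, minimizing total time.

    compute_cost[i][d] is the (assumed) time of operation i on device d.
    transfer_cost[i] is the (assumed) time to move the output of operation i
    to the other device; it is paid only when operations i and i+1 run on
    different devices. Costs for loading the input or returning the final
    output are ignored in this sketch.
    """
    # best[d]: minimum time to finish the operations so far, ending on device d.
    best = {d: compute_cost[0][d] for d in DEVICES}
    back: List[Dict[str, str]] = []  # back[i-1][d]: best predecessor device

    for i in range(1, len(compute_cost)):
        new_best: Dict[str, float] = {}
        choice: Dict[str, str] = {}
        for d in DEVICES:
            candidates = []
            for p in DEVICES:
                move = 0.0 if p == d else transfer_cost[i - 1]
                candidates.append((best[p] + move, p))
            prev_time, prev_dev = min(candidates)
            new_best[d] = prev_time + compute_cost[i][d]
            choice[d] = prev_dev
        best = new_best
        back.append(choice)

    # Recover the assignment by walking the back-pointers from the best end state.
    last = min(DEVICES, key=lambda d: best[d])
    total = best[last]
    assignment = [last]
    for choice in reversed(back):
        assignment.append(choice[assignment[-1]])
    assignment.reverse()
    return total, assignment


# Made-up costs for three chained operations (arbitrary time units).
example_compute = [
    {"cpu": 5.0, "gpu": 1.0},
    {"cpu": 1.0, "gpu": 4.0},
    {"cpu": 2.0, "gpu": 2.0},
]
example_transfer = [1.0, 1.0]
total_time, devices = schedule_chain(example_compute, example_transfer)
print(total_time, devices)
```

With these made-up numbers, running everything on the GPU would take 7.0 units, while the mixed assignment found by the sketch takes less. In the paper, the costs come from a regression-based cost model; here they are simply hard-coded.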
Pages: 1035 - 1045
Page count: 11
Related papers
50 in total
  • [1] Accelerating Video Captioning on Heterogeneous System Architectures
    Huang, Horng-Ruey
    Hong, Ding-Yong
    Wu, Jan-Jan
    Chen, Kung-Fu
    Liu, Pangfeng
    Hsu, Wei-Chung
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2022, 19 (03)
  • [2] Image and Video Captioning with Augmented Neural Architectures
    Shetty, Rakshith
    Tavakoli, Hamed R.
    Laaksonen, Jorma
    IEEE MULTIMEDIA, 2018, 25 (02) : 34 - 46
  • [3] Towards Human-Interactive Controllable Video Captioning with Efficient Modeling
    Heo, Yoonseok
    Kim, Taehoon
    Kim, Seunghwan
    Seo, Jungyun
    Kim, Juae
    MATHEMATICS, 2024, 12 (13)
  • [4] Video captioning – a survey
    Vaishnavi J.
    Narmatha V.
    Multimedia Tools and Applications, 2025, 84 (2) : 947 - 978
  • [5] Video Captioning based on Image Captioning as Subsidiary Content
    Vaishnavi, J.
    Narmatha, V
    2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022,
  • [6] Rethink video retrieval representation for video captioning
    Tian, Mingkai
    Li, Guorong
    Qi, Yuankai
    Wang, Shuhui
    Sheng, Quan Z.
    Huang, Qingming
    PATTERN RECOGNITION, 2024, 156
  • [7] A Review Of Video Captioning Methods
    Mahajan, Dewarthi
    Bhosale, Sakshi
    Nighot, Yash
    Tayal, Madhuri
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2021, 12 (05) : 708 - 715
  • [8] Sequence in sequence for video captioning
    Wang, Huiyun
    Gao, Chongyang
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2020, 130 (130) : 327 - 334
  • [9] Multirate Multimodal Video Captioning
    Yang, Ziwei
    Xu, Youjiang
    Wang, Huiyun
    Wang, Bo
    Han, Yahong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1877 - 1882
  • [10] Video Captioning by Adversarial LSTM
    Yang, Yang
    Zhou, Jie
    Ai, Jiangbo
    Bin, Yi
    Hanjalic, Alan
    Shen, Heng Tao
    Ji, Yanli
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (11) : 5600 - 5611