Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding

Cited by: 2
Authors
Wang, Lipeng [1 ]
Luo, Qiong [1 ]
Yan, Shengen [2 ]
Affiliations
[1] HKUST, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] SenseTime Res, Shenzhen, Peoples R China
Source
2020 IEEE 26TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) | 2020
Keywords
deep learning; image decoding; parallel processing; heterogeneous processing; GPU;
DOI
10.1109/ICPADS51040.2020.00045
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812
Abstract
In computer vision deep learning (DL) tasks, most input image datasets are stored in the JPEG format and must be decoded before DL tasks are performed on them. We observe two problems in current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data is performed sequentially, and this sequential decoding is repeated across DL iterations, which takes significant time; (2) current parallel decoding methods under-utilize the massive number of hardware threads on GPUs. To reduce image decoding time, we introduce a pre-scan mechanism that avoids repeated image scanning in DL tasks. Our pre-scan generates boundary markers for the entropy data so that decoding can be performed in parallel. To cooperate with existing dataset storage and caching systems, we propose two modes of the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure, so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode converts a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method on the pre-scanned dataset. The fine-grained parallelism utilizes the GPU effectively and achieves speedups of around 1.5x over existing GPU-assisted image decoding libraries on real-world DL tasks.
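The abstract describes the pre-scan and parallel-decoding idea only at a high level. The following is a minimal Python sketch of the two-phase structure, under two simplifying assumptions that are not taken from the paper: JPEG restart markers (RST0-RST7) are treated as the segment boundaries, and a hypothetical decode_entropy_segment() stands in for per-segment Huffman decoding. The paper's actual pre-scan generates its own boundary markers and performs fine-grained decoding on the GPU; this CPU-side sketch only illustrates how a one-time sequential scan enables an embarrassingly parallel decode afterwards.

    from concurrent.futures import ThreadPoolExecutor

    def pre_scan(entropy_data: bytes) -> list[int]:
        # One-time sequential scan: record the byte offset immediately after every
        # JPEG restart marker (0xFFD0 .. 0xFFD7). These offsets play the role of the
        # "boundary markers" that let later decoding start at independent positions.
        boundaries = [0]
        i = 0
        while i < len(entropy_data) - 1:
            if entropy_data[i] == 0xFF and 0xD0 <= entropy_data[i + 1] <= 0xD7:
                boundaries.append(i + 2)  # segment begins right after the 2-byte marker
                i += 2
            else:
                i += 1
        return boundaries

    def decode_entropy_segment(segment: bytes):
        # Hypothetical placeholder: real per-segment Huffman/entropy decoding would go here.
        return segment

    def parallel_decode(entropy_data: bytes) -> list:
        # Once the boundaries are known, every segment can be decoded independently.
        # On a GPU, each segment (or a finer-grained unit) would map to a hardware
        # thread; a CPU thread pool is used here only to show the structure.
        boundaries = pre_scan(entropy_data)
        ends = boundaries[1:] + [len(entropy_data)]
        segments = [entropy_data[s:e] for s, e in zip(boundaries, ends)]
        with ThreadPoolExecutor() as pool:
            return list(pool.map(decode_entropy_segment, segments))

In this sketch the pre-scan result (the boundary list) is what would be cached or written back to disk, so the sequential pass is paid once rather than on every DL epoch.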
Pages: 274 - 281
Number of pages: 8