FSS: algorithm and neural network accelerator for style transfer

Times cited: 0
Authors
Ling, Yi [1]
Huang, Yujie [1,2]
Cai, Yujie [1,2]
Li, Zhaojie [1,2]
Wang, Mingyu [1]
Li, Wenhong [1]
Zeng, Xiaoyang [1]
Affiliations
[1] Fudan University, State Key Laboratory of ASIC & System, Shanghai 201203, People's Republic of China
[2] Shanghai ExploreX Technology Co., Ltd., Shanghai 200120, People's Republic of China
Funding
National Natural Science Foundation of China
关键词
neural network accelerator; style transfer; neural network; deep learning
DOI
10.1007/s11432-022-3676-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Neural networks (NNs), owing to their impressive performance, have gradually come to dominate multimedia processing. Resource-constrained, energy-sensitive mobile devices therefore need efficient NN accelerators. Style transfer is an important multimedia application. However, existing arbitrary style transfer networks are complex and poorly supported by current NN accelerators, which limits their use on mobile devices, and their stylization quality still needs improvement. We therefore design the FastStyle system (FSS), which comprises a novel algorithm and an NN accelerator for style transfer. In FSS, we first propose FastStyle, a novel arbitrary style transfer algorithm that combines a lightweight network, which delivers high quality at low computational complexity, with a prior mechanism that avoids retraining when the style changes. We then redesign an NN accelerator for FastStyle by making two improvements to the baseline NVIDIA Deep Learning Accelerator (NVDLA) architecture. First, the data (dat) and weight (wt) finite-state machines (FSMs) are redesigned to be flexible, so that the original data path can be programmed in software to perform other operations, including the GRAM operation. Second, statistics and judgment logic exploit the continuity of the video stream to remove the data dependency in instance normalization, improving accelerator performance by 18.6%. Experimental results demonstrate that FastStyle achieves higher quality at a lower computational cost, making it more suitable for mobile devices. The proposed accelerator is implemented on a Xilinx VCU118 FPGA with a 180 MHz clock; it stylizes 512×512-pixel video at 20 FPS, and its measured performance reaches up to 306.07 GOPS. An ASIC implementation in TSMC 28 nm technology achieves about 22 FPS on 720p video.
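The abstract gives no implementation details for the two operations it names, so the following is only a rough software illustration under stated assumptions, not the FSS design. `gram_matrix` computes the channel-correlation (Gram) matrix commonly used as a style statistic in Gatys et al.-style transfer, normalized here by H×W (one common convention, assumed). `StreamingInstanceNorm` shows one plausible way video continuity could remove the instance-normalization dependency: each frame is normalized with the mean and variance gathered on the previous frame. All function and class names, and the `threshold`-based scene-change check standing in for the "judgment logic", are hypothetical.

```python
import math
from typing import List, Optional, Tuple

FeatureMap = List[List[float]]  # C channels, each flattened to H*W values
Stats = List[Tuple[float, float]]  # per-channel (mean, variance)


def gram_matrix(features: FeatureMap) -> List[List[float]]:
    """G[i][j] = (1 / HW) * sum_k F[i][k] * F[j][k], i.e. F @ F.T scaled by 1/HW."""
    hw = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / hw for fj in features]
            for fi in features]


def channel_stats(channel: List[float]) -> Tuple[float, float]:
    """Mean and (biased) variance of one channel over all H*W positions."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, var


def instance_norm(features: FeatureMap, stats: Stats, eps: float = 1e-5) -> FeatureMap:
    """Normalize each channel with the supplied (mean, variance) pair."""
    return [[(v - m) / math.sqrt(var + eps) for v in ch]
            for ch, (m, var) in zip(features, stats)]


class StreamingInstanceNorm:
    """Hypothetical sketch: normalize frame t with statistics gathered on frame t-1."""

    def __init__(self, eps: float = 1e-5, threshold: float = 0.1) -> None:
        self.eps = eps
        self.threshold = threshold  # hypothetical scene-change tolerance
        self.prev: Optional[Stats] = None

    def _changed(self, old: Stats, new: Stats) -> bool:
        # Hypothetical "judgment" check: flag a large shift in any channel's mean or variance.
        for (m0, v0), (m1, v1) in zip(old, new):
            if abs(m1 - m0) > self.threshold * math.sqrt(v0 + self.eps):
                return True
            if abs(v1 - v0) > self.threshold * (v0 + self.eps):
                return True
        return False

    def __call__(self, features: FeatureMap) -> FeatureMap:
        # In hardware these statistics would accumulate while the frame streams
        # through; here they are computed up front for simplicity.
        current = [channel_stats(ch) for ch in features]
        stats = self.prev
        if stats is None or self._changed(stats, current):
            stats = current  # first frame or scene change: use exact statistics
        self.prev = current  # saved for the next frame
        return instance_norm(features, stats, self.eps)
```

Calling a single `StreamingInstanceNorm` instance on consecutive frames reuses the previous frame's statistics whenever they are close to the current ones. The point of the design is that normalization can begin as soon as a frame's first pixels arrive instead of waiting for a full pass to compute its mean and variance; the fallback branch shows the cost of a wrong guess, since it needs the current frame's complete statistics.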
Pages: 14
References
48 entries in total
  • [1] Abadi M, 2016, arXiv, DOI 10.48550/ARXIV.1603.04467
  • [2] Real Image Denoising with Feature Attention
    Anwar, Saeed
    Barnes, Nick
    [C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 3155-3164
  • [3] DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
    Chen, Tianshi
    Du, Zidong
    Sun, Ninghui
    Wang, Jia
    Wu, Chengyong
    Chen, Yunji
    Temam, Olivier
    [J]. ACM SIGPLAN Notices, 2014, 49(4): 269-283
  • [4] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
    Chen, Yu-Hsin
    Krishna, Tushar
    Emer, Joel S.
    Sze, Vivienne
    [J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127-138
  • [5] Structure-Preserving Neural Style Transfer
    Cheng, Ming-Ming
    Liu, Xiao-Chang
    Wang, Jie
    Lu, Shao-Ping
    Lai, Yu-Kun
    Rosin, Paul L.
    [J]. IEEE Transactions on Image Processing, 2020, 29: 909-920
  • [6] Describing Textures in the Wild
    Cimpoi, Mircea
    Maji, Subhransu
    Kokkinos, Iasonas
    Mohamed, Sammy
    Vedaldi, Andrea
    [C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 3606-3613
  • [7] Image Super-Resolution Using Deep Convolutional Networks
    Dong, Chao
    Loy, Chen Change
    He, Kaiming
    Tang, Xiaoou
    [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307
  • [8] Dumoulin V., 2017, ICLR
  • [9] Image Style Transfer Using Convolutional Neural Networks
    Gatys, Leon A.
    Ecker, Alexander S.
    Bethge, Matthias
    [C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 2414-2423
  • [10] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778