Deep Feature Compression Using Spatio-Temporal Arrangement Toward Collaborative Intelligent World

Cited by: 10
Authors
Suzuki, Satoshi [1 ,2 ]
Takeda, Shoichiro [3 ]
Takagi, Motohiro [3 ]
Tanida, Ryuichi [3 ]
Kimata, Hideaki [4 ]
Shouno, Hayaru [2 ]
Affiliations
[1] NTT Corp, NTT Comp & Data Sci Labs, Yokosuka, Kanagawa 2390847, Japan
[2] Univ Electrocommun, Dept Informat, Chofu, Tokyo 1828585, Japan
[3] NTT Corp, NTT Human Informat Labs, Yokosuka, Kanagawa, Japan
[4] Kogakuin Univ, Dept Informat Design, Shinjuku City 1638677, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Image coding; Correlation; Image edge detection; Video compression; Cloud computing; Quantization (signal); Collaborative intelligence; Deep feature compression; Deep neural network; Spatio-temporal arrangement; Ordering search algorithm; HEVC
DOI
10.1109/TCSVT.2021.3107716
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Collaborative intelligence is a new paradigm that splits a deep neural network (DNN) between an edge device and the cloud to deploy DNN-based image recognition applications. In this paradigm, the deep features output by the edge DNN are compressed and transmitted to the cloud DNN. Because many of the deep feature responses are similar to one another, previous methods arrange the deep features spatially and compress them as an image, exploiting this similarity as spatial correlation. However, if the deep features are arranged not only in the spatial but also in the temporal direction, like frames of a video, it may be possible to compress them more efficiently by also exploiting temporal correlation. To explore this possibility, we propose a "spatio-temporal arrangement": the deep features are arranged spatially as images and temporally as a video with a novel ordering search algorithm. Our method effectively increases the spatial and temporal correlations hidden in the deep features and achieves higher compression efficiency than previous methods (1.50% to 4.98% on a BD-rate evaluation in a lossy setting). Our analysis shows that the method is especially effective at increasing the correlation when the input image has rich edges and textures.
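The arrangement idea sketched in the abstract can be illustrated in code. The following Python snippet is only a hedged toy sketch, not the authors' implementation: the function names (spatial_arrange, greedy_temporal_order), the choice of four channels per frame, and the greedy correlation-based ordering are my assumptions standing in for the paper's actual tiling parameters and ordering search algorithm. It shows the two steps conceptually: pack groups of feature channels into image-like frames, then order those frames so that consecutive frames are highly correlated; in the actual pipeline the ordered sequence would then be fed to an HEVC encoder.

# Toy sketch of a spatio-temporal arrangement of deep features.
# NOT the paper's algorithm: tiling size and the greedy ordering below
# are simplifying assumptions made for illustration only.
import numpy as np

def spatial_arrange(features, tiles_per_frame=4):
    """Tile groups of channels into square 'frames' (spatial arrangement).

    features: array of shape (C, H, W), e.g. the edge DNN's activation map.
    tiles_per_frame: number of channels packed into one frame
                     (assumed to be a perfect square that divides C).
    """
    c, h, w = features.shape
    side = int(np.sqrt(tiles_per_frame))
    assert side * side == tiles_per_frame and c % tiles_per_frame == 0
    frames = []
    for start in range(0, c, tiles_per_frame):
        group = features[start:start + tiles_per_frame]
        rows = [np.hstack(list(group[r * side:(r + 1) * side]))
                for r in range(side)]
        frames.append(np.vstack(rows))          # one (side*H, side*W) frame
    return np.stack(frames)                     # (num_frames, side*H, side*W)

def greedy_temporal_order(frames):
    """Order frames so consecutive frames are highly correlated
    (a simplified, greedy stand-in for the paper's ordering search)."""
    remaining = list(range(len(frames)))
    order = [remaining.pop(0)]                  # start from the first frame
    while remaining:
        last = frames[order[-1]].ravel()
        # pick the remaining frame most correlated with the last chosen one
        best = max(remaining,
                   key=lambda i: np.corrcoef(last, frames[i].ravel())[0, 1])
        remaining.remove(best)
        order.append(best)
    return np.stack([frames[i] for i in order]), order

if __name__ == "__main__":
    feat = np.random.rand(16, 28, 28).astype(np.float32)   # dummy deep features
    video = spatial_arrange(feat, tiles_per_frame=4)        # 4 frames of 56x56
    ordered, order = greedy_temporal_order(video)
    print("frame order:", order)  # this sequence would go to a video codec (e.g. HEVC)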
Pages: 3934-3946
Number of pages: 13