Wireless Deep Video Semantic Transmission

Cited by: 70
Authors
Wang, Sixian [1]
Dai, Jincheng [1]
Liang, Zijian [1]
Niu, Kai [1, 2]
Si, Zhongwei [1]
Dong, Chao [1]
Qin, Xiaoqi [3]
Zhang, Ping [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, Minist Educ, Key Lab Universal Wireless Commun, Beijing 100876, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Semantic communications; video transmission; nonlinear transform; joint source-channel coding; rate-distortion
DOI
10.1109/JSAC.2022.3221977
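Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract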
In this paper, we design a new class of high-efficiency deep joint source-channel coding methods for end-to-end video transmission over wireless channels. The proposed methods exploit a nonlinear transform and a conditional coding architecture to adaptively extract semantic features across video frames, and transmit the semantic feature-domain representations over wireless channels via deep joint source-channel coding. We refer to this framework as deep video semantic transmission (DVST). In particular, benefiting from the strong temporal prior provided by the feature-domain context, the learned nonlinear transform function becomes temporally adaptive, resulting in a richer and more accurate entropy model that guides the transmission of the current frame. Accordingly, a novel rate-adaptive transmission mechanism is developed to customize deep joint source-channel coding for video sources. It learns to allocate the limited channel bandwidth within and among video frames to maximize the overall transmission performance. The whole DVST design is formulated as an optimization problem whose goal is to minimize the end-to-end transmission rate-distortion cost under perceptual quality metrics or machine vision task performance metrics. Across standard video test sequences and various communication scenarios, experiments show that DVST generally outperforms traditional wireless video coded transmission schemes. Because it is aware of video content and can integrate machine vision tasks, the proposed DVST framework is well suited to future semantic communications.
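The idea of allocating a limited channel bandwidth according to an entropy model can be illustrated with a small sketch. This is not the DVST mechanism itself, which the abstract describes as learned end to end; it is a hand-written, assumed rule that splits a fixed budget of channel symbols across features in proportion to their estimated entropy. The function name `allocate_channel_symbols`, its parameters, and the proportional rule are all hypothetical and chosen only for illustration.

```python
def allocate_channel_symbols(entropy_bits, total_symbols, min_symbols=1):
    """Split a channel-symbol budget across features by estimated entropy.

    Hypothetical illustration only: DVST learns its allocation, whereas
    this sketch uses a fixed proportional rule with largest-remainder
    rounding. Entropy estimates are assumed to be non-negative.
    """
    n = len(entropy_bits)
    if n == 0:
        return []
    if total_symbols < n * min_symbols:
        raise ValueError("budget too small for the per-feature minimum")

    # Symbols left over after every feature receives its minimum share.
    spare = total_symbols - n * min_symbols

    total_entropy = sum(entropy_bits)
    if total_entropy <= 0:
        # No information about relative importance: fall back to uniform.
        weights = [1.0 / n] * n
    else:
        weights = [h / total_entropy for h in entropy_bits]

    # Ideal (fractional) share, rounded down to whole symbols.
    ideal = [w * spare for w in weights]
    extra = [int(x) for x in ideal]

    # Hand the symbols lost to rounding to the largest fractional parts.
    leftover = spare - sum(extra)
    order = sorted(range(n), key=lambda i: ideal[i] - extra[i], reverse=True)
    for i in order[:leftover]:
        extra[i] += 1

    return [min_symbols + e for e in extra]


if __name__ == "__main__":
    # Four features with assumed entropy estimates, 20 channel symbols.
    print(allocate_channel_symbols([8, 4, 2, 2], 20))
```

Features with higher estimated entropy receive more channel symbols, and the allocation always sums to the budget. A learned scheme would replace the fixed proportional rule with one trained against the end-to-end rate-distortion objective.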
Pages: 214-229
Number of pages: 16