Distributed Semantic Communications for Multimodal Audio-Visual Parsing Tasks

Citations: 0
Authors
Wang, Penghong [1 ,2 ]
Li, Jiahui [3 ]
Liu, Chen [1 ,2 ]
Fan, Xiaopeng [1 ,2 ]
Ma, Mengyao [3 ]
Wang, Yaowei [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Huawei, Wireless Technol Lab, Shenzhen 518129, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed semantic communication; deep joint source-channel coding; audio-visual parsing; auxiliary information feedback; channel; compression
DOI
10.1109/TGCN.2024.3374700
CLC number
TN [Electronic technology; communication technology]
Discipline code
0809
Abstract
Semantic communication has advanced significantly in single-modal, single-task scenarios, but its progress remains limited in multimodal, multi-task transmission contexts. To address this issue, this paper investigates a distributed semantic communication system for the audio-visual parsing (AVP) task. The system acquires audio-visual information from distributed terminals and conducts multi-task analysis on the far-end server, which involves event categorization and boundary recording. We propose a distributed deep joint source-channel coding scheme with auxiliary information feedback to implement this system, aiming to enhance parsing performance and reduce bandwidth consumption during communication. Specifically, the server first receives the audio feature from the audio terminal and then sends the semantic information extracted from that feature back to the visual terminal. The visual terminal interactively processes the received semantic information together with its local visual information before encoding and transmission. The audio and visual semantic information received at the far-end server is then processed and parsed. Experimental results demonstrate a significant reduction in transmission bandwidth consumption and notable performance improvements across various evaluation metrics for the distributed AVP task compared with current state-of-the-art methods.
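The message flow described in the abstract (audio-feature uplink, semantic feedback to the visual terminal, fused visual uplink, server-side multi-task parsing) can be sketched as a toy pipeline. This is a minimal illustration only: the function names, the mean-pooling "encoders", and the additive "fusion" below are placeholder assumptions, not the paper's learned deep joint source-channel codecs.

```python
# Toy sketch of the distributed AVP pipeline with auxiliary information feedback.
# All codecs here are placeholder arithmetic on plain lists; the real system
# uses trained deep joint source-channel coding networks over noisy channels.

def audio_encode(audio):
    # Audio terminal: compress raw samples into a short feature (mean pooling).
    return [sum(audio) / len(audio)]

def extract_semantic_feedback(audio_feat):
    # Server: derive compact semantic side information for the visual terminal.
    return [f * 0.5 for f in audio_feat]

def visual_encode(visual, feedback):
    # Visual terminal: fuse the fed-back audio semantics with local visual data
    # before encoding, so redundant cross-modal information is not retransmitted.
    fused = [v + feedback[0] for v in visual]
    return [sum(fused) / len(fused)]

def server_parse(audio_feat, visual_feat):
    # Server: toy stand-in for multi-task analysis (event category + boundary).
    score = audio_feat[0] + visual_feat[0]
    return {"event": "speech" if score > 1.0 else "silence", "score": score}

audio = [0.2, 0.6, 0.4]   # stand-in for an audio clip
visual = [0.8, 1.0, 0.6]  # stand-in for video features

a_feat = audio_encode(audio)            # uplink 1: audio terminal -> server
fb = extract_semantic_feedback(a_feat)  # feedback: server -> visual terminal
v_feat = visual_encode(visual, fb)      # uplink 2: fused visual -> server
result = server_parse(a_feat, v_feat)
print(result)
```

The design point the sketch mirrors is that the feedback link lets the visual terminal condition its transmission on audio semantics, which is how the scheme trades a small downlink cost for reduced uplink bandwidth.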
Pages: 1707-1716
Page count: 10