Distributed Semantic Communications for Multimodal Audio-Visual Parsing Tasks

Times cited: 0

Authors:

Wang, Penghong [1,2]

Li, Jiahui [3]

Liu, Chen [1,2]

Fan, Xiaopeng [1,2]

Ma, Mengyao [3]

Wang, Yaowei [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci, Harbin 150001, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

[3] Huawei, Wireless Technol Lab, Shenzhen 518129, Peoples R China

Source:

IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING | 2024 / Vol. 8 / No. 4

Funding:

National Natural Science Foundation of China

Keywords:

Distributed semantic communication; deep joint source-channel coding; audio-visual parsing; auxiliary information feedback; CHANNEL; COMPRESSION

DOI:

10.1109/TGCN.2024.3374700

Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology]

Discipline code:

0809

Abstract:

Semantic communication has advanced significantly in single-modal, single-task scenarios, but its progress in multimodal, multi-task transmission remains limited. To address this issue, this paper investigates a distributed semantic communication system for the audio-visual parsing (AVP) task. The system acquires audio and visual information from distributed terminals and performs multi-task analysis on a far-end server, namely event categorization and boundary recording. To implement this system, we propose a distributed deep joint source-channel coding scheme with auxiliary information feedback, which aims to improve parsing performance and reduce bandwidth consumption during communication. Specifically, the server first receives the audio feature from the audio terminal and then sends the semantic information extracted from that feature back to the visual terminal. The visual terminal interactively processes the received semantic information together with its own visual information before encoding and transmitting the result. Finally, the far-end server processes and parses the received audio and visual semantic information. Experimental results show that, compared with current state-of-the-art methods, the proposed scheme significantly reduces transmission bandwidth consumption and notably improves performance across various evaluation metrics for the distributed AVP task.
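The audio-first, feedback-assisted flow summarized in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes real-valued feature vectors, an AWGN channel with unit-average-power symbols, an error-free feedback link, and identity "encoders" with fixed linear heads standing in for the learned deep JSCC networks. All function and parameter names (`hypothetical_avp_round`, `w_audio`, `w_visual`, `w_fuse`, `snr_db`) are hypothetical.

```python
import math
import random


def power_normalize(x):
    """Scale a real-valued symbol vector to unit average power."""
    p = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(p) if p > 0 else 0.0
    return [v * scale for v in x]


def awgn(x, snr_db, rng):
    """AWGN channel for a unit-average-power input at the given SNR (dB)."""
    noise_std = math.sqrt(10 ** (-snr_db / 10))
    return [v + rng.gauss(0.0, noise_std) for v in x]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def hypothetical_avp_round(audio_feat, visual_feat, w_audio, w_visual, w_fuse,
                           snr_db=20.0, seed=0):
    """One round of the audio-first, feedback-assisted transmission flow.

    Hypothetical sketch: identity encoders and linear heads replace the
    learned deep JSCC networks described in the paper.
    """
    rng = random.Random(seed)

    # 1. Audio terminal: encode (identity + power normalization) and transmit.
    rx_audio = awgn(power_normalize(audio_feat), snr_db, rng)

    # 2. Server: extract audio semantic information as event probabilities.
    audio_events = softmax(matvec(w_audio, rx_audio))

    # 3. Server feeds the semantics back (feedback assumed error-free here).
    feedback = audio_events

    # 4. Visual terminal: fuse its own feature with the feedback, then transmit.
    fused = [v + f for v, f in zip(visual_feat, matvec(w_fuse, feedback))]
    rx_visual = awgn(power_normalize(fused), snr_db, rng)

    # 5. Server: parse the received visual semantics.
    visual_events = softmax(matvec(w_visual, rx_visual))
    return audio_events, visual_events


if __name__ == "__main__":
    w_cls = [[1, 0, 1, 0], [0, 1, 0, 1]]       # 2 event classes, 4-dim features
    w_fuse = [[1, 0], [0, 1], [1, 0], [0, 1]]  # maps class probabilities to features
    audio, visual = hypothetical_avp_round(
        [1, 0, 1, 0], [1, 0, 1, 0], w_cls, w_cls, w_fuse, snr_db=30.0)
    print(audio.index(max(audio)), visual.index(max(visual)))  # prints "0 0"
```

The design point the sketch tries to convey is ordering: the visual terminal encodes only after it has received the server's audio-derived semantics, so its transmitted symbols already carry cross-modal context.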

Pages: 1707-1716

Number of pages: 10
