Distributed Semantic Communications for Multimodal Audio-Visual Parsing Tasks

Cited by: 0
Authors
Wang, Penghong [1, 2]
Li, Jiahui [3]
Liu, Chen [1, 2]
Fan, Xiaopeng [1, 2]
Ma, Mengyao [3]
Wang, Yaowei [2]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science, Harbin 150001, People's Republic of China
[2] Peng Cheng Laboratory, Shenzhen, People's Republic of China
[3] Huawei, Wireless Technology Laboratory, Shenzhen 518129, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Distributed semantic communication; deep joint source-channel coding; audio-visual parsing; auxiliary information feedback; CHANNEL; COMPRESSION
DOI
10.1109/TGCN.2024.3374700
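Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract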
Semantic communication has significantly improved in single-modal single-task scenarios, but its progress is limited in multimodal and multi-task transmission contexts. To address this issue, this paper investigates a distributed semantic communication system for the audio-visual parsing (AVP) task. The system acquires audio-visual information from distributed terminals and conducts multi-task analysis on the far-end server, which involves event categorization and boundary recording. We propose a distributed deep joint source-channel coding scheme with auxiliary information feedback to implement this system, aiming to enhance parsing performance and reduce bandwidth consumption during communication. Specifically, the server first receives the audio feature from the audio terminal and then sends the semantic information extracted from that feature back to the visual terminal. The visual terminal interactively processes the received semantic information together with its visual information before encoding and transmitting the result. The far-end server then processes and parses the received audio and visual semantic information. Experimental results demonstrate a significant reduction in transmission bandwidth consumption and notable performance improvements across various evaluation metrics for the distributed AVP task compared to current state-of-the-art methods.
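The message flow described in the abstract (audio uplink, feedback to the visual terminal, visual uplink, server-side parsing) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' scheme: the function names `awgn`, `audio_semantics`, `visual_encode`, and `run_round` are hypothetical, the learned encoders are replaced by a sigmoid weighting and top-k selection, the channel is modeled as additive white Gaussian noise, and the feedback link and index side information are assumed error-free.

```python
import math
import random


def awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the given SNR (stand-in for the wireless channel)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def audio_semantics(audio_rx):
    """Hypothetical server-side extractor: map each audio value to a (0, 1) weight."""
    return [1.0 / (1.0 + math.exp(-x)) for x in audio_rx]


def visual_encode(visual_feat, feedback, keep):
    """Hypothetical visual-terminal step: weight the visual features by the
    feedback, then keep only the `keep` largest-magnitude entries to save bandwidth."""
    weighted = [v * w for v, w in zip(visual_feat, feedback)]
    ranked = sorted(range(len(weighted)), key=lambda i: abs(weighted[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [weighted[i] for i in kept]


def run_round(audio_feat, visual_feat, snr_db=10.0, keep=2, seed=0):
    rng = random.Random(seed)
    audio_rx = awgn(audio_feat, snr_db, rng)       # 1. audio terminal -> server
    feedback = audio_semantics(audio_rx)           # 2. server -> visual terminal (assumed noiseless)
    idx, symbols = visual_encode(visual_feat, feedback, keep)  # 3. interact, then encode
    visual_rx = awgn(symbols, snr_db, rng)         # 4. visual terminal -> server
    # 5. the server would now parse both streams; here they are simply returned
    return {"audio": audio_rx, "visual_idx": idx, "visual": visual_rx}


if __name__ == "__main__":
    print(run_round([0.5, -1.0, 2.0, 0.1], [1.0, 0.2, -0.7, 0.4]))
```

In this sketch the bandwidth saving comes only from sending `keep` visual symbols instead of the full feature vector; in the paper the corresponding roles are played by learned deep joint source-channel coding networks.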
Pages: 1707-1716
Number of pages: 10