CA_DeepSC: Cross-Modal Alignment for Multi-Modal Semantic Communications

Cited by: 0
Authors
Wang, Wenjun [1]
Liu, Minghao [1]
Chen, Mingkai [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Key Lab Broadband Wireless Commun & Sensor Networ, Nanjing 210003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
semantic communication; multi-modal transmission; deep learning; error correction;
DOI
10.1109/GLOBECOM54140.2023.10437917
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Multi-modal semantic communication has attracted great attention due to its broad application prospects. However, existing multi-modal semantic communication systems mostly focus on task-oriented approaches that ignore the correlation among multi-modal data, which reduces robustness. In this paper, we propose a deep learning enabled semantic communication system with cross-modal alignment, called CA_DeepSC, which exploits the correlation across multi-modal signals to enhance the robustness of transmission. First, we train the semantic encoder at the transmitter to learn the cross-modal alignment relationship at the semantic level; this alignment also allows errors caused by semantic or physical noise to be corrected. Second, we propose a novel cross-modal amendment scheme that dynamically assigns weights to auxiliary multi-modal semantic information based on their correlation levels, and integrates the modal semantic information with the auxiliary modal semantic information at the receiver, improving recovery performance. Finally, experimental results demonstrate that CA_DeepSC effectively reduces the semantic distortion caused by semantic and physical noise, thereby improving the quality and robustness of multi-modal semantic communication.
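The abstract does not give the exact form of the cross-modal amendment scheme. As a purely illustrative sketch (not the authors' implementation), correlation-based weighting of auxiliary semantic features could look like the following. It assumes cosine similarity as the correlation measure, a softmax to turn correlations into weights, and a hypothetical mixing factor `alpha`; the function names `cosine`, `softmax`, and `amend` are likewise invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as uncorrelated with everything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amend(main, auxiliaries, alpha=0.5):
    """Blend a (possibly noisy) main-modality feature with auxiliary features.

    Assumption: each auxiliary feature is weighted by the softmax of its
    cosine similarity to the main feature, and `alpha` (hypothetical)
    controls how much the auxiliary mixture contributes.
    """
    weights = softmax([cosine(main, aux) for aux in auxiliaries])
    aux_mix = [
        sum(w * aux[i] for w, aux in zip(weights, auxiliaries))
        for i in range(len(main))
    ]
    fused = [(1 - alpha) * m + alpha * a for m, a in zip(main, aux_mix)]
    return fused, weights


if __name__ == "__main__":
    main_feature = [1.0, 0.0]
    aux_features = [[1.0, 0.0], [0.0, 1.0]]
    fused, weights = amend(main_feature, aux_features)
    print(weights, fused)
```

In this toy case the first auxiliary vector is perfectly aligned with the main feature and therefore receives the larger weight; a learned correlation measure would take the place of the cosine similarity in a trained system.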
Pages: 5871 - 5876
Number of pages: 6
Related papers
50 in total
  • [1] Cross-Modal Semantic Alignment and Information Refinement for Multi-Modal Sentiment Analysis
    Ding, Meirong
    Chen, Hongye
    Zeng, Biqing
    Computer Engineering and Applications, 2024, 60 (22) : 114 - 125
  • [2] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [3] Cross-Modal Semantic Communications
    Li, Ang
    Wei, Xin
    Wu, Dan
    Zhou, Liang
    IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
  • [4] Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval
    Zou, Zhuoyang
    Zhu, Xinghui
    Zhu, Qinying
    Zhang, Hongyan
    Zhu, Lei
    FOODS, 2024, 13 (11)
  • [5] Multi-Modal Pulmonary Mass Segmentation Network Based on Cross-Modal Spatial Alignment
    LI Jiaxin
    CHEN Houjin
    PENG Yahui
    LI Yanfeng
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (01) : 11 - 17
  • [6] Adversarial Cross-modal Domain Adaptation for Multi-modal Semantic Segmentation in Autonomous Driving
    Shi, Mengqi
    Cao, Haozhi
    Xie, Lihua
    Yang, Jianfei
    2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2022, : 850 - 855
  • [7] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [8] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [9] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171