A Unified Multi-Task Semantic Communication System for Multimodal Data

Cited by: 49

Authors:

Zhang, Guangyi [1]

Hu, Qiyu [1]

Qin, Zhijin [2,3]

Cai, Yunlong [1]

Yu, Guanding [1]

Tao, Xiaoming [2]

Affiliations:

[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China

[2] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China

[3] Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON COMMUNICATIONS | 2024 / Vol. 72 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning; dynamic overhead; multimodal data; multi-task semantic communication;

DOI:

10.1109/TCOMM.2024.3364990

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Task-oriented semantic communications have achieved significant performance gains. However, the employed deep neural networks in semantic communications have to be updated when the task is changed or multiple models need to be stored for performing different tasks. To address this issue, we develop a unified deep learning-enabled semantic communication system (U-DeepSC), where a unified end-to-end framework can serve many different tasks with multiple modalities of data. As the number of required features varies from task to task, we propose a vector-wise dynamic scheme that can adjust the number of transmitted symbols for different tasks. Moreover, our dynamic scheme can also adaptively adjust the number of transmitted features under different channel conditions to optimize the transmission efficiency. Particularly, we devise a lightweight feature selection module (FSM) to evaluate the importance of feature vectors, which can hierarchically drop redundant feature vectors and significantly accelerate the inference. To reduce the transmission overhead, we then design a unified codebook for feature representation to serve multiple tasks, where only the indices of these task-specific features in the codebook are transmitted. According to the simulation results, the proposed U-DeepSC achieves comparable performance to the task-oriented semantic communication system designed for a specific task but with significant reduction in both transmission overhead and model size.

Citation

Pages: 4101-4116

Page count: 16

References (44 in total)

[1] [Anonymous], 2023, IEEE J. Sel. Areas Commun., V41, P214

[2] End-to-End Object Detection with Transformers [J]. Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[3] Devlin J, 2019, arXiv:1810.04805, DOI 10.48550/arXiv.1810.04805

[4] Dosovitskiy A, 2021, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929

[5] MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [J]. Hazarika, Devamanyu; Zimmermann, Roger; Poria, Soujanya. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 1122-1131

[6] Masked Autoencoders Are Scalable Vision Learners [J]. He, Kaiming; Chen, Xinlei; Xie, Saining; Li, Yanghao; Dollar, Piotr; Girshick, Ross. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15979-15988

[7] Robust Semantic Communications With Masked VQ-VAE Enabled Codebook [J]. Hu, Qiyu; Zhang, Guangyi; Qin, Zhijin; Cai, Yunlong; Yu, Guanding; Li, Geoffrey Ye. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (12): 8707-8722

[8] UniT: Multimodal Multitask Learning with a Unified Transformer [J]. Hu, Ronghang; Singh, Amanpreet. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1419-1429

[9] Huang D., 2022, arXiv

[10] Deep Learning-Based Image Semantic Coding for Semantic Communications [J]. Huang, Danlan; Tao, Xiaoming; Gao, Feifei; Lu, Jianhua. 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021
