TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution

Cited by: 4
Authors
Zhu, Jiang [1 ]
Koh, Van Kwan Zhi [1 ]
Lin, Zhiping [1 ]
Wen, Bihan [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
Transformers; Superresolution; Generative adversarial networks; Convolutional neural networks; Task analysis; Spatial resolution; Image reconstruction; Depth images; guided image super-resolution; vision transformer; generative adversarial network; RGB-D; MAP SUPERRESOLUTION; FUSION; 3D;
DOI
10.1109/JETCAS.2024.3394495
CLC Classification Number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Despite significant strides in deep single image super-resolution (SISR), developing robust guided depth image super-resolution (GDSR) techniques remains a notable challenge. Effective GDSR methods must not only exploit the properties of the target depth image but also integrate complementary information from the guidance image. The state of the art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods. However, CNNs are limited in capturing global information, and their conventional regression-based training can struggle to generate high-frequency details precisely, unlike transformers, which have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the transformative impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network, dubbed TM-GAN. TM-GAN leverages the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture to process and integrate multi-modal data effectively for GDSR. Experimental evaluations on a variety of RGB-D datasets demonstrate that TM-GAN outperforms state-of-the-art methods, showcasing its effectiveness in leveraging transformer-based techniques for GDSR.
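To make the high-level description above concrete, the following is a minimal, illustrative sketch of how a transformer cross-attention block can inject high-resolution RGB guidance into low-resolution depth features, the general mechanism the abstract alludes to. All module names, dimensions, and the PyTorch layout here are assumptions for exposition only; they do not reproduce the actual TM-GAN generator, its multi-modal fusion design, or its adversarial training losses.

# Illustrative sketch only (assumed design, not the authors' TM-GAN):
# depth tokens act as queries and attend to RGB guidance tokens, so every
# depth feature can draw on global context from the guidance image.
import torch
import torch.nn as nn


class GuidedCrossAttention(nn.Module):
    """Depth tokens (queries) attend to RGB guidance tokens (keys/values)."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.norm_depth = nn.LayerNorm(dim)
        self.norm_guide = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, depth_tokens: torch.Tensor, guide_tokens: torch.Tensor) -> torch.Tensor:
        # Cross-attention: global context from the RGB guidance is injected
        # into the depth features, then refined by a feed-forward layer.
        q = self.norm_depth(depth_tokens)
        kv = self.norm_guide(guide_tokens)
        fused, _ = self.attn(q, kv, kv)
        x = depth_tokens + fused          # residual connection
        return x + self.mlp(x)


if __name__ == "__main__":
    # Toy shapes: a 16x16 feature map flattened into 256 tokens of width 64.
    depth = torch.randn(1, 256, 64)   # features from the low-resolution depth map
    guide = torch.randn(1, 256, 64)   # features from the high-resolution RGB guide
    block = GuidedCrossAttention()
    print(block(depth, guide).shape)  # torch.Size([1, 256, 64])

In a GAN setting such as the one described in the abstract, a generator built from blocks like this would typically be trained with an adversarial loss alongside a pixel-wise reconstruction loss, which is what encourages sharper high-frequency detail than pure regression training.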
Pages: 261-274
Number of pages: 14