CVIformer: Cross-View Interactive Transformer for Efficient Stereoscopic Image Super-Resolution

Cited by: 0

Authors:

Zhang, Dongyang [1]

Liang, Shuang [1]

He, Tao [1]

Shao, Jie [2]

Qin, Ke [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Inst Intelligent Comp, Chengdu 611731, Peoples R China

[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025 / Vol. 9 / No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Stereo image processing; Feature extraction; Superresolution; Spatial resolution; Cameras; Task analysis; Stereoscopic image processing; lightweight transformer; super-resolution (SR); PARALLAX ATTENTION; NETWORK; MODULE;

DOI:

10.1109/TETCI.2024.3436904

Chinese Library Classification:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Inspired by the great success of the Transformer in computer vision, some works have started to explore the use of the Transformer for super-resolution (SR). However, for stereoscopic SR, which aims to recover details from input pairs, how to efficiently integrate cross-view interactions into the Transformer architecture remains an open problem. Additionally, most existing stereoscopic SR methods adopt a parallax mechanism only in the middle of the network, and the feature correlation between viewpoints inevitably weakens as the network depth increases. To address these issues, we first utilize an efficient residual transformer block (ERTB) as the backbone for long-range intra-view feature extraction. Subsequently, we propose a novel multi-Dconv cross attentive block (MCAB) to enhance the cross-view interactions at the rear part of the Transformer architecture. Notably, the proposed MCAB promotes feature fusion from two viewpoints by employing bidirectional cross-attention, as opposed to a unidirectional flow from left to right or vice versa. This approach results in an efficient cross-view interaction from both branches. By leveraging the advantages of the proposed ERTB and MCAB, we introduce an efficient cross-view interaction Transformer (CVIformer) for stereoscopic SR. This architecture is capable of incorporating long-range intra-view and cross-view information with an acceptable computational overhead. Extensive experiments conducted on four public datasets demonstrate that our model achieves state-of-the-art results without excessive complexity, using only 1.17 million parameters, approximately 40% fewer than leading methods such as iPASSR.
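To illustrate what "bidirectional cross-attention" means in the abstract, the following is a minimal sketch in plain Python, not the authors' MCAB. It assumes each view is represented by one row of feature vectors (as in a rectified stereo pair, where correspondences lie on the same row), uses the raw features as queries, keys, and values, and omits the depthwise convolutions and learned projections that a real block would have. The function names `cross_attend` and `bidirectional_cross_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """For each query vector, mix the other view's vectors by attention weight.

    Assumption: keys and values are the same raw features (no learned
    projections), which keeps the sketch short.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        mixed = [sum(w * v[c] for w, v in zip(weights, keys_values))
                 for c in range(dim)]
        attended.append(mixed)
    return attended


def bidirectional_cross_attention(left_row, right_row):
    """Fuse both views: left attends to right AND right attends to left.

    A unidirectional variant would compute only one of the two calls below.
    Each view keeps its own features through a residual addition.
    """
    left_from_right = cross_attend(left_row, right_row)
    right_from_left = cross_attend(right_row, left_row)
    fused_left = [[a + b for a, b in zip(x, y)]
                  for x, y in zip(left_row, left_from_right)]
    fused_right = [[a + b for a, b in zip(x, y)]
                   for x, y in zip(right_row, right_from_left)]
    return fused_left, fused_right
```

Calling `bidirectional_cross_attention(left_row, right_row)` with two lists of equal-length feature vectors returns a fused row for each view, so both branches receive information from the other.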


Pages: 1107-1118

Page count: 12

50 items in total

[31] Cross Transformer Network for Scale-Arbitrary Image Super-Resolution [J].

He, Dehong ;

Wu, Song ;

Liu, Jinpeng ;

Xiao, Guoqiang .

KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 :633-644

[32] Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution [J].

Cong, Ruixuan ;

Sheng, Hao ;

Yang, Da ;

Cui, Zhenglong ;

Chen, Rongshan .

IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 :1421-1435

[33] Fully Cross-Attention Transformer for Guided Depth Super-Resolution [J].

Ariav, Ido ;

Cohen, Israel .

SENSORS, 2023, 23 (05)

[34] Remote sensing image super-resolution via cross-scale hierarchical transformer [J].

Xiao, Yi ;

Yuan, Qiangqiang ;

He, Jiang ;

Zhang, Liangpei .

GEO-SPATIAL INFORMATION SCIENCE, 2024, 27 (06) :1914-1930

[35] Spatial relaxation transformer for image super-resolution [J].

Li, Yinghua ;

Zhang, Ying ;

Zeng, Hao ;

He, Jinglu ;

Guo, Jie .

JOURNAL OF KING SAUD UNIVERSITY COMPUTER AND INFORMATION SCIENCES, 2024, 36 (07)

[36] Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-Resolution [J].

Hu, Zeke Zexi ;

Chen, Xiaoming ;

Chung, Vera Yuk Ying ;

Shen, Yiran .

IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 :1334-1348

[37] Cross-Frame Transformer-Based Spatio-Temporal Video Super-Resolution [J].

Zhang, Wenhui ;

Zhou, Mingliang ;

Ji, Cheng ;

Sui, Xiubao ;

Bai, Junqi .

IEEE TRANSACTIONS ON BROADCASTING, 2022, 68 (02) :359-369

[38] MambaHSISR: Mamba Hyperspectral Image Super-Resolution [J].

Xu, Yinghao ;

Wang, Hao ;

Zhou, Fei ;

Luo, Chunbo ;

Sun, Xin ;

Rahardja, Susanto ;

Ren, Peng .

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63

[39] Batch-transformer for scene text image super-resolution [J].

Sun, Yaqi ;

Xie, Xiaolan ;

Li, Zhi ;

Yang, Kai .

VISUAL COMPUTER, 2024, 40 (10) :7399-7409

[40] Transformer-based image super-resolution and its lightweight [J].

Zhang, Dongxiao ;

Qi, Tangyao ;

Gao, Juhao .

MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (26) :68625-68649
