S2-Transformer for Mask-Aware Hyperspectral Image Reconstruction

Cited by: 0

Authors:

Wang, Jiamian [1]

Li, Kunpeng [2]

Zhang, Yulun [3]

Yuan, Xin [4]

Tao, Zhiqiang [1]

Affiliations:

[1] Rochester Inst Technol, Sch Informat, Rochester, NY 14623 USA

[2] Meta GenAI, Menlo Pk, CA 94025 USA

[3] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China

[4] Westlake Univ, Hangzhou 310024, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2025 / Vol. 47 / No. 06

Keywords:

Image reconstruction; Hyperspectral imaging; Uncertainty; Image coding; Encoding; Transformers; Optical sensors; Optical imaging; Imaging; Optical variables measurement; Snapshot compressive imaging (SCI); hyperspectral image reconstruction; coded aperture snapshot spectral imaging (CASSI); transformer; interpretability; ALGORITHMS; RESOLUTION; DESIGN;

DOI:

10.1109/TPAMI.2025.3543842

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Snapshot compressive imaging (SCI) has emerged as a novel way of capturing hyperspectral images. It uses an optical encoder to compress the 3D data into a 2D measurement and a software decoder to reconstruct the signal. Recently, a representative SCI setup, the coded aperture snapshot spectral imager (CASSI), paired with a Transformer reconstruction backend has achieved high-fidelity sensing performance. However, dominant spatial and spectral attention designs show limitations in hyperspectral modeling. The spatial attention values describe the inter-pixel correlation but overlook the across-spectra variation within each pixel. The size of the spectral attention map does not scale with the spatial size of the tokens and thus bottlenecks information allocation. Besides, CASSI entangles the spatial and spectral information into a 2D measurement, which hinders information disentanglement and modeling. In addition, CASSI blocks light with a physical binary mask, causing data loss at the masked pixels. To tackle the above challenges, we propose a spatial-spectral (S2-) Transformer built on a parallel attention design and a mask-aware learning strategy. First, we systematically explore the pros and cons of different spatial (-spectral) attention designs, and find that performing both attentions in parallel effectively disentangles and models the blended information. Second, the masked pixels are harder to predict and should be treated differently from unmasked ones. We adaptively prioritize the loss penalty according to the mask structure, using the mask-encoded prediction as an uncertainty estimator. We theoretically discuss the distinct convergence tendencies of masked and unmasked regions under the proposed learning strategy. Extensive experiments demonstrate that, on average, the proposed method outperforms state-of-the-art methods. We empirically visualize and analyze the behaviour of the spatial and spectral attentions, and comprehensively examine the impact of the mask-aware learning, both of which advance physics-driven deep network design for CASSI reconstruction.
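The optical encoding summarized in the abstract (a binary mask blocks part of the light, and the 3D cube is compressed into a 2D measurement) can be illustrated with a minimal NumPy sketch. It assumes the commonly used single-disperser CASSI model, in which each masked band is shifted horizontally by a fixed number of pixels before the detector sums all bands. The function name `cassi_forward`, the `step` parameter, and the toy sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def cassi_forward(cube, mask, step=2):
    """Toy single-disperser CASSI forward model (illustrative assumption).

    cube: (H, W, C) hyperspectral cube
    mask: (H, W) binary coded aperture, 0 = light blocked
    step: hypothetical dispersion shift, in pixels, between adjacent bands
    """
    H, W, C = cube.shape
    # The detector is wider than the scene because bands are sheared apart.
    measurement = np.zeros((H, W + step * (C - 1)), dtype=cube.dtype)
    for c in range(C):
        coded = cube[:, :, c] * mask                      # mask blocks light
        measurement[:, c * step:c * step + W] += coded    # shift, then integrate
    return measurement

rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))                              # H=8, W=8, C=4 bands
mask = (rng.random((8, 8)) > 0.5).astype(cube.dtype)      # random binary mask
y = cassi_forward(cube, mask, step=2)
print(y.shape)  # (8, 14)
```

In this sketch the spatial and spectral content overlap in the single 2D array `y`, and pixels where `mask` is zero contribute nothing to it, which corresponds to the entanglement and masked data loss that the abstract describes.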

References

Pages: 4299-4316

Page count: 18

83 references in total

[1] Higher-order computational model for coded aperture spectral imaging [J].

Arguello, Henry ;

Rueda, Hoover ;

Wu, Yuehao ;

Prather, Dennis W. ;

Arce, Gonzalo R. .

APPLIED OPTICS, 2013, 52 (10) :D12-D21

[2]

Bao HB, 2020, PR MACH LEARN RES, V119

[3] A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration [J].

Bioucas-Dias, Jose M. ;

Figueiredo, Mario A. T. .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2007, 16 (12) :2992-3004

[4] Distributed optimization and statistical learning via the alternating direction method of multipliers [J].

Boyd S. ;

Parikh N. ;

Chu E. ;

Peleato B. ;

Eckstein J. .

Foundations and Trends in Machine Learning, 2010, 3 (01) :1-122

[5] Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [J].

Cai, Yuanhao ;

Lin, Jing ;

Hu, Xiaowan ;

Wang, Haoqian ;

Yuan, Xin ;

Zhang, Yulun ;

Timofte, Radu ;

Van Gool, Luc .

COMPUTER VISION - ECCV 2022, PT XVII, 2022, 13677 :686-704

[6]

Cai YH, 2022, Arxiv, DOI arXiv:2205.10102

[7] Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction [J].

Cai, Yuanhao ;

Lin, Jing ;

Hu, Xiaowan ;

Wang, Haoqian ;

Yuan, Xin ;

Zhang, Yulun ;

Timofte, Radu ;

Van Gool, Luc .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :17481-17490

[8] Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information [J].

Candès, EJ ;

Romberg, J ;

Tao, T .

IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (02) :489-509

[9] A Prism-Mask System for Multispectral Video Acquisition [J].

Cao, Xun ;

Du, Hao ;

Tong, Xin ;

Dai, Qionghai ;

Lin, Stephen .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (12) :2423-2435

[10] Plug-and-Play ADMM for Image Restoration: Fixed-Point Convergence and Applications [J].

Chan, Stanley H. ;

Wang, Xiran ;

Elgendy, Omar A. .

IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2017, 3 (01) :84-98
