End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention

Cited by: 181
Authors
Meng, Ziyi [1, 2]
Ma, Jiawei [3 ]
Yuan, Xin [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
[2] New Jersey Inst Technol, Newark, NJ 07102 USA
[3] Columbia Univ, New York, NY 10027 USA
[4] Nokia Bell Labs, Murray Hill, NJ 07974 USA
Source
COMPUTER VISION - ECCV 2020, PT XXIII | 2020 / Vol. 12368
Keywords
Compressive spectral imaging; Spatial-Spectral Self-Attention; Large-scale real data; RECONSTRUCTION; VIDEO; DESIGN; NOISE; MODEL;
DOI
10.1007/978-3-030-58592-1_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Coded aperture snapshot spectral imaging (CASSI) is an effective tool to capture real-world 3D hyperspectral images. While considerable work exists on hardware and algorithm design, we take a step towards a low-cost solution that offers video-rate, high-quality reconstruction. To make solid progress on this challenging yet under-investigated task, we reproduce a stable single disperser (SD) CASSI system to gather large-scale real-world CASSI data and propose a novel deep convolutional network that uses self-attention to carry out real-time reconstruction. To jointly capture self-attention across the different dimensions of hyperspectral images (i.e., channel-wise spectral correlation and non-local spatial regions), we propose Spatial-Spectral Self-Attention (TSA), which processes each dimension sequentially, yet in an order-independent manner. We employ TSA in an encoder-decoder network, dubbed TSA-Net, to reconstruct the desired 3D cube. Furthermore, we investigate how noise affects the results and propose adding shot noise during model training, which significantly improves the results on real data. We hope our large-scale CASSI data will serve as a benchmark in future research and our TSA model as a baseline for deep-learning-based reconstruction algorithms. Our code and data are available at https://github.com/mengziyi64/TSA-Net.
Pages: 187-204
Page count: 18
Related papers
72 in total
[31]  
Kingma DP, 2014, Arxiv, DOI [arXiv:1312.6114, DOI 10.48550/ARXIV.1312.6114]
[32]   Solving Inverse Problems via Auto-Encoders [J].
Peng, Pei ;
Jalali, Shirin ;
Yuan, Xin .
IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, 2020, 1 (01) :312-323
[33]  
Perez L, 2017, Arxiv, DOI [arXiv:1712.04621, DOI 10.48550/ARXIV.1712.04621]
[34]  
Qiao M., 2019, DIGITAL HOLOGRAPHY 3
[35]   Snapshot spatial-temporal compressive imaging [J].
Qiao, Mu ;
Liu, Xuan ;
Yuan, Xin .
OPTICS LETTERS, 2020, 45 (07) :1659-1662
[36]   Deep learning for video compressive sensing [J].
Qiao, Mu ;
Meng, Ziyi ;
Ma, Jiawei ;
Yuan, Xin .
APL PHOTONICS, 2020, 5 (03)
[37]   Classification and Reconstruction of High-Dimensional Signals From Low-Dimensional Features in the Presence of Side Information [J].
Renna, Francesco ;
Wang, Liming ;
Yuan, Xin ;
Yang, Jianbo ;
Reeves, Galen ;
Calderbank, Robert ;
Carin, Lawrence ;
Rodrigues, Miguel R. D. .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2016, 62 (11) :6459-6492
[38]   U-Net: Convolutional Networks for Biomedical Image Segmentation [J].
Ronneberger, Olaf ;
Fischer, Philipp ;
Brox, Thomas .
MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION, PT III, 2015, 9351 :234-241
[39]  
Shen ZR, 2024, Arxiv, DOI [arXiv:1812.01243, DOI 10.48550/ARXIV.1812.01243]
[40]  
Shi Z., 2018, P IEEE C COMP VIS PA, P939, DOI DOI 10.1109/CVPRW.2018.00139