Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28
Authors
Wang, Lishun [1, 2]
Cao, Miao [3, 4]
Zhong, Yong [1, 2]
Yuan, Xin [3, 4]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China
[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China
Keywords
Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;
DOI
10.1109/TPAMI.2022.3225382
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames through different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
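The optical encoder described in the abstract (each high-speed frame is modulated by its own mask, and the modulated frames are summed into one 2D measurement) can be illustrated with a minimal sketch. This is not code from the STFormer repository: the function names `sci_measure` and `random_binary_masks`, the use of random binary masks, and the toy frame sizes are all assumptions made for illustration only.

```python
import random


def sci_measure(frames, masks):
    """Sum mask-modulated frames into a single 2D snapshot.

    frames, masks: lists of equal length, each item a height x width
    nested list. Names and layout are hypothetical, chosen for clarity.
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise modulation, then accumulation on the sensor.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


def random_binary_masks(num_frames, height, width, seed=0):
    """Assumed mask model: independent 0/1 entries per pixel and frame."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# Toy example: 4 constant frames of size 2 x 3 with pixel values 1, 2, 3, 4.
frames = [[[float(t + 1)] * 3 for _ in range(2)] for t in range(4)]
masks = random_binary_masks(4, 2, 3)
snapshot = sci_measure(frames, masks)
```

The reconstruction problem the paper addresses is the inverse of this step: given `snapshot` and `masks`, recover all entries of `frames`. The sketch covers only the forward (encoding) direction.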
Pages: 9072-9089
Page count: 18