FlowFormer: A Transformer Architecture for Optical Flow

Cited by: 108
Authors
Huang, Zhaoyang [1, 3]
Shi, Xiaoyu [1, 3]
Zhang, Chao [2 ]
Wang, Qiang [2 ]
Cheung, Ka Chun [3 ]
Qin, Hongwei [4 ]
Dai, Jifeng [4 ]
Li, Hongsheng [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Shatin, Hong Kong, Peoples R China
[2] Samsung Telecommun Res, Suwon, South Korea
[3] NVIDIA AI Technol Ctr, Shanghai, Peoples R China
[4] SenseTime Res, Shanghai, Peoples R China
Source
COMPUTER VISION - ECCV 2022, PT XVII | 2022 / Vol. 13677
Keywords
Optical flow; Cost volume; Transformer; RAFT;
DOI
10.1007/978-3-031-19790-1_40
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.144 and 2.183 average end-point-error (AEPE) on the clean and final passes, error reductions of 17.6% and 11.6% from the best published result (1.388 and 2.47). FlowFormer also generalizes well: without being trained on Sintel, it achieves 0.95 AEPE on the clean pass of the Sintel training set, outperforming the best published result (1.29) by 26.9%.
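As a reading aid, the metric and the percentages quoted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `aepe` and `relative_reduction` are hypothetical, flow fields are assumed to be plain lists of (u, v) vectors, and AEPE is taken in its usual sense as the mean Euclidean distance between predicted and ground-truth flow vectors.

```python
import math


def aepe(pred, gt):
    """Average end-point error: mean Euclidean distance between
    predicted and ground-truth flow vectors, given as (u, v) pairs."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("flow fields must be non-empty and equally sized")
    total = sum(math.hypot(pu - gu, pv - gv)
                for (pu, pv), (gu, gv) in zip(pred, gt))
    return total / len(pred)


def relative_reduction(baseline, new):
    """Error reduction of `new` relative to `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


# Figures quoted in the abstract (hypothetical variable names).
sintel_clean = relative_reduction(1.388, 1.144)
sintel_final = relative_reduction(2.47, 2.183)
generalization = relative_reduction(1.29, 0.95)

print(f"{sintel_clean:.1f}% {sintel_final:.1f}% {generalization:.1f}%")
```

Recomputed from the rounded values in the abstract, the first two reductions match the stated 17.6% and 11.6%. The third comes out near 26.4% rather than the stated 26.9%, which may reflect rounding of the reported AEPE values.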
Pages: 668 - 685
Page count: 18