FlowFormer: A Transformer Architecture for Optical Flow

Cited by: 146
Authors
Huang, Zhaoyang [1, 3]
Shi, Xiaoyu [1, 3]
Zhang, Chao [2]
Wang, Qiang [2]
Cheung, Ka Chun [3]
Qin, Hongwei [4]
Dai, Jifeng [4]
Li, Hongsheng [1]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Shatin, Hong Kong, Peoples R China
[2] Samsung Telecommun Res, Suwon, South Korea
[3] NVIDIA AI Technol Ctr, Shanghai, Peoples R China
[4] SenseTime Res, Shanghai, Peoples R China
Source
COMPUTER VISION - ECCV 2022, PT XVII | 2022 / Vol. 13677
Keywords
Optical flow; Cost volume; Transformer; RAFT
DOI
10.1007/978-3-031-19790-1_40
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.144 and 2.183 average end-point-error (AEPE) on the clean and final passes, respectively, a 17.6% and 11.6% error reduction from the best published results (1.388 and 2.47). FlowFormer also generalizes well: without being trained on Sintel, it achieves 0.95 AEPE on the clean pass of the Sintel training set, outperforming the best published result (1.29) by 26.9%.
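For readers unfamiliar with the metric, the following is a minimal sketch of how AEPE and a relative error reduction are commonly computed. It is not taken from the paper's code: the function names `aepe` and `relative_reduction` and the toy flow vectors are hypothetical, and the sketch assumes AEPE is the mean Euclidean distance between predicted and ground-truth flow vectors.

```python
import math


def aepe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    pred, gt: equal-length, non-empty sequences of (u, v) displacements in pixels.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


def relative_reduction(baseline, new):
    """Percentage by which `new` error is lower than `baseline` error."""
    return 100.0 * (baseline - new) / baseline


# Toy data (hypothetical): one pixel is off by (3, 4), the other is exact.
toy_pred = [(3.0, 4.0), (0.0, 0.0)]
toy_gt = [(0.0, 0.0), (0.0, 0.0)]
toy_error = aepe(toy_pred, toy_gt)

# Sintel test figures quoted in the abstract (clean and final pass).
clean_reduction = round(relative_reduction(1.388, 1.144), 1)
final_reduction = round(relative_reduction(2.47, 2.183), 1)

print(toy_error, clean_reduction, final_reduction)
```

Applied to the Sintel test numbers quoted above, this formula reproduces the stated 17.6% and 11.6% reductions.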
Pages: 668-685
Number of pages: 18