Fully Motion-Aware Network for Video Object Detection

Cited by: 153
Authors
Wang, Shiyao [1 ]
Zhou, Yucong [2 ]
Yan, Junjie [2 ]
Deng, Zhidong [1 ]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Dept Comp Sci, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] SenseTime Res Inst, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2018, PT XIII | 2018, Vol. 11217
Funding
National Key R&D Program of China;
Keywords
Video object detection; Feature calibration; Pixel-level; Instance-level; End-to-end;
DOI
10.1007/978-3-030-01261-8_33
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Video object detection is challenging in the presence of appearance deterioration in certain video frames. One typical solution is to enhance per-frame features by aggregating neighboring frames, but object features are usually not spatially calibrated across frames due to object and camera motion. In this paper, we propose an end-to-end model called the fully motion-aware network (MANet), which jointly calibrates object features at both the pixel level and the instance level in a unified framework. Pixel-level calibration is flexible in modeling detailed motion, while instance-level calibration captures more global motion cues and is therefore robust to occlusion. To the best of our knowledge, MANet is the first work that jointly trains the two modules and dynamically combines them according to the motion patterns. It achieves leading performance on the large-scale ImageNet VID dataset.
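The abstract describes two complementary calibration steps and a dynamic combination of them. The following is a minimal PyTorch sketch of those ideas, not the authors' implementation: the flow-based warping helper, the per-RoI offset head, and the blending weight are illustrative assumptions about how such pixel-level and instance-level calibration could be wired together.

```python
# Illustrative sketch only; module names, shapes, and the blending scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_by_flow(feat, flow):
    """Pixel-level calibration: warp a neighboring frame's feature map to the
    reference frame using a pre-computed flow field.
    feat: (N, C, H, W); flow: (N, 2, H, W) in pixel units."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)   # (1, 2, H, W) base sampling grid
    coords = base + flow                               # displaced sampling positions
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)   # (N, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class InstanceLevelCalibration(nn.Module):
    """Instance-level calibration: predict a coarse per-proposal shift of the
    RoI-pooled feature, which is more robust to occlusion than dense flow."""
    def __init__(self, channels):
        super().__init__()
        self.offset_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 2)
        )

    def forward(self, roi_feat):                        # roi_feat: (R, C, k, k)
        shift = self.offset_head(roi_feat)              # (R, 2) per-RoI motion estimate
        flow = shift[:, :, None, None].expand(-1, -1, *roi_feat.shape[2:])
        return warp_by_flow(roi_feat, flow)


def combine(pixel_calibrated, instance_calibrated, motion_weight):
    """Dynamically blend the two calibrated features; in the full model the
    weight would be predicted from motion cues rather than fixed."""
    return motion_weight * pixel_calibrated + (1 - motion_weight) * instance_calibrated
```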
Pages: 557-573
Page count: 17