Short-term anchor linking and long-term self-guided attention for video object detection

Cited by: 9
Authors
Cores, Daniel [1]
Brea, Victor M. [1]
Mucientes, Manuel [1]
Affiliation
[1] Univ Santiago de Compostela, Ctr Singular Invest Tecnoloxias Intelixentes CiTI, Santiago De Compostela, Spain
Keywords
Video object detection; Spatio-temporal features; Convolutional neural networks
DOI
10.1016/j.imavis.2021.104179
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new network architecture that takes advantage of the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates these short-term enhanced box features to exploit long-term spatio-temporal information. This module is the first in the video object detection domain to exploit geometrical features over the long term. Finally, a spatio-temporal double head is fed with both the spatial information from the reference frame and the aggregated information that accounts for the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics to demonstrate its robustness across a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet. (c) 2021 Elsevier B.V. All rights reserved.
Pages: 9