LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

Cited by: 126
Authors
Fan, Heng [1 ]
Bai, Hexin [2 ]
Lin, Liting [3 ,4 ]
Yang, Fan [2 ]
Chu, Peng [2 ]
Deng, Ge [2 ]
Yu, Sijia [2 ]
Harshit [1 ]
Huang, Mingzhen [1 ]
Liu, Juehuan [2 ]
Xu, Yong [3 ,4 ]
Liao, Chunyuan [5 ]
Yuan, Lin [6 ]
Ling, Haibin [1 ]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Temple Univ, Philadelphia, PA 19122 USA
[3] South China Univ Technol, Guangzhou, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] HiScene Informat Technol, Shanghai, Peoples R China
[6] Amazon Web Serv, Palo Alto, CA USA
Keywords
Visual tracking; Large-scale benchmark; High-quality dense annotation; Tracking evaluation;
DOI
10.1007/s11263-020-01387-y
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, has been limited by the lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes and offers 1,550 videos totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box, making LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated, high-quality platform for both training and evaluation of trackers. The average video length in LaSOT is around 2,500 frames, and each video contains various challenge factors found in real-world footage, such as targets disappearing and re-appearing. These longer videos allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide a language specification for each video in LaSOT; we believe such additions will allow future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and the results reveal that there still exists significant room for improvement. The complete benchmark, tracking results, and analysis are available on the project website.
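The dense per-frame bounding-box annotations described in the abstract enable overlap-based tracker evaluation. As a minimal sketch (not the official LaSOT evaluation toolkit), the core computation behind an overlap-based success metric can be illustrated as follows, assuming boxes in `(x, y, w, h)` format:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (clamped at zero).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(predictions, ground_truths, threshold=0.5):
    """Fraction of frames whose predicted box overlaps ground truth
    by at least `threshold` IoU."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(o >= threshold for o in overlaps) / len(overlaps)
```

Sweeping `threshold` from 0 to 1 and plotting the resulting success rates yields the success plot commonly used in tracking benchmarks; function names and the box format here are illustrative assumptions.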
Pages: 439-461
Page count: 23