Multiple Temporal Fusion based Weakly-supervised Pre-training Techniques for Video Categorization

Cited by: 1
Authors
Cai, Xiaochen [1 ]
Cai, Hengxing [1 ]
Zhu, Boqing [2 ]
Xu, Kele [2 ]
Tu, Weiwei [1 ]
Feng, Dawei [2 ]
Affiliations
[1] 4Paradigm Inc, Beijing, Peoples R China
[2] Natl Univ Def Technol, Changsha, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Representation Learning; Video Understanding; Pre-training; Video Transformer; Temporal Resolution;
DOI
10.1145/3503161.3551585
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this paper, we present our solution to the ACM Multimedia 2022 pre-training for video understanding challenge. First, we pre-train the models on large-scale weakly-supervised video datasets at different temporal resolutions, then fine-tune them for the downstream application. Quantitative comparisons are conducted to evaluate the performance of different networks at multiple temporal resolutions. Moreover, we fuse the different pre-trained models through weighted averaging. We achieve an accuracy of 62.39% on the test set, ranking first in the video categorization track of this challenge.
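The abstract names weighted averaging as the fusion step but gives no further detail in this record. The following is a minimal sketch of one common form of that technique, assuming each fine-tuned model outputs a per-class probability vector for a video and that the weights are chosen separately (for example on a validation set). The function names, the example probabilities, and the weights are all hypothetical and are not taken from the paper.

```python
def fuse_predictions(prob_lists, weights):
    """Weighted average of per-model class probabilities for one video.

    prob_lists: one probability vector per model (hypothetical inputs).
    weights:    one non-negative weight per model; normalized internally.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_classes = len(prob_lists[0])
    if any(len(p) != num_classes for p in prob_lists):
        raise ValueError("all models must score the same set of classes")

    fused = [0.0] * num_classes
    for probs, w in zip(prob_lists, weights):
        for c, p in enumerate(probs):
            # Each model contributes in proportion to its normalized weight.
            fused[c] += (w / total) * p
    return fused


def predict_class(prob_lists, weights):
    """Index of the class with the highest fused probability."""
    fused = fuse_predictions(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: two models, e.g. pre-trained at different
# temporal resolutions, scoring three classes for the same video.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.5, 0.3]
example_weights = [1.0, 3.0]

fused_example = fuse_predictions([model_a, model_b], example_weights)
predicted_example = predict_class([model_a, model_b], example_weights)
```

Normalizing the weights inside the function keeps the fused vector a valid probability distribution whatever scale the weights are given in. How the authors actually selected their weights is not stated here.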
Pages: 7089-7093
Page count: 5