Learnable Pooling Methods for Video Classification

Cited by: 2
Authors
Kmiec, Sebastian [1]
Bae, Juhan [1]
An, Ruijian [1]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
Source
COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV | 2019 / Vol. 11132
Keywords
Video classification; YouTube-8M; NetVLAD; Attention; Pooling; Aggregation
DOI
10.1007/978-3-030-11018-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce modifications to state-of-the-art approaches to aggregating local video descriptors by using attention mechanisms and function approximations. Rather than using ensembles of existing architectures, we offer insight into creating new architectures. We demonstrate our solutions in "The 2nd YouTube-8M Video Understanding Challenge", using frame-level video and audio descriptors. We obtain testing accuracy similar to the state of the art while meeting budget constraints, and touch upon strategies to improve the state of the art. Model implementations are available at https://github.com/pomonam/LearnablePoolingMethods.
Pages: 229-238
Page count: 10