Learnable Pooling Methods for Video Classification

Cited by: 2
Authors
Kmiec, Sebastian [1]
Bae, Juhan [1]
An, Ruijian [1]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
Source
COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV | 2019 / Vol. 11132
Keywords
Video classification; YouTube-8M; NetVLAD; Attention; Pooling; Aggregation
DOI
10.1007/978-3-030-11018-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce modifications to state-of-the-art approaches to aggregating local video descriptors by using attention mechanisms and function approximations. Rather than using ensembles of existing architectures, we provide insight into creating new architectures. We demonstrate our solutions in the "2nd YouTube-8M Video Understanding Challenge" using frame-level video and audio descriptors. We obtain testing accuracy similar to the state of the art while meeting budget constraints, and we touch upon strategies to improve the state of the art. Model implementations are available at https://github.com/pomonam/LearnablePoolingMethods.
Pages: 229-238
Page count: 10