Learnable Pooling Methods for Video Classification

Cited by: 2

Authors:

Kmiec, Sebastian [1]

Bae, Juhan [1]

An, Ruijian [1]

Affiliations:

[1] Univ Toronto, Toronto, ON, Canada

Source:

COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV | 2019 / Vol. 11132

Keywords:

Video classification; YouTube-8M; NetVLAD; Attention; Pooling; Aggregation

DOI:

10.1007/978-3-030-11018-5_21

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce modifications to state-of-the-art approaches for aggregating local video descriptors, using attention mechanisms and function approximations. Rather than relying on ensembles of existing architectures, we offer insight into creating new architectures. We demonstrate our solutions in "The 2nd YouTube-8M Video Understanding Challenge" using frame-level video and audio descriptors. We obtain testing accuracy similar to the state of the art while meeting budget constraints, and we touch upon strategies to improve the state of the art. Model implementations are available at https://github.com/pomonam/LearnablePoolingMethods.


Pages: 229-238

Page count: 10

