Large-Scale Training Framework for Video Annotation

Cited by: 0
Authors
Hwang, Seong Jae [1, 2]
Lee, Joonseok [2 ]
Varadarajan, Balakrishnan [2 ]
Gordon, Ariel [2 ]
Xu, Zheng [2 ]
Natsev, Apostol [2 ]
Affiliations
[1] Univ Wisconsin, Madison, WI 53706 USA
[2] Google Res, Mountain View, CA USA
Keywords
Scalability; Distributed framework; Video annotation; MapReduce;
DOI
10.1145/3292500.3330653
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Video is one of the richest sources of information available online, but extracting deep insights from video content at internet scale is still an open problem, both in terms of depth and breadth of understanding, as well as scale. Over the last few years, the field of video understanding has made great strides due to the availability of large-scale video datasets and core advances in image, audio, and video modeling architectures. However, the state-of-the-art architectures on small-scale datasets are frequently impractical to deploy at internet scale, both in terms of the ability to train such deep networks on hundreds of millions of videos, and to deploy them for inference on billions of videos. In this paper, we present a MapReduce-based training framework, which exploits both data parallelism and model parallelism to scale training of complex video models. The proposed framework uses alternating optimization and full-batch fine-tuning, and supports large Mixture-of-Experts classifiers with hundreds of thousands of mixtures, which enables a trade-off between model depth and breadth, and the ability to shift model capacity between shared (generalization) layers and per-class (specialization) layers. We demonstrate that the proposed framework is able to reach state-of-the-art performance on the largest public video datasets, YouTube-8M and Sports-1M, and can scale to datasets 100 times larger.
Pages: 2394-2402
Number of pages: 9