MULTISTREAM HIERARCHICAL BOUNDARY NETWORK FOR VIDEO CAPTIONING

Cited by: 0

Authors:

Thang Nguyen [1]

Sah, Shagan [1]

Ptucha, Raymond [1]

Affiliations:

[1] Rochester Institute of Technology, Rochester, NY 14623, USA

Source:

2017 IEEE WESTERN NEW YORK IMAGE AND SIGNAL PROCESSING WORKSHOP (WNYISPW) | 2017

Keywords:

video captioning; video boundary; hierarchical models; attention

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a way to summarize, index, and search the data. Most captioning models use a video encoder and caption decoder framework. Hierarchical encoders can abstractly capture clip-level temporal features to represent a video, but the clips are at fixed time steps. This paper introduces a novel Multistream Hierarchical Boundary (MHB) model, which combines a fixed-hierarchy recurrent architecture with a soft hierarchy layer by using intrinsic feature boundary cuts within a video to define clips. A novel parametric Gaussian attention allows handling of variable-length videos. The intrinsic properties of videos are utilized to form an adaptive hierarchical video representation. This model is trained in an end-to-end fashion for video captioning. The MHB model demonstrates state-of-the-art video captioning results on recent datasets.
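The abstract does not give the formulation of the parametric Gaussian attention, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the attention is parameterized by a center `mu` and a width `sigma` expressed in normalized time (0 to 1), which is one plausible way for the same parameters to apply to videos of any length. The function name `gaussian_attention` and its arguments are hypothetical; in a trained model `mu` and `sigma` would presumably be predicted by the network rather than passed in by hand.

```python
import numpy as np

def gaussian_attention(features, mu, sigma):
    """Pool a (T, D) array of frame features into a single D-vector.

    Assumption (not from the paper): mu and sigma live in normalized
    time [0, 1], so they are independent of the number of frames T.
    """
    num_frames = features.shape[0]
    # Frame positions rescaled to [0, 1] regardless of video length.
    t = np.linspace(0.0, 1.0, num_frames)
    # Unnormalized Gaussian weight for each frame position.
    weights = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    # Normalize so the weights sum to 1 over the frames.
    weights = weights / weights.sum()
    # Weighted average of the frame features.
    return weights @ features, weights

# The same (mu, sigma) pair applies to a 5-frame and a 51-frame video.
short_video = np.ones((5, 4))
long_video = np.ones((51, 4))
pooled_short, w_short = gaussian_attention(short_video, mu=0.5, sigma=0.1)
pooled_long, w_long = gaussian_attention(long_video, mu=0.5, sigma=0.1)
```

Because the weights depend only on normalized position, the pooled vector always has the feature dimension D, whatever the frame count.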


Pages: 5