Better Learning Shot Boundary Detection via Multi-task

Times cited: 2
Authors
Zhang, Haoxin [1]
Li, Zhimin [2]
Lu, Qinglin [1]
Affiliations
[1] Tencent Data Platform, Shenzhen, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Keywords
Shot boundary detection; Spatio-temporal attention; Multi-task learning; Dynamic loss
DOI
10.1145/3474085.3479206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Shot boundary detection (SBD) plays an important role in video understanding, since most recent works take the shot, rather than the frame, as the minimal unit for upstream tasks. However, the large variation between hard-cut and gradual-change transitions significantly limits SBD performance. To deal with this variation, we propose a multi-task architecture called Transnet++. Transnet++ disentangles the two types of transition and uses two separate branches to predict each of them. The two branches share the same video knowledge space, and their results are fused for the final prediction. Moreover, we propose a spatial attention module (SAM) to enhance feature representations, which suffer from redundant padding regions. Meanwhile, a temporal attention module (TAM) is applied to capture long-term information in the video and thereby alleviate over-segmentation. Experimental results on the Tencent AVS Dataset (91.16% F1-score) demonstrate the effectiveness and superiority of Transnet++ for SBD.
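The abstract says the hard-cut and gradual-change branch outputs are fused and that results are reported as an F1-score, but it does not state the fusion rule or how predicted boundaries are matched to ground truth. The sketch below is illustrative only and assumes an element-wise max over per-frame probabilities, a 0.5 threshold, and a ±2-frame matching tolerance; `fuse_branches` and `boundary_f1` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple


def fuse_branches(hard_cut: Sequence[float], gradual: Sequence[float],
                  threshold: float = 0.5) -> List[int]:
    """Hypothetical fusion rule: a frame is a boundary if either branch fires.

    Takes the element-wise max of the two per-frame probabilities and
    returns the indices of frames whose fused score reaches `threshold`.
    """
    if len(hard_cut) != len(gradual):
        raise ValueError("branch outputs must cover the same frames")
    return [i for i, (h, g) in enumerate(zip(hard_cut, gradual))
            if max(h, g) >= threshold]


def boundary_f1(pred: Sequence[int], truth: Sequence[int],
                tolerance: int = 2) -> Tuple[float, float, float]:
    """Precision, recall and F1 for boundary frame indices.

    Each prediction is greedily matched to the closest still-unmatched
    ground-truth boundary within `tolerance` frames (an assumed protocol).
    """
    unmatched = sorted(truth)
    tp = 0
    for p in sorted(pred):
        best = None
        for t in unmatched:
            if abs(p - t) <= tolerance and (best is None or abs(p - t) < abs(p - best)):
                best = t
        if best is not None:
            unmatched.remove(best)  # one-to-one matching
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    hard = [0.1, 0.9, 0.2, 0.1, 0.1, 0.1]
    grad = [0.0, 0.1, 0.1, 0.3, 0.7, 0.2]
    pred = fuse_branches(hard, grad)  # frames 1 (hard cut) and 4 (gradual)
    print(pred, boundary_f1(pred, [1, 8]))
```

Taking the max lets either branch trigger a boundary on its own, which matches the idea of dedicating one branch to each transition type; a learned or weighted fusion would be an equally plausible reading of the abstract.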
Pages: 4730-4734
Page count: 5