VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation

Cited by: 9
Authors
Wang, Weiying [1]
Chen, Jieting [1]
Jin, Qin [1]
Affiliations
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
danmaku dataset; comments generation; multimodal interaction
DOI
10.1145/3394171.3413890
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Live video interactive commenting, a.k.a. danmaku, is an emerging social feature on online video sites that involves rich multimodal information interaction among viewers. To support related research, we build a large-scale video interactive comments dataset called VideoIC, which consists of 4951 videos spanning 557 hours and 5 million comments. Videos are collected from popular categories on the Bilibili video streaming website. Compared with existing danmaku datasets, VideoIC contains richer and denser comment information, with 1077 comments per video on average. High comment density and diverse video types make VideoIC a challenging corpus for research tasks such as automatic video comment generation. We also propose a novel model based on multimodal multitask learning for comment generation (MML-CG), which integrates multiple modalities to achieve effective comment generation and temporal relation prediction. A multitask loss function is designed to train both tasks jointly in an end-to-end manner. We conduct extensive experiments on both the VideoIC and Livebot datasets. The results demonstrate the effectiveness of our model and reveal some features of danmaku.
Pages: 2599-2607
Page count: 9