VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation

Cited by: 9
Authors
Wang, Weiying [1 ]
Chen, Jieting [1 ]
Jin, Qin [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
danmaku dataset; comments generation; multimodal interaction;
DOI
10.1145/3394171.3413890
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Live video interactive commenting, a.k.a. danmaku, is an emerging social feature on online video sites that involves rich multimodal information interaction among viewers. To support related research, we build a large-scale video interactive comments dataset called VideoIC, which consists of 4951 videos spanning 557 hours and 5 million comments. Videos are collected from popular categories on the 'Bilibili' video streaming website. Compared with other existing danmaku datasets, VideoIC contains richer and denser comment information, with 1077 comments per video on average. The high comment density and diverse video types make VideoIC a challenging corpus for tasks such as automatic video comment generation. We also propose a novel model based on multimodal multitask learning for comment generation (MML-CG), which integrates multiple modalities to achieve effective comment generation and temporal relation prediction. A multitask loss function is designed to train both tasks jointly in an end-to-end manner. We conduct extensive experiments on both the VideoIC and Livebot datasets. The results demonstrate the effectiveness of our model and reveal some characteristics of danmaku.
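The joint training objective mentioned in the abstract — a multitask loss combining comment generation and temporal relation prediction — can be sketched as a weighted sum of the two per-task losses. This is a minimal illustrative sketch only, not the authors' implementation: the function names, the token-averaged cross-entropy form, and the single balancing weight `alpha` are assumptions, since the paper's exact loss definition is not given in this record.

```python
import math

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the target class under a
    # probability distribution (a list summing to 1).
    return -math.log(probs[target_idx])

def multitask_loss(gen_probs, gen_targets, rel_probs, rel_target, alpha=0.5):
    """Hypothetical MML-CG-style joint loss (assumption, not the paper's spec).

    gen_probs:   per-token probability distributions from the comment decoder
    gen_targets: gold token indices for the comment
    rel_probs:   distribution over temporal-relation classes
    rel_target:  gold temporal-relation class index
    alpha:       task-balancing weight (assumed; the paper may weight differently)
    """
    # Comment-generation loss: average token-level cross-entropy.
    gen_loss = sum(cross_entropy(p, t)
                   for p, t in zip(gen_probs, gen_targets)) / len(gen_targets)
    # Temporal-relation-prediction loss: classification cross-entropy.
    rel_loss = cross_entropy(rel_probs, rel_target)
    # Both tasks are trained jointly end-to-end through this single scalar.
    return alpha * gen_loss + (1.0 - alpha) * rel_loss
```

In an end-to-end setup, backpropagating through this single scalar updates the shared multimodal encoder from both task heads at once, which is the point of training the tasks jointly rather than in separate stages.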
Pages: 2599-2607
Page count: 9
Cited References
36 records
[21]  
Lei Jie, 2019, TVQA+: Spatio-temporal grounding for video question answering
[22]   Visual-Textual Emotion Analysis With Deep Coupled Video and Danmu Neural Networks [J].
Li, Chenchen ;
Wang, Jialin ;
Wang, Hongwei ;
Zhao, Miao ;
Li, Wenjie ;
Deng, Xiaotie .
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (06) :1634-1646
[23]  
Lu JS, 2016, ADV NEUR IN, V29
[24]  
Ma SM, 2019, AAAI CONF ARTIF INTE, P6810
[25]   BLEU: a method for automatic evaluation of machine translation [J].
Papineni, K ;
Roukos, S ;
Ward, T ;
Zhu, WJ .
40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2002, :311-318
[26]  
Sutskever I, 2014, ADV NEUR IN, V27
[27]   COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis [J].
Tang, Yansong ;
Ding, Dajun ;
Rao, Yongming ;
Zheng, Yu ;
Zhang, Danyang ;
Zhao, Lili ;
Lu, Jiwen ;
Zhou, Jie .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :1207-1216
[28]  
Vedantam R, 2015, PROC CVPR IEEE, P4566, DOI 10.1109/CVPR.2015.7299087
[29]  
Vinyals O, 2015, PROC CVPR IEEE, P3156, DOI 10.1109/CVPR.2015.7298935
[30]   Beyond the Watching: Understanding Viewer Interactions in Crowdsourced Live Video Broadcasting Services [J].
Wang, Xiaodong ;
Tian, Ye ;
Lan, Rongheng ;
Yang, Wen ;
Zhang, Xinming .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (11) :3454-3468