VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation

Cited by: 9

Authors:

Wang, Weiying [1]

Chen, Jieting [1]

Jin, Qin [1]

Affiliation:

[1] Renmin University of China, School of Information, Beijing, People's Republic of China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China

Keywords:

danmaku dataset; comments generation; multimodal interaction;

DOI:

10.1145/3394171.3413890

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Live video interactive commenting, a.k.a. danmaku, is an emerging social feature on online video sites, which involves rich multimodal information interaction among viewers. In order to support various related research, we build a large-scale video interactive comments dataset called VideoIC, which consists of 4951 videos spanning 557 hours and 5 million comments. Videos are collected from popular categories on the "Bilibili" video streaming website. Compared with existing danmaku datasets, VideoIC contains richer and denser comment information, with 1077 comments per video on average. High comment density and diverse video types make VideoIC a challenging corpus for research tasks such as automatic video comment generation. We also propose a novel model based on multimodal multitask learning for comment generation (MML-CG), which integrates multiple modalities to achieve effective comment generation and temporal relation prediction. A multitask loss function is designed to train both tasks jointly in an end-to-end manner. We conduct extensive experiments on both the VideoIC and Livebot datasets. The results prove the effectiveness of our model and reveal some features of danmaku.


Pages: 2599-2607

Page count: 9
