Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting

Cited by: 3
Authors
Fu, Fengyi [1 ]
Fang, Shancheng [1 ]
Chen, Weidong [1 ]
Mao, Zhendong [1 ]
Affiliations
[1] University of Science and Technology of China, 100 Fuxing Rd, Hefei 230000, Anhui, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Automatic live video commenting; multi-modal learning; variational autoencoder; batch attention mechanism
DOI
10.1145/3633334
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, and related tasks. However, current methods do not consider the sentiment diversity of the generated comments. Sentiment factors are critical in interactive commenting, yet they have received little research attention so far. Thus, in this article, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to generate diverse video comments with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder combines a variational autoencoder with a random mask mechanism to achieve semantic diversity under sentiment guidance; its output is then fused with cross-modal features to generate live video comments. We also propose a batch attention module to alleviate the problem of missing sentiment samples, which is caused by the data imbalance that is common in live videos, as the popularity of videos varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms state-of-the-art methods in terms of both the quality and the diversity of the generated comments. The related code is available at https://github.com/fufy1024/So-TVAE.
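The abstract names the two modules only at a high level. As a rough illustration of the ideas, below is a minimal PyTorch sketch of (a) a sentiment-conditioned VAE encoder with random token masking and (b) attention applied across the batch dimension. All class names, dimensions, and design details here are assumptions made for illustration, not the authors' implementation; the actual code is at the GitHub link above.

```python
import torch
import torch.nn as nn


class SentimentDiversityEncoder(nn.Module):
    """Sketch (assumed design) of a sentiment-oriented diversity encoder:
    a Transformer encoder whose Gaussian latent is conditioned on a
    sentiment embedding, with random token masking for semantic diversity."""

    def __init__(self, vocab_size, d_model=512, n_sentiments=3,
                 latent_dim=64, mask_prob=0.15):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.sentiment_embed = nn.Embedding(n_sentiments, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.mask_prob = mask_prob

    def forward(self, tokens, sentiment):
        x = self.embed(tokens)                              # (B, T, D)
        if self.training:
            # Random mask: drop each token embedding with prob. mask_prob.
            keep = torch.rand(x.shape[:2], device=x.device) > self.mask_prob
            x = x * keep.unsqueeze(-1)
        s = self.sentiment_embed(sentiment).unsqueeze(1)    # (B, 1, D) guidance
        h = self.encoder(torch.cat([s, x], dim=1)).mean(dim=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar  # mu/logvar feed the usual KL term of the VAE loss


class BatchAttention(nn.Module):
    """Sketch (assumed design) of batch attention: each sample attends over
    the other samples in the mini-batch, so under-represented sentiment
    samples can borrow information from related ones, which is one plausible
    way to mitigate sentiment-data imbalance."""

    def __init__(self, d_model=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, feats):            # feats: (B, D) pooled sample features
        x = feats.unsqueeze(0)           # view the batch axis as a sequence: (1, B, D)
        out, _ = self.attn(x, x, x)
        return feats + out.squeeze(0)    # residual connection


if __name__ == "__main__":
    enc = SentimentDiversityEncoder(vocab_size=1000)
    tokens = torch.randint(0, 1000, (4, 12))   # 4 comments, 12 tokens each
    sentiment = torch.tensor([0, 1, 2, 1])     # e.g., negative/neutral/positive
    z, mu, logvar = enc(tokens, sentiment)     # z: (4, 64)
    fused = BatchAttention()(torch.randn(4, 512))
    print(z.shape, fused.shape)
```

In this sketch, sampling several z vectors for the same video and sentiment label would yield several distinct comments, which is the diversity behavior the abstract describes; the fused output would then condition a decoder on the cross-modal video features.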
Pages: 24