Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting

Cited by: 3
Authors
Fu, Fengyi [1]
Fang, Shancheng [1]
Chen, Weidong [1]
Mao, Zhendong [1]
Affiliations
[1] University of Science and Technology of China, 100 Fuxing Rd, Hefei 230000, Anhui, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Automatic live video commenting; multi-modal learning; variational autoencoder; batch attention mechanism
DOI
10.1145/3633334
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, and related tasks. However, existing methods do not consider the sentiment diversity of the generated comments. Sentiment is critical in interactive commenting, yet it has so far received little research attention. In this article, we therefore propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, consisting of a sentiment-oriented diversity encoder module and a batch attention module, which generates diverse video comments with multiple sentiments and multiple semantics. Specifically, the sentiment-oriented diversity encoder elegantly combines a VAE with a random mask mechanism to achieve semantic diversity under sentiment guidance, and its output is fused with cross-modal features to generate live video comments. We also propose a batch attention module to alleviate the problem of missing sentimental samples, caused by the data imbalance that is common in live videos since video popularity varies widely. Extensive experiments on the Livebot and VideoIC datasets demonstrate that So-TVAE outperforms state-of-the-art methods in both the quality and the diversity of generated comments. Related code is available at https://github.com/fufy1024/So-TVAE.
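To make the encoder idea in the abstract concrete, below is a minimal PyTorch sketch of a sentiment-conditioned VAE encoder with a random mask mechanism. It illustrates the general technique only: the class name, feature dimensions, masking scheme, and sentiment embedding are all assumptions for illustration, not the authors' actual So-TVAE implementation.

```python
# Hypothetical sketch of a sentiment-conditioned VAE encoder with random
# masking; all names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class SentimentDiversityEncoder(nn.Module):
    """Encodes comment features into a sentiment-conditioned latent code.

    A random mask drops parts of the input so that different draws of the
    latent variable z yield semantically diverse comments for one sentiment.
    """

    def __init__(self, feat_dim=512, latent_dim=128, num_sentiments=3, mask_prob=0.15):
        super().__init__()
        self.sentiment_emb = nn.Embedding(num_sentiments, feat_dim)
        self.mask_prob = mask_prob
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, comment_feat, sentiment_id):
        # comment_feat: (batch, feat_dim); sentiment_id: (batch,)
        if self.training:
            # Randomly zero out feature dimensions (the "random mask").
            mask = (torch.rand_like(comment_feat) > self.mask_prob).float()
            comment_feat = comment_feat * mask
        # Condition the encoding on the target sentiment.
        h = comment_feat + self.sentiment_emb(sentiment_id)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL term against a standard normal prior, added to the VAE loss.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl
```

At generation time, sampling several z values for the same video and sentiment label would yield the multiple-semantics behavior the abstract describes.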
Pages: 24
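Likewise, the batch attention module described in the abstract can be pictured as attention computed across the samples of a mini-batch rather than across sequence positions, letting under-represented sentiments borrow features from related samples. The sketch below is a hedged reading of that idea under assumed names and shapes, not the paper's exact module.

```python
# Hypothetical sketch of batch attention: each sample attends over the
# other samples in the mini-batch to compensate for sentiment imbalance.
import torch
import torch.nn as nn

class BatchAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, dim) -- one fused feature vector per sample.
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Attention over the batch dimension instead of a sequence.
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (batch, batch)
        return x + attn @ v  # residual connection keeps per-sample identity
```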