Semantic similarity information discrimination for video captioning

Cited by: 3
Authors
Du, Sen [1 ]
Zhu, Hong [1 ]
Xiong, Ge [1 ]
Lin, Guangfeng [2 ]
Wang, Dong [1 ]
Shi, Jing [1 ]
Wang, Jing [2 ]
Xing, Nan [1 ]
Affiliations
[1] Xian Univ Technol, Sch Automation & Informat Engn, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[2] Xian Univ Technol, Informat Sci Dept, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
Keywords
Video captioning; Semantic detection; Bilinear pooling; Channel attention; Natural language processing; NETWORK;
D O I
10.1016/j.eswa.2022.118985
CLC classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning aims to automatically describe objects and their actions in videos using natural language sentences. A correct understanding of visual and linguistic information is critical for this task. Many existing methods fuse different features to generate sentences; however, the generated sentences often contain improper nouns and verbs. Inspired by the successes of fine-grained visual recognition, we address the problem of improper words by discriminating semantic similarity information. In this paper, we design a semantic bilinear block (SBB) that widens the gap between the probabilities of existing and nonexistent words, capturing more fine-grained features to discriminate semantic information. Moreover, our linear attention block (LAB) implements channel-wise attention for 1-D features by simplifying the squeeze-and-excitation structure. Furthermore, we design a semantic discrimination network (SDN) that integrates the LAB and SBB into the video encoder and decoder to leverage channel-wise attention and discriminate semantic similarity information for better video captioning. Experiments on two widely used datasets, MSVD and MSR-VTT, demonstrate that our proposed SDN outperforms state-of-the-art methods.
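The abstract describes the linear attention block only at a high level: channel-wise attention for a 1-D feature obtained by simplifying the squeeze-and-excitation structure. The sketch below illustrates that general idea, not the paper's actual implementation; the function name, dimensions, reduction ratio, and random weights are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear_attention_block(feat, w1, w2):
    """Hypothetical sketch of channel-wise attention on a 1-D feature.

    In squeeze-and-excitation terms: the input is already a 1-D vector,
    so no spatial squeeze (pooling) is needed; an excitation MLP with a
    channel-reduction bottleneck produces per-channel gates in (0, 1)
    that rescale the input feature.
    """
    hidden = np.maximum(0.0, feat @ w1)   # reduction layer + ReLU
    gates = sigmoid(hidden @ w2)          # per-channel weights in (0, 1)
    return feat * gates                   # channel-wise reweighting

# Toy usage with assumed sizes: 8 channels, reduction ratio 2.
rng = np.random.default_rng(0)
C, r = 8, 2
feat = rng.standard_normal(C)
w1 = rng.standard_normal((C, C // r))
w2 = rng.standard_normal((C // r, C))
out = linear_attention_block(feat, w1, w2)
```

Because each gate lies strictly between 0 and 1, the block can only attenuate channels, never amplify them; the learned weights decide which channels of the video feature are suppressed.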
Pages: 12
Related Papers
50 records
  • [1] Video Captioning with Semantic Guiding
    Yuan, Jin
    Tian, Chunna
    Zhang, Xiangnan
    Ding, Yuxuan
    Wei, Wei
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [2] Video Captioning with Visual and Semantic Features
    Lee, Sujin
    Kim, Incheol
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2018, 14 (06): : 1318 - 1330
  • [3] Incorporating Textual Similarity in Video Captioning Schemes
    Gkountakos, Konstantinos
    Dimou, Anastasios
    Papadopoulos, Georgios Th.
    Daras, Petros
    2019 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND INNOVATION (ICE/ITMC), 2019,
  • [4] Semantic Enhanced Video Captioning with Multi-feature Fusion
    Niu, Tian-Zi
    Dong, Shan-Shan
    Chen, Zhen-Duo
    Luo, Xin
    Guo, Shanqing
    Huang, Zi
    Xu, Xin-Shun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [5] Improving distinctiveness in video captioning with text-video similarity
    Velda, Vania
    Immanuel, Steve Andreas
    Hendria, Willy Fitra
    Jeong, Cheol
    IMAGE AND VISION COMPUTING, 2023, 136
  • [6] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [7] Adaptive semantic guidance network for video captioning
    Liu, Yuanyuan
    Zhu, Hong
    Wu, Zhong
    Du, Sen
    Wu, Shuning
    Shi, Jing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [8] Chained semantic generation network for video captioning
    Mao, L.
    Gao, H.
    Yang, D.
    Zhang, R.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2022, 30 (24): : 3198 - 3209
  • [9] MULTIMODAL SEMANTIC ATTENTION NETWORK FOR VIDEO CAPTIONING
    Sun, Liang
    Li, Bing
    Yuan, Chunfeng
    Zha, Zhengjun
    Hu, Weiming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1300 - 1305
  • [10] Discriminative Latent Semantic Graph for Video Captioning
    Bai, Yang
    Wang, Junyan
    Long, Yang
    Hu, Bingzhang
    Song, Yang
    Pagnucco, Maurice
    Guan, Yu
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3556 - 3564