Semantic similarity information discrimination for video captioning

Cited by: 4
Authors
Du, Sen [1]
Zhu, Hong [1]
Xiong, Ge [1]
Lin, Guangfeng [2]
Wang, Dong [1]
Shi, Jing [1]
Wang, Jing [2]
Xing, Nan [1]
Affiliations
[1] Xian Univ Technol, Sch Automation & Informat Engn, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[2] Xian Univ Technol, Informat Sci Dept, 5 South Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
Keywords
Video captioning; Semantic detection; Bilinear pooling; Channel attention; Natural language processing; NETWORK
DOI
10.1016/j.eswa.2022.118985
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video captioning aims to automatically describe the objects in a video and their actions in natural-language sentences. A correct understanding of both visual and linguistic information is critical to the task. Many existing methods fuse different features to generate sentences, yet the generated sentences still contain many improper nouns and verbs. Inspired by the success of fine-grained visual recognition, we treat the problem of improper words as one of discriminating semantically similar information. In this paper, we design a semantic bilinear block (SBB) that widens the gap between the probabilities of existing and nonexistent words and can capture more fine-grained features for discriminating semantic information. Moreover, our linear attention block (LAB) implements channelwise attention for 1-D features by simplifying the squeeze-and-excitation structure. Furthermore, we design a semantic discrimination network (SDN) that integrates the LAB and SBB into the video encoder and decoder, leveraging channelwise attention and discriminating semantic similarity information for better video captioning. Experiments on two widely used datasets, MSVD and MSR-VTT, demonstrate that the proposed SDN achieves better performance than state-of-the-art methods.
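For readers unfamiliar with channelwise attention on a 1-D feature, the following is a minimal sketch of generic squeeze-and-excitation-style gating applied to a single feature vector. It is not the paper's LAB, whose exact layers are not given in this record. The function names, the bottleneck size, and the random weights are all illustrative assumptions. Because the input is already a vector, no spatial "squeeze" step is needed and only the gating remains.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def channel_gate(x, w1, w2):
    """SE-style gating of a 1-D feature vector (hypothetical helper)."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]     # bottleneck projection + ReLU
    gates = [sigmoid(g) for g in matvec(w2, hidden)]  # one gate in (0, 1) per channel
    return [g * x_i for g, x_i in zip(gates, x)]      # rescale each channel


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, reduced = 8, 2  # assumed sizes, chosen only for illustration
x = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
w1 = random_matrix(reduced, channels, rng)  # untrained stand-in weights
w2 = random_matrix(channels, reduced, rng)
y = channel_gate(x, w1, w2)
```

Each output channel is the input channel scaled by a learned weight between 0 and 1, so the gate can suppress uninformative channels but never flips a sign or amplifies a value.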
Pages: 12