Video Captioning with Visual and Semantic Features

Cited by: 5
Authors
Lee, Sujin [1 ]
Kim, Incheol [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source
JOURNAL OF INFORMATION PROCESSING SYSTEMS | 2018, Vol. 14, No. 6
Keywords
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning;
DOI
10.3745/JIPS.02.0098
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Video captioning is the process of extracting features from a video and generating captions from those features. This paper introduces a deep neural network model and its training method for effective video captioning. The model uses both visual features and semantic features that effectively represent the video. Visual features are extracted with convolutional neural networks such as C3D and ResNet, while semantic features are extracted with a semantic feature extraction network proposed in this paper. An attention-based caption generation network is then proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through experiments on two large-scale video benchmarks: the Microsoft Video Description (MSVD) and the Microsoft Research Video-To-Text (MSR-VTT) datasets.
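The attention-based caption generation the abstract describes (soft attention over frame-level visual features, fused with a semantic feature vector before word prediction) can be sketched roughly as below. This is a minimal NumPy illustration, not the paper's exact architecture: the dimensions, the projection matrices `W_v`, `W_h`, `W_s`, `w_a`, and the additive (Bahdanau-style) attention form are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T frames, d-dim visual features,
# k-dim semantic vector, h-dim decoder hidden state.
T, d, k, h = 8, 16, 10, 32

V = rng.normal(size=(T, d))   # frame-level visual features (e.g., from C3D/ResNet)
s = rng.normal(size=(k,))     # semantic feature vector (e.g., tag probabilities)
h_t = rng.normal(size=(h,))   # decoder hidden state at the current step

# Hypothetical projection matrices; these would be learned in practice.
W_v = rng.normal(size=(h, d))
W_h = rng.normal(size=(h, h))
w_a = rng.normal(size=(h,))
W_s = rng.normal(size=(h, k))

# Additive attention scores over the T frames, conditioned on h_t.
scores = np.tanh(V @ W_v.T + h_t @ W_h.T) @ w_a   # shape (T,)

# Softmax normalization: attention weights over frames, summing to 1.
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

# Attended visual context: weighted average of frame features.
context = alpha @ V                               # shape (d,)

# Fuse the visual context with the semantic feature; this fused vector
# would feed the word-prediction softmax at each decoding step.
fused = np.tanh(W_v @ context + W_s @ s)          # shape (h,)
```

At each decoding step the weights `alpha` are recomputed from the current hidden state, so the decoder can attend to different frames while emitting different words of the caption.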
Pages: 1318-1330 (13 pages)
Related Papers
50 items
  • [1] Attentive Visual Semantic Specialized Network for Video Captioning
    Perez-Martin, Jesus
    Bustos, Benjamin
    Perez, Jorge
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5767 - 5774
  • [2] Richer Semantic Visual and Language Representation for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Wang, Hanzhang
    Xu, Kaisheng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1871 - 1876
  • [3] Video Captioning with Semantic Guiding
    Yuan, Jin
    Tian, Chunna
    Zhang, Xiangnan
    Ding, Yuxuan
    Wei, Wei
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [4] Chained semantic generation network for video captioning
    Mao, Lu
    Gao, Hong
    Yang, Dong
    Zhang, Rui
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2022, 30 (24): : 3198 - 3209
  • [5] Modeling Context-Guided Visual and Linguistic Semantic Feature for Video Captioning
    Sun, Zhixin
    Zhong, Xian
    Chen, Shuqin
    Liu, Wenxuan
    Feng, Duxiu
    Li, Lin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 677 - 689
  • [6] Semantic Embedding Guided Attention with Explicit Visual Feature Fusion for Video Captioning
    Dong, Shanshan
    Niu, Tianzi
    Luo, Xin
    Liu, Wu
    Xu, Xinshun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [7] Multi-Level Visual Representation with Semantic-Reinforced Learning for Video Captioning
    Dong, Chengbo
    Chen, Xinru
    Chen, Aozhu
    Hu, Fan
    Wang, Zihan
    Li, Xirong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4750 - 4754
  • [8] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [9] Adaptive semantic guidance network for video captioning
    Liu, Yuanyuan
    Zhu, Hong
    Wu, Zhong
    Du, Sen
    Wu, Shuning
    Shi, Jing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [10] MULTIMODAL SEMANTIC ATTENTION NETWORK FOR VIDEO CAPTIONING
    Sun, Liang
    Li, Bing
    Yuan, Chunfeng
    Zha, Zhengjun
    Hu, Weiming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1300 - 1305