Bilingual video captioning model for enhanced video retrieval

Cited by: 1
Authors
Alrebdi, Norah [1 ]
Al-Shargabi, Amal A. [1 ]
Affiliations
[1] Qassim Univ, Coll Comp, Dept Informat Technol, Buraydah 51452, Saudi Arabia
Keywords
Artificial intelligence; Computer vision; Natural language processing; Video retrieval; English video captioning; Arabic video captioning; LANGUAGE; NETWORK; VISION; TEXT;
DOI
10.1186/s40537-024-00878-w
CLC classification number
TP301 [theory and methods];
Discipline code
081202 ;
Abstract
Many video platforms rely on the descriptions that uploaders provide for video retrieval. However, this reliance may cause inaccuracies. Although deep learning-based video captioning can resolve this problem, it has some limitations: (1) traditional keyframe extraction techniques do not consider video length or content, resulting in low accuracy, high storage requirements, and long processing times; (2) Arabic language support in video captioning is not extensive. This study proposes a new video captioning approach that uses an efficient keyframe extraction method and supports both Arabic and English. The proposed keyframe extraction technique combines time- and content-based approaches to produce better-quality captions, lower storage requirements, and faster processing. The English and Arabic models use a sequence-to-sequence framework with long short-term memory (LSTM) in both the encoder and decoder. Both models were evaluated on caption quality using four metrics: Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Consensus-based Image Description Evaluation (CIDEr). They were also evaluated using cosine similarity to determine their suitability for video retrieval. The results demonstrated that the English model performed better with regard to both caption quality and video retrieval. In terms of BLEU, METEOR, ROUGE-L, and CIDEr, the English model scored 47.18, 30.46, 62.07, and 59.98, respectively, whereas the Arabic model scored 21.65, 36.30, 44.897, and 45.52, respectively. In the video retrieval evaluation, the English and Arabic models successfully retrieved 67% and 40% of the videos, respectively, at a 20% similarity threshold. These models have potential applications in storytelling, sports commentaries, and video surveillance.
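The two pipeline stages the abstract describes — time- and content-based keyframe selection, and caption-based retrieval via cosine similarity — can be illustrated with a minimal sketch. This is not the authors' implementation: the frame-difference criterion, the bag-of-words caption representation, and all function names are illustrative assumptions; only the 20% similarity threshold comes from the paper.

```python
import math
from collections import Counter


def select_keyframes(frames, time_step=10, diff_threshold=0.1):
    """Illustrative time- and content-based selection: sample every
    `time_step`-th frame (time-based), and keep it only if it differs
    enough from the last kept frame (content-based)."""
    kept, last = [], None
    for i in range(0, len(frames), time_step):
        frame = frames[i]
        if last is None or _mean_abs_diff(frame, last) > diff_threshold:
            kept.append(i)
            last = frame
    return kept


def _mean_abs_diff(a, b):
    # Mean absolute pixel difference between two flattened frames.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine_similarity(a, b):
    """Cosine similarity of two captions as bag-of-words vectors
    (an assumed representation, not necessarily the paper's)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def retrieve(query, captions, threshold=0.2):
    """Return IDs of videos whose generated caption meets the similarity
    threshold; the paper reports retrieval at 20% similarity."""
    scored = [(vid, cosine_similarity(query, cap))
              for vid, cap in captions.items()]
    return [vid for vid, s in sorted(scored, key=lambda x: -x[1])
            if s >= threshold]
```

For example, with generated captions `{"v1": "a man is playing football", "v2": "a cat sits on a sofa"}`, the query "man playing football" retrieves only `v1` at the 0.2 threshold.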
Pages: 24
Related papers
50 records total
  • [1] Bilingual video captioning model for enhanced video retrieval
    Norah Alrebdi
    Amal A. Al-Shargabi
    Journal of Big Data, 11
  • [2] Rethink video retrieval representation for video captioning
    Tian, Mingkai
    Li, Guorong
    Qi, Yuankai
    Wang, Shuhui
    Sheng, Quan Z.
    Huang, Qingming
    PATTERN RECOGNITION, 2024, 156
  • [3] A Sentence Retrieval Generation Network Guided Video Captioning
    Ye, Ou
    Wang, Mimi
    Yu, Zhenhua
    Fu, Yan
    Yi, Shun
    Deng, Jun
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 5675 - 5696
  • [4] Retrieval-augmented Video Encoding for Instructional Captioning
    Jung, Yeonjoon
    Kim, Minsoo
    Choi, Seungtaek
    Seo, Minji
    Hwang, Seung-won
    Kim, Jihyuk
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8554 - 8568
  • [5] Center-enhanced video captioning model with multimodal semantic alignment
    Zhang, Benhui
    Gao, Junyu
    Yuan, Yuan
    NEURAL NETWORKS, 2024, 180
  • [6] A Deep Structured Model for Video Captioning
    Vinodhini, V.
    Sathiyabhama, B.
    Sankar, S.
    Somula, Ramasubbareddy
    INTERNATIONAL JOURNAL OF GAMING AND COMPUTER-MEDIATED SIMULATIONS, 2020, 12 (02) : 44 - 56
  • [7] Hierarchical Video-Moment Retrieval and Step-Captioning
    Zola, Abhay
    Cho, Jaemin
    Kottur, Satwik
    Chen, Xilun
    Oguz, Barlas
    Mehdad, Yashar
    Bansal, Mohit
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23056 - 23065
  • [8] Learning Text-to-Video Retrieval from Image Captioning
    Ventura, Lucas
    Schmid, Cordelia
    Varol, Gul
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854
  • [9] Contrastive topic-enhanced network for video captioning
    Zeng, Yawen
    Wang, Yiru
    Liao, Dongliang
    Li, Gongfu
    Xu, Jin
    Man, Hong
    Liu, Bo
    Xu, Xiangmin
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [10] Multiple Videos Captioning Model for Video Storytelling
    Han, Seung-Ho
    Go, Bo-Won
    Choi, Ho-Jin
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 355 - 358