Quality Enhancement Based Video Captioning in Video Communication Systems

Times cited: 0
Authors
Le, The Van [1]
Lee, Jin Young [1]
Affiliation
[1] Sejong University, Department of Intelligent Mechatronics Engineering, Seoul 05006, South Korea
Funding
National Research Foundation of Singapore
Keywords
Quality enhancement; quality classification; super-resolution (SR); video captioning; neural networks
DOI
10.1109/ACCESS.2024.3378313
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Video captioning is the task of automatically generating natural-language sentences that describe visual content. Deep learning techniques have recently driven remarkable progress in this area. Most existing techniques focus mainly on the network architecture, whereas video quality and resolution have not been fully considered, even though they strongly affect captioning performance. Because video communication systems usually compress video, and may also down-sample it, to substantially reduce the data size, the original quality and resolution can be severely degraded. Hence, this paper analyzes the impact of compression and down-sampling on captioning and proposes a quality enhancement method for video captioning. First, the proposed method performs quality classification to determine the quality of each frame. Next, super-resolution (SR) is used to enhance the frames in terms of both quality and resolution. Finally, a video captioning network uses the enhanced frames to generate accurate sentences. Experimental results show that the proposed method drastically improves captioning performance when both the quality and the resolution of the input videos are randomly determined.
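The three-stage pipeline summarized in the abstract (quality classification, then SR, then captioning) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gradient-based `classify_quality` and its thresholds, the per-class routing dictionary `sr_models`, the nearest-neighbour `upscale_nearest` stand-in for a trained SR network, and the `captioner` callable are all hypothetical.

```python
from typing import Callable, Dict, List

Frame = List[List[float]]  # one grayscale frame, row-major pixel values
SRModel = Callable[[Frame, int], Frame]  # hypothetical: (frame, scale factor) -> enhanced frame


def sharpness(frame: Frame) -> float:
    """Mean absolute horizontal gradient: a crude, assumed proxy for frame quality."""
    total, count = 0.0, 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            total += abs(right - left)
            count += 1
    return total / count if count else 0.0


def classify_quality(frame: Frame, thresholds=(5.0, 15.0)) -> int:
    """Hypothetical 3-class quality classifier (0 = low, 1 = medium, 2 = high).

    Stands in for a learned classifier; the thresholds are arbitrary.
    """
    score = sharpness(frame)
    if score < thresholds[0]:
        return 0
    if score < thresholds[1]:
        return 1
    return 2


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Nearest-neighbour upscaling: a placeholder for a trained SR network."""
    return [
        [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        for row in frame
        for _ in range(factor)  # repeat each expanded row vertically
    ]


def enhance_and_caption(
    frames: List[Frame],
    target_width: int,
    sr_models: Dict[int, SRModel],
    captioner: Callable[[List[Frame]], str],
) -> str:
    """Classify each frame, enhance it with the SR model for its class, then caption."""
    enhanced = []
    for frame in frames:
        quality = classify_quality(frame)
        factor = max(1, target_width // len(frame[0]))  # integer scale toward target width
        enhanced.append(sr_models[quality](frame, factor))
    return captioner(enhanced)


if __name__ == "__main__":
    flat = [[50.0] * 4 for _ in range(4)]                # featureless, treated as "low quality"
    edgy = [[0.0, 100.0, 0.0, 100.0] for _ in range(4)]  # strong edges, treated as "high quality"
    models = {q: upscale_nearest for q in (0, 1, 2)}      # one placeholder model per class
    caption = enhance_and_caption(
        [flat, edgy],
        target_width=8,
        sr_models=models,
        captioner=lambda fs: f"{len(fs)} frames at {len(fs[0][0])}x{len(fs[0])}",
    )
    print(caption)  # 2 frames at 8x8
```

Routing each frame through a model chosen by its quality class is one plausible reading of how the classification and SR stages fit together; in a real system each entry of `sr_models` would presumably be a separately trained network rather than the same interpolation placeholder.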
Pages: 40989–40999
Number of pages: 11