Quality Enhancement Based Video Captioning in Video Communication Systems

Cited by: 0
Authors
Le, The Van [1]
Lee, Jin Young [1]
Affiliations
[1] Sejong Univ, Dept Intelligent Mechatron Engn, Seoul 05006, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Quality enhancement; quality classification; super-resolution (SR); video captioning; NEURAL-NETWORKS;
DOI
10.1109/ACCESS.2024.3378313
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Video captioning is the task of automatically generating natural language descriptions of visual content. It has recently made substantial progress thanks to deep learning techniques. Most techniques have focused on the deep learning network architecture, whereas video quality and resolution have not been fully considered, although they strongly affect captioning performance. Because video communication systems usually perform compression, the original video can be degraded in quality and down-sampled in resolution to significantly reduce the data size, which results in severe quality degradation. Hence, this paper analyzes the impact of compression and down-sampling on captioning, and proposes a quality enhancement method for video captioning. First, the proposed method performs quality classification to assess the quality of each frame. Next, super-resolution (SR) is used to enhance the frames in terms of quality and resolution. Finally, a video captioning network uses the enhanced frames to generate accurate sentences. Experimental results show that the proposed method drastically improves captioning performance when both the quality and the resolution of input videos are randomly determined.
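The three stages named in the abstract (quality classification, super-resolution, captioning) can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the height-based quality rule, the nearest-neighbour upscaling that stands in for a learned SR network, and the choice to enhance only frames labelled "low" are all hypothetical, since the abstract does not say how the classification result is used.

```python
from typing import Callable, List

Frame = List[List[int]]  # toy grayscale frame: rows of pixel intensities


def classify_quality(frame: Frame, min_height: int = 4) -> str:
    """Hypothetical stand-in for the quality classifier: labels by frame height only."""
    return "low" if len(frame) < min_height else "high"


def upscale_2x(frame: Frame) -> Frame:
    """Nearest-neighbour 2x upscaling, a placeholder for a learned SR network."""
    out: Frame = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat each row vertically
    return out


def enhance_then_caption(frames: List[Frame],
                         captioner: Callable[[List[Frame]], object]) -> object:
    """Classify each frame, enhance the ones labelled 'low', then run the captioner."""
    enhanced = [
        upscale_2x(f) if classify_quality(f) == "low" else f  # assumed routing rule
        for f in frames
    ]
    return captioner(enhanced)
```

Here `captioner` is any callable that takes the list of enhanced frames; in a real system it would be a trained video captioning network, and `upscale_2x` would be replaced by an SR model.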
Pages: 40989-40999
Page count: 11