A Character Position-Aware Compression Framework for Screen Text Image

Cited by: 0
Authors
Zhu, Chen [1]
Lu, Guo [1]
Chen, Huanbang [2]
Feng, Donghui [1]
Wang, Shen [1]
Zhao, Yan [1]
Xie, Rong [1]
Song, Li [1, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Huawei Technol Co Ltd, Shenzhen 518129, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
Keywords
Screen content coding; text detection; motion vector prediction and coding; in-loop filter; SCENE TEXT; SEGMENTATION; PREDICTION
DOI
10.1109/TCSVT.2024.3379675
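Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract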
Text patterns typically exhibit distinct boundaries and sparse color histograms. In current hybrid codec frameworks, however, the positions of coding units are often misaligned with the text patterns, so prediction and color mapping tools consume a large number of bits to signal these patterns. Recently proposed text detection and recognition methods can accurately locate and analyze the text regions in screen images. Building on these techniques, we propose a character position-aware compression framework for screen text images. On the encoder side, a low-complexity detection method locates the text characters, and the detected characters are then copied to positions aligned with the coding unit (CU) grid to form a text layer. This text-layer representation further increases the efficiency of existing screen content coding tools such as Intra Block Copy (IBC). We also design several compression tools based on this representation: we extend the two Motion Vector (MV) prediction modes, Adaptive Motion Vector Prediction (AMVP) and Merge; we modify the MV encoding syntax according to the layout characteristics of the text layer; and we present a Gradient-guided In-loop Filter (GIF) that sharpens the text lines using a convolutional network. Experiments on the VVC reference software VTM under the all_intra configuration show that the proposed framework achieves average bitrate savings of 4.6% with GIF and 3.6% without GIF, with corresponding increases in CPU encoding complexity of 72% and 10%.
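The grid-alignment step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the layout of the text layer, so the single-row packing, the function names `align_up` and `build_text_layer`, the `grid` size, and the `background` fill value are all assumptions made for illustration. The image is a plain list of pixel rows, and each detected character is given as an `(x, y, width, height)` box.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels; hypothetical detector output


def align_up(value: int, grid: int) -> int:
    """Round value up to the next multiple of grid."""
    return -(-value // grid) * grid


def build_text_layer(
    image: List[List[int]],
    boxes: List[Box],
    grid: int = 8,
    background: int = 255,
) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Copy each character box into a new layer so that every character
    starts at a multiple of `grid` (assumed layout: one row of equal cells)."""
    if not boxes:
        return [], []
    # Cell size: the largest character, rounded up to the grid.
    cell_w = align_up(max(w for _, _, w, _ in boxes), grid)
    cell_h = align_up(max(h for _, _, _, h in boxes), grid)
    layer = [[background] * (cell_w * len(boxes)) for _ in range(cell_h)]
    placements = []
    for i, (x, y, w, h) in enumerate(boxes):
        dst_x = i * cell_w  # always a multiple of grid
        for row in range(h):
            layer[row][dst_x:dst_x + w] = image[y + row][x:x + w]
        placements.append((dst_x, 0))
    return layer, placements


# Usage: two "characters" at unaligned positions in a 6x10 image.
image = [[r * 10 + c for c in range(10)] for r in range(6)]
boxes = [(1, 1, 3, 4), (5, 2, 2, 3)]
layer, placements = build_text_layer(image, boxes, grid=4)
```

In this toy layout every character begins on a grid boundary, which is the property the abstract credits with making block-based tools such as IBC more efficient. A real encoder would also need to signal the original positions so the decoder can restore the characters.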
Pages: 8821-8835
Number of pages: 15