A Character Position-Aware Compression Framework for Screen Text Image

Authors
Zhu, Chen [1]
Lu, Guo [1]
Chen, Huanbang [2]
Feng, Donghui [1]
Wang, Shen [1]
Zhao, Yan [1]
Xie, Rong [1]
Song, Li [1, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Huawei Technol Co Ltd, Shenzhen 518129, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
Keywords
Screen content coding; text detection; motion vector prediction and coding; in-loop filter; SCENE TEXT; SEGMENTATION; PREDICTION
DOI
10.1109/TCSVT.2024.3379675
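Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract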
Text patterns typically exhibit distinct boundaries and sparse color histograms. In current hybrid codec frameworks, however, the positions of coding units are often misaligned with the text patterns, so prediction and color mapping tools consume a large number of bits to signal these patterns. Recent text detection and recognition methods can accurately locate and analyze the text regions in screen images. Building on these techniques, we propose a character position-aware compression framework for screen text images. On the encoder side, a low-complexity detection method locates the text characters, and the detected characters are then copied to positions aligned with the coding unit (CU) grid to form a text layer. This text-layer representation further increases the efficiency of existing screen content coding tools such as Intra Block Copy (IBC). We also design several compression tools based on this representation: we extend two Motion Vector (MV) prediction modes, Adaptive Motion Vector Prediction (AMVP) and Merge; we modify the MV coding syntax according to the layout characteristics of the text layer; and we present a Gradient-guided In-loop Filter (GIF) that sharpens text lines using a convolutional network. Experiments on the VVC reference software VTM under the all_intra configuration show that the proposed framework achieves average bitrate savings of 4.6% with GIF and 3.6% without GIF, with corresponding increases in CPU encoding complexity of 72% and 10%.
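The text-layer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `align_up` and `build_text_layer`, the 8-pixel CU size, the left-to-right row packing, and the `(x, y, w, h)` box format are all assumptions chosen for illustration, and the abstract does not specify how the actual layout is formed. The sketch only shows the general idea of copying detected character boxes into slots whose top-left corners fall on a CU grid.

```python
def align_up(value, cu):
    """Round value up to the next multiple of the (assumed) CU size."""
    return (value + cu - 1) // cu * cu


def build_text_layer(image, boxes, cu=8, layer_width=64, background=0):
    """Copy each character box into a CU-aligned slot of a new layer.

    image: 2D list of pixel values (rows of ints).
    boxes: list of (x, y, w, h) character boxes, assumed to lie inside
           the image and to come from some external text detector.
    Returns (layer, placements); placements pairs each source box with
    the grid-aligned top-left corner (dst_x, dst_y) it was copied to.
    The row-by-row packing is a hypothetical layout, not the paper's.
    """
    placements = []
    cur_x, cur_y, row_h = 0, 0, 0
    for (x, y, w, h) in boxes:
        slot_w, slot_h = align_up(w, cu), align_up(h, cu)
        if slot_w > layer_width:
            raise ValueError("character box wider than the text layer")
        if cur_x + slot_w > layer_width:
            # Current row is full: start a new row of slots.
            cur_x, cur_y, row_h = 0, cur_y + row_h, 0
        placements.append(((x, y, w, h), (cur_x, cur_y)))
        cur_x += slot_w
        row_h = max(row_h, slot_h)

    layer_height = cur_y + row_h
    layer = [[background] * layer_width for _ in range(layer_height)]
    for (x, y, w, h), (dx, dy) in placements:
        for r in range(h):
            # Copy one row of the character into its aligned slot.
            layer[dy + r][dx:dx + w] = image[y + r][x:x + w]
    return layer, placements
```

Because every slot starts at a multiple of the CU size, identical characters land at the same offset within their coding units, which is the property the abstract credits with making block-matching tools such as IBC more effective.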
Pages: 8821 - 8835
Page count: 15