HGR-Net: Hierarchical Graph Reasoning Network for Arbitrary Shape Scene Text Detection

Times cited: 9
Authors
Bi, Hengyue [1]
Xu, Canhui [1]
Shi, Cao [1]
Liu, Guozhu [1]
Zhang, Honghong [1]
Li, Yuteng [1]
Dong, Junyu [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266100, Peoples R China
[2] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; arbitrary shape text; hierarchical relation modeling; graph convolutional network
DOI
10.1109/TIP.2023.3294822
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
As a prerequisite step of scene text reading, scene text detection is a challenging task because of the diversity and variability of text in natural scenes. Most existing methods either adopt bottom-up sub-text component extraction or focus on top-down text contour regression. From a hybrid perspective, we explore hierarchical instance-level and component-level text representations for arbitrarily-shaped scene text detection. In this work, we propose a novel Hierarchical Graph Reasoning Network (HGR-Net), which consists of a Text Feature Extraction Network (TFEN) and a Text Relation Learner Network (TRLN). TFEN adaptively learns multi-grained text candidates, namely instance-level text contours and component-level quadrangles, from shared convolutional feature maps. In TRLN, an inter-text graph is constructed to explore position-aware global contextual information between text instances, and an intra-text graph is designed to estimate geometric attributes for establishing component-level linkages. We then bridge cross-feed interaction between the instance level and the component level, which enables hierarchical relational reasoning by learning complementary graph embeddings across levels. Experiments on three public benchmarks, SCUT-CTW1500, Total-Text, and ICDAR15, demonstrate that HGR-Net achieves state-of-the-art performance on arbitrary-orientation and arbitrary-shape scene text detection.
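The abstract does not give HGR-Net's layer equations. To illustrate the kind of graph reasoning it describes, the sketch below builds a position-aware inter-text graph over text-instance nodes and applies one graph-convolution step. It is a minimal sketch under stated assumptions, not the authors' formulation. The Gaussian-of-centre-distance edge weights (`proximity_adjacency`, `sigma`), the toy node features, and the standard symmetric-normalised GCN update (Kipf and Welling style) are all illustrative choices. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def proximity_adjacency(centers: List[Tuple[float, float]], sigma: float) -> Matrix:
    """Hypothetical position-aware edge weights between text instances.

    Weight = exp(-d^2 / (2 * sigma^2)), where d is the distance between the
    centres of two instances. The diagonal is exp(0) = 1.0, which acts as the
    self-loop term (A + I) in a standard GCN.
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            adj[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^{-1/2} A D^{-1/2}, with D the row-sum degrees."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(feats: Matrix, adj_norm: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # Toy example (assumed data): two nearby instances and one far away.
    centers = [(10.0, 10.0), (12.0, 10.0), (200.0, 150.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for readability
    adj_hat = normalize(proximity_adjacency(centers, sigma=5.0))
    out = gcn_layer(feats, adj_hat, weight)
    # The far-away third instance keeps its own features: [1.0, 1.0]
    print(out[2])
    # The two nearby instances mix their features through the shared edge.
    print(out[0], out[1])
```

In this sketch, nearby instances exchange context while a distant instance is effectively isolated, because its edge weights underflow to zero. Symmetric normalisation keeps the aggregated feature scale stable regardless of how many neighbours a node has. A component-level (intra-text) graph could reuse the same update with quadrangle components as nodes, but how HGR-Net defines those edges and the cross-level interaction is not specified here.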
Pages: 4142-4155
Page count: 14