HGR-Net: Hierarchical Graph Reasoning Network for Arbitrary Shape Scene Text Detection

Cited by: 9
Authors
Bi, Hengyue [1]
Xu, Canhui [1]
Shi, Cao [1]
Liu, Guozhu [1]
Zhang, Honghong [1]
Li, Yuteng [1]
Dong, Junyu [2]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266100, Peoples R China
[2] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; arbitrary shape text; hierarchical relation modeling; graph convolutional network
DOI
10.1109/TIP.2023.3294822
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a prerequisite step of scene text reading, scene text detection is a challenging task because of the diversity and variability of text in natural scenes. Most existing methods either adopt bottom-up sub-text component extraction or focus on top-down text contour regression. From a hybrid perspective, we explore hierarchical instance-level and component-level text representations for arbitrarily shaped scene text detection. In this work, we propose a novel Hierarchical Graph Reasoning Network (HGR-Net), which consists of a Text Feature Extraction Network (TFEN) and a Text Relation Learner Network (TRLN). TFEN adaptively learns multi-grained text candidates from shared convolutional feature maps, including instance-level text contours and component-level quadrangles. In TRLN, an inter-text graph is constructed to explore global, position-aware contextual information between text instances, and an intra-text graph is designed to estimate geometric attributes for establishing component-level linkages. We then bridge the cross-feed interaction between the instance level and the component level, which further enables hierarchical relational reasoning by learning complementary graph embeddings across levels. Experiments on three publicly available benchmarks, SCUT-CTW1500, Total-Text, and ICDAR15, demonstrate that HGR-Net achieves state-of-the-art performance on arbitrary-orientation and arbitrary-shape scene text detection.
Pages: 4142-4155
Number of pages: 14