FSANet: Feature shuffle and adaptive channel attention network for arbitrary shape scene text detection

Cited by: 0
Authors
Xu, Juan [1]
Wang, Runmin [1]
Hei, Jielei [1]
Cao, Xiaofei [1]
Wan, Zukun [1]
Yu, Congzhen [1]
Ding, Yajun [1]
Gao, Changxin [3]
Qian, Shengyou [2]
Affiliations
[1] Hunan Normal Univ, Sch Informat Sci & Engn, Changsha 410081, Hunan, Peoples R China
[2] Hunan Normal Univ, Sch Phys & Elect Sci, Changsha 410081, Hunan, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; Arbitrary shape text; Feature shuffle; Attention mechanism; NET
DOI
10.1016/j.neucom.2025.129443
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Natural scene text detection has made significant progress in the era of deep learning. However, existing methods still struggle with challenges such as complex backgrounds, extreme aspect ratios, and arbitrary-shaped text. To address these issues, we propose a segmentation-based Feature Shuffle Attention Network (FSANet) designed to enhance high-resolution feature extraction and multi-scale feature enhancement for robust detection of arbitrary-shaped text. FSANet comprises two principal modules: (1) the High-Resolution Feature Extraction Network (FEN), which employs two Group Shuffle Blocks (GSBs) to preserve high-resolution details and promote feature interaction and information flow, and (2) the Adaptive Channel Attention Module (ACAM), which suppresses background noise and redundant features by adaptively learning inter-feature correlations across scales and assigning weights that prioritize local features within a global context. Extensive experiments on four public benchmarks show that, compared to the baseline method, the proposed method improves the F-measure on the ICDAR2015, Total-Text, MSRA-TD500, and ICDAR2017-MLT datasets by an average of 1.68%, and consistently improves recall by an average of 2.0%. Notably, it achieves the highest F-measures of 85.4% and 75.9% on the ICDAR2015 and ICDAR2017-MLT datasets, respectively, and on the other two datasets it surpasses most existing state-of-the-art approaches. The code will be publicly released at https://github.com/runminwang/FSANet.
Pages: 13
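The abstract names the two modules but gives no internals. As a rough illustration only, the PyTorch sketch below shows the two underlying ideas as they are commonly realized elsewhere: a ShuffleNet-style channel shuffle inside a grouped-convolution block (standing in for the GSB) and an SE-style squeeze-and-excitation reweighting (standing in for the ACAM). All class names, layer choices, the group count, the reduction ratio, and the residual connection are assumptions for illustration, not the paper's actual design.

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # ShuffleNet-style shuffle: interleave channels across groups so that
    # information flows between grouped convolutions.
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class GroupShuffleBlock(nn.Module):
    # Hypothetical GSB stand-in: grouped 3x3 conv -> channel shuffle ->
    # 1x1 conv, with a residual connection (assumed, not specified above).
    def __init__(self, channels, groups=4):
        super().__init__()
        self.groups = groups
        self.gconv = nn.Conv2d(channels, channels, 3, padding=1,
                               groups=groups, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.pwconv = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.gconv(x)))
        out = channel_shuffle(out, self.groups)
        out = self.bn2(self.pwconv(out))
        return self.relu(out + x)

class AdaptiveChannelAttention(nn.Module):
    # Hypothetical ACAM stand-in: SE-style channel reweighting that learns
    # per-channel weights from globally pooled context, suppressing
    # background and redundant channels.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w

# Example: run a feature map through the two blocks.
x = torch.randn(1, 64, 160, 160)
feat = GroupShuffleBlock(64, groups=4)(x)
out = AdaptiveChannelAttention(64)(feat)  # same shape as feat: (1, 64, 160, 160)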