FSANet: Feature shuffle and adaptive channel attention network for arbitrary shape scene text detection

Cited by: 0
Authors
Xu, Juan [1]
Wang, Runmin [1]
Hei, Jielei [1]
Cao, Xiaofei [1]
Wan, Zukun [1]
Yu, Congzhen [1]
Ding, Yajun [1]
Gao, Changxin [3]
Qian, Shengyou [2]
Affiliations
[1] Hunan Normal Univ, Sch Informat Sci & Engn, Changsha 410081, Hunan, Peoples R China
[2] Hunan Normal Univ, Sch Phys & Elect Sci, Changsha 410081, Hunan, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text detection; Arbitrary shape text; Feature shuffle; Attention mechanism; NET
DOI
10.1016/j.neucom.2025.129443
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Natural scene text detection has made significant progress in the era of deep learning. However, existing methods still struggle with challenges such as complex backgrounds, extreme aspect ratios, and arbitrary-shaped text. To address these issues, we propose a segmentation-based Feature Shuffle Attention Network (FSANet) designed to enhance high-resolution feature extraction and multi-scale feature enhancement for robust detection of arbitrary-shaped text. FSANet comprises two principal modules: (1) the High-Resolution Feature Extraction Network (FEN), which employs two Group Shuffle Blocks (GSBs) to preserve high-resolution details and promote feature interaction and information flow, and (2) the Adaptive Channel Attention Module (ACAM), which suppresses background noise and redundant features by adaptively learning inter-feature correlations across scales and assigning weights that prioritize local features within a global context. Extensive experiments on four public benchmarks show that, compared to the baseline method, the proposed method improves the F-measure on the ICDAR2015, Total-Text, MSRA-TD500, and ICDAR2017-MLT datasets by an average of 1.68%, and consistently improves recall by an average of 2.0%. Notably, it achieves the highest F-measures of 85.4% and 75.9% on the ICDAR2015 and ICDAR2017-MLT datasets, respectively, and on the other two datasets it surpasses most existing state-of-the-art approaches. The code will be publicly released at https://github.com/runminwang/FSANet.
Pages: 13
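The abstract names the two modules but gives no internals. As a rough illustration only, the PyTorch sketch below shows the two underlying ideas as they are commonly realized elsewhere: a ShuffleNet-style channel shuffle inside a grouped-convolution block (standing in for the GSB) and an SE-style squeeze-and-excitation reweighting (standing in for the ACAM). All class names, layer choices, the group count, the reduction ratio, and the residual connection are assumptions for illustration, not the paper's actual design.

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # ShuffleNet-style shuffle: interleave channels across groups so that
    # information flows between grouped convolutions.
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class GroupShuffleBlock(nn.Module):
    # Hypothetical GSB stand-in: grouped 3x3 conv -> channel shuffle ->
    # 1x1 conv, with a residual connection (assumed, not specified above).
    def __init__(self, channels, groups=4):
        super().__init__()
        self.groups = groups
        self.gconv = nn.Conv2d(channels, channels, 3, padding=1,
                               groups=groups, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.pwconv = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.gconv(x)))
        out = channel_shuffle(out, self.groups)
        out = self.bn2(self.pwconv(out))
        return self.relu(out + x)

class AdaptiveChannelAttention(nn.Module):
    # Hypothetical ACAM stand-in: SE-style channel reweighting that learns
    # per-channel weights from globally pooled context, suppressing
    # background and redundant channels.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w

# Example: run a feature map through the two blocks.
x = torch.randn(1, 64, 160, 160)
feat = GroupShuffleBlock(64, groups=4)(x)
out = AdaptiveChannelAttention(64)(feat)  # same shape as feat: (1, 64, 160, 160)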