Dual-branch interactive cross-frequency attention network for deep feature learning

Cited by: 2
Authors
Li, Qiufu [1 ,2 ,3 ,4 ]
Shen, Linlin [1 ,2 ,3 ,4 ]
Affiliations
[1] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Guangdong, Peoples R China
[2] Shenzhen Univ, Comp Vis Inst, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Shenzhen Inst Artificial Intelligence Robot Soc AI, Shenzhen 518129, Guangdong, Peoples R China
[4] Shenzhen Univ, Guangdong Prov Key Lab Intelligent Informat Proc, Shenzhen 518060, Guangdong, Peoples R China
Keywords
High-frequency data; Dual-branch network; Interactive cross-frequency attention; Image classification; Object detection;
DOI
10.1016/j.eswa.2024.124406
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As random noise contained in high-frequency data can interfere with the feature learning of deep networks, low-pass filtering or wavelet transforms have been integrated with deep networks to exclude the high-frequency component of the input image. However, useful image details such as contours and textures are also lost in this process. In this paper, we propose the Dual-branch interactive Cross-frequency attention Network (DiCaN) to separately process the low-frequency and high-frequency components of the input image, so that useful information is extracted from the high-frequency data and included in deep learning. DiCaN first decomposes the input image into low-frequency and high-frequency components using wavelet decomposition, and then applies two parallel residual-style branches to extract features from the two components. We further design an interactive cross-frequency attention mechanism to highlight the useful information in the high-frequency data and interactively fuse it with the features in the low-frequency branch. The features learned by our framework are applied to both image classification and object detection, and are evaluated on the ImageNet-1K and COCO datasets. The results show that DiCaN achieves better classification performance than various ResNet variants. Both one-stage and two-stage detectors with a DiCaN backbone also achieve better detection performance than those with a ResNet backbone. The code of DiCaN will be released.
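The decomposition step described in the abstract can be illustrated with a single-level 2-D Haar wavelet transform. This is a minimal sketch for intuition only, not the authors' released code: it splits a grayscale image into the low-frequency approximation (LL) that would feed DiCaN's low-frequency branch and the three high-frequency detail subbands (LH, HL, HH) that would feed the high-frequency branch.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar decomposition of an even-sized grayscale image.

    Returns the low-frequency approximation LL and the three
    high-frequency detail subbands (LH, HL, HH), each at half resolution.
    """
    # The four 2x2 polyphase components of the image.
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

# A constant image carries no high-frequency content: all detail
# subbands come out zero, and LL keeps the (scaled) mean intensity.
ll, (lh, hl, hh) = haar_dwt2(np.ones((4, 4)))
```

In practice a library such as PyWavelets (`pywt.dwt2(img, 'haar')`) computes the same subbands; the point here is only that noise and fine details (contours, textures) concentrate in LH/HL/HH, which is why DiCaN processes them in a separate branch instead of discarding them.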
Pages: 11