ETDNet: Efficient Transformer-Based Detection Network for Surface Defect Detection

Cited by: 31
Authors
Zhou, Hantao [1 ]
Yang, Rui [1 ]
Hu, Runze [2 ]
Shu, Chang [3 ,4 ]
Tang, Xiaochu [4 ]
Li, Xiu [1 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100086, Peoples R China
[3] Ping Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[4] Ping Technol Shenzhen Co Ltd, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Task analysis; Detectors; Head; Shape; Deep learning; Attention mechanism; feature fusion; surface defect detection; task-oriented decoupled (TOD) head; transformer;
DOI
10.1109/TIM.2023.3307753
CLC (Chinese Library Classification) number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject classification codes
0808 ; 0809 ;
Abstract
Deep learning (DL)-based surface defect detectors play a crucial role in ensuring product quality during inspection processes. However, accurately and efficiently detecting defects remains challenging due to characteristics inherent in defective images, including a high degree of foreground-background similarity, scale variation, and shape variation. To address this challenge, we propose an efficient transformer-based detection network, ETDNet, consisting of three novel designs to achieve superior performance. First, ETDNet employs a lightweight vision transformer (ViT) to extract representative global features. This approach ensures accurate feature characterization of defects even against similar backgrounds. Second, a channel-modulated feature pyramid network (CM-FPN) is devised to fuse multilevel features while maintaining critical information from the corresponding levels. Finally, a novel task-oriented decoupled (TOD) head is introduced to tackle the inconsistent representations of the classification and regression tasks. The TOD head employs a local feature representation (LFR) module to learn object-aware local features and introduces a global feature representation (GFR) module, based on the attention mechanism, to learn content-aware global features. By integrating these two modules into the head, ETDNet can effectively classify and localize defects with varying shapes and scales. Extensive experiments on various defect detection datasets demonstrate the effectiveness of the proposed ETDNet. For instance, it achieves AP 46.7% (versus 45.9%) and AP50 80.2% (versus 79.1%) at 49 frames/s on NEU-DET. The code is available at https://github.com/zht8506/ETDNet.
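As a rough illustration of the attention mechanism underlying a GFR-style module, the sketch below applies single-head dot-product attention over a flattened C x H x W feature map, so every spatial position aggregates content from all others. This is a minimal NumPy sketch under assumed shapes and projection weights; the function name, weights, and layout are hypothetical and not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(feat, wq, wk, wv):
    """Content-aware global aggregation over a (C, H, W) feature map
    via single-head dot-product attention (hypothetical sketch)."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (HW, C): one token per position
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv  # linear projections
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (HW, HW) weights
    out = attn @ v                                  # each position mixes all others
    return out.T.reshape(c, h, w)

# Toy usage: an 8-channel 4x4 feature map with random projection weights.
rng = np.random.default_rng(0)
c, h, w = 8, 4, 4
feat = rng.standard_normal((c, h, w))
wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
out = global_attention(feat, wq, wk, wv)
print(out.shape)
```

Because the attention weights are computed from the features themselves, the aggregation adapts to image content, which is the property the abstract attributes to the GFR module for handling defects of varying shape and scale.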
Pages: 14