Multi-level attention network: Mixed time-frequency channel attention and multi-scale self-attentive standard deviation pooling for speaker recognition

Cited: 3
Authors
Deng, Lihong [1 ]
Deng, Fei [2 ]
Zhou, Kepeng [2 ]
Jiang, Peifan [2 ]
Zhang, Gexiang [3 ]
Yang, Qiang [3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Peoples R China
[2] Chengdu Univ Technol, Coll Comp Sci & Cyber Secur, Chengdu, Peoples R China
[3] Chengdu Univ Informat Technol, Sch Automat, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speaker recognition; Attention mechanism; Aggregation method; Multi-level attention; Architecture
DOI
10.1016/j.engappai.2023.107439
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, we propose a more efficient lightweight speaker recognition network, the multi-level attention network (MANet). MANet generates more robust and discriminative speaker features by emphasizing features at different levels of the network through multi-level attention, which comprises mixed time-frequency channel (MTFC) attention and multi-scale self-attentive standard deviation pooling (MSSDP). MTFC attention combines channel, time, and frequency information to capture global features and model long-term context. MSSDP captures the variation in frame-level features and aggregates them at different scales, producing a long-term, robust, and discriminative utterance-level feature. We performed extensive experiments on two popular datasets, VoxCeleb and CN-Celeb, comparing the proposed method with current state-of-the-art speaker recognition methods. MANet achieves EER/minDCF of 1.82%/0.1965, 1.94%/0.2059, 3.69%/0.3626, and 11.98%/0.4814 on the VoxCeleb1-O, VoxCeleb1-E, VoxCeleb1-H, and CN-Celeb test sets, respectively. It is a more effective lightweight speaker recognition network, outperforming most large speaker recognition networks and all lightweight speaker recognition networks tested, with a 64% performance improvement over the baseline ThinResNet-34. Compared with the lightest model, EfficientTDNN-Small, it has only 0.6 million more parameters but performs 63% better, and its performance is within 4% of the state-of-the-art large model LE-Conformer. In the ablation experiments, the proposed attention method and aggregation model achieved the best performance on VoxCeleb1-O, with EER/minDCF of 2.46%/0.2708 and 2.39%/0.2417, respectively, indicating that both are significant improvements over previous methods.
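To make the two components concrete, the sketch below gives one plausible PyTorch reading of them based only on the abstract's description; it is not the paper's actual implementation. All module names, hyperparameters, and design choices here (the squeeze-and-excitation-style gating in MixedTFCAttention, the average-pooled time scales in MultiScaleStdPool) are assumptions.

```python
# Illustrative sketch only: a plausible reading of MTFC attention and MSSDP
# from the abstract. Names, hyperparameters, and the exact gating/pooling
# choices are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedTFCAttention(nn.Module):
    """Gates a (batch, channel, freq, time) feature map with separate
    channel, frequency, and time descriptors, squeeze-and-excitation style."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, t = x.shape
        ch = self.fc(x.mean(dim=(2, 3)))  # channel descriptor, (b, c)
        fr = x.mean(dim=(1, 3))           # frequency descriptor, (b, f)
        tm = x.mean(dim=(1, 2))           # time descriptor, (b, t)
        gate = (torch.sigmoid(ch).view(b, c, 1, 1)
                * torch.sigmoid(fr).view(b, 1, f, 1)
                * torch.sigmoid(tm).view(b, 1, 1, t))
        return x * gate


class SelfAttentiveStdPool(nn.Module):
    """Scores each frame with a small attention network, then pools a
    weighted mean and standard deviation over time."""

    def __init__(self, feat_dim: int, bottleneck: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv1d(feat_dim, bottleneck, kernel_size=1),
            nn.Tanh(),
            nn.Conv1d(bottleneck, feat_dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim, time) frame-level features.
        alpha = torch.softmax(self.attention(x), dim=2)
        mean = torch.sum(alpha * x, dim=2)
        var = torch.sum(alpha * x ** 2, dim=2) - mean ** 2
        std = torch.sqrt(var.clamp(min=1e-8))
        return torch.cat([mean, std], dim=1)  # (batch, 2 * feat_dim)


class MultiScaleStdPool(nn.Module):
    """One reading of 'multi-scale': pool at several temporal resolutions
    and concatenate, so short- and long-range statistics both survive."""

    def __init__(self, feat_dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.pools = nn.ModuleList(SelfAttentiveStdPool(feat_dim) for _ in scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for scale, pool in zip(self.scales, self.pools):
            # Downsample along time to expose longer-range statistics.
            xs = F.avg_pool1d(x, kernel_size=scale) if scale > 1 else x
            outs.append(pool(xs))
        return torch.cat(outs, dim=1)


if __name__ == "__main__":
    spec = torch.randn(2, 32, 80, 200)   # (batch, channels, mel bins, frames)
    gated = MixedTFCAttention(32)(spec)  # same shape, reweighted
    frames = gated.mean(dim=2)           # collapse frequency: (2, 32, 200)
    pooled = MultiScaleStdPool(32)(frames)
    print(pooled.shape)                  # torch.Size([2, 192])
```

The demo at the bottom only shows the expected tensor shapes; in a full system the attention block would sit inside the convolutional trunk and the pooling layer would consume its final frame-level output.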
Pages: 14