Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Cited by: 1
Authors
Ge, Zirui [1 ]
Xu, Xinzhou [2 ]
Guo, Haiyan [1 ]
Wang, Tingting [1 ]
Yang, Zhen [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; Angular margin loss
DOI
10.1016/j.apacoust.2024.109929
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The emergence of self-supervised representation (i.e., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention factors, non-injective aggregation still exists in these approaches, which may affect speaker-recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representation. The proposed approach contains three modules (representation learning, graph attention, and aggregation), jointly considering learning on the self-supervised representation and the IsoGAT. We then perform speaker recognition experiments on the VoxCeleb1&2 datasets, and the results demonstrate the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
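As a toy illustration of what "non-injective aggregation" can mean for temporal pooling, the sketch below compares mean pooling, softmax-attention pooling, and sum pooling on two different sets of frame-level vectors. It is not the authors' IsoGAT; the function names, the two-dimensional "frames", and the scoring vector `w` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_pool(frames):
    # Average over the time axis: a fixed temporal pooling strategy.
    return frames.mean(axis=0)

def sum_pool(frames):
    # Sum over the time axis: keeps information about how many frames there are.
    return frames.sum(axis=0)

def attention_pool(frames, w):
    # Softmax-normalised attention over frames, scored by a vector w (assumed).
    scores = frames @ w
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames

# Two different inputs: b repeats every frame of a twice.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
w = np.array([0.3, -0.7])                   # hypothetical attention parameters

mean_collides = np.allclose(mean_pool(a), mean_pool(b))
attn_collides = np.allclose(attention_pool(a, w), attention_pool(b, w))
sum_collides = np.allclose(sum_pool(a), sum_pool(b))

print(mean_collides, attn_collides, sum_collides)
```

In this example the mean and the normalised attention both map the two distinct inputs to the same vector, while the sum keeps them apart. This is one simple sense in which an aggregator can fail to be injective.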
Pages: 8