Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Cited by: 1
Authors
Ge, Zirui [1]
Xu, Xinzhou [2]
Guo, Haiyan [1]
Wang, Tingting [1]
Yang, Zhen [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; Angular margin loss
DOI
10.1016/j.apacoust.2024.109929
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206 ; 082403 ;
Abstract
The emergence of self-supervised representation (i.e., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, owing to the use of fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention, non-injective aggregation still exists in these approaches, which may affect speaker-recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representation. The proposed approach contains three modules, namely representation learning, graph attention, and aggregation, jointly considering learning on the self-supervised representation and the IsoGAT. We then perform speaker recognition experiments on the VoxCeleb1&2 datasets, with the corresponding results demonstrating the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
Pages: 8
Related papers
50 in total
  • [31] Enhanced Graph Representation Convolution: Effective Inferring Gene Regulatory Network Using Graph Convolution Network with Self-Attention Graph Pooling Layer
    Alawad, Duaa Mohammad
    Katebi, Ataur
    Hoque, Md Tamjidul
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (03): : 1818 - 1839
  • [32] Self-supervised air quality estimation with graph neural network assistance and attention enhancement
    Vu V.H.
    Nguyen D.L.
    Nguyen T.H.
    Nguyen Q.V.H.
    Nguyen P.L.
    Huynh T.T.
    Neural Computing and Applications, 2024, 36 (19) : 11171 - 11193
  • [33] Self-supervised representation learning for surgical activity recognition
    Paysan, Daniel
    Haug, Luis
    Bajka, Michael
    Oelhafen, Markus
    Buhmann, Joachim M.
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (11) : 2037 - 2044
  • [34] GAN-based self-supervised message passing graph representation learning
    Yang, Yining
    Xu, Ke
    Tang, Ying
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [35] Self-supervised Implicit Glyph Attention for Text Recognition
    Guan, Tongkun
    Gu, Chaochen
    Tu, Jingzheng
    Yang, Xue
    Feng, Qi
    Zhao, Yudi
    Shen, Wei
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15285 - 15294
  • [36] Self-Supervised ECG Representation Learning for Emotion Recognition
    Sarkar, Pritam
    Etemad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (03) : 1541 - 1554
  • [37] Self-Supervised Graph Representation Learning Method Based on Data and Feature Augmentation
    Xu, Yunfeng
    Fan, Hexun
    Computer Engineering and Applications, 2024, 60 (17) : 148 - 157
  • [38] Self-supervised representation learning for surgical activity recognition
    Daniel Paysan
    Luis Haug
    Michael Bajka
    Markus Oelhafen
    Joachim M. Buhmann
    International Journal of Computer Assisted Radiology and Surgery, 2021, 16 : 2037 - 2044
  • [39] Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
    Chen, Sanyuan
    Wu, Yu
    Wang, Chengyi
    Liu, Shujie
    Chen, Zhuo
    Wang, Peidong
    Liu, Gang
    Li, Jinyu
    Wu, Jian
    Yu, Xiangzhan
    Wei, Furu
    INTERSPEECH 2022, 2022, : 3699 - 3703
  • [40] Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization
    Singh, Prachi
    Ganapathy, Sriram
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1639 - 1649