Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Cited by: 1
Authors
Ge, Zirui [1]
Xu, Xinzhou [2]
Guo, Haiyan [1]
Wang, Tingting [1]
Yang, Zhen [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; Angular margin loss
DOI
10.1016/j.apacoust.2024.109929
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The emergence of self-supervised representations (e.g., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention, non-injective aggregation still exists in these approaches, which may degrade speaker recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on the self-supervised representation. The proposed approach contains three modules (representation learning, graph attention, and aggregation) and jointly considers learning on the self-supervised representation and the IsoGAT. We then perform speaker recognition experiments on the VoxCeleb1&2 datasets; the results demonstrate the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
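The term "non-injective aggregation" in the abstract can be illustrated with a toy example. The sketch below is not the paper's IsoGAT; it only shows, on made-up two-dimensional frame features, that mean pooling maps two different sets of frames to the same vector, whereas a sum-style aggregation keeps them apart. The function names mean_pool and sum_pool and the feature values are hypothetical.

```python
# Toy illustration only: hypothetical frame features, not the paper's IsoGAT.

def mean_pool(frames):
    """Average the frame vectors dimension by dimension."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def sum_pool(frames):
    """Sum the frame vectors dimension by dimension."""
    return [sum(col) for col in zip(*frames)]

# Two different multisets of frame-level features:
a = [[1.0, 2.0], [3.0, 4.0]]
b = a + a  # the same two frames, each appearing twice

# Mean pooling cannot tell a and b apart (non-injective on multisets).
same_mean = mean_pool(a) == mean_pool(b)

# Sum pooling produces different outputs for a and b.
same_sum = sum_pool(a) == sum_pool(b)

print(same_mean, same_sum)
```

Here same_mean is True and same_sum is False, which is the sense in which a mean-based temporal pooling can lose information that a sum-based aggregation retains.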
Pages: 8