Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Cited by: 1

Authors:

Ge, Zirui [1]

Xu, Xinzhou [2]

Guo, Haiyan [1]

Wang, Tingting [1]

Yang, Zhen [1]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China

Source:

APPLIED ACOUSTICS | 2024 / Vol. 219

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; Angular margin loss

DOI:

10.1016/j.apacoust.2024.109929

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The emergence of self-supervised representation (i.e., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention, non-injective aggregation still exists in these approaches, which may degrade speaker-recognition performance. In this regard, we propose a speaker recognition approach using Isomorphic Graph ATtention network (IsoGAT) on self-supervised representation. The proposed approach contains three modules: representation learning, graph attention, and aggregation, jointly considering learning on the self-supervised representation and the IsoGAT. We then perform experiments on speaker recognition tasks using the VoxCeleb1&2 datasets; the experimental results demonstrate the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
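The abstract's concern about non-injective aggregation can be illustrated with a small sketch. This is not the authors' IsoGAT; it is a toy NumPy example, with hypothetical function names and made-up two-dimensional "frame features", showing that mean pooling and softmax-attention pooling can map two different sets of frames to the same vector, whereas a sum-style aggregation (the kind usually associated with isomorphism-aware graph networks) keeps them apart.

```python
import numpy as np

def mean_pool(frames):
    # Average over the time axis: (T, D) -> (D,)
    return frames.mean(axis=0)

def sum_pool(frames):
    # Sum over the time axis: (T, D) -> (D,)
    return frames.sum(axis=0)

def attention_pool(frames, w):
    # Softmax-weighted average over time; w is a hypothetical scoring vector.
    scores = frames @ w
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames

# Two different utterance-like inputs: b repeats every frame of a.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
w = np.array([0.3, -0.7])                   # arbitrary illustrative weights

same_under_mean = np.allclose(mean_pool(a), mean_pool(b))
same_under_attention = np.allclose(attention_pool(a, w), attention_pool(b, w))
same_under_sum = np.allclose(sum_pool(a), sum_pool(b))
```

In this toy case the mean and attention poolings collapse `a` and `b` to the same vector, while the sum does not. How IsoGAT itself achieves injectivity is described in the paper, not here.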


Pages: 8
