Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Cited by: 1

Authors:

Ge, Zirui [1]

Xu, Xinzhou [2]

Guo, Haiyan [1]

Wang, Tingting [1]

Yang, Zhen [1]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China

Source:

APPLIED ACOUSTICS | 2024 / Vol. 219

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; ANGULAR MARGIN LOSS;

DOI:

10.1016/j.apacoust.2024.109929

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The emergence of self-supervised representation (i.e., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention, non-injective aggregation still exists in these approaches, which may degrade speaker-recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representation. The proposed approach contains three modules: representation learning, graph attention, and aggregation, jointly considering learning on the self-supervised representation and the IsoGAT. We then perform experiments on speaker recognition tasks using the VoxCeleb1&2 datasets, with the experimental results demonstrating the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.


Pages: 8
