Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation

Times cited: 1

Authors:

Ge, Zirui [1]

Xu, Xinzhou [2]

Guo, Haiyan [1]

Wang, Tingting [1]

Yang, Zhen [1]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Sch Commun & Informat Engn, Nanjing 2100023, Jiangsu, Peoples R China

[2] Nanjing Univ Posts & Telecommun, Sch Internet Things, Nanjing 2100023, Jiangsu, Peoples R China

Source:

APPLIED ACOUSTICS | 2024 / Vol. 219

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

Speaker recognition; Self-supervised representation; Isomorphic graph attention network; Pooling; Angular margin loss;

DOI:

10.1016/j.apacoust.2024.109929

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206; 082403;

Abstract:

The emergence of self-supervised representations (e.g., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of these representations requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Although improved strategies incorporating graph learning and graph attention have been proposed, they still use non-injective aggregation, which may degrade speaker-recognition performance. To address this, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representations. The proposed approach contains three modules (representation learning, graph attention, and aggregation) and jointly considers learning on both the self-supervised representation and the IsoGAT. We then perform speaker recognition experiments on the VoxCeleb1&2 datasets, and the results demonstrate the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
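The abstract's point about non-injective aggregation can be illustrated with a deliberately contrived toy example. This is a minimal sketch, not the authors' IsoGAT: the two-dimensional frame vectors stand in for wav2vec 2.0 frame-level outputs, the function names are hypothetical, and the connection to the sum aggregator of the Graph Isomorphism Network (GIN; Xu et al., ICLR 2019) is an assumption suggested by the approach's name rather than something stated in this record.

```python
# Toy illustration only (not the authors' IsoGAT code): frame vectors stand in
# for wav2vec 2.0 frame-level outputs; all function names are hypothetical.
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Temporal average pooling: divides by the frame count, so it discards multiplicity."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def max_pool(frames: List[Vector]) -> Vector:
    """Temporal max pooling: keeps only the per-dimension maxima."""
    return [max(col) for col in zip(*frames)]


def sum_pool(frames: List[Vector]) -> Vector:
    """Sum aggregation: keeps the multiplicity information that mean/max drop.

    Plain sum on raw features is still not injective in general. GIN sums
    MLP-transformed features, motivated by the result that a sum over a suitable
    per-element map can be injective on multisets from a countable feature space.
    """
    return [sum(col) for col in zip(*frames)]


# Two different multisets of frames: utt_b contains every frame of utt_a twice.
utt_a = [[1.0, 0.0], [0.0, 1.0]]
utt_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

print(mean_pool(utt_a) == mean_pool(utt_b))  # True: mean cannot separate them
print(max_pool(utt_a) == max_pool(utt_b))    # True: max cannot separate them
print(sum_pool(utt_a) == sum_pool(utt_b))    # False: sum distinguishes them
```

Both toy utterances contain the same frame types in the same proportions, so any aggregator that normalizes by length or keeps only extremes maps them to the same pooled vector. This many-to-one behavior is what makes an aggregator non-injective.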


Pages: 8
