GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 10

Authors:

Shim, Hye-Jin [1]

Heo, Jungwoo [1]

Park, Jae-Han [2]

Lee, Ga-Hui [2]

Yu, Ha-Jin [1]

Affiliations:

[1] Univ Seoul, Sch Comp Sci, Seoul, South Korea

[2] KT Corp, Seongnam Si, South Korea

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

speaker verification; feature aggregation; attention; graph attention networks; deep learning;

DOI:

10.1109/ICASSP43922.2022.9746257

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationships. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be directly modeled using a graph. The module comprises a graph attention layer and a graph pooling layer followed by a readout operation. The graph attention layer first models the non-Euclidean data manifold between different nodes. Then, the graph pooling layer discards less informative nodes considering the significance of the nodes. Finally, the readout operation combines the remaining nodes into a single representation. We employ two recent systems, SE-ResNet and RawNet2, with different input features and architectures and demonstrate that the proposed feature aggregation module consistently shows a relative improvement over 10%, compared to the baseline.
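The three stages named in the abstract (graph attention, graph pooling, readout) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not specify the attention scoring function, the pooling criterion, or the readout operator, so the sketch assumes scaled dot-product attention over a fully connected graph, top-k pooling on a projection score, and a mean readout. The names `graph_attention`, `graph_pool`, `readout`, `aggregate`, `projection`, and `keep_ratio` are hypothetical, and the random `projection` vector stands in for what would be a learned parameter.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def graph_attention(nodes):
    """Update every node as an attention-weighted sum over all nodes.

    Assumption: scaled dot-product scores on a fully connected graph.
    """
    scale = math.sqrt(len(nodes[0]))
    updated = []
    for query in nodes:
        weights = softmax([dot(query, key) / scale for key in nodes])
        updated.append([
            sum(w * node[d] for w, node in zip(weights, nodes))
            for d in range(len(query))
        ])
    return updated


def graph_pool(nodes, projection, keep_ratio=0.5):
    """Keep the highest-scoring nodes (assumed top-k pooling)."""
    k = max(1, math.ceil(keep_ratio * len(nodes)))
    scores = [dot(node, projection) for node in nodes]
    ranked = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore the original frame order
    return [nodes[i] for i in kept]


def readout(nodes):
    """Average the remaining nodes into one vector (assumed mean readout)."""
    n = len(nodes)
    return [sum(node[d] for node in nodes) / n for d in range(len(nodes[0]))]


def aggregate(frames, projection, keep_ratio=0.5):
    """Frame-level features -> single utterance-level representation."""
    return readout(graph_pool(graph_attention(frames), projection, keep_ratio))


random.seed(0)
# Six frame-level features of dimension 4, treated as graph nodes.
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
# Stand-in for a learned scoring vector (hypothetical).
projection = [random.gauss(0.0, 1.0) for _ in range(4)]
embedding = aggregate(frames, projection)
print(len(embedding))  # one 4-dimensional vector for the whole utterance
```

With `keep_ratio=0.5`, three of the six nodes survive pooling, and the readout collapses them into one vector of the same dimension as a single frame feature. In a trained system the attention and pooling parameters would be learned jointly with the front-end encoder.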


Pages: 7972-7976

Page count: 5

