GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 10
Authors
Shim, Hye-Jin [1]
Heo, Jungwoo [1]
Park, Jae-Han [2]
Lee, Ga-Hui [2]
Yu, Ha-Jin [1]
Affiliations
[1] Univ Seoul, Sch Comp Sci, Seoul, South Korea
[2] KT Corp, Seongnam Si, South Korea
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
speaker verification; feature aggregation; attention; graph attention networks; deep learning
DOI
10.1109/ICASSP43922.2022.9746257
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationships. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be directly modeled using a graph. The module comprises a graph attention layer and a graph pooling layer followed by a readout operation. The graph attention layer first models the non-Euclidean data manifold between different nodes. Then, the graph pooling layer discards less informative nodes considering the significance of the nodes. Finally, the readout operation combines the remaining nodes into a single representation. We employ two recent systems, SE-ResNet and RawNet2, with different input features and architectures and demonstrate that the proposed feature aggregation module consistently shows a relative improvement over 10%, compared to the baseline.
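The three stages named in the abstract (graph attention, graph pooling, readout) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The scaled dot-product attention score, the fixed projection vector used to rank nodes, the mean readout, and all function names are assumptions made only for illustration; in a trained model the attention and projection parameters would be learned.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(nodes):
    """Treat every frame-level feature as a node of a fully connected graph
    and replace each node with an attention-weighted sum over all nodes.
    Scaled dot-product scoring is an assumption of this sketch."""
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    updated = []
    for query in nodes:
        weights = softmax([dot(query, key) / scale for key in nodes])
        updated.append(
            [sum(w * key[d] for w, key in zip(weights, nodes)) for d in range(dim)]
        )
    return updated


def graph_pool(nodes, projection, keep):
    """Discard less informative nodes: score each node by its projection
    onto `projection` (hypothetical; learned in practice) and keep the
    `keep` highest-scoring nodes in their original order."""
    norm = math.sqrt(dot(projection, projection))
    scores = [dot(node, projection) / norm for node in nodes]
    ranked = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [nodes[i] for i in kept]


def readout(nodes):
    """Combine the remaining nodes into one vector (mean readout assumed)."""
    dim = len(nodes[0])
    return [sum(node[d] for node in nodes) / len(nodes) for d in range(dim)]


def aggregate(frames, projection, keep):
    """Frame-level features -> single utterance-level representation."""
    return readout(graph_pool(graph_attention(frames), projection, keep))
```

Calling `aggregate(frames, projection, keep)` on a list of equal-length frame vectors returns one vector of the same dimensionality, whatever the number of input frames.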
Pages: 7972-7976
Number of pages: 5