DNN i-vector Speaker Verification with Short, Text-constrained Test Utterances

Cited by: 23
Authors
Zhong, Jinghua [1]
Hu, Wenping [2]
Soong, Frank [2]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Speech Grp, Beijing, Peoples R China
Keywords
DNN i-vector; DNN adaptation; senone; frame alignment; RECOGNITION; FEATURES;
DOI
10.21437/Interspeech.2017-1036
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate how to improve the performance of DNN i-vector based speaker verification for short, text-constrained test utterances, e.g., connected digit strings. For a short utterance, text-constrained verification can deliver better performance than text-independent verification because of its smaller, limited vocabulary. We study the capability of a "phonetically aware" Deep Neural Network (DNN) to perform "stochastic phonetic alignment" when constructing supervectors and estimating the corresponding i-vectors, using two speech databases: a large-vocabulary, conversational, speaker-independent database (Fisher) and a small-vocabulary, continuous digit database (RSR2015 Part III). Phonetic alignment efficiency and the resulting speaker verification performance are compared across senone sets of different sizes, which characterize the phonetic pronunciations of utterances in the two databases. On the RSR2015 Part III evaluation, using only digit-related senones yields a relative EER improvement of 7.89% for male speakers and 3.54% for female speakers. DNN bottleneck features were also studied for their ability to extract phonetically sensitive information that is useful for text-independent or text-constrained speaker verification. We found that using MFCCs in tandem with bottleneck features further reduces EERs.
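As a rough illustration of the ideas named in the abstract, the sketch below shows how per-frame senone posteriors from a DNN could serve as soft alignments when accumulating the zeroth- and first-order statistics used for i-vector estimation, and how MFCC and bottleneck features could be concatenated frame by frame. It is not the authors' implementation. The function names, the renormalization over a retained senone subset, and the toy values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def restrict_posteriors(posteriors: Matrix, keep: Sequence[int]) -> Matrix:
    """Keep only the chosen senone columns and renormalize each frame.

    Assumption: restricting to, e.g., digit-related senones is done by
    dropping the other columns and rescaling so each frame sums to 1.
    """
    out = []
    for frame in posteriors:
        kept = [frame[k] for k in keep]
        total = sum(kept)
        if total > 0:
            out.append([p / total for p in kept])
        else:
            # No mass on the kept senones: fall back to a uniform alignment.
            out.append([1.0 / len(keep)] * len(keep))
    return out


def tandem(mfcc: Matrix, bottleneck: Matrix) -> Matrix:
    """Concatenate MFCC and bottleneck features frame by frame."""
    return [list(m) + list(b) for m, b in zip(mfcc, bottleneck)]


def accumulate_stats(features: Matrix, posteriors: Matrix) -> Tuple[List[float], Matrix]:
    """Accumulate per-senone statistics with posteriors as soft alignments.

    N[c]    = sum over frames of gamma_t(c)
    F[c][d] = sum over frames of gamma_t(c) * x_t[d]
    """
    n_senones = len(posteriors[0])
    dim = len(features[0])
    N = [0.0] * n_senones
    F = [[0.0] * dim for _ in range(n_senones)]
    for x, gamma in zip(features, posteriors):
        for c, g in enumerate(gamma):
            N[c] += g
            for d in range(dim):
                F[c][d] += g * x[d]
    return N, F


# Toy data (hypothetical): 2 frames, 2-dim MFCC, 1-dim bottleneck, 3 senones.
mfcc = [[1.0, 2.0], [3.0, 4.0]]
bottleneck = [[0.5], [0.25]]
posteriors = [[0.5, 0.25, 0.25], [0.0, 0.5, 0.5]]

feats = tandem(mfcc, bottleneck)
restricted = restrict_posteriors(posteriors, keep=[0, 2])
N, F = accumulate_stats(feats, restricted)
```

Because each restricted frame is renormalized, the zeroth-order statistics sum to the number of frames, which makes a convenient sanity check on the alignment step.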
Pages: 1507-1511
Page count: 5