A COMPLETE END-TO-END SPEAKER VERIFICATION SYSTEM USING DEEP NEURAL NETWORKS: FROM RAW SIGNALS TO VERIFICATION RESULT

Cited by: 0
Authors
Jung, Jee-Weon [1]
Heo, Hee-Soo [1]
Yang, Il-Ho [1]
Shim, Hye-Jin [1]
Yu, Ha-Jin [1]
Affiliation
[1] Univ Seoul, Sch Comp Sci, Seoul, South Korea
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
speaker verification; end-to-end system; raw audio signal
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
End-to-end systems using deep neural networks have been widely studied in the field of speaker verification. Raw audio signal processing has also been widely studied in the fields of automatic music tagging and speech recognition. However, as far as we know, end-to-end systems using raw audio signals have not been explored in speaker verification. In this paper, a complete end-to-end speaker verification system is proposed, which takes raw audio signals as input and outputs the verification results. The investigation focuses mainly on a pre-processing layer and on the embedded speaker feature extraction models. The proposed pre-emphasis layer is combined with a strided convolution layer to perform pre-processing in the first two hidden layers. In addition, speaker feature extraction models using convolutional layers and long short-term memory are proposed for embedding in the end-to-end system.
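As a rough illustration of the pre-processing idea in the abstract, the sketch below writes pre-emphasis as a two-tap convolution and follows it with a strided convolution that turns the waveform into frame-level features. It is only a NumPy sketch under stated assumptions, not the authors' implementation: the function names `pre_emphasis` and `strided_conv1d`, the coefficient 0.97, and the filter width and stride are all hypothetical choices, and in a trainable network the taps and filters would be learned parameters.

```python
import numpy as np


def pre_emphasis(x, alpha=0.97):
    """Compute y[t] = x[t] - alpha * x[t-1] as a 2-tap convolution.

    alpha = 0.97 is an assumed, commonly used value, not taken from the abstract.
    """
    # np.convolve with taps [1, -alpha] yields x[t] - alpha * x[t-1];
    # mode="valid" drops the first sample, which has no predecessor.
    return np.convolve(x, np.array([1.0, -alpha]), mode="valid")


def strided_conv1d(x, filters, stride):
    """Apply a bank of 1-D filters with the given stride (no padding).

    filters has shape (n_filters, width); the result has shape
    (n_filters, n_frames).
    """
    width = filters.shape[1]
    n_frames = (len(x) - width) // stride + 1
    # Slice the signal into windows of `width` samples, `stride` apart.
    frames = np.stack(
        [x[i * stride : i * stride + width] for i in range(n_frames)]
    )
    return filters @ frames.T


# Toy usage: a 10-sample ramp instead of real audio.
signal = np.arange(10, dtype=float)
emphasized = pre_emphasis(signal, alpha=0.5)
features = strided_conv1d(emphasized, np.ones((2, 3)), stride=3)
```

With a stride equal to the filter width, as in the toy usage, the windows do not overlap, so the strided convolution both filters and downsamples the raw signal in a single step.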
Pages: 5349-5353
Page count: 5