A COMPLETE END-TO-END SPEAKER VERIFICATION SYSTEM USING DEEP NEURAL NETWORKS: FROM RAW SIGNALS TO VERIFICATION RESULT

Cited by: 0

Authors:

Jung, Jee-Weon [1]

Heo, Hee-Soo [1]

Yang, Il-Ho [1]

Shim, Hye-Jin [1]

Yu, Ha-Jin [1]

Affiliations:

[1] Univ Seoul, Sch Comp Sci, Seoul, South Korea

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

speaker verification; end-to-end system; raw audio signal

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

End-to-end systems using deep neural networks have been widely studied in the field of speaker verification. Raw audio signal processing has also been widely studied in the fields of automatic music tagging and speech recognition. However, as far as we know, end-to-end systems using raw audio signals have not been explored in speaker verification. In this paper, a complete end-to-end speaker verification system is proposed, which inputs raw audio signals and outputs the verification results. A pre-processing layer and the embedded speaker feature extraction models were mainly investigated. The proposed pre-emphasis layer was combined with a strided convolution layer for pre-processing at the first two hidden layers. In addition, speaker feature extraction models using convolutional layer and long short-term memory are proposed to be embedded in the proposed end-to-end system.
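The abstract describes a pre-emphasis layer followed by a strided convolution layer as the pre-processing stage. The following is only a minimal NumPy sketch of that idea, not the authors' implementation: the function names `pre_emphasis` and `strided_conv1d`, the coefficient 0.97 (a conventional pre-emphasis value), and the kernel width and stride used in the usage note are all assumptions made for illustration.

```python
import numpy as np


def pre_emphasis(signal, coeff=0.97):
    """Apply y[t] = x[t] - coeff * x[t-1]; the first sample is kept as-is.

    coeff=0.97 is an assumed, conventional value, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    out = signal.copy()
    out[1:] -= coeff * signal[:-1]
    return out


def strided_conv1d(signal, kernels, stride):
    """Strided 1-D cross-correlation without padding.

    kernels has shape (n_filters, width); the result has shape
    (n_frames, n_filters), where n_frames = (len(signal) - width) // stride + 1.
    """
    signal = np.asarray(signal, dtype=np.float64)
    kernels = np.asarray(kernels, dtype=np.float64)
    width = kernels.shape[1]
    n_frames = (len(signal) - width) // stride + 1
    # Index matrix of overlapping windows, shape (n_frames, width).
    idx = np.arange(width)[None, :] + stride * np.arange(n_frames)[:, None]
    frames = signal[idx]
    return frames @ kernels.T
```

For instance, calling `strided_conv1d(pre_emphasis(wave), kernels, stride=160)` on a 16000-sample `wave` with `kernels` of shape `(8, 400)` would yield a `(98, 8)` array of frame-level activations. In a trainable system the kernel weights (and possibly the pre-emphasis coefficient) would be learned rather than fixed.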


Pages: 5349-5353

Page count: 5

