Forensic speaker recognition: A new method based on extracting accent and language information from short utterances

Cited by: 10
Authors
Saleem, Sajid [1]
Subhan, Fazli [1]
Naseer, Noman [2]
Bais, Abdul [3]
Imtiaz, Ammara [1]
Affiliations
[1] Natl Univ Modern Languages, Fac Engn & Comp Sci, H-9, Islamabad, Pakistan
[2] Air Univ, Dept Mechatron Engn, Islamabad, Pakistan
[3] Univ Regina, Fac Engn & Appl Sci, Regina, SK, Canada
Source
FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION | 2020 / Volume 34
Keywords
Accent; Language; Forensic speaker recognition; Deep learning; Speech features; IDENTIFICATION; GMM; ROBUST;
DOI
10.1016/j.fsidi.2020.300982
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a new method for Forensic Speaker Recognition (FSR) based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people from different groups, communities and origins, because speaking styles and native languages differ. In a multilingual society, forensic experts use AC and LI to reduce the search space for suspect recognition to regional and ethnic groups. In this paper, we use different baseline and deep learning methods to automate this process. The baseline methods are Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector and Gaussian Mixture Model-Support Vector Machine (GMM-SVM), all of which use Mel-Frequency Cepstral Coefficients (MFCC) as speech features. The deep learning methods are based on Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN). The CNN-based methods are the recently proposed VGGVox and GMM-CNN, both of which use speech spectrograms; the DNN-based method is x-vectors, which relies on DNN embeddings. The experimental results show that GMM-SVM gives better FSR performance than GMM-UBM and i-vector, while x-vectors performs better than GMM-CNN, VGGVox and GMM-SVM. The x-vectors method achieves 80.4% FSR accuracy, which rises to 85.4% with AC, 90.2% with LI, and 95.1% when AC and LI are combined. This shows that the proposed method based on AC and LI gives promising results. (C) 2020 Elsevier Ltd. All rights reserved.
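The search-space reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the candidate records, the accent and language labels, the three-dimensional embeddings and the cosine scoring are all hypothetical stand-ins, assumed here only to show how predicted accent and language could narrow the candidate pool before speaker scoring.

```python
import math

# Hypothetical enrolled candidates; the labels and embeddings are toy values.
CANDIDATES = [
    {"speaker_id": "spk_a", "accent": "accent_1", "language": "lang_x",
     "embedding": [0.9, 0.1, 0.0]},
    {"speaker_id": "spk_b", "accent": "accent_2", "language": "lang_x",
     "embedding": [0.7, 0.7, 0.0]},
    {"speaker_id": "spk_c", "accent": "accent_1", "language": "lang_y",
     "embedding": [0.0, 0.2, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shortlist(candidates, accent=None, language=None):
    """Keep only candidates matching the predicted accent and/or language.

    A filter left as None is not applied.
    """
    return [
        c for c in candidates
        if (accent is None or c["accent"] == accent)
        and (language is None or c["language"] == language)
    ]


def identify(query_embedding, candidates, accent=None, language=None):
    """Return the best-scoring speaker id in the reduced pool, or None."""
    pool = shortlist(candidates, accent, language)
    if not pool:
        return None
    best = max(pool, key=lambda c: cosine(query_embedding, c["embedding"]))
    return best["speaker_id"]


if __name__ == "__main__":
    query = [0.8, 0.5, 0.1]  # toy embedding of the questioned utterance
    print(identify(query, CANDIDATES))
    print(identify(query, CANDIDATES, accent="accent_1"))
    print(identify(query, CANDIDATES, accent="accent_1", language="lang_y"))
```

In this toy setup the accent and language predictions are passed in as plain strings; in practice they would come from separate AC and LI classifiers, and an incorrect prediction would exclude the true speaker from the pool, so the filters are kept optional.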
Pages: 8