Forensic speaker recognition: A new method based on extracting accent and language information from short utterances

Cited by: 10
Authors
Saleem, Sajid [1]
Subhan, Fazli [1]
Naseer, Noman [2]
Bais, Abdul [3]
Imtiaz, Ammara [1]
Affiliations
[1] Natl Univ Modern Languages, Fac Engn & Comp Sci, H-9, Islamabad, Pakistan
[2] Air Univ, Dept Mechatron Engn, Islamabad, Pakistan
[3] Univ Regina, Fac Engn & Appl Sci, Regina, SK, Canada
Source
FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION | 2020 / Vol. 34
Keywords
Accent; Language; Forensic speaker recognition; Deep learning; Speech features; IDENTIFICATION; GMM; ROBUST;
DOI
10.1016/j.fsidi.2020.300982
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper presents a new method for Forensic Speaker Recognition (FSR) based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people of different groups, communities and origins, because speaking styles and native languages differ. In a multilingual society, forensic experts use AC and LI to narrow the search space for suspect recognition to regional and ethnic groups. In this paper, we use baseline and deep learning methods to automate this process. The baseline methods are Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector and Gaussian Mixture Model-Support Vector Machine (GMM-SVM), all of which use Mel-Frequency Cepstral Coefficients (MFCC) as speech features. The deep learning methods are Convolutional Neural Network (CNN) and Deep Neural Network (DNN). For the CNN, the recently proposed VGGVox and GMM-CNN methods are used, both of which operate on speech spectrograms. For the DNN, the x-vectors method, which is based on DNN embeddings, is used. The experimental results show that GMM-SVM gives better FSR performance than the GMM-UBM and i-vector methods, and that the x-vectors method performs better than GMM-CNN, VGGVox and GMM-SVM. The x-vectors method achieves 80.4% FSR accuracy; with AC it reaches 85.4%, with LI 90.2%, and with AC and LI combined 95.1%. This shows that the proposed method based on AC and LI gives promising results. (C) 2020 Elsevier Ltd. All rights reserved.
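The search-space reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Speaker` record, the `narrow_candidates` and `identify` helpers, the accent and language labels, and the similarity scores are all hypothetical. It only shows how restricting the enrolled speakers to a predicted accent and language group can change which candidate is selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    """Hypothetical enrollment record; all labels are placeholders."""
    speaker_id: str
    accent: str
    language: str


def narrow_candidates(speakers, accent=None, language=None):
    """Keep only speakers matching the predicted accent and/or language."""
    return [
        s for s in speakers
        if (accent is None or s.accent == accent)
        and (language is None or s.language == language)
    ]


def identify(scores, candidates):
    """Return the id of the best-scoring candidate, or None if none remain."""
    ids = [s.speaker_id for s in candidates if s.speaker_id in scores]
    return max(ids, key=scores.get) if ids else None


# Hypothetical enrolled speakers and speaker-similarity scores
# (for example, produced by some embedding-based back end).
enrolled = [
    Speaker("spk1", "accent_a", "lang_x"),
    Speaker("spk2", "accent_a", "lang_y"),
    Speaker("spk3", "accent_b", "lang_x"),
    Speaker("spk4", "accent_b", "lang_y"),
]
scores = {"spk1": 0.71, "spk2": 0.64, "spk3": 0.74, "spk4": 0.40}

# Without AC/LI, every enrolled speaker competes.
unfiltered = identify(scores, enrolled)

# With a predicted accent and language, only the matching group competes.
group = narrow_candidates(enrolled, accent="accent_a", language="lang_x")
filtered = identify(scores, group)

print(unfiltered, filtered)
```

In this toy setup the unrestricted search selects a speaker from a different accent group, while the restricted search cannot. In practice the benefit would depend on how accurate the accent and language predictions are.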
Pages: 8