An MFCC-based Speaker Identification System

Cited by: 19
Authors
Leu, Fang-Yie [1]
Lin, Guan-Liang [1]
Affiliation
[1] Tunghai Univ, Comp Sci Dept, Taichung, Taiwan
Source
2017 IEEE 31ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA) | 2017
Keywords
speaker identification; Fourier transformation; Mel-frequency cepstral coefficients; Gaussian mixture model; acoustic model; RECOGNITION;
DOI
10.1109/AINA.2017.130
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Speech recognition applications are now in widespread use. Typical examples are Siri on the iPhone, Google's speech recognition system, and voice-operated mobile phones. By contrast, speaker identification is still relatively immature. In this paper, we therefore study a speaker identification technique that first takes the original voice signals of a person, e.g., Bob, and normalizes their audio energy. The audio signals are then converted from the time domain to the frequency domain by a Fourier transformation. Next, an MFCC-based human auditory filtering model is used to measure the energy levels at different frequencies, which serve as the quantified characteristics of Bob's voice. The probability density function of a Gaussian mixture model is then used to describe the distribution of these quantified characteristics, yielding Bob's specific acoustic model. When the system receives the voice of an unknown person, x, it processes the voice with the same procedure and compares the result, which is x's acoustic model, with the acoustic models of known people collected beforehand in an acoustic-model database, in order to identify the most likely speaker.
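The pipeline described in the abstract (energy normalization, Fourier transform, mel-scale filtering, Gaussian mixture modelling, comparison against enrolled speakers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The frame length, hop size, filter count, number of cepstral coefficients, number of mixture components, and all function names are assumptions chosen for illustration. The sketch also scores an unknown voice by its average log-likelihood under each enrolled model, a common way of using GMM speaker models, rather than comparing two models directly as the abstract words it. The `synth_voice` helper produces synthetic tones plus noise as stand-in data, not real speech.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):            # rising edge
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):           # falling edge
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Return an (n_frames, n_ceps) array of MFCC features."""
    # Step 1: normalize audio energy (here: unit RMS).
    signal = signal / (np.sqrt(np.mean(signal ** 2)) + 1e-12)
    # Step 2: split into overlapping Hamming-windowed frames, then FFT.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # Step 3: mel filterbank energies, log, then DCT-II (c0 is skipped).
    log_mel = np.log(power @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_mel @ dct.T

def component_log_prob(x, weights, means, var):
    """Log of weight_k * N(x | mean_k, diag(var_k)) for every frame and component."""
    diff = x[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                        + np.sum(np.log(2.0 * np.pi * var), axis=1))
    return np.log(weights) + log_gauss

def fit_gmm(x, n_comp=4, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM; returns (weights, means, var)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, n_comp, replace=False)]
    var = np.tile(x.var(axis=0) + 1e-6, (n_comp, 1))
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        log_p = component_log_prob(x, weights, means, var)          # E-step
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        nk = resp.sum(axis=0) + 1e-10                               # M-step
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, var

def avg_log_likelihood(x, model):
    """Average per-frame log-likelihood of features x under a GMM."""
    return float(np.mean(np.logaddexp.reduce(component_log_prob(x, *model), axis=1)))

def identify(signal, models, sr=8000):
    """Return the enrolled name whose GMM best explains the unknown signal."""
    feats = mfcc(signal, sr)
    return max(models, key=lambda name: avg_log_likelihood(feats, models[name]))

def synth_voice(freqs, rng, sr=8000, seconds=1.0):
    """Hypothetical stand-in for a recording: a few tones plus white noise."""
    t = np.arange(int(sr * seconds)) / sr
    tones = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return tones + 0.3 * rng.standard_normal(t.size)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    database = {
        "bob": fit_gmm(mfcc(synth_voice([200, 400, 600], rng))),
        "alice": fit_gmm(mfcc(synth_voice([350, 1400, 2100], rng))),
    }
    print(identify(synth_voice([200, 400, 600], rng), database))
```

Because the energy is normalized before feature extraction, rescaling the unknown signal by a constant factor should not change the decision. A real system would also need silence removal and considerably more training audio per speaker than this toy data provides.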
Pages: 1055-1062
Page count: 8
Related papers
13 in total
[1]  
Cristianini N., 2000, INTRO SUPPORT VECTOR, DOI [10.1017/CBO9780511801389, DOI 10.1017/CBO9780511801389]
[2]  
Goel Akshay, 2014, 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom), P202, DOI 10.1109/MedCom.2014.7006004
[3]  
Huang N.E., 2009, ON INSTANTANEOUS FRE, P177
[4]  
Juang B.H., 1998, IEEE SIGNAL PROCESSI, V15, P23
[5]   ENHANCEMENT AND BANDWIDTH COMPRESSION OF NOISY SPEECH [J].
LIM, JS ;
OPPENHEIM, AV .
PROCEEDINGS OF THE IEEE, 1979, 67 (12) :1586-1604
[6]  
Openshaw J. P., 1993, ICASSP-93. 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (Cat. No.92CH3252-4), P371, DOI 10.1109/ICASSP.1993.319316
[7]  
Peng X., 2005, INT C NAT LANG PROC, P111
[8]  
Rao K.R., 1990, Discrete Cosine Transform: Algorithms, Advantages, Applications
[9]  
Reynolds DA, 2002, INT CONF ACOUST SPEE, P4072
[10]   ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS [J].
REYNOLDS, DA ;
ROSE, RC .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01) :72-83