Bangla Speech Recognition System using LPC and ANN

Cited by: 36

Authors:

Paul, Anup Kumar [1]

Das, Dipankar [2]

Kamal, Md. Mustafa [1]

Affiliations:

[1] Dhaka City Coll, Dhaka, Bangladesh

[2] Rajshahi Univ, Dept Informat & Commun Engn, Rajshahi 6205, Bangladesh

Source:

ICAPR 2009: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, PROCEEDINGS | 2009

Keywords:

DOI:

10.1109/ICAPR.2009.80

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a Bangla speech recognition system. The system is divided into two major parts: speech signal processing and speech pattern recognition. The speech processing stage consists of speech start- and end-point detection, windowing, filtering, calculation of the Linear Predictive Coding (LPC) and cepstral coefficients, and finally construction of the codebook by vector quantization. The second part is a pattern recognition system based on an Artificial Neural Network (ANN). Speech signals are recorded with an audio wave recorder in a normal room environment. The recorded signal is passed through the start- and end-point detection algorithm, which detects the presence of speech and removes silences and pauses. The resulting signal is then filtered to remove unwanted background noise, and the filtered signal is windowed with half-frame overlap. After windowing, the LPC and cepstral coefficients are calculated. The feature extractor uses a standard LPC cepstrum coder, which converts the incoming speech signal into the LPC cepstrum feature space. A Self-Organizing Map (SOM) neural network converts each variable-length LPC trajectory of an isolated word into a fixed-length LPC trajectory, producing the fixed-length feature vector that is fed into the recognizer. The neural network is designed as a Multi-Layer Perceptron and tested with 3, 4, and 5 hidden layers, using the tanh sigmoid transfer function. The different neural network structures are compared for a better understanding of the problem and its possible solutions.
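The feature-extraction steps named in the abstract (windowing with half-frame overlap, LPC analysis, conversion to cepstral coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation: the Hamming window, the autocorrelation (Levinson-Durbin) method, the frame length, the LPC order, and the function names `frame_signal`, `lpc`, and `lpc_to_cepstrum` are all assumptions made for illustration, since the abstract does not state them.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into windowed frames with half-frame overlap.
    The Hamming window is an assumption; the abstract only says 'windowed'."""
    hop = frame_len // 2                      # half-frame overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])

def lpc(frame, order):
    """LPC coefficients a_1..a_p by the autocorrelation method
    (Levinson-Durbin). Convention: s[n] ~ sum_k a_k * s[n-k]."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)                   # a[0] is unused
    err = r[0] + 1e-12                        # guard against all-zero frames
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k                    # prediction error update
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n of the all-pole model 1/(1 - sum a_k z^-k),
    using the standard LPC-to-cepstrum recursion (gain term c_0 omitted)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpc_cepstrum_features(x, frame_len=256, order=10, n_ceps=12):
    """One LPC-cepstrum vector per frame; all default values are illustrative."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len)
    return np.stack([lpc_to_cepstrum(lpc(f, order), n_ceps) for f in frames])
```

Calling `lpc_cepstrum_features(signal)` on an endpoint-trimmed, filtered recording would yield a variable-length sequence of cepstral vectors, which is the kind of trajectory the abstract describes the SOM as mapping to a fixed length before classification.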

Citation

Pages: 171-174

Page count: 4

References (7 in total)

[1]

ALI ME, THESIS BUET DHAKA

[2]

DEFATTA DJ, 1998, DIGITAL SIGNAL PROCE, P5807

[3]

HASEGAWA H, P 1993 INT JOINT C N

[4]

MINHAZ MN, 1998, INT C COMP INF TECH

[5]

Rabiner L., 1993, Fundamentals of speech recognition

[6]

Rabiner LR., 1978, DIGITAL PROCESSING S

[7]

TEBELSKIS J, 1995, THESIS CARNEGIE MELL
