Maxout Based Deep Neural Networks for Arabic Phonemes Recognition

Cited by: 0
Authors
AbdAlmisreb, Ali [1]
Abidin, Ahmad Farid [1]
Tahir, Nooritawati Md [1]
Affiliations
[1] Univ Teknol MARA, Fac Elect Engn, Shah Alam 40450, Selangor, Malaysia
Source
2015 IEEE 11TH INTERNATIONAL COLLOQUIUM ON SIGNAL PROCESSING & ITS APPLICATIONS (CSPA 2015) | 2015
Keywords
Maxout Networks; Deep learning; Arabic; Deep Belief Network; Convolutional Neural Network;
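DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808 ; 0809 ;
Abstract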
Arabic is widely spoken among Malays for several reasons, such as performing worship and reciting the Holy Book of Muslims. Recently, Maxout deep neural networks have brought substantial improvements to speech recognition systems. Hence, this paper introduces a fully connected feed-forward neural network with Maxout units. The proposed deep neural network has three hidden layers, 500 Maxout units, and 2 neurons per unit, and uses Mel-Frequency Cepstral Coefficients (MFCC) as features extracted from the phoneme waveforms. The network is trained and tested on a corpus of Arabic consonant phonemes recorded from 20 Malay speakers. Each speaker was given three attempts to pronounce all twenty-eight consonant phonemes, and each attempt was captured as one continuous recording of all the letters. The recordings were made with a SAMSON C03U USB multi-pattern condenser microphone. The data are divided into five waveforms for training the proposed Maxout network and fifteen waveforms for testing. Experimentally, training with the proposed Dropout function performed considerably better than the Sigmoid and Rectified Linear Unit (ReLU) functions. Finally, in testing, the Maxout network compared favorably with the Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Convolutional Neural Network (CNN), the conventional feed-forward neural network (NN), and Convolutional AutoEncoder (CAE).
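The architecture described in the abstract can be illustrated with a minimal NumPy sketch of a Maxout feed-forward pass. This is not the authors' implementation. It assumes that "500 Maxout units" means 500 units in each of the three hidden layers, that the input is a 13-dimensional MFCC vector, and that the output is a softmax over the 28 consonant phonemes; the function names (`maxout_layer`, `init_maxout_mlp`, `forward`) are hypothetical, and training with Dropout is omitted.

```python
import numpy as np

def maxout_layer(x, W, b):
    """Maxout activation: each unit outputs the max over its linear pieces.

    x: (batch, d_in), W: (d_in, units, pieces), b: (units, pieces).
    """
    z = np.einsum("bi,iup->bup", x, W) + b  # (batch, units, pieces)
    return z.max(axis=-1)                   # (batch, units)

def init_maxout_mlp(rng, d_in, n_classes, n_hidden=3, units=500, pieces=2):
    """Random weights for n_hidden Maxout layers plus a linear output layer.

    Assumption: 500 units per hidden layer, 2 linear pieces per unit.
    """
    hidden, d = [], d_in
    for _ in range(n_hidden):
        W = rng.normal(0.0, 0.01, size=(d, units, pieces))
        b = np.zeros((units, pieces))
        hidden.append((W, b))
        d = units
    out = (rng.normal(0.0, 0.01, size=(d, n_classes)), np.zeros(n_classes))
    return hidden, out

def forward(x, hidden, out):
    """Forward pass: stacked Maxout layers, then a softmax (assumed) output."""
    h = x
    for W, b in hidden:
        h = maxout_layer(h, W, b)
    logits = h @ out[0] + out[1]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# 13 MFCC coefficients per frame is an assumption; 28 classes = 28 consonants.
hidden, out = init_maxout_mlp(rng, d_in=13, n_classes=28)
probs = forward(rng.normal(size=(4, 13)), hidden, out)
print(probs.shape)  # (4, 28)
```

With 2 pieces per unit, each Maxout unit takes the larger of two learned linear functions of its input, so the activation shape is learned rather than fixed as in Sigmoid or ReLU.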
Pages: 192-197
Number of pages: 6