The Vietnamese Speech Recognition Based on Rectified Linear Units Deep Neural Network and Spoken Term Detection System Combination

Cited by: 0
Authors
Xiong, Shifu [1]
Guo, Wu [1]
Liu, Diyuan [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014
Keywords
under-resource speech recognition; deep neural network; rectified linear units; spoken term detection; system combination;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we report our recent progress on automatic speech recognition (ASR) for an under-resourced language and on the subsequent spoken term detection (STD). The experiments are carried out on the Vietnamese corpus of the National Institute of Standards and Technology (NIST) Open Keyword Search 2013 (OpenKWS13) evaluation. Compared with a conventional ASR system, we make the following modifications to improve recognition accuracy. First, pitch features and tone modeling are applied to capture pitch and tone information, since Vietnamese is a tonal language. Second, automatic question generation for the decision tree is used in state tying to compensate for the lack of linguistic knowledge. Finally, we investigate the rectified linear unit (ReLU) activation function and cross-lingual pre-training in deep neural network (DNN) acoustic model training. In the STD procedure, we adopt term-dependent score normalization and combine the outputs of diverse ASR systems to increase the actual term-weighted value (ATWV). With these methods, our current best single system achieves 48.32% word accuracy, and STD system combination reaches an ATWV of 0.398 on the OpenKWS13 Vietnamese development set.
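The abstract names term-dependent score normalization and ATWV without giving formulas. The sketch below illustrates one common reading of each and is not taken from the paper: the sum-to-one rescaling is an assumed stand-in for whichever normalization the authors used, and the ATWV computation follows the usual NIST keyword search definition with a false-alarm weight of 999.9 and one trial per second of speech. The function names and the input layout are hypothetical.

```python
from collections import defaultdict

BETA = 999.9  # false-alarm weight in the NIST keyword search metric


def sum_to_one_normalize(hits):
    """Rescale each term's scores so they sum to 1 over that term's hits.

    `hits` is a list of (term, score) pairs. This is one common form of
    term-dependent normalization, assumed here for illustration only.
    """
    totals = defaultdict(float)
    for term, score in hits:
        totals[term] += score
    return [
        (term, score / totals[term] if totals[term] > 0 else 0.0)
        for term, score in hits
    ]


def atwv(counts, speech_seconds, beta=BETA):
    """Actual term-weighted value from per-term detection counts.

    `counts` maps term -> (n_true, n_correct, n_false_alarm), counted at the
    system's actual decision threshold. Terms with no reference occurrences
    are skipped, since their miss probability is undefined.
    """
    losses = []
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: one per second of speech, minus true occurrences.
        p_fa = n_false_alarm / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no term has reference occurrences")
    return 1.0 - sum(losses) / len(losses)


if __name__ == "__main__":
    example_counts = {"term_a": (10, 8, 1), "term_b": (5, 5, 0)}
    print(atwv(example_counts, speech_seconds=36000.0))
    print(sum_to_one_normalize([("term_a", 0.6), ("term_a", 0.2)]))
```

Because the false-alarm weight is large, a single false alarm on a rare term can cost more than several misses, which is why per-term rescaling of scores before thresholding tends to matter for this metric.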
Pages: 183-186
Page count: 4