The Vietnamese Speech Recognition Based on Rectified Linear Units Deep Neural Network and Spoken Term Detection System Combination

Times cited: 0

Authors:

Xiong, Shifu [1]

Guo, Wu [1]

Liu, Diyuan [1]

Affiliation:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Changsha, Hunan, Peoples R China

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

under-resource speech recognition; deep neural network; rectified linear units; spoken term detection; system combination;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we report our recent progress on automatic speech recognition (ASR) for an under-resourced language and on the subsequent spoken term detection (STD). The experiments are carried out on the Vietnamese corpus of the National Institute of Standards and Technology (NIST) Open Keyword Search 2013 (OpenKWS13) evaluation. Relative to a conventional ASR system, we make the following modifications to improve recognition accuracy. First, because Vietnamese is a tonal language, pitch features and tone modeling are applied to capture pitch and tone information. Second, automatic question generation for decision-tree state tying is used to compensate for the lack of linguistic knowledge. Finally, we investigate the rectified linear unit (ReLU) activation function and cross-lingual pre-training for deep neural network (DNN) acoustic model training. In the STD stage, we adopt term-dependent score normalization and combine the outputs of diverse ASR systems to increase the actual term-weighted value (ATWV). With these methods, our current best single system achieves 48.32% word accuracy on the OpenKWS13 Vietnamese development set, and STD system combination reaches an ATWV of 0.398 on the same set.
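The abstract reports its keyword-search results in ATWV and credits term-dependent score normalization for part of the gain. As a hedged illustration only, the Python sketch below scores detections with ATWV following the NIST OpenKWS definition (false-alarm weight beta = 999.9, false-alarm probability normalized by the seconds of evaluated speech) and applies sum-to-one (STO) normalization, one common term-dependent scheme. The abstract does not say which normalization the authors used, so STO is a swapped-in example; the `Detection` class, the exponent `gamma`, the global `threshold` of 0.5, and the toy detections are all hypothetical.

```python
# Hypothetical sketch: term-dependent sum-to-one (STO) score normalization
# followed by ATWV scoring as defined for NIST OpenKWS.
# STO is an assumed example scheme, not necessarily the paper's method.
from collections import defaultdict
from dataclasses import dataclass

BETA = 999.9  # NIST weight on false alarms relative to misses


@dataclass
class Detection:
    term: str       # keyword (query term)
    score: float    # confidence, e.g. a lattice posterior
    correct: bool   # True if the hit aligns with a reference occurrence


def sum_to_one_normalize(dets, gamma=1.0):
    """Per term, rescale so that score**gamma sums to one over its hits."""
    totals = defaultdict(float)
    for d in dets:
        totals[d.term] += d.score ** gamma
    return [Detection(d.term, d.score ** gamma / totals[d.term], d.correct)
            for d in dets]


def atwv(dets, n_true, speech_seconds, threshold=0.5, beta=BETA):
    """ATWV = 1 - mean_k [P_miss(k) + beta * P_FA(k)] over terms with n_true > 0.

    P_miss(k) = 1 - N_corr(k) / N_true(k)
    P_FA(k)   = N_FA(k) / (T - N_true(k)), with T = seconds of evaluated speech.
    Only detections whose score reaches `threshold` count as YES decisions.
    """
    n_corr, n_fa = defaultdict(int), defaultdict(int)
    for d in dets:
        if d.score >= threshold:
            if d.correct:
                n_corr[d.term] += 1
            else:
                n_fa[d.term] += 1
    terms = [t for t, n in n_true.items() if n > 0]
    loss = sum(
        (1.0 - n_corr[t] / n_true[t])
        + beta * n_fa[t] / (speech_seconds - n_true[t])
        for t in terms
    )
    return 1.0 - loss / len(terms)


if __name__ == "__main__":
    # Made-up toy data: one confidently detected term, one with low scores.
    dets = [
        Detection("xin chào", 0.9, True),
        Detection("xin chào", 0.3, False),
        Detection("cảm ơn", 0.2, True),
    ]
    n_true = {"xin chào": 1, "cảm ơn": 2}
    print(atwv(dets, n_true, 3600.0))                        # raw scores
    print(atwv(sum_to_one_normalize(dets), n_true, 3600.0))  # after STO
```

In this made-up example, the correct but low-scoring hit for the second term falls below a global threshold on raw scores (ATWV 0.5) and survives after per-term normalization (ATWV 0.75). Rescuing such hits is the usual motivation for term-dependent normalization; the toy numbers say nothing about the paper's actual results.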

Pages: 183-186

Page count: 4