An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition

Cited by: 0

Authors:

Rukwong, Niyada [1]

Pongpinigpinyo, Sunee [1]

Affiliations:

[1] Silpakorn Univ, Fac Sci, Dept Comp, Amphoe Muang 73000, Nakhon Pathom, Thailand

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 13

Keywords:

computer-assisted pronunciation training; convolutional neural networks; Thai vowels; speech recognition; mel spectrogram; mel frequency cepstral coefficients; BRITISH ENGLISH; SPEAKING RATE; LEARNERS; PERCEPTION; DIALECT; EXPERIENCE; DURATION; NETWORK; LENGTH;

DOI:

10.3390/app12136595

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

In Thai, mispronouncing a vowel can change the meaning of a word completely. Effective, standardized practice is therefore essential to pronouncing words correctly, as a native speaker would. Since the COVID-19 pandemic, online learning has become increasingly popular. For example, an online pronunciation application system was introduced that has virtual teachers and an intelligent process for evaluating students, similar to standardized training by a teacher in a real classroom. This research presents an online automatic computer-assisted pronunciation training (CAPT) system that uses deep learning to recognize Thai vowels in speech. The automatic CAPT system is developed to address the shortage of instruction specialists and the complexity of the vowel teaching process. It is a unique system that integrates computational techniques with linguistic theory. The deep learning model, which recognizes the pronounced vowels, is the most significant part of the automatic CAPT system. The major challenge in Thai vowel recognition is the correct identification of Thai vowels when spoken in real-world situations. A convolutional neural network (CNN), a deep learning model, is developed and applied to the classification of pronounced Thai vowels. A new dataset for Thai vowels was designed, collected, and examined by linguists. The optimal CNN model with Mel spectrogram (MS) features achieves the highest accuracy, 98.61%, compared with 94.44% for the baseline long short-term memory (LSTM) model with Mel frequency cepstral coefficients (MFCC) and 90.00% for the baseline LSTM model with MS.
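
The abstract compares two acoustic feature types, the Mel spectrogram (MS) and Mel frequency cepstral coefficients (MFCC). As a rough illustration of how the two relate, the following NumPy sketch computes a log-Mel spectrogram from a waveform and then derives MFCC-like coefficients from it with a DCT-II. This is not the authors' pipeline: the abstract gives no feature parameters, so the sampling rate, frame length, hop size, filter count, and every function name below are assumptions chosen only for the example, and the test tone stands in for real speech.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # n_mels triangular filters spaced evenly on the mel scale from 0 to sr/2.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # pool it into mel bands, and convert to decibels.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10)  # shape: (n_frames, n_mels)

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    # Unnormalized DCT-II along the mel axis; keep the first n_mfcc coefficients.
    n_mels = log_mel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_mels))
    return log_mel @ basis.T  # shape: (n_frames, n_mfcc)

# Hypothetical input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2.0 * np.pi * 440.0 * t)

ms = log_mel_spectrogram(signal, sr)
mfcc = mfcc_from_log_mel(ms)
print(ms.shape, mfcc.shape)
```

A 2-D array like `ms` (time frames by mel bands) can be treated as a single-channel image, which is one reason such features are a natural input for a CNN classifier; the MFCC step compresses each frame into a handful of decorrelated coefficients.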

Pages: 28
