An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition

Cited by: 0
Authors
Rukwong, Niyada [1 ]
Pongpinigpinyo, Sunee [1 ]
Affiliations
[1] Silpakorn Univ, Fac Sci, Dept Comp, Amphoe Muang 73000, Nakhon Pathom, Thailand
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Iss. 13
Keywords
computer-assisted pronunciation training; convolutional neural networks; Thai vowels; speech recognition; mel spectrogram; mel frequency cepstral coefficients; BRITISH ENGLISH; SPEAKING RATE; LEARNERS; PERCEPTION; DIALECT; EXPERIENCE; DURATION; NETWORK; LENGTH;
DOI
10.3390/app12136595
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
In Thai, vowel mispronunciation can completely change the meaning of a word, so effective, standardized practice is essential for pronouncing words correctly, as a native speaker does. Since the COVID-19 pandemic, online learning has become increasingly popular; for example, online pronunciation application systems have been introduced that provide virtual teachers and an intelligent student-evaluation process comparable to standardized training by a teacher in a real classroom. This research presents an online automatic computer-assisted pronunciation training (CAPT) system that uses deep learning to recognize Thai vowels in speech. The automatic CAPT is developed to address the shortage of instruction specialists and the complexity of the vowel teaching process; it is a unique system that integrates computer techniques with linguistic theory. The deep learning model is the most significant component, recognizing the vowels pronounced for the automatic CAPT. The major challenge in Thai vowel recognition is correctly identifying Thai vowels spoken in real-world situations. A convolutional neural network (CNN), a deep learning model, is developed and applied to classify pronounced Thai vowels. A new Thai vowel dataset was designed, collected, and examined by linguists. The optimal CNN model with the Mel spectrogram (MS) achieves the highest accuracy, 98.61%, compared with 94.44% for Mel frequency cepstral coefficients (MFCC) with a baseline long short-term memory (LSTM) model and 90.00% for MS with the baseline LSTM model.
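The abstract contrasts two input representations, the Mel spectrogram and MFCCs. As a rough NumPy-only sketch of how these two features are computed from a waveform (the paper does not specify its toolkit or settings; the 512-point FFT, 160-sample hop, 40 mel bands, and 13 coefficients below are illustrative assumptions, not the authors' parameters):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, mapped to FFT bins.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # then project onto the mel filterbank and convert to dB.
    frames = [y[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(y) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2
    mel = spec @ mel_filterbank(sr, n_fft, n_mels).T       # (frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T       # (n_mels, frames), dB

def dct2(x):
    # Naive orthonormal DCT-II along axis 0 (avoids a scipy dependency).
    N = x.shape[0]
    n = np.arange(N)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * N))
    out = basis @ x
    out[0] *= np.sqrt(1.0 / N)
    out[1:] *= np.sqrt(2.0 / N)
    return out

def mfcc(y, sr=16000, n_mfcc=13, **kw):
    # MFCCs are the DCT-II of the log-mel energies, keeping the low coefficients.
    return dct2(mel_spectrogram(y, sr, **kw))[:n_mfcc]
```

In the paper's setup, the 2-D log-mel (or MFCC) matrix is what the CNN classifier would consume as an image-like input; a production pipeline would typically use a library such as librosa for these transforms rather than this from-scratch sketch.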
Pages: 28
Related Papers
50 records
  • [1] Deep feature transfer learning for automatic pronunciation assessment
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2021, 2021, : 4438 - 4442
  • [2] An acoustic-phonetic feature-based system for the automatic recognition of fricative consonants
    Ali, AMA
Van der Spiegel, J
    Mueller, P
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 961 - 964
  • [3] Kinship verification and recognition based on handcrafted and deep learning feature-based techniques
    Nader, Nermeen
    El-Gamal, Fatma El-Zahraa
    El-Sappagh, Shaker
    Kwak, Kyung Sup
    Elmogy, Mohammed
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [5] Deep feature-based automatic classification of mammograms
    Arora, Ridhi
    Rai, Prateek Kumar
    Raman, Balasubramanian
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2020, 58 (06) : 1199 - 1211
  • [6] An acoustic-phonetic feature-based system for automatic phoneme recognition in continuous speech
    Ali, Ahmed M. Abdelatty
    Van der Spiegel, Jan
    Mueller, Paul
    Haentjens, Gavin
    Berman, Jeffrey
    ISCAS '99: PROCEEDINGS OF THE 1999 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 3: ANALOG AND DIGITAL SIGNAL PROCESSING, 1999, : 118 - 121
  • [8] Acoustic feature-based emotion recognition and curing using ensemble learning and CNN
    Anand, Raghav V.
    Md, Abdul Quadir
    Sakthivel, G.
    Padmavathy, T. V.
    Mohan, Senthilkumar
    Damasevicius, Robertas
    APPLIED SOFT COMPUTING, 2024, 166
  • [9] Articulatory feature-based pronunciation modeling
    Livescu, Karen
    Jyothi, Preethi
    Fosler-Lussier, Eric
    COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 212 - 232
  • [10] Phonological feature-based speech recognition system for pronunciation training in non-native language learning
    Arora, Vipul
    Lahiri, Aditi
    Reetz, Henning
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2018, 143 (01): : 98 - 108