Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring

Cited by: 0
Authors
Zahran, Ahmed I. [1 ]
Fahmy, Aly A. [1 ]
Wassif, Khaled T. [1 ]
Bayomi, Hanaa [1 ]
Affiliations
[1] Cairo Univ, Fac Comp & Artificial Intelligence, Orman, Giza 12613, Egypt
Keywords
Automatic pronunciation assessment; pronunciation scoring; pre-trained speech representations; self-supervised speech representation learning; wav2vec 2.0; WavLM; HuBERT
DOI
10.1109/ACCESS.2023.3317236
CLC classification number
TP [Automation technology, computer technology]
Discipline classification number
0812
Abstract
Automatic pronunciation assessment models are widely used in language learning applications. Common methodologies for pronunciation assessment rely on feature-based approaches, such as the Goodness-of-Pronunciation (GOP) approach, or on deep learning speech recognition models. With the rise of transformers, pre-trained self-supervised learning (SSL) models have been used to extract contextual speech representations, yielding improvements in various downstream tasks. In this study, we propose the end-to-end regressor (E2E-R) model for pronunciation scoring. E2E-R is trained in two steps. In the first step, the pre-trained SSL model is fine-tuned on a phoneme recognition task to obtain better representations of the pronounced phonemes. In the second step, transfer learning is used to build a pronunciation scoring model with a Siamese neural network that compares the pronounced phoneme representations against embeddings of the canonical phonemes and produces the final pronunciation scores. E2E-R achieves a Pearson correlation coefficient (PCC) of 0.68, nearly on par with the state-of-the-art GOPT-PAII model, while eliminating the need for training on additional native speech data, feature engineering, or external forced alignment modules. To our knowledge, this work presents the first use of a pre-trained SSL model for end-to-end phoneme-level pronunciation scoring on raw speech waveforms. The code is available at https://github.com/ai-zahran/E2E-R.
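The abstract's second step, comparing a pronounced-phoneme representation against a canonical phoneme embedding with a Siamese network, can be illustrated with a minimal sketch. The paper's actual scoring head is not specified here, so this toy version assumes a cosine-similarity comparison mapped to a [0, 1] score; the function names and the similarity-to-score mapping are illustrative, not the authors' implementation.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    """Scale a vector to unit length (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def siamese_pronunciation_score(pronounced_repr, canonical_emb):
    """Toy Siamese-style comparison (illustrative, not the E2E-R head):
    cosine similarity between the pronounced-phoneme representation and
    the canonical phoneme embedding, mapped from [-1, 1] to a [0, 1]
    pronunciation score."""
    cos = float(np.dot(l2_normalize(pronounced_repr),
                       l2_normalize(canonical_emb)))
    return (cos + 1.0) / 2.0

# A representation identical to the canonical embedding scores ~1.0.
rng = np.random.default_rng(0)
v = rng.normal(size=16)
print(round(siamese_pronunciation_score(v, v), 3))  # → 1.0
```

In the real model, both branches would be learned: one branch encodes the audio via the fine-tuned SSL model, the other embeds the canonical phoneme label, and the comparison is trained to regress human-annotated phoneme-level scores.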
Pages: 112650-112663
Page count: 14