Optimization of computer-aided English pronunciation training data analysis system

Cited by: 0

Authors:

Liang C. [1]

Shang J. [1]

Affiliation:

[1] School of International Education & Europe-Asia Jiaotong, Zhengzhou Railway Vocational and Technical College, Zhengzhou

Source:

Computer-Aided Design and Applications | 2021 / Vol. 18 / Issue S4

Keywords:

Analysis system; Computer assistance; English pronunciation; Optimization; Training data

DOI:

10.14733/CADAPS.2021.S4.37-48

Abstract:

In this paper, we propose a Convolutional Neural Network (CNN)-based audiovisual fusion method for optimizing a computer-aided English pronunciation training data analysis system. Independent CNN structures model the audio and visual modalities separately and allow asynchronous information transfer between them, producing descriptions of the audiovisual parallel data in a high-dimensional feature space; a shared fully connected structure immediately following the CNNs then models the long-term dependencies of these high-dimensional representations. We also construct a generative model that maps auditory features to visual features and use it to automatically generate large numbers of visual features, which are combined with the CNN-based audiovisual fusion method for bimodal modeling. Experiments show that when the generative model is trained and tested in the same acoustic environment, only a small amount of audiovisual parallel data is required in combination with the proposed bimodal method based on visual feature generation, and that this bimodal method effectively addresses the problem of missing visual information in real-world usage. The proposed audiovisual fusion method can model the independence, asynchrony, and long-term interdependence of audiovisual parallel data, which is of great significance for further research on deep-learning-based audiovisual fusion. © 2021 CAD Solutions, LLC.
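The pipeline in the abstract (separate CNN branches per modality, a shared fully connected layer on top, and a generator that fills in missing visual features from audio) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that do not come from the paper. Each modality gets its own hand-set 1-D convolution kernels, and their feature maps are mean-pooled over time so the audio and visual streams need not be frame-synchronous. The generative model is replaced by a hypothetical least-squares linear map from audio frames to visual frames. All names (`conv1d_relu`, `branch`, `shared_fc`, `fit_linear`, `FusionModel`) and weights are illustrative, not the authors' implementation. Because this toy version pools away the time axis before the shared layer, it does not reproduce the long-term dependency modeling the abstract attributes to that layer.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def conv1d_relu(signal: Sequence[float], kernel: Sequence[float]) -> Vector:
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def branch(signal: Sequence[float], kernels: List[Vector]) -> Vector:
    """Modality-specific branch (assumed design): one feature map per kernel, mean-pooled.

    Pooling removes the time axis, so audio and visual streams may differ in length
    (frame rate) and each branch is computed independently of the other.
    """
    feats = []
    for kernel in kernels:
        fmap = conv1d_relu(signal, kernel)
        feats.append(sum(fmap) / len(fmap) if fmap else 0.0)
    return feats


def shared_fc(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Shared fully connected layer applied to the concatenated branch outputs."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical generator: least-squares fit ys ~ a * xs + b on frame-aligned parallel data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


@dataclass
class FusionModel:
    audio_kernels: List[Vector]
    visual_kernels: List[Vector]
    fc_weights: List[Vector]
    fc_bias: Vector
    gen_a: float = 0.0  # slope of the hypothetical audio-to-visual generator
    gen_b: float = 0.0  # intercept of the hypothetical audio-to-visual generator

    def generate_visual(self, audio: Sequence[float]) -> Vector:
        """Synthesize one visual frame per audio frame with the fitted linear map."""
        return [self.gen_a * x + self.gen_b for x in audio]

    def forward(self, audio: Sequence[float], visual: Optional[Sequence[float]] = None) -> Vector:
        if visual is None:  # visual stream missing at run time: generate it from audio
            visual = self.generate_visual(audio)
        fused = branch(audio, self.audio_kernels) + branch(visual, self.visual_kernels)
        return shared_fc(fused, self.fc_weights, self.fc_bias)


if __name__ == "__main__":
    # Fit the generator on a small (made-up) frame-aligned audio/visual parallel set.
    a, b = fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    model = FusionModel(
        audio_kernels=[[-1.0, 1.0]],
        visual_kernels=[[0.5, 0.5]],
        fc_weights=[[1.0, 1.0], [1.0, -1.0]],
        fc_bias=[0.0, 0.0],
        gen_a=a,
        gen_b=b,
    )
    print(model.forward([1.0, 2.0, 3.0, 4.0]))  # visual missing, so it is generated → [7.0, -5.0]
```

In the usage example the visual input is omitted, so `forward` synthesizes it from the audio frames before fusion. Passing the same synthesized frames explicitly gives an identical result, which is the property a visual-feature generator is meant to provide when the camera stream is unavailable.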

Pages: 37-48

Number of pages: 11