Frontier Research on Low-Resource Speech Recognition Technology

Cited by: 3
Authors
Slam, Wushour [1]
Li, Yanan [1]
Urouvas, Nurmamet [1]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Xinjiang Lab Multilanguage Informat Technol, Xinjiang Multilingual Informat Technol Res Ctr, Urumqi 830046, Peoples R China
Keywords
low-resource speech recognition; deep feature extraction; acoustic models; resource expansion; COVARIANCE MATRICES; SPEAKER ADAPTATION; DATA AUGMENTATION; NEURAL-NETWORKS; FEATURES; SYSTEM; ASR; LANGUAGES; LEXICONS; IMPROVE;
DOI
10.3390/s23229096
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
As continuous speech recognition technology has developed, users have come to demand higher recognition accuracy. Low-resource speech recognition, a typical case of speech recognition under restricted conditions, has become an active research topic because its recognition rate remains low while its application value is high. Focusing on low-resource speech recognition, this paper reviews the state of research on feature extraction and acoustic models and examines resource expansion. It proposes solutions to the technical challenges the field faces and outlines future research directions.
Pages: 47
Related papers
50 in total
  • [41] External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge
    Zhong, Guolong
    Song, Hongyu
    Wang, Ruoyu
    Sun, Lei
    Liu, Diyuan
    Pan, Jia
    Fang, Xin
    Du, Jun
    Zhang, Jie
    Dai, Lirong
    INTERSPEECH 2022, 2022, : 4860 - 4864
  • [42] MLP-HMM Two-Stage Unsupervised Training for Low-Resource Languages on Conversational Telephone Speech Recognition
    Qian, Yanmin
    Liu, Jia
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1815 - 1819
  • [43] MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION
    Li, Xinjian
    Mortensen, David R.
    Metze, Florian
    Black, Alan W.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6958 - 6962
  • [44] Improvement of Acoustic Models Fused with Lip Visual Information for Low-Resource Speech
    Yu, Chongchong
    Yu, Jiaqi
    Qian, Zhaopeng
    Tan, Yuchen
    SENSORS, 2023, 23 (04)
  • [45] Automatic Speech Transcription for Low-Resource Languages - The Case of Yoloxochitl Mixtec (Mexico)
    Mitra, Vikramjit
    Kathol, Andreas
    Amith, Jonathan D.
    Castillo Garcia, Rey
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3076 - 3080
  • [46] Deep Neural Network based Feature Extraction Using Convex-nonnegative Matrix Factorization for Low-resource Speech Recognition
    Qin, Chuxiong
    Zhang, Lianhai
    2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC), 2016, : 1082 - 1086
  • [47] Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition
    Zhang, Xinghua
    Chen, Gaode
    Cui, Shiyao
    Sheng, Jiawei
    Liu, Tingwen
    Xu, Hongbo
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 630 - 640
  • [48] Low-Resource Named Entity Recognition via the Pre-Training Model
    Chen, Siqi
    Pei, Yijie
    Ke, Zunwang
    Silamu, Wushour
    SYMMETRY-BASEL, 2021, 13 (05)
  • [49] Image-Mediated Data Augmentation for Low-Resource Human Activity Recognition
    Wang, Zihao
    Qu, Youli
    Tao, Junru
    Song, Yudan
    PROCEEDINGS OF THE 2019 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTE AND DATA ANALYSIS (ICCDA 2019), 2019, : 49 - 54
  • [50] Language Adaptive DNNs for Improved Low Resource Speech Recognition
    Mueller, Markus
    Stueker, Sebastian
    Waibel, Alex
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3878 - 3882