Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features

Cited: 0
|
Authors
Zhan, Qingran [1 ,2 ]
Motlicek, Petr [2 ]
Du, Shixuan [1 ]
Shan, Yahui [1 ]
Ma, Sifan [1 ]
Xie, Xiang [1 ,3 ]
Affiliations
[1] Beijing Inst Technol, Informat & Elect Inst, Beijing, Peoples R China
[2] Idiap Res Inst, Martigny, Switzerland
[3] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen, Peoples R China
Source
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019
Keywords
PHONE RECOGNITION; NEURAL-NETWORK; LANGUAGES;
DOI
Not available
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Articulatory features (AFs) provide language-independent attributes by exploiting speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) system based on AF methods. Various neural network (NN) architectures are explored to extract cross-lingual AFs, and their performance is studied. The architectures include the multi-layer perceptron (MLP), convolutional NN (CNN), and long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM is able to transfer knowledge from the well-resourced to the under-resourced language through robust cross-lingual AFs. The final ASR system is built using traditional approaches (e.g. hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve performance on the under-resourced ASR task even though the source and target languages come from different language families. Overall, the proposed cross-lingual ASR approach provides a slight improvement over the monolingual LF-MMI and cross-lingual (acoustic model adaptation-based) ASR systems.
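The feature combination step the abstract describes (appending cross-lingual AF posteriors to conventional MFCC frames before feeding them to the hybrid acoustic model) can be sketched as follows. This is an illustrative sketch, not the authors' code; the frame count and dimensionalities are assumptions, and the random arrays stand in for real MFCC frames and AF-extractor outputs.

```python
import numpy as np

# Illustrative dimensions (assumptions, not values from the paper):
# 13-dim MFCCs and a 24-dim vector of AF posteriors per frame.
n_frames, mfcc_dim, af_dim = 100, 13, 24

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((n_frames, mfcc_dim))  # stand-in for conventional MFCC frames
af = rng.random((n_frames, af_dim))               # stand-in for cross-lingual AF extractor output

# Frame-wise concatenation yields the combined (tandem-style) feature
# vectors that a hybrid ASR system would consume.
tandem = np.concatenate([mfcc, af], axis=1)
print(tandem.shape)  # (100, 37)
```

In practice the AF posteriors would come from the LSTM extractor trained on the source language and applied to target-language audio, frame-aligned with the MFCCs.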
Pages: 1912 - 1916
Page count: 5
Related Papers
50 records in total
  • [1] Exploiting Morpheme and Cross-lingual Knowledge to Enhance Mongolian Named Entity Recognition
    Zhang, Songming
    Zhang, Ying
    Chen, Yufeng
    Wu, Du
    Xu, Jinan
    Liu, Jian
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [2] Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition
    Zhan, Qingran
    Xie, Xiang
    Hu, Chenguang
    Zuluaga-Gomez, Juan
    Wang, Jing
    Cheng, Haobo
    ELECTRONICS, 2021, 10 (24)
  • [4] Self-organizing speech recognition that processes acoustic and articulatory features
    Hesdras O. Viana
    Aluízio F. R. Araújo
    Danilo S. Barbosa
    Multimedia Tools and Applications, 2024, 83 : 39169 - 39195
  • [6] A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users
    Bohac, Marek
    Kucharova, Michaela
    Callejas, Zoraida
    Nouza, Jan
    Cerva, Petr
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014 : 1 - 13
  • [7] Articulatory and excitation source features for speech recognition in read, extempore and conversation modes
    Manjunath, K. E.
    Rao, K. Sreenivasa
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (01) : 121 - 134
  • [8] Robust phone set mapping using decision tree clustering for cross-lingual phone recognition
    Sim, Khe Chai
    Li, Haizhou
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4309 - 4312
  • [9] Evaluating cross-lingual textual similarity on dictionary alignment problem
    Sever, Yigit
    Ercan, Gonenc
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (04) : 1059 - 1078
  • [10] Emotion Detection in Cross-Lingual Text Based on Bidirectional LSTM
    Ren, Han
    Wan, Jing
    Ren, Yafeng
    SECURITY WITH INTELLIGENT COMPUTING AND BIG-DATA SERVICES, 2020, 895 : 838 - 845