Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features

Citations: 0
Authors
Zhan, Qingran [1, 2]
Motlicek, Petr [2 ]
Du, Shixuan [1 ]
Shan, Yahui [1 ]
Ma, Sifan [1 ]
Xie, Xiang [1, 3]
Affiliations
[1] Beijing Inst Technol, Informat & Elect Inst, Beijing, Peoples R China
[2] Idiap Res Inst, Martigny, Switzerland
[3] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen, Peoples R China
Source
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019
Keywords
PHONE RECOGNITION; NEURAL-NETWORK; LANGUAGES
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Articulatory features (AFs) provide language-independent attributes by exploiting speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) approach based on AFs. Various neural network (NN) architectures are explored to extract cross-lingual AFs, and their performance is studied. The architectures include the multi-layer perceptron (MLP), convolutional NN (CNN), and long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM can transfer knowledge from the well-resourced to the under-resourced language through robust cross-lingual AFs. The final ASR system is built using traditional approaches (e.g. hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve performance in the under-resourced ASR task even though the source and target languages come from different language families. Overall, the proposed cross-lingual ASR approach provides a slight improvement over the monolingual LF-MMI and cross-lingual (acoustic model adaptation-based) ASR systems.
Pages: 1912-1916
Page count: 5