Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features

Times cited: 0

Authors:

Zhan, Qingran [1,2]

Motlicek, Petr [2]

Du, Shixuan [1]

Shan, Yahui [1]

Ma, Sifan [1]

Xie, Xiang [1,3]

Affiliations:

[1] Beijing Inst Technol, Informat & Elect Inst, Beijing, Peoples R China

[2] Idiap Res Inst, Martigny, Switzerland

[3] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen, Peoples R China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Keywords:

PHONE RECOGNITION; NEURAL-NETWORK; LANGUAGES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Articulatory features (AFs) provide language-independent attributes by exploiting speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) approach based on AFs. Various neural network (NN) architectures are explored for extracting cross-lingual AFs, and their performance is evaluated. The architectures include the multi-layer perceptron (MLP), the convolutional NN (CNN), and the long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM can transfer knowledge from the well-resourced to the under-resourced language through robust cross-lingual AFs. The final ASR system is built using traditional approaches (e.g., hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve performance on the under-resourced ASR task even though the source and target languages come from different language families. Overall, the proposed cross-lingual ASR approach provides a slight improvement over the monolingual LF-MMI and cross-lingual (acoustic-model-adaptation-based) ASR systems.
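The abstract evaluates the AF extractors by frame-classification accuracy and then combines AFs with MFCCs, but it does not state how the two streams are merged. The following is a minimal sketch, assuming one common choice: frame-wise concatenation of AF posteriors onto MFCC vectors as input to a hybrid acoustic model. The function names (`argmax_frames`, `frame_accuracy`, `append_afs`), the toy dimensions, and the class indices are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]


def argmax_frames(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the most likely AF class for each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


def frame_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose predicted AF class equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    if not reference:
        raise ValueError("need at least one frame")
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


def append_afs(mfcc: Sequence[Sequence[float]],
               af_posteriors: Sequence[Sequence[float]]) -> List[Frame]:
    """Concatenate each frame's AF posteriors onto its MFCC vector.

    Assumption: both streams are already frame-aligned (same frame rate and
    length); the paper's actual combination scheme is not given in the abstract.
    """
    if len(mfcc) != len(af_posteriors):
        raise ValueError("MFCC and AF streams must be frame-aligned")
    return [list(m) + list(a) for m, a in zip(mfcc, af_posteriors)]


if __name__ == "__main__":
    # Toy target-language utterance: 3 frames, 2 MFCC dims (real MFCCs are
    # usually 13 or more), 3 hypothetical AF classes from a source-trained extractor.
    mfcc = [[1.0, 0.5], [0.9, 0.4], [0.2, -0.1]]
    af = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
    ref = [0, 1, 1]  # hypothetical reference AF labels

    pred = argmax_frames(af)
    print(pred)                                   # [0, 1, 2]
    print(round(frame_accuracy(pred, ref), 3))    # 0.667
    print(len(append_afs(mfcc, af)[0]))           # 5
```

Concatenation keeps the MFCC stream intact and simply widens each input frame, so the downstream hybrid model can learn how much to rely on the cross-lingual AFs.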


Pages: 1912-1916

Number of pages: 5
