Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble

Times cited: 0
Authors
Li, Xinjian [1]
Metze, Florian [1]
Mortensen, David R. [1]
Watanabe, Shinji [1]
Black, Alan W. [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022) | 2022
Keywords
MODELS;
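DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract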
Grapheme-to-Phoneme (G2P) conversion has many applications in NLP and speech fields. Most existing work focuses heavily on languages with abundant training datasets, which limits the scope of target languages to fewer than 100 languages. This work attempts to apply zero-shot learning to approximate G2P models for all low-resource and endangered languages in Glottolog (about 8k languages). For any unseen target language, we first build the phylogenetic tree (i.e., the language family tree) to identify the top-k nearest languages for which we have training sets. Then we run the models of those languages to obtain a hypothesis set, which we combine into a confusion network to propose the most likely hypothesis as an approximation to the target language. We test our approach on over 600 unseen languages and demonstrate that it significantly outperforms baselines.
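The two steps described in the abstract (picking the top-k nearest training languages in a family tree, then merging their hypotheses) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `LINEAGE` table, the language codes, the phoneme outputs, and the function names are all hypothetical, and a simple position-wise majority vote stands in for the confusion network, which would normally align the hypotheses first.

```python
from collections import Counter
from itertools import zip_longest

# Hypothetical lineage data: each language maps to its path from the family
# root. A real system would read this from a resource such as Glottolog.
LINEAGE = {
    "spa": ["indo-european", "romance", "ibero-romance"],
    "por": ["indo-european", "romance", "ibero-romance"],
    "fra": ["indo-european", "romance", "gallo-romance"],
    "deu": ["indo-european", "germanic", "west-germanic"],
    "glg": ["indo-european", "romance", "ibero-romance"],  # unseen target
}


def tree_distance(a, b):
    """Count lineage nodes not shared by the two languages (smaller is closer)."""
    path_a, path_b = LINEAGE[a], LINEAGE[b]
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def nearest_languages(target, trained, k):
    """Return the k trained languages closest to the target in the tree."""
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(trained, key=lambda lang: (tree_distance(target, lang), lang))[:k]


def vote(hypotheses):
    """Position-wise majority vote over phoneme sequences.

    Assumption: hypotheses are already roughly aligned. Shorter sequences are
    padded with an empty token, and a position is dropped when the empty
    token wins the vote.
    """
    result = []
    for column in zip_longest(*hypotheses, fillvalue=""):
        phoneme, _ = Counter(column).most_common(1)[0]
        if phoneme:
            result.append(phoneme)
    return result


neighbours = nearest_languages("glg", ["spa", "por", "fra", "deu"], k=3)

# Hypothetical outputs of the neighbour-language G2P models for one word.
hypothesis_set = {
    "por": ["k", "a", "z", "a"],
    "spa": ["k", "a", "s", "a"],
    "fra": ["k", "a", "z", "a"],
}
approximation = vote([hypothesis_set[lang] for lang in neighbours])
```

In this toy setup the unseen language `glg` borrows the models of its three closest relatives, and the majority phoneme at each position becomes the approximate pronunciation. A closer analogue of the confusion-network step would align hypotheses of different lengths before voting.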
Pages: 2106-2115
Number of pages: 10