SPEECH RECOGNITION OF FOREIGN OUT-OF-VOCABULARY WORDS USING A HIERARCHICAL LANGUAGE MODEL

Cited by: 0

Authors:

Yamamoto, Hirofumi [1]

Kikui, Genichiro [2]

Nakamura, Satoshi [1,2]

Sagisaka, Yoshinori [1,3]

Affiliations:

[1] Natl Inst Informat & Commun Technol, 2-2-2 Hikaridai, Seika, Kyoto, Japan

[2] ATR Spoken Language Commun Res Labs, Kyoto, Japan

[3] Waseda Univ, GITI, Tokyo, Japan

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

Speech recognition; Language model; Foreign word; Out-of-vocabulary word; Hierarchical language model

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a new speech recognition scheme for foreign out-of-vocabulary words embedded in native-language speech. To recognize foreign names frequently observed in news speech or in translation speech, we adopted a hierarchical language model that had been successfully applied to OOV words covering native vocabularies. In this hierarchical language model, OOV vocabularies are modeled as a word-class model in the upper-layered model, and their statistical phonotactic constraints are modeled in the lower-layered model. Since extra statistics are needed to cover foreign words and their pronunciation differences, we have introduced two techniques. The first is to combine translation target language models and translation source statistics of OOVs using the hierarchical language model. The second is to automatically generate recognition target pronunciations from original pronunciations by syllable-to-syllable mapping. To confirm the validity of this recognition scheme, we have conducted speech recognition experiments using English speech including Japanese personal names as OOV words. The proposed method outperformed the existing algorithm using a lexicon consisting of all the words in the training set. Surprisingly, it achieved better OOV recognition results than the non-OOV condition where all the proper names in the test set are registered in the lexicon.
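The two-layer scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the common decomposition in which the upper layer scores an OOV name as a single class token and the lower layer scores the syllable sequence inside that class. Every name and number below (OOV_CLASS, UPPER_BIGRAM, LOWER_BIGRAM, SYLLABLE_MAP, FLOOR, the probabilities, and the phone symbols) is hypothetical and chosen only for illustration.

```python
import math

OOV_CLASS = "<NAME>"   # hypothetical class token standing in for any OOV name
FLOOR = 1e-6           # hypothetical floor probability for unseen bigrams

# Upper layer (assumed form): word bigram in which OOV names appear as one class token.
UPPER_BIGRAM = {
    ("<s>", "my"): 0.2,
    ("my", "name"): 0.5,
    ("name", "is"): 0.6,
    ("is", OOV_CLASS): 0.3,
    (OOV_CLASS, "</s>"): 0.4,
}

# Lower layer (assumed form): syllable bigram modeling phonotactics inside the class.
LOWER_BIGRAM = {
    ("<b>", "ya"): 0.1,
    ("ya", "ma"): 0.4,
    ("ma", "da"): 0.2,
    ("da", "<e>"): 0.3,
}

# Toy syllable-to-syllable mapping from source-language syllables to
# target-language phone strings (invented symbols, not a real phone set).
SYLLABLE_MAP = {"ya": ["y", "aa"], "ma": ["m", "aa"], "da": ["d", "aa"]}


def bigram_logprob(table, sequence):
    """Sum log-probabilities of adjacent pairs, flooring unseen pairs."""
    return sum(math.log(table.get(pair, FLOOR))
               for pair in zip(sequence, sequence[1:]))


def sentence_logprob(tokens):
    """Score a sentence whose OOV words are given as tuples of syllables."""
    upper = ["<s>"]
    total = 0.0
    for tok in tokens:
        if isinstance(tok, tuple):
            # OOV word: class token in the upper layer, syllables in the lower layer.
            upper.append(OOV_CLASS)
            total += bigram_logprob(LOWER_BIGRAM, ["<b>", *tok, "<e>"])
        else:
            upper.append(tok)
    upper.append("</s>")
    return total + bigram_logprob(UPPER_BIGRAM, upper)


def map_pronunciation(syllables):
    """Concatenate the mapped phone strings of each source syllable."""
    phones = []
    for syllable in syllables:
        phones.extend(SYLLABLE_MAP[syllable])
    return phones


if __name__ == "__main__":
    name = ("ya", "ma", "da")
    print(sentence_logprob(["my", "name", "is", name]))
    print(map_pronunciation(name))
```

In this sketch the OOV name never has to be listed in the word lexicon: the upper layer only needs the class token, and any syllable sequence receives a lower-layer score, which is the property the abstract relies on for unseen foreign names.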


Pages: 1870 / +

Page count: 2
