Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

Cited by: 1
Authors
Mao, Tingzhi [1]
Khassanov, Yerbolat [2,3]
Pham, Van Tung [2]
Xu, Haihua [2]
Huang, Hao [1]
Chng, Eng Siong [2]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[3] Nazarbayev Univ, ISSAI, Nur Sultan, Kazakhstan
Source
2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2021
Funding
National Key Research and Development Program of China;
Keywords
speech recognition; named entity recognition; graphemic lexicon; word lattice; word embeddings;
DOI
10.1109/ISCSLP49672.2021.9362062
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NEs) in hybrid ASR systems without compromising overall word error rate performance. The underrepresented words correspond to rare or out-of-vocabulary (OOV) words in the training data and therefore cannot be modeled reliably. We begin with a graphemic lexicon, which removes the need for phonetic models in hybrid ASR. We study it under different settings and demonstrate its effectiveness in dealing with underrepresented NEs. Next, we study the impact of a neural language model (LM) with letter-based features designed to handle infrequent words. After that, we attempt to enrich the representations of underrepresented NEs in a pretrained neural LM by borrowing the embedding representations of well-represented words. This yields a significant performance improvement on underrepresented NE recognition. Finally, we boost the likelihood scores of utterances containing NEs in the word lattices rescored by neural LMs and gain a further performance improvement. The combination of the aforementioned approaches improves NE recognition by up to 42% relative.
Pages: 5
Related papers
50 in total
[21]   Named Entities Based on the BERT-BILSTM-ACRF Model Recognition Research [J].
Wang, Jingdong ;
Guo, Yongjia .
PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023, 2023, :228-233
[22]   PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES [J].
Sim, Khe Chai ;
Beaufays, Francoise ;
Benard, Arnaud ;
Guliani, Dhruv ;
Kabel, Andreas ;
Khare, Nikhil ;
Lucassen, Tamar ;
Zadrazil, Petr ;
Zhang, Harry ;
Johnson, Leif ;
Motta, Giovanni ;
Zhou, Lillian .
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, :23-30
[23]   Improving biomedical named entity recognition with syntactic information [J].
Yuanhe Tian ;
Wang Shen ;
Yan Song ;
Fei Xia ;
Min He ;
Kenli Li .
BMC Bioinformatics, 21
[24]   IMPROVING CHINESE NAMED ENTITY RECOGNITION WITH LEXICAL INFORMATION [J].
Fu, Guo-Hong .
PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, :3487-3491
[25]   Improving biomedical named entity recognition with syntactic information [J].
Tian, Yuanhe ;
Shen, Wang ;
Song, Yan ;
Xia, Fei ;
He, Min ;
Li, Kenli .
BMC BIOINFORMATICS, 2020, 21 (01)
[26]   IMPROVING VIETNAMESE ACCENT RECOGNITION USING ASR TRANSFER LEARNING [J].
Ta, Bao Thang ;
Dang, Xuan Vuong ;
Duong, Quang Tien ;
Le, Nhat Minh ;
Do, Van Hai .
2022 25TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA 2022), 2022,
[27]   Improving Named Entity Recognition for Morphologically Rich Languages using Word Embeddings [J].
Demir, Hakan ;
Ozgur, Arzucan .
2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, :117-122
[28]   An innovative hybrid approach for extracting named entities from unstructured text data [J].
Thomas, Anu ;
Sangeetha, S. .
COMPUTATIONAL INTELLIGENCE, 2019, 35 (04) :799-826
[29]   A Hybrid Named Entity Recognition System for Aviation Text [J].
Bharathi, A. ;
Ramdin, Robin ;
Babu, Preeja ;
Menon, Vijay Krishna ;
Jayaramakrishnan, Chandrasekhar ;
Lakshmikumar, Sudarsan .
EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (01)
[30]   Recognition of Patient-Related Named Entities in Noisy Tele-Health Texts [J].
Kim, Mi-Young ;
Xu, Ying ;
Zaiane, Osmar R. ;
Goebel, Randy .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2015, 6 (04)