Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features

Cited by: 3
Authors
Asami, Taichi [1]
Masumura, Ryo [1]
Aono, Yushi [1]
Shinoda, Koichi [2]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
[2] Tokyo Inst Technol, Tokyo, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speech recognition; OOV word detection; recurrent OOV words; distribution of features
DOI
10.21437/Interspeech.2016-562
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The repeated use of out-of-vocabulary (OOV) words in a spoken document seriously degrades a speech recognizer's performance. This paper provides a novel method for accurately detecting such recurrent OOV words. Standard OOV word detection methods classify each word segment into in-vocabulary (IV) or OOV. This word-by-word classification tends to be affected by sudden vocal irregularities in spontaneous speech, triggering false alarms. To avoid this sensitivity to the irregularities, our proposal focuses on consistency of the repeated occurrence of OOV words. The proposed method preliminarily detects recurrent segments, segments that contain the same word, in a spoken document by open vocabulary spoken term discovery using a phoneme recognizer. If the recurrent segments are OOV words, features for OOV detection in those segments should exhibit consistency. We capture this consistency by using the mean and variance (distribution) of features (DOF) derived from the recurrent segments, and use the DOF for IV/OOV classification. Experiments illustrate that the proposed method's use of the DOF significantly improves its performance in recurrent OOV word detection.
Pages: 1320-1324
Page count: 5
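
The abstract describes the DOF as the mean and variance of OOV-detection features taken over segments believed to contain the same word. The following is a minimal sketch of that summary step only, not the authors' implementation. The function name `distribution_of_features` and the two example features (a confidence score and a mismatch rate) are assumptions made for illustration; the record does not say which features the paper uses, and the downstream IV/OOV classifier is not shown.

```python
from statistics import fmean, pvariance


def distribution_of_features(segment_features):
    """Summarize a group of recurrent segments as per-dimension means
    followed by per-dimension (population) variances.

    segment_features: list of equal-length feature vectors, one per
    segment believed to contain the same word.
    """
    if not segment_features:
        raise ValueError("need at least one segment")
    # Transpose so that each tuple holds one feature dimension across segments.
    dims = list(zip(*segment_features))
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) for d in dims]
    return means + variances


# Hypothetical per-segment features: [confidence score, mismatch rate].
cluster = [[0.30, 0.70], [0.34, 0.66], [0.32, 0.68]]
dof = distribution_of_features(cluster)
```

Low variance across the group reflects the consistency the abstract refers to: a single irregular segment shifts the group statistics less than it would shift a word-by-word decision.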