Feature analysis for discriminative confidence estimation in spoken term detection

Cited by: 9

Authors:

Tejedor, Javier [1]

Toledano, Doroteo T. [2]

Wang, Dong [3]

King, Simon [4]

Colas, Jose [1]

Affiliations:

[1] Univ Autonoma Madrid, Escuela Politecn Super, Human Comp Technol Lab, E-28049 Madrid, Spain

[2] Univ Autonoma Madrid, Escuela Politecn Super, ATVS Biometr Recognit Grp, E-28049 Madrid, Spain

[3] Tsinghua Univ, Ctr Speech & Language Technol, Beijing 100084, Peoples R China

[4] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland

Source:

COMPUTER SPEECH AND LANGUAGE | 2014 / Volume 28 / Issue 05

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Feature analysis; Discriminative confidence; Spoken term detection; Speech recognition; FEATURE-SELECTION; WORD; CLASSIFICATION; RECOGNITION; GRAPHEME; SYSTEM; PHONE;

DOI:

10.1016/j.csl.2013.09.008

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Discriminative confidence based on multi-layer perceptrons (MLPs) and multiple features has shown significant advantage compared to the widely used lattice-based confidence in spoken term detection (STD). Although the MLP-based framework can handle any features derived from a multitude of sources, choosing all possible features may lead to over complex models and hence less generality. In this paper, we design an extensive set of features and analyze their contribution to STD individually and as a group. The main goal is to choose a small set of features that are sufficiently informative while keeping the model simple and generalizable. We employ two established models to conduct the analysis: one is linear regression which targets for the most relevant features and the other is logistic linear regression which targets for the most discriminative features. We find the most informative features are comprised of those derived from diverse sources (ASR decoding, duration and lexical properties) and the two models deliver highly consistent feature ranks. STD experiments on both English and Spanish data demonstrate significant performance gains with the proposed feature sets. (C) 2013 Elsevier Ltd. All rights reserved.
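The abstract names the two ranking models but not how feature ranks are derived from them. As a rough illustration only, the sketch below assumes that features are ranked by the absolute value of their standardized coefficients, once under a linear regression fit and once under a logistic regression fit. The data are synthetic, and every name here (`standardize`, `fit_weights`, `rank_features`, the three toy features) is hypothetical rather than taken from the paper.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance so weights are comparable."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def fit_weights(rows, labels, logistic, steps=300, lr=0.1):
    """Batch gradient descent for a linear (squared loss) or logistic (log loss) model."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            if logistic:
                z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
                pred = 1.0 / (1.0 + math.exp(-z))
            else:
                pred = z
            err = pred - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w


def rank_features(weights):
    """Return feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(weights)), key=lambda j: -abs(weights[j]))


# Synthetic detections (assumption): label 1 = hit, 0 = false alarm.
# Feature 0 is strongly informative, feature 1 weakly, feature 2 is pure noise.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(400)]
raw = [[2.0 * y + random.gauss(0, 1),
        0.5 * y + random.gauss(0, 1),
        random.gauss(0, 1)] for y in labels]
features = standardize(raw)

linear_rank = rank_features(fit_weights(features, labels, logistic=False))
logistic_rank = rank_features(fit_weights(features, labels, logistic=True))
print("linear ranking:  ", linear_rank)
print("logistic ranking:", logistic_rank)
```

Standardizing the columns first is what makes the coefficient magnitudes comparable across features; without it the ranking would mostly reflect feature scale. Comparing the two printed rankings mirrors, in miniature, the consistency check between models that the abstract reports.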


Pages: 1083-1114

Page count: 32

