Speech-Centric Information Processing: An Optimization-Oriented Approach

Cited by: 24
Authors
He, Xiaodong [1]
Deng, Li [1]
Affiliation
[1] Microsoft Res, Redmond, WA 98052 USA
Keywords
Joint optimization; speech recognition; speech-centric information processing (SCIP); spoken language translation (SLT); spoken language understanding (SLU); voice search; SPOKEN LANGUAGE TRANSLATION; STATISTICAL ESTIMATION; RECOGNITION; SEARCH; INEQUALITY; MODELS;
DOI
10.1109/JPROC.2012.2236631
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract
Automatic speech recognition (ASR) is a central and common component of voice-driven information processing systems in human language technology, including spoken language translation (SLT), spoken language understanding (SLU), voice search, spoken document retrieval, and so on. Interfacing ASR with its downstream text-based processing tasks of translation, understanding, and information retrieval (IR) creates both challenges and opportunities in optimal design of the combined, speech-enabled systems. We present an optimization-oriented statistical framework for the overall system design where the interactions between the subsystems in tandem are fully incorporated and where design consistency is established between the optimization objectives and the end-to-end system performance metrics. Techniques for optimizing such objectives in both the decoding and learning phases of the speech-centric information processing (SCIP) system design are described, in which the uncertainty in the speech recognition subsystem's outputs is fully considered and marginalized. This paper provides an overview of the past and current work in this area. Future challenges and new opportunities are also discussed and analyzed.
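The abstract's central decoding idea is to marginalize over recognition uncertainty rather than commit to the single best ASR transcript. The sketch below illustrates that idea under stated assumptions and is not the paper's implementation. It assumes a hypothetical N-best list of ASR hypotheses F with posteriors P(F|X) and a hypothetical table of downstream scores P(E|F) for candidate outputs E (for example, translations). It selects the E that maximizes the sum over F of P(E|F) P(F|X). All names and numbers are illustrative.

```python
from collections import defaultdict


def marginalized_decode(asr_nbest, downstream):
    """Pick the output E maximizing sum_F P(E|F) * P(F|X).

    asr_nbest:  dict mapping ASR hypothesis F -> P(F|X) (hypothetical N-best posteriors)
    downstream: dict mapping F -> {E: P(E|F)} (hypothetical downstream model scores)
    Returns the best E and the marginal score of every candidate E.
    """
    totals = defaultdict(float)
    for f, p_f in asr_nbest.items():
        for e, p_e_given_f in downstream.get(f, {}).items():
            # Accumulate each candidate's score, weighted by the ASR posterior
            totals[e] += p_e_given_f * p_f
    best = max(totals, key=totals.get)
    return best, dict(totals)


def one_best_decode(asr_nbest, downstream):
    """Baseline: commit to the top ASR hypothesis, then decode it alone."""
    top_f = max(asr_nbest, key=asr_nbest.get)
    scores = downstream[top_f]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, made-up numbers chosen so the two strategies disagree
    nbest = {"f1": 0.40, "f2": 0.35, "f3": 0.25}
    scores = {
        "f1": {"A": 0.6, "B": 0.4},
        "f2": {"A": 0.1, "B": 0.9},
        "f3": {"A": 0.2, "B": 0.8},
    }
    print(one_best_decode(nbest, scores))          # A
    print(marginalized_decode(nbest, scores)[0])   # B
```

In the toy data, the top transcript f1 alone favors A, but B receives most of the probability mass once the lower-ranked hypotheses are taken into account (0.675 versus 0.325). This is the kind of error that committing to the one-best transcript can introduce.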
Pages: 1116-1135
Page count: 20