Review of analysis methods for speech applications

Cited by: 5
Authors
O'Shaughnessy, Douglas [1]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Speech analysis; Speech recognition; Speaker verification; Speech coding; Hidden Markov models; Neural networks; RECOGNITION; REPRESENTATION; TUTORIAL; FEATURES; SIGNALS;
DOI
10.1016/j.specom.2023.05.008
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper reviews methods used to analyze speech signals for various applications such as automatic recognition of associated text and speaker identity, and coding. The survey focuses on the requirements of different applications, as diverse speech tasks have often used the same methods, despite having very different objectives. As relevant information in a speech signal is distributed highly non-uniformly, a variety of time and frequency analysis techniques is examined. The utility of methods is noted in terms of performance, using accuracy, complexity, cost, and latency as criteria.
Pages: 64-75
Page count: 12
Related papers
56 in total
[1]   Unsupervised Raw Waveform Representation Learning for ASR [J].
Agrawal, Purvi ;
Ganapathy, Sriram .
INTERSPEECH 2019, 2019, :3451-3455
[2]   Quasi Closed Phase Glottal Inverse Filtering Analysis With Weighted Linear Prediction [J].
Airaksinen, Manu ;
Raitio, Tuomo ;
Story, Brad ;
Alku, Paavo .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (03) :596-607
[3]   GLOTTAL WAVE ANALYSIS WITH PITCH SYNCHRONOUS ITERATIVE ADAPTIVE INVERSE FILTERING [J].
ALKU, P .
SPEECH COMMUNICATION, 1992, 11 (2-3) :109-118
[4]   Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering [J].
Alku, Paavo ;
Magi, Carlo ;
Yrttiaho, Santeri ;
Backstrom, Tom ;
Story, Brad .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (05) :3289-3305
[5]   [Anonymous], 1933, P 1 ALL UN C TECHN R
[6]   ADAPTIVE PREDICTIVE CODING OF SPEECH SIGNALS [J].
ATAL, BS ;
SCHROEDER, MR .
BELL SYSTEM TECHNICAL JOURNAL, 1970, 49 (08) :1973-+
[7]   PREDICTIVE CODING OF SPEECH SIGNALS AND SUBJECTIVE ERROR CRITERIA [J].
ATAL, BS ;
SCHROEDER, MR .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (03) :247-254
[8]   Frequency domain linear prediction for temporal features [J].
Athineos, M ;
Ellis, DPW .
ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, :261-266
[9]   Spectro-temporal analysis of speech signals using zero-time windowing and group delay function [J].
Bayya, Yegnanarayana ;
Gowda, Dhananjaya N. .
SPEECH COMMUNICATION, 2013, 55 (06) :782-795
[10]   A tutorial on text-independent speaker verification [J].
Bimbot, F ;
Bonastre, JF ;
Fredouille, C ;
Gravier, G ;
Magrin-Chagnolleau, I ;
Meignier, S ;
Merlin, T ;
Ortega-García, J ;
Petrovska-Delacrétaz, D ;
Reynolds, DA .
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (04) :430-451