A tutorial on text-independent speaker verification

Cited by: 445
Authors
Bimbot, F [1]
Bonastre, JF
Fredouille, C
Gravier, G
Magrin-Chagnolleau, I
Meignier, S
Merlin, T
Ortega-García, J
Petrovska-Delacrétaz, D
Reynolds, DA
Affiliations
[1] IRISA, INRIA, F-35042 Rennes, France
[2] CNRS, F-35042 Rennes, France
[3] Univ Avignon, LIA, F-84911 Avignon 9, France
[4] CNRS, Lab Dynam Langage, F-69369 Lyon 07, France
[5] Univ Politecn Madrid, ATVS, E-28040 Madrid, Spain
[6] Univ Fribourg, Dept Informat, DIVA Lab, CH-1700 Fribourg, Switzerland
[7] MIT, Lincoln Lab, Cambridge, MA 02420 USA
Keywords
speaker verification; text-independent; cepstral analysis; Gaussian mixture modeling
DOI
10.1155/S1110865704310024
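Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract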
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
Pages: 430-451
Page count: 22