Voice-based gender identification in multimedia applications

Cited by: 50
Authors
Harb, H [1]
Chen, LM [1]
Affiliations
[1] Ecole Cent Lyon, Dept Math Informat, CNRS, LIRIS,FRE 2672, F-69134 Ecully, France
Keywords
content-based audio indexing; piecewise Gaussian modeling; mixture of Neural Networks
DOI
10.1007/s10844-005-0322-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the context of content-based multimedia indexing, gender identification based on the speech signal is an important task. In this paper, a set of acoustic and pitch features along with different classifiers is compared for the problem of gender identification. We show that the fusion of features and classifiers performs better than any individual classifier. Based on these conclusions, we built a system for gender identification in multimedia applications. The system uses a set of neural networks with acoustic and pitch-related features. A classification accuracy of 90% is obtained for 1-second segments, independently of the language and the channel of the speech. Practical considerations, such as the continuity of speech and the use of a mixture of experts instead of a single expert, are shown to improve the classification accuracy to 93%. When used on a subset of the Switchboard database, the classification accuracy reaches 98.5% for 5-second segments.
Pages: 179-198
Page count: 20