GENDER RECOGNITION FROM SPEECH .2. FINE ANALYSIS

Cited by: 164
Authors
CHILDERS, DG [1]
WU, K [1]
Affiliations
[1] ENTROPIC SPEECH INC,CUPERTINO,CA 95014
DOI
10.1121/1.401664
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In Part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper a detailed, fine analysis of the characteristics of vowels is performed, including formant frequencies, bandwidths, and amplitudes, as well as speaker fundamental frequency of voicing. The fine analysis used a pitch synchronous closed-phase analysis technique. Detailed formant features, including frequencies, bandwidths, and amplitudes, were extracted by a closed-phase weighted recursive least-squares method that employed a variable forgetting factor, i.e., WRLS-VFF. The electroglottograph signal was used to locate the closed-phase portion of the speech signal. A two-way statistical analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Numerous interesting results were obtained, including the fact that the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for male speakers. The results suggest that redundant gender information was embedded in the fundamental frequency and vocal tract resonance characteristics. The feature vectors for female voices were observed to have higher within-group variations than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
Pages: 1841-1856
Number of pages: 16