Classification of general audio data for content-based retrieval

Times cited: 136
Authors
Li, DG
Sethi, IK [1]
Dimitrova, N
McGee, T
Affiliations
[1] Oakland Univ, Dept Comp Sci & Engn, Intelligent Informat Engn Lab, Rochester, MI 48309 USA
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[3] Philips Res, Image Proc & Network Architecture Dept, Briarcliff Manor, NY 10510 USA
Keywords
audio classification; audio segmentation; content-based retrieval; mel-frequency cepstral coefficients; pooling
DOI
10.1016/S0167-8655(00)00119-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy than temporal and spectral features. To minimize the classification errors near the boundaries of audio segments of different types in general audio data, a segmentation-pooling scheme is also proposed in this work. This scheme yields classification results that are consistent with human perception. Our classification system provides over 90% accuracy at a processing speed dozens of times faster than the playback rate. © 2001 Elsevier Science B.V. All rights reserved.
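The abstract names a segmentation-pooling scheme for reducing errors near segment boundaries but does not spell out how pooling works. The following is a minimal sketch of one plausible reading, under stated assumptions: a separate segmentation step has already produced segment boundaries (as clip indices), a classifier has already labeled each short clip with one of the seven categories, and every clip in a segment is then relabeled with the segment's majority class. The function `pool_labels`, the integer class encoding, and the majority-vote rule are hypothetical illustrations, not the authors' published algorithm.

```python
from collections import Counter
from typing import List, Sequence

# The seven categories listed in the abstract; the integer index of each
# name is an arbitrary encoding chosen for this sketch (an assumption).
CLASSES = [
    "silence",
    "single speaker speech",
    "music",
    "environmental noise",
    "multiple speakers' speech",
    "simultaneous speech and music",
    "speech and noise",
]


def pool_labels(clip_labels: Sequence[int], boundaries: Sequence[int]) -> List[int]:
    """Hypothetical pooling step: relabel each segment with its majority class.

    clip_labels: per-clip class indices from some upstream classifier (assumed).
    boundaries: clip indices at which a new segment starts, assumed to come
        from a separate segmentation step; index 0 is always a segment start.
    Returns one pooled class index per clip.
    """
    n = len(clip_labels)
    if n == 0:
        return []
    # Keep only valid, distinct interior boundaries, in ascending order.
    starts = [0] + sorted({b for b in boundaries if 0 < b < n})
    ends = starts[1:] + [n]
    pooled: List[int] = []
    for start, end in zip(starts, ends):
        segment = clip_labels[start:end]
        # Majority class in the segment; on a tie, Counter.most_common keeps
        # the label that was encountered first.
        winner, _ = Counter(segment).most_common(1)[0]
        pooled.extend([winner] * (end - start))
    return pooled


if __name__ == "__main__":
    # Toy input: a music segment with one stray "silence" clip, followed by
    # a single-speaker speech segment with one stray "music" clip.
    labels = [2, 2, 0, 2, 2, 1, 1, 2, 1]
    print(pool_labels(labels, boundaries=[5]))  # [2, 2, 2, 2, 2, 1, 1, 1, 1]
```

In this sketch, isolated misclassified clips are overridden by the majority label of their segment, so the quality of the pooled output depends heavily on where the assumed segmentation step places its boundaries. Majority voting is simply the easiest pooling rule to state; a weighted or confidence-based vote would be an equally plausible alternative.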
Pages: 533-544
Page count: 12