Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57
Authors
Lee, Keansub [1]
Ellis, Daniel P. W. [1]
Affiliation
[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 06
Funding
National Science Foundation (USA);
Keywords
Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;
DOI
10.1109/TASL.2009.2034776
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
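The simplest pipeline named in the abstract (one Gaussian per clip, a distance between Gaussians, then a kernel for an SVM) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the Gaussians use diagonal covariance to stay within the standard library, the clips are synthetic random frames rather than real MFCCs, the helper names are hypothetical, and the `exp(-gamma * D)` mapping from distance to kernel value is one common choice that the abstract does not specify.

```python
import math
import random


def fit_diag_gaussian(frames, floor=1e-6):
    """Summarize a clip (list of equal-length feature vectors) as (mean, variance) per dimension.

    Assumption: diagonal covariance, chosen only to keep the sketch dependency-free.
    """
    n = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, floor) for d in range(dims)]
    return mean, var


def kl_diag(p, q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        a / b + (m2 - m1) ** 2 / b - 1.0 + math.log(b / a)
        for m1, a, m2, b in zip(mp, vp, mq, vq)
    )


def symmetric_kl(p, q):
    """Symmetrized Kullback-Leibler divergence, so the result can serve as a distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def bhattacharyya(p, q):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (mp, vp), (mq, vq) = p, q
    total = 0.0
    for m1, a, m2, b in zip(mp, vp, mq, vq):
        v = 0.5 * (a + b)  # averaged variance
        total += 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(a * b))
    return total


def mahalanobis_sq(p, q, shared_var):
    """Squared Mahalanobis distance between clip means under a shared diagonal variance."""
    return sum((m1 - m2) ** 2 / v for m1, m2, v in zip(p[0], q[0], shared_var))


def synthetic_clip(rng, shift, n_frames=200, dims=4):
    """Stand-in for a clip's MFCC frames: Gaussian noise centered at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dims)] for _ in range(n_frames)]


rng = random.Random(0)
clips = [synthetic_clip(rng, s) for s in (0.0, 0.1, 3.0)]
models = [fit_diag_gaussian(c) for c in clips]

# Pairwise distance matrix, then an assumed exponential kernel for a kernel SVM.
gamma = 0.1  # hypothetical scale parameter
D = [[symmetric_kl(a, b) for b in models] for a in models]
K = [[math.exp(-gamma * d) for d in row] for row in D]
```

A matrix such as `K` could then be passed to any SVM implementation that accepts a precomputed kernel; swapping `symmetric_kl` for `bhattacharyya` or `mahalanobis_sq` gives the other two distance measures mentioned in the abstract.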
Pages: 1406 - 1416
Number of pages: 11
Related papers
50 in total
  • [41] HTS-AT: A HIERARCHICAL TOKEN-SEMANTIC AUDIO TRANSFORMER FOR SOUND CLASSIFICATION AND DETECTION
    Chen, Ke
    Du, Xingjian
    Zhu, Bilei
    Ma, Zejun
    Berg-Kirkpatrick, Taylor
    Dubnov, Shlomo
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 646 - 650
  • [42] Audio-visual event detection based on mining of semantic audio-visual labels
    Goh, KS
    Miyahara, K
    Radhakrishan, R
    Xiong, ZY
    Divakaran, A
    STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
  • [43] Incorporating concept ontology for hierarchical video classification, annotation, and visualization
    Fan, Jianping
    Luo, Hangzai
    Gao, Yuli
    Jain, Ramesh
    IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (05) : 939 - 957
  • [44] Design and Implementation of an Audio Classification System Based on SVM
    Wang Shuiping
    Tang Zhenming
    Li Shiqiang
    CEIS 2011, 2011, 15
  • [45] Research on Music Emotion Classification Based on Lyrics and Audio
    Shi, Wanglei
    Feng, Shuang
    PROCEEDINGS OF 2018 IEEE 3RD ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC 2018), 2018, : 1154 - 1159
  • [46] Spectrogram based multi-task audio classification
    Zeng, Yuni
    Mao, Hua
    Peng, Dezhong
    Yi, Zhang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3705 - 3722
  • [47] Audio feature extraction and classification based on wavelet transform
    Xing, Feng
    Zheng, Jiming
    Wu, Yu
    Li, Jing
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: 50 YEARS' ACHIEVEMENTS, FUTURE DIRECTIONS AND SOCIAL IMPACTS, 2006, : 183 - 186
  • [48] Spectrogram based multi-task audio classification
    Yuni Zeng
    Hua Mao
    Dezhong Peng
    Zhang Yi
    Multimedia Tools and Applications, 2019, 78 : 3705 - 3722
  • [49] CULTURAL STYLE BASED MUSIC CLASSIFICATION OF AUDIO SIGNALS
    Liu, Yuxiang
    Xiang, Qiaoliang
    Wang, Ye
    Cai, Lianhong
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 57 - +
  • [50] ZERO-SHOT AUDIO CLASSIFICATION WITH FACTORED LINEAR AND NONLINEAR ACOUSTIC-SEMANTIC PROJECTIONS
    Xie, Huang
    Rasanen, Okko
    Virtanen, Tuomas
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 326 - 330