Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57
Authors
Lee, Keansub [1]
Ellis, Daniel P. W. [1]
Affiliation
[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 06
Funding
National Science Foundation (USA);
Keywords
Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;
DOI
10.1109/TASL.2009.2034776
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
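As an illustration of the simplest pipeline the abstract names (single Gaussian modeling of a clip's MFCC frames, compared with the Kullback-Leibler measure), here is a minimal sketch. It assumes the MFCC frames are already available as a NumPy array per clip. The function names, the covariance regularizer `reg`, the symmetrized form of the divergence, and the exponentiated-distance kernel with scale `gamma` are illustrative assumptions, not details taken from this record.

```python
import numpy as np

def fit_gaussian(frames, reg=1e-6):
    """Summarize a clip (n_frames x n_coeffs array) by its mean and full covariance."""
    mu = frames.mean(axis=0)
    # Small diagonal term (assumed) keeps the covariance invertible.
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def symmetric_kl(g0, g1):
    """Symmetrized KL: sum of the divergences in both directions."""
    return kl_gaussian(*g0, *g1) + kl_gaussian(*g1, *g0)

def kernel_matrix(gaussians, gamma=0.01):
    """Turn pairwise clip distances into a similarity matrix exp(-gamma * D)."""
    n = len(gaussians)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(gaussians[i], gaussians[j])
    return np.exp(-gamma * dist)

if __name__ == "__main__":
    # Synthetic stand-ins for per-clip MFCC frames (200 frames x 13 coefficients).
    rng = np.random.default_rng(0)
    clips = [rng.normal(loc=m, size=(200, 13)) for m in (0.0, 0.5, 2.0)]
    K = kernel_matrix([fit_gaussian(c) for c in clips])
    print(K.round(3))
```

A matrix of this kind could be passed to an SVM that accepts precomputed kernels, with one binary classifier trained per concept; whether a given distance yields a valid positive semidefinite kernel would need to be checked for the data at hand.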
Pages: 1406 - 1416
Page count: 11
Related papers
50 in total
  • [1] Audio-based context recognition
    Eronen, AJ
    Peltonen, VT
    Tuomi, JT
    Klapuri, AP
    Fagerlund, S
    Sorsa, T
    Lorho, G
    Huopaniemi, J
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): : 321 - 329
  • [2] Audio-Visual Atoms for Generic Video Concept Classification
    Jiang, Wei
    Cotton, Courtenay
    Chang, Shih-Fu
    Ellis, Dan
    Loui, Alexander C.
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2010, 6 (03)
  • [3] Semantic concept detection for video based on extreme learning machine
    Lu, Bo
    Wang, Guoren
    Yuan, Ye
    Han, Dong
    NEUROCOMPUTING, 2013, 102 : 176 - 183
  • [4] Audio-based description and structuring of videos
    Harb H.
    Chen L.
    International Journal on Digital Libraries, 2006, 6 (1) : 70 - 81
  • [5] Audio-based queries for video retrieval over Java enabled mobile devices
    Ahmad, I
    Cheikh, FA
    Kiranyaz, S
    Gabbouj, M
    MULTIMEDIA ON MOBILE DEVICES II, 2006, 6074
  • [6] Audio-Based Hate Speech Classification from Online Short-Form Videos
    Ibanez, Michael
    Sapinit, Ranz
    Reyes, Lloyd Antonie
    Hussien, Mohammed
    Imperial, Joseph Marvin
    Rodriguez, Ramon
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 72 - 77
  • [7] "Are You Playing a Shooter Again?!" Deep Representation Learning for Audio-Based Video Game Genre Recognition
    Amiriparian, Shahin
    Cummins, Nicholas
    Gerczuk, Maurice
    Pugachevskiy, Sergey
    Ottl, Sandra
    Schuller, Bjorn
    IEEE TRANSACTIONS ON GAMES, 2020, 12 (02) : 145 - 154
  • [8] EXPLORING AUDIO SEMANTIC CONCEPTS FOR EVENT-BASED VIDEO RETRIEVAL
    Wang, Yipei
    Rawat, Shourabh
    Metze, Florian
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] Audio-Video based Segmentation and Classification Using SVM
    Subashini, K.
    Palanivel, S.
    Ramaligam, V.
    2012 THIRD INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION & NETWORKING TECHNOLOGIES (ICCCNT), 2012,
  • [10] Tensor semantic model for an audio classification system
    Xing Ling
    Ma Qiang
    Zhu Min
    SCIENCE CHINA-INFORMATION SCIENCES, 2013, 56 (06) : 1 - 9