Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57
Authors
Lee, Keansub [1]
Ellis, Daniel P. W. [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 06
Funding
US National Science Foundation;
Keywords
Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;
DOI
10.1109/TASL.2009.2034776
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
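The single-Gaussian variant described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the clips are synthetic random frames standing in for real MFCCs, the Gaussians use diagonal covariance to keep the sketch dependency-free, and the kernel form exp(-gamma * D) with gamma = 0.1 is an assumed choice. All function names are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Summarize a clip (list of feature frames) as per-dimension mean and variance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Small floor on the variance avoids division by zero (assumed value).
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6 for d in range(dim)]
    return mean, var

def kl_diag(p, q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(p, q):
    """Symmetrized KL divergence, usable as a clip-to-clip distance."""
    return kl_diag(p, q) + kl_diag(q, p)

def kl_kernel(models, gamma=0.1):
    """Turn pairwise distances into a kernel matrix via exp(-gamma * D)."""
    return [[math.exp(-gamma * symmetric_kl(a, b)) for b in models] for a in models]

def synthetic_clip(rng, offset, n_frames=200, dim=13):
    """Stand-in for an MFCC sequence: random frames around a given offset."""
    return [[rng.gauss(offset, 1.0) for _ in range(dim)] for _ in range(n_frames)]

rng = random.Random(0)
# Clips 0 and 1 share a distribution; clip 2 is shifted away from both.
clips = [synthetic_clip(rng, 0.0), synthetic_clip(rng, 0.0), synthetic_clip(rng, 3.0)]
models = [fit_diag_gaussian(c) for c in clips]
K = kl_kernel(models)
```

A matrix like `K` could then be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per concept.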
Pages: 1406 / 1416
Page count: 11