Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 6

Funding:

U.S. National Science Foundation

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; retrieval; music; segmentation

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
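As a rough illustration of the single-Gaussian variant described in the abstract, the sketch below summarizes each clip by one Gaussian fitted to its frames, measures the distance between clips with a symmetrized Kullback-Leibler divergence, and turns the distances into a kernel matrix. Several things here are assumptions made for illustration and are not taken from the paper: the frames are synthetic random vectors standing in for real MFCCs, the covariance is diagonal to keep the arithmetic in plain Python, the kernel form `exp(-gamma * D)` and the value of `gamma` are illustrative choices, and all function names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Summarize a clip (a list of feature frames) by per-dimension mean and variance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive (assumption for numerical safety).
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6 for d in range(dim)]
    return mean, var


def kl_diag(p, q):
    """KL(p || q) between two diagonal-covariance Gaussians given as (mean, var)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized KL divergence, usable as a clip-to-clip distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def kernel_matrix(models, gamma):
    """Map pairwise distances D to similarities exp(-gamma * D)."""
    return [[math.exp(-gamma * symmetric_kl(a, b)) for b in models] for a in models]


def synthetic_clip(rng, center, n_frames=200, dim=4):
    """Stand-in for a clip's MFCC frames: random vectors around a common center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
# Two acoustically similar clips (centers 0.0 and 0.1) and one different clip (center 3.0).
clips = [synthetic_clip(rng, c) for c in (0.0, 0.1, 3.0)]
models = [fit_diag_gaussian(c) for c in clips]
K = kernel_matrix(models, gamma=0.1)
```

A matrix like `K` could then be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per semantic concept. In practice `gamma` would be tuned on held-out data rather than fixed.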


Pages: 1406-1416

Page count: 11
