Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 06

Funding:

National Science Foundation (USA);

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
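One of the pipelines named in the abstract (single Gaussian modeling of a clip's frames, followed by a Kullback-Leibler distance) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the regularization term `reg`, the kernel form `exp(-gamma * D)`, and the value of `gamma` are all assumptions introduced here, and the input is assumed to be an already-computed array of MFCC frames per clip.

```python
import numpy as np

def fit_gaussian(frames, reg=1e-6):
    """Summarize a clip's (n_frames, n_dims) feature matrix by its mean and full covariance."""
    mu = frames.mean(axis=0)
    # Small diagonal term (an assumption) keeps the covariance invertible.
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    d = mu_p.shape[0]
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff
                  - d + logdet_q - logdet_p)

def symmetric_kl(g1, g2):
    """Symmetrized KL distance between two (mean, covariance) pairs."""
    return kl_gaussian(*g1, *g2) + kl_gaussian(*g2, *g1)

def kl_kernel(gaussians, gamma=0.01):
    """Pairwise kernel matrix exp(-gamma * D) from symmetrized KL distances (hypothetical gamma)."""
    n = len(gaussians)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(gaussians[i], gaussians[j])
    return np.exp(-gamma * dist)
```

A matrix built this way could in principle be passed to an SVM that accepts a precomputed kernel. Note that a kernel derived from KL distances is not guaranteed to be positive semidefinite, so this should be treated as a starting point under the stated assumptions.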

Pages: 1406-1416

Page count: 11

