Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 6

Funding:

U.S. National Science Foundation

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
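As a rough illustration of the single-Gaussian route named in the abstract, the sketch below summarizes each clip by one Gaussian over its frames, compares clips with a symmetrized Kullback-Leibler divergence, and turns the distances into a kernel matrix of the kind an SVM could consume. Several things here are assumptions rather than details from the abstract: the diagonal covariance (chosen to keep the example dependency-free), the `exp(-gamma * D)` kernel form, the `gamma` value, and the synthetic frames that stand in for real MFCCs. All function names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Summarize a clip (list of feature vectors) by per-dimension mean and variance.

    Assumption: diagonal covariance, for simplicity.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps variances strictly positive.
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6 for d in range(dim)]
    return mean, var


def kl_diag(p, q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized KL divergence, usable as a clip-to-clip distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def kernel_matrix(models, gamma=0.1):
    """Assumed kernel form: K[i][j] = exp(-gamma * D(i, j))."""
    return [[math.exp(-gamma * symmetric_kl(a, b)) for b in models] for a in models]


def synthetic_clip(rng, center, n_frames=200, dim=4):
    """Stand-in for MFCC frames: Gaussian noise around a given center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
clips = [synthetic_clip(rng, 0.0), synthetic_clip(rng, 0.2), synthetic_clip(rng, 3.0)]
models = [fit_diag_gaussian(c) for c in clips]
K = kernel_matrix(models)
```

In this toy setup the first two clips are drawn from nearby distributions, so `K[0][1]` comes out larger than `K[0][2]`. A matrix like `K` could then be handed to any SVM implementation that accepts a precomputed kernel.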

Citation

Pages: 1406-1416

Page count: 11

50 entries in total

[41] HTS-AT: A HIERARCHICAL TOKEN-SEMANTIC AUDIO TRANSFORMER FOR SOUND CLASSIFICATION AND DETECTION
Chen, Ke
Du, Xingjian
Zhu, Bilei
Ma, Zejun
Berg-Kirkpatrick, Taylor
Dubnov, Shlomo
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 646 - 650
[42] Audio-visual event detection based on mining of semantic audio-visual labels
Goh, KS
Miyahara, K
Radhakrishan, R
Xiong, ZY
Divakaran, A
STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
[43] Incorporating concept ontology for hierarchical video classification, annotation, and visualization
Fan, Jianping
Luo, Hangzai
Gao, Yuli
Jain, Ramesh
IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (05) : 939 - 957
[44] Design and Implementation of an Audio Classification System Based on SVM
Wang Shuiping
Tang Zhenming
Li Shiqiang
CEIS 2011, 2011, 15
[45] Research on Music Emotion Classification Based on Lyrics and Audio
Shi, Wanglei
Feng, Shuang
PROCEEDINGS OF 2018 IEEE 3RD ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC 2018), 2018, : 1154 - 1159
[46] Spectrogram based multi-task audio classification
Zeng, Yuni
Mao, Hua
Peng, Dezhong
Yi, Zhang
MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3705 - 3722
[47] Audio feature extraction and classification based on wavelet transform
Xing, Feng
Zheng, Jiming
Wu, Yu
Li, Jing
PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: 50 YEARS' ACHIEVEMENTS, FUTURE DIRECTIONS AND SOCIAL IMPACTS, 2006, : 183 - 186
[48] Spectrogram based multi-task audio classification
Yuni Zeng
Hua Mao
Dezhong Peng
Zhang Yi
Multimedia Tools and Applications, 2019, 78 : 3705 - 3722
[49] CULTURAL STYLE BASED MUSIC CLASSIFICATION OF AUDIO SIGNALS
Liu, Yuxiang
Xiang, Qiaoliang
Wang, Ye
Cai, Lianhong
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 57 - +
[50] ZERO-SHOT AUDIO CLASSIFICATION WITH FACTORED LINEAR AND NONLINEAR ACOUSTIC-SEMANTIC PROJECTIONS
Xie, Huang
Rasanen, Okko
Virtanen, Tuomas
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 326 - 330
