Audio-visual event detection based on mining of semantic audio-visual labels

被引：0

作者：

Goh, KS ^{[1
]}

Miyahara, K ^{[1
]}

Radhakrishan, R ^{[1
]}

Xiong, ZY ^{[1
]}

Divakaran, A ^{[1
]}

机构：

[1] Mitsubishi Elect Res Labs, Cambridge, MA USA

来源：

STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004 | 2004年 / 5307卷

关键词：

commercial detection; unsupervised clustering; audio classification; motion activity;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Removing commercials from television programs is a much sought-after feature for a personal video recorder. In this paper, we employ an unsupervised clustering scheme (CM-Detect) to detect commercials in television programs. Each program is first divided into W-s-minute chunks, and we extract audio and visual features from each of these chunks. Next, we apply k-means clustering to assign each chunk with a commercial/program, label. In contrast to other methods, we do not make any assumptions regarding the program content. Thus, our method is highly content-adaptive and computationally inexpensive. Through empirical studies on various content; including American news; Japanese news, and sports programs, we demonstrate that our method is able to filter out most of the commercials without falsely removing the regular program.

引用

页码：292 / 299

页数：8

共 50 条

[1] Semantic Audio-Visual Navigation
Chen, Changan
Al-Halah, Ziad
Grauman, Kristen
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15511 - 15520
[2] Semantic and Relation Modulation for Audio-Visual Event Localization
Wang, Hao
Zha, Zheng-Jun
Li, Liang
Chen, Xuejin
Luo, Jiebo
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7711 - 7725
[3] An audio-visual distance for audio-visual speech vector quantization
Girin, L
Foucher, E
Feng, G
1998 IEEE SECOND WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 1998, : 523 - 528
[4] Catching audio-visual mice:: The extrapolation of audio-visual speed
Hofbauer, MM
Wuerger, SM
Meyer, GF
Röhrbein, F
Schill, K
Zetzsche, C
PERCEPTION, 2003, 32 : 96 - 96
[5] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
Tamura, Satoshi
Ishikawa, Masato
Hashiba, Takashi
Takeuchi, Shin'ichi
Hayamizu, Satoru
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2702 - +
[6] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[7] AUDIO-VISUAL EDUCATION
Brickman, William W.
SCHOOL AND SOCIETY, 1948, 67 (1739): : 320 - 326
[8] Audio-Visual Objects
Kubovy M.
Schutz M.
Review of Philosophy and Psychology, 2010, 1 (1) : 41 - 61
[9] Audio-Visual Segmentation
Zhou, Jinxing
Wang, Jianyuan
Zhang, Jiayi
Sun, Weixuan
Zhang, Jing
Birchfield, Stan
Guo, Dan
Kong, Lingpeng
Wang, Meng
Zhong, Yiran
COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 386 - 403
[10] AUDIO-VISUAL CLINICS
GRABER, TM
HANNETT, HA
AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 1963, 49 (07) : 538 - &

← 1 2 3 4 5 →