A multimodal temporal panorama approach for moving vehicle detection, reconstruction and classification

Times cited: 0

Authors:

Wang, Tao [1]

Zhu, Zhigang [1]

Affiliation:

[1] CUNY, Grad Ctr, Dept Comp Sci, New York, NY 10016 USA

Source:

GROUND/AIR MULTISENSOR INTEROPERABILITY, INTEGRATION, AND NETWORKING FOR PERSISTENT ISR III | 2012 / Vol. 8389

Keywords:

Laser Doppler vibrometry; multimodal; panoramic imaging; audio-visual integration; RECOGNITION; VIDEO

DOI:

10.1117/12.918793

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Moving vehicle detection and classification using multimodal data is challenging: data collection, audio-visual alignment, data labeling and feature selection must all be handled in uncontrolled environments with occlusions, motion blur, varying image resolutions and perspective distortions. In this work, we propose an effective multimodal temporal panorama (MTP) approach for this task, using a novel long-range audio-visual sensing system. A new audio-visual vehicle (AVV) dataset for moving vehicle detection and classification is created, featuring automatic vehicle detection and audio-visual alignment, accurate vehicle extraction and reconstruction, and efficient data labeling. In particular, the vehicles' visual images are reconstructed once the vehicles are detected, in order to remove most of the occlusions, motion blur and perspective-view variations. Multimodal audio-visual features are extracted, including global geometric features (aspect ratios, profiles), local structure features (HOGs), as well as various audio features (MFCCs, etc.). Using SVMs with radial basis function (RBF) kernels, the effectiveness of integrating these multimodal features is thoroughly and systematically studied. The MTP concept is not necessarily limited to the visual, motion and audio modalities; it could also be applied to other sensing modalities that acquire data in the temporal domain.
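The abstract describes the multimodal temporal panorama only at a high level, so the following is a minimal sketch under our own assumptions, not the authors' implementation. It assumes that a visual panorama is built by sampling one fixed pixel column from every video frame and stacking those columns along time; that audio is summarized per video frame as mean squared amplitude, so both modalities share one time axis; that vehicles are detected as runs of time steps deviating from a per-pixel median background; and that fused feature vectors are compared with a radial basis function kernel, as an RBF-kernel SVM would do. All function names (`temporal_panorama`, `audio_energy_per_frame`, `detect_vehicles`, `rbf_kernel`), thresholds and the synthetic data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities


def temporal_panorama(frames: Sequence[Frame], col: int) -> List[List[float]]:
    """Sample the same pixel column `col` from every frame (hypothetical scheme).

    Entry t of the result is that column at time t, so the result is a 2-D
    image whose first axis is time; a vehicle crossing the column shows up
    as a run of consecutive time steps.
    """
    return [[row[col] for row in frame] for frame in frames]


def audio_energy_per_frame(samples: Sequence[float], samples_per_frame: int) -> List[float]:
    """Mean squared amplitude of each video-frame-length audio window, so the
    audio is indexed on the same time axis as the visual panorama."""
    n_frames = len(samples) // samples_per_frame
    return [
        sum(s * s for s in samples[t * samples_per_frame:(t + 1) * samples_per_frame])
        / samples_per_frame
        for t in range(n_frames)
    ]


def active_intervals(active: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive True flags into half-open (start, end) time intervals."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for t, flag in enumerate(active):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, len(active)))
    return intervals


def detect_vehicles(panorama: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Flag time steps whose column differs from a per-pixel median background
    by more than `threshold` (mean absolute difference), then group them."""
    if not panorama:
        return []
    n_pixels = len(panorama[0])
    background = [statistics.median([c[i] for c in panorama]) for i in range(n_pixels)]
    flags = [
        sum(abs(p - b) for p, b in zip(c, background)) / n_pixels > threshold
        for c in panorama
    ]
    return active_intervals(flags)


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2), the kernel of an RBF-kernel SVM."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Synthetic clip: 10 frames of 4x3 pixels with a bright "vehicle" crossing
    # column 1 during frames 3-5, and audio that is loud during the same frames.
    frames = [[[0.0] * 3 for _ in range(4)] for _ in range(10)]
    for t in range(3, 6):
        for r in range(4):
            frames[t][r][1] = 200.0
    print(detect_vehicles(temporal_panorama(frames, col=1), threshold=50.0))  # [(3, 6)]

    samples = [0.0] * 40
    for i in range(12, 24):
        samples[i] = 1.0
    energy = audio_energy_per_frame(samples, samples_per_frame=4)
    print(active_intervals([e > 0.5 for e in energy]))  # [(3, 6)]

    # Early fusion (assumed): concatenate visual and audio feature vectors.
    fused_a = [0.4, 1.2] + [0.9]
    fused_b = [0.5, 1.0] + [0.7]
    print(rbf_kernel(fused_a, fused_b, gamma=0.5))
```

Because every modality is reduced to a sequence indexed by the same time step, detections from one modality can be checked against another by comparing interval lists. The paper's actual reconstruction step and its geometric, HOG and MFCC features are not reproduced here.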

Pages: 12