Versatile Recognition Using Haar-Like Feature and Cascaded Classifier

Cited by: 10

Authors:

Nishimura, Jun [1]

Kuroda, Tadahiro [1]

Affiliations:

[1] Keio Univ, Dept Elect Engn, Yokohama, Kanagawa 2238522, Japan

Source:

IEEE SENSORS JOURNAL | 2010 / Vol. 10 / No. 05

Keywords:

Cascaded classifier; Haar-like feature; sensor networks; versatile recognition;

DOI:

10.1109/JSEN.2009.2038231

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

This paper describes a world-first versatile recognition algorithm suitable for processing images, sound, and acceleration signals simultaneously at extremely low calculation cost while maintaining high recognition rates. There are three main contributions. The first is the introduction of versatile recognition using Haar-like features for images, sound, and acceleration signals. Novel 1-D Haar-like features are proposed as very rough band-pass filters for signals in the temporal dimension. The second is a content-aware classifier based on the cascaded classifier and positive estimation. The cascaded classifier with positive estimation allows a sensor node to compute finely only when the inputs are target-like and difficult to recognize, and to stop computing once the inputs obtain enough confidence. The third is a method of intermediate signal representation called Integral Signals and Delta-Integral Signals, which reduces calculation cost in Haar-like feature based recognition. In this paper, the proposed recognition is tested on a variety of sound recognition applications such as speech/non-speech, gender, speaker, emotion, and environmental sound recognition. Preliminary results on human activity recognition and face detection are also given to show the versatility. The proposed algorithm yields sound recognition performance comparable to the conventional state-of-the-art method called MFCC while being 96%-99% more efficient in terms of the total number of add and multiply operations. The proposed algorithm is evaluated with a versatile recognition processor implemented in 90-nm CMOS technology [15]. For speech/non-speech classification on 8-kHz 8-bit sound, the power consumption per frame rate is 0.28 μW/fps. When the sensor is operated with a duty ratio of 1%, the power consumption is reduced to 28.5 μW.
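The abstract names three building blocks without giving their definitions: 1-D Haar-like features, an integral signal representation, and a cascaded classifier that stops early. The sketch below is only one plausible reading of those ideas, not the authors' implementation. It assumes the integral signal is a plain prefix sum, that a 1-D Haar-like feature is the difference between the sums of two adjacent equal-width windows, and that "positive estimation" can be approximated by an early-accept threshold. The names `integral_signal`, `window_sum`, `haar_1d`, and `cascade_classify`, and all threshold values, are hypothetical. Delta-Integral Signals are not covered because the abstract does not describe them.

```python
from itertools import accumulate


def integral_signal(x):
    """Prefix sums with a leading 0, so ii[i] == x[0] + ... + x[i-1]."""
    return [0] + list(accumulate(x))


def window_sum(ii, start, length):
    """Sum of x[start:start+length] using two lookups in the integral signal."""
    return ii[start + length] - ii[start]


def haar_1d(ii, start, width):
    """Assumed two-segment 1-D Haar-like feature.

    Returns (sum of the first `width` samples) minus
    (sum of the next `width` samples), starting at `start`.
    """
    return window_sum(ii, start, width) - window_sum(ii, start + width, width)


def cascade_classify(ii, stages):
    """Evaluate stages in order and stop as soon as the decision is clear.

    Each stage is a hypothetical tuple
    (start, width, reject_below, accept_above).
    Returns (is_target, number_of_stages_evaluated).
    """
    for n, (start, width, reject_below, accept_above) in enumerate(stages, 1):
        value = haar_1d(ii, start, width)
        if value < reject_below:
            return False, n   # clearly not target-like: stop early
        if value > accept_above:
            return True, n    # confident enough: stop early (assumed reading)
    # Survived every stage without an early decision: treat as a target.
    return True, len(stages)


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]
    ii = integral_signal(signal)
    print(haar_1d(ii, 0, 3))
    print(cascade_classify(ii, [(0, 3, -20, 0), (0, 1, -5, -2)]))
```

Under these assumptions, each feature costs a fixed number of additions regardless of window width, and easy inputs leave the cascade after the first stage or two, which is the kind of saving the abstract attributes to the proposed representation and classifier.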

Citation:

Pages: 942-951

Page count: 10

References (29 in total):

[1] [Anonymous], NOISEX 92

[2] BAO L, 2004, P IEEE PERS COMM

[3] Age and gender recognition for telephone applications based on GMM supervectors and support vector machines [J]. Bocklet, Tobias; Maier, Andreas; Bauer, Josef G.; Burkhardt, Felix; Noeth, Elmar. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, :1605-+

[4] Enhancing human face detection by resampling examples through manifolds [J]. Chen, Jie; Wang, Ruiping; Yan, Shengye; Shan, Shiguang; Chen, Xilin; Gao, Wen. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2007, 37 (06) :1017-1028

[5] Cui XY, 2007, 2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, P1263

[6] HANAI Y, 2009, P IEEE INT SOL STAT, P148

[7] HANAI Y, 2009, P IEEE DSP SPE WORKS, P675

[8] Analog floating-gate, on-chip auditory sensing system interfaces [J]. Hasler, P; Smith, PD; Graham, D; Ellis, R; Anderson, DV. IEEE SENSORS JOURNAL, 2005, 5 (05) :1027-1034

[9] Huang C, 2004, IEEE IMAGE PROC, P593

[10] Huynh T., 2005, P JOINT C SMART OBJ
