An Audio Data Representation for Traffic Acoustic Scene Recognition

Cited by: 8

Authors:

Jiang, Dazhi [1,2]

Huang, Dongmin [1]

Song, Youyi [3]

Wu, Kaichao [1]

Lu, Huakang [1]

Liu, Quanquan [1]

Zhou, Teng [1,2,3]

Affiliations:

[1] Shantou Univ, Coll Engn, Dept Comp Sci, Shantou 515063, Peoples R China

[2] Shantou Univ, Key Lab Intelligent Mfg Technol, Minist Educ, Shantou 515063, Peoples R China

[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8

Keywords:

Acoustics; Feature extraction; Spectrogram; Transforms; Histograms; Time-frequency analysis; Visualization; acoustic scene recognition; transportation; acoustic material; HEALTH;

DOI:

10.1109/ACCESS.2020.3027474

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Acoustic scene recognition (ASR), the task of recognizing an acoustic environment from an audio recording of the scene, has a wide range of applications, e.g., robotic navigation and audio forensics. However, ASR remains challenging, mainly because audio data are difficult to represent. In this article, we focus on traffic acoustic data. Traffic acoustic scene recognition provides information complementary to the visual information of a scene; for example, it can be used to verify the result of visual perception. Because acoustic analysis and recognition are simple and convenient, they can effectively enhance perception that would otherwise rely on visual information alone. We propose an audio data representation method to improve the accuracy of traffic acoustic scene recognition. The proposed method employs the constant Q transform (CQT) and the histogram of gradient (HOG) to transform one-dimensional audio signals into a time-frequency representation. We also propose two data representation mechanisms, called global and local feature selection, to select features that describe the shape of time-frequency structures. Finally, we exploit the least absolute shrinkage and selection operator (LASSO) technique to further improve recognition accuracy by selecting the most representative information for recognition. We conducted extensive experiments, and the results show that the proposed method is effective, significantly outperforming state-of-the-art methods.
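As a rough illustration of the gradient-histogram idea mentioned in the abstract, the sketch below computes a single magnitude-weighted histogram of gradient orientations over a small time-frequency array. It is not the authors' implementation: the function name `gradient_orientation_histogram`, the single global cell, the choice of 8 bins, and the L2 normalisation are all assumptions made for illustration, and the time-frequency input is assumed to have been produced elsewhere (for example by a CQT routine).

```python
import math


def gradient_orientation_histogram(tf, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    tf: 2-D list where tf[f][t] is the magnitude at frequency bin f and
        time frame t (hypothetical input, e.g. a CQT magnitude array).
    Returns n_bins values, L2-normalised (all zeros if tf is flat).
    """
    n_freq, n_time = len(tf), len(tf[0])
    hist = [0.0] * n_bins
    # Only interior points are visited so central differences are defined.
    for f in range(1, n_freq - 1):
        for t in range(1, n_time - 1):
            gx = (tf[f][t + 1] - tf[f][t - 1]) / 2.0  # change along time
            gy = (tf[f + 1][t] - tf[f - 1][t]) / 2.0  # change along frequency
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the orientation into [0, pi) so opposite directions match.
            angle = math.atan2(gy, gx) % math.pi
            idx = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[idx] += mag
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


# Usage: an array whose values grow only along the time axis.
ramp_in_time = [[float(t) for t in range(5)] for _ in range(4)]
print(gradient_orientation_histogram(ramp_in_time))
```

A local variant would apply the same function to sub-blocks of the array and concatenate the resulting histograms; the selection of the most useful histogram entries (for example with LASSO) is a separate step not shown here.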

References

Pages: 177863-177873

Number of pages: 11

38 references in total

[1] Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification [J].

Abidin, Shamsiah ;

Togneri, Roberto ;

Sohel, Ferdous .

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (11) :2112-2121

[2] How Pleasant Sounds Promote and Annoying Sounds Impede Health: A Cognitive Approach [J].

Andringa, Tjeerd C. ;

Lanser, J. Jolie L. .

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2013, 10 (04) :1439-1461

[3] [Anonymous], 2018, Tech. Rep.

[4] [Anonymous], 2013, Matrix information geometry

[5] Power of the spacing test for least-angle regression [J].

Azais, Jean-Marc ;

De Castro, Yohann ;

Mourareau, Stephane .

BERNOULLI, 2018, 24 (01) :465-492

[6] Acoustic Scene Classification [J].

Barchiesi, Daniele ;

Giannoulis, Dimitrios ;

Stowell, Dan ;

Plumbley, Mark D. .

IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (03) :16-34

[7] Bisot V, 2015, EUR SIGNAL PR CONF, P719, DOI 10.1109/EUSIPCO.2015.7362477

[8] CALCULATION OF A CONSTANT-Q SPECTRAL TRANSFORM [J].

BROWN, JC .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1991, 89 (01) :425-434

[9] AN EFFICIENT ALGORITHM FOR THE CALCULATION OF A CONSTANT-Q TRANSFORM [J].

BROWN, JC ;

PUCKETTE, MS .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1992, 92 (05) :2698-2701

[10] A noise-immune LSTM network for short-term traffic flow forecasting [J].

Cai, Lingru ;

Lei, Mingqin ;

Zhang, Shuangyi ;

Yu, Yidan ;

Zhou, Teng ;

Qin, Jing .

CHAOS, 2020, 30 (02)
