Acoustic-Based Train Arrival Detection Using Convolutional Neural Networks With Attention

Cited by: 10
Authors
Tran, Van-Thuan [1]
Tsai, Wei-Ho [1]
Affiliations
[1] Natl Taipei Univ Technol, Dept Elect Engn, Taipei 10608, Taiwan
Keywords
Rail transportation; Feature extraction; Convolutional neural networks; Spectrogram; Sensors; Roads; Safety; Audio classification; attention mechanism; convolutional neural networks; feature aggregation; railway audible warning signals; railway safety; train arrival detection; VEHICLE DETECTION; CLASSIFICATION; SPEECH; SOUNDS;
DOI
10.1109/ACCESS.2022.3185224
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
At railroad crossings, audible warning signals such as train whistles and railway alarms warn road users to pay attention and give priority to approaching trains. However, road users may sometimes be unaware of these warning signals for various reasons, resulting in inappropriate cooperation or even collisions between railway vehicles and non-railway vehicles. This work studies deep learning-based approaches to developing systems for acoustic-based train arrival detection (A-TAD). First, we develop a novel audio dataset of train horns, railway alarms, railway noise, and other urban noises for A-TAD experiments. We then examine the efficiency of handcrafted acoustic features (i.e., MFCC and Mel-spectrogram) in building A-TAD's audio classifier, MSNet, which is based on two-dimensional convolutional neural networks (2D-CNN). Next, we propose applying the attention mechanism and using MFCC and spectrogram simultaneously to enhance classification accuracy, where the features are combined at the input level (with InCom-TADNet), the high-level feature level (with FCCom-TADNet), and the decision level (with DLCom-TADNet). Our experiments show the efficiency of MSNet and the attention mechanism: MSNet trained with a single feature outperforms the baseline models, and applying attention modules yields better accuracy. In addition, the combined use of MFCC and spectrogram significantly improves the system's accuracy and robustness. A-TAD systems can be used to extend the safety functions of railway crossing systems, private cars, and self-driving cars, and can be particularly useful for hearing-impaired road users.
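As a rough illustration of the decision-level combination mentioned in the abstract, the sketch below fuses the outputs of two hypothetical classifier branches (one fed MFCC, one fed spectrogram features). The abstract does not state the fusion rule used by DLCom-TADNet, so the weighted averaging of class probabilities, the function names, the class labels, and the example logits are all assumptions made for illustration.

```python
import math

# Hypothetical class labels, loosely based on the sound categories named in the abstract.
CLASSES = ["train_horn", "railway_alarm", "railway_noise", "urban_noise"]


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(mfcc_logits, spec_logits, mfcc_weight=0.5):
    """Weighted average of the two branches' class probabilities.

    Averaging is an assumed fusion rule, not necessarily the paper's.
    """
    p_mfcc = softmax(mfcc_logits)
    p_spec = softmax(spec_logits)
    fused = [
        mfcc_weight * a + (1.0 - mfcc_weight) * b
        for a, b in zip(p_mfcc, p_spec)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best], fused


# Made-up logits for one audio clip: the MFCC branch is confident,
# the spectrogram branch is nearly undecided.
mfcc_logits = [2.0, 0.5, 0.1, -1.0]
spec_logits = [0.2, 0.3, 0.1, 0.0]

label, probs = fuse_decisions(mfcc_logits, spec_logits)
print(label, [round(p, 3) for p in probs])
```

In this toy case the confident branch dominates the fused decision, which is one intuition for why combining two feature views might improve robustness. Input-level and feature-level combination would instead merge the features before or inside the network and are not shown here.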
Pages: 72120-72131
Number of pages: 12