Adaptation of deep learning auditory event recognition and detection in audio surveillance systems

Times Cited: 0
Authors
Alsubhi, Sara [1 ]
Alkabsani, Ahad [1 ]
Endargiri, Safiah [1 ]
Laabidi, Kaouther [1 ,2 ]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Jeddah 23445, Saudi Arabia
[2] Univ Tunis El Manar, Tunis 1068, Tunisia
Keywords
classification; automatic speech recognition; NLP; natural language processing; ANN; artificial neural network; deep learning; CNN; convolutional neural network; audio detection; speaker recognition; speech recognition; Mel spectrogram; CCTV; closed circuit television;
DOI
Not available
Chinese Library Classification
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
This paper addresses equipping computer systems with the ability to comprehend and act upon spoken natural-language audio input. We propose adding acoustic audio inputs to existing closed circuit television (CCTV) systems in order to compensate for incomplete data and make fuller use of current surveillance infrastructure. In the proposed model, we apply an isolated-word recognition technique to a dataset of 8000 audio recordings from different individuals, using two distinct neural networks. The algorithm provides event-based detection: unauthorised access is detected by automatically recognising each spoken input together with the identity of the speaker. The proposed algorithm achieved accuracy rates of 84.1% for speaker identification and 80.1% for spoken-input recognition, and it outperformed a support vector machine (SVM) based model.
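For illustration only, the sketch below shows one way such a pipeline could be set up, assuming log-Mel spectrogram features extracted with librosa and two small Keras CNN classifiers, one for the spoken word and one for the speaker identity. The layer sizes, class counts, sampling rate, and feature dimensions are hypothetical and are not taken from the paper.

# Illustrative sketch (not the authors' exact pipeline): log-Mel spectrogram
# features feeding two small CNN classifiers -- one for the spoken word,
# one for the speaker identity. All sizes below are assumed, not from the paper.
import numpy as np
import librosa
from tensorflow.keras import layers, Model

N_MELS, N_FRAMES = 64, 128      # assumed feature dimensions
N_WORDS, N_SPEAKERS = 10, 40    # assumed numbers of word and speaker classes

def mel_features(path, sr=16000):
    """Load an audio clip and convert it to a fixed-size log-Mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or truncate along the time axis so every clip has N_FRAMES frames.
    if log_mel.shape[1] < N_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, N_FRAMES - log_mel.shape[1])))
    return log_mel[:, :N_FRAMES, np.newaxis]  # shape: (N_MELS, N_FRAMES, 1)

def build_cnn(n_classes):
    """Small CNN classifier over the spectrogram 'image'."""
    inp = layers.Input(shape=(N_MELS, N_FRAMES, 1))
    x = layers.Conv2D(16, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

word_model = build_cnn(N_WORDS)        # recognises which word was spoken
speaker_model = build_cnn(N_SPEAKERS)  # recognises who spoke it

Keeping the two classifiers separate mirrors the paper's use of two distinct networks; a shared convolutional trunk with two output heads would be an alternative design choice.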
Pages: 241-247
Number of pages: 7
Related papers (20 in total)
  • [1] Afouras, T., Chung, J. S., Senior, A., Vinyals, O., Zisserman, A. Deep Audio-Visual Speech Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 8717-8727
  • [2] [Anonymous], 2005, AUTOMATIC SPEECH REC
  • [3] Beckert S, 2018, Columbia Stud Hist U, P1
  • [4] Bezoui M, 2019, IAES INT J ARTIF INT, V8, P7
  • [5] Bock S., 2012, Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR, Porto, Portugal, P49
  • [6] Chandrasekhar Vijay., 2011, ISMIR, V20, P801
  • [7] Cook A., 2002, ASSISTIVE TECHNOLOGI, V2nd
  • [8] Fan, L. Audio Example Recognition and Retrieval Based on Geometric Incremental Learning Support Vector Machine System. IEEE Access, 2020, 8: 78630-78638
  • [9] Gowrishankar B. S., 2016, 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), P140, DOI 10.1109/SCOPES.2016.7955698
  • [10] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., Kingsbury, B. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine, 2012, 29(06): 82-97