Design and Implementation of Fast Spoken Foul Language Recognition with Different End-to-End Deep Neural Network Architectures

Cited by: 9
Authors
Ba Wazir, Abdulaziz Saleh [1]
Karim, Hezerul Abdul [1]
Abdullah, Mohd Haris Lye [1]
AlDahoul, Nouar [1]
Mansor, Sarina [1]
Fauzi, Mohammad Faizal Ahmad [1]
See, John [2]
Naim, Ahmad Syazwan [3]
Affiliations
[1] Multimedia Univ, Fac Engn, Cyberjaya 63100, Malaysia
[2] Multimedia Univ, Fac Comp & Informat, Cyberjaya 63100, Malaysia
[3] Telekom Malaysia Berhad, Unifi Content, IPTV Dev, Cyberjaya 63100, Malaysia
Keywords
foul language; speech recognition; censorship; deep learning; convolutional neural networks; recurrent neural networks; long short-term memory; CLASSIFICATION; SPEECH;
DOI
10.3390/s21030710
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Given the excessive foul language found in audio and video files and its detrimental effects on an individual's character and behaviour, content censorship is crucial for filtering profanities from material seen by young viewers, who have higher exposure to uncensored content. Manual detection and censorship have been implemented, but these methods proved tedious, and misidentifications of foul language inevitably occurred owing to human weariness and the reduced performance of the human visual system over long screening times. This paper therefore proposed an intelligent system for foul language censorship based on an automated and robust detection method using advanced deep Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) cells. Data on foul language were collected, annotated, augmented, and analysed for the development and evaluation of both CNN and RNN configurations. The results indicated the feasibility of the proposed systems, which identified a high volume of curse words with a False Negative Rate (FNR) of only 2.53% to 5.92%. The proposed system outperformed state-of-the-art pre-trained neural networks on the novel foul language dataset and reduced computational cost with minimal trainable parameters.
Pages: 1-18
Number of pages: 17