Transfer Detection of YOLO to Focus CNN's Attention on Nude Regions for Adult Content Detection

Cited by: 19
Authors
AlDahoul, Nouar [1]
Abdul Karim, Hezerul [1]
Lye Abdullah, Mohd Haris [1]
Ahmad Fauzi, Mohammad Faizal [1]
Ba Wazir, Abdulaziz Saleh [1]
Mansor, Sarina [1]
See, John [2]
Affiliations
[1] Multimedia Univ, Fac Engn, Cyberjaya 63100, Malaysia
[2] Multimedia Univ, Fac Comp & Informat, Cyberjaya 63100, Malaysia
Source
SYMMETRY-BASEL | 2021 / Vol. 13 / No. 01
Keywords
pornography detection; nudity detection; convolutional neural network; you only look once; feature extraction; visual attention; region of interest; RECOGNITION; MACHINE;
DOI
10.3390/sym13010026
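Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract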
Video pornography and nudity detection aims to classify people in videos as nude or normal for censorship purposes. Recent work has used a convolutional neural network (CNN) to extract features directly from whole frames and a support vector machine (SVM) to classify the extracted features into two categories. However, existing methods cannot detect small-scale pornographic or nude content in frames with diverse backgrounds. This limitation leads to a high false-negative rate (FNR), that is, nude frames misclassified as normal ones. To address it, this paper examines the limitation of existing convolutional-only approaches and focuses the visual attention of the CNN on the expected nude regions inside the frames to reduce the FNR. The You Only Look Once (YOLO) object detector was transferred to the pornography and nudity detection application to detect persons as regions of interest (ROIs), which were then passed to a CNN and an SVM for nude/normal classification. Several experiments compared the performance of various CNNs and classifiers on our proposed dataset. ResNet101 with random forest outperformed the other models, with an F1-score of 90.03% and an accuracy of 87.75%. Furthermore, an ablation study demonstrated the impact of adding YOLO before the CNN: YOLO-CNN outperformed CNN-only, raising accuracy from 85.5% to 89.5%. Additionally, a new benchmark dataset with challenging content, including various human sizes and backgrounds, is proposed.
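The detect-then-classify idea in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in, not the authors' code: `toy_detector` plays the role of the YOLO person detector, `mean_intensity` the role of the CNN feature extractor, and `toy_is_nude` the role of the SVM or random forest. The rule that a frame counts as nude when any ROI is classified nude, and as normal when no person is detected, is an assumption made for illustration. The `binary_metrics` helper uses the standard definitions of accuracy, F1-score, and FNR.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive pixel coordinates
Frame = List[List[float]]        # toy grayscale frame: rows of pixel values


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region of interest (ROI) described by `box` out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def classify_frame(
    frame: Frame,
    detect_persons: Callable[[Frame], List[Box]],
    extract_features: Callable[[Frame], float],
    is_nude: Callable[[float], bool],
) -> bool:
    """Classify each detected person ROI instead of the whole frame.

    Assumption: the frame is nude if any ROI is nude, and normal if the
    detector finds no person at all.
    """
    rois = [crop(frame, box) for box in detect_persons(frame)]
    return any(is_nude(extract_features(roi)) for roi in rois)


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, F1-score, and false-negative rate, with nude as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# --- Hypothetical stand-ins for the real detector, CNN, and classifier ---

def toy_detector(frame: Frame) -> List[Box]:
    """Pretend a person was found in the bottom-right 2x2 corner."""
    return [(2, 2, 4, 4)]


def mean_intensity(region: Frame) -> float:
    """Stand-in 'feature': the mean pixel value of the region."""
    values = [v for row in region for v in row]
    return sum(values) / len(values) if values else 0.0


def toy_is_nude(feature: float) -> bool:
    """Stand-in classifier: a fixed threshold on the single feature."""
    return feature > 100.0


# A small bright region on a dark background: the whole-frame feature is
# diluted by the background, while the ROI feature is not.
FRAME: Frame = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 200, 200],
    [0, 0, 200, 200],
]

if __name__ == "__main__":
    print("whole frame:", toy_is_nude(mean_intensity(FRAME)))
    print("ROI-based:  ", classify_frame(FRAME, toy_detector, mean_intensity, toy_is_nude))
    print(binary_metrics([True, True, False, False], [True, False, False, False]))
```

In this toy frame the whole-frame feature falls below the threshold while the ROI feature does not, which mirrors the small-scale-content problem the abstract attributes to whole-frame approaches.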
Pages: 1 / 26
Number of pages: 26