Classifying sensitive content in online advertisements with deep learning

Cited by: 3

Authors:

Austin, Daniel [1]

Sanzgiri, Ashutosh [2]

Sankaran, Kannan [3]

Woodard, Ryan [2]

Lissack, Amit [4]

Seljan, Samuel [2]

Affiliations:

[1] Nike, Beaverton, OR 97005 USA

[2] Xandr, Portland, OR 97205 USA

[3] Xandr, New York, NY 10010 USA

[4] Opentrons, New York, NY 11201 USA

Source:

International Journal of Data Science and Analytics | 2020 / Vol. 10 / No. 3

Keywords:

Online advertising; Advertising technology; Computer vision; Image classification; Deep learning; Convolutional neural networks

DOI:

10.1007/s41060-020-00212-6

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In online advertising, an important quality-control step is to audit advertising images ("creatives") before they appear on publishers' Web pages. This ensures that advertisements appear only on Web pages where the ad is appropriate. If a creative with sensitive content, such as gambling or pornography, is displayed on the wrong Web page, it can ruin the user's experience and the publisher's reputation, and it may have legal implications. To protect against this, humans must audit every creative before it is displayed through our ad exchange; this process is costly and time-consuming. To detect sensitive content, we process the creative image with a pre-trained deep convolutional neural network (Xception; Chollet, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017) and merge its features with the historical distribution of categories associated with the creative's landing page (the Web page that loads when the ad is clicked, which may itself contain sensitive content). This combined representation is then passed through a series of fully connected layers to predict the sensitive category. On a large fraction of creatives (61%), the trained model performs slightly better than humans (model accuracy 99.92%; human accuracy 99.88%) while making 3.5 times fewer mistakes in very sensitive categories. The main challenges were detecting, with high accuracy, creatives from 10 "very sensitive" categories defined by our Creative Audit team, and handling a highly imbalanced data set in which 95% of creatives have no sensitive category. This paper extends the work we described in Austin et al. (in: Proceedings of the 2018 IEEE International Conference on Data Science and Advanced Analytics (DSAA), DSAA'18, 2018). It demonstrates the successful use of deep learning in production to detect sensitive creatives while respecting constraints set by the business.
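The model described in the abstract (pre-trained CNN image features merged with the landing page's historical category distribution, followed by fully connected layers) can be illustrated as a forward pass. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes that "merging" means plain concatenation, that the landing-page input is a normalized histogram of previously observed category IDs, and that the output is one independent sigmoid probability per category. The helper names (`landing_page_histogram`, `dense`, `random_layer`, `predict`), the category count, the hidden width of 64, and the random weights are all hypothetical. The 2048-dimensional image vector matches the size of Xception's globally pooled features, but here it is filled with random stand-in values rather than real CNN output.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def landing_page_histogram(past_categories: Sequence[int], num_categories: int) -> Vector:
    """Hypothetical encoding: normalized histogram of sensitive-category IDs
    previously seen on a landing page (all zeros if the page has no history)."""
    counts = [0.0] * num_categories
    for category in past_categories:
        counts[category] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def dense(x: Vector, weights: Matrix, bias: Vector, relu: bool) -> Vector:
    """Fully connected layer y = W x + b, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, z) if relu else z)
    return out


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Randomly initialized layer; a trained model would load learned weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(image_features: Vector, page_hist: Vector, layers: List[Layer]) -> Vector:
    """Concatenate image features with the landing-page histogram (assumed
    fusion), apply the fully connected stack with ReLU on hidden layers, and
    return one sigmoid probability per sensitive category (assumed output)."""
    x = list(image_features) + list(page_hist)
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias, relu=i < len(layers) - 1)
    return [sigmoid(z) for z in x]


# Demo with hypothetical sizes and random stand-in inputs.
IMG_DIM = 2048       # size of Xception's global-average-pooled feature vector
NUM_CATEGORIES = 5   # hypothetical, kept small for the demo
HIDDEN = 64          # hypothetical hidden width

rng = random.Random(0)
image_features = [rng.random() for _ in range(IMG_DIM)]  # stand-in for CNN output
page_hist = landing_page_histogram([1, 1, 3], NUM_CATEGORIES)
layers = [
    random_layer(IMG_DIM + NUM_CATEGORIES, HIDDEN, rng),
    random_layer(HIDDEN, NUM_CATEGORIES, rng),
]
probs = predict(image_features, page_hist, layers)
print([round(p, 3) for p in probs])
```

Concatenation is the simplest fusion choice: it leaves it to the fully connected layers to learn how much weight to give the landing-page history relative to the image features. In a real system the weights would be trained on labeled creatives rather than randomly initialized.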

Pages: 265–276

Page count: 12

References (36 in total):

[1] Abadi M., 2016, Proceedings of OSDI'16: 12th USENIX Symposium on Operating Systems Design and Implementation, p. 265
[2] adperium, 2017, ADPERIUM APPNEXUS ST
[3] Andrews M., 2017, FILE NAME HASHING CR
[4] [Anonymous], 2018, MACHINE LEARNING YEA
[5] [Anonymous], 2017, arXiv:1706.06978 [stat.ML]
[6] Caruana R., 2001, Advances in Neural Information Processing Systems, vol. 13, p. 402
[7] Chen Junxuan, 2016, Proceedings of the 24th ACM International Conference on Multimedia, p. 811, DOI 10.1145/2964284
[8] Chollet F., 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
[9] Chollet F., 2015, Keras
[10] Clarifai, 2016, CLARIFAI NSFW
