Machine learning based heterogeneous web advertisements detection using a diverse feature set

Cited by: 4

Authors:

Nengroo, Ab Shaqoor [1]

Kuppusamy, K. S. [1]

Affiliations:

[1] Pondicherry Univ, Sch Engn & Technol, Dept Comp Sci, Pondicherry 605014, India

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018 / Vol. 89

Keywords:

Advertisements; Web accessibility; Content extraction; Random forest; Machine learning; Extraction

DOI:

10.1016/j.future.2018.06.028

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Advertisement identification and filtering in web pages have gained significance due to factors such as accessibility, security, privacy, and obtrusiveness. Current practice relies on maintaining URL-based regular expressions called filter lists: each URL found on a web page is matched against the filter list. Although effective, this procedure scales poorly because the filter list requires regular maintenance. To counter these limitations, we devise a machine learning based advertisement detection system that uses a diverse feature set to distinguish advertisement blocks from non-advertisement blocks. The method can serve as a base for accessibility-related features such as smooth browsing and text summarization for persons with visual impairments, cognitive impairments, or photosensitive epilepsy. A classifier trained on the proposed feature set achieves 98.6% accuracy in identifying advertisements. (C) 2018 Elsevier B.V. All rights reserved.
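
The filter-list baseline that the abstract contrasts with the proposed system can be illustrated with a minimal sketch. The patterns, URLs, and function names below are hypothetical and chosen only for illustration; they are not taken from the paper, and real filter lists use their own rule syntax rather than raw regular expressions.

```python
import re

# Hypothetical filter-list entries (assumed for illustration only).
FILTER_LIST = [
    r"://ads\.",       # hosts whose name starts with "ads."
    r"/banners?/",     # paths containing /banner/ or /banners/
    r"[?&]ad_id=",     # query strings carrying an ad_id parameter
]

# Compile once so each URL check is a simple scan over the patterns.
COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in FILTER_LIST]


def is_ad_url(url: str) -> bool:
    """Return True if the URL matches any filter-list pattern."""
    return any(pattern.search(url) for pattern in COMPILED)


def filter_urls(urls):
    """Keep only the URLs that no filter-list pattern matches."""
    return [url for url in urls if not is_ad_url(url)]
```

Every new advertising host or URL scheme needs a new pattern in `FILTER_LIST`, which is the maintenance burden the abstract describes as the motivation for a learned classifier.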

Pages: 68-77

Page count: 10
