Efficient Machine Learning Technique for Web Page Classification

Cited by: 2

Authors:

Markkandeyan, S. [1]

Devi, M. Indra [2]

Affiliations:

[1] Ratnavel Subramaniam Coll Engn & Technol, Dept Informat Technol, Dindigul 624005, Tamil Nadu, India

[2] Kamaraj Coll Engn & Technol, Dept Comp Sci & Engn, Virudunagar, Tamil Nadu, India

Source:

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING | 2015 / Volume 40 / Issue 12

Keywords:

Web page classification; Feature selection; Attribute-selected classifier; Principal component analysis; Genetic search; Rank search; SELECTION;

DOI:

10.1007/s13369-015-1844-1

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Web page classification plays a major role in information management and retrieval. Feature selection is an important step toward accurate classification of Web pages. Web pages contain many features, and an excessive number of features reduces classification accuracy. We propose a hybrid feature selection approach that is both efficient and effective for automatic Web page classification and that also helps Web search tools return relevant results in the relevant category. We conducted experiments with various feature selection methods on Web page classification and keyword search problems. These experiments showed that some features in the initial feature set (IFS) are irrelevant, redundant, or noisy; they consume more memory, increase computation time, and degrade predictive performance. These features can be eliminated using evaluator methods such as principal component analysis and the consistency subset evaluator, together with search methods such as genetic search and rank search, leaving a minimal and more relevant set of features. We call this set the intermediate feature set (IMFS); further optimization of it gives more accurate results. Finally, an attribute-selected classifier, which belongs to the family of machine learning meta-classifiers, was applied to the IMFS to obtain the final feature set (FFS). On the WebKB (Faculty and Course) and ODP (Sports) benchmark datasets, accuracy increased to as much as 97% and computation time for all classifiers was reduced compared with the IFS. The proposed method yields better classification performance and reduces space requirements and search time over Web documents compared with existing methods.
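The IFS, IMFS, and FFS stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names principal component analysis, a consistency subset evaluator, genetic search, rank search, and an attribute-selected classifier, whereas this sketch swaps in simpler stand-ins (information-gain ranking for the first stage, greedy forward selection wrapped around a 1-nearest-neighbour classifier for the second). The toy data, the function names, and the cutoff `k=3` are all assumptions made for illustration.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def rank_filter(X, y, k):
    """Stage 1 stand-in (IFS -> IMFS): keep the k highest-ranked features."""
    n_features = len(X[0])
    scores = [info_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN (Hamming distance) on the given features."""
    correct = 0
    for i, row in enumerate(X):
        best = None  # (distance, label) of the nearest other page
        for m, other in enumerate(X):
            if m == i:
                continue
            d = sum(row[j] != other[j] for j in features)
            if best is None or d < best[0]:
                best = (d, y[m])
        correct += best[1] == y[i]
    return correct / len(X)

def forward_select(X, y, candidates):
    """Stage 2 stand-in (IMFS -> FFS): add features only while accuracy improves."""
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return sorted(selected), best_acc

# Hypothetical toy data: 8 pages, 5 binary term-presence features (the IFS).
# Feature 0 separates the classes, feature 1 duplicates it (redundant),
# feature 2 is constant (irrelevant), feature 3 is weak, feature 4 is noise.
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
]
y = ["course"] * 4 + ["faculty"] * 4

imfs = rank_filter(X, y, k=3)          # intermediate feature set
ffs, acc = forward_select(X, y, imfs)  # final feature set and its accuracy
print("IMFS:", imfs)
print("FFS:", ffs, "accuracy:", acc)
```

In this sketch the first stage discards the constant and noisy columns, and the second stage drops the redundant duplicate because adding it does not improve accuracy. That mirrors, in miniature, the reduction from IFS to IMFS to FFS that the abstract reports.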


Pages: 3555-3566

Page count: 12

