Efficient Machine Learning Technique for Web Page Classification

Times cited: 2
Authors
Markkandeyan, S. [1]
Devi, M. Indra [2]
Affiliations
[1] Ratnavel Subramaniam College of Engineering and Technology, Department of Information Technology, Dindigul 624005, Tamil Nadu, India
[2] Kamaraj College of Engineering and Technology, Department of Computer Science and Engineering, Virudunagar, Tamil Nadu, India
Keywords
Web page classification; Feature selection; Attribute-selected classifier; Principal component analysis; Genetic search; Rank search; SELECTION;
DOI
10.1007/s13369-015-1844-1
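Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]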
Discipline classification codes
07; 0710; 09
Abstract
Web page classification plays a major role in information management and retrieval tasks. Feature selection is an important step in classifying Web pages accurately: Web pages contain many features, and a larger number of features can reduce classification accuracy. We propose a hybrid feature selection approach that is both efficient and effective for automatic Web page classification and that also helps Web search tools return relevant results in the relevant category. We conducted experiments with various feature selection methods on the Web page classification and keyword search problems. These experiments showed that some features in the initial feature set (IFS) are irrelevant, redundant, or noisy; they consume more memory, increase computation time, and degrade predictive performance. Such features can be eliminated using evaluator methods such as principal component analysis and the consistency subset evaluator, combined with search methods such as genetic search and rank search, yielding a smaller and more relevant feature set. We call this the intermediate feature set (IMFS); further optimizing it gives more accurate results. Finally, the attribute-selected classifier, a machine learning meta-classifier, was applied to the IMFS to obtain the final feature set (FFS). On the WebKb (Faculty and Course) and ODP (Sports) benchmark datasets, accuracy increased to as much as 97%, and computation time for all classifiers was reduced compared with the IFS. The proposed method yields better classification performance, lower space requirements, and shorter search time for Web documents than existing methods.
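The rank-search step combined with a consistency evaluator, as named in the abstract, can be illustrated with a small sketch. This is not the authors' implementation and does not cover principal component analysis, genetic search, or the attribute-selected classifier. Assumptions made here: features are discrete (binary term presence), attributes are ranked by information gain (a choice made for this sketch), and a subset's consistency is taken as 1 minus the fraction of instances that share a feature pattern with a majority of a different class. The function names and the eight-page toy dataset are hypothetical and are not drawn from WebKb or ODP.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, j):
    """Information gain of discrete feature j with respect to the class."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    n = len(labels)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond


def consistency(rows, labels, subset):
    """1 - (inconsistent instances / N) for the features in `subset`.

    Instances with identical values on `subset` but a minority class label
    within that pattern are counted as inconsistent.
    """
    patterns = defaultdict(Counter)
    for row, y in zip(rows, labels):
        patterns[tuple(row[j] for j in subset)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in patterns.values())
    return 1.0 - inconsistent / len(labels)


def rank_search(rows, labels):
    """Rank features by information gain (assumed ranker), then score each
    top-k prefix by consistency and keep the smallest best-scoring prefix."""
    n_features = len(rows[0])
    ranked = sorted(range(n_features), key=lambda j: -info_gain(rows, labels, j))
    best_subset, best_score = [], -1.0
    for k in range(1, n_features + 1):
        subset = ranked[:k]
        score = consistency(rows, labels, subset)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: binary term-presence features for eight pages.
features = ["syllabus", "homework", "professor", "the"]
X = [
    [1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 1, 1],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
]
y = ["course"] * 4 + ["faculty"] * 4

subset, score = rank_search(X, y)
print([features[j] for j in subset], score)  # ['syllabus'] 1.0
```

In this toy example the term "the" appears on every page, so on its own it has zero information gain and a consistency of only 0.5. The search stops growing the subset once a single highly informative term already separates the two classes, which mirrors the abstract's point that dropping irrelevant or noisy features can shrink the feature set without losing predictive power.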
Pages: 3555-3566
Page count: 12