Efficient Machine Learning Technique for Web Page Classification

Cited by: 2
Authors
Markkandeyan, S. [1]
Devi, M. Indra [2]
Affiliations
[1] Ratnavel Subramaniam Coll Engn & Technol, Dept Informat Technol, Dindigul 624005, Tamil Nadu, India
[2] Kamaraj Coll Engn & Technol, Dept Comp Sci & Engn, Virudunagar, Tamil Nadu, India
Keywords
Web page classification; Feature selection; Attribute-selected classifier; Principal component analysis; Genetic search; Rank search; SELECTION;
DOI
10.1007/s13369-015-1844-1
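Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09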
Abstract
Web page classification plays a major role in information management and retrieval. Feature selection is important for classifying Web pages accurately: Web pages contain many features, and a larger number of features reduces classification accuracy. We propose a hybrid feature selection approach that is both efficient and effective for automatic Web page classification and that also helps Web search tools return relevant results in the relevant category. We conducted experiments with various feature selection methods for Web page classification and keyword search. These experiments showed that some features in the initial feature set (IFS) are irrelevant, redundant, and noisy; they consume more memory, increase computation time, and give poor predictive performance. Such features can be eliminated using evaluator methods such as principal component analysis and the consistency subset evaluator, together with search methods such as genetic search and rank search, which yields a smaller and more relevant set of features. We call this set the intermediate feature set (IMFS); further optimization of it gives more accurate results. Finally, an attribute-selected classifier, a machine learning meta-classifier, was applied to the IMFS to obtain the final feature set (FFS). On the WebKB (Faculty and Course) and ODP (Sports) benchmark datasets, accuracy increased up to 97% and computation time for all classifiers was reduced compared with the IFS. The proposed method yields better classification performance and reduces space requirements and search time in Web documents compared with existing methods.
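The staged reduction described in the abstract (IFS to IMFS to FFS) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a simple chi-square ranking for the evaluators and search methods named in the abstract, uses a small naive Bayes model as the base classifier, and runs on a four-document toy corpus invented purely for illustration. The names `chi2`, `rank_features`, and `SelectedNaiveBayes` are hypothetical. The one idea it tries to keep is that of an attribute-selected classifier, where feature selection is fitted on the training data before the classifier is trained.

```python
import math
from collections import Counter

def chi2(term, docs, labels, cls):
    """Chi-square statistic between presence of `term` and membership in `cls`."""
    n = len(docs)
    a = sum(term in doc and y == cls for doc, y in zip(docs, labels))      # present, in class
    b = sum(term in doc and y != cls for doc, y in zip(docs, labels))      # present, other class
    c = sum(term not in doc and y == cls for doc, y in zip(docs, labels))  # absent, in class
    d = n - a - b - c                                                      # absent, other class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def rank_features(docs, labels):
    """Rank the vocabulary by best per-class chi-square score (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {t: max(chi2(t, docs, labels, k) for k in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t)), scores

class SelectedNaiveBayes:
    """Hypothetical meta-classifier: select top-k features on training data, then fit naive Bayes."""
    def __init__(self, k):
        self.k = k

    def fit(self, docs, labels):
        ranked, _ = rank_features(docs, labels)
        self.features = set(ranked[:self.k])           # final feature set (FFS)
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc & self.features)
        return self

    def predict(self, doc):
        def log_posterior(c):
            total = sum(self.counts[c].values()) + len(self.features)  # Laplace smoothing
            lp = math.log(self.prior[c])
            for t in doc & self.features:
                lp += math.log((self.counts[c][t] + 1) / total)
            return lp
        return max(self.classes, key=log_posterior)

# Toy corpus (invented for illustration; not WebKB or ODP data).
train = [
    ("professor research publications home page", "faculty"),
    ("professor teaching research lab home", "faculty"),
    ("course syllabus homework lecture page", "course"),
    ("course lecture notes homework home", "course"),
]
docs = [set(text.split()) for text, _ in train]
labels = [label for _, label in train]

ifs = set().union(*docs)                             # initial feature set (IFS)
ranked, scores = rank_features(docs, labels)
imfs = [t for t in ranked if scores[t] > 0]          # intermediate set: drop uninformative terms
model = SelectedNaiveBayes(k=5).fit([doc & set(imfs) for doc in docs], labels)

print(len(ifs), len(imfs), len(model.features))
print(model.predict(set("professor research group page".split())))
```

In this toy run the term `page` appears equally often in both classes, so it receives a zero score and is dropped when forming the intermediate set; the second stage then keeps only the five highest-ranked terms. A closer reproduction of the paper's setup would presumably use the evaluators and search strategies the abstract names rather than this single ranking criterion.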
Pages: 3555-3566
Page count: 12
Source: Arabian Journal for Science and Engineering, 2015, 40: 3555-3566