An Effective and Discriminative Feature Learning for URL based Web Page Classification

Cited by: 5
Authors
Rajalakshmi, R. [1 ]
Aravindan, Chandrabose [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] SSN Coll Engn, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Source
2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC) | 2018
Keywords
DOI
10.1109/SMC.2018.00240
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The ever-growing World Wide Web contains a large volume of web pages covering a wide variety of topics. Many applications, such as information filtering and focused crawling, demand large-scale topic classification of web pages. A URL-based approach is proposed for classifying web pages, which avoids downloading the page contents for classification. This paper proposes an automated way of learning a category-specific universal dictionary of discriminating URL features. With this automatically learnt dictionary, the feature vector dimensionality becomes independent of the training set, which overcomes the difficulty of handling large-scale data. The dictionary was constructed from the publicly available ODP dataset. The proposed approach was evaluated by applying the automatically learnt URL feature dictionaries to another dataset containing search results from Google. Experiments show macro-average precision, recall, and F1 values of 0.93, 0.85, and 0.88, respectively. The difference is not statistically significant when the universal dictionary is applied instead of a dataset-specific term dictionary.
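The idea of a fixed, category-specific URL feature dictionary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the stop list, the over-representation score, and the names `url_tokens`, `learn_dictionary`, and `featurize` are all assumptions made for illustration, and the four training URLs are invented. The point it shows is that once a dictionary is learnt, every URL maps to a vector whose length depends only on the dictionary size, not on the training set.

```python
import re
from collections import Counter

# Hypothetical stop list of tokens that carry no topical signal in a URL.
STOP_TOKENS = {"http", "https", "www", "com", "org", "net", "html", "htm", "index"}

def url_tokens(url):
    """Split a URL into lowercase alphabetic tokens, dropping stop tokens and very short ones."""
    tokens = re.findall(r"[a-z]+", url.lower())
    return [t for t in tokens if t not in STOP_TOKENS and len(t) > 2]

def learn_dictionary(labelled_urls, category, size):
    """Return the `size` tokens most over-represented in `category` (assumed scoring rule)."""
    inside, outside = Counter(), Counter()
    for url, label in labelled_urls:
        target = inside if label == category else outside
        target.update(set(url_tokens(url)))  # count each token once per URL
    # Assumed score: in-category count divided by smoothed out-of-category count.
    score = {t: inside[t] / (outside[t] + 1) for t in inside}
    ranked = sorted(score, key=lambda t: (-score[t], t))  # ties broken alphabetically
    return ranked[:size]

def featurize(url, dictionary):
    """Map a URL to a binary vector with one position per dictionary term."""
    present = set(url_tokens(url))
    return [1 if term in present else 0 for term in dictionary]

# Invented toy training data, for illustration only.
training = [
    ("http://www.example.com/sports/football/scores.html", "sports"),
    ("http://football-news.example.org/league/table", "sports"),
    ("http://www.example.com/health/diet/nutrition.html", "health"),
    ("http://clinic.example.net/health/doctors", "health"),
]

sports_dictionary = learn_dictionary(training, "sports", size=3)
vector = featurize("http://another-site.example/football/fixtures", sports_dictionary)
print(sports_dictionary, vector)
```

Because the vector length is fixed by `size`, the same dictionary can be reused on URLs from a different collection, which mirrors the evaluation setup described in the abstract (dictionary learnt on one dataset, applied to another).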
Pages: 1374 - 1379
Number of pages: 6
Related papers
50 items in total
  • [21] Deep and Discriminative Feature Learning for Fingerprint Classification
    Ge, Shishu
    Bai, Chaochao
    Liu, Yan
    Liu, Yonghong
    Zhao, Tong
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 1942 - 1946
  • [22] Preprocessing and Feature Preparation in Chinese Web Page Classification
    Huang, Weitong
    Xu, Luxiong
    Liu, Yanmin
    2009 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND TECHNOLOGY, VOL I, PROCEEDINGS, 2009, : 64 - +
  • [23] Joint-Feature (JFEAT) Web Page Classification
    Han, Lim Wern
    Alhashmi, Saadat M.
    BUSINESS TRANSFORMATION THROUGH INNOVATION AND KNOWLEDGE MANAGEMENT: AN ACADEMIC PERSPECTIVE, VOLS 1-2, 2010, : 819 - 828
  • [24] Feature selection with rough sets for web page classification
    An, AJ
    Huang, YH
    Huang, XJ
    Cercone, N
    TRANSACTIONS ON ROUGH SETS II: ROUGH SETS AND FUZZY SETS, 2004, 3135 : 1 - 13
  • [25] Classifier and feature set ensembles for web page classification
    Onan, Aytug
    JOURNAL OF INFORMATION SCIENCE, 2016, 42 (02) : 150 - 165
  • [26] Discriminative Dictionary Learning based on Supervised Feature Selection for Image Classification
    Feng, Shaokun
    Lu, Hongtao
    Long, Xianzhong
    2014 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2014), VOL 1, 2014, : 225 - 228
  • [27] Feature Extraction and Classification Phishing Websites Based on URL
    Aydin, Mustafa
    Baykal, Nazife
    2015 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY (CNS), 2015, : 769 - 770
  • [28] DLANet: A manifold-learning-based discriminative feature learning network for scene classification
    Feng, Ziyong Z
    Jin, Lianwen
    Tao, Dapeng
    Huang, Shuangping
    NEUROCOMPUTING, 2015, 157 : 11 - 21
  • [29] Web page classification based on SVM
    Xue, Weimin
    Bao, Hong
    Huang, Weitong
    Lu, Yuchang
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 6111 - +
  • [30] Effective Learning with Joint Discriminative and Representative Feature Selection
    Wang, Shupeng
    Zhang, Xiao-Yu
    Dang, Xianglei
    Li, Binbin
    Wang, Haiping
    COMPUTATIONAL SCIENCE - ICCS 2018, PT III, 2018, 10862 : 632 - 638