Towards understanding the functions of Web element

Authors
Yin, XY [1]
Lee, WS
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore
[2] Natl Univ Singapore, MIT Alliance, Singapore 117543, Singapore
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
A web page is a collection of basic elements, and each element plays a different role in the page. For example, an image element can be part of the main content, an advertisement, or the site banner. This paper describes ongoing work that uses a machine learning approach to classify each element in a web page into one of six functional categories: Content (C), Related Link (R), Navigation (N), Advertisement (A), Form (F) and Other (O). Such classification allows only certain categories of content in a web page to be extracted and delivered to a mobile device to fit the user's specific needs, and it can facilitate web information processing tasks such as web mining or mobile search. We manually labeled 18,864 elements from 150 websites. For each element we extracted both local features (such as the text length, URL, and tag name) and global features (such as the text match with other elements) to construct a feature vector. We trained the decision tree learning algorithm J48 on a training set of 10,650 elements; it achieved 82% accuracy under stratified cross-validation and an average F value of 0.78 over the six categories. Testing on 3,043 elements from pages not included in the training set gives an accuracy of 58%. Although this is not satisfactory overall, the F value for the Content category reaches 0.795, indicating that the method could be useful for less demanding applications. We are working on improving the results in order to make automatic functional classification of web elements feasible and to provide new opportunities to advance the state of the art in the mobile internet and mobile search.
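The accuracy and per-category F values quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that the F value is the usual harmonic mean of precision and recall and that the "average F value" is an unweighted mean over the six categories; the abstract does not state either point. The function names and the eight labels below are hypothetical illustration data, not taken from the paper.

```python
from collections import Counter

# The six functional categories named in the abstract.
CATEGORIES = ["C", "R", "N", "A", "F", "O"]


def per_category_f(gold, predicted, categories=CATEGORIES):
    """Return {category: F value}, with F = 2PR / (P + R) (assumed definition)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[g] += 1  # true category gains a false negative
    scores = {}
    for c in categories:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def accuracy(gold, predicted):
    """Fraction of elements whose predicted category equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical labels for eight elements (illustrative only, not from the paper).
gold = ["C", "C", "C", "N", "N", "A", "F", "O"]
pred = ["C", "C", "N", "N", "N", "A", "F", "C"]

f_by_category = per_category_f(gold, pred)
# Unweighted mean over categories (an assumption about the paper's averaging).
average_f = sum(f_by_category.values()) / len(f_by_category)
```

A category that never occurs and is never predicted scores 0 in this sketch, which pulls the unweighted average down; a weighted average would behave differently, and the abstract does not say which was used.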
Pages: 313 - 324
Number of pages: 12