Towards understanding the functions of Web element

Cited by: 0
Authors
Yin, XY [1]
Lee, WS
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore
[2] Natl Univ Singapore, MIT Alliance, Singapore 117543, Singapore
Source
INFORMATION RETRIEVAL TECHNOLOGY | 2005 / Vol. 3411
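Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract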
A web page is a collection of basic elements, and each element plays a different role in the page. For example, an image element can be part of the main content, an advertisement, or the site banner. This paper describes ongoing work that uses a machine learning approach to classify each element in a web page into one of six functional categories: Content (C), Related Link (R), Navigation (N), Advertisement (A), Form (F) and Other (O). This allows only certain categories of content in a web page to be extracted and delivered to a mobile device to fit the user's specific needs, or to facilitate web information processes such as web mining or mobile search. We manually labeled 18,864 elements from 150 websites. For each element we extracted both local features (such as text length, URL, and tag name) and global features (such as text match with other elements) to construct a feature vector. We trained the decision tree learning algorithm J48 on a training set of 10,650 elements; it achieved 82% accuracy under stratified cross-validation and an average F value of 0.78 across the six categories. Testing on 3,043 elements from pages not included in the training set gives an accuracy of 58%. Although this is not satisfactory overall, the F value for the Content category reaches 0.795, indicating that the method could be useful for less demanding applications. We are working on improving the results in order to make automatic functional classification of web elements feasible and to provide new opportunities to push the state of the art in the mobile internet and mobile search.
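As a rough illustration of the pipeline the abstract describes, the sketch below builds a small feature vector from local and global features and computes a per-category F value. It is not the authors' implementation. The `Element` class, the feature names, the use of word-set Jaccard overlap as the "text match" feature, and the reading of "F value" as the balanced F-measure (harmonic mean of precision and recall) are all assumptions made here for illustration. The classifier itself (J48) is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List

# The six functional categories named in the abstract.
CATEGORIES = ["C", "R", "N", "A", "F", "O"]


@dataclass
class Element:
    """Hypothetical stand-in for one labeled web page element."""
    tag: str
    text: str
    url: str = ""


def local_features(el: Element) -> Dict[str, object]:
    # Local features depend only on the element itself
    # (tag name, text length, presence of a URL).
    return {
        "tag": el.tag.lower(),
        "text_length": len(el.text),
        "has_url": int(bool(el.url)),
    }


def global_features(el: Element, others: List[Element]) -> Dict[str, float]:
    # Assumed global feature: best word-set Jaccard overlap between this
    # element's text and the text of any other element on the page.
    words = set(el.text.lower().split())
    best = 0.0
    for other in others:
        other_words = set(other.text.lower().split())
        union = words | other_words
        if union:
            best = max(best, len(words & other_words) / len(union))
    return {"max_text_overlap": best}


def feature_vector(index: int, elements: List[Element]) -> Dict[str, object]:
    # Combine local and global features for the element at `index`.
    el = elements[index]
    others = elements[:index] + elements[index + 1:]
    return {**local_features(el), **global_features(el, others)}


def f_value(gold: List[str], predicted: List[str], category: str) -> float:
    # Balanced F-measure for one category; returns 0.0 when the
    # category has no true positives.
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f(gold: List[str], predicted: List[str]) -> float:
    # Unweighted mean of the per-category F values over all six categories.
    return sum(f_value(gold, predicted, c) for c in CATEGORIES) / len(CATEGORIES)
```

The resulting dictionaries could be passed to any decision tree learner. Averaging the per-category F values without weighting is one plausible reading of the "average F value" reported above, not a confirmed detail of the paper.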
Pages: 313 - 324
Number of pages: 12