SimiLay: A Developing Web Page Layout Based Visual Similarity Search Engine

Cited by: 0
Authors
Bozkir, Ahmet Selman [1]
Sezer, Ebru Akcapinar [1]
Affiliations
[1] Hacettepe Univ, Comp Sci & Engn Dept, Ankara, Turkey
Source
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, MLDM 2014 | 2014, Vol. 8556
Keywords
Web page visual similarity; spatial pyramid match kernel; bag of words;
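DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract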
Web page visual similarity has been a trending topic over the last decade. Effective methods for measuring it are crucial for phishing detection and related problems. In this study, we aim to develop a search engine for web page visual similarity and propose a novel method for capturing and calculating the layout similarity of web pages. To achieve this, web page elements are classified and mapped with a novel technique. In addition, spatial pyramid matching, an extension of the well-known bag-of-features approach, is employed with a histogram intersection scheme to capture and measure both partial and whole-page layout similarity. Promising results demonstrate that the spatial pyramid matching kernel can be used in this field.
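As an illustration of the technique named in the abstract, the following is a minimal sketch of a spatial pyramid match kernel computed with histogram intersection. It is not the authors' implementation. It assumes a hypothetical page representation in which each element is a tuple `(x, y, label)`, with `x` and `y` normalized to [0, 1] and `label` an integer element class, and it uses the standard level weighting from the spatial pyramid matching literature. The function names and the `max_level` parameter are illustrative only.

```python
from collections import Counter


def cell_histograms(elements, level):
    """Count element labels per grid cell at one pyramid level.

    The page is split into 2**level cells per side; each key is
    (row, col, label) and each value is the number of matching elements.
    """
    cells = 2 ** level
    hist = Counter()
    for x, y, label in elements:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        col = min(int(x * cells), cells - 1)
        row = min(int(y * cells), cells - 1)
        hist[(row, col, label)] += 1
    return hist


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two histograms."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spatial_pyramid_match(page_a, page_b, max_level=2):
    """Weighted sum of per-level intersections; finer levels weigh more."""
    inter = [
        histogram_intersection(
            cell_histograms(page_a, level), cell_histograms(page_b, level)
        )
        for level in range(max_level + 1)
    ]
    score = inter[0] / 2 ** max_level
    for level in range(1, max_level + 1):
        score += inter[level] / 2 ** (max_level - level + 1)
    return score


# Two hypothetical layouts that share element classes but differ in
# where the element of class 1 is placed.
page_a = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.5, 0.8, 2)]
page_b = [(0.1, 0.1, 0), (0.1, 0.9, 1), (0.5, 0.8, 2)]

self_score = spatial_pyramid_match(page_a, page_a)
cross_score = spatial_pyramid_match(page_a, page_b)
```

Because the level weights sum to 1, a page matched against itself scores exactly its element count, so dividing by that count gives a similarity in [0, 1] for pages with the same number of elements. Coarse levels reward pages that share element classes anywhere on the page, while fine levels reward agreement in position, which is how the kernel can reflect both whole-page and partial layout similarity.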
Pages: 457-470
Page count: 14