A Hybrid Method for Extracting Deep Web Information

Cited by: 0
Authors
Zhang, Yuanpeng [1]
Wang, Li [1]
Jiang, Kui [1]
Qian, Danmin [1]
Dong, Jiancheng [1]
Affiliation
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Source
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Some previous works show that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method composed of a domain model and a block importance model is proposed to extract information from the Deep Web. The domain model is used to classify a form, that is, to identify whether it is a Web query interface (WQI). The block importance model is used to filter noisy information out of response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
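The abstract reports its results as precision and F1 measure, but this record does not define them. The following is a minimal sketch assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two). The function name `precision_recall_f1` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall_f1(true_positives: int,
                        false_positives: int,
                        false_negatives: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumes the standard definitions; a ratio with a zero denominator
    is reported as 0.0 rather than raising an error.
    """
    predicted = true_positives + false_positives   # items the method extracted
    actual = true_positives + false_negatives      # items it should have extracted
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 combines precision and recall, a filter that discards noisy blocks aggressively can raise precision while lowering recall, which is one reason a single combined score is commonly reported for this kind of extraction task.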
Pages: 777-782
Page count: 6