A Hybrid Method for Extracting Deep Web Information

Cited by: 0
Authors
Zhang, Yuanpeng [1]
Wang, Li [1]
Jiang, Kui [1]
Qian, Danmin [1]
Dong, Jiancheng [1]
Affiliations
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Source
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Some previous works show that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method composed of a domain model and a block importance model is proposed to extract information from the Deep Web. The domain model is used to classify forms and identify whether a form is a WQI. The block importance model is used to filter noisy information in response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
Pages: 777 - 782
Number of pages: 6
Related papers
50 in total
  • [31] Extracting Personal Information from Conversations
    Tigunova, Anna
    WWW'20: COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2020, 2020, : 284 - 288
  • [32] Deep learning in extracting tropical cyclone intensity and wind radius information from satellite infrared images - A review
    Wang, Chong
    Li, Xiaofeng
    ATMOSPHERIC AND OCEANIC SCIENCE LETTERS, 2023, 16 (04)
  • [33] Extracting News Content with Visual Unit of Web Pages
    Zhu, Wenhao
    Dai, Song
    Song, Yang
    Lu, Zhiguo
    2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 211 - 215
  • [34] Extracting Contextualized Quantity Facts from Web Tables
    Ho, Vinh Thinh
    Pal, Koninika
    Razniewski, Simon
    Berberich, Klaus
    Weikum, Gerhard
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 4033 - 4042
  • [35] A Method of Web Information Extraction Based on Building Different Sub Trees
    Wang, Yuanlong
    Jiang, Hong
    Bing, Zhaohong
    Zhang, Li
    MANUFACTURING PROCESS AND EQUIPMENT, PTS 1-4, 2013, 694-697 : 2513 - +
  • [36] Extracting protein-protein interaction information from biomedical text with SVM
    Mitsumori, Tomohiro
    Murata, Masaki
    Fukuda, Yasushi
    Doi, Kouichi
    Doi, Hirohumi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (08) : 2464 - 2466
  • [37] Towards extracting semantic information from texts
    Trandabat, Diana
    13TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2011), 2012, : 199 - 206
  • [38] Extracting information from unknown protocols on CampusNet
    Yu, Zhuanghui
    Huang, Yongzhong
    Guo, Shaozhong
    Zhou, Bei
    Ren, Hua
    PROCEEDINGS OF THE 2007 1ST INTERNATIONAL SYMPOSIUM ON INFORMATION TECHNOLOGIES AND APPLICATIONS IN EDUCATION (ISITAE 2007), 2007, : 535 - +
  • [40] A METHOD FOR EXTRACTING VEGETATION INFORMATION OF URBAN UNDERLAYING SURFACE ORIENTED TO ECO-ENVIRONMENTAL QUALITY ASSESSMENT
    Zhang, Xiaoyuan
    Song, Yulun
    Wang, Shudong
    Zhang, Lifu
    Zhang, Xia
    2017 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2017, : 3479 - 3482