A Hybrid Method for Extracting Deep Web Information

Cited by: 0
Authors
Zhang, Yuanpeng [1]
Wang, Li [1]
Jiang, Kui [1]
Qian, Danmin [1]
Dong, Jiancheng [1]
Affiliation
[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China
Source
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Previous studies show that more than 60% of the information available on the Web resides in Deep Web databases, where search engines cannot index it directly. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms and identifies whether a form is a Web query interface (WQI). The block importance model filters noisy information out of response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
Pages: 777-782
Page count: 6