A Hybrid Method for Extracting Deep Web Information

Cited by: 0

Authors:

Zhang, Yuanpeng ^{[1]}

Wang, Li ^{[1]}

Jiang, Kui ^{[1]}

Qian, Danmin ^{[1]}

Dong, Jiancheng ^{[1]}

Affiliations:

[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124

Keywords:

information extraction; clinic expert information; domain model; block importance model; SVM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Previous work suggests that more than 60% of the information available on the Web resides in Deep Web databases. Such information cannot be directly indexed by search engines. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms and identifies whether a form is a WQI. The block importance model filters noisy information in response pages. Each of the two models is compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.


Pages: 777-782

Page count: 6
