A Hybrid Method for Extracting Deep Web Information

Cited by: 0

Authors:

Zhang, Yuanpeng [1]

Wang, Li [1]

Jiang, Kui [1]

Qian, Danmin [1]

Dong, Jiancheng [1]

Affiliations:

[1] Nantong Univ, Sch Med, Dept Med Informat, Nantong 226001, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING | 2015 / Vol. 124

Keywords:

information extraction; clinic expert information; domain model; block importance model; SVM;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Some previous works show that more than 60% of the information available on the Web is located in Deep Web databases. Such information cannot be directly indexed by search engines. In this paper, a hybrid method, which is composed of a domain model and a block importance model, is proposed to extract information from the Deep Web. The domain model is used for classifying and identifying whether a form is a WQI. The block importance model is used for filtering noisy information in response pages. These two models are both compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
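The abstract reports its results as precision and F1 measure. As a reminder of how these metrics are usually defined, the following is a minimal sketch, not the authors' evaluation code. The function name and the sample labels are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 for binary labels (True = positive)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: True means "this form is a query interface".
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(precision, recall, f1)
```

A statement such as "a precision 6.44% higher" compares values of this kind computed for two methods on the same test set. The abstract does not say whether the differences are absolute percentage points or relative gains.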


Pages: 777-782

Page count: 6

References: 6 in total

[1] Bergman M., 2001, Journal of Electronic Publishing, V7, P3

[2] Burges C.J.C., 1998, A tutorial on Support Vector Machines for pattern recognition, Data Mining and Knowledge Discovery, V2(2), P121-167

[3] Cope Jared, 2003, P 14 AUSTRALASIAN DA, V17, P181

[4] Fayzrakhmanov Ruslan R., 2012, Current Trends in Web Engineering. Workshops, Doctoral Symposium, and Tutorials Held at ICWE 2011. Revised Selected Papers, P342, DOI 10.1007/978-3-642-27997-3_37

[5] Salton G.; Wong A.; Yang C.S., 1975, Vector-space model for automatic indexing, Communications of the ACM, V18(11), P613-620

[6] Yan Fu, 2007, 2007 3rd International Conference on Semantics, Knowledge and Grid, P450, DOI 10.1109/SKG.2007.106
