A hybrid approach for web information extraction

Cited by: 1

Authors:

Xiao, Ji-Yi [1]

Zhu, Dao-Hui [1]

Zou, La-Mei [1]

Affiliations:

[1] Univ S China, Sch Comp Sci & Technol, Hengyang 421001, Peoples R China

Source:

PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7 | 2008

Keywords:

information extraction; hidden Markov model; maximum entropy; maximum entropy Markov model; generalized iterative scaling;

DOI:

10.1109/ICMLC.2008.4620654

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

This paper presents a new approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach not only overcomes the lower precision and recall of the hidden Markov model, but also makes full use of the various kinds of contextual information available on the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas the hidden Markov model trained by the Baum-Welch algorithm achieves an average precision of 68.630%. This suggests that the hybrid approach outperforms the hidden Markov model trained by the Baum-Welch algorithm in terms of precision.


Pages: 1560-1563

Page count: 4

50 items in total

[1] Combining Classification Algorithm with DOM Algorithm for Web Information Extraction - A Hybrid Approach
Bhavanasi, Venkat Ramana
Damodaram, A.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 591 - +
[2] TEG - a hybrid approach to information extraction
Feldman, R
Rosenfeld, B
Fresko, M
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (01) : 1 - 18
[3] An hybrid approach for legal information extraction
Poudyal, Prakash
Quaresma, Paulo
LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2012), 2012, 250 : 115 - +
[4] TEG—a hybrid approach to information extraction
Ronen Feldman
Benjamin Rosenfeld
Moshe Fresko
Knowledge and Information Systems, 2006, 9 : 1 - 18
[5] An approach of automatic web mail information extraction
Li, Yingrun
Shu, Hui
2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES: ITESS 2008, VOL 2, 2008, : 1113 - 1118
[6] Product-advisory on the web: An information extraction approach
Schmidt, Sebastian
Mandl, Stefan
Ludwig, Bernd
Stoyan, Herbert
PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007, : 633 - +
[7] Flexible Approach for Web Information Extraction Based on HTMLParser
Shan, Lin
Qun, Zhang
PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012, : 683 - 686
[8] Web-Based Information Extraction Technology
孙铁利
教巍巍
刘淑华
Journal of Donghua University (English Edition), 2007, (02) : 288 - 292
[9] Hybrid approach to extracting information from web-tables
Jung, Sung-won
Kang, Mi-young
Kwon, Hyuk-chul
COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 109 - +
[10] A HYBRID APPROACH FOR INFORMATION EXTRACTION FROM HIGH RESOLUTION SATELLITE IMAGERY
Singh, Pankaj Pratap
Garg, R. D.
INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2013, 13 (02)
