Flexible Approach for Web Information Extraction Based on HTMLParser

Cited by: 0

Authors:

Shan, Lin [1]

Qun, Zhang [1]

Affiliations:

[1] Hubei Univ Technol, Sch Comp Sci, Wuhan, Peoples R China

Source:

PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI | 2012

Keywords:

information extraction; Web crawler; HTMLParser; filter; visitor; custom tags

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The Internet now presents users with a huge amount of information, so extracting information quickly and effectively from varied sources has become very important. Web information extraction is a key element not only of Web crawlers and search engines but also of many specialized services, such as competitive intelligence tools. This paper presents a flexible, high-performance approach to Web information extraction based on HTMLParser, a parsing library used mainly to transform or extract information from HTML pages. HTMLParser represents an HTML page with Node, AbstractNode, and Tag, and it extracts information in two main ways: filters and visitors. With HTMLParser, hyperlinks, email addresses, titles, and similar elements can be extracted conveniently. The paper also extends HTMLParser to extract custom tags from certain Web pages, broadening its application area. Experimental results confirm the feasibility of the approach.
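The filter and visitor ideas mentioned in the abstract can be illustrated with a minimal sketch. It does not use the Java HTMLParser library the paper is based on; it substitutes Python's standard-library `html.parser` module (whose class happens to share the name `HTMLParser`). The class `LinkTitleExtractor`, the helpers `extract` and `filter_links`, and the custom tag name `price` are all hypothetical names chosen for illustration, not taken from the paper.

```python
from html.parser import HTMLParser  # Python stdlib, not the Java library


class LinkTitleExtractor(HTMLParser):
    """Visitor-style extractor: the parser calls back for each tag it meets."""

    def __init__(self, custom_tag="price"):
        super().__init__()
        self.custom_tag = custom_tag  # hypothetical custom tag name
        self.links = []               # href values of <a> tags
        self.title = ""               # text inside <title>
        self.custom_values = []       # text inside each custom tag
        self._open = None             # tag whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("title", self.custom_tag):
            self._open = tag
            if tag == self.custom_tag:
                self.custom_values.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._open == "title":
            self.title += data
        elif self._open == self.custom_tag:
            self.custom_values[-1] += data

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def extract(html, custom_tag="price"):
    """Parse one HTML string and return the populated extractor."""
    parser = LinkTitleExtractor(custom_tag)
    parser.feed(html)
    parser.close()
    return parser


def filter_links(links, prefix):
    """Filter-style step: keep only the links that match a predicate."""
    return [link for link in links if link.startswith(prefix)]
```

Under these assumptions, `filter_links(extract(page).links, "mailto:")` would pick out email links, while `extract(page).custom_values` would hold the text of each custom tag found in `page`.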

Pages: 683-686

Page count: 4
