A hybrid approach for web information extraction

Cited by: 1
Authors
Xiao, Ji-Yi [1 ]
Zhu, Dao-Hui [1 ]
Zou, La-Mei [1 ]
Affiliation
[1] Univ S China, Sch Comp Sci & Technol, Hengyang 421001, Peoples R China
Source
PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7 | 2008
Keywords
information extraction; hidden Markov model; maximum entropy; maximum entropy Markov model; generalized iterative scaling;
DOI
10.1109/ICMLC.2008.4620654
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper presents a new approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach not only overcomes the low precision and recall of the hidden Markov model but also makes the most of the various kinds of contextual information available on the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas the hidden Markov model trained with the Baum-Welch algorithm achieves an average precision of 68.630%. This suggests that the hybrid approach outperforms the hidden Markov model trained with the Baum-Welch algorithm.
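The abstract compares systems by precision and recall. As a minimal sketch of how these two measures are commonly computed for extracted fields, the following assumes that both the system output and the reference annotation are sets of (field, value) pairs. The function name `precision_recall` and the sample data are hypothetical and do not come from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # items that are both extracted and correct
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (field, value) pairs for illustration only.
gold = {("title", "A"), ("author", "B"), ("year", "2008"), ("venue", "C")}
extracted = {("title", "A"), ("author", "B"), ("year", "2007")}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two of the three extracted pairs are correct, and two of the four reference pairs are recovered.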
Pages: 1560-1563
Number of pages: 4
Related papers
50 in total
  • [1] Combining Classification Algorithm with DOM Algorithm for Web Information Extraction - A Hybrid Approach
    Bhavanasi, Venkat Ramana
    Damodaram, A.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 591 - +
  • [2] TEG - a hybrid approach to information extraction
    Feldman, R
    Rosenfeld, B
    Fresko, M
    KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (01) : 1 - 18
  • [3] An hybrid approach for legal information extraction
    Poudyal, Prakash
    Quaresma, Paulo
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2012), 2012, 250 : 115 - +
  • [4] TEG—a hybrid approach to information extraction
    Ronen Feldman
    Benjamin Rosenfeld
    Moshe Fresko
    Knowledge and Information Systems, 2006, 9 : 1 - 18
  • [5] An approach of automatic web mail information extraction
    Li, Yingrun
    Shu, Hui
    2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES: ITESS 2008, VOL 2, 2008, : 1113 - 1118
  • [6] Product-advisory on the web: An information extraction approach
    Schmidt, Sebastian
    Mandl, Stefan
    Ludwig, Bernd
    Stoyan, Herbert
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007, : 633 - +
  • [7] Flexible Approach for Web Information Extraction Based on HTMLParser
    Shan, Lin
    Qun, Zhang
    PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012, : 683 - 686
  • [8] Web-Based Information Extraction Technology
    孙铁利
    教巍巍
    刘淑华
    Journal of Donghua University (English Edition), 2007, (02) : 288 - 292
  • [9] Hybrid approach to extracting information from web-tables
    Jung, Sung-won
    Kang, Mi-young
    Kwon, Hyuk-chul
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 109 - +
  • [10] A HYBRID APPROACH FOR INFORMATION EXTRACTION FROM HIGH RESOLUTION SATELLITE IMAGERY
    Singh, Pankaj Pratap
    Garg, R. D.
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2013, 13 (02)