Approach for Unwrapping the Unstructured to Structured Data the Case of Classified Ads in HTML Format

Cited by: 1
Authors
Banowosari, Lintang Yuniar [1]
Purnamasari, Detty [2]
Affiliations
[1] Gunadarma Univ, Informat Management Dept, Jakarta, Indonesia
[2] Gunadarma Univ, Dept Informat Syst, Jakarta, Indonesia
Keywords
Classified Ads; Database; Internet; Unwrapping; Unstructured Data
DOI
10.1166/asl.2016.7739
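Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09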
Abstract
Data sources in various forms and formats are available on the Internet. The data can be semi-structured or unstructured. The objective of this research is to develop an approach for unwrapping unstructured data available on the Internet into structured data (a database). The unstructured data used in this study are classified ads on an Indonesian website, in HTML format. An illustration was made to test the approach. The test results show an F-measure of 99.13%.
Pages: 1909-1913
Number of pages: 5