The Partition Heuristic Information Extraction Algorithm of Unstructured Data

Cited by: 0
Authors
Li, Cong [1]
Zou, Chengming [1]
Zhong, Luo [1]
Zhu, Jinyang [1]
Affiliation
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Peoples R China
Source
2013 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CLOUDCOM-ASIA) | 2013
Keywords
unstructured data; information; extraction; logistics information; divide-and-conquer method; word segmentation
DOI
10.1109/CLOUDCOM-ASIA.2013.104
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we propose a method that extracts the attributes of a given entity from unstructured data in the logistics domain, applying the idea of divide and conquer to the characteristics of logistics information. After a thorough study of logistics information, we perform a statistical analysis of textual logistics information and summarize the common attributes of textual information entities. Based on the different attributes and attribute values, we partition each textual information entity using divide and conquer. The entities obtained in this step are then processed internally with a word segmentation method based on tagging and graphs. In this way, we extract valuable attributes and attribute values from the unstructured data. Experimental results on logistics information obtained from a well-known logistics system show that the method is valid.
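The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general divide-and-conquer idea under stated assumptions, not the authors' implementation. The cue words (发货地 "origin", 收货地 "destination", 重量 "weight"), the toy lexicon, and the names `CUES`, `LEXICON`, `SEGMENTED_ATTRS`, `partition`, `segment` and `extract` are all hypothetical. The divide step splits one record at attribute cue words; the conquer step segments each chunk with a fewest-words shortest path over a word graph, a standard graph-based segmentation technique substituted here because the abstract does not specify the paper's tagging-and-graph method.

```python
import re
from typing import Dict, List, Set

# Hypothetical attribute cue words (not from the paper): origin, destination, weight.
CUES: Dict[str, str] = {"发货地": "origin", "收货地": "destination", "重量": "weight"}

# Hypothetical toy lexicon used by the segmentation step.
LEXICON: Set[str] = {"武汉", "武汉市", "洪山", "洪山区", "北京", "北京市", "海淀", "海淀区"}

# Hypothetical: only free-text address attributes are segmented further.
SEGMENTED_ATTRS: Set[str] = {"origin", "destination"}


def partition(record: str) -> Dict[str, str]:
    """Divide: split one record into per-attribute chunks at the cue words."""
    pattern = "(" + "|".join(map(re.escape, CUES)) + ")"
    parts = re.split(pattern, record)
    # With one capturing group, re.split returns [prefix, cue, text, cue, text, ...].
    return {CUES[cue]: text.strip() for cue, text in zip(parts[1::2], parts[2::2])}


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Conquer: fewest-words segmentation as a shortest path over a word graph.

    Nodes are character positions 0..n. An edge i -> j exists when text[i:j]
    is in the lexicon; single characters are always allowed as a fallback,
    so every position stays reachable.
    """
    n = len(text)
    max_len = max(1, max((len(w) for w in lexicon), default=1))
    dist = [0] + [n + 1] * n  # dist[j]: fewest words covering text[:j]
    back = [0] * (n + 1)      # back[j]: start index of the last word ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if (text[i:j] in lexicon or j - i == 1) and dist[i] + 1 < dist[j]:
                dist[j] = dist[i] + 1
                back[j] = i
    words: List[str] = []
    j = n
    while j > 0:  # walk the back-pointers from the end of the string
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]


def extract(record: str) -> Dict[str, List[str]]:
    """Partition first, then process each attribute value independently."""
    return {
        attr: segment(value, LEXICON) if attr in SEGMENTED_ATTRS else [value]
        for attr, value in partition(record).items()
    }


if __name__ == "__main__":
    print(extract("发货地武汉市洪山区收货地北京市海淀区重量5kg"))
```

On the sample record this prints `{'origin': ['武汉市', '洪山区'], 'destination': ['北京市', '海淀区'], 'weight': ['5kg']}`. One reason to partition before segmenting is that each chunk is handled independently, so a segmentation choice inside one attribute value cannot absorb characters that belong to a neighboring attribute.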
Pages: 570-576
Number of pages: 7