AUTOMATIC WRAPPER SYSTEM FOR SEMI-STRUCTURED DOCUMENTS BASED ON DATA MINING

Cited by: 0
Authors
Rancea, Irina [1]
Sgarciu, Valentin [1]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Bucharest, Romania
Source
UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE | 2012 / Vol. 74 / No. 04
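Keywords
natural language processing; data mining; cluster analysis
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract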
Our world involves understanding and storing huge amounts of information from different sources, which must be integrated and synthesized. Smart applications that can automatically collect and process such information have therefore become critical. These applications use cluster analysis to find common groups of data. In addition, given the accumulated knowledge in the software applications area, the new trend in this domain is process automation, which saves time for the design of new concepts and architectures. Our paper proposes combining the discovery and processing of information stored in documents in order to automate software processes.
Pages: 55-66
Number of pages: 12