Efficient schema extraction from a large collection of XML documents

Cited by: 0
Authors
Xing, Guangming [1]
Parthepan, Vijayeandra [1]
Affiliations
[1] Western Kentucky Univ, Dept Math & Comp Sci, Bowling Green, KY 42101 USA
Source
PROCEEDINGS OF THE 49TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE (ACMSE '11) | 2011
Keywords
eXtensible Markup Language; schema inference; regular expression; nondeterministic finite automata;
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods];
Discipline classification code
081202 ;
Abstract
XML is becoming the standard format for data exchange on the Internet. In this paper, we present a system that effectively extracts schema information from a large collection of XML documents. Building on Xtract, we propose using the cost of an NFA simulation to compute the Minimum Description Length (MDL). We also study using the frequencies of the sample inputs to improve the effectiveness of schema extraction. Experiments on synthesized XML data sets suggest that our approach is an efficient and effective solution for schema inference.
Pages: 92-96
Page count: 5