An information-theoretic approach to normal forms for relational and XML data

被引:44
作者
Arenas, M [1 ]
Libkin, L [1 ]
机构
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
关键词
information theory; design; normal forms; normalization algorithms; relational databases; XML;
D O I
10.1145/1059513.1059519
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Normalization as a way of producing good relational database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While, in the relational world, the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
引用
收藏
页码:246 / 283
页数:38
相关论文
共 30 条
[1]  
Abiteboul S., 1995, Foundations of databases, V1st
[2]  
Albert J, 1999, J COMPUT SYST SCI, V58, P512, DOI 10.1006/jcss.1998.1628
[3]   A normal form for XML documents [J].
Arenas, M ;
Libkin, L .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2004, 29 (01) :195-232
[4]  
Arenas M., 2002, P 21 ACM SIGACT SIGM, P85, DOI [10.1145/543613, DOI 10.1145/543613]
[5]  
Beeri C., 1978, Proceedings of the Fourth International Conference on Very Large Data Bases, P113
[6]  
BISKUP J, 1995, SEMANTICS DATABASES, P29
[7]  
Cavallo R., 1987, Proceedings of the Thirteenth International Conference on Very Large Data Bases: 1987 13th VLDB, P71
[8]  
Cover T. M., 2005, ELEM INF THEORY, DOI 10.1002/047174882X
[9]  
Dalkilic M.M., 2000, P 19 ACM SIGMOD SIGA, P245
[10]  
EMBLEY D, 2001, P 20 INT C CONC MOD, P426