An information-theoretic approach to normal forms for relational and XML data

被引:44
作者
Arenas, M [1 ]
Libkin, L [1 ]
机构
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
关键词
information theory; design; normal forms; normalization algorithms; relational databases; XML;
D O I
10.1145/1059513.1059519
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Normalization as a way of producing good relational database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While, in the relational world, the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
引用
收藏
页码:246 / 283
页数:38
相关论文
共 30 条
[11]  
Fagin R., 1977, ACM Transactions on Database Systems, V2, P262, DOI 10.1145/320557.320571
[12]   A NORMAL-FORM FOR RELATIONAL DATABASES THAT IS BASED ON DOMAINS AND KEYS [J].
FAGIN, R .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 1981, 6 (03) :387-415
[13]  
Fagin Ronald, 1979, P 1979 ACM SIGMOD IN, P153, DOI [10.1145/582095.582120, DOI 10.1145/582095.582120]
[14]   RELATIVE INFORMATION CAPACITY OF SIMPLE RELATIONAL DATABASE SCHEMATA [J].
HULL, R .
SIAM JOURNAL ON COMPUTING, 1986, 15 (03) :856-886
[15]  
KANELLAKIS PC, 1990, HDB THEORETICAL COMP, VB, P1075
[17]   Why is the snowflake schema a good data warehouse design? [J].
Levene, M ;
Loizou, G .
INFORMATION SYSTEMS, 2003, 28 (03) :225-240
[18]   Justification for inclusion dependency normal form [J].
Levene, M ;
Vincent, MV .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2000, 12 (02) :281-291
[19]  
LEY M, 2003, DBLP
[20]  
Maier D., 1979, ACM Transactions on Database Systems, V4, P455, DOI 10.1145/320107.320115