Summarizing Large-Scale Database Schema Using Community Detection

Cited by: 0
Authors
王雪
周烜
王珊
Affiliations
[1] School of Information, Renmin University of China
[2] Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China
Funding
National Natural Science Foundation of China
Keywords
schema; summarization; large scale; community detection
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
Schema summarization on large-scale databases is a challenge. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social networks. Our approach consists of three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely. The generated abstract schema layers are very helpful for users exploring the database.
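The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the schema is an undirected graph whose nodes are tables and whose edges are foreign keys, substitutes a deterministic label propagation for the unspecified community detection algorithm, and picks the highest-degree table as each group's representative. The table names and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(foreign_keys):
    """Undirected adjacency map: tables are nodes, foreign keys are edges."""
    graph = defaultdict(set)
    for a, b in foreign_keys:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def detect_communities(graph, max_rounds=20):
    """Step 1 (assumed stand-in): deterministic label propagation."""
    labels = {table: table for table in graph}
    for _ in range(max_rounds):
        changed = False
        for table in sorted(graph):
            counts = Counter(labels[n] for n in graph[table])
            best = max(counts.values())
            # Break ties by the smallest label so the result is reproducible.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[table]:
                labels[table] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for table, label in labels.items():
        groups[label].append(table)
    return sorted(sorted(g) for g in groups.values())


def coarsen(graph, communities):
    """Step 2 (input only): count foreign keys between subject groups.

    The weighted group graph could be clustered again to form a higher level.
    """
    group_of = {t: i for i, g in enumerate(communities) for t in g}
    weights = Counter()
    for a in graph:
        for b in graph[a]:
            i, j = group_of[a], group_of[b]
            if i < j:  # count each undirected cross-group edge once
                weights[(i, j)] += 1
    return dict(weights)


def representative(graph, community):
    """Step 3 (assumed heuristic): highest-degree table, ties alphabetical."""
    return min(community, key=lambda t: (-len(graph[t]), t))


# Hypothetical toy schema: a film subject and a book subject joined by one key.
foreign_keys = [
    ("film", "actor"), ("film", "director"), ("film", "genre"),
    ("actor", "director"),
    ("book", "author"), ("book", "publisher"), ("author", "publisher"),
    ("film", "book"),
]
graph = build_graph(foreign_keys)
communities = detect_communities(graph)
group_labels = [representative(graph, g) for g in communities]
group_links = coarsen(graph, communities)
```

On this toy schema the sketch separates the film tables from the book tables and labels the groups `film` and `book`. Plain label propagation tends to collapse into one large group when a few hub tables connect most of the schema, which is the difficulty the abstract describes, so a real schema would need a more robust method.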
Pages: 515-526
Number of pages: 12