Efficient Fuzzy Type-Ahead Search in XML Data

被引:12
作者
Feng, Jianhua [1 ]
Li, Guoliang [1 ]
机构
[1] Tsinghua Univ, Dept Comp Sci, Tsinghua Natl Lab Informat Sci & Technol TNList, Beijing 100084, Peoples R China
基金
中国国家自然科学基金;
关键词
XML; keyword search; type-ahead search; fuzzy search; KEYWORD PROXIMITY SEARCH; ALGORITHMS; DATABASES;
D O I
10.1109/TKDE.2010.264
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant answers. In the case where the user has limited knowledge about the data, often the user feels "left in the dark" when issuing queries, and has to use a try-and-see approach for finding information. In this paper, we study fuzzy type-ahead search in XML data, a new information-access paradigm in which the system searches XML data on the fly as the user types in query keywords. It allows users to explore data as they type, even in the presence of minor errors of their keywords. Our proposed method has the following features: 1) Search as you type: It extends Autocomplete by supporting queries with multiple keywords in XML data. 2) Fuzzy: It can find high-quality answers that have keywords matching query keywords approximately. 3) Efficient: Our effective index structures and searching algorithms can achieve a very high interactive speed. We study research challenges in this new search framework. We propose effective index structures and top-k algorithms to achieve a high interactive speed. We examine effective ranking functions and early termination techniques to progressively identify the top-k relevant answers. We have implemented our method on real data sets, and the experimental results show that our method achieves high search efficiency and result quality.
引用
收藏
页码:882 / 895
页数:14
相关论文
共 57 条
[1]   DBXplorer: A system for keyword-based search over relational Databases [J].
Agrawal, S ;
Chaudhuri, S ;
Das, G .
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, :5-16
[2]  
Amer-Yahia S, 2008, SIGMOD REC, V37, P46
[3]  
[Anonymous], 2007, Proceedings of the 2007 ACM SIGMOD international conference on Management of data
[4]  
[Anonymous], 2003, Proceedings of the 2003 ACM SIGMOD international conference on Management of data
[5]  
[Anonymous], 2006, Proceedings of ACM Symposium on Principles of Database Systems (PODS)
[6]  
[Anonymous], 2001, P NIPS 01
[7]  
[Anonymous], P ACM SIGMOD INT C M
[8]  
[Anonymous], 2004, Proceedings of the Thirtieth international conference on Very Large Databases-Volume
[9]  
[Anonymous], ICDE
[10]  
[Anonymous], 2009, Proceedings of the 18th international conference on World wide web, WWW '09, DOI 10.1145/1526709.1526760