Exploring Big Data with Helix: Finding Needles in a Big Haystack

Cited by: 5
Authors
Ellis, Jason [1]
Fokoue, Achille [1]
Hassanzadeh, Oktie [1]
Kementsietsidis, Anastasios [1]
Srinivas, Kavitha [1]
Ward, Michael J. [1]
Affiliations
[1] IBM Res, Zurich, Switzerland
Keywords
SEARCH
DOI
10.1145/2737817.2737829
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While much work has focused on efficient processing of Big Data, little work considers how to understand them. In this paper, we describe Helix, a system for guided exploration of Big Data. Helix provides a unified view of sources, ranging from spreadsheets and XML files with no schema, all the way to RDF graphs and relational data with well-defined schemas. Helix users explore these heterogeneous data sources through a combination of keyword searches and navigation of linked web pages that include information about the schemas, as well as data and semantic links within and across sources. At a technical level, the paper describes the research challenges involved in developing Helix, along with a set of real-world usage scenarios and the lessons learned.
Pages: 43-54
Page count: 12