Toward Scalable Indexing and Search on Distributed and Unstructured Data

Cited by: 2
Authors
Orhean, Alexandru Iulian [1]
Ijagbone, Itua [1]
Raicu, Ioan [1]
Chard, Kyle [2]
Zhao, Dongfang [3]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
[2] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[3] Univ Nevada, CSE Dept, Reno, NV 89557 USA
Funding
National Science Foundation (USA)
Keywords
indexing methods; distributed file systems; unstructured data search;
DOI
10.1109/BigDataCongress.2017.14
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The ubiquity of Big Data has greatly influenced the direction and development of storage technologies. To meet the needs of storing and analyzing Big Data, researchers and administrators have turned to parallel and distributed storage and compute architectures. While the problems of securely and consistently storing and accessing data in large parallel and distributed file systems have been largely addressed in both research and production systems, efficiently searching across large unstructured data and metadata has largely been overlooked. According to the International Data Corporation, more than 90% of data found in the digital universe is unstructured, emphasizing the importance of developing efficient solutions for querying distributed data. This paper proposes a novel indexing solution, called FusionDex, that provides an efficient model for querying across distributed file systems. FusionDex leverages state-of-the-art, open-source indexing modules as its building blocks to deliver an integrated system for enabling efficient user-specified queries over distributed and unstructured data. FusionDex has been evaluated on a cluster of 64 nodes, and results show that it outperforms existing tools such as Hadoop Grep and Cloudera Search, in some cases by several orders of magnitude.
Pages: 31-38
Page count: 8