Toward Scalable Indexing and Search on Distributed and Unstructured Data

Cited by: 2
Authors
Orhean, Alexandru Iulian [1]
Ijagbone, Itua [1]
Raicu, Ioan [1]
Chard, Kyle [2]
Zhao, Dongfang [3]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
[2] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[3] Univ Nevada, CSE Dept, Reno, NV 89557 USA
Funding
US National Science Foundation
Keywords
indexing methods; distributed file systems; unstructured data search
DOI
10.1109/BigDataCongress.2017.14
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The ubiquity of Big Data has greatly influenced the direction and development of storage technologies. To meet the needs of storing and analyzing Big Data, researchers and administrators have turned to parallel and distributed storage and compute architectures. While the problems of securely and consistently storing and accessing data in large parallel and distributed file systems have been largely addressed in both research and production systems, efficiently searching across large unstructured data and metadata has largely been overlooked. According to the International Data Corporation, more than 90% of data found in the digital universe is unstructured, emphasizing the importance of developing efficient solutions for querying distributed data. This paper proposes a novel indexing solution, called FusionDex, that provides an efficient model for querying across distributed file systems. FusionDex leverages state-of-the-art, open-source indexing modules as its building blocks to deliver an integrated system for enabling efficient user-specified queries over distributed and unstructured data. FusionDex has been evaluated on a cluster of 64 nodes, and results show that it outperforms existing tools such as Hadoop Grep and Cloudera Search, in some cases by several orders of magnitude.
Pages: 31-38
Page count: 8
Related papers
50 in total
  • [1] Scalable Multimodal Search with Distributed Indexing by Sparse Hashing
    Mourao, Andre
    Magalhaes, Joao
    ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015: 283 - 290
  • [2] An adaptive and scalable middleware for distributed indexing of data streams
    Bulut, A
    Vitenberg, R
    Emekçi, F
    Singh, AK
    DATABASES, INFORMATION SYSTEMS, AND PEER-TO-PEER COMPUTING, 2004, 2944 : 123 - 137
  • [3] Scalable distributed indexing and query processing over Linked Data
    Karnstedt, Marcel
    Sattler, Kai-Uwe
    Hauswirth, Manfred
    JOURNAL OF WEB SEMANTICS, 2012, 10 : 3 - 32
  • [4] Toward Scalable Keyword Search over Relational Data
    Baid, Akanksha
    Rae, Ian
    Li, Jiexing
    Doan, AnHai
    Naughton, Jeffrey
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01) : 140 - 149
  • [5] Scalable indexing for perceptual data
    Qamra, Arun
    Chang, Edward Y.
    MULTIMEDIA CONTENT ANALYSIS AND MINING, PROCEEDINGS, 2007, 4577 : 24 - +
  • [6] On Unstructured Distributed Search over BitTorrent
    Mayor, William
    Cox, Ingemar
    13TH IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P), 2013
  • [7] Efficient Indexing and Searching Framework for Unstructured Data
    Aye, Kyar Nyo
    Thein, Ni Lar
    FOURTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2011): MACHINE VISION, IMAGE PROCESSING, AND PATTERN ANALYSIS, 2012, 8349
  • [8] Learning to Distribute Vocabulary Indexing for Scalable Visual Search
    Ji, Rongrong
    Duan, Ling-Yu
    Chen, Jie
    Xie, Lexing
    Yao, Hongxun
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2013, 15 (01) : 153 - 166
  • [9] Toward Scalable Indexing for Top-k Queries
    Lee, Jongwuk
    Cho, Hyunsouk
    Lee, Sunyou
    Hwang, Seung-Won
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (12) : 3103 - 3116
  • [10] Indexing and retrieving DICOM data in disperse and unstructured archives
    Costa, Carlos
    Freitas, Filipe
    Pereira, Marco
    Silva, Augusto
    Oliveira, Jose L.
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2009, 4 (01) : 71 - 77