A case study of distributed information retrieval architectures to index one terabyte of text

Times cited: 16
Authors
Cacheda, F
Plachouras, V
Ounis, I
Affiliations
[1] Univ A Coruna, Fac Informat, Dept Informat & Commun Technol, La Coruna 15071, Spain
[2] Univ Glasgow, Dept Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
distributed information retrieval; performance; simulation
DOI
10.1016/j.ipm.2004.05.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to efficiently search and locate the desired information. This work is a case study of different architectures for a distributed information retrieval system, in order to provide a guide to approximate the optimal architecture with a specific set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a high number of query servers is used, essentially due to the reduction of the network load. However, a change in the distribution of the users' queries could reduce the performance of a clustered system. (c) 2004 Elsevier Ltd. All rights reserved.
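To make the broker's role concrete, the sketch below shows one common way a broker could combine the ranked local answer sets returned by its query servers into a single global top-k list. It is a minimal illustration, not the simulation model used in the paper: the function name `merge_local_answer_sets`, the `(doc_id, score)` tuple layout, and the assumption that scores from different servers are directly comparable (for example, computed with shared collection statistics) are all hypothetical. It uses a heap-based k-way merge via Python's standard `heapq.merge`, so the broker's work grows with the number of local answer sets it has to combine.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical layout of a local answer set: (doc_id, score) pairs,
# already sorted by descending score by the query server that produced it.
AnswerSet = List[Tuple[str, float]]


def merge_local_answer_sets(local_sets: Iterable[AnswerSet], k: int) -> AnswerSet:
    """Return the global top-k results from per-server ranked answer sets.

    Assumes scores are globally comparable across query servers.
    """
    # heapq.merge expects each input ordered ascending by key, so the key
    # negates the score: ascending -score is the same as descending score.
    merged = heapq.merge(*local_sets, key=lambda pair: -pair[1])
    # The merge is lazy, so only the first k results are actually produced.
    return list(islice(merged, k))


if __name__ == "__main__":
    # Three hypothetical query servers, each returning its own ranked list.
    local_sets = [
        [("d1", 0.9), ("d4", 0.5)],
        [("d2", 0.8), ("d3", 0.7)],
        [("d5", 0.95)],
    ]
    print(merge_local_answer_sets(local_sets, 3))
```

Run on the three toy answer sets above, the broker returns `d5`, `d1` and `d2`. Because the merge is lazy, producing the top k from n servers costs roughly one heap operation over n entries per returned result, rather than fully sorting every local result; with thousands of query servers, even this per-query work at the broker adds up.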
Pages: 1141-1161
Number of pages: 21
Related papers
  • [1] Performance analysis of distributed architectures to index one terabyte of text
    Cacheda, F
    Plachouras, V
    Ounis, I
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2004, 2997 : 394 - 408
  • [2] A cache-based distributed terabyte text retrieval system in CADAL
    Cheng, J
    Gao, W
    Liu, B
    Huang, TJ
    Zhang, L
    DIGITAL LIBRARIES: PEOPLE, KNOWLEDGE, AND TECHNOLOGY, PROCEEDINGS, 2002, 2555 : 352 - 353
  • [3] Network analysis for distributed information retrieval architectures
    Cacheda, F
    Carneiro, V
    Plachouras, V
    Ounis, I
    ADVANCES IN INFORMATION RETRIEVAL, 2005, 3408 : 527 - 529
  • [4] Evaluating the performance of distributed architectures for information retrieval using a variety of workloads
    Cahoon, B
    McKinley, KS
    Lu, ZH
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2000, 18 (01) : 1 - 43
  • [5] Information Retrieval of Distributed Databases A Case Study: Search Engines Systems
    Alahmadi, Sarah Hamed
    2018 1ST INTERNATIONAL CONFERENCE ON COMPUTER APPLICATIONS & INFORMATION SECURITY (ICCAIS' 2018), 2018,
  • [6] Study on Text Semantic Similarity in Information Retrieval
    rong, Feng Shao
    jun, Xiao Wen
    2008 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-4, 2008, : 713 - 717
  • [7] A CASE-STUDY OF CACHING STRATEGIES FOR A DISTRIBUTED FULL TEXT RETRIEVAL-SYSTEM
    MARTIN, TP
    MACLEOD, IA
    RUSSELL, JI
    LEESE, K
    FOSTER, B
    INFORMATION PROCESSING & MANAGEMENT, 1990, 26 (02) : 227 - 247
  • [8] Enrichment of text documents using information retrieval techniques in a distributed environment
    Bueno, Francisco
    Garcia-Serrano, Ana
    Martinez-Fernandez, Jose L.
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (12) : 8348 - 8358
  • [9] Neural-network-based metalearning for distributed text information retrieval
    Lai, Kin Keung
    Yu, Lean
    Wang, Shouyang
    Huang, Wei
    2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 1302 - 1309
  • [10] The Infinite Index: Information Retrieval on Generative Text-To-Image Models
    Deckers, Niklas
    Froebe, Maik
    Kiesel, Johannes
    Pandolfo, Gianluca
    Schroeder, Christopher
    Stein, Benno
    Potthast, Martin
    PROCEEDINGS OF THE 2023 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL, CHIIR 2023, 2023, : 172 - 186