Hadoop Distributed File System for the Grid

Cited by: 6
Authors
Attebury, Garhan [1]
Baranovski, Andrew [2]
Bloom, Ken [1]
Bockelman, Brian [1]
Kcira, Dorian [3]
Letts, James [4]
Levshina, Tanya [2]
Lundestedt, Carl [1]
Martin, Terrence [4]
Maier, Will [5]
Pi, Haifeng [4]
Rana, Abhishek [4]
Sfiligoi, Igor [4]
Sim, Alexander [6]
Thomas, Michael [3]
Wuerthwein, Frank [4]
Affiliations
[1] Univ Nebraska, Lincoln, NE 68583 USA
[2] Fermilab Natl Accelerator Lab, Batavia, IL 60510 USA
[3] CALTECH, Pasadena, CA 91125 USA
[4] Univ Calif San Diego, La Jolla, CA 92093 USA
[5] Univ Wisconsin, Madison, WI 53706 USA
[6] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Source
2009 IEEE NUCLEAR SCIENCE SYMPOSIUM CONFERENCE RECORD, VOLS 1-5 | 2009
DOI
10.1109/NSSMIC.2009.5402426
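Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract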
Data distribution, storage, and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A recently emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained high inter-DataNode data transfer can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfers to HDFS, supported by BeStMan and tuned GridFTP servers, demonstrate the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, as demonstrated at the Large Hadron Collider (LHC) computing centers. The operational simplicity of an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage compared with alternative solutions.
Pages: 1056 / +
Number of pages: 3