An Approach in Big Data Analytics to Improve the Velocity of Unstructured Data Using MapReduce

Times cited: 3
Authors
Sundarakumar, M. R. [1]
Mahadevan, G. [2]
Somula, Ramasubbareddy [3]
Sennan, Sankar [4]
Rawal, Bharat S. [5]
Affiliations
[1] AMC Engn Coll, Dept Comp Sci & Engn, Bengaluru, India
[2] AMC Engn Coll, Bengaluru, India
[3] VNRVJIET, Secunderabad, India
[4] Sona Coll Technol, Salem, India
[5] Gannon Univ, Dept Cyber Secur, Erie, PA USA
Keywords
Big Data; CESI; MapReduce; MRBNGS
DOI
10.4018/IJSDA.20211001.oa6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Big data analytics is an innovative approach to extracting data from the huge volumes held in data warehouse systems. Hadoop is a framework that performs high-speed data retrieval across clusters using MapReduce and HDFS. These large volumes of files are accessed with data mining, machine learning, and deep learning algorithms. However, these techniques take considerable time to retrieve data across the clusters. To reduce this latency, the proposed work applies a hybrid algorithm, combining a compressed elastic search index (CESI) with a MapReduce-based next generation sequencing approach (MRBNGSA), in the scheduling and shuffling phases. The proposed approach brings tangible changes to the MapReduce phases. The proposed CESI-MRBNGSA algorithm performs significantly better than Hadoop BAM and GATK.
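The abstract places the proposed work in the scheduling and shuffling phases of MapReduce. For readers unfamiliar with those phases, here is a minimal single-process Python sketch of a generic word-count job. It is not the authors' CESI-MRBNGSA algorithm. The function names (map_phase, shuffle_phase, reduce_phase, run_job) are hypothetical. The CRC32-based partitioner is an assumption that stands in for Hadoop's hash partitioning.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every token of every input record."""
    return [(token, 1) for record in records for token in record.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle: route each key to one reducer partition and group its values.

    A deterministic CRC32 hash stands in for Hadoop's hash partitioning
    (an assumption of this sketch), so every occurrence of a key lands
    in the same partition.
    """
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    return partitions


def reduce_phase(partitions: List[Dict[str, List[int]]]) -> Dict[str, int]:
    """Reduce: combine the grouped values of each key (here, by summing)."""
    totals: Dict[str, int] = {}
    for partition in partitions:
        for key, values in partition.items():
            totals[key] = sum(values)
    return totals


def run_job(records: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    """Run map -> shuffle -> reduce in a single process (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(records), num_reducers))


if __name__ == "__main__":
    docs = ["big data needs fast retrieval", "fast shuffle means fast retrieval"]
    print(run_job(docs)["fast"])  # prints 3
```

In a real Hadoop cluster, the shuffle moves partitioned map output over the network to the reducer nodes. This data movement is one reason the shuffle phase is a common target for latency optimizations like the one the abstract describes.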
Pages: 25