Study of Distributed Framework Hadoop and Overview of Machine Learning using Apache Mahout

Authors
Solanki, Raxitkumar [1 ]
Ravilla, Sree Harsha [1 ]
Bein, Doina [1 ]
Affiliation
[1] Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92634 USA
Source
2019 IEEE 9TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC) | 2019
Keywords
Hadoop; Hadoop Common; MapReduce; HDFS (Hadoop Distributed File System); YARN (Yet Another Resource Negotiator); Apache Mahout
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The amount of data generated every day in digital format is overwhelming, so we need mechanisms to store and manage it. The technical solutions for storing and managing the data should be scalable, so that relevant information can be extracted and analyzed. We describe the initial steps in using Apache Mahout to find the total number of books written by authors of different age groups, to analyze patterns in the authors' ages, and to predict which age group has authored the highest number of books in different calendar years. Publishing houses and literary agents can use our proposed software.
Pages: 252-257
Page count: 6