Classification of Textual Data in Distributed Environment

Cited by: 0
Authors
Gupta, Sonali [1]
Kumar, Vidit [1]
Pant, Bhaskar [1]
Affiliations
[1] Graph Era Deemed Univ, Comp Sci & Engn, Dehra Dun, Uttarakhand, India
Source
2018 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, CONTROL AND COMMUNICATION TECHNOLOGY (IAC3T) | 2018
Keywords
Hadoop; Big Data; Mahout; Machine learning; Naive Bayes;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data is now generated at a very fast pace and in large volumes through internet usage and other sources, a phenomenon termed Big Data. A large portion of this data is text collected from emails, blogs, social networking sites, e-commerce reviews, etc., and it requires deep analysis to extract meaningful patterns for applications such as business decision making, social media monitoring, and spam detection. At this scale the data becomes difficult to store and process, so it must be handled with parallel computing tools and machine learning algorithms. In this work, we use a Naive Bayes classifier to classify textual data in a Hadoop environment using Mahout. The experiment is carried out on the 20 Newsgroups dataset and achieves an accuracy of 88.38%. After evaluating the results, we find that processing speed increases as the number of Hadoop clusters increases, since Apache Hadoop can process large volumes of data efficiently using the map-reduce paradigm.
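The classification step described in the abstract can be illustrated with a minimal single-machine sketch of multinomial Naive Bayes with Laplace smoothing. This is not the authors' code and does not use Mahout's API; the function names (`tokenize`, `train`, `classify`), the whitespace tokenizer, the smoothing constant `alpha`, and the four toy documents are all assumptions made for illustration. Training reduces to summing per-class word counts, which is the kind of aggregation that the map-reduce paradigm can spread across machines.

```python
import math
from collections import Counter, defaultdict

# Toy training set (hypothetical); labels borrow two 20 Newsgroups group names.
TRAINING_DOCS = [
    ("sci.space", "the rocket launch reached orbit"),
    ("sci.space", "nasa plans a new orbit mission"),
    ("rec.autos", "the car engine needs new tires"),
    ("rec.autos", "my car dealer sold the engine"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(docs):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()             # label -> number of documents
    word_counts = defaultdict(Counter) # label -> word -> occurrences
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_docs, word_counts, vocab


def classify(model, text, alpha=1.0):
    """Return the label with the highest log-posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_docs[label] / total_docs)
        # Denominator of the Laplace-smoothed word likelihood.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DOCS)
print(classify(model, "rocket orbit mission"))
print(classify(model, "car engine tires"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.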
Pages: 120-124
Number of pages: 5