Online News Popularity Prediction

Cited by: 0
Authors
Namous, Feras [1]
Rodan, Ali [2]
Javed, Yasir [2]
Affiliations
[1] Univ Jordan, King Abdallah II Sch Sci & Technol, Amman, Jordan
[2] Higher Coll Technol, Al Ain Womens Coll, Abu Dhabi, U Arab Emirates
Source
2018 FIFTH HCT INFORMATION TECHNOLOGY TRENDS (ITT): EMERGING TECHNOLOGIES FOR ARTIFICIAL INTELLIGENCE | 2018
Keywords
Machine Learning; Classification; Popularity prediction; Feature selection; Model selection
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Applying data mining algorithms to large datasets is common, and it has become especially useful with the expansion of online news. Neural Networks, Random Forest, Support Vector Machines (SVM), and Naive Bayes are among the most common data mining algorithms used for classification. In this research, we aimed to find the best model and set of features for predicting the popularity of online news, using machine-learning techniques and applying various data mining algorithms to the selected features. The data source was Mashable, a well-known online news website. Precision, Recall, and F-measure were used to evaluate each model, and the results were compared to find the best one. In addition, we compared our results with previous work on the same dataset. Random Forest and Neural Network turned out to be the best models for prediction; both can achieve an accuracy of 65% with optimal parameters. Our work can help online news companies predict news popularity before publication.
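The abstract names Precision, Recall, and F-measure as the evaluation metrics. As a minimal sketch of how these are typically computed for a binary popular/unpopular label, the snippet below uses a hypothetical helper `precision_recall_f1` and made-up labels; neither comes from the paper, and the numbers are illustrative only, not the study's results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F-measure (F1) for one positive class.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = "popular", 0 = "unpopular" (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Reporting all three metrics alongside accuracy is useful because accuracy alone can look acceptable even when one class is predicted poorly.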
Pages: 180-184
Page count: 5