A Big Data Approach to Black Friday Sales

Cited by: 51
Authors
Awan, Mazhar Javed [1, 2]
Rahim, Mohd Shafry Mohd [2]
Nobanee, Haitham [3, 4, 5]
Yasin, Awais [6 ]
Khalaf, Osamah Ibrahim [7 ]
Ishfaq, Umer [2 ]
Affiliations
[1] Univ Management & Technol, Dept Software Engn, Lahore, Pakistan
[2] Univ Teknol Malaysia, Fac Engn, Sch Comp, Johor Baharu, Malaysia
[3] Abu Dhabi Univ, Coll Business, Abu Dhabi, U Arab Emirates
[4] Univ Oxford, Oxford Ctr Islamic Studies, Marston Rd, Oxford, England
[5] Univ Liverpool, Management Sch, Liverpool, Merseyside, England
[6] Natl Univ Technol, Dept Comp Engn, Islamabad, Pakistan
[7] Al Nahrain Univ, AlNahrain Nanorenewable Energy Res Ctr, Baghdad, Iraq
Keywords
Big data; correlation and regression analysis; machine learning; numerical algorithms; performance; prediction; Black Friday sales; cloud; classification; analytics; behavior
DOI
10.32604/iasc.2021.014216
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Retail companies need to analyze and predict sales and customer behavior across their products and product categories. Our study aims to help retail companies create personalized deals and promotions for their customers, even during the COVID-19 pandemic, through a big data framework that lets them handle massive sales volumes with more efficient models. We used Black Friday sales data from a dataset on the Kaggle website, which contains nearly 550,000 observations described by 10 qualitative and quantitative features. The target variable is the purchase (sales) amount in U.S. dollars. Because this target variable is continuous, regression models are appropriate. Using the Apache Spark big data framework and its MLlib machine learning library, we trained two machine learning models, linear regression and random forest, to predict future pricing and sales. We first implemented both models without the Spark framework and achieved accuracies of 68% for linear regression and 74% for random forest. We then trained the same models on the Spark machine learning framework, where we achieved an accuracy of 72% for the linear regression model and 81% for the random forest model.
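The abstract reports "accuracy" for regression models without naming the metric. As an illustration only, the sketch below assumes a score such as the coefficient of determination (R²) and shows how a linear regression on a continuous purchase amount could be fitted and scored. It uses plain Python rather than Spark MLlib so that it stays dependency-free. The data are synthetic toy values, not taken from the Black Friday dataset, and the function names `fit_simple_linear_regression` and `r_squared` are hypothetical helpers, not part of the paper.

```python
# Minimal sketch: one-feature linear regression scored with R^2.
# Assumption: the "accuracy" in the abstract is an R^2-like score.
# All numbers below are synthetic toy data, not the Kaggle dataset.


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical feature (e.g. an encoded product category) and purchase in USD.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [110.0, 190.0, 310.0, 390.0, 510.0]
test_x = [6.0, 7.0]
test_y = [600.0, 700.0]

intercept, slope = fit_simple_linear_regression(train_x, train_y)
test_predictions = [intercept + slope * x for x in test_x]
test_r2 = r_squared(test_y, test_predictions)

print(f"intercept={intercept:.2f} slope={slope:.2f} test R^2={test_r2:.4f}")
```

Scoring on held-out rows rather than the training rows mirrors the usual train/test evaluation of a regressor. Whether the paper evaluated its models this way is not stated in the abstract.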
Pages: 785-797
Page count: 13