Performance Analysis of Statistical and Supervised Learning Techniques in Stock Data Mining

Cited by: 30
Authors
Sharma, Manik [1]
Sharma, Samriti [2]
Singh, Gurvinder [2]
Affiliations
[1] DAV Univ, Dept Comp Sci & Applicat, Jalandhar 144401, India
[2] Guru Nanak Dev Univ, Dept Comp Sci, Amritsar 143001, India
Source
DATA | 2018 / Vol. 3 / Issue 04
Keywords
stock forecasting; naive Bayes; C4.5; random forest; logistic regression; support vector machine;
DOI
10.3390/data3040054
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Nowadays, an overwhelming amount of stock data is available, which is only of use if it is properly examined and mined. In this paper, the last twelve years of ICICI Bank's stock data have been extensively examined using statistical and supervised learning techniques. This study may be of great interest to those who wish to mine or study the stock data of banks or any financial organization. Different statistical measures have been computed to explore the nature, range, distribution, and deviation of the data. These descriptive statistical measures assist in finding valuable metrics such as the mean, variance, skewness, kurtosis, p-value, A-squared, and 95% confidence interval for the mean of ICICI Bank's stock data. Moreover, daily percentage changes occurring over the last 12 years have also been recorded and examined. Additionally, the intraday stock status has been mined using ten different classifiers. The performance of the classifiers has been evaluated on the basis of various parameters such as accuracy, misclassification rate, precision, recall, specificity, and sensitivity. Based upon these parameters, the predictive results obtained using logistic regression are more acceptable than the outcomes of the other classifiers, whereas naive Bayes, C4.5, random forest, linear discriminant, and cubic support vector machine (SVM) merely act as random guessing machines. The outstanding performance of logistic regression has been validated using TOPSIS (technique for order preference by similarity to ideal solution) and WSA (weighted sum approach).
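The evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definitions of the confusion-matrix metrics, TOPSIS, and WSA; the function names, classifier labels, counts, and equal weights are all hypothetical and are not taken from the paper, whose exact procedure may differ.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # recall is the same quantity as sensitivity
        "specificity": tn / (tn + fp),
    }

def wsa_scores(matrix, weights):
    """Weighted sum approach: weighted total of each row's criteria.
    Assumes every criterion is benefit-type and on a comparable scale."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]

def topsis_scores(matrix, weights, benefit):
    """Relative closeness of each alternative (row) to the ideal solution.
    Assumes no all-zero column and at least two distinct rows."""
    n_cols = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical counts (tp, fp, tn, fn); NOT results from the paper.
counts = {"clf_a": (60, 40, 55, 45), "clf_b": (52, 48, 50, 50), "clf_c": (70, 30, 65, 35)}
criteria = ["accuracy", "precision", "recall", "specificity"]
matrix = [[binary_metrics(*c)[k] for k in criteria] for c in counts.values()]
weights = [0.25] * 4        # equal weights, an assumption
benefit = [True] * 4        # all four criteria are "higher is better"

ranked = zip(counts, topsis_scores(matrix, weights, benefit), wsa_scores(matrix, weights))
for name, t, w in ranked:
    print(f"{name}: TOPSIS={t:.3f} WSA={w:.3f}")
```

A higher score means a better-ranked classifier under both methods. A cost-type criterion such as misclassification rate could be passed to `topsis_scores` with its `benefit` flag set to `False`, but it would have to be inverted before being used in `wsa_scores`.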
Pages: 16