Data mining approach for accelerating the classification accuracy of cardiotocography

Cited by: 11
Authors
Potharaju, Sai Prasad [1]
Sreedevi, M. [1]
Ande, Vinay Kumar [2]
Tirandasu, Ravi Kumar [2]
Affiliations
[1] KL Univ, Dept CSE, Guntur, AP, India
[2] Sanjivani Coll Engn, Dept Comp Engn, Kopargaon, MH, India
Source
CLINICAL EPIDEMIOLOGY AND GLOBAL HEALTH | 2019, Vol. 7, Issue 02
Keywords
Balanced; Imbalanced; Lazy learners; SMOTE; Rule based; Tree based
DOI
10.1016/j.cegh.2018.03.004
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Objective: This study aims to increase the classification accuracy of learning algorithms on cardiotocography data by applying a preprocessing technique. Data are generated in large volumes from diverse sources and suffer from various problems, including mislabeled records, missing values, noise, high dimensionality, and imbalanced class labels. Method: We propose a technique for handling imbalanced data to improve the classification performance of various lazy learners, rule-based induction models, and tree-based models. We applied the Synthetic Minority Over-sampling Technique (SMOTE) to a real dataset to improve the performance of these classifiers. We found that the primary dataset is imbalanced, meaning that most records belong to the same class label. Predictions on imbalanced data are biased towards the majority class. To overcome this problem, the dataset has to be balanced. Results: With the proposed method, the performance of the classification algorithms increased. The results show that the majority of classification techniques performed better on the balanced dataset than on the imbalanced dataset. Conclusion: After applying SMOTE, classification performance on the balanced dataset improved over that on the imbalanced dataset.
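The oversampling idea named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, the seed, and the toy two-feature data are all assumptions made for illustration. Each synthetic record is placed at a random point on the line segment between a minority record and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for i, (label, count) in enumerate(counts.items()):
        if 1 < count < target:
            minority = [x for x, lab in zip(X, y) if lab == label]
            new_points = smote(minority, target - count, k, seed + i)
            X_out.extend(new_points)
            y_out.extend([label] * len(new_points))
    return X_out, y_out


# Hypothetical toy data (not the cardiotocography dataset): 6 vs 3 records.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
     (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y = ["majority"] * 6 + ["minority"] * 3

X_bal, y_bal = balance(X, y)
print(Counter(y_bal))
```

In a classification experiment, oversampling of this kind is normally applied to the training portion only, so that synthetic records do not leak into the evaluation data.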
Pages: 160-164
Page count: 5