An Iterative Hadoop-Based Ensemble Data Classification Model on Distributed Medical Databases

Cited by: 3
Authors
Bikku, Thulasi [1]
Nandam, Sambasiva Rao [2]
Akepogu, Ananda Rao [3]
Affiliations
[1] VNITSW, Dept CSE, Guntur, AP, India
[2] SRITW, Dept CSE, Warangal, Telangana, India
[3] JNTUCEA, Dept CSE, Ananthapuramu, India
Source
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS, ICCII 2016 | 2017 / Vol. 507
Keywords
Distributed data mining; Hadoop; Ensemble approach; Medical databases
DOI
10.1007/978-981-10-2471-9_33
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the size and complexity of online biomedical databases grow day by day, finding essential structured or unstructured patterns in distributed biomedical applications has become more complex. Traditional Hadoop-based distributed decision tree models such as the Probability-based Decision Tree (PDT), Classification And Regression Tree (CART) and Multiclass Classification Decision Tree have failed to discover relational patterns, user-specific patterns and feature-based patterns because of the large number of feature sets. These models depend on the selection of relevant attributes and on a uniform data distribution. Data imbalance, indexing and sparsity are the three major issues in these distributed decision tree models. In the proposed model, an enhanced attribute selection ranking model and a Hadoop-based decision tree model were implemented to extract user-specific interesting patterns from online biomedical databases. Experimental results show that the proposed model achieves a high true positive rate, high precision and a low error rate compared with traditional distributed decision tree models.
Pages: 341-351
Number of pages: 11