Presentation of a model-based data mining to predict lung cancer

Times cited: 0
Authors
Shahhoseini, Reza [1]
Ghazvini, Ali [2]
Esmaeilpour, Mansour [3]
Pourtaghi, Gholamhossein [4]
Tofighi, Shahram [5]
Affiliations
[1] Baqiyatallah Univ Med Sci, Sch Hlth, Dept Hlth Care Management, Tehran, Iran
[2] Baqiyatallah Univ Med Sci, Sch Med, Dept Internal Med, Tehran, Iran
[3] Islamic Azad Univ, Hamadan Branch, Dept Comp Engn, Hamadan, Iran
[4] Baqiyatallah Univ Med Sci, Hlth Res Ctr, Tehran, Iran
[5] Baqiyatallah Univ Med Sci, Hlth Management Res Ctr, Tehran, Iran
Keywords
Data Mining; Lung Cancer; Decision Tree; Neural Networks
DOI
Not available
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Background: Patient data often contain useful information that can help resolve problems in many areas. This study, performed in 2014, presents a data mining-based model for predicting lung cancer. Methods: In this exploratory modeling study, information was collected by two methods: library and field methods. All variables were gathered with a data transfer form from patients affected by pulmonary problems (303 records), covering 26 fields of clinical and environmental variables. The validity of the data transfer form was established by consensus in group meetings, using purposive sampling, over several meetings among members of the research group and the lung group. The methodology was based on the classification and prediction methods of data mining, using supervised learning with classification and regression tree algorithms in Clementine 12 software. Results: For clinical variables, the model's precision was high in all three partitions: training, test, and validation. For environmental variables, the maximum precision was 76% in the training partition (C&R algorithm), 61% in the test partition (Neural Net algorithm), and 57% in the validation partition (Neural Net algorithm). Conclusions: For clinical variables, the C5.0, CHAID, and C&R models were stable and suitable for detecting lung cancer. For environmental variables, the C&R model was stable and suitable for detecting lung cancer. Variables such as pulmonary nodules, pleural fluid effusion, diameter of pulmonary nodules, and location of pulmonary nodules were the most important, having the greatest impact on the detection of lung cancer.
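The abstract names classification and regression trees evaluated on training, test, and validation partitions. The core step of such a tree can be sketched as a Gini-impurity split search. This is a minimal illustration only, not the authors' Clementine 12 workflow: the function names, the 60/20/20 partition ratios, and the six toy records (nodule diameter in mm, effusion flag) are all assumptions invented for the example and are not taken from the study's data.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A row is sent left when row[feature_index] <= threshold.
    Returns (None, None, parent_gini) when no split improves on the parent.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def three_way_partition(n, seed=0, train=0.6, test=0.2):
    """Shuffle indices 0..n-1 and cut them into training, test, validation.

    The 60/20/20 ratios are an assumption; the abstract does not state them.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a = int(n * train)
    b = a + int(n * test)
    return idx[:a], idx[a:b], idx[b:]


# Hypothetical toy records: [nodule_diameter_mm, has_effusion]; label 1 = cancer.
rows = [[4, 0], [6, 0], [8, 0], [22, 1], [25, 0], [30, 1]]
labels = [0, 0, 0, 1, 1, 1]
feature, threshold, impurity = best_split(rows, labels)

# Partition 303 record indices, matching the record count in the abstract.
train_idx, test_idx, valid_idx = three_way_partition(303)
```

On the toy records the search selects the diameter feature, because splitting there separates the two classes completely. A full tree would apply `best_split` recursively to each side and then compare performance across the three partitions, which is how the stability of a model can be judged.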
Pages: 189-195
Page count: 7