Computer-Aided Lung Nodule Recognition by SVM Classifier Based on Combination of Random Undersampling and SMOTE

Times cited: 50
Authors
Sui, Yuan [1]
Wei, Ying [2, 3]
Zhao, Dazhe [3]
Affiliations
[1] Northeastern Univ, Software Coll, Shenyang 110004, Peoples R China
[2] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China
[3] Minist Educ, Key Lab Med Imaging Calculat, Shenyang 110004, Peoples R China
Keywords
PULMONARY NODULES; CT IMAGES; SEGMENTATION; SCANS
DOI
10.1155/2015/368674
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROIs) is often used to detect or diagnose lung nodules accurately. However, imbalanced datasets often degrade classification performance. In this paper, both the minority and majority classes are resampled to improve generalization. We propose a novel SVM classifier combined with random undersampling (RU) and SMOTE for lung nodule recognition. Combining the two resampling methods not only balances the training samples but also removes noise and duplicate information from the training set while retaining useful information, which improves effective data utilization and hence the performance of the SVM algorithm for pulmonary nodule classification on imbalanced data. Eight features, including 2D and 3D features, are extracted for training and classification. Experimental results show that, for training datasets of different sizes, our RU-SMOTE-SVM classifier achieves the highest classification accuracy among the four classifiers compared, with an average classification accuracy above 92.94%.
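The abstract does not spell out the resampling procedure, so the following is only a minimal sketch, in pure Python, of the general RU + SMOTE idea: randomly undersample the majority class down to a hypothetical `target_size`, then oversample the minority class up to the same size with a simplified SMOTE that interpolates between a randomly chosen minority sample and one of its `k` nearest minority neighbours. The function names, the `target_size` and `k` parameters, and the label convention (0 = majority/non-nodule, 1 = minority/nodule) are assumptions for illustration, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_undersample(majority: Sequence[Vector], target_size: int,
                       rng: random.Random) -> List[Vector]:
    # Keep target_size majority samples chosen uniformly at random, without replacement.
    if target_size >= len(majority):
        return [list(v) for v in majority]
    return [list(v) for v in rng.sample(list(majority), target_size)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Vector], n_synthetic: int, k: int,
          rng: random.Random) -> List[Vector]:
    # Simplified SMOTE (assumption): pick a random minority sample, pick one of its
    # k nearest minority neighbours, and emit a point on the segment between them.
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by distance to base and keep the k closest.
        dists = sorted((_sq_dist(base, minority[j]), j)
                       for j in range(len(minority)) if j != i)
        neighbours = [j for _, j in dists[:k]]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def ru_smote_balance(majority: Sequence[Vector], minority: Sequence[Vector],
                     target_size: int, k: int = 5,
                     seed: int = 0) -> Tuple[List[Vector], List[int]]:
    # Hypothetical helper: undersample the majority class and SMOTE-oversample the
    # minority class so both end up with target_size samples (when the inputs allow it).
    rng = random.Random(seed)
    kept_majority = random_undersample(majority, target_size, rng)
    n_new = max(0, target_size - len(minority))
    all_minority = [list(v) for v in minority] + smote(minority, n_new, k, rng)
    X = kept_majority + all_minority
    y = [0] * len(kept_majority) + [1] * len(all_minority)
    return X, y


if __name__ == "__main__":
    majority = [[float(i), float(i % 3)] for i in range(100)]  # toy majority vectors
    minority = [[50.0 + i, 1.0] for i in range(10)]             # toy minority vectors
    X, y = ru_smote_balance(majority, minority, target_size=40)
    print(y.count(0), y.count(1))  # 40 40
```

The balanced `X, y` could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; the kernel and parameters used in the paper are not given in the abstract. Choosing the SMOTE base sample at random, rather than generating a fixed number of synthetic points per minority sample as in the original SMOTE description, is a simplification made here to hit an exact target count.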
Pages: 13