OVERCOMING MISSING VALUES USING IMPUTATION METHODS IN THE CLASSIFICATION OF TUBERCULOSIS

Cited by: 1
Authors
Rochman, Eka Mala Sari [1, 2]
Miswanto [1]
Suprajitno, Herry [1]
Affiliations
[1] Airlangga Univ, Fac Sci & Technol, Dept Math, Surabaya, Indonesia
[2] Univ Trunojoyo Madura, Dept Informat, Fac Engn, Bangkalan, Indonesia
Keywords
tuberculosis; imputation; missing value; classification; Naive Bayes; logistic regression; LOGISTIC-REGRESSION; NEURAL-NETWORK; MODELS
DOI
10.28919/cmbn/7538
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Indonesia is one of the most densely populated countries in the world and has a very high number of tuberculosis (TB) cases. TB is a serious disease because it is easily transmitted through the air via droplets released when a TB patient coughs or sneezes. In disease diagnosis, data are often missing, typically because of errors made during the data collection process, so this study proposes mean imputation to handle missing data. The TB data, from Bangkalan Regency, Indonesia, consist of 886 records and are classified with Naive Bayes, which is compared against logistic regression. Training and testing data are split repeatedly using K-fold cross-validation with k = 10. In the trials, mean imputation fills the missing data better than the one imputation method for this case: with Naive Bayes, mean imputation yields an average accuracy of 97.36% and an F1 score of 95.01%, versus an average accuracy of 97.35% and an F1 score of 94.35% for one imputation. For TB classification, Naive Bayes (average accuracy 97.36%, F1 score 95.01%) outperforms logistic regression (accuracy 97.36%, F1 score 89.58%); the reported accuracies are the same, so the difference lies in the F1 score.
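The pipeline the abstract describes (mean imputation, Naive Bayes, 10-fold cross-validation, accuracy and F1) can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. Several details are assumptions the abstract does not state: missing values are encoded as `None`, the features are numeric, the Naive Bayes variant is Gaussian, labels are 0/1 with 1 as the positive class, and column means are computed on the training folds only. All function names are hypothetical.

```python
import math
import random

def column_means(rows):
    """Mean of each column, ignoring missing entries (None)."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return means

def impute(rows, fill):
    """Replace each None with the fill value of its column."""
    return [[fill[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_gaussian_nb(X, y):
    """Per class: prior, feature means, feature variances (Gaussian assumption)."""
    model = {}
    for c in set(y):
        cols = list(zip(*[x for x, t in zip(X, y) if t == c]))
        mu = [sum(col) / len(col) for col in cols]
        # A small constant keeps the variance strictly positive.
        var = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
               for col, m in zip(cols, mu)]
        model[c] = (len(cols[0]) / len(X), mu, var)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_post(c):
        prior, mu, var = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, mu, var))
    return max(model, key=log_post)

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; defined as 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cross_validate(X, y, k=10):
    """Average accuracy and F1 over k folds, imputing inside each fold."""
    accs, f1s = [], []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        fill = column_means(X_tr)  # assumption: means from training folds only
        model = fit_gaussian_nb(impute(X_tr, fill), y_tr)
        y_te = [y[i] for i in fold]
        pred = [predict(model, x) for x in impute([X[i] for i in fold], fill)]
        accs.append(sum(p == t for p, t in zip(pred, y_te)) / len(y_te))
        f1s.append(f1_score(y_te, pred))
    return sum(accs) / k, sum(f1s) / k
```

Calling `cross_validate(X, y, k=10)` on a list of feature rows `X` and labels `y` returns the average accuracy and average F1 score. Computing the fill values inside each fold avoids leaking test-fold information into the imputation. Whether the study did this is not stated in the abstract.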
Pages: 19