A new COVID-19 detection method from human genome sequences using CpG island features and KNN classifier

Cited by: 45
Authors
Arslan, Hilal [1]
Arslan, Hasan [2]
Affiliations
[1] Izmir Bakircay Univ, Dept Comp Engn, Izmir, Turkey
[2] Erciyes Univ, Dept Math, Kayseri, Turkey
Source
ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH | 2021 / Vol. 24 / No. 04
Keywords
COVID-19; SARS-CoV-2; K-Nearest Neighbors; CpG islands; Human coronaviruses
DOI
10.1016/j.jestch.2020.12.026
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Several viral epidemics have emerged in the last two decades, such as those caused by the severe acute respiratory syndrome coronavirus and the Middle East respiratory syndrome coronavirus. Coronavirus disease 2019 (COVID-19) is a pandemic caused by a novel betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Because COVID-19 spread so rapidly, many researchers moved quickly to investigate the diagnosis and treatment of this disease. Distinguishing COVID-19 from other types of coronavirus is difficult because of their genetic similarity. In this study, we propose a new, efficient COVID-19 detection method based on the K-nearest neighbors (KNN) classifier, using the complete genome sequences of human coronaviruses in the dataset recorded in the 2019 Novel Coronavirus Resource. We also describe two features based on CpG islands that efficiently detect COVID-19 cases. As a result, a genome sequence of approximately 30,000 nucleotides can be represented by only two real numbers. The KNN method is a simple and effective non-parametric technique for solving classification problems; however, its performance depends on the distance measure used. To improve the performance of the KNN algorithm, we evaluate 19 distance metrics grouped into five categories. Several performance measures are computed to evaluate the proposed method. The proposed method achieves 98.4% precision, 99.2% recall, 98.8% F-measure, and 98.4% accuracy within a few seconds when any L1-type metric is used as the distance measure in the KNN. (c) 2020 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
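The abstract does not spell out the two CpG island features or list which metrics count as "L1 type", so the following is only a minimal sketch of the overall pipeline under stated assumptions. Each genome is reduced to two real numbers, labelled by majority vote among its k nearest training genomes under the Manhattan (L1-norm) distance, and scored with precision, recall, F-measure, and accuracy. The feature function `cpg_features` is a hypothetical stand-in (GC content and the CpG observed/expected ratio, two quantities commonly used when defining CpG islands), not the paper's features. The helper names `manhattan`, `knn_predict`, and `binary_metrics`, and the toy feature values, are likewise illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Sequence

Point = tuple[float, float]


def cpg_features(seq: str) -> Point:
    """HYPOTHETICAL two-number genome summary (not the paper's features):
    (GC content, CpG observed/expected ratio)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # Count CG dinucleotides (a C immediately followed by a G).
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1-norm (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[str],
    x: Sequence[float],
    k: int = 3,
    dist: Callable[[Sequence[float], Sequence[float]], float] = manhattan,
) -> str:
    """Label x by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive="COVID-19"):
    """Return (precision, recall, F-measure, accuracy) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, f1, accuracy


if __name__ == "__main__":
    # Toy feature values invented for illustration only (not real genome data).
    train_X = [(0.38, 0.40), (0.38, 0.41), (0.37, 0.39),
               (0.45, 0.60), (0.46, 0.62), (0.44, 0.58)]
    train_y = ["COVID-19"] * 3 + ["other"] * 3
    test_X = [(0.39, 0.42), (0.45, 0.61)]
    test_y = ["COVID-19", "other"]
    preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
    print(preds)
    print(binary_metrics(test_y, preds))
```

Because `dist` is a parameter, other distance measures can be passed in without touching the classifier, which is the kind of metric-by-metric comparison the abstract describes. An odd `k` avoids most voting ties in a two-class problem.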
Pages: 839-847
Number of pages: 9