Using artificial intelligence techniques for COVID-19 genome analysis

Cited by: 1
Authors
M. Saqib Nawaz
Philippe Fournier-Viger
Abbas Shojaee
Hamido Fujita
Affiliations
[1] Harbin Institute of Technology (Shenzhen), School of Humanities and Social Sciences
[2] Yale University School of Medicine, Faculty of Software and Information Science
[3] Iwate Prefectural University
Source
Applied Intelligence | 2021 | Volume 51
Keywords
COVID-19; Sequential pattern mining; Mutation; Genome sequence; Nucleotide bases; 68Uxx; 92-04; 68Wxx
DOI
Not available
Abstract
The genome of the novel coronavirus that causes COVID-19 was first sequenced in January 2020, approximately a month after its emergence in Wuhan, capital of Hubei province, China. COVID-19 genome sequencing is critical to understanding the virus's behavior, its origin, and how fast it mutates, and to developing drugs, vaccines, and effective preventive strategies. This paper investigates the use of artificial intelligence techniques to learn interesting information from COVID-19 genome sequences. First, sequential pattern mining (SPM) is applied to a computer-understandable corpus of COVID-19 genome sequences to see whether interesting hidden patterns can be found that reveal frequent patterns of nucleotide bases and their relationships with each other. Second, sequence prediction models are applied to the corpus to evaluate whether nucleotide base(s) can be predicted from previous ones. Third, for mutation analysis in genome sequences, an algorithm is designed to find the locations in the genome sequences where the nucleotide bases have changed and to calculate the mutation rate. The results suggest that SPM and mutation analysis techniques can reveal interesting information and patterns in COVID-19 genome sequences, which can be used to examine the evolution of, and variations in, COVID-19 strains, respectively.
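The mutation analysis step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not give. It assumes two genome sequences that are already aligned and of equal length, treats every position where the bases differ as a mutation, and defines the mutation rate as the fraction of differing positions. The function names `find_mutations` and `mutation_rate` are hypothetical.

```python
from typing import List, Tuple


def find_mutations(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each differing position.

    Assumption: both sequences are pre-aligned and have the same length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def mutation_rate(reference: str, sample: str) -> float:
    """Fraction of positions at which the sample differs from the reference."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return len(find_mutations(reference, sample)) / len(reference)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genome data.
    ref = "ACGTAC"
    alt = "ACCTAA"
    print(find_mutations(ref, alt))
    print(mutation_rate(ref, alt))
```

Real genomes differ in length because of insertions and deletions, so a practical pipeline would first align the sequences with a dedicated alignment tool before a position-by-position comparison like this one.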
Pages: 3086-3103
Page count: 17