Identifying mutation positions in all segments of influenza genome enables better differentiation between pandemic and seasonal strains

Cited by: 8
Authors
Kargarfard, Fatemeh [1, 2]
Sami, Ashkan [2]
Hemmatzadeh, Farhid [3]
Ebrahimie, Esmaeil [3, 4, 5, 6]
Affiliations
[1] Univ Technol Sydney, Fac Engn & IT, Sydney, NSW, Australia
[2] Shiraz Univ, Sch Elect Engn & Comp, Dept Comp Sci & Engn, Shiraz, Iran
[3] Univ Adelaide, Sch Anim & Vet Sci, Adelaide, SA, Australia
[4] La Trobe Univ, Genom Res Platform, Melbourne, Vic 3086, Australia
[5] Univ South Australia, Div Informat Technol Engn & Environm, Sch Informat Technol & Math Sci, Adelaide, SA, Australia
[6] Flinders Univ S Australia, Fac Sci & Engn, Sch Biol Sci, Adelaide, SA, Australia
Keywords
Association rule mining; CBA; Expert system; Hot spots; Ripper algorithm; Pandemic influenza; CLASSIFICATION; PREDICTION; SWINE; MECHANISM; PROTEINS; VIRUSES
DOI
10.1016/j.gene.2019.01.014
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Influenza virus has a negative-sense, single-stranded, segmented RNA genome. In pandemic influenza research, most studies have focused on variations in the surface proteins (hemagglutinin and neuraminidase). However, new findings suggest that all internal and external proteins of influenza viruses can contribute to pandemic emergence, pathogenicity, and expanded host range. The 2009 influenza pandemic and the availability of many external and internal segment sequences from pandemic and non-pandemic strains offer a unique opportunity to evaluate how well machine learning models discriminate pandemic from seasonal sequences using mutation positions in all segments. In this study, we hypothesized that identifying mutation positions in all segments (proteins) encoded by the influenza genome would enable pandemic and seasonal strains to be distinguished more reliably. In a large-scale study, we applied a range of data mining techniques to all segments of influenza for rule discovery and discrimination of pandemic from seasonal strains. CBA (classification based on association rule mining), Ripper, and decision tree algorithms were used to extract association rules among mutations. CBA outperformed the other models. Our approach discriminated pandemic sequences from seasonal ones with more than 95% accuracy for PA and NP, 99.33% accuracy for NA, and 100% accuracy, precision, specificity, and sensitivity (recall) for M1, M2, PB1, NS1, and NS2. Precision, specificity, and sensitivity exceeded 90% for the other segments except PB2. When sequences of all segments of a strain were available, pandemic strains were discriminated with 100% accuracy. General rules extracted by the rule-based classification approaches, such as M1-V147I, NP-N334H, NS1-V112I, and PB1-L364I, detected pandemic sequences with high accuracy. We observed that mutations in internal proteins of influenza can contribute to distinguishing pandemic viruses, as mutations in the external proteins do.
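As a rough illustration of the notation and metrics mentioned in the abstract, the following Python sketch parses a mutation label of the form segment-residue-position-residue (for example NP-N334H), checks whether a protein sequence carries the second residue at that position, and computes accuracy, precision, sensitivity, and specificity from binary labels. The names `Mutation`, `parse_mutation`, `has_mutation`, and `binary_metrics` are hypothetical helpers, the 1-based position convention is an assumption, and this is not the authors' CBA or Ripper pipeline.

```python
import re
from typing import NamedTuple, Sequence


class Mutation(NamedTuple):
    """Hypothetical container for a label such as 'NP-N334H'."""
    segment: str   # protein/segment name, e.g. 'NP'
    ref: str       # residue listed before the position
    position: int  # assumed to be 1-based
    alt: str       # residue listed after the position


_LABEL = re.compile(r"^([A-Z0-9]+)-([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Mutation:
    """Split a label like 'NP-N334H' into its four parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    segment, ref, position, alt = match.groups()
    return Mutation(segment, ref, int(position), alt)


def has_mutation(protein_seq: str, mutation: Mutation) -> bool:
    """Return True if the sequence has the 'alt' residue at the position."""
    index = mutation.position - 1  # convert 1-based position to 0-based index
    return 0 <= index < len(protein_seq) and protein_seq[index] == mutation.alt


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division; undefined ratios are reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Compute metrics with 1 = pandemic (positive) and 0 = seasonal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
    }
```

How a given mutation maps to the pandemic or seasonal class is determined by the mined rules in the paper; the sketch only tests for the presence of a residue and leaves that mapping to the caller.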
Pages: 78-85
Number of pages: 8