Detecting operons in bacterial genomes via visual representation learning

Cited by: 8
Authors
Assaf, Rida [1]
Xia, Fangfang [3, 4]
Stevens, Rick [2, 3]
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
[2] Univ Chicago, Consortium Adv Sci & Engn, Chicago, IL 60637 USA
[3] Argonne Natl Lab, Comp Environm & Life Sci Div, Lemont, IL 60439 USA
[4] Argonne Natl Lab, Data Sci & Learning Div, Lemont, IL 60439 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
GUIDED GENETIC ALGORITHM; ESCHERICHIA-COLI; PREDICTION; DATABASE; UNITS;
DOI
10.1038/s41598-021-81169-9
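Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;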
Abstract
Contiguous genes in prokaryotes are often arranged into operons. Detecting operons plays a critical role in inferring gene functionality and regulatory networks. Human experts annotate operons by visually inspecting gene neighborhoods across pileups of related genomes. These visual representations capture the intergenic distance, strand direction, gene size, functional relatedness, and gene neighborhood conservation, which are the most prominent operon features mentioned in the literature. By studying these features, an expert can then decide whether a genomic region is part of an operon. We propose a deep-learning-based method named Operon Hunter that uses visual representations of genomic fragments to make operon predictions. Transfer learning and data augmentation make it possible to leverage powerful neural networks trained on image datasets by retraining them on a more limited dataset of extensively validated operons. Our method outperforms the previously reported state-of-the-art tools, especially in accurately predicting full operons and their boundaries. Furthermore, our approach makes it possible to visually identify the features influencing the network's decisions, so that human experts can subsequently cross-check them.
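To make the idea of a visual representation of a genomic fragment concrete, here is a minimal toy sketch in Python. It is not the Operon Hunter image-generation procedure, which this record does not describe. The `Gene` class, the `rasterize` function, the character encoding, and the example coordinates are all hypothetical, chosen only to show how intergenic distance, strand direction, and gene size can be read off a single rendered row.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record used only for this illustration."""
    start: int   # 0-based start coordinate in base pairs
    end: int     # exclusive end coordinate in base pairs
    strand: str  # "+" for forward strand, "-" for reverse strand


def rasterize(genes, window_start, window_end, width=40):
    """Render genes inside [window_start, window_end) as one row of `width` cells.

    '>' marks a forward-strand gene, '<' a reverse-strand gene,
    and '.' marks intergenic space.
    """
    span = window_end - window_start
    row = ["."] * width
    for gene in genes:
        # Clip the gene to the window; skip genes that fall outside it.
        lo = max(gene.start, window_start)
        hi = min(gene.end, window_end)
        if lo >= hi:
            continue
        # Map base-pair coordinates to cell indices (integer arithmetic).
        first = (lo - window_start) * width // span
        last = ((hi - window_start) * width - 1) // span
        symbol = ">" if gene.strand == "+" else "<"
        for i in range(first, last + 1):
            row[i] = symbol
    return "".join(row)


# Made-up example: two closely spaced same-strand genes followed,
# after a larger gap, by a gene on the opposite strand.
example_genes = [
    Gene(0, 1000, "+"),
    Gene(1100, 2000, "+"),
    Gene(2500, 4000, "-"),
]
row = rasterize(example_genes, window_start=0, window_end=4000, width=40)
print(row)
```

In this toy rendering, the short gap between the first two genes and their shared strand are visible at a glance, while the third gene is separated by a wider gap and points the other way. Stacking one such row per related genome would give a pileup-style grid of the kind the abstract says experts inspect.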
Pages: 10