A computational platform to identify origins of replication sites in eukaryotes

Cited by: 74
Authors
Dao, Fu-Ying [1]
Lv, Hao [1]
Zulfiqar, Hasan [1]
Yang, Hui [1]
Su, Wei [1]
Gao, Hui [1]
Ding, Hui [1]
Lin, Hao [1]
Affiliations
[1] University of Electronic Science and Technology of China, Center for Informational Biology, Chengdu 610054, People's Republic of China
Keywords
origins of replication site; eukaryote; feature extraction; webserver; classification algorithm; DNA-REPLICATION; PREDICTION; IDENTIFICATION; SEQUENCES; REVEALS;
DOI
10.1093/bib/bbaa017
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Origins of replication sites (ORIs) are the locations at which genomic DNA replication initiates; they regulate the onset of DNA replication and play significant roles in the replication process. The study of ORIs is essential for understanding the cell-division cycle and the regulation of gene expression. Computational methods that identify ORIs accurately will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. ORI data for seven eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) were collected from public database resources to construct benchmark datasets. Subsequently, three feature extraction strategies (k-mer, binary encoding, and a combination of k-mer and binary encoding) were used to formulate DNA sequence samples. The performance of different classification algorithms was also compared. The best results were obtained with a support vector machine, in both the 5-fold cross-validation test and the independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for identifying novel ORIs.
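The feature extraction strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the k values or the exact binary scheme, so the sketch assumes normalized k-mer frequencies over the alphabet ACGT and a 4-bit one-hot code per nucleotide. The function names `kmer_frequencies`, `binary_encoding` and `combined_features` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed binary scheme: one 4-bit one-hot code per nucleotide.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_encoding(seq):
    """Concatenate the one-hot code of every position; unknown bases map to zeros."""
    vector = []
    for base in seq.upper():
        vector.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vector


def combined_features(seq, k=2):
    """Third strategy: k-mer frequencies followed by the binary encoding."""
    return kmer_frequencies(seq, k) + binary_encoding(seq)
```

For a sequence of length n, the combined vector has 4^k + 4n entries, so the binary part requires all samples to share one length before the vectors are passed to a classifier such as a support vector machine.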
Pages: 1940-1950
Number of pages: 11