Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Times cited: 18
Authors
He, Wenying [1]
Ju, Ying [2]
Zeng, Xiangxiang [2]
Liu, Xiangrong [2]
Zou, Quan [1, 3]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China
[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT PROMOTER MUTATIONS; PHYSICOCHEMICAL PROPERTIES; FEATURE-EXTRACTION; FEATURE-SELECTION; WEB SERVERS; PROTEIN; SITES; INFORMATION; RECURRENT; GENOME
DOI
10.3389/fmicb.2018.02174
Chinese Library Classification (CLC) number
Q93 [Microbiology]
Subject classification codes
071005; 100705
Abstract
With the rapid development of high-throughput sequencing technologies and the many whole-genome sequencing projects undertaken, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage have emerged in succession. In particular, the rapid development of DNA assembly technology may greatly advance efforts to create artificial life. At the same time, DNA assembly requires a large number of well-characterized target sequences as supporting data. Non-coding DNA (ncDNA) sequences make up most of the genomes of organisms, so recognizing them accurately is essential. Although experimental methods have been proposed to detect ncDNA sequences, they are too expensive for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected a benchmark ncDNA dataset for Saccharomyces cerevisiae and developed a support vector machine (SVM)-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. Using SVM, the optimal feature extraction strategy was selected from a set of candidates comprising mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer compositions. Sc-ncDNAPred achieved an overall accuracy of 0.98.
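The candidate feature sets named in the abstract (mononucleotide through hexamer) are commonly read as k-mer composition vectors with k = 1 to 6, giving 4^k features per set. The sketch below is a minimal illustration under that assumption, using normalized frequencies over overlapping windows and skipping windows that contain non-ACGT characters; the names `kmer_vocabulary`, `kmer_composition`, and `FEATURE_SETS` are hypothetical and this is not the authors' published code. Choosing the optimal feature set would then amount to training an SVM (for example scikit-learn's `SVC`) on each representation and comparing cross-validated accuracy.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def kmer_vocabulary(k: int) -> List[str]:
    """All 4**k DNA words of length k in lexicographic order (AA..., AC..., ...)."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def kmer_composition(seq: str, k: int) -> List[float]:
    """Normalized frequency of each k-mer over overlapping windows of seq.

    Assumption (not from the paper): windows containing characters other than
    A/C/G/T (e.g. 'N') are skipped. The result has length 4**k and sums to 1,
    or is all zeros if the sequence has no valid window of length k.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(k)
    index = {word: i for i, word in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(vocab)
    return [c / total for c in counts]


# Hypothetical mapping from the abstract's feature-set names to k.
FEATURE_SETS: Dict[str, int] = {
    "mononucleotide": 1,
    "dimer": 2,
    "trimer": 3,
    "tetramer": 4,
    "pentamer": 5,
    "hexamer": 6,
}

if __name__ == "__main__":
    example = "ACGTTGCAAGT"
    for name, k in FEATURE_SETS.items():
        vec = kmer_composition(example, k)
        print(f"{name:15s} k={k} dim={len(vec)}")
```

Because the vector length grows as 4^k (4,096 features for hexamers), short sequences yield very sparse high-k vectors, which is one reason a comparison across k is worthwhile rather than always using the largest k.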
Pages: 9