Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Times cited: 18

Authors:

He, Wenying [1]

Ju, Ying [2]

Zeng, Xiangxiang [2]

Liu, Xiangrong [2]

Zou, Quan [1,3]

Affiliations:

[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China

[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China

Source:

FRONTIERS IN MICROBIOLOGY | 2018 / Vol. 9

Funding:

National Natural Science Foundation of China;

Keywords:

non-coding DNA; DNA sequence; feature representation; genome synthesis; support vector machine; TERT PROMOTER MUTATIONS; PHYSICOCHEMICAL PROPERTIES; FEATURE-EXTRACTION; FEATURE-SELECTION; WEB SERVERS; PROTEIN; SITES; INFORMATION; RECURRENT; GENOME;

DOI:

10.3389/fmicb.2018.02174

Chinese Library Classification (CLC) number:

Q93 [Microbiology];

Discipline classification code:

071005 ; 100705 ;

Abstract:

With the rapid development of high-throughput sequencing technologies and the completion of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage, along with other cutting-edge technologies, have emerged in succession. In particular, the rapid development of DNA assembly technology may greatly advance efforts to create artificial life. DNA assembly, however, requires a large number of target sequences with known information as supporting data. Because non-coding DNA (ncDNA) sequences make up most of the genome in many organisms, recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are too costly for genome-wide detection. It is therefore necessary to develop machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and report a support vector machine (SVM)-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. Using the SVM, the optimal feature extraction strategy was selected from a candidate set comprising mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer composition features. Sc-ncDNAPred achieved an overall accuracy of 0.98.
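The abstract names the candidate feature sets (mononucleotide through hexamer) but not their exact encoding. The sketch below is a minimal illustration, assuming each sequence is represented by normalized k-mer frequencies over A/C/G/T, giving one 4^k-dimensional vector per k. The function names `kmer_frequencies` and `feature_sets` are hypothetical, and this is not the authors' published implementation. Under this assumption, each vector set would be used to train an SVM, and the k with the best cross-validated accuracy would be kept.

```python
from itertools import product
from typing import Dict, Iterable, List

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Return the normalized frequency of every possible k-mer (4**k features).

    Assumed encoding (hypothetical): k=1 is mononucleotide composition,
    k=2 dimers, ..., k=6 hexamers. Windows containing characters other
    than A/C/G/T (e.g. 'N') are skipped and not counted.
    """
    seq = seq.upper()
    # Fixed lexicographic ordering of all 4**k k-mers (AA..A, AA..C, ...).
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k across the sequence with step 1.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid windows: return all zeros.
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def feature_sets(seq: str, ks: Iterable[int] = range(1, 7)) -> Dict[int, List[float]]:
    """Build one candidate feature vector per k (hypothetical helper)."""
    return {k: kmer_frequencies(seq, k) for k in ks}


if __name__ == "__main__":
    for k, vec in feature_sets("ACGTACGTAA").items():
        print(k, len(vec), round(sum(vec), 6))
```

Because the vector length grows as 4^k (4 features for k = 1, 4,096 for k = 6), higher-order vectors become sparse on short sequences, which is one reason to compare k values empirically rather than simply using the largest.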

Pages: 9