Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae
Cited by: 18
Authors:
He, Wenying [1]
Ju, Ying [2]
Zeng, Xiangxiang [2]
Liu, Xiangrong [2]
Zou, Quan [1,3]
Affiliations:
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Peoples R China
[3] Dezhou Univ, Inst Biophys, Shandong Prov Key Lab Biophys, Dezhou, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
non-coding DNA;
DNA sequence;
feature representation;
genome synthesis;
support vector machine;
TERT PROMOTER MUTATIONS;
PHYSICOCHEMICAL PROPERTIES;
FEATURE-EXTRACTION;
FEATURE-SELECTION;
WEB SERVERS;
PROTEIN;
SITES;
INFORMATION;
RECURRENT;
GENOME;
DOI:
10.3389/fmicb.2018.02174
Chinese Library Classification (CLC) number:
Q93 [Microbiology];
Subject classification codes:
071005 ;
100705 ;
Abstract:
With the rapid development of high-throughput sequencing technologies and the completion of many whole-genome sequencing projects, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage are emerging in succession. In particular, the rapid development of DNA assembly technology may greatly advance the creation of artificial life. DNA assembly, however, requires a large number of well-annotated target sequences as supporting data. Non-coding DNA (ncDNA) sequences account for most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection, which motivates machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and developed a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer composition, using a support vector machine as the learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98.
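Note: the feature families named in the abstract (mononucleotide through hexamer) are k-mer compositions with k = 1 to 6. The sketch below shows one common way to turn a DNA sequence into such a feature vector. It is an illustration only, not the authors' implementation: the function name `kmer_frequencies`, the normalization by window count, and the skipping of ambiguous bases are all assumptions, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four standard bases are counted


def kmer_frequencies(sequence, k):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration; the vector has 4**k entries.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Example: dimer (k = 2) composition of a short toy sequence.
features = kmer_frequencies("ACGTACGT", 2)
print(len(features))  # one entry per possible dimer
```

Vectors like this, computed for each k, could then be passed to any support vector machine implementation, and the k giving the best cross-validated accuracy kept. That selection loop is likewise an assumption about the general workflow rather than a description of the published procedure.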
Pages: 9