SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

Cited by: 1783
Authors
Shen, Wei [1]
Le, Shuai [1]
Li, Yan [2]
Hu, Fuquan [1]
Affiliations
[1] Third Mil Med Univ, Coll Basic Med Sci, Dept Microbiol, 30 Gaotanyan St, Chongqing, Peoples R China
[2] Third Mil Med Univ, Southwest Hosp, Med Res Ctr, 29 Gaotanyan St, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
FORMAT;
DOI
10.1371/journal.pone.0163962
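Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract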
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, and not particularly efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user-friendly. This paper describes a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
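To make two of the manipulations named in the abstract concrete (filtering and deduplication), here is a minimal Python sketch. It is not SeqKit's implementation (SeqKit is a separate command-line program) and makes no claim about its algorithms. The function names `parse_fasta`, `filter_by_length`, and `deduplicate`, the sample records, and the choice to compare sequences case-insensitively are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# A record is (header without the leading '>', full sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences wrapped over several lines are joined into one string.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep only records whose sequence has at least min_len residues."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence.

    Assumption for this sketch: sequences are compared case-insensitively.
    """
    seen: Set[str] = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


# Hypothetical sample input, invented for this example.
EXAMPLE = """>seq1 first record
ACGTAC
GTAC
>seq2 duplicate of seq1
acgtacgtac
>seq3 short
ACG
"""

result = list(
    deduplicate(filter_by_length(parse_fasta(EXAMPLE.splitlines()), min_len=5))
)
print(result)
```

Because each step is a generator, records stream through the pipeline one at a time; only the set of sequences already seen by `deduplicate` grows with the input.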
Pages: 10