Rapid and sensitive detection of genome contamination at scale with FCS-GX

Cited by: 102
Authors
Astashyn, Alexander [1]
Tvedte, Eric S. [1]
Sweeney, Deacon [1]
Sapojnikov, Victor [1]
Bouk, Nathan [1]
Joukov, Victor [1]
Mozes, Eyal [1]
Strope, Pooja K. [1]
Sylla, Pape M. [1]
Wagner, Lukas [1]
Bidwell, Shelby L. [1]
Brown, Larissa C. [1]
Clark, Karen [1]
Davis, Emily W. [1]
Smith-White, Brian [1]
Hlavina, Wratko [1]
Pruitt, Kim D. [1]
Schneider, Valerie A. [1]
Murphy, Terence D. [1]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20892 USA
Funding
National Institutes of Health (USA)
Keywords
Genome contamination; Genome quality; Genome assembly; GenBank; RefSeq; Software; GENE-TRANSFER;
DOI
10.1186/s13059-024-03198-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084.
Pages: 25