Rapid and sensitive detection of genome contamination at scale with FCS-GX

Cited by: 102
Authors
Astashyn, Alexander [1]
Tvedte, Eric S. [1]
Sweeney, Deacon [1]
Sapojnikov, Victor [1]
Bouk, Nathan [1]
Joukov, Victor [1]
Mozes, Eyal [1]
Strope, Pooja K. [1]
Sylla, Pape M. [1]
Wagner, Lukas [1]
Bidwell, Shelby L. [1]
Brown, Larissa C. [1]
Clark, Karen [1]
Davis, Emily W. [1]
Smith-White, Brian [1]
Hlavina, Wratko [1]
Pruitt, Kim D. [1]
Schneider, Valerie A. [1]
Murphy, Terence D. [1]
Affiliations
[1] National Institutes of Health, National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20892, USA
Funding
National Institutes of Health (USA)
Keywords
Genome contamination; Genome quality; Genome assembly; GenBank; RefSeq; Software; Gene transfer
DOI
10.1186/s13059-024-03198-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084.
Pages: 25