Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers

Cited by: 122
Authors
Girardot, Charles [1]
Scholtalbers, Jelle [1]
Sauer, Sajoscha [1]
Su, Shu-Yi [1]
Furlong, Eileen E. M. [1]
Affiliation
[1] European Mol Biol Lab, Genome Biol Unit, D-69117 Heidelberg, Germany
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Software; Genomics; NGS; UMI; Multiplexing; Duplicates; GALAXY; NOISE;
DOI
10.1186/s12859-016-1284-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The yield obtained from next-generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account, by identifying UMIs.
Results: Existing tools do not support these complex barcoding configurations and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36% compared to when UMIs are ignored.
Conclusions: Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy ToolShed.
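The idea of filtering read duplicates while taking UMIs into account can be illustrated with a minimal Python sketch. This is not Je's implementation and does not use the Picard API; the `Read` record, the `count_unique` helper, the toy reads, and the choice of exact UMI matching are all assumptions made for illustration only.

```python
from collections import namedtuple

# Hypothetical read record: mapping coordinates plus the UMI extracted from the read.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def count_unique(reads, use_umi):
    """Count reads that remain after duplicate removal.

    Reads sharing chromosome, position and strand are treated as duplicates.
    When use_umi is True, they must also carry the same UMI (exact match,
    an assumption of this sketch) to be considered duplicates.
    """
    seen = set()
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if use_umi:
            key += (read.umi,)
        seen.add(key)
    return len(seen)


# Toy data, invented for illustration.
reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),  # same position, same UMI: PCR duplicate
    Read("r3", "chr1", 100, "+", "TTAG"),  # same position, different UMI: distinct molecule
    Read("r4", "chr2", 500, "-", "ACGT"),
]

without_umi = count_unique(reads, use_umi=False)
with_umi = count_unique(reads, use_umi=True)
print(without_umi, with_umi)
```

In this toy input, position-only filtering collapses `r1`, `r2` and `r3` into one read, whereas UMI-aware filtering keeps `r3` as a separate molecule. That is the kind of effect behind the increase in unique reads reported in the abstract.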
Pages: 6