A bedr way of genomic interval processing

Cited by: 17
Authors
Haider, Syed [1]
Waggott, Daryl [1]
Lalonde, Emilie [1, 2, 3, 4]
Fung, Clement [1]
Liu, Fei-Fei [2, 3, 4, 5, 6]
Boutros, Paul C. [1, 2, 3, 4]
Affiliations
[1] Ontario Inst Canc Res, Informat & Biocomp Platform, Toronto, ON M5G 0A3, Canada
[2] Univ Toronto, Dept Radiat Oncol, Toronto, ON M5G 2M9, Canada
[3] Univ Toronto, Dept Pharmacol & Toxicol, Toronto, ON M5G 2M9, Canada
[4] Univ Toronto, Dept Med Biophys, Toronto, ON M5G 2M9, Canada
[5] Univ Hlth Network, Princess Margaret Hosp, Ontario Canc Inst, Toronto M5G 2M9, ON, Canada
[6] Univ Hlth Network, Princess Margaret Hosp, Campbell Family Inst Canc Res, Toronto M5G 2M9, ON, Canada
Source
SOURCE CODE FOR BIOLOGY AND MEDICINE | 2016 / Volume 11
Keywords
Genomic intervals; BED format; Sequence algebra; Data integration
DOI
10.1186/s13029-016-0059-5
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Next-generation sequencing is making it critical to robustly and rapidly handle genomic ranges within standard pipelines. Standard use-cases include annotating sequence ranges with gene or other genomic annotation, merging multiple experiments together, and subsequently quantifying and visualizing the overlap. The most widely used tools for these tasks work at the command line (e.g. BEDTools), and the small number of available R packages are either slow or differ in semantics and features from the command-line interfaces.
Results: To provide a robust R-based interface to standard command-line tools for genomic coordinate manipulation, we created bedr. This open-source R package can use either BEDTools or BEDOPS as a back-end and performs data manipulation extremely quickly, creating R data structures that can be readily interfaced with existing computational pipelines. It includes data-visualization capabilities and a number of data-access functions that interface with standard databases like UCSC and COSMIC.
Conclusions: The bedr package provides an open-source solution for manipulating and restructuring genomic interval data in the R programming language, which is commonly used in bioinformatics, and should therefore be useful to bioinformaticians and genomic researchers.
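As a rough illustration of two of the use-cases named in the abstract (merging intervals from multiple experiments and quantifying their overlap), the following is a minimal Python sketch. It is not bedr's API and not how BEDTools or BEDOPS are implemented; the function names `merge_intervals` and `overlap_bp` and the sample coordinates are hypothetical. It assumes BED-style half-open coordinates (start inclusive, end exclusive) and treats touching intervals as mergeable.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Assumes half-open coordinates. Chromosomes are returned in plain
    lexicographic order, which is a simplification.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps or touches the running interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def overlap_bp(a, b):
    """Count bases shared by two interval sets (each merged first)."""
    total = 0
    for chrom_a, start_a, end_a in merge_intervals(a):
        for chrom_b, start_b, end_b in merge_intervals(b):
            if chrom_a == chrom_b:
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


# Hypothetical intervals from two experiments.
experiment_1 = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
experiment_2 = [("chr1", 250, 400), ("chr2", 60, 70)]

print(merge_intervals(experiment_1))  # [('chr1', 100, 300), ('chr2', 50, 80)]
print(overlap_bp(experiment_1, experiment_2))  # 60
```

The pairwise comparison in `overlap_bp` is quadratic and only suitable for small inputs; dedicated tools use sorted sweeps or indexes for speed.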
Pages: 7