SETH detects and normalizes genetic variants in text

Cited by: 22
Authors
Thomas, Philippe [1, 2]
Rocktaschel, Tim [3]
Hakenberg, Jorg [4]
Lichtblau, Yvonne [2]
Leser, Ulf [2]
Affiliations
[1] DFKI Berlin, Language Technol Lab, Berlin, Germany
[2] Humboldt Univ, Inst Comp Sci, Knowledge Management Bioinformat, Unter Linden 6, D-10099 Berlin, Germany
[3] UCL, Gower St, London WC1E 6BT, England
[4] Illumina Inc, 451 El Camino Real, Santa Clara, CA 95050 USA
Keywords
BIOMEDICAL LITERATURE; SEQUENCE VARIANTS; MUTATIONS; NOMENCLATURE; CANCER; SYSTEM
DOI
10.1093/bioinformatics/btw234
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Descriptions of genetic variations and their effects are widely spread across the biomedical literature. However, finding all mentions of a specific variation, or all mentions of variations in a specific gene, is difficult because such variations are described in many different ways. Here, we describe SETH, a tool for the recognition of variations in text and their subsequent normalization to dbSNP or UniProt. SETH achieves high precision and recall on several evaluation corpora of PubMed abstracts. It is freely available and encompasses stand-alone scripts for isolated application and evaluation as well as thorough documentation for integration into other applications.
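To illustrate why the many surface forms of a variant make retrieval hard, the following is a minimal sketch in Python. It is not SETH's implementation and does not use SETH's API: the pattern, the `find_substitutions` function, and the example sentence are all hypothetical. It covers only simple protein substitutions written in one-letter or three-letter amino acid codes, and it only maps both spellings to one canonical tuple; it does not link mentions to dbSNP or UniProt entries.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Illustrative pattern (an assumption, not SETH's grammar).
# Matches e.g. "V600E", "p.V600E", "Val600Glu", "p.Val600Glu".
PATTERN = re.compile(
    rf"\b(?:p\.)?(?:(?P<w1>[{ONE}])(?P<p1>\d+)(?P<m1>[{ONE}])"
    rf"|(?P<w3>{THREE})(?P<p3>\d+)(?P<m3>{THREE}))\b"
)


def find_substitutions(text):
    """Return (start, end, wild_type, position, mutant) for each mention."""
    hits = []
    for m in PATTERN.finditer(text):
        if m.group("w1"):
            # One-letter form: already in canonical codes.
            wild, pos, mut = m.group("w1"), m.group("p1"), m.group("m1")
        else:
            # Three-letter form: convert to one-letter codes.
            wild = THREE_TO_ONE[m.group("w3")]
            pos = m.group("p3")
            mut = THREE_TO_ONE[m.group("m3")]
        hits.append((m.start(), m.end(), wild, int(pos), mut))
    return hits


sentence = "The V600E change (also written p.Val600Glu) was found."
for start, end, wild, pos, mut in find_substitutions(sentence):
    print(sentence[start:end], "->", (wild, pos, mut))
```

Both spellings in the example sentence reduce to the same tuple, which is the kind of equivalence a search for "all mentions of a specific variation" depends on. A simple pattern like this misses nucleotide-level descriptions, deletions, insertions, and free-text phrasings, which is part of what motivates a dedicated tool.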
Pages: 2883-2885
Page count: 3