Biosom: gene synonym analysis by self-organizing map

Cited by: 0
Authors
Otemaier, K. R. [1]
Steffens, M. B. R. [1, 2]
Raittz, R. T. [1]
Brawerman, A. [1]
Marchaukoski, J. N. [1]
Affiliations
[1] Univ Fed Parana, Programa Posgrad Bioinformat, Curitiba, PR, Brazil
[2] Univ Fed Parana, Dept Bioquim & Biol Mol, Nucl Fixacao Biol Nitrogenio, Curitiba, PR, Brazil
Keywords
Gene nomenclature; Gene ambiguity; Kohonen; Gene synonym prediction; Self-organizing map; Matrix-U; DATABASE; PROTEIN; NOMENCLATURE; NAMES
DOI
10.4238/2015.February.20.1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
There are several guidelines for gene nomenclature, but they are not always applied to the names of newly identified genes. The lack of standardization in naming genes generates inconsistent databases with errors such as genes with the same function and different names, genes with different functions and the same name, and use of an abbreviated name. This paper presents a methodology for predicting synonyms in a given gene nomenclature, thereby detecting and minimizing naming redundancy and inconsistency and facilitating the annotation of new genes and data mining in public databases. To identify gene synonyms, i.e., gene ambiguity, the methodology proposed begins by grouping genes according to their names using a Kohonen self-organizing map artificial neural network. Afterwards, it identifies the groups generated employing the Matrix-U technique. The employment of such techniques allows one to infer the synonyms of genes, to predict probable hypothetical gene names and to point out possible errors in a database record. Many mistakes related to gene nomenclature were detected in this research, demonstrating the importance of predicting synonyms. The methodology developed is applicable for describing hypothetical, putative and other types of genes without a known function. Moreover, it can also indicate a possible function for genes after grouping them.
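The two steps named in the abstract (grouping names with a Kohonen self-organizing map, then reading group boundaries from the Matrix-U, usually called the U-matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how gene names are encoded or how the map is configured, so the character-bigram encoding, the grid size, the decay schedules, and every function name below are assumptions chosen only for illustration, and the gene names in the demo are toy input rather than data from the paper.

```python
import math
import random


def build_vocab(names):
    """Index every character bigram seen in the names (assumed encoding)."""
    bigrams = sorted({n.lower()[i:i + 2] for n in names for i in range(len(n) - 1)})
    return {bg: k for k, bg in enumerate(bigrams)}


def bigram_vector(name, vocab):
    """Encode a name as an L2-normalized vector of bigram counts."""
    name = name.lower()
    counts = [0.0] * len(vocab)
    for i in range(len(name) - 1):
        bg = name[i:i + 2]
        if bg in vocab:
            counts[vocab[bg]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_unit(weights, x):
    """Grid coordinates of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda k: dist(weights[k], x))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols map with linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.01
        for x in rng.sample(data, len(data)):
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood centered on the best-matching unit.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def u_matrix(weights, rows, cols):
    """Mean distance from each unit to its 4-connected grid neighbors."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if (r + dr, c + dc) in weights]
            if nbrs:
                u[r][c] = sum(dist(weights[(r, c)], weights[n]) for n in nbrs) / len(nbrs)
    return u


if __name__ == "__main__":
    names = ["nifH", "nifH1", "nifD", "glnA", "glnA2", "recA"]  # toy input
    vocab = build_vocab(names)
    data = [bigram_vector(n, vocab) for n in names]
    som = train_som(data, rows=3, cols=3)
    for n, x in zip(names, data):
        print(n, best_unit(som, x))
    for row in u_matrix(som, 3, 3):
        print(" ".join(f"{v:.2f}" for v in row))
```

In such a sketch, names that land on the same or adjacent units separated by low U-matrix values would be candidate synonyms, while ridges of high values mark boundaries between groups. How the paper actually thresholds or labels those groups is not stated in the abstract.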
Pages: 1461-1468
Page count: 8