Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics

Cited by: 96
Authors
Fermin, Damian
Allen, Baxter B.
Blackwell, Thomas W.
Menon, Rajasree
Adamski, Marcin
Xu, Yin
Ulintz, Peter
Omenn, Gilbert S.
States, David J. [1]
Affiliations
[1] Univ Michigan, Bioinformat Program, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Internal Med, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Human Genet, Ann Arbor, MI 48109 USA
DOI
10.1186/gb-2006-7-4-r35
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.

Results: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene.

Conclusion: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
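The six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_orfs`), the `min_len` parameter, and the choice to treat every stop-to-stop stretch as an open reading frame are assumptions made here for illustration only. The sketch translates both strands in all three reading frames using the standard genetic code and splits each translation at stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(dna, min_len=2):
    """Return (strand, frame offset, peptide) for every stop-free stretch.

    Assumption: an ORF is any stretch between stop codons of at least
    `min_len` residues (hypothetical cutoff, not taken from the paper).
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, offset, segment))
    return orfs


if __name__ == "__main__":
    for strand, offset, peptide in six_frame_orfs("ATGGCCTAAGGG"):
        print(strand, offset, peptide)
```

A genome-scale search would stream chromosome sequences and record genomic coordinates for each stretch. This sketch keeps everything in memory to show only the frame logic.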
Pages: 13
Related papers
32 entries in total
[1]   Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project [J].
Adamski, M ;
Blackwell, T ;
Menon, R ;
Martens, L ;
Hermjakob, H ;
Taylor, C ;
Omenn, GS ;
States, DJ .
PROTEOMICS, 2005, 5 (13) :3246-3261
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]   The universal protein resource (UniProt) [J].
Bairoch, A ;
Apweiler, R ;
Wu, CH ;
Barker, WC ;
Boeckmann, B ;
Ferro, S ;
Gasteiger, E ;
Huang, HZ ;
Lopez, R ;
Magrane, M ;
Martin, MJ ;
Natale, DA ;
O'Donovan, C ;
Redaschi, N ;
Yeh, LSL .
NUCLEIC ACIDS RESEARCH, 2005, 33 :D154-D159
[4]   THE SIR2 GENE FAMILY, CONSERVED FROM BACTERIA TO HUMANS, FUNCTIONS IN SILENCING, CELL-CYCLE PROGRESSION, AND CHROMOSOME STABILITY [J].
BRACHMANN, CB ;
SHERMAN, JM ;
DEVINE, SE ;
CAMERON, EE ;
PILLUS, L ;
BOEKE, JD .
GENES & DEVELOPMENT, 1995, 9 (23) :2888-2902
[5]   Recent advances in gene structure prediction [J].
Brent, MR ;
Guigó, R .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2004, 14 (03) :264-272
[6]   Potential for false positive identifications from large databases through tandem mass spectrometry [J].
Cargile, BJ ;
Bundy, JL ;
Stephenson, JL .
JOURNAL OF PROTEOME RESEARCH, 2004, 3 (05) :1082-1085
[7]   The need for guidelines in publication of peptide and protein identification data - Working group on publication guidelines for peptide and protein identification data [J].
Carr, S ;
Aebersold, R ;
Baldwin, M ;
Burlingame, A ;
Clauser, K ;
Nesvizhskii, A .
MOLECULAR & CELLULAR PROTEOMICS, 2004, 3 (06) :531-533
[8]   Choudhary JS, 2001, PROTEOMICS, V1, P651, DOI 10.1002/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N
[10]   TANDEM: matching proteins with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
BIOINFORMATICS, 2004, 20 (09) :1466-1467