An Educational Bioinformatics Project to Improve Genome Annotation

Times cited: 5
Authors
Amatore, Zoie [1]
Gunn, Susan [2]
Harris, Laura K. [1]
Affiliations
[1] Davenport Univ, Sci Dept, Harris Interdisciplinary Res, Lansing, MI 49512 USA
[2] Davenport Univ, Coll Urban Educ, Grand Rapids, MI USA
Keywords
bioinformatics; hypothetical protein; genome annotation; education; classroom; undergraduate; HYPOTHETICAL PROTEINS; FUNCTIONAL ANNOTATION; PSI-BLAST; INTERACTION NETWORKS; ENRICHMENT ANALYSIS; I-TASSER; PREDICTION; IDENTIFICATION; SEQUENCE; SEARCH
DOI
10.3389/fmicb.2020.577497
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
Scientific advancement is hindered without proper genome annotation because biologists then lack a complete understanding of cellular protein functions. In bacterial cells, hypothetical proteins (HPs) are open reading frames with unknown functions. HPs result either from an outdated database or from insufficient experimental evidence (i.e., indeterminate annotation). While automated annotation reviews help keep genome annotation up to date, manual review is often needed to verify that the annotation is correct. Students can provide the manual review necessary to improve genome annotation. This paper outlines an innovative classroom project that determines whether HPs have outdated or indeterminate annotation. The Hypothetical Protein Characterization Project uses multiple well-documented, freely available, web-based bioinformatics resources that analyze an amino acid sequence to (1) detect sequence similarities to other proteins, (2) identify domains, (3) predict tertiary structure, including active site characterization and potential binding ligands, and (4) determine cellular location. These analyses can generate enough evidence to support re-annotation of HPs or to prioritize HPs for experimental examination, such as structure determination via X-ray crystallography. Additionally, this paper details several approaches for selecting HPs to characterize with the Hypothetical Protein Characterization Project: student- and instructor-directed random selection, selection using differential gene expression from mRNA expression data, and selection based on phylogenetic relations. This paper also provides additional resources to support instructional use of the Hypothetical Protein Characterization Project, such as example assignment instructions with grading rubrics, links to training videos on YouTube, and several step-by-step example projects that demonstrate and interpret the range of results students might encounter.
Educational use of the Hypothetical Protein Characterization Project gives students an opportunity to learn bioinformatic programs and apply them to scientific questions. The project is highly customizable: HP selection and analysis can be tailored to the scope and purpose of each student's investigation, and the programs used for HP analysis can be easily adapted to course learning objectives. The project can be used in both online and in-seat instruction for a wide variety of undergraduate and graduate classes, as well as for undergraduate capstone, honors, and experiential learning projects.
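Two of the HP selection approaches named in the abstract (random selection and selection from differential gene expression) can be illustrated with a short Python sketch. It is not the authors' procedure. It assumes a hypothetical, simplified table of annotated open reading frames with a locus tag, a product description, a log2 fold change, and an adjusted p-value. The thresholds (|log2 fold change| >= 1, adjusted p < 0.05) are common conventions chosen only for illustration, and the locus tags and values are made up.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orf:
    """One annotated open reading frame (hypothetical, simplified record)."""
    locus_tag: str
    product: str
    log2_fold_change: float
    padj: float


def is_hypothetical(orf: Orf) -> bool:
    # Assumption: any product description containing "hypothetical protein" marks an HP.
    return "hypothetical protein" in orf.product.lower()


def random_hps(orfs: List[Orf], k: int, seed: Optional[int] = None) -> List[Orf]:
    """Randomly choose up to k HPs, e.g. one set per student."""
    hps = [o for o in orfs if is_hypothetical(o)]
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(hps, min(k, len(hps)))


def differentially_expressed_hps(
    orfs: List[Orf], min_abs_lfc: float = 1.0, max_padj: float = 0.05
) -> List[Orf]:
    """Keep HPs that pass both expression thresholds, largest change first."""
    hits = [
        o
        for o in orfs
        if is_hypothetical(o)
        and abs(o.log2_fold_change) >= min_abs_lfc
        and o.padj < max_padj
    ]
    return sorted(hits, key=lambda o: abs(o.log2_fold_change), reverse=True)


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    orfs = [
        Orf("ABC_0001", "DNA gyrase subunit A", 0.2, 0.80),
        Orf("ABC_0002", "hypothetical protein", 2.5, 0.001),
        Orf("ABC_0003", "Hypothetical protein", -1.4, 0.03),
        Orf("ABC_0004", "hypothetical protein", 0.3, 0.60),
    ]
    print([o.locus_tag for o in random_hps(orfs, k=2, seed=1)])
    print([o.locus_tag for o in differentially_expressed_hps(orfs)])
```

Seeding the random draw lets an instructor reproduce each student's assignment. Sorting by absolute fold change puts the most strongly up- or down-regulated HPs first, which is one possible way to prioritize them.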
Pages: 15