GENCODE 2021

Cited by: 653
Authors
Frankish, Adam [1]
Diekhans, Mark [2]
Jungreis, Irwin [3,4]
Lagarde, Julien [5]
Loveland, Jane E. [1]
Mudge, Jonathan M. [1]
Sisu, Cristina [6,7]
Wright, James C. [8]
Armstrong, Joel [2]
Barnes, If [1]
Berry, Andrew [1]
Bignell, Alexandra [1]
Boix, Carles [3,4,9]
Carbonell Sala, Silvia [5]
Cunningham, Fiona [1]
Di Domenico, Tomas [10]
Donaldson, Sarah [1]
Fiddes, Ian T. [2]
Giron, Carlos Garcia [1]
Gonzalez, Jose Manuel [1]
Grego, Tiago [1]
Hardy, Matthew [1]
Hourlier, Thibaut [1]
Howe, Kevin L. [1]
Hunt, Toby [1]
Izuogu, Osagie G. [1]
Johnson, Rory [11,12]
Martin, Fergal J. [1]
Martinez, Laura [10]
Mohanan, Shamika [1]
Muir, Paul [13,14]
Navarro, Fabio C. P. [6]
Parker, Anne [1]
Pei, Baikang [6]
Pozo, Fernando [10]
Riera, Ferriol Calvet [1]
Ruffier, Magali [1]
Schmitt, Bianca M. [1]
Stapleton, Eloise [1]
Suner, Marie-Marthe [1]
Sycheva, Irina [1]
Uszczynska-Ratajczak, Barbara [15]
Wolf, Maxim Y. [16]
Xu, Jinuri [6]
Yang, Yucheng T. [6,17]
Yates, Andrew [1]
Zerbino, Daniel [1]
Zhang, Yan [6,18]
Choudhary, Jyoti S. [8]
Gerstein, Mark [6,17,19]
Affiliations
[1] European Bioinformat Inst, European Mol Biol Lab, Wellcome Genome Campus, Cambridge CB10 1SD, England
[2] Univ Calif Santa Cruz, UC Santa Cruz Genom Inst, Santa Cruz, CA 95064 USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, 32 Vassar St, Cambridge, MA 02139 USA
[4] Broad Inst MIT & Harvard, 415 Main St, Cambridge, MA 02142 USA
[5] Barcelona Inst Sci & Technol, Ctr Genom Regulat CRG, Dr Aiguader 88, E-08003 Barcelona, Catalonia, Spain
[6] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[7] Brunel Univ London, Dept Biosci, Uxbridge UB8 3PH, Middx, England
[8] Inst Canc Res, Div Canc Biol, Funct Prote, 237 Fulham Rd, London SW3 6JB, England
[9] MIT, Computat & Syst Biol Program, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[10] Spanish Natl Canc Res Ctr CNIO, Bioinformat Unit, Madrid, Spain
[11] Univ Bern, Univ Hosp, Dept Med Oncol, Inselspital, Bern, Switzerland
[12] Univ Bern, Dept Biomed Res DBMR, Bern, Switzerland
[13] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06520 USA
[14] Yale Univ, Syst Biol Inst, West Haven, CT 06516 USA
[15] Univ Warsaw, Ctr New Technol, Warsaw, Poland
[16] Harvard Med Sch, Dept Biomed Informat, 10 Shattuck St,Suite 514, Boston, MA 02115 USA
[17] Yale Univ, Program Computat Biol & Bioinformat, Bass 432,266 Whitney Ave, New Haven, CT 06520 USA
[18] Ohio State Univ, Coll Med, Dept Biomed Informat, Columbus, OH 43210 USA
[19] Yale Univ, Dept Comp Sci, Bass 432,266 Whitney Ave, New Haven, CT 06520 USA
[20] Univ Pompeu Fabra UPF, E-08003 Barcelona, Catalonia, Spain
[21] Guys Hosp, Kings Coll London, Dept Med & Mol Genet, Great Maze Pond, London SE1 9RT, England
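Funding
UK Biotechnology and Biological Sciences Research Council; US National Institutes of Health; Wellcome Trust (UK); Swiss National Science Foundation
Keywords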
LONG NONCODING RNAS; ANNOTATION; DATABASE; ATLAS;
DOI
10.1093/nar/gkaa1087
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs.
Pages: D916-D923
Number of pages: 8