GENCODE: reference annotation for the human and mouse genomes in 2023

Cited by: 172
Authors
Frankish, Adam [1 ]
Carbonell-Sala, Silvia [2 ]
Diekhans, Mark [3 ]
Jungreis, Irwin [4 ,5 ]
Loveland, Jane E. [1 ]
Mudge, Jonathan M.
Sisu, Cristina [6 ,7 ]
Wright, James C. [8 ]
Arnan, Carme
Barnes, If [1 ]
Banerjee, Abhimanyu [9 ,10 ]
Bennett, Ruth [1 ]
Berry, Andrew [1 ]
Bignell, Alexandra [1 ]
Boix, Carles [4 ,5 ]
Calvet, Ferriol [2 ]
Cerdan-Velez, Daniel [11 ]
Cunningham, Fiona [1 ]
Davidson, Claire [1 ]
Donaldson, Sarah [1 ]
Dursun, Cagatay [12 ]
Fatima, Reham [1 ]
Giorgetti, Stefano [1 ]
Giron, Carlos Garcia [1 ]
Gonzalez, Jose Manuel [1 ]
Hardy, Matthew [1 ]
Harrison, Peter W. [1 ]
Hourlier, Thibaut [1 ]
Hollis, Zoe [1 ]
Hunt, Toby [1 ]
James, Benjamin [4 ,5 ]
Jiang, Yunzhe [12 ]
Johnson, Rory [13 ,14 ]
Kay, Mike [1 ]
Lagarde, Julien
Martin, Fergal J. [1 ]
Gomez, Laura Martinez [11 ]
Nair, Surag [9 ,10 ]
Ni, Pengyu [12 ]
Pozo, Fernando [11 ]
Ramalingam, Vivek [10 ]
Ruffier, Magali [1 ]
Schmitt, Bianca M. [1 ]
Schreiber, Jacob M. [9 ,10 ]
Steed, Emily [1 ]
Suner, Marie-Marthe [1 ]
Sumathipala, Dulika [1 ]
Sycheva, Irina [1 ]
Uszczynska-Ratajczak, Barbara [15 ]
Wass, Elizabeth [1 ]
Affiliations
[1] European Bioinformat Inst, European Mol Biol Lab, Wellcome Genome Campus, Cambridge CB10 1SD, England
[2] Barcelona Inst Sci & Technol, Ctr Genom Regulat CRG, Dept Bioinformat & Genom, Dr Aiguader 88, Catalonia 08003, Spain
[3] Univ Calif Santa Cruz, UC Santa Cruz Genom Inst, Santa Cruz, CA 95064 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, 32 Vassar St, Cambridge, MA 02139 USA
[5] Broad Inst MIT & Harvard, 415 Main St, Cambridge, MA 02142 USA
[6] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[7] Brunel Univ London, Dept Life Sci, Uxbridge UB8 3PH, England
[8] Inst Canc Res, Div Canc Biol, Funct Prote, 237 Fulham Rd, London SW3 6JB, England
[9] Stanford Univ, Dept Genet, Palo Alto, CA USA
[10] Spanish Natl Canc Res Ctr CNIO, Bioinformat Unit, Calle Melchor Fernandez Almagro 3, Madrid 28029, Spain
[11] Bern Univ Hosp, Dept Med Oncol, Murtenstr 35, CH-3008 Bern, Switzerland
[12] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[13] Bern Univ Hosp, Dept Med Oncol, Murtenstr 35, CH-3008 Bern, Switzerland
[14] Univ Coll Dublin, Sch Biol & Environm Sci, Dublin D04V1W8, Ireland
[15] Inst Bioorgan Chem, Polish Acad Sci, Computat Biol Noncoding RNA, Noskowskiego 12-14, PL-61704 Poznan, Poland
[16] Fudan Univ, Inst Sci & Technol Brain Inspired Intelligence, Shanghai 200433, Peoples R China
[17] Univ Pompeu Fabra UPF, Dept Ciencies Expt & Salut, E-08003 Barcelona, Spain
[18] Guys Hosp, Kings Coll London, Dept Med & Mol Genet, Great Maze Pond, London SE1 9RT, England
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK); National Institutes of Health (USA);
Keywords
LONG NONCODING RNAS; SEQUENCE; DATABASE;
DOI
10.1093/nar/gkac1071
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Pages: D942-D949
Number of pages: 8