Distinguishing protein-coding and noncoding genes in the human genome

Times cited: 363
Authors
Clamp, Michele [1]
Fry, Ben
Kamal, Mike
Xie, Xiaohui
Cuff, James
Lin, Michael F.
Kellis, Manolis
Lindblad-Toh, Kerstin
Lander, Eric S.
Affiliations
[1] MIT, Broad Inst, Cambridge, MA 02142 USA
[2] Harvard, Cambridge Ctr 7, Cambridge, MA USA
[3] MIT, Dept Biol, Cambridge, MA 02139 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[5] Whitehead Inst Biomed Res, Cambridge Ctr 9, Cambridge, MA 02142 USA
[6] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
Keywords
comparative genomics
DOI
10.1073/pnas.0709013104
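Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09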
Abstract
Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
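The abstract's premise is that a sizable ORF can appear by chance in a transcript without encoding a protein. The following is a minimal sketch of that premise only. It is not the authors' method, which evaluates nonconserved ORFs through their properties in other primates. It assumes uniform, independent nucleotides; it scans only the forward strand, since transcripts are stranded; and it defines an ORF as an ATG through the first in-frame stop codon. The helper names (`longest_orf`, `random_transcript`, `fraction_with_long_orf`) and the 100-codon threshold in the usage line are hypothetical, illustrative choices.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Return the length in codons (ATG included, stop excluded) of the longest
    ATG...stop ORF on the forward strand of seq, over all three reading frames.
    ORFs that run off the end without a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the current stop-free stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if start is not None:
                    best = max(best, (i - start) // 3)
                start = None
            elif codon == "ATG" and start is None:
                start = i
    return best


def random_transcript(length: int, rng: random.Random) -> str:
    """Assumption: each base drawn uniformly and independently from ACGT."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fraction_with_long_orf(n_seqs: int, length: int, min_codons: int, seed: int = 0) -> float:
    """Fraction of random transcripts whose longest ORF has at least min_codons codons."""
    rng = random.Random(seed)
    hits = sum(
        longest_orf(random_transcript(length, rng)) >= min_codons
        for _ in range(n_seqs)
    )
    return hits / n_seqs


if __name__ == "__main__":
    # Illustrative run: 1,000 random 2-kb "transcripts", 100-codon cutoff (hypothetical).
    print(fraction_with_long_orf(n_seqs=1000, length=2000, min_codons=100))
```

Under these simplified assumptions, a nonzero fraction shows that ORF length alone cannot certify a gene. That observation motivates the abstract's requirement of clear evidence of an encoded protein for nonconserved ORFs.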
Pages: 19428-19433
Number of pages: 6