Pseudo2GO: A Graph-Based Deep Learning Method for Pseudogene Function Prediction by Borrowing Information From Coding Genes

Cited by: 4
Authors
Fan, Kunjie [1]
Zhang, Yan [1, 2]
Affiliations
[1] Ohio State Univ, Coll Med, Dept Biomed Informat, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Comprehens Canc, Columbus, OH 43210 USA
Keywords
pseudogene; function prediction; graph neural networks; deep learning; gene ontology; feature propagation; semi-supervised learning; PROTEIN FUNCTION; SEQUENCE; EXPRESSION; NETWORKS; RESOURCE; ONTOLOGY; HEALTH
DOI
10.3389/fgene.2020.00807
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Pseudogenes, historically regarded as relics of evolution, have recently shown increasing evidence of functional potential. Computational methods for predicting pseudogene functions in terms of Gene Ontology (GO) are important for directing experimental discovery. However, no pseudogene-specific computational method has been proposed to directly predict GO terms. The biggest challenge for pseudogene function prediction is the lack of sufficient features and functional annotations, which makes training a predictive model difficult. Because pseudogenes are functionally similar to their parent coding genes, with which they share a large amount of DNA sequence, and because coding genes are richly annotated, we aim to predict pseudogene functions by borrowing information from coding genes in a graph-based way. Here we propose Pseudo2GO, a graph-based, semi-supervised deep learning model for pseudogene function prediction. A sequence similarity graph is first constructed to connect pseudogenes and coding genes. Multiple features are incorporated into the model as node attributes, making the graph an attributed graph; these include expression profiles, interactions with microRNAs, protein-protein interactions (PPIs), and genetic interactions. Graph convolutional networks are used to propagate node attributes across the graph to classify pseudogenes. Comparing Pseudo2GO with other frameworks adapted from popular protein function prediction methods, we demonstrate that our method achieves state-of-the-art performance, significantly outperforming the other methods in terms of the M-AUPR metric.
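The feature propagation described in the abstract can be illustrated with a minimal sketch of one graph convolution step. This is not the authors' implementation: it assumes the commonly used symmetric normalization D^{-1/2}(A + I)D^{-1/2} followed by a ReLU, and the toy graph, feature values, weights, and function names (`normalize_adjacency`, `gcn_layer`) are all hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical toy sequence-similarity graph:
# nodes 0 and 1 are coding genes, nodes 2 and 3 are pseudogenes.
adj = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
    ],
    dtype=float,
)

# Assumed node attributes: coding genes carry features, pseudogenes have none.
features = np.array(
    [
        [1.0, 0.0],
        [0.0, 1.0],
        [0.0, 0.0],
        [0.0, 0.0],
    ]
)

# Fixed illustrative weights (a trained model would learn these).
weights = np.array(
    [
        [0.5, 0.1, 0.2],
        [0.3, 0.4, 0.6],
    ]
)

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
print(hidden)
```

In this toy setup the pseudogene nodes start with all-zero attributes and end up with non-zero hidden representations, because each one is linked to a coding gene whose features flow across the edge.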
Pages: 9